Where Did I Go From June to Now?

Where did I go and what did I do from June 2012 to now? I know I was away, but there was so much happening!

In bullet points:

  • iOS development @ Twitter
  • Lived in the heart of San Francisco with awesome friends
  • Attended WWDC 2012 in SF, and DefCon 20 in Las Vegas
  • Took a motorcycle class…
  • Visited Napa Valley a few times
  • Stanford Start-Up School
  • Met so many awesome, inspirational, and fun people

That pretty much sums it up!

“Naturalness of Software Languages” – Thoughts

I recently read a paper called “On the Naturalness of Software” by Prem Devanbu, a CS professor @ UC Davis, about transplanting ideas from Natural Language Processing, the field behind products like Siri and models of languages like English, to software development and the writing of code.

Devanbu applies the n-gram model of natural language processing to software languages and development. This is a neat idea, and his paper shows the effectiveness of applying statistical models to software languages, which have precise syntax and structure, even stricter than, say, the English language. If statistical models work on English and other common spoken languages, they may work even more easily on software. The best example Devanbu uses is for(int i=0; i<10, where the statistical model should readily predict that ; i++) is the next piece of code. One severe limitation is that Devanbu’s statistical model falls short at predicting logic. The logic of a software developer is crucial, since the same code can be written in different ways using different logic, and I don’t think statistical models will be able to predict a developer’s logic anytime soon. If they ever can, then we might be at the point where software can write itself.
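The n-gram idea is simple enough to sketch. Below is a minimal, hypothetical next-token predictor over code tokens in Python; the paper’s real models train on huge corpora with smoothing, while this toy “corpus” is just one repeated for-loop:

```python
from collections import Counter, defaultdict

def train_ngram(tokens, n=3):
    """Count how often each (n-1)-token context is followed by each next token."""
    model = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        model[context][tokens[i + n - 1]] += 1
    return model

def predict(model, context):
    """Return the most likely next token for a context, or None if unseen."""
    counts = model.get(tuple(context))
    return counts.most_common(1)[0][0] if counts else None

# Toy corpus: the tokenized for-loop from the paper's example, repeated.
corpus = "for ( int i = 0 ; i < 10 ; i ++ )".split() * 50
model = train_ngram(corpus, n=3)
print(predict(model, ["10", ";"]))  # prints "i" — the loop's increment comes next
```

Given the context “10 ;”, the trigram counts make “i” the obvious continuation, which is exactly the kind of boilerplate completion the paper is after.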

One idea in the “Future Directions” section that stood out to me was applying these statistical models to help developers who are disabled or have RSI. As someone who deals with RSI, I agree that code is repetitive and predictable; I would find great value in his plug-in if only it existed for Xcode on the Mac. I posit that improved auto-completion would significantly boost an individual’s productivity. Who wouldn’t want that?

However, I wonder when we should not use this. Even if we can do it, should we? Even if the application of statistical models to software languages were perfect, is there a time when it isn’t needed, or should even be avoided entirely? What are the security implications if malicious hackers used this as a tool? I would bet that this will affect Computer Science curricula around the nation, for better or worse.

If statistical models can be applied to spoken language and now to software languages, what about combining computer vision with statistical models for body language, which some estimates claim makes up the majority of communication? Law enforcement would love to use this idea to identify nervous behaviors among travelers at airports and other transportation hubs around the world. Body language is indeed structured like spoken language: humans make the same gestures when nervous and uncomfortable, and likewise for happiness and sadness. I see body language as the next frontier, but it could bring us one step closer to the 1984 world of surveillance and limited behavior.

Hal Varian’s “Predicting the Present” with Google Trends

I just read a neat paper by Hal Varian, Chief Economist @ Google and emeritus professor at UC Berkeley’s School of Information. Titled “Predicting the Present with Google Trends,” it lays out the possibilities!

Reading his paper on predicting the present with Google Trends came as no surprise: there is significant value in accurately measuring the pulse of the present, especially for economic purposes like bracing for, or even alleviating, an upcoming recession. His statistical methods were sound, simple, and straightforward. I’m pretty happy that I am taking Stats 201 on Statistical Methods for Data Analysis, because I was able to follow Varian’s methodology and understand the R output. His examples on the auto industry and travel destinations were good, but I wonder what we can truly do with such information. In a sense, it’s almost like finding out what the future holds for us in the next 4 to 6 weeks. We are then presented with the options of accepting the inevitable or fighting to change course.
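As a rough illustration of the nowcasting idea (not Varian’s actual models, which he fits in R, and with entirely made-up numbers), here is a sketch in Python: fit this month’s official statistic on last month’s value plus a same-month Trends-style index, which, unlike the official release, is available immediately:

```python
import numpy as np

# Hypothetical monthly data: official sales figures (released with a lag)
# and a same-month Google-Trends-style index for a related search query.
sales = np.array([120, 132, 101, 134, 190, 170, 140, 132, 125, 160, 185, 210], dtype=float)
trends = np.array([45, 50, 38, 52, 75, 68, 55, 50, 47, 62, 72, 80], dtype=float)

# Varian-style nowcast: regress current sales on an intercept, last month's
# sales (the autoregressive term), and this month's Trends index, using
# ordinary least squares.
y = sales[1:]
X = np.column_stack([np.ones(len(y)), sales[:-1], trends[1:]])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Estimate the "current" month from last month's sales and today's Trends value.
nowcast = beta @ np.array([1.0, 210.0, 78.0])
print(round(nowcast, 1))
```

The point is not the particular coefficients but the timing: the Trends regressor lets you estimate a statistic weeks before the official number is published.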

Moreover, I want to see whether social networks like Facebook, Twitter, and Google+ can contribute anything. I’ve seen talks on how the choice of words used by the media and on Twitter can indicate the general mood of the population, and therefore be used in the stock markets. Judging by the paper’s references alone, many people have tried predicting various things, like monitoring influenza and other diseases.

When I was interning at Twitter this summer, there was an ongoing project to measure the political mood of the US population and predict which candidate would win the US Presidency. The method was to run tweets through various algorithms to decide how well Mitt Romney and Barack Obama were doing, and to gauge the public’s perception of them. Furthermore, during the Presidential debates, CNN displayed live Twitter counts by the minute. On a microcosmic scale, I see the entire effort of predicting the present as equivalent to someone getting immediate feedback on their own status, whether it is working out, productivity at work, and so forth! One can then make decisions based on this new information, such as how sedentary to be for the rest of the day. If I knew I was going to be seated for the next few hours, I would definitely change that behavior.
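To give a flavor of the mood-measuring idea, here is a crude word-counting sketch in Python. The actual algorithms used in the Twitter project were far more sophisticated, and these word lists are entirely made up for illustration:

```python
# Hypothetical sentiment word lists; a real system would use curated lexicons.
POSITIVE = {"great", "win", "strong", "hope", "good"}
NEGATIVE = {"bad", "lose", "weak", "fail", "wrong"}

def mood_score(tweets):
    """Fraction of sentiment-bearing words that are positive, in [0, 1]."""
    pos = neg = 0
    for tweet in tweets:
        for word in tweet.lower().split():
            word = word.strip(".,!?#@")
            if word in POSITIVE:
                pos += 1
            elif word in NEGATIVE:
                neg += 1
    total = pos + neg
    return pos / total if total else 0.5  # neutral when there is no signal

tweets = ["Great debate, strong answers!", "That plan sounds bad and weak."]
print(mood_score(tweets))  # 0.5: two positive words balanced by two negative
```

Aggregated over millions of tweets per minute, even a signal this crude starts to look like the live counts CNN showed during the debates.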