Monday, December 8, 2008

Open Access: Computational Linguistics

I’m delighted to see that one of my favorite journals, Computational Linguistics, has gone open access. This is not because I won’t have to pay anymore (Thomson Reuters picks up my tab), but because there doesn’t seem to be any valid reason why students and academics should have to pay publishers to read the results of government-funded research. CL is also going all-electronic, which means that their principal costs are now editing and formatting, which they have retained MIT Press to do. These costs will be paid by their parent professional association, the Association for Computational Linguistics.

CL has maintained consistently high standards over the last 8 years, and I commend their editorial staff and reviewers for the awesome job they have done. I look forward to reading the journal online in future, or maybe on my Kindle one day (when that device becomes more “open”).

Tuesday, November 25, 2008

Malcolm Gladwell's "Outliers"

Malcolm Gladwell’s book is perhaps the only convincing study of success I have ever read. I very much enjoy reading biographies of notable people, especially artists and scientists (less so politicians and business people) who are true innovators. But one is always left with the question, why was this person successful, at this time and in this place? Why is this person an "outlier", in the sense of standing out from the crowd?

Gladwell’s answer is that, yes, talent is important, yes, hard work is important, but that other factors intervene, and it’s not just blind luck or random events. Timing seems to be crucial, but not in the sense that most people think. It appears that when you are born is also important, not just what year, but also which month in the year, for some occupations. This has nothing to do with astrology and everything to do with how institutions like academia and sports pick “winners” to invest in.

The book opens with a study of Canadian hockey, and how the birthdays of team picks are disproportionately concentrated in the first 3 months of the year. This is because the cut-off for trials is January 1st, so school children born in the early months of the year are older, bigger and more mature than their fellows. This gives them a natural advantage that is then amplified by subsequent attention, training and other opportunities. Academia is a similar story, in which it really pays to be among the older people in the class. What is so surprising is the lasting nature of the advantages so conveyed.

Gladwell goes on to argue that Bill Gates, Bill Joy, and other pioneers of modern computing, were born at just the right time to take advantage of the advent of time-sharing, such that they were able to acquire expertise ahead of the pack. A disproportionate number of such people were born in or around 1955. Earlier folks were already too settled in their jobs at IBM and elsewhere to ride the wave, while later folks simply missed that opportunity to distinguish themselves. Yes, Gates and Joy deserve credit for what they did, but they had a lot of factors working for them, including the year of their birth.

Another key part of Gladwell's thesis is that it takes 10,000 hours to become a true expert or master of some skill or topic, and that this figure (which typically translates into 10 years of part-time labor) is very robust across disciplines. The main example he uses to illustrate this point is the rise of the Beatles. A true distinguishing feature of their early career was the fact that they did about 1,200 gigs (mostly in Hamburg) before they became famous. This is more concerts than many bands do in a lifetime. Most of these were 5 or even 8 hour shows (typically in strip clubs), getting them well on the way towards their 10,000 hours. The fact that they got this opportunity, through a twist of fate, powered their stage act and writing careers.

I don’t want to review every chapter, but I’ll close with Gladwell’s central message, which concerns the importance of culture. People are given or denied opportunities to excel by their personal histories, including the histories of their family and race. For example, he argues that Asians excel at math partly because of their work ethic, derived from rice farming. Math is inherently hard, but yields to the kind of patient cultivation and reward system that it takes to manage a rice paddy. The book is full of simple theories of this kind that appear to have great explanatory power.

In short, Malcolm Gladwell has done it again: produced an extremely readable book in the tradition of “Tipping Point” and “Blink”, one which both challenges conventional wisdom and airs some truly original ideas. His writing style is very transparent, in the sense that the stories he tells seem to be very uncolored by attitude or bias. (He is also an outstanding public speaker; I had the pleasure of hearing him give a talk at a Council on Crime and Justice fundraiser in Minneapolis in 2007.) We need more of this kind of analytical thinking if we are to understand and solve the many problems that we currently face.

Sunday, November 16, 2008

How Twitter May Save the World

Having spent the last week in bed with flu, I caught up on my reading, including some fairly light fare, such as Robin Cook's "Invasion". The least plausible part of this tale of an alien virus taking over our planet was the fact that no-one seemed to notice what was going on: neither the media, ever hungry for 'man bites dog' stories, nor the Internet, which was well on its way when the book was written in 1999. Today, one feels that bloggers and Twitterers everywhere would be burning up the airwaves with postings about their neighbors' strange behavior, and any kind of "bodysnatcher" coup would simply be out of the question.

Another implausible part of the story was that none of the teenage characters seemed to spend much time talking on their cellphones, to the point that often they didn't know where their friends were. How likely is that? Today, you would have to take down the cell system as well as the Internet to have any hope of shutting these people up. So, one good effect of all this over-communication is that any aliens out there hoping to invade the planet by stealth have simply missed their chance, in my opinion. I'll try to remember that next time the person next to me in an airplane has one of those mundane "I'm here" conversations.

Tuesday, November 11, 2008

Web 2.0 Summit: Social Media Platforms

Richard Rosenblatt of Demand Media gave an interesting talk in which he described their various Internet properties and platforms. DM connects content creators, publishers and users in a media marketplace that delivers content to long tail sites and provides tools for attracting eyeballs from social media hubs. Their Pluck On Demand service delivers related content, prepackaged with ads, through free widgets and social media apps that can be installed on your site.

Mark Zuckerberg of Facebook talked about their continuing quest for growth; FB grew from 50M users to over 100M in 2008. They are less focused on revenue, although they did do a lucrative ads deal with Microsoft earlier this year for $15B. They also increased their global reach by going into Europe, opening a Paris office this year. Facebook Connect is now in beta: the latest version of their API, which allows users to access their identity, friends, photos, etc. on other sites.

Chris DeWolfe of mySpace appeared on a panel with Edgar Bronfman of Warner to talk about the next iteration of mySpace Music. The two companies have done a deal to make DRM-free downloads available on the site, so that people can share actual tracks, as well as the usual playlists. mySpace already hosts home pages for 5M bands, so they seem to be well ahead of the game with respect to the blending of music and social media.

Monday, November 10, 2008

Web 2.0 Summit: Cloud Computing

There were two cloud computing sessions at Web 2.0, both panels run by Tim O’Reilly. One focused on the platform, with Padmasree Warrior (CTO, Cisco) and Shane Robison (CTO, HP); the other focused on apps, with Paul Maritz (CEO, VMWare), Marc Benioff (CEO, Salesforce), Kevin Lynch (CTO, Adobe), and David Girouard (Google Enterprise). The first session actually spent a lot of time focusing on the role of the modern CTO which, while interesting, is not entirely relevant here, so I’m going to talk about the second session first.

Each of the represented companies had a different emphasis with respect to the cloud, given the business they are in. VMWare helps big corporate clients make their internal environment more cloud-like, e.g., by making information more independent of particular apps and devices, creating an ‘infobank’ that they manage and even enrich for their customers, presumably with things like links. Google is more interested in making the enterprise experience more like the consumer experience, and plans to open up some of the Google stack as a Web platform (not much in the way of specifics there). Adobe also focuses on the client experience, particularly on rich apps supported by Flash and AIR (a development toolkit that delivers apps that run across different operating systems). Salesforce is perhaps the most interesting company currently in this space. As well as hosting data and providing compute power, they have also partnered with Google to field a range of open SaaS apps that replace the usual desktop Office programs.

From the vendor point of view, there is a debate as to whether cloud computing is a high margin or a low margin business. To vendors like Amazon, who provide ‘burst computing’ (EC2) and ‘burst storage’ (S3), it’s a commodity business model, which is something they are good at. To vendors like Salesforce, it’s clearly meant to be high value and high margin. Another issue is interoperability. Salesforce can integrate with SAP R/3 for existing customers, Facebook (for recruiting), and Amazon for extra storage and cycles. Whatever happens in the cloud, Oracle and SAP aren’t going away any time soon. Meanwhile, SAP is using Adobe products like Flash to improve their interfaces, and VMWare is calling for new data representations and annotation schemes to make content more valuable, clearly a business they would like to get into. Even Microsoft is now jumping into the fray, although Ray Ozzie admits that Azure won’t be ready for 2 years, which puts them well behind the curve.

From a customer point of view, high availability and reliability are big issues, yet vendors seem to be winning their confidence. It is now believed that Gmail is about 5x more reliable than Outlook Exchange, so free doesn’t always mean a lower service level. Security is always an issue, but so is portability (the ability to move your data out of a cloud), since going to a cloud is a big commitment that gives the vendor a serious lock-in advantage in customer retention and product pricing. Lastly, there is the issue of intellectual property, not only who owns my data, but who owns any annotations, links, or other value-addition done to my data.

Going back to the CTO panel, both Cisco and HP seem to be using cloud technology to drive their basic businesses, e.g., HP is incubating MagCloud, a service that will allow anyone to produce a glossy magazine. Tim O’Reilly brought up the interesting question of how paying for cloud computing affects a company’s balance sheet, since what was hardware capex now becomes a service expense. (I assume that depreciation is also an issue.) Other related facts: data centers are 2% of the world’s carbon footprint; emerging countries may go straight to cloud, since there is less of a legacy obstacle; and existing infrastructure players won’t go out of business, since they are selling to cloud companies and they always have big customers, like banks, who are unlikely to go the cloud route.

Sunday, November 9, 2008

Web 2.0 Summit: Electric Cars

There were two illuminating interviews on the topic of electric cars, one with Elon Musk of Tesla Motors and one with Shai Agassi of Better Place. Listening to these two gentlemen, I am led to the inescapable conclusion that, even in my best moments, I am a total slacker and underachiever. Musk has established successful ventures in three industries (the Internet, personal transportation, and space exploration), while Agassi followed a successful career at SAP with some groundbreaking ideas on how we end our addiction to oil. Both believe that the future lies in electric cars, but they take quite different roads to get there. Musk is building a series of viable electric vehicles that prove the technology; Agassi is proposing a total rethink of the auto industry’s value proposition. What is fascinating about the comparison is the different routes they have taken to innovation.

I remember in the 80s when a certain Robotics professor at Edinburgh University rode to work in an electric milk cart to protest the wastefulness of the internal combustion engine. Were he alive today, he would at least have the option of tearing around town in a Tesla Roadster, which is now in production. (Of course, being Edinburgh, there still wouldn’t be anywhere to park.) This awesome vehicle is all electric, goes from 0 to 60 in under 4 seconds, and can go 250 miles between charges. I want one! (But it sells for over $100k, so I’ll stick with my BMW 330 for now. A sedan is also in the works, which will be more in my price range.)

Agassi is proposing something quite different and totally radical. Electric versions of our favorite car models should have batteries that can either be charged or swapped out at the equivalent of a filling station. Simply put, you pay for the miles that you use in order to support this network; the car itself will be a smaller cost element, and may even be free. This is by analogy with the cellular industry, where you pay for airtime as you use it; the phone itself is not the major purchase. The auto industry moves from being totally product-oriented to being more service-oriented.

Both solutions use technology, and Tesla’s is certainly disruptive, but Agassi’s also changes the business model and revolutionizes the industry. Arguably, industry innovation = disruptive technology + new business model. This is not to minimize Musk’s achievement, which is already showing the way to automakers such as GM.

Web 2.0 Summit: Web Meets World: Overview

I really enjoyed this Summit, which focused upon the connection between the Web and the real world, with contributions covering broad issues in politics, energy, entertainment and health, as well as the more usual coverage of new developments in platform technology and social applications.

Many of the sessions featured truly inspirational speakers, panelists and interviewees; people who are real innovators, rather than camp followers of the kind one encounters at so many other conferences. The meeting was also pleasant from a social point of view, with a broad range of attendees and a relative absence of cliques, compared with the academic or standards communities, for example.

I decided the best way to blog this was to put up a series of posts on a number of themes, rather than soldier through the whole thing chronologically. I won’t try to cover everything, although I attended about 85% of the sessions, missing a few due to business meetings in San Francisco. The main topics I will cover are electric cars, cloud computing, and social media platforms.

First Posting: My Position and Background

I'm chief scientist at Thomson Reuters, where I am responsible for an R&D lab of over 40 computer scientists. This department has been in existence since the early 90s; I have been running it since 1996. We have a lot of expertise and experience in search, natural language processing, and machine learning, and have helped roll out many new products that rely on these technologies.

My background is in artificial intelligence; my Ph.D. thesis was on knowledge representation and inference. I have taught post-graduate classes in AI (Edinburgh U, Scotland), expert systems (Washington U, St Louis), and parallel architectures and algorithms (Clarkson U, NY). Prior to joining Thomson in 1995, I also worked at McDonnell Douglas Research Laboratories, where I was a principal scientist and consultant to their Space Systems company.

My current interests are in the following areas: alignment of business and technology strategies in Internet companies, especially as this relates to R&D; the deployment of Web 2.0 and Semantic Web technologies by content providers; and the fostering of innovation in technology groups through the use of free time and the promotion of knowledge sharing. For more information about my activities, feel free to visit my home page.

Please note that the usual disclaimers apply. This blog contains my personal views and perceptions; it does not reflect those of my employer, Thomson Reuters. My next posting will be a trip report on the Web 2.0 Summit, held last week in San Francisco.