Saturday, November 28, 2009

Book Review: Planet Google by Randall Stross

This 2008 book by NYT columnist and San Jose State U professor Randall Stross is nicely complementary to Ken Auletta’s more recent book, reviewed elsewhere in this blog. The author concentrates on three things: (1) Google as technology powerhouse, (2) the competitive situation between Google, Microsoft and Yahoo, and (3) Google’s forays into books, news, maps, email, videos, and question-answering, not all of which have worked out well.

Page and Brin decided early on that search would be completely algorithmic - no editors, no taxonomies, no pollution from ads. This distinguished them from Yahoo, whose early attempts to organize the Web quickly ran out of steam. Having suffered from hardware deprivation at Stanford, they decided (like Scarlett O’Hara) that they would never be hungry again, and took the highly unusual step of building their own machines. It’s often overlooked that Google couldn’t possibly scale their data centers and give away cycles and storage to the degree that they do if they bought big ticket boxes from Sun and IBM.

Moving to the competitive situation, it’s pretty clear that Google didn’t have much to fear from either Yahoo or Microsoft in the first half of this decade. Yahoo made the mistake of adopting Google as their search engine, giving their rival more exposure and more data to play with. (Ironically, it was Yahoo’s cofounder, David Filo, who initially discouraged Page and Brin from licensing their technology to others and encouraged them to build their own Web site instead.) Microsoft lollygagged around, first ignoring search (stupid) and then paying people to use their search engine in 2006 (really stupid). Bing post-dates the book, of course.

Finally, Stross does a good job of documenting Google’s somewhat mixed attempts to get beyond textual Web search and refute Steve Ballmer’s jibe about being a one-trick pony. Their book-scanning project is curious in many ways. It’s hardly a moneymaking opportunity for them, yet they were too impatient to try and get the publishers on board, thereby sparking lawsuits. Their video portal was an outright failure, leading to the expensive YouTube purchase and more legal woes. Google Answers was a loser and has been far outstripped by Yahoo Answers. Google News remains controversial to this day, vide Rupert Murdoch's recent approach to Microsoft.

Gmail and Google Maps, however, have done better, and Google Docs must be giving Microsoft pause. Google is a successful company by any standards, well ahead of their rivals on basic search, and Stross gives them their due. But he also notes their reluctance to embrace Web 2.0 and Web 3.0 concepts, e.g., social media and the Semantic Web. I'm not a great fan of the latter, but the relative failure of Orkut and the rise of Facebook must be giving Google pause. Whatever the future holds for Google, it won’t be dull, and we’ll all have a ringside seat.

Friday, November 27, 2009

The Limits of Journalism

Malcolm Gladwell’s recent book ‘What the Dog Saw’ has received a mixed press. It collects some of his articles from the New Yorker over a 10+ year period, and the topics range from interesting people to social issues to matters of problem solving and problem complexity.

On the positive side, these short pieces are thought-provoking and mostly well-written. In many ways, he is an original thinker, and I like the way he challenges our orthodoxies around important issues like intelligence, success and justice.

On the negative side, I think that he occasionally strays into areas that show the limits of his understanding. This is normal for journalists and nothing to be ashamed of. One is typically not an expert in the subject matter, and experts disagree, in any case.

The most glaring evidence of this was a reference to ‘igon values’ in his article on Nassim Taleb of ‘black swan’ fame. The correct term is ‘eigenvalue’ (roughly speaking, one of the roots of the characteristic equation of a matrix), and the misspelling was apparently caught by the original New Yorker editors, but not by the Little, Brown editors of this book. (Shame on them.)
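For readers who want the parenthetical definition made concrete, here is a minimal sketch for the 2x2 case only. The function name and the example matrix are my own illustrative choices, not anything from Gladwell or Pinker.

```python
import cmath

def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of the matrix [[a, b], [c, d]].

    They are the roots of the characteristic equation
        lambda^2 - (a + d) * lambda + (a*d - b*c) = 0,
    solved here with the quadratic formula.
    """
    trace = a + d
    det = a * d - b * c
    # cmath.sqrt handles a negative discriminant (complex eigenvalues).
    disc = cmath.sqrt(trace * trace - 4 * det)
    return (trace + disc) / 2, (trace - disc) / 2

# Hypothetical example matrix [[2, 1], [1, 2]].
lam1, lam2 = eigenvalues_2x2(2, 1, 1, 2)
```

For this example the characteristic equation is lambda^2 - 4*lambda + 3 = 0, so the two roots are 3 and 1 (returned as complex numbers with zero imaginary part).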

Steven Pinker recently faulted Gladwell in the New York Times for this gaffe, and used it as a means of casting doubt on the author’s probity. Gladwell acknowledged the faulty spelling, but went on to counter-attack. Needless to say, ‘igon values’ is more than a spelling error. It was an unlucky attempt to add verbal color to the story that instead revealed both ignorance of the topic and an oversight in fact checking.

We all pretend to know more than we do. Ironically, Taleb’s book ‘The Black Swan’ is an almost endless diatribe on this very topic, and Gladwell’s 2002 article on Taleb (‘Blowing Up’) is a rather successful summary of the book that Taleb had yet to write. But I find it a salutary fact of life that whenever I read a newspaper or magazine article on any topic that I know a lot about, I usually find both subtle errors of emphasis and unsubtle errors of commission and omission.

Henry Blodget, Tim O’Reilly and other dyed-in-the-wool Webizens would say that this is why citizen journalism is such a great thing. Errors now get corrected quickly (by Pinker in this case) and then promulgated by bloggers. Unfortunately, news and book printing do not admit of instant correction, so Gladwell will have to live with this mistake until the second printing (if there is one), whereas the public corrigendum will live on indefinitely.

Wednesday, November 25, 2009

Book Review: Googled by Ken Auletta

Ken Auletta's book is outstanding; I read it in a single day. Three interesting threads run through it: (1) the sheer audacity of Google’s founders and their determination to do things differently; (2) the engineering philosophy at Google, with its emphasis on data, pragmatism, and empirical methods; (3) the woes of Old Media and other naysayers who regard Google with fear and loathing.

We have all read versions of the Google story. But Auletta brings out how incredibly focused Larry Page and Sergey Brin were in their early days, and how they deftly avoided most of the pitfalls that other dot coms succumbed to. They built a brand around their world-class technology, but also around putting users first. Everyone talks the customer-centric talk, but Google went ahead and walked the walk. They worried about ease of use and quality of results first, monetization second, and marketing not at all. Yes, they got lucky with ads, but you could argue that they made their own luck.

Their engineering culture led them straight to an emphasis on data mining as a source of inspiration, rather than surveys, focus groups and other, more traditional, methods. Everything at Google has to be done to scale. Why sample a percentage of your users when you can monitor all of them? In a very even-handed treatment, Auletta examines how this same culture sometimes leads the company to tread on people’s toes by lacking a human touch and discounting things like privacy concerns. He also notes that none of Page, Brin or Schmidt is a charismatic or inspirational leader in the conventional sense, yet this doesn’t seem to have held them back at all.

The chapters “Is Old Media Drowning?” and “Compete or Collaborate?” do a great job of outlining the dilemmas faced by the recording, radio, television, book and newspaper industries in dealing with the Google (+ Apple + Amazon) threat. Auletta points out that the music companies were not “murdered by technological forces beyond their control” but rather “they committed suicide by neglect.” The same could be said for many other industries now on the ropes. The near universal distrust of Google in these quarters is in stark contrast to the trust invested in Google by the vast majority of their daily users.

If you only read one book on the information industry over the holiday, I would recommend this one. It is based on sound reporting, original interviews, and painstaking research. It is also entertaining without being shallow. The last quarter of the book is devoted to sources and other supporting references.

Finally, Ken Auletta is giving a keynote at the Software Information Industry Association’s Information Industry Summit in New York on January 26. In the interests of full disclosure, I should say I’m on the steering committee for that conference. We were delighted to get him, and he gives a good talk by all accounts.

Monday, November 23, 2009

The Future of Newsprint

Ken Auletta’s recent book, “Googled”, contains a good analysis of the woes of the newspaper industry. (I’ll be writing a review of this shortly, when I’ve mulled it over some more.) The fact is that the moguls fiddled while Rome burned, so it’s a bit late to wake up and start worrying about it now. Even if they reach the kind of settlement with Google that the book publishers did, it won’t support the cost structure of print.

I think most paper news is doomed, thanks to a combination of lifestyle changes, competition for ads, and other entertainment choices. It’s tough to read the paper every day if you drive to work instead of taking a train or a bus, then spend most evenings ferrying your kids around. Meanwhile, eBay and Craigslist are killing the classifieds section, and bored people in airports have a wide variety of gadgets to play with.

Decades ago, my idea of a good time was to spend the Sunday lunch hour in a London pub with the Times and a pack of cigarettes. (That was in the good old days of Harold Evans, before Rupert Murdoch got his hands on it.) One had time on one’s hands and the bandwidth to enjoy the serendipity of bundled news. I also did the Telegraph crossword every day, puzzling over cryptic clues and references to the classics.

Now, it seems I don’t have time for a paper; even when I buy one I don’t read it. I get generic news from NPR during my hour in the car. Everything else that’s daily I get through my Mac or iPhone, supplemented in any given month by the only two news magazines I feel are worth reading: the Economist and Foreign Policy. I don’t watch Network TV for the same reason I never took LSD: I like my mind the way it is.

On the business side, print banner ads and classifieds are a thing of the past. So if people don’t want to pay real money for paper news, no one is going to pick up the tab for them. Government subsidies are a bad idea; the Repugnant Party is right for once. Private ownership of newspapers is the lesser of two evils, but it’s a moot point, because newsprint is going the way of stone tablets, papyrus scrolls and illuminated manuscripts.

Time Warner and News Corp keep pretending the Emperor has clothes, because nudism (free) isn’t an option for them. But, if there were business models out there other than ads or subscriptions, we would have found them by now. Messing around with portals and micropayments is just rearranging the deck chairs on the Titanic. The ocean’s pouring in, but the band plays on. We’ve already seen what 99c album tracks have done to the record industry. (Network TV is next for the great unbundling; I can’t wait.)

One can only imagine what would have happened if the news moguls had encouraged their talent to create vertical Web sites around travel, food, wine, real estate, geographies, etc., turned them into virtual communities, and then AdSensed them. Such a strategy might not have saved the day, but I think they’d be in better shape than they are now. Online ads for electronic news are currently worth about a tenth of paper ads, but that’s partly because the targeting is demographic and not by interest, as the Web has come to expect.

But I sometimes think the real issue isn’t money or advertising. It’s that people have less time to reflect than ever before, while the world is getting more and more complex. The blurring of traditional male and female roles has left everyone responsible for everything: earning money, food prep, child rearing, transportation, etc. The blurring of work and non-work has similarly pushed everyone towards being ‘always on’ with not much downtime. It’s not good or bad; it’s just the way it is.

Meanwhile, since the end of the Cold War, global politics has become much more complicated. Reading a multi-page article about Somalia requires time, effort and thought; not many people have that kind of resource to spare during a 21st Century day. My sense is that they find it easier to dial up their favorite talk show and listen to someone guaranteed to agree with them while pretending to have all the answers.

The only interesting question that remains is: who will pay for in-depth reporting? My suspicion is that most people won’t miss it, any more than most people miss opera or orchestral music. Like music, journalism is being ripped, remixed, mashed up and deskilled. Not many folks noticed when drum machines replaced drummers, samples replaced players, and singers could no longer carry a tune (just as not many notice that Glenn Beck isn’t a journalist). The only problem is, cannibalism isn't a long-term dietary strategy; someone has to create.

Those who want quality news may have to pay a fairly high price for it in future, because those who produce it will have to be supported one way or another. One form of support is bundling business news with an expensive feed of must-have professional information. But what about the stuff business typically doesn’t care about, e.g., human rights, corruption in high places, and the fate of animal habitats? My prediction is that stuff will go not-for-profit, electronic, and be highly politicized for the most part.

Earlier this year, Michael Hirschorn's Atlantic article "End Times" predicted that the financially endangered NYT might end up as a "bigger, better, and less partisan version of the Huffington Post", in which reduced staff would mix original reporting with aggregated stories having the NYT seal of approval. I wouldn't bet on it. My prediction is that the new, improved Huffington Post will be the Huffington Post.

Tuesday, October 27, 2009

Book Review: Wired for Innovation

“Wired for Innovation: How Information Technology Is Reshaping the Economy” by Erik Brynjolfsson & Adam Saunders, MIT Press, 2010.

The main argument of the book is that the so-called “productivity paradox” of the period 1970-1995, whereby the adoption of IT failed to boost per capita productivity growth in the US, is now seen to be due to a lack of concomitant investment in organizational capital. In other words, people tended to “computerize” existing tasks in a brain-dead way, without investing in new business processes, employee training, and the like. The literature shows that the period 1995-2008 saw far greater gains, but only for those companies that combined new technology with a range of complementary practices around job redesign, openness of information, and empowerment of workers.

These “complementarities” are not just the “best practices” beloved of management consultants of every stripe, but aspects of a company’s culture, including things like profit sharing, flexible work hours and reorganization of workflows to really capitalize on new technology, so that even where there are fewer jobs, these jobs are more interesting than before. These practices are only effective if they work together to reinforce each other, both incenting workers and helping them innovate. Studies show that productivity programs such as TQM don’t produce results otherwise.

A secondary, related topic is that GDP tends to underestimate the value generated by IT innovations. The argument is that traditional measures of inputs and outputs used by the government do not capture “consumer surplus”, i.e., the net benefit that consumers derive from a product or service after you subtract the amount paid. For example, various studies claim that eBay alone generates several billion dollars a year in consumer surplus, and Amazon’s used book sales generate tens of millions in surplus, given aggressively low pricing. Another study found that the government’s 10-year regulatory delay in allowing cell phone usage cost consumers about $100 billion in lost surplus!
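The arithmetic behind consumer surplus is simple enough to sketch. The willingness-to-pay figures and the price below are made up purely for illustration; they are not taken from the book or from the studies it cites.

```python
def consumer_surplus(willingness_to_pay, price):
    """Total surplus across buyers: what each would have paid minus what
    they actually paid, counting only those who buy (value >= price)."""
    return sum(value - price for value in willingness_to_pay if value >= price)

# Hypothetical buyers and the most each would pay for the same item.
wtp = [30, 22, 15, 9]
price = 10

surplus = consumer_surplus(wtp, price)  # (30-10) + (22-10) + (15-10) = 37
revenue = price * sum(1 for value in wtp if value >= price)  # 3 sales = 30
```

Only the revenue of 30 shows up in conventional output statistics; the 37 of surplus that the buyers keep does not, which is the measurement gap the authors are pointing at.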

Chapter 6, on ‘Incentives to Innovate in the Information Economy’ is especially worth reading, for an economist’s view of our world. The authors characterize information as a ‘non-rival good’, i.e., if I consume a piece of information that doesn’t prevent anyone else from consuming it (unlike a piece of cake, say). They also argue that people are reluctant to pay full-price for information goods, since they don’t know how useful they will be until they consume them (by which time they have already paid). Bundling is advocated, because it gets you away from the problem of pricing individual pieces of information in a situation where the marginal cost is close to zero (so that formulas like marginal cost plus a markup don’t work).
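A toy example shows why bundling can beat piece-by-piece pricing when marginal cost is near zero. This is the standard textbook illustration rather than an example from the book, and the two readers and their valuations are hypothetical.

```python
def best_single_price_revenue(valuations):
    """Best revenue from one posted price when marginal cost is zero.

    Only prices equal to some buyer's valuation need checking, since
    any other price could be raised without losing a sale.
    """
    return max(p * sum(1 for v in valuations if v >= p) for p in valuations)

# Hypothetical readers with opposite tastes for two articles, x and y.
readers = {
    "A": {"x": 8, "y": 2},
    "B": {"x": 2, "y": 8},
}

# Sold separately: find the best price for each article on its own.
separate = sum(
    best_single_price_revenue([r[item] for r in readers.values()])
    for item in ("x", "y")
)

# Sold as a bundle: each reader values the pair at 10.
bundle = best_single_price_revenue([sum(r.values()) for r in readers.values()])
```

Sold separately, the best the seller can do is 8 per article (16 in total); sold as a bundle at 10, both readers buy and revenue is 20. Bundling works here because it averages out differences in taste, so the seller no longer has to guess the right price for each item.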

Another consequence of information being a non-rival good is ‘knowledge spillovers’, which typically mean that the private return for innovation will be less than the return for society as a whole. (Think about cell phones, and the value they create for society compared to the returns for cell phone companies.) The authors argue that this kind of disparity leads to a ‘chronic underinvestment in R&D’ by the private sector. Even within a single corporation, like Thomson Reuters, you can see this play out. Individual business units that invest in R&D often feel that other businesses that benefit from the ‘spillover’ are freeloaders who have not taken the risk or borne the cost of R&D, but nonetheless derived its benefits. This is why companies like 3M try to take a broader view of R&D cost-benefits across the organization as a whole.

Finally, the authors address the issues of price dispersion and low-cost copies of information goods, and how disruptive they really are. With respect to pricing, studies show that Amazon retains market share, even though it doesn’t actually have the lowest online book prices, thanks to its customer experience and reputation. (This doesn’t bode very well for Walmart in their current price war with Amazon over holiday sales.) With respect to low-cost copies, the authors argue that high availability can create demand for publishers’ wares under some circumstances. For example, lending libraries created readership and stimulated book sales rather than depressing them; more recently, VCRs created an insatiable demand for video.

At 150 pages, this is a slim, dense book that contains many insights into the forces behind innovation. Unlike many of the volumes that one finds in airport bookstores, the facts and claims are backed by citations to recent research. Although the main theme is really the relationship between innovation, IT and productivity, there are many interesting reflections upon leadership, change, knowledge, value and the distribution of wealth. It’s not a light read, because it doesn’t shrink from complexity, but it’s not turgid either. The authors are from MIT’s Sloan School and U Penn’s Wharton School, so you are getting an up-to-date view of the very latest thinking at these august institutions.

Thursday, October 8, 2009

Book Review: Free

Chris Anderson of “Long Tail” fame has done it again with his latest book, “Free”. I reviewed his NetGain talk earlier in this blog, so I won’t go through all the basics again. His concept is that, in a highly competitive online market, price falls to the marginal cost of adding a new user, which is close to zero.

He distinguishes between three different models of “free”: the cross-subsidy (basically a come-on, like the free sample or the loss leader), the two-sided market (one customer subsidizes the other, e.g., advertisers subsidizing magazine readers), and the “freemium” model (upgrade users from a free version to a more capable version of your product, or a range of ancillary experiences).

His historical preamble is both amusing and informative, explaining the origins of common phrases such as "free lunch" and "jumping on the bandwagon." There are sidebars containing detailed examples of companies that play in the free space to a non-trivial extent. His comments on the dilemma currently facing the media industry are unsparing and incisive.

"Free" is a good read, being both very clear and non-redundant. Anderson has had a long and successful career in quality publishing, as well as being a well-known blogger. For my money, he is up there with Nicholas Carr and Malcolm Gladwell as a really insightful writer on technology, business and social trends. At about 250 pages, this is a book you can read in a small number of sittings, and then refer back to, as needed.

His message can be summarized as follows: a version of your current digital assets will one day be abundantly available free of charge, so you have to keep moving up the value chain. Furthermore, it often makes sense to give away some part of your asset base to some part of your potential market to gain attention, reputation and good will. You can then successfully monetize other, higher value, opportunities arising from all this buzz, possibly by charging just a fraction of your users.

He cites many interesting examples, e.g., Skype giving away computer-to-computer calls, but charging for computer-to-phone calls, Second Life giving away virtual tourism but charging for virtual land, etc. My main problem is that they are mostly drawn from the mass-market, consumer space, not the professional space.

Even in the consumer space, experiments with free are ambiguous. For example, local newspapers have had a mixed experience with free, as Anderson acknowledges. There is the whole issue of how readers (and more to the point advertisers) value free publications, especially those that were once sold at a price. He also acknowledges that free has a habit of turning billion dollar businesses (classified ads in newspapers) into million dollar businesses (like Craigslist). Scale seems to be everything, as well as getting your sums right.

Perhaps the most interesting passages for me concerned Google: its vast economies of scale; its use of free for various purposes (goodwill, data gathering, ad delivery); and the concern of execs like Eric Schmidt that publishing companies may suffer to an extent that they are no longer able to generate quality content to be searched. Not every aspect of the Internet economy is a zero-sum game, and Google probably knows that the sooner the media industry finds new business models to maintain its creative output the better for all concerned.

The book ends with a handy guide to the rules, tactics and business models that seem to work in the free economy. Appropriately enough, free versions of the book are available, e.g., in an abridged form as an audiobook. (For a while, an advance copy was also free on the Web as a PDF.)

Thursday, September 10, 2009

Book Review: Beautiful Data

This collection, edited by Toby Segaran and Jeff Hammerbacher and published by O’Reilly, contains 20 recent papers on the theme of helping data tell its own story - via an array of data management, data mining, data analysis, data visualization, and data reconciliation techniques. I’ll start by reviewing what I thought were the standouts, and then summarize the rest.

“Cloud storage design in a PNUTShell” gives a concise description of the distributed database architecture deployed at Yahoo! for storing and updating user data. The system must scale horizontally to handle the vast number of transactions and replicate consistently across geographic locations. This article contains a very interesting discussion of the trade-off between availability and consistency, and compares their design with that of other systems, such as Amazon’s Dynamo and Google’s BigTable.

“Information Platforms and the Rise of the Data Scientist” provides some interesting reflections on data warehousing and business intelligence in the context of Facebook. They used Hadoop to build a system called Lexicon that processes terabytes of wall posts every day, counting words and phrases. Again, the emphasis is on scale, and the ability to apply relatively simple algorithms to massive amounts of data.

“Data Finds Data” takes us beyond search to a realm in which relationships between key data items are discovered in real-time and immediately relayed to the right users. “Portable Data in Real Time” addresses some of the issues inherent in sharing data between applications, e.g., the Flickr/Friendfeed example, where user behavior needs to propagate between different social media sites. “Surfacing the Deep Web” describes work at Google to make data behind Web forms amenable to search by automatic query generation.

“Natural Language Corpus Data” describes some simple experiments with a trillion-word content set created by Google and now available through the Linguistic Data Consortium. Simple Python programs are used to build a language model of the corpus, i.e., a probability distribution over the words and phrases occurring in the documents. This model can then be applied to common problems like spell correction and spam detection.
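To give a flavor of the approach, here is a minimal sketch of a unigram language model used for spell correction. It is written in the spirit of the chapter, not copied from it: the twelve-word corpus is a toy stand-in for the trillion-word data set, and all function names are my own.

```python
from collections import Counter
import string

# Toy stand-in for the real corpus (an assumption for illustration).
CORPUS = "the quick brown fox jumps over the lazy dog the dog barks"
COUNTS = Counter(CORPUS.split())
TOTAL = sum(COUNTS.values())

def prob(word):
    """Unigram probability: relative frequency of the word in the corpus."""
    return COUNTS[word] / TOTAL

def edits1(word):
    """All strings one edit away: delete, transpose, replace, or insert."""
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:]
                  for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:]
                for left, right in splits if right for c in letters]
    inserts = [left + c + right for left, right in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    """Return the word itself if known, else the most probable known word
    one edit away, else the word unchanged."""
    if word in COUNTS:
        return word
    candidates = [w for w in edits1(word) if w in COUNTS]
    return max(candidates, key=prob) if candidates else word
```

With this toy corpus, correct("teh") returns "the" and correct("dgo") returns "dog". The point of the chapter is that the same few lines become surprisingly effective once the counts come from a very large corpus.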

Other papers focused on particular domains, such as the structure of DNA, image processing on Mars, geographic and demographic data, and drug discovery, and many of these were interesting. The first two and last two chapters can be described as more generic, focusing on a range of issues, such as statistics, visualization, entity resolution, and user interface design.

Altogether, this is a very timely and useful collection of papers. As one might expect, the book contains many URLs for further exploration, and some pointers into the literature. If you care about data, this will make a great addition to your bookshelf, alongside Bill Tancer’s somewhat lightweight but nonetheless engaging “Click” and Ian Ayres’ entertaining and provocative “Super Crunchers”.

Monday, July 27, 2009

ICBO Keynote: Harold "Skip" Garner

Skip Garner of the UT SW Medical Center at Dallas gave an interesting talk on text data mining, primarily for drug discovery. Their primary tool seems to be a paragraph-level text similarity tool, called eTBLAST, which powers a search engine with some fairly powerful results post-processing capabilities. One of their goals is to build a literature network based on biomedical entities, and they have a Swanson-style approach in which they look for similarities across different domain literatures. Results are input into their IRIDESCENT hypothesis generation engine, which looks for interesting connections, e.g., among microsatellites, motifs, and other DNA sequencing phenomena in different species.
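The talk did not go into the internals of the similarity tool, so the following is only a generic sketch of what paragraph-level text similarity can look like: cosine similarity over bag-of-words counts. It should not be read as a description of the actual algorithm, and the sample sentences are invented.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    # Dot product over the words the two texts share.
    dot = sum(vec_a[w] * vec_b[w] for w in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Invented sample paragraphs.
p1 = "the drug reduces cardiac hypertrophy in mice"
p2 = "cardiac hypertrophy in mice responds to the drug"
p3 = "ontologies describe domain concepts formally"

related = cosine_similarity(p1, p2)
unrelated = cosine_similarity(p1, p3)
```

Here p1 and p2 share most of their vocabulary and score well above zero, while p1 and p3 share no words and score exactly zero. A production system would add stemming, stop-word handling and term weighting on top of this.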

Another application area involved coming up with new therapeutic uses for existing drugs. It takes 15 years and costs $500-800M to develop a new drug, so finding new uses for old drugs has the potential to cut costs and boost profitability. Also, many old drugs are now coming off patent, so new uses would allow reformulation and refiling. The classic example of recent years is Sildenafil citrate (a.k.a. Viagra), which began life as a drug for reducing hypertension. IRIDESCENT has discovered that Chlorpromazine (an anti-psychotic) may be efficacious against cardiac hypertrophy (swelling of the heart), and also found antibiotic and anti-epileptic agents that might help, based on side effect data mined from the literature.

Sunday, July 26, 2009

ICBO: Tools for Annotation and Ontology Mapping

The International Conference on Biomedical Ontology was held in Buffalo, NY this weekend. There were a number of papers that addressed the tools side of enriching documents with annotations derived from domain ontologies, using more than one source of terminology.

Michael Bada (from Larry Hunter's group at U. Colo. Denver) described a project called CRAFT (Colorado Richly Annotated Full-Text) to create a corpus of scientific documents annotated with respect to the Gene Ontology (GO) and other sources from ChEBI and NCBI. The annotation tool is called Knowtator; it performs some automatic tagging, which can then be corrected by a domain expert in an editorial environment, e.g., Knowtator plugs into Protege. All annotation is 'stand-off', i.e., not inserted into the document as inline tags, but kept in another file. The semantic tagging performed in this effort relies in part upon syntactic annotations performed elsewhere (U. Colo. Boulder).
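For anyone unfamiliar with the stand-off idea, here is a minimal sketch of the data structure. It is not Knowtator's or CRAFT's actual file format; the class, the helper function and the concept identifiers (the "EX:" prefix) are hypothetical placeholders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    start: int        # character offset where the span begins
    end: int          # character offset just past the end of the span
    concept_id: str   # placeholder ID, not a real ontology term

def annotate(text, phrase, concept_id):
    """Build a stand-off annotation for the first occurrence of phrase."""
    start = text.index(phrase)
    return Annotation(start, start + len(phrase), concept_id)

# The source document is never modified.
document = "A mutation in the gene alters protein folding."

# Annotations live in a separate structure (in practice, a separate file).
annotations = [
    annotate(document, "mutation", "EX:0000001"),
    annotate(document, "protein folding", "EX:0000002"),
]

# The annotated text is recovered by slicing with the stored offsets.
spans = [document[a.start:a.end] for a in annotations]
```

Because the annotations only point into the text, several annotators or tools can layer their own tags over the same document without their markup colliding, which is one reason the stand-off style suits a project that combines semantic and syntactic annotation from different groups.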

Various representational issues arise in the course of applying GO to documents, e.g., the fact that verb nominalizations in English tend to be ambiguous between process and result, e.g., "mutation" can refer to either the process by which change occurs or the results of that change. These ambiguities pose difficulties for automatic annotation, and can also perplex human annotators.

Cui Tao (from Chris Chute's group at the Mayo) presented a paper on LexOWL, an information model and API that connects LexGrid's lexical resources with OWL's logic-based representation. LexOWL is being released as a plug-in for the Protege 4 editorial environment. It's heartening to see high-quality tools being made available in a manner that can be shared and deployed by the community as a whole.

ICBO: Panel on Ontology and Publishing

ICBO is the International Conference on Biomedical Ontology, held this year in Buffalo, NY, and this is the second in a series of blog posts on that meeting.

I appeared on a panel with Colin Batchelor (Royal Soc of Chem), Larry Hunter (U. Colo), Alan Ruttenberg (Science Commons), David Shotton (U. Oxford) and Jabe Wilson (Elsevier). The session was chaired by Matt Day of Nature, and the discussion ranged far and wide, but here are the key themes.

On the subject of who should enrich scientific content with tags and other ontological annotations, many agreed with Larry when he said that authors don't have the skills to do knowledge representation. Publishers should hire and train personnel to do this kind of annotation, using industry standard ontologies, where they exist. Others felt that only the authors understand the paper sufficiently to do this task, but this is a strange argument, considering that the purpose of scientific papers is to inform the reader on the chosen topic. Either way, better tools are needed to enable annotation and promote inter-annotator agreement.

The other main topic centered around academic publishers, who are seen as not giving value to the scientific community and not sharing enough with them. Scientific authors provide publishers with their work product for free, but then they and their students have to pay to access it. This growing dissatisfaction has led to the death of some technical journals (e.g., Machine Learning) and their replacement by open access publications. Clearly, publishers need to be adding significant value if they wish to continue to charge high prices, and as a good will gesture they need to share more content and data with the community.

Saturday, July 25, 2009

ICBO: The Role of Ontology in Biomedicine

ICBO is the International Conference on Biomedical Ontology, held this year in Buffalo, NY. Ontologies can be defined in various ways; I prefer to think of them as formal systems for describing domain concepts and their instantiations. Topics include: Ontologies for Chemistry and Molecular Biology, Cellular and Anatomical Ontologies, Disease Ontologies, Clinical Ontologies, and IT topics such as computing with ontologies and getting ontologies to work together.

Much of the effort in this area is geared towards extracting biomedical data from various sources and representing it in a computable form. Doing this requires the ability to both represent universal truths about biological systems and describe instances of these truths as applied to individual cases. One can think of ontologies as playing a key role in the irresistible move from dispersed documents to linked data.

Biomedicine contains many difficult modeling problems, e.g., the classification of lipids, where the nomenclature is currently very heterogeneous, and based upon graphical concepts. Even attempts to axiomatize chemistry are full of pitfalls for the unwary, e.g., it doesn't make much sense to talk about a molecule having a boiling point, however tempting it might be to attach the information there, since heat is something that applies to substances, rather than individual molecules. Similarly, the modern view of RNA has changed from its taking a passive role in gene transcription to a more active role in gene expression.

Wednesday, June 24, 2009

"The Big Switch" by Nicholas Carr

Every now and again, you read a book about IT that is well-written, informative, and thought-provoking, while managing to avoid all the usual pitfalls and clichés. ‘The Big Switch’ is definitely one of those books. Nicholas Carr’s previous book (‘Does IT Matter?’) was based on his controversial HBR article, which claimed that IT, though a necessary investment, typically doesn’t convey competitive advantage to companies. In this work, he goes even further, arguing that computing and storage will soon be utilities, like electricity.

Carr explores the parallels with electricity for power and light in the early chapters, explaining how industry switched from generating its own capacity to relying on a grid, once innovations in power transduction, load balancing, and demand metering made this possible. Similarly, he argues that the ability to assemble huge data centers, make software available as a service, and supply cycles and disk space on demand provides companies with the opportunity to get out of the IT business. The resemblance between these two situations, set a century apart, is quite uncanny.

In the later chapters, Carr examines the downsides of IT progress: the stagnation of middle class wages as jobs are computerized, the disembowelment of information and media industries as content is unbundled, as well as subtle cultural and social effects that fall short of the utopian vision. He states that “All technological change is generational change … As the older generations die, they take with them the knowledge of what was lost … and only the sense of what was gained remains.” I think we are all seeing this play out very clearly as the pace of technological change continues to accelerate.

Thursday, May 7, 2009

Judy Estrin: "Closing the Innovation Gap"

Judy Estrin's book, brought out this year by McGraw-Hill, is called "Closing the Innovation Gap: Reigniting the Spark of Creativity in a Global Economy." The subtitle is a bit misleading, since Estrin focuses almost entirely on the US. The “gap” of her title reflects her thesis that we are now living off the fat of the 1950s and 60s, when spending on R&D was much higher than it is today.

Estrin interviewed about 100 business and technology luminaries for her book, including Marc Andreessen, John Seely Brown, and Vint Cerf, to name but a few. Many of the problems she and others identify are cultural, e.g., the get-rich-quick philosophy of Wall Street; the waning of interest in science at school, even as technology becomes more and more a part of our lives; and corporations that would rather offshore R&D and lie in wait for promising start-ups than innovate themselves.

Estrin has an interesting CV that includes successful start-ups, academia, and serving on the boards of major corporations. I was fortunate enough to hear her give a keynote at SIIA's NetGain 2009 in San Francisco, and her talk followed the contents and spirit of the book fairly closely (see my review). Her book ends with 'A Call to Action', but one wonders whether or not it's already too late, given the erosion of what she calls our "innovation ecosystem."

Tuesday, May 5, 2009

Chris Anderson on "Free" at SIIA NetGain

Chris Anderson of Wired magazine, and author of "The Long Tail", gave a fascinating talk today that is aligned with the contents of his latest book, "Free". He began by examining the ambiguity inherent in the English word "free", which can mean (among other things) either the same as "gratis" (i.e., no cost or charge) or can mean unrestricted or independent. (It's clear that sayings such as "information wants to be free" are really a pun on these two meanings, and beg many important questions.)

The key question Anderson asks in this talk is: what happens when key resources become so cheap as to be (almost) not worth measuring, when they become essentially free? The answer is that people (rightly) start "wasting" them. Thus the plummeting price of processing power (down 50% every 18 months) has led to the proliferation of personal computers that people didn't know they needed. Cheap storage (about $110 a TB today) makes a mockery of company policies that expect highly-paid professionals to take time out of their day to decide which documents to delete to meet disk quotas. Cheap bandwidth gave rise to mass media, first radio then TV, where the marginal cost of supplying an extra customer with prime-time, lowest common denominator entertainment is essentially $0. Even now, the cost to stream a Netflix movie is about 5 cents and falling.
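To get a feel for how quickly "cheap" turns into "almost free", here is a minimal sketch of the compounding implied by the 50%-every-18-months figure quoted above. The function name and the sample time spans are my own choices for illustration, and the sketch assumes the halving rate stays constant, which is a simplification.

```python
def relative_price(months, halving_period_months=18.0):
    """Price relative to today, assuming it halves every `halving_period_months`."""
    return 0.5 ** (months / halving_period_months)


# Illustrative time spans only; the constant halving rate is an assumption.
for years in (3, 6, 15):
    fraction = relative_price(12 * years)
    print(f"after {years:2d} years: {fraction:.6f} of today's price")
```

On these assumptions the price drops to a quarter after three years and to roughly a thousandth after fifteen, which is the point at which metering a resource starts to cost more than wasting it.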

Anderson made some interesting statements in the course of this talk which are worth quoting verbatim. Talking about how mammals are unusual in their nurturing attitude towards a small number of young, he states that "Nature wastes life in pursuit of a better life", i.e., the norm is for most of an upcoming generation to perish, while only a few survive and even fewer find better environments. Analogously, people "waste" bandwidth on YouTube, trying anything and everything, just because doing so is a cost-free activity, beyond the investment of time, energy, etc. Of course, survival (measured as popularity) is very highly skewed, but diversity ensures that pretty much every taste is catered for. For example, many 9-year-olds would rather watch amateur stop-motion animations of Star Wars made by their friends than watch a conventional Star Wars movie on a plasma TV with surround sound. As Anderson says, "Relevance is the most important determiner of quality", not conventional production values.

The central axiom is the following: "In a competitive market, price falls to the marginal cost." This quote, from the French mathematician and economist Bertrand, has acquired new meaning in the world of the Web, where the marginal cost of any bit-based product is close to $0. It follows (says Anderson) that, sooner or later, there will be a free version of everything.

How does business survive in a "free" world? One strategy is called "freemium". This inverts the 20th century marketing model of giving away a few free samples to drive sales volume. Instead, one gives a version of a product away to the majority of users to drive the adoption of a premium version by a minority of users.

How does this work? The assumption is that only 5% or so of users are really needed to cover your marginal cost. Going big via freemium ensures that this is 5% of a large number. You can then try and drive above 5% to pay back your fixed costs and get your business into the black. Sounds easy, doesn't it? Obviously, your sums have to be right, and adopting the strategy takes intestinal fortitude (as well as access to capital).
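The sums are simple enough to sketch. The following is a minimal illustration, not Anderson's own model: the function names, the user count, the $10 premium price, the $0.50 per-user serving cost, and the $200,000 of fixed costs are all hypothetical numbers chosen so that the break-even conversion rate comes out at the 5% mentioned above.

```python
def freemium_profit(users, conversion_rate, price,
                    marginal_cost_per_user, fixed_costs=0.0):
    """Profit for one period: paying users' revenue minus the cost of serving everyone."""
    revenue = users * conversion_rate * price
    serving_cost = users * marginal_cost_per_user  # free users cost money too
    return revenue - serving_cost - fixed_costs


def break_even_conversion(users, price, marginal_cost_per_user, fixed_costs=0.0):
    """Fraction of users who must pay for revenue to cover all costs."""
    total_cost = users * marginal_cost_per_user + fixed_costs
    return total_cost / (users * price)


# Hypothetical figures, for illustration only.
users = 1_000_000
price = 10.0           # premium price per paying user per period
marginal_cost = 0.50   # cost of serving any one user per period

# Conversion needed to cover marginal cost alone.
print(break_even_conversion(users, price, marginal_cost))
# Conversion needed once hypothetical fixed costs are added.
print(break_even_conversion(users, price, marginal_cost, fixed_costs=200_000))
```

With these made-up inputs, 5% conversion covers the marginal cost of serving everyone, and about 7% is needed once the fixed costs are included. Note that when fixed costs are zero the break-even rate does not depend on the number of users at all; scale matters because it spreads the fixed costs thinner.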

The premium version can be differentiated from the free one along any number of dimensions, e.g., it has time/effort saving features, it's limited by number of seats, it's only available to certain users (such as small companies), etc. The lesson from freemium in online gaming is that people will pay to save time, lower risk, and enhance their status, among other inducements.

This is certainly a brave new world for the software and information industry, and one that will require many mental (and price) adjustments. But, if Anderson is right, the triumph of "free" is inevitable, in the sense that in future there will tend to be a free version of almost any given resource. I like his account, because it also allows for the other reality, which we live at Thomson Reuters. Namely that people who want more accurate legal research, more up-to-date financial quotes, and deeper analysis of scientific trends, will always be prepared to pay for the advantage so conveyed. The trick is to stay ahead of the value curve and provide must-have information that is both intelligible and actionable.

Nature may waste life, but life is still a zero-sum game.

Friday, March 6, 2009

Whither Technology and Wall Street?

This has not been a good 6 months for the marriage of Wall Street and technology. Hedge funds and algo trading were supposed to stabilize markets (see e.g., here), but it’s arguable that aggressive shorting coupled with increased automation has had the opposite effect. Meanwhile, the one-year old MoneyTech conference was cancelled, presumably due to lack of registrations.

Having said that, it’s obvious that technology will continue to drive trading practices, particularly in the areas of managing trade path risk and seeking alpha in non-numerical sources. As far as the former is concerned, there is still the challenge of minimizing transaction costs while making substantial trades without moving prices. Re the latter, as blogs and other non-traditional news and opinion sites burgeon, the trader is left with a bewildering array of information sources, while the programmer is subject to the temptations and pitfalls of sentiment analysis and data mining (cf. David Leinweber, “Algo vs. Algo”, The Institutional Investor’s Alpha, February 2007).

There is no way back, and I think that the old truism applies, whereby the antidote to problems created by technology is more and better technology. I also think that, as Moore’s Law runs out of steam, gains driven largely by computing power will have to be augmented by real advances in algorithms and representations. In other words, computer scientists may actually have to work for a living once more.

Sunday, February 1, 2009

Kris Hammond on Frictionless Information at SIIA

The basic message of this talk can be summed up as "Don't compete with Google, but do an end run around the search box by pushing relevant content to users as they create or consume information." Dr. Hammond is co-director of the Intelligent Information Laboratory at Northwestern University. His talk was unusually technical for SIIA, and was greatly enjoyed by some, but not others. (I enjoyed it, of course.)

Frictionless information is defined as "information that proactively serves people based on the context of their activities." The idea is: know the user, be embedded in their workflow, understand their context, and get content to help them. Push from any relevant source and deliver to any format. Two applications were presented which illustrate the concept.

One is a desktop "relevance engine" that suggests content in the research pane of Word, based on what the user is writing. This idea is not totally new (c.f., the Watson search tool), but the magic is all in the execution. How often do you query, what sources, how do you ensure pinpoint relevance and up-to-the-minute freshness? The strategy seemed to be to query a lot initially, e.g., when the user opens or starts a document, and then only present things that are new, e.g., when the focus changes.

The other is called "Beyond Broadcast" - a program that watches TV along with you (presumably embedded in a Tivo box?) and builds a micro-site based on what you're seeing. On the site, there is related Web material, such as news, YouTube videos, blogs, and of course ads. The application apparently takes the kind of show into account, e.g., feature film, comedy show, etc. The client is Titan TV, whose primary business is delivering TV to your PC.

I do like these ideas. At Thomson Reuters, we built a recommendation system based on the relevance engine concept back in 2003, called ResultsPlus. ResultsPlus reads queries being run against case opinions on Westlaw and suggests other sources of information, such as briefs, law reviews etc. Thanks to high relevance driven by personalization, this service has been a huge hit. Annotating video with Web and other data is another active area of research for us, and will appear in a new product launch shortly.

I believe that traditional publishers can indeed preserve their relevance in the face of new media if they leverage their domain expertise, and that of their users, to support customers proactively in the performance of common, but high value, tasks. Basic search is now a commodity, but what I am describing here (which I call "expert search") is not a commodity, since it requires both scarce knowledge and the existence of an online community.

Mark Walsh on "micro-scripts": SIIA Interview

Mark Walsh is a bona fide pundit of politics and media, and also the CEO of GeniusRocket, a crowd-sourcing company for creative content and brand marketing. If you accept the conventional definition of Washington as "Hollywood for Unattractive People", then you have to believe that politicians are typically judged more by what they say than how they look, and that how they say it is increasingly important as 24/7 media coverage intensifies.

Mr. Walsh suggests that politicians have started using "micro-scripts" to penetrate the consciousness of an electorate suffering from Attention Deficit Disorder. He cited the use of terms like "maverick", "(bridge to) nowhere", "lipstick" and even "change" (as in "change we can believe in") as evidence that the last election had gone in this direction. Of course, Madison Avenue hit Washington a long time ago, so in some ways there's nothing new here.

I struggled a bit to see how a micro-script is different from a slogan or a sound bite. A micro-script seems to be a sound bite that has become institutionalized, and therefore part of the brand. It also brings with it back story, or a set of assumptions, that is hard to question once it gets established. (Dan Schaible of BurrellesLuce has a good take on this.)

Mr. Walsh also suggested that President Obama's tendency to speak more slowly and deliberately may end up having the effect of "human Ritalin" on the current political debate, making us all more thoughtful. One certainly hopes so. As I said in my last post, and as Henry Blodget suggested at SIIA, the Internet does have the ability to promote truth and reason, so long as the debate is lifted above knee-jerk phrases (like "big government") and disinformation (the Swift boat campaign).

I am reminded of an interview I saw on British television many years ago, where someone asked an aged contemporary of Wyatt Earp whether or not the legendary marshal really was fast with a gun. "No, he was deliberate," the old man said. In other words, he took his time and made his bullets count. I think that this may turn out to describe Barack Obama quite well, and it stands in stark contrast with some recent presidents, who have generally adopted a "shoot from the hip" philosophy.

Saturday, January 31, 2009

What if we'd had the Internet in the 60s?

The recent rise in the use of the Internet by grassroots political organizations made me wonder what the 60s would have been like if we had all had Internet access. Would political discourse have been more reasoned and textual, rather than slogan-oriented and demonstration-based? Would the counter-culture have been more cerebral, and less laced with sex, drugs and rock 'n' roll? (I always thought that drugs and politics were a bad mix, because it gave the establishment the excuse to arrest you.)

These may seem like frivolous questions, but I don't think they are. People can only express themselves using the media at their disposal, and we see extreme cases of this, where people sometimes resort to guns and bombs because they have no voice. (People resort to guns and bombs for other reasons, of course, e.g., because they can.) Even public demonstrations (I went on a few) are a rather blunt instrument when it comes to getting your point across.

If people feel they can make a difference without burning cars, occupying buildings, or simply shocking the bourgeoisie, then I suspect that many will choose less confrontation instead of more. Obviously, some folks enjoy taking to the streets, witness the WTO meetings of recent years, which have served as a magnet for protest groups of all kinds. But I like the fact that the new administration is leveraging technology to promote a more meaningful dialog than is possible across a police cordon.

Glenn Goldberg of McGraw Hill: SIIA Interview

McGraw Hill has 6 businesses in their Information and Media group: Aviation Week, TV Broadcasting (all ABC affiliates), Business Week, JD Power, Construction, and Platts (energy pricing). Glenn Goldberg is the President of this division. (Their other businesses are in the Education and Financial Services divisions, the latter being basically Standard and Poor's.)

JD Power serves global marketers with surveys (customer satisfaction, etc) and other information in a variety of industries, including: automotive, environmental/energy, and healthcare. Since McGraw acquired them some 4 years ago, they have also moved up the value chain by providing consulting, e.g., how do you launch brands. Surveys are also augmented by sentiment analysis these days, thanks to the recent acquisition of Umbria. This company's computer scientists scrape web sites, such as blogs, to do data mining on relevant postings and aggregate the results demographically.

My comments: These may sound like moderately robust businesses to be in, and revenues in I&M grew about 4% YOY 2007-2008, but increases in profitability seem to have come from restructuring, involving layoffs and reductions in incentive compensation. Meanwhile, Business Week joins the rest of the magazine industry in suffering somewhat from falling ad revenues, as well as the normal recessionary effects. BW is online, of course, but online ads don't compensate for the decline in print ads, since the margins are much lower.

Mr. Goldberg gave good advice when he exhorted information companies to create communities as well as content, curating other people's content for their customers to consume. He also advised aligning technology and business and generally holding people accountable, easy to say but surprisingly hard to do in many companies.

Interesting developments at McGraw Hill in recent years include the use of the AutoDesk web service to connect Construction customers with AutoCAD software, the growth of the Dodge network for aggregating and distributing information about building products, and Platts' continued expansion beyond its roots in petroleum markets to encompass other energy sources.

Wednesday, January 28, 2009

How to Monetize Social Media: SIIA Panel

Given these troubled times, it's not surprising that people are worrying about how to make money from nifty and cool things like social media platforms and video (see previous post). This panel featured John Blossom (Shore Communications), Joe Robinson (A Small World), Rob Barber (Environmental Data Resources), Chuck Shilling (Nielsen) and Shawn Gold (SocialApproach) as moderator.

The short story is that in order to make money you have to layer stuff like ads, surveys, virtual goods and services, games, etc. on top of social media platforms. We have all seen such things on Facebook, where groups, fan bases, and other coalitions drive 'word of mouth' marketing through memberships, testimonials, and iconography including badges and widgets. We are also seeing more e-commerce on sites like mySpace, where people sell CDs, T-shirts and jewelry.

Nielsen has a well-known 'listening platform' called BuzzMetrics that monitors mentions of brands, sentiment around them, and even proximity of competitor names to the brand names in text. EDR aggregates public environmental data from government sites such as EPA for people doing property due diligence and the like, allowing them to update the information. A Small World is a subscription portal for globe-trotters to share information and meet like-minded people, e.g., going to the same conference.
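The proximity idea is easy to illustrate, though I should stress that what follows is my own toy sketch and says nothing about how BuzzMetrics actually works. It assumes single-word brand names, uses a crude regular-expression tokenizer, and the brand names "Acme" and "Globex" are invented for the example.

```python
import re


def min_token_distance(text, brand, competitor):
    """Smallest token gap between any mention of `brand` and of `competitor`.

    Returns None if either name is absent. Handles single-word names only.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    brand_positions = [i for i, t in enumerate(tokens) if t == brand.lower()]
    rival_positions = [i for i, t in enumerate(tokens) if t == competitor.lower()]
    if not brand_positions or not rival_positions:
        return None
    return min(abs(b - r) for b in brand_positions for r in rival_positions)


# Hypothetical blog posting with invented brand names.
post = "I switched from Acme to Globex last year, and Globex support is better."
print(min_token_distance(post, "Acme", "Globex"))
```

A small distance suggests the two names are being compared in the same breath; a real system would presumably also need to handle multi-word names, misspellings, and sentence boundaries.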

My own comments: Facebook is rumored to have 2008 revenues of around $250M, which sounds OK until you realize that they have significant expansion costs, and that adding customers probably isn't making them much money. They got about this amount from the Microsoft deal in 2007, but they're still smaller than mySpace and they're not cash positive yet, so they must be wondering where this year's growth is coming from.

Profiting from Video: SIIA Panel

This panel featured Sandy Malcom (, Andy Plesser (Beet.TV), Kathy Yates ( and Nicholas Ascheim (NYT) as moderator. I really enjoyed listening to this conversation, and took copious notes. The following summary captures the gist (I hope), but doesn't always attribute who said what, or who agreed with whom, and occasionally mixes in my own observations.

The Web is becoming more like TV, but estimates of the video ad market have always been overblown. In the summer of 2007, it was a little over $500M, about half of what had been predicted. Meanwhile, CNN is churning out 100 news clips a day, and niche players like Beet.TV and AllBusiness are building lower cost libraries of less perishable goods for much narrower audiences.

The CNN story is pretty instructive, I think. They started with online video in 2002, in partnership with Real Networks, and kept their product behind a pay wall. Then they made it free, except for live video, employing ads and cross-platform selling (presumably of their other service offerings). Now it's all free, including the live stuff, and bloggers, etc, are allowed to link in.

Beet.TV, meanwhile, makes 10 videos a week, mostly of tech gurus and academics who want to get on the small screen. Some, like Akamai and Adobe, want to reach out to customers. (Thomson Reuters has used Beet.TV to highlight its Open Calais service.) Others are probably building their personal brands and promoting their careers. This can be done at relatively low cost with webcams and Skype. AllBusiness spends about $250-$300 per 3-5 minute video, which includes the cost of a videographer and a production assistant.

Video search is still a wide open field, guided mostly by metadata tags, blog context and other cues. Ads can travel with a video, embedded in a branded player. YouTube (like Google) has adwords available to be associated with video content. And Google is making video more discoverable through its ordinary, universal search engine.

Bottom line is that YouTube is monetizing less than 5% of its clips, while even CNN must only be making a few thousand per clip in revenues, with production costs 10 times that of Beet.TV et al. Increased use of syndication may be part of the answer, in which ad revenues derived from product are split. (Thomson Reuters already does this in both directions.) In any event, the convergence of Web and TV is proceeding apace.

Henry Blodget at SIIA: Online Journalism

Henry Blodget is CEO of Silicon Alley Insider Inc., has a substantial background in financial services, and his book, The Wall Street Self-Defense Manual, came out in 2007. His talk was on "The Rise of Online Journalism", and concentrated on how sites like Gawker Media and Huffington Post are competing with "old media", i.e., print newspapers who have gone online, with varying degrees of success. It was an entertaining and informative talk, which I attempt to summarize here.

New media isn't just old media put online - it's conversational (informal), interactive (you can talk back), 'snackable' (delivered in bite size chunks), real-time (as the story unfolds), and involves both serious aggregation of multiple sources and the kind of high-velocity production you associate with broadcasting rather than text. You float stories, get reactions, refine, correct, elaborate, and hopefully converge on the truth. It mixes video, images, text, audio, etc (a.k.a. omnimedia).

Some interesting facts. Gawker has multiple Tivos running at any given time, watching TV to find interesting clips. HuffPo aggregates stories from all over, and puts its own spin on them (usually political). Gawker now has 2 times the online traffic of the LA Times, while HuffPo has overtaken the Boston Globe in online page views. This despite Gawker having 80 editors to LA Times's 200; HuffPo has about 20 editors apparently.

What does it all mean? Mr. Blodget claims that we are seeing "creative destruction" of old media leading to a better future. It's hard to be cheerful about destruction at the present time, but it's also hard to disagree that B2C information sources like traditional newspapers are in a difficult bind. If they embrace the Web, it's hard to build margin with online ads, but if they stay offline, or just give it away, they're in trouble anyway.

Wall Street Journal's hybrid model is held up as a good compromise. Unlike New York Times, they do charge a subscription for their online property (I have one), but they are also crawled by search engines like Google, and monetized by ads. Consumers still get quality journalism, but now there are multiple routes to consuming it. It remains to be seen how sustainable this is, or whether it works for everyone, but in the meantime newspapers (like our own Minneapolis Star Tribune) are struggling.

Software & Information Industry Association

The next several posts will attempt to summarize the talks and panels I enjoyed at the SIIA's Information Industry Summit in New York. In the interests of full disclosure, I should say that I am a member of the board of their Content Division, and that I was also on the steering committee for the conference. Nevertheless, I will try to be objective.

I have attended the Summit every year for the last 5 or so years, because I find it a very convenient way to keep up with what is happening in all corners of the content industry, and the software companies that serve that sector. It is also a pleasant social occasion, with plenty of opportunities for networking.

I thought that highlights of the meeting included Henry Blodget's keynote this morning, panels on video and social media, and a technical briefing by Kris Hammond. These and other items will be summarized in subsequent postings. It may take a while for me to get it all up there, as I do have a day job!

Sunday, January 11, 2009

Google doing "semantic search"?

A posting on Facebook by Frank van Harmelen drew my attention to the fact that Google now attempts to perform a kind of question answering. E.g., if you type "capital Poland" into the search box, the first result is as follows:

Poland — Capital: Warsaw 52°13′N 21°02′E / 52.217, 21.033
According to

The answer appears to be a straight lift from the relevant table in Wikipedia. But there are other instances where the answer appears to have been extracted from running text (not always correctly). E.g., if you type "Mick Jagger wife" you get:

Mick Jagger — Spouse: Jerry Hall
According to - More sources »

which is wrong, since Jerry Hall is one of his ex-wives. Ms Hall and Bianca Jagger are both mentioned as such in the referenced article, although each is described as "ex wife", i.e. without the hyphen. It may be significant that Jerry Hall is mentioned after Bianca Jagger, thereby filling the slot perhaps.

Not sure how this is done or where this is going, or how long it has been going on. The first search result has previously been used to present a definition (e.g., try simply typing "e" - you get the mathematical constant), but this is the first time I have noticed relationships being presented.