Monday, May 31, 2010

Spring Conferences, Part 3

SIIA NetGain was a one-day meeting in San Francisco this year, followed by a one-day field trip to Adobe, Google and Apple campuses. I will only cover the highlights here.

Ann Michaels chaired a good panel on social networking with Johna Burke (BurrellesLuce), Tabrez Syed (Spiceworks) and Serena Wellen (Lexis-Nexis). The most interesting segment was on Lexis-Nexis Communities and Martindale-Hubbell Connect. Communities is an open network that combines free and fee-based content, currently aimed at 28 communities, some of which are practice-based (such as bankruptcy and insurance) and some of which are role-based (associates and paralegals). The content mix contains news, blogs, podcasts, video and ads. MH Connect is a closed referral network with multiple subscription levels, including free, built on top of an existing attorney directory. L-N gets content from contributors, who are also their customers, and monitors the spending of members in other parts of the business. The idea is that subject matter experts who generate content drive the visiting and viewing behavior of the larger population, providing revenue opportunities.

Andy Weissberg of Bowker chaired a panel on "How E-Readers Are Changing the Publishing Game" with Colin Crawford of Media7 and Miles McNamee of Copyright Clearance Center (CCC). There was some discussion of how music went from paid analog (legal) to free digital (illegal) and to subscription digital (legal). Obviously there are some parallels to the book business, in that getting quickly to a convenient and affordable service model is in everyone's long-term interest. But publishers now need to worry about the pirating of customers, even if the content ownership issue is settled. Customers, and the data they provide, are an important (if neglected) asset of publishing companies, and right now neither Apple nor Amazon is sharing customer data with content owners. (Google has no plans to do so either, as we discovered later in the context of Google Editions.)

Ken Doctor chaired a panel on "News Start-Ups" with Robert Rosenthal of the Center for Investigative Reporting and Jonathan Weber of the Bay Citizen. I found the idea of start-ups getting into the turbulent news business a little odd, until I realized that CIR is a non-profit, and the Citizen plans a public radio style business model of sponsorships and grants from foundations. Both plan to experiment with new business models, including ads and subscriptions, in order to sustain real journalism, as opposed to op-ed style blogging. CIR plans to move to video, in search of "more CPMs", though one wonders if they understand the cost structure of shooting and storing large amounts of rich media content, or the fact that even Google's highly efficient CPC model is unable to fully pay for its YouTube portal. (CPM on highly popular Facebook is a mere $1, compared to AdSense's $25.)

I may write about the Adobe and Google visits later; I jumped ship at Google and didn't proceed to the Apple campus.

Spring Conferences, Part 2

The New Directions in Text Analysis conference at Harvard was an eclectic gathering that mixed social scientists and computer scientists interested in tools for mining or otherwise taming large amounts of text. I will only describe a handful of the papers here.

The conference began with a technical paper about statistical tools for data-driven science policy decisions by Hanna Wallach of UMass that focused on Latent Dirichlet Allocation for topic models. A topic is operationalized as a specialized probability distribution over all the words in the vocabulary. Documents can be considered as the product of an underlying generative model that may have hidden structure, and whose parameters must be learned from observables by statistical inference.
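To make the generative view concrete, here is a minimal Python sketch of how a topic model of this kind "writes" a document: draw a mix of topics for the document, then draw each word by first picking a topic and then picking a word from that topic's distribution. The vocabulary, the two topics, and the alpha values are invented for illustration and are not from the talk. The sketch only runs the model forward; the hard part described in the paper, inferring the hidden structure from observed text, is not shown.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw a probability vector from a Dirichlet by normalizing gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def sample_index(probs, rng):
    """Pick an index with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def generate_document(topics, vocab, alpha, length, rng):
    """Run the generative story forward for one document."""
    theta = sample_dirichlet(alpha, rng)         # this document's topic mix
    words = []
    for _ in range(length):
        z = sample_index(theta, rng)             # hidden topic for this word
        w = sample_index(topics[z], rng)         # observed word from that topic
        words.append(vocab[w])
    return theta, words

# Toy example (all values are assumptions made up for this sketch).
vocab = ["grant", "funding", "policy", "genome", "protein", "cell"]
topics = [
    [0.4, 0.3, 0.3, 0.0, 0.0, 0.0],   # a "science policy" topic
    [0.0, 0.0, 0.0, 0.4, 0.3, 0.3],   # a "biology" topic
]
rng = random.Random(0)
theta, words = generate_document(topics, vocab, [0.5, 0.5], 10, rng)
print(theta, words)
```

Each topic is just a list of probabilities over the whole vocabulary, which is the sense in which a topic is "a probability distribution over all the words".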

Lars Backstrom of Facebook gave an interesting paper about memes: topical text fragments that persist through many articles. He described a graphical model that represents quotes and misquotes as nodes linked by weighted edges, the weights depending upon repetition and edit distance between the corresponding strings. The graph can be partitioned into memes by deleting low-weight edges (the problem is NP-hard, but can be managed with heuristics). See the MemeTracker web site for more details.
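The general idea can be sketched in a few lines of Python. This is a simplified stand-in, not the method from the talk: the weight formula (phrase frequency divided by one plus word-level edit distance), the threshold, and the sample phrases are all assumptions of mine, and the partition is done with plain thresholding plus connected components rather than the heuristics the speaker described.

```python
from collections import Counter

def edit_distance(a, b):
    """Word-level Levenshtein distance between two phrases."""
    a, b = a.split(), b.split()
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, 1):
        cur = [i]
        for j, wb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete a word
                           cur[j - 1] + 1,             # insert a word
                           prev[j - 1] + (wa != wb)))  # substitute a word
        prev = cur
    return prev[-1]

def edge_weight(p, q, counts):
    """Hypothetical weight: more repetition and fewer edits give a stronger link."""
    return min(counts[p], counts[q]) / (1 + edit_distance(p, q))

def cluster_memes(mentions, threshold):
    """Drop edges below the threshold, then return the connected components."""
    counts = Counter(mentions)
    phrases = sorted(counts)
    parent = {p: p for p in phrases}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for i, p in enumerate(phrases):
        for q in phrases[i + 1:]:
            if edge_weight(p, q, counts) >= threshold:
                parent[find(p)] = find(q)

    clusters = {}
    for p in phrases:
        clusters.setdefault(find(p), []).append(p)
    return sorted(clusters.values())

# Invented sample data: two quotes, each with one variant.
mentions = (["lipstick on a pig"] * 3 + ["put lipstick on a pig"] * 2 +
            ["yes we can"] * 2 + ["yes we can too"] * 2)
print(cluster_memes(mentions, threshold=0.5))
```

With these toy inputs the variants of each quote end up grouped together, while the weak links between unrelated quotes fall below the threshold and are dropped.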

Jure Leskovec of Stanford's paper looked at influence and dynamics in online media, especially the interplay between mainstream media (MSM) and the blogosphere. Different memes seem to have different temporal signatures, depending on where they originate and who adopts them. Most big stories break in MSM and are then taken up by blogs; a lesser number break in the blogosphere and are taken up by MSM. The time/volume graphs of these events look quite different when you control for scale and duration.

Does bad news travel fast, or at any rate faster than good news? Previous research by Berger and Milkman on NYT articles suggested not. The paper by Michael Macy of Cornell presented a data mining study of Twitter that showed that positive affect dominates all forms of tweet. Furthermore, a study of half a billion US tweets showed that certain words have a pronounced diurnal rhythm when you adjust for time zones, including some words you might expect ("coffee") and words you might not expect ("random").
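The time zone adjustment matters because a tweet sent at noon UTC is a morning tweet in New York and a pre-dawn tweet in Los Angeles. A rough Python sketch of the bookkeeping follows, assuming each tweet comes with a UTC timestamp and the sender's UTC offset in hours. That record layout and the sample tweets are my own invention, not the format used in the study.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

def local_hour(utc_time, utc_offset_hours):
    """Hour of day on the sender's clock, given a UTC time and an offset."""
    return (utc_time + timedelta(hours=utc_offset_hours)).hour

def hourly_profile(tweets, word):
    """Count tweets containing `word`, bucketed by the sender's local hour."""
    counts = Counter()
    for utc_time, offset, text in tweets:
        if word in text.lower().split():
            counts[local_hour(utc_time, offset)] += 1
    return counts

# Invented sample: (UTC timestamp, UTC offset in hours, text).
utc = timezone.utc
tweets = [
    (datetime(2010, 5, 3, 12, 0, tzinfo=utc), -5, "need coffee now"),
    (datetime(2010, 5, 3, 15, 0, tzinfo=utc), -8, "Coffee time"),
    (datetime(2010, 5, 3, 23, 0, tzinfo=utc), -5, "random thought"),
]
print(hourly_profile(tweets, "coffee"))
```

Without the offset, the two coffee tweets would land in different hourly buckets; with it, both fall at 7 am local time, which is how a daily rhythm becomes visible across a whole country.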

All in all, a worthwhile meeting that also showcased some Harvard research into such topics as data visualization, Chinese history and Japanese politics. I gave a talk on 15 years of R&D at Thomson Reuters that shared some of the things we have learned along the way about text analytics, machine learning, and user data.

Spring Conferences, Part 1

I recently attended SIIA NetGain, including site visits to Adobe and Google, plus a couple of conferences at MIT and Harvard. This may take more than one post, but I wanted to share my impressions of these meetings, since I took quite a few notes. (I'm currently in LA for D8 - All Things Digital - but that will form the subject matter of a subsequent post.)

Let's start with the Center for Digital Business conference at MIT. The morning session was mostly about "web morphing", i.e., how do you adapt a Web page dynamically to a visitor's cognitive style. Cognitive style was defined along a number of dimensions, e.g., verbal/graphical, small/large information load, active/passive, etc. The dependent variable was consideration (i.e., is the user really thinking about your product) rather than sales, and this was measured in terms of clicks. Glen Urban's work on ad morphing was particularly interesting. It varied ads along dimensions like more or less visual, more or less detail, and used a 2x2 cognitive matrix that combined deliberative/impulsive with intuitive/rational. Morphing ads in the right direction got a lift in all quadrants, but the biggest lift came from the rational-deliberative users, who presumably got the technical detail they were looking for.

Another fascinating study used fMRI technology to examine the neuropsychology of financial risk. (fMRI scans use magnetic fields to map blood flow in the brain. When an area is active, blood flow increases.) The nucleus accumbens (NAcc) appears to be implicated in the anticipation of a reward, while the amygdala seems to be implicated in matters of trust. One finding was that faces of financial consultants that had been digitally morphed with the user's face were deemed more trustworthy! In other words, we trust people who look like ourselves.

The afternoon sessions were focused on "digital advantage", i.e., how to get competitive advantage out of IT and innovation. Andrew McAfee argued that, while the price of digital assets is falling linearly on a log scale (that is, exponentially), this trend does not benefit all companies equally. High tech industries show a greater spread of gross profit margins since the mid 1990s, suggesting more competitiveness than other industries. Johnson Sikes of McKinsey reported on a survey done jointly with MIT which found a correlation between data-driven decision-making and productivity, showing the benefit of having highly-qualified staff who are given access to data for analytical purposes.

Michael Cusumano gave a sneak preview of his forthcoming book, Staying Power, in which he identifies six principles relevant to strategy and innovation in an uncertain world: platform (not just products); services on top of products and platforms; capabilities (not just strategy); pull (don't just push); scope (not just scale); and flexibility (not just efficiency). I look forward to reading the book when it comes out later this year.

In addition, there was a lunchtime session, in which Sherry Turkle gave a talk entitled "Alone Together" about the darker side of social networking. She posed the question, do the technology affordances serve our human purposes, or do they exploit our human vulnerabilities? Her extensive studies with both adults and teenagers suggest that people are lonely, but fear intimacy. Asynchronous social communications allow us to be very controlling with respect to the amount of time and emotional exposure we grant to people. Even a phone call is too much commitment for many of us; we would rather email, post or text. She made many memorable points, e.g., "intimacy and democracy require privacy" yet "we have become the instruments of our own surveillance." The tech mantra that people who have nothing to hide have nothing to fear from the Googles and Facebooks of this world completely misses this point.