Skip Garner of the UT Southwestern Medical Center at Dallas gave an interesting talk on text data mining, primarily for drug discovery. Their main tool seems to be a paragraph-level text similarity tool called eTBLAST, which powers a search engine with some fairly powerful post-processing of results. One of their goals is to build a literature network based on biomedical entities, and they take a Swanson-style approach, looking for connections across the literatures of different domains. The results are fed into their IRIDESCENT hypothesis generation engine, which looks for interesting connections, e.g., among microsatellites, motifs, and other DNA sequence phenomena in different species.
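For readers who haven't met the Swanson-style idea before, here is a rough sketch of the "ABC" pattern behind it: if term A co-occurs with B in some papers, and B co-occurs with C in others, but A and C never appear together, then A-C is a candidate hidden connection. This is only my illustration, not how eTBLAST or IRIDESCENT actually work; the function names and the tiny document set are made up (the fish oil / Raynaud's pairing echoes Swanson's well-known example).

```python
from collections import defaultdict


def cooccurrences(docs):
    """Map each term to the set of terms it shares at least one document with."""
    co = defaultdict(set)
    for terms in docs:
        for term in terms:
            co[term].update(terms - {term})
    return co


def swanson_candidates(docs, a_term):
    """Rank C terms linked to a_term only indirectly, via shared B terms.

    Hypothetical helper for illustration: returns a list of
    (c_term, linking_b_terms) pairs, most linking terms first.
    """
    co = cooccurrences(docs)
    b_terms = co.get(a_term, set())  # terms seen directly with A
    links = defaultdict(set)
    for b in b_terms:
        for c in co[b]:
            # Keep C only if it never co-occurs with A directly.
            if c != a_term and c not in b_terms:
                links[c].add(b)
    return sorted(links.items(), key=lambda kv: (-len(kv[1]), kv[0]))


# Toy "literature": each document is reduced to a set of index terms.
# This data is invented for the example, not mined from real abstracts.
docs = [
    {"fish oil", "blood viscosity"},
    {"fish oil", "platelet aggregation"},
    {"raynaud syndrome", "blood viscosity"},
    {"raynaud syndrome", "platelet aggregation"},
    {"migraine", "platelet aggregation"},
]

ranked = swanson_candidates(docs, "fish oil")
for c_term, via in ranked:
    print(c_term, "via", sorted(via))
```

A real system would need entity recognition, synonym handling, and some statistical scoring of the links; counting the shared intermediate terms is just the simplest ranking I could think of.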
Another application area involved coming up with new therapeutic uses for existing drugs. It takes 15 years and costs $500-800M to develop a new drug, so finding new uses for old drugs has the potential to cut costs and boost profitability. Also, many old drugs are now coming off patent, so new uses would allow reformulation and refiling. The classic example of recent years is sildenafil citrate (a.k.a. Viagra), which began life as a drug for reducing hypertension. IRIDESCENT has discovered that chlorpromazine (an antipsychotic) may be efficacious against cardiac hypertrophy (enlargement of the heart muscle), and also found antibiotic and antiepileptic agents that might help, based on side effect data mined from the literature.