Machine Learning, TDA and the Future of Invention
Last week Ayasdi came out of stealth mode and told the world it had a new way to analyze big data, and I think the implications for CRM and social are very large indeed. The new way is called topological data analysis (TDA), and learning about it feels like hearing about relativity (or Salesforce.com) for the first time and discovering that space is curved. Who would have thought that Big Data is not some amorphous mass but something with topology, an entity with curves, folds and shapes?
Why is that important? Well, understanding the shape of data turns out to be, mathematically, a shortcut to understanding it and extracting meaning from it. Shapes include clusters, and clusters can tell us where the interesting bits are. Consider the implications. No longer does one have to be inspired to ask good questions of data so as to write queries that deliver information. With topological data analysis, you can first identify the interesting clusters of data and then ask, what’s so interesting about that?
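To make the "find the clusters first, ask questions second" idea concrete, here is a minimal sketch in Python. It is not Ayasdi's method, which I haven't seen; it is a toy stand-in that groups points into the connected components of a neighborhood graph, about the simplest notion of "shape" a data set can have. The function names, the feature names and the numbers are all made up for illustration.

```python
from math import dist
from statistics import mean


def connected_clusters(points, eps):
    """Label each point with the connected component it falls in when
    any two points closer than eps are considered linked."""
    labels = [-1] * len(points)
    current = 0
    for start in range(len(points)):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] == -1 and dist(points[i], points[j]) <= eps:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


def describe_clusters(points, labels, names):
    """For each cluster, report its size and the feature whose cluster
    mean sits furthest from the overall mean."""
    overall = [mean(col) for col in zip(*points)]
    summary = {}
    for c in sorted(set(labels)):
        members = [p for p, lab in zip(points, labels) if lab == c]
        centre = [mean(col) for col in zip(*members)]
        gaps = [abs(a - b) for a, b in zip(centre, overall)]
        summary[c] = (len(members), names[gaps.index(max(gaps))])
    return summary


# Hypothetical two-feature data; no question is asked of it up front.
names = ["feature_a", "feature_b"]
points = [
    (1.0, 1.0), (1.1, 0.9), (0.9, 1.2),
    (1.0, 6.0), (1.2, 6.1), (0.8, 5.9),
    (7.0, 1.0), (7.1, 1.1), (6.9, 0.9),
]
labels = connected_clusters(points, eps=0.5)
summary = describe_clusters(points, labels, names)
for cluster, (size, feature) in summary.items():
    print(f"cluster {cluster}: {size} points, stands out on {feature}")
```

The order of operations is the point: the grouping happens before anyone writes a query, and the "what’s so interesting about that?" step is just a comparison of each group against the whole.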
It’s a big shift in perspective and maybe in philosophy. Certainly it takes the human race down a notch in its own esteem. Now we don’t rack our brains to ask piercing questions of our data; we have machines that do it better, so we have to stand back and watch. This may seem odd, but what if there’s a discovery lurking in your data that you were never inspired to ask about? Would the data hold its secrets forever? Well, not anymore.
Right now, topological data analysis is a very geeky mathematical concept, just a couple of years removed from Stanford and a DARPA lab, but the potential it holds is bigger than anything else we’ve been discussing.
I believe that the Information Age is winding down, just as the Age of Steam did and as all “Ages” do. That’s not something to be feared but something to be embraced. What will take the place of information as the major disruptor and economic driver? Whatever it is, it will have to stand on the shoulders of the Information Age and use the latest and greatest tools. Part of that means topological data analysis, for the simple reason that our ability to exploit discoveries in pharmaceuticals and in oil and gas, to take just two industries, is maxing out.
It costs upwards of $100 million to drill an oil well in the Gulf of Mexico; it takes a team of people a few billion dollars and a decade to bring a new drug to market. It hardly gets said, but these investments cost the same whether or not the oil well has oil at the bottom of it, and it’s the same story if the pharmaceutical comes a cropper. Those numbers are big, so big that they represent ceilings on further discovery unless we find breakthroughs that reduce the costs and the risks of getting it all wrong.
Already we’re seeing topological data analysis crack some amazingly hard nuts, not only in the aforementioned pharmaceuticals and oil and gas but also in financial services and government. Anywhere there’s big data there is an opportunity for topological analysis, and that includes the mass of social data we generate too.
People at Ayasdi tell me that when they apply topological data analysis to twenty-year-old data from pharmaceutical research, they find new and interesting information. So far I don’t think they’ve come up with any new drugs, but it’s early days.
The market has other entrants too. While Ayasdi might be taking the highest road to the biggest customers, and perhaps the hardest problems, other companies using machine learning are implementing roughly the same idea. Consider Mintigo, for example. This company focuses on identifying sales prospects, which is not the same as generating leads, but it’s a cool and important idea nonetheless and essential in many industries.
Mintigo analyzes existing customers to build a sophisticated data model of what a successful customer looks like for your organization. That is to say, Mintigo looks at the data given off by those customers and identifies the clusters of relevant data that qualify them as a match for your company and its products. From there it’s a simple matter of pointing the machine’s model at the general marketplace to see what it drags in. They call it identifying your CustomerDNA.
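For the curious, the general pattern of "model your best customers, then score everyone else against the model" can be sketched in a few lines of Python. This is emphatically not Mintigo's algorithm, which I have no visibility into; it is the crudest version I can think of, where the model is just the average of existing customers and prospects are ranked by how close they sit to it. The function names, the company names and the two normalized features are hypothetical.

```python
from math import dist
from statistics import mean


def build_profile(customers):
    """Average each feature over existing customers to get a crude
    picture of what a successful customer looks like."""
    return tuple(mean(col) for col in zip(*customers))


def rank_prospects(profile, prospects):
    """Sort prospect names from closest to furthest from the profile."""
    return sorted(prospects, key=lambda name: dist(profile, prospects[name]))


# Hypothetical features, both scaled to the range 0..1.
customers = [(0.8, 0.7), (0.9, 0.6), (0.7, 0.8)]
prospects = {
    "acme": (0.75, 0.70),
    "globex": (0.20, 0.10),
    "initech": (0.60, 0.90),
}

profile = build_profile(customers)
ranking = rank_prospects(profile, prospects)
print(ranking)
```

A real system would presumably use far more signals and something smarter than a straight average, but the shape of the workflow is the same: learn from the customers you have, then aim the result at the market.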
Call it CustomerDNA or TDA or, more broadly, machine learning. Whatever you call it, we’re on the cusp of another revolution that simplifies a major headache and reduces the cost of important business processes to manageable levels again. With these as catalysts, can new discoveries and economic growth be far behind?