Last week Ayasdi came out of stealth mode and told the world it had a new way to analyze big data, and I think the implications for CRM and social are very large indeed. The new way is called topological data analysis (TDA), and it has the feel of hearing about Relativity for the first time (or Salesforce.com) and learning that space is curved. Who would have thought it? Big Data is not some amorphous mass but something with topology, an entity with curves and folds and shapes.
Why is that important? Well, understanding the shape of data turns out to be, mathematically, a short cut to understanding it or to extracting meaning from it. Shapes include clusters and they can tell us where the interesting bits are. Consider the implications. No longer does one have to be inspired to ask good questions of data so as to write queries that deliver information. With topological data analysis, you can first identify the interesting clusters of data and then ask what’s so interesting about that?
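To make the "find the clusters first, ask questions second" idea concrete, here is a minimal sketch in Python. It is not Ayasdi's method, which I have not seen. It is only the simplest shape-finding step I know of: link any two points that lie within a chosen distance of each other and see which groups fall out. The toy customer data, the function name `clusters_at_scale` and the `eps` threshold are all my own assumptions for illustration.

```python
from math import dist  # Python 3.8+


def clusters_at_scale(points, eps):
    """Group points into connected components: two points are linked
    when they lie within eps of each other (single-linkage grouping)."""
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the groups of every pair of points closer than eps.
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= eps:
                parent[find(i)] = find(j)

    # Collect point indices by root, largest group first.
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical toy data: (monthly spend, support tickets) per customer.
customers = [(10, 1), (11, 1), (12, 2), (50, 9), (51, 8), (52, 9), (30, 30)]
groups = clusters_at_scale(customers, eps=3.0)
for members in groups:
    print(len(members), [customers[i] for i in members])
```

Nobody wrote a query here. The groups come out of the data, and the odd customer sitting alone is exactly the kind of thing you then go and ask about. Real topological analysis goes well beyond this, but the order of operations is the same.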
It’s a big shift in your perspective and maybe your philosophy. Certainly it takes the human race down a notch in its own esteem. Now we don’t rack our brains to ask piercing questions of our data; we have machines that do it better, so we have to stand back and watch. This may seem odd, but what if there’s a discovery lurking in your data that you were never inspired to ask about? Would the data hold its secrets forever? Well, not any more.
Right now, topological data analysis is a very geeky mathematical concept, just a couple of years removed from Stanford and a DARPA lab, but the potential it holds is bigger than anything else we’ve been discussing.
I believe that the Information Age is winding down, just like the Age of Steam did and just as all “Ages” do. That’s not to be feared but something to be embraced. What will take the place of information as the major disruptor and economic driver? Whatever it is, it will have to stand on the shoulders of the Information Age and use the latest and greatest tools. Part of that means topological data analysis for the simple reason that our ability to exploit discoveries in pharmaceuticals and oil and gas, to take two for the moment, is maxing out.
It costs upwards of $100 million to drill an oil well in the Gulf of Mexico; it takes a team of people a few billion dollars and a decade to bring a new drug to market. It hardly gets said but these investments cost the same whether or not the oil well has oil at the bottom of it and it’s the same story if the pharmaceutical comes a cropper. Those numbers are big — so big that they represent ceilings to further discovery unless we find breakthroughs that will reduce the costs and the risks of getting it all wrong.
Already we’re seeing topological data analysis crack some amazingly hard nuts in the aforementioned pharmaceuticals, oil and gas but also in financial services and government. Anywhere there’s big data there is an opportunity for topological analysis and that means the mass of social data we generate too.
People at Ayasdi tell me that when they apply topological data analysis to twenty-year-old data from pharmaceutical research they find new and interesting information. So far I don’t think they’ve come up with any new drugs, but it’s early days.
The market has other entrants too and while Ayasdi might be taking the highest road to the biggest customers, and perhaps the hardest problems, other companies using machine learning are implementing roughly the same idea. Consider Mintigo for example. This company focuses on identifying sales prospects, which is not the same as generating leads, but it’s a cool and important idea nonetheless and essential in many industries.
Mintigo analyzes existing customers to build a sophisticated data model of what a successful customer looks like for your organization. That is to say, Mintigo looks at the data given off by those customers and identifies the clusters of relevant data that qualify them as a match for your company and its products. From there it’s a simple matter of targeting the machine’s model on the general marketplace to see what it drags in. They call it identifying your CustomerDNA.
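For the curious, the general pattern of profiling your good customers and then scoring the market against that profile can be sketched in a few lines of Python. This is emphatically not Mintigo's model, which the company has not described to me in this detail. It is a bare-bones nearest-centroid ranking, and the feature choices, company names and the functions `build_profile` and `rank_prospects` are hypothetical.

```python
from math import dist  # Python 3.8+
from statistics import mean


def build_profile(customers):
    """Average each feature across known good customers (the centroid)."""
    return tuple(mean(column) for column in zip(*customers))


def rank_prospects(profile, prospects):
    """Return prospect names sorted by Euclidean distance to the profile,
    closest (most customer-like) first."""
    return sorted(prospects, key=lambda name: dist(profile, prospects[name]))


# Hypothetical features: (employees in hundreds, web visits per week).
good_customers = [(2, 40), (3, 50), (4, 60)]
prospects = {"Acme": (3, 52), "Globex": (20, 5), "Initech": (5, 45)}

profile = build_profile(good_customers)
ranked = rank_prospects(profile, prospects)
print(profile, ranked)
```

A real system would scale the features so that no single one dominates the distance, and would use far richer signals than two numbers per company. The sketch only shows the shape of the idea: learn what a good customer looks like, then point that picture at everyone else.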
Call it CustomerDNA or TDA or more broadly, machine learning. Whatever you call it, we’re on the cusp of another revolution that simplifies a major headache and reduces the cost of important business processes to manageable levels again. With these as catalysts can new discoveries and economic growth be far behind?
It is very hard to pinpoint a disruptive innovation and the moment it hits the market, and I have said this many times before. It’s easy to know that you use this or that technology today and couldn’t imagine living or working without it but, really, can’t you imagine the time before you started using this stuff? Social media might be a good example since most of today’s users still kind of remember what life was like before. Today is one of those days, I think.
Today Ayasdi came out of start-up stealth-mode and announced itself and you can see an article about it here in the New York Times.
So what is it in a nutshell? It’s only the first really new and different way to analyze big data since we started collecting it. Ayasdi uses something called topological data analysis and here’s one place where it’s different. Rather than type a query or ask for a report from a big data set, Ayasdi just looks at the whole data set and tells you where the interesting clusters of data are — clusters in places you may not have thought about.
So that means you no longer have to more or less know what you are looking for to use analytics, you simply need to know that you want to understand the interesting clusters. That’s a disruptive moment if you ask me — presuming it works as well as the early hype says it does. To me this sounds more like advanced data mining than business intelligence but I am not an analytics guru so this is simply speculation.
So what’s it good for? Well, if you’ve collected a lot of data about a molecule you think might have beneficial pharmaceutical properties, rather than performing a lot of screening tests, you might first examine the data topology and then investigate where the data says there are interesting relationships. And, yes, substitute customer for molecule in the above and more interesting things happen.
As with any disruption, it’s hard to think of what the world will be like in the aftermath, but if this works as advertised a few years hence we might all be scratching our heads trying to recall what life was like before.