Data and Its Derivatives

I took a trip in the Wayback machine last week when I attended DataWeek at the invitation of Dun and Bradstreet. As you might recall, D&B is a data company that collects copious amounts of the stuff about individuals and companies and sells it for use in filling out profiles for business purposes. That is an oversimplification, but it will suffice. The data helps companies evaluate business risk, such as when to extend credit to an unknown entity, and companies happily subscribe to the service to help with decision-making.
D&B has been doing this kind of business for decades, and my Wayback machine experience had little to do with them directly. It was more about all the other companies at the show. They were embryonic for the most part and focused on Big Data and analytics, and the experience was much like a few decades ago, when the tech industry was beginning and every company had data or technology in its name.
Then and now the discussion was about data though in the earlier incarnation the conversation focused on just having data in digital form. Today it’s all about storing the stuff and extracting meaning from it in the form of information that businesses can use to create the knowledge they need to make decisions.
DataWeek is an interesting venue for a certain kind of technologist interested in data analysis, but I’ve seen this movie. There was very little discussion of business or of the cultivation of data beyond analyzing it for some not-well-defined downstream uses. So my Wayback experience was all about what typically happens in early markets. Vendors talk about their technologies, hoping that early-adopter buyers will find merit in the goods and be willing to attempt doing something useful with them.
From my perspective the proceedings seemed to mirror what I’ve been discerning from data I’ve collected on the business uses of data, which is to say that it really is early days for business too. While many companies are adopting analytics and approaches to Big Data to help them make better sense of their businesses, many are still using the new technology to answer old questions rather than explore new ones.
For instance, one of my big issues lately has come down to time-stamping data in sales and marketing funnels. You might think this idea is old enough to have universal acceptance, but it doesn’t. Time stamps on major milestones in either sales or marketing can augment the data we collect routinely from all sources and lead to a great deal of new information.
For example, a simple subtraction operation on two time stamps can tell you how long it takes to do something, such as move from one stage to another in a process. It therefore turns static data about a process into a real representation of the process itself. With that you can begin to determine average speed through a stage or the entire pipeline and deduce, for predictive purposes, what a “good” opportunity looks like from your data. Prediction is one of the grail quests of modern analytics, and the absence of time stamping shows how early we are in getting it all right.
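For the code-minded, here is a minimal Python sketch of that subtraction. Everything in it is invented for illustration: the opportunity IDs, the stage names ("qualified", "proposal", "closed"), the dates, and the helper functions are assumptions, not anything taken from a real CRM.

```python
from datetime import datetime
from statistics import mean

# Hypothetical milestone time stamps for three opportunities.
# The third one has not closed yet, so it has no "closed" stamp.
opportunities = {
    "opp-1": {
        "qualified": datetime(2013, 3, 1),
        "proposal": datetime(2013, 3, 11),
        "closed": datetime(2013, 3, 31),
    },
    "opp-2": {
        "qualified": datetime(2013, 3, 5),
        "proposal": datetime(2013, 3, 25),
        "closed": datetime(2013, 4, 24),
    },
    "opp-3": {
        "qualified": datetime(2013, 3, 10),
        "proposal": datetime(2013, 4, 9),
    },
}


def stage_days(stamps, start, end):
    """Days spent between two milestones: one subtraction."""
    return (stamps[end] - stamps[start]).total_seconds() / 86400


def average_stage_days(opps, start, end):
    """Average days between two milestones, skipping opportunities
    that have not reached both of them."""
    durations = [
        stage_days(stamps, start, end)
        for stamps in opps.values()
        if start in stamps and end in stamps
    ]
    return mean(durations)


if __name__ == "__main__":
    print(stage_days(opportunities["opp-1"], "qualified", "proposal"))
    print(average_stage_days(opportunities, "qualified", "proposal"))
    print(average_stage_days(opportunities, "proposal", "closed"))
```

With averages like these in hand, an opportunity that lingers in a stage far longer than the norm becomes easy to flag, which is one simple starting point for the kind of prediction described above.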
That’s really the utility of big data in a nutshell and the opportunity of predictive analytics: a simple tweak that provides significant business value. But we’re not there yet, at least not all of us, and that brings me back to the Wayback machine. DataWeek convinced me of how early we are in the data revolution and how much we need to do to truly enable our front office business processes to run on information.
While we’re at it, perhaps this is a good time to seriously consider what we call the era that is being shaped by big data. While it’s true that data is at the core, the business need, and something that wasn’t much in evidence at DataWeek, is for knowledge. Knowledge, and not raw data, is what people use to make business decisions and it is delivered whenever information enters a brain to be manipulated and evaluated. Naming a show, or an era for that matter, after its most salient feature rather than after some benefit is not the best way to enlist powerful new followers and the big data movement could use a few more big thinkers.
While big data, and data more generally, are all the buzz these days, they are not helping people envision a solution that makes life better or more profitable. Perhaps this is a partial explanation of the hype cycle, the early phase of a market when the promise of a new technology begins to greatly outstrip its ability to deliver tangible results. We’ll get through that phase as we usually do, and on the other side things like DataWeek will get new names as we figure out how to make money through simple applications of big concepts, like time stamping.
Data, Information, and Knowledge

I still see far too many examples of content confusing the ideas of data and information. Sometimes, it seems like a writer is simply trying to not be redundant when he or she uses data and information in the same sentence to mean the same thing. But of course they are different and the result is unnecessary confusion.
I just wrote a paper for a European law journal on the topic and I learned more about it than is healthy for one person. The piece will be out in August. Generally, I admire the effort the Europeans are making to get it right, though they are less concerned with data and information per se than they are with privacy and security. These things all intersect but in sometimes unpredictable ways. The more I think about things the less I am sure of and the more questions I have.
The European Parliament is trying to figure out laws that protect individual rights to privacy, which necessarily affect what data is kept and what is not. That makes sense, and it sounds simple, but how do you do that? Does a person walking on a street have a right to privacy and thus a right to determine how you use a crowd photo? And what if a corporation like Google or a government takes the photo? Are we to prevent photos based on the premise that someone someday might do something to a person in one of the photos based on the picture? From there it gets silly but there are some concrete situations that are nothing to laugh at.
Take the case of a nurse in Connecticut who was arrested for possessing a small amount of pot. According to an article in the New York Times, the case was dismissed when she agreed to take some drug education courses. In the good old days, that would have been the end of it because according to Connecticut law, and the laws in many other states, her record was wiped clean with the dismissal. Connecticut law says she can even testify under oath that she has never been arrested now that the record has been cleared.
That all makes good sense to me. It might not be factually correct, but these expungement laws are one of the fictions we create in modern life to keep the world spinning. But with the Internet there’s no such thing as expungement, and a search still comes up with the original news article that, while true when it was published, is now false. It matters because this nurse can’t find a job anymore, thanks to the simple expedient of employers running a rudimentary search on every new job applicant. What to do? She’s suing the news organizations that wrote the story for slander but the story was true when it was reported. Yikes!
The Internet and our modern world are full of examples like this. Society used to be able to conveniently forget small indiscretions and we all got on with life. But that’s being taken away without anyone giving permission or any law being made. The Internet is the de facto repository of all things digital about us, but should it be? The Europeans take all this very seriously and perhaps we should too.
It seems to me that the biggest issue we have with data and information today is not data security, even though lots of it gets stolen (I’m talking to you, People’s Liberation Army Unit 61398). In fact, I think we’ve put too much emphasis on physically securing data and given too little thought to how it is transformed into information. After all, data by itself is useless. An MP3 file is garbage without software to render it into a song, which is a kind of information. Ditto with your bank balance and the video you shot over the holiday or the formula or source code for a new product.
Wouldn’t we be better off focusing on data transformation? A new photo-sharing service, Snapchat, takes this approach by delivering photos that disintegrate after ten seconds. That’s far from ideal for most applications but it’s on the right track. Generally, I think data ought to be handled like milk in a supermarket; it ought to have an expiration date after which it automatically becomes archival. You might be able to access archival data but transforming it back into its original information content would have to be restricted in some way.
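To make the milk-carton idea a little more concrete, here is a rough Python sketch under my own assumptions. The `Record` class, its `shelf_life` field, and the `render` function with its `archive_access` flag are all hypothetical names invented for illustration. A real system would need actual enforcement, not just a polite check.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Record:
    payload: bytes          # the raw data
    created_at: datetime    # when the data was recorded
    shelf_life: timedelta   # hypothetical window before it turns archival

    def is_archival(self, now):
        """True once the record is past its expiration date."""
        return now >= self.created_at + self.shelf_life


def render(record, now, archive_access=False):
    """Transform data into information (here, just decoding text).
    Archival records are only rendered with explicit archive access."""
    if record.is_archival(now) and not archive_access:
        raise PermissionError("record is archival; rendering is restricted")
    return record.payload.decode("utf-8")


if __name__ == "__main__":
    note = Record(b"case dismissed", datetime(2013, 1, 1), timedelta(days=30))
    print(render(note, now=datetime(2013, 1, 15)))
    print(render(note, now=datetime(2013, 3, 1), archive_access=True))
```

The data never goes away in this sketch. What changes after the expiration date is who gets to turn it back into information, which is the distinction that matters here.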
Look, we can still access information about various flat earth theories, but we all know this is archival and historic and no longer scientific. Some of us can still take it seriously if we want to but we can’t take it to the bank or whatever, you know what I mean. We don’t have anything like that for data yet, something that says this does not yield the information it once did. On another parallel path, if we were better able to control the conversion of data to information so that only the data’s owners could decrypt it, might we have less data theft and less of the loss of intellectual property that goes with it?
If any of this makes sense then it’s not data security we should be focused on as much as secure data conversion or transformation into information — those are different issues with different approaches. When you think of it this way the differences between data and information are starkly clear. It gives us all good reason to consciously choose the right words to convey our meaning.