It is sometimes refreshing to read disconcerting things about your industry. A few recent blog posts on the subject of “Big Data” have raised pertinent questions and legitimate concerns about the trajectory of data in business and about what it means to be a data scientist. There is no shortage of gloom and doom in the world of technology, to be sure. There are always voices telling us to be careful, to not count our chickens, to be wary of getting big-headed or falling prey to hype. For the most part, these kinds of conversations are healthy and constructive, and while they may make us uncomfortable, they offer valuable insight from differing perspectives.
Take this post by the blogger mathbabe. As a data scientist and PhD, she unequivocally states her belief that the future of Big Data is in peril, owing mostly to an overabundance of hype and an industry-wide misunderstanding of what Big Data can do. In “The Bursting of the Big Data Bubble”, she writes:
It seems like data and the ability to use data is the secret sauce in so many of the big success stories. Look at Google. They managed to think of the entire web as their data source, and have earned quite a bit of respect and advertising money for their chore of organizing it [. . .].
Okay, so using Google as a model is a bit like aiming at the moon. This is a perfectly valid criticism, exactly for the reasons she gives: just because a company like Google, with its vast resources of technology, money, and personnel, is able to create these big, bold solutions to problems doesn’t mean anyone with a computer can duplicate them. Fair enough.
Her next criticism addresses why this occurs, and she blames it on two factors: a lack of good communication between business users and data scientists, and a lack of standards for those data scientists. The first criticism is easy to see, because most people in the technical industries already understand the inherent problems in communication between disciplines — indeed, there are entire fields of writing dedicated to making technical stuff easy to understand for the layperson. And everyone knows what it is like to deal with a person/division/department in their own business that doesn’t quite understand what they do. So, it is not surprising that business folks and data scientists end up with crossed wires and unrealistic expectations.
The second point is more problematic, insofar as she is 100% correct. There do not seem to be any universally recognized qualifications for a data scientist, at least not yet, and this poses a huge problem for the company that is shopping around for one, whether as a new hire or as a potential vendor. This makes it very easy for a less-than-qualified entity to be placed in a position of responsibility or authority, and without the proper experience it is very probable that they will mess up your Big Data project.
But is it really all bad? Another post by Matt Asay paints a better (though still cynical) picture of Big Data’s future by referencing a recent report by Gartner. As he writes:
The problem for many of these same enterprises is that they struggle to understand what Big Data is all about, and how to make it work. When Gartner asked what the biggest Big Data challenges were, the responses suggest that for all these companies plans to move ahead with Big Data projects, they still don’t have a good idea as to what they’re doing, and why.
This is a pretty big problem! Why on Earth, we wonder, would a company initiate a Big Data project without knowing whether or not it adds value to the organization? Isn’t that the first thing you should know before you start any project? Asay uses this problem as a way to illustrate his larger point: hype is ruling the conversation right now. Organizations are climbing all over themselves to get a Big Data project started, but they have no idea what they are getting themselves into.
The good news here is that companies are embracing data and, as Asay points out, the numbers keep going up: the Big Data train is picking up speed. But to continue the analogy, a lot of people don’t know where the next station is, or when they need to get off.
This leads us back to mathbabe’s original critique, and the dearth of qualifications for data scientists. It is entirely possible that so many Big Data initiatives are floundering (like I wrote about here) because of all the confusion surrounding them. Not only do businesses not know what Big Data is, they have problems finding people who can explain it to them, or explain why it is useful. On the other hand, Big Data is not just a term. It is an environment, and in many ways it exists now whether or not we can exploit it, or even recognize it. The question remains, though: just how useful is it to the average business? Will it become ubiquitous in every business, like the computer? Or will it remain a special tool for the giant movers and shakers, like Big Blue?
Where do you think Big Data fits into your enterprise strategy?