At our annual CEO Summit last week, Phil Simon, author of The Visual Organization and Too Big To Ignore: The Business Case for Big Data (and New Jersey native!), led a session titled What’s Your Big Data Credo? And Why Should You Care? Phil’s insights are helping corporations, entrepreneurs, and investors alike understand what Big Data is all about. Like “cloud computing,” big data can be a nebulous concept. I’ve excerpted two aspects of Phil’s session that I think best explain not only why big data matters, but also the potential implications of harnessing it for your organization.
1. Volume, Variety, Velocity
These three Vs were coined by Gartner to characterize big data. Phil goes further, suggesting that the data is primarily external to the enterprise and unstructured.
In the late 90s, a gigabyte of storage cost roughly $100; today it costs about $0.05. As storage costs have fallen exponentially over the years, the volume of information individuals and corporations need to store has grown exponentially. And where data was once generated solely by companies, their partners, and their customers, today it is more often than not generated by machines. But the sheer volume of data alone isn’t the most interesting or challenging aspect of big data. Rather, the challenge is how to readily store, process, and retrieve that volume of data without sacrificing performance. Innovators are tirelessly finding new ways to scale. Today, NoSQL databases (e.g., Aerospike) and distributed processing frameworks (e.g., Hadoop) are among the many ways businesses manage their ever-expanding volume of data.
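To make the distributed-processing idea concrete, here is a toy MapReduce-style word count in plain Python. This is a sketch of the programming model Hadoop popularized, not Hadoop itself; the chunking into three "machines" is simulated:

```python
from collections import defaultdict
from itertools import chain

def map_phase(chunk):
    # Each "node" emits (word, 1) pairs for its chunk of the input.
    return [(word, 1) for word in chunk.split()]

def reduce_phase(pairs):
    # Pairs are grouped by key and their counts are summed.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

# Simulate splitting a large document across three machines.
chunks = ["big data big", "data velocity", "big volume"]
mapped = chain.from_iterable(map_phase(c) for c in chunks)
print(reduce_phase(mapped))  # {'big': 3, 'data': 2, 'velocity': 1, 'volume': 1}
```

The point of the split is that the map phase runs independently on each chunk, so adding machines adds throughput; only the final reduce needs the combined results.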
Today, businesses are capturing data from diverse sources as they strive to understand their customers. To truly understand a customer is to understand the variety of data that customer produces across a multitude of devices and form factors: smartphones, tablets, laptops, beacons, etc. As devices become more mobile, data types such as geolocation, NFC, and Bluetooth beacon signals become more accessible and useful, driving more intelligent business decisions.
What’s more, the social media revolution has made information access a commodity. The velocity at which a tweet can go viral is astounding. In the past, data was transferred and analyzed through a batch process: a data payload is submitted to a server, and results are delivered after a period of processing time. This scheme quickly breaks down when the rate of submission exceeds the rate of processing. Today, data throughput is continuously improved through higher processing power, advanced network infrastructure, innovative compression algorithms, parallel computing methodologies, and much more. Businesses today face important decisions about how they collect and deliver data.
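A back-of-the-envelope simulation (hypothetical rates, plain Python) shows why batch processing breaks down when submissions arrive faster than they can be processed: the backlog grows without bound.

```python
def backlog_after(seconds, arrival_rate, processing_rate):
    # Queue length after `seconds`, assuming constant rates (items/sec)
    # and that the server can never process more than what is queued.
    backlog = 0.0
    for _ in range(seconds):
        backlog += arrival_rate                   # new submissions land in the queue
        backlog -= min(backlog, processing_rate)  # the server drains what it can
    return backlog

# Processing keeps up: the backlog stays at zero.
print(backlog_after(60, arrival_rate=100, processing_rate=120))  # 0.0
# Arrivals outpace processing: a 50/sec deficit over 60s leaves 3000 items queued.
print(backlog_after(60, arrival_rate=150, processing_rate=100))  # 3000.0
```

The deficit compounds every second, which is why streaming and parallel approaches aim to keep sustained processing capacity at or above the arrival rate rather than letting work pile up for the next batch window.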
2. Behind the Hype: It’s Getting Technical, Fast
To capture the value behind big data, businesses will have to invest in understanding the technical aspects of working with big data. Those who develop a concrete grasp of the following will be much more effective in extracting value from their data initiatives.
- Managing meta-data: Meta-data is data about data, and it can often be more complex than the data it describes. Managing meta-data effectively translates into swifter data queries and better performance for customers. In complex systems, the meta-data footprint can also become increasingly taxing over time.
- Managing data across repositories: Distinguish between “hot” and “cold” data and architecturally allocate proper resources for each. Hot data is more frequently accessed, and thus demands more robust hardware with higher processing power. Cold data is less frequently accessed and usually leaves room for cost savings.
- Managing data history: Leave the proper bread crumbs or paper trail for data scientists to quickly understand the data’s source and propagation. When results are not as expected, data source and methodologies can often be called into question. It’s neither right nor wrong, just human nature. Effective management of data history can help get to the bottom of issues quickly when they arise.
- Managing data reliability or “hygiene”: Put the necessary automated validation procedures in place on incoming and outgoing data streams, e.g., ensure millions of product SKUs have the right price and quantity. Dealing with sensitive data (consider bank data) requires minimum-to-zero tolerance for inaccuracy. Data validation can help discover and resolve issues in a timely manner.
- Managing the right amount of data: Analyzing data is expensive, so over-hiring data scientists and over-analyzing data can do more harm than good.
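The data-hygiene point above can be sketched in a few lines. This is a minimal illustration of an automated validation check on an incoming record stream; the field names and rules are hypothetical:

```python
def validate_sku(record):
    # Return a list of problems; an empty list means the record is clean.
    errors = []
    if not record.get("sku"):
        errors.append("missing SKU")
    if not isinstance(record.get("price"), (int, float)) or record["price"] <= 0:
        errors.append("price must be a positive number")
    if not isinstance(record.get("quantity"), int) or record["quantity"] < 0:
        errors.append("quantity must be a non-negative integer")
    return errors

incoming = [
    {"sku": "A-100", "price": 19.99, "quantity": 3},
    {"sku": "", "price": -5, "quantity": 2},
]
for rec in incoming:
    print(rec.get("sku") or "<blank>", validate_sku(rec))
```

In practice, checks like these run on every record at ingestion time, so bad data is quarantined before it pollutes downstream reports; the tolerance for failures is a business decision, as the bank-data example suggests.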
Bottom line: Don’t run the risk of getting caught up in the fad of big data without really understanding the implications for your organization. Dan Ariely, Professor of Behavioral Economics at Duke University, has become known for this statement: “Big data is like teenage sex: everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.”
While reality is not so extreme, these are words of caution: start small and be thoughtful about it. Avoid being dragged down into the hype or, even worse, crossing an ethical line. For example, The New York Times published an article about Target’s ability to determine which shoppers are pregnant. Phil references this story in his book, Too Big To Ignore. Why wouldn’t Target, or any retailer for that matter, want to use as much information as it could to sell more merchandise, especially given the threat of pure-play online retailers turning brick-and-mortar stores into showrooms? That said, Phil suggests a general rule: just because you can doesn’t mean you should. I tend to agree.
I’d like to hear your feedback and comments. Please feel free to comment here or reach out to me directly at email@example.com.