Saturday, May 17, 2014

The Task of Democratizing Big Data

Companies that fail to take advantage of the opportunities presented by "big data" management and analytics technologies can expect to fall behind the competition and possibly go out of business altogether.

The world is just getting started with big data technologies like Hadoop and MapReduce, and several obstacles, such as a dearth of skills and old-fashioned thinking about data, continue to stand in the way of their adoption.

But companies that embrace the concept now are the ones that will lead the way in the not-too-distant future, when entry barriers are not so high. Companies that exploit big data will gain the ability to make better-informed decisions about the future and will ultimately bring in more money than those that do not.

The phrase "big data" most often refers to the massive amounts of structured and unstructured information generated today by machines, social media sites and mobile devices. The phrase also refers to the storage, management and analytical technologies used to draw valuable business insights from that information. Some of the better-known big data management technologies include the Hadoop Distributed File System (HDFS), MapReduce, Hive, Pig and Mahout, all Apache projects.

There is certainly no shortage of hype around big data management technologies, but actual adoption remains low for two main reasons. First, Hadoop and other big data technologies are extremely difficult to use, and the right skill sets are in short supply. Today, organizations often hire PhDs to handle the analytics side of the big data equation, and those highly educated individuals command high salaries.

The skills used to deploy, manage and monitor Hadoop are not necessarily the skills an Oracle DBA already has. For instance, a data scientist on the analytics side needs to know how to write MapReduce jobs, which is by no means the same as writing SQL queries.
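To make the contrast concrete, here is a minimal Python sketch that simulates the MapReduce programming model in memory. It is only an illustration and does not use Hadoop's actual (Java) API. It counts requests per HTTP method in a hypothetical web-server log. The log format and the function names `map_phase`, `shuffle`, `reduce_phase` and `run_job` are assumptions made for this example. In SQL, the same result is roughly a one-line query, `SELECT method, COUNT(*) FROM logs GROUP BY method;`. In the MapReduce version, the developer writes the map and reduce steps explicitly and has to reason about how intermediate key/value pairs get grouped between them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


# Hypothetical input format: "<date> <http_method> <path>" per log line.
def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (http_method, 1) for each well-formed record."""
    parts = line.split()
    if len(parts) >= 2:
        yield parts[1], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, the step a real framework performs between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum all counts emitted for a single key."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over the input lines."""
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    logs = [
        "2014-05-17 GET /index.html",
        "2014-05-17 POST /login",
        "2014-05-17 GET /about.html",
    ]
    print(run_job(logs))  # {'GET': 2, 'POST': 1}
```

On a real cluster, the mapper and reducer would run in parallel across many machines, and the framework would handle the shuffle, partitioning and fault tolerance. That distributed plumbing is a large part of why the skill set differs from writing SQL.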

The second major obstacle standing in the way of increased adoption centers on the notion that most companies currently lack the mindset required to get the most out of big data.

Most large companies today are accustomed to gaining business insights through a combination of data warehousing and business intelligence (BI) reporting technologies. But the BI/data warehousing model is about using data to examine the past, whereas big data technologies are about using data to predict the future. Taking advantage of big data requires a shift, a very basic one in some organizations, toward actually trusting data and going where it leads. Big data is about looking forward, making predictions and taking action.

As with all emerging technologies, big data management and analytics will eventually become more accessible to the masses (in other words, democratized). But some important things need to happen first.

For starters, new tools will be needed to reduce the complexity of working with big data technologies. Several companies, such as Talend, Hortonworks and Cloudera, are already working to ease these difficulties. But more innovation is needed to make it easier for users to deploy, administer and secure Hadoop clusters and to integrate processes with data sources.

Right now, becoming a top-line data scientist takes some pretty sophisticated skills in MapReduce programming, SAS and similar tools. We need tools that can abstract away some of that expertise so that you don't need a PhD to really explore big data.

Democratizing big data will also require a great deal of user training and education on topics like big data infrastructure, deploying and managing Hadoop, integration, and scheduling MapReduce jobs. We really need to tackle the problem from both ends. On one end, we need to make the tools and technologies easier to use. On the other, we have to invest in training and education resources to help DBAs and business analysts raise their game and operate in the big data world.


