Dr. Thomas Keil
October 30, 2014

Big data, analytics & Hadoop: fishing in a sea of data

Hadoop and big data are often used synonymously, or at least mentioned in the same breath. I am reminded, for example, of a discussion in BITKOM’s big data working group. One speaker virtually demanded that we deal almost exclusively with Hadoop, because this technology, and no other, was the basis for the entire debate. Of course, the speaker was harshly contradicted by representatives of analytical databases, sophisticated storage systems, in-memory technologies and the like. That was around two years ago.

In the meantime, the situation has changed: the supporters of column-oriented databases and in-memory systems have now integrated Hadoop into their technology platforms as well. And the storage representatives no longer attend the working group meetings as often.

As a disclaimer, I must note that the big data debate is not purely about technology, nor can it be. Digital transformation, monitoring scenarios, societal impacts, the Internet of Things, Industry 4.0 and so on: the range of topics is nearly endless. But when we talk about the technology, even the skeptics have to admit that it is always about Hadoop as well.

The world around Hadoop

At the same time, however, Hadoop is a universe in and of itself. It is difficult for non-specialists to name and classify the numerous Hadoop projects, such as Flume, Hue, HDFS, Hive and Pig, let alone supporting technologies such as Kerberos for authentication. In fact, there is even a “ZooKeeper”, named tongue-in-cheek as the tool that tames all these strange animals.

Of course, the simplest reason for Hadoop's popularity is the dramatically lower cost of storing data, even in large volumes. Many companies are now testing this in practice under buzzwords such as “data warehouse offload” and “DWH rightsizing”. If Hadoop proves itself there, there will be no way around it.

But what are companies supposed to do with the vast mountains of data that Hadoop now lets them accumulate so cheaply? They need advanced methods to start fishing. This can mean individual fly fishermen who use refined expertise to land the biggest fish. Or it can mean large nets that many fishermen handle and haul in together. In short, the task now is to build the right instruments, practice using them and bait the hooks.

If I ask myself what is really new about big data, it is this: nobody knows ahead of time what actually lies in the data. With curiosity, creativity and a culture of experimentation, we can set off on a voyage of discovery or, to stay with the metaphor, set sail. This is not embedded in the DNA of traditional German companies. And this is why Hadoop is the enabler for big data. Because it keeps storage cheap, it makes such open-ended exploration affordable; otherwise, the risk of making major investments with only a vague idea of the outcome would be too great.

But if we start with small, manageable application scenarios to learn the technology and how to apply it, many more opportunities for its use will open up, such as exploratory analysis of the data. In this area, for example, SAS Visual Analytics offers the specific combination of advanced analytics and attractive visualization that enables many business experts to find what they are looking for among millions of data points. To make this work, the analytics have to move to the data: the SAS LASR Analytic Server, for instance, reads the Hadoop data directly from the T-Systems Hadoop as a Service platform, which was launched in May 2014.

It also takes individual, highly skilled “fly fishermen” (in plain terms: data scientists). But in today's competitive environment, organizations have to go further and enable as many employees as possible to go fishing in the data seas.

Big Data Analytics Forum, November 4, 2014, Frankfurt/Main
The innovative meeting for all decision-makers and strategists: www.sas.com/bda-forum2014.
