Detecting incidents before they occur
Big data is not just about analyzing huge volumes of data; it can also be used to make predictions that help T-Systems reduce the number of incidents and continuously improve the quality of its services.
The employees in the Service Center are working feverishly and coordinating all of their efforts. A customer line from Munich to New York is being interrupted sporadically, and the connection is in danger of breaking down completely. The customer is unaware of this, but it won’t take long before the outages become apparent. A module in the switching node in Munich is faulty. And just yesterday the new release was installed. Everything was running smoothly until then.
In my conversation with Gerhard Keller, Head of Production Support Systems and Project Manager for Big Data in the Quality Area, I learn that situations like this one are not uncommon: New releases are deployed, everything is stable, yet after a while an outage can occur.
Troubleshooting through review
I also learn that statistical analyses based on historical data are conducted as a way to determine correlations during such incidents. These statistics are quite precise and enable experts to forecast potential future incidents. This is the approach taken by T-Systems when intensively analyzing major problems, known as critical and high incidents, to discover weak spots that could negatively impact the service delivered to customers. For example, an analysis can reveal whether issues arose due to previous changes. However, this retrospective approach isolates and analyzes only a fraction of the available data.
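The core of such a retrospective review is checking whether a critical incident was preceded by a change on the same system. A minimal sketch of that idea, using entirely hypothetical system names, timestamps and a 24-hour correlation window (none of which come from T-Systems):

```python
from datetime import datetime, timedelta

# Hypothetical sample data: deployed changes and critical incidents per system.
changes = [
    {"system": "mux-muc-01", "deployed": datetime(2014, 5, 12, 22, 0)},
    {"system": "rtr-fra-07", "deployed": datetime(2014, 5, 10, 1, 30)},
]
incidents = [
    {"system": "mux-muc-01", "opened": datetime(2014, 5, 13, 9, 15)},
    {"system": "rtr-ber-02", "opened": datetime(2014, 5, 13, 11, 0)},
]

def incidents_after_change(incidents, changes, window=timedelta(hours=24)):
    """Flag incidents that occurred within `window` after a change on the same system."""
    flagged = []
    for inc in incidents:
        for chg in changes:
            if (inc["system"] == chg["system"]
                    and timedelta(0) <= inc["opened"] - chg["deployed"] <= window):
                flagged.append(inc)
                break
    return flagged

suspicious = incidents_after_change(incidents, changes)
# The Munich incident falls 11 hours after a change on the same node and is flagged.
```

In practice the correlation window and the matching logic would be far richer, but the principle is the same: join incident data against change data and look for temporal proximity.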
Big data thinks ahead – predictive analytics
Today big data can help service personnel look into the future: Several data sources are compiled and subjected to an integrated analysis in real time – for example, outages and symptoms, also known as incidents and events, can be analyzed thoroughly. At T-Systems all the incidents and events from the past two years are analyzed, and this unstructured data is then mapped. Thanks to specific analysis tools, this data can be correlated for the first time. A single click is then all it takes to analyze incidents by the age of the hardware, the manufacturer, operating systems, time of day, occurrence during vacation periods and data center location.
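Slicing incidents by attributes such as manufacturer or hardware age boils down to grouping and counting records. A small illustrative sketch (the field names and vendor labels are invented for the example, not taken from T-Systems' systems):

```python
from collections import Counter

# Hypothetical incident records carrying the attributes mentioned above.
incidents = [
    {"manufacturer": "VendorA", "hw_age_years": 6, "site": "Munich"},
    {"manufacturer": "VendorA", "hw_age_years": 7, "site": "Munich"},
    {"manufacturer": "VendorB", "hw_age_years": 2, "site": "Frankfurt"},
]

def incident_counts(incidents, attribute):
    """Count incidents per value of one attribute (manufacturer, site, ...)."""
    return Counter(rec[attribute] for rec in incidents)

by_vendor = incident_counts(incidents, "manufacturer")
by_site = incident_counts(incidents, "site")
```

Real analysis tools add normalization (e.g. incidents per installed device) and cross-tabulation across several attributes at once, but each "click" ultimately triggers a grouping of this kind.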
Results will improve quality
This big data approach is scheduled for implementation in Delivery and will go into production before the end of 2014. The next step will be to include order management data and unstructured information from relevant log files in the analysis.
The big data early warning system
This method will uncover hidden spatial, temporal and even logical patterns among incidents and events so that an early warning system can be established to prevent incidents from taking place. Correlations between incidents and the age of the hardware or the manufacturer can be identified as well. This information can then be used as the basis for investment decisions as to when hardware needs to be replaced or which manufacturer provides the most stable hardware and operating system combination.
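One simple way such an early warning system can surface a node before it fails outright is to compare its current event volume against its own historical baseline. A minimal sketch, assuming invented node names, weekly event counts and a two-sigma threshold (all illustrative, not the actual T-Systems method):

```python
from statistics import mean, pstdev

# Hypothetical weekly event counts per network node over recent weeks.
history = {
    "mux-muc-01": [3, 2, 4, 3, 2, 3],
    "rtr-fra-07": [1, 1, 0, 2, 1, 1],
}
this_week = {"mux-muc-01": 9, "rtr-fra-07": 1}

def early_warnings(history, current, sigmas=2.0):
    """Flag nodes whose current event count exceeds mean + sigmas * stddev."""
    warnings = []
    for node, counts in history.items():
        threshold = mean(counts) + sigmas * pstdev(counts)
        if current.get(node, 0) > threshold:
            warnings.append(node)
    return warnings

alerts = early_warnings(history, this_week)
# mux-muc-01 is flagged: 9 events this week far exceeds its historical baseline.
```

A production system would look for spatial and logical patterns as well as these temporal ones, but the principle of alerting on deviation from a learned baseline is the same.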
In the case of the line between Munich and New York, this approach would have indicated – in advance of the new release – that potential problems or channel fluctuations could be expected. And this would have enabled engineers to take steps in advance to prevent the incident from actually happening.
Let’s communicate big
Thanks go to Gerhard Keller, Head of Production Support Systems and Big Data Project Manager in the Quality Area