Risk prevention: Big data improves quality
“Zero outage” is the declared objective T-Systems pursues when delivering quality to its customers. One key component of our “quality journey,” particularly in terms of risk prevention, is our big data initiative.
The core of big data, and its added value, stems from the interplay of different data sources and the targeted evaluation of relevant data sets. An enterprise that relies on big data must realize that big data does not concern the IT department alone. It has a business impact and is thus part of the business strategy. The technology is applied to reach an overarching business objective, namely to generate competitive advantages. Based on secure and direct access, data is analyzed, in real time where possible, in compliance with data protection and data privacy regulations. Used as a supplement to conventional enterprise data warehouse (DWH) applications, big data lays the foundation for storing and analyzing enormous amounts of data.
What does big data have to do with quality?
To improve quality for our customers, we need to recognize negative trends very early and react quickly with the right decisions. Big data can help us move from a reactive approach to resolving incidents toward preventive risk management. What’s more, detailed analyses uncover weaknesses and repetitive events that can be remedied efficiently, sometimes even with automated mechanisms. That improves quality and reduces costs at the same time.
There are three basic approaches:
The main objective is to detect major incidents well in advance and prevent them from taking place. This involves knowing which events are related to major incidents. To determine the statistical correlations, we need to evaluate the events documented in our system monitoring processes together with configuration item (CI) data, incidents, changes and relevant log files. This information enables us to take appropriate countermeasures when such events occur, which reduces both the number of critical incidents and the mean time to repair (MTTR). With this approach, the number of major incidents at T-Systems has declined considerably.
Another way to prevent incidents and countless information tickets is to cluster the incidents and queries according to customer-specific and overarching parameters using big data. We then use traditional problem management processes to analyze and process the main issues revealed. The goal here is to improve quality by reducing the number of tickets. Fewer queries and tickets lighten the workload and reduce costs in the support organization.
A third objective is to increase data quality. This begins with the data in our asset and configuration management systems, which is the foundation of our support processes. The big data approach can be used to find incomplete or corrupted data automatically. The findings are then passed to the business departments, which take steps to correct the affected records.
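As an illustration of the third approach, the following minimal sketch flags configuration item records with missing or empty fields. The field names in `REQUIRED_FIELDS`, the record layout and the sample data are assumptions made for this example and do not describe the actual T-Systems systems.

```python
# Hypothetical set of fields every configuration item (CI) record must carry.
REQUIRED_FIELDS = ("ci_id", "hostname", "owner", "location")


def find_incomplete_records(records):
    """Return (ci_id, missing_fields) for each record with empty or absent required fields."""
    findings = []
    for record in records:
        # A field counts as missing if it is absent, None, or only whitespace.
        missing = [
            field for field in REQUIRED_FIELDS
            if not str(record.get(field) or "").strip()
        ]
        if missing:
            findings.append((record.get("ci_id"), missing))
    return findings


# Invented sample data for illustration only.
sample = [
    {"ci_id": "CI-1", "hostname": "app01", "owner": "team-a", "location": "DC1"},
    {"ci_id": "CI-2", "hostname": "", "owner": "team-b", "location": None},
    {"ci_id": "CI-3", "hostname": "db01", "location": "DC2"},
]

report = find_incomplete_records(sample)
for ci_id, missing in report:
    print(f"{ci_id}: missing {', '.join(missing)}")
```

A report like this could be handed to the responsible department as a worklist. Detecting corrupted values, as opposed to missing ones, would need additional, field-specific rules.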
Specific measures that increase quality
When taking this approach, the first step is to establish the technical and structural environment needed for big data. In other words, the production environment must be linked to all of the required data sources. It goes without saying that the employees in the big data and data mining departments must be fully trained and skilled at analyzing the data generated.
T-Systems has set up its own program of quality and efficiency measures designed to avoid tickets based on the analysis of historical data.
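One simple way to approach ticket avoidance from historical data is to count how often the same kind of ticket recurs, per customer and across customers. The sketch below assumes each ticket is a dictionary with hypothetical `customer` and `category` keys. It shows plain grouping and counting, not the clustering method actually used at T-Systems.

```python
from collections import Counter


def top_ticket_clusters(tickets, keys=("customer", "category"), n=3):
    """Group tickets by the given keys and return the n largest groups with their counts."""
    counts = Counter(
        tuple(ticket.get(key, "unknown") for key in keys) for ticket in tickets
    )
    return counts.most_common(n)


# Invented historical tickets for illustration only.
tickets = [
    {"customer": "A", "category": "password reset"},
    {"customer": "A", "category": "password reset"},
    {"customer": "B", "category": "disk full"},
    {"customer": "A", "category": "password reset"},
    {"customer": "B", "category": "disk full"},
    {"customer": "C", "category": "vpn"},
]

per_customer = top_ticket_clusters(tickets)                 # customer-specific view
overall = top_ticket_clusters(tickets, keys=("category",))  # overarching view
print(per_customer)
print(overall)
```

The largest groups are natural candidates for problem management, since removing their root cause avoids the most tickets.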
Hidden spatial, temporal or even logical patterns between incidents and events are uncovered and used as the basis for establishing an early warning system to prevent incidents. Correlations between incidents and the age of the hardware or the manufacturer can be identified as well. This information is helpful when making decisions related to hardware investments.
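A temporal pattern of this kind can be approximated by measuring how often a given event type is followed by an incident on the same system within a fixed time window. The sketch below is a minimal illustration under assumed data shapes: events as `(event_type, ci, timestamp)` tuples, incidents as `(ci, timestamp)` tuples, and a one-hour window. All names and values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def incident_follow_rate(events, incidents, window=timedelta(hours=1)):
    """For each event type, return the share of occurrences that were followed
    by an incident on the same CI within `window`."""
    incidents_by_ci = defaultdict(list)
    for ci, timestamp in incidents:
        incidents_by_ci[ci].append(timestamp)

    totals = defaultdict(int)
    followed = defaultdict(int)
    for event_type, ci, timestamp in events:
        totals[event_type] += 1
        # Count the event if any incident on this CI falls inside the window.
        if any(timestamp <= inc <= timestamp + window for inc in incidents_by_ci[ci]):
            followed[event_type] += 1
    return {event_type: followed[event_type] / totals[event_type] for event_type in totals}


# Invented monitoring data for illustration only.
t0 = datetime(2024, 1, 1, 12, 0)
events = [
    ("disk_latency_high", "srv1", t0),
    ("disk_latency_high", "srv2", t0),
    ("disk_latency_high", "srv3", t0 + timedelta(hours=2)),
    ("login_failed", "srv4", t0),
]
incidents = [
    ("srv1", t0 + timedelta(minutes=30)),
    ("srv2", t0 + timedelta(minutes=45)),
    ("srv3", t0 + timedelta(hours=5)),
]

rates = incident_follow_rate(events, incidents)
for event_type, rate in rates.items():
    print(f"{event_type}: {rate:.0%} followed by an incident within 1h")
```

Event types with a high follow rate are candidates for early warning rules. A rate like this is only a co-occurrence measure, so it should be compared against the baseline incident rate before being treated as a statistical correlation.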
With experience gained from pilot projects, T-Systems has also designed structured approach models known as playbooks that can be applied to similar or recurring customer situations.
This method prevents incidents from occurring in the first place. Additional benefits are a faster, more efficient support unit and lower costs.