Optimize Support Ticket Routing from 4 hours to 500 ms
Hey, my name is Christian Bernecker and I work for IBM. In my early days I worked as a Technical L2 Support Agent. Back then I saw that triaging tickets, bugs and incident reports is a big problem that is still handled by humans. My first thought was that this is a really inefficient way of working: highly skilled engineers look over each new incoming ticket and try to find the right person to solve the problem.
I'd like to take you with me on my journey of how I implemented an automated bug triaging system for IBM. This system is currently in worldwide use within IBM (16,000 agents) and helps IBM solve new incoming tickets 24% faster than before. (Ref: Link )
Let’s start— The journey of a Data Scientist
My first recommendation is to follow these 3 simple and important rules before you start:
- Understand and analyze the problem
- Evaluate different solutions
- Implement the solution
1. Understand and analyze the problem!
Software products can be divided into different components or knowledge areas. People who have already worked with IBM software products know how complex the structure is. For that reason IBM divides its products into knowledge areas; this is how IBM can ensure and deliver high-quality support all over the world.
Here is an example of how a product can be divided:
That means each product has different components and needs teams of experts to solve a specific problem. The challenge is to route the problem to the best person or team as soon as possible so that the customer gets a quick solution.
2. Evaluate different solutions
2.1. What about a human approach to the problem?
Human Solutions (two possible thoughts):
- The customer chooses the right component when opening a new ticket.
- An engineer looks over each new ticket and chooses the right component (manual routing).
Concerns and Problems:
Customers are often not technical people and have trouble choosing the correct component. In addition, they don't know IBM's internal structure and naming, which makes it much more difficult to pick the right category for their problem.
Compelling an engineer to review and triage new tickets is the other option; the role is often called "Qmon", "Queue Monitor" or "Quarterback". In my opinion this is the best way to waste the time of highly skilled engineers and a guarantee to end in frustration and dissatisfaction. ;-)
Additionally, each misclassified case has to be rerouted again. So you lose money and time, and you waste your human resources. On the other side you frustrate your customers because they don't get a good and quick solution.
“The success of the support strongly depends on the quality and speed of a response.”
2.2. What about keywords?
Another simple approach that came to my mind was to create a list of keywords and use it to triage new incoming cases.
First, I asked different teams whether they could provide me with a unique list of keywords for each component in a product. Kindly, they provided me with these lists, so I started to implement a first, simple keyword approach.
Concerns and Problems
- The teams need a lot of time to discover and gather unique keywords
- Keywords change frequently
Conclusion
First of all, I realized that a simple keyword method isn't enough to match the cases, because terms appear with different suffixes, like burst, bursted, bursting. The second problem is typos and acronyms. Finally, it happens that keywords of different components occur in a single problem description.
To solve the problem with word endings and typos I built a fuzzy matching keyword algorithm. If you want to know more about it, check out this article: https://link.medium.com/WPyT8OEaI1
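To give you an idea, here is a minimal sketch of such a fuzzy match using Python's standard difflib; the component keyword lists and the similarity cutoff are made-up examples, not the exact algorithm from the linked article:

```python
from difflib import get_close_matches

# Illustrative keyword lists per component (invented examples, not IBM's real lists)
COMPONENT_KEYWORDS = {
    "printing": ["burst", "spool", "printer"],
    "security": ["login", "certificate", "ldap"],
}

def fuzzy_route(description, cutoff=0.75):
    """Count fuzzy keyword hits per component and return the best-scoring component."""
    tokens = description.lower().split()
    scores = {component: 0 for component in COMPONENT_KEYWORDS}
    for component, keywords in COMPONENT_KEYWORDS.items():
        for token in tokens:
            # get_close_matches tolerates suffixes ("bursted") and small typos ("prnter")
            if get_close_matches(token, keywords, n=1, cutoff=cutoff):
                scores[component] += 1
    return max(scores, key=scores.get)

print(fuzzy_route("Job spooling failed, pages were bursted incorrectly"))
```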
The acronyms are something I tried to catch with the help of the component experts. In general, gathering keywords with humans is a very time-consuming task, and I realized that this would not work for the next 1000 software products.
So I started to implement an algorithm that extracts unique keywords from the components. There are many articles on this topic; for example, check: https://medium.com/analytics-vidhya/automated-keyword-extraction-from-articles-using-nlp-bfd864f41b34
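As a rough illustration (not the exact approach from the linked article), you can extract candidate keywords by keeping the terms that occur in one component's tickets but in none of the others; the example tickets below are invented:

```python
from collections import Counter

# Hypothetical historical tickets grouped by component
TICKETS = {
    "printing": ["printer spool job failed", "pages bursted in wrong order"],
    "security": ["ldap login rejected", "certificate expired on login"],
}

def extract_keywords(tickets_by_component, top_n=5):
    """Return terms that are frequent in one component but absent from all others."""
    counts = {c: Counter(" ".join(docs).lower().split())
              for c, docs in tickets_by_component.items()}
    keywords = {}
    for component, counter in counts.items():
        other_terms = set()
        for other, other_counter in counts.items():
            if other != component:
                other_terms |= set(other_counter)
        unique = {term: n for term, n in counter.items() if term not in other_terms}
        keywords[component] = [term for term, _ in Counter(unique).most_common(top_n)]
    return keywords

print(extract_keywords(TICKETS))
```

In practice you would also remove stop words and run this over far more tickets than shown here.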
But automated keyword extraction doesn't reach the accuracy and reliability of a human selection.
2.3. Can IBM WATSON solve the problem?
The next step of every IBM'er who tries to solve such a problem is to look at our AI technology WATSON. After some investigation I came across the following two services that looked like they could help me solve the problem:
Natural Language Understanding (NLU) is IBM's answer to the NLTK toolkit. The smart thing is that you don't have to set up your own environment; you can use the service easily via REST API calls.
Natural Language Classifier (NLC) is a WATSON service that helps you classify text. Simply put, it is a text classification service. The cool thing is that you don't need any knowledge of how to do text classification. It provides a simple UI where you can upload your texts and labels. Then you click train and the service will find the best model for you. Afterwards you can call the model via REST API.
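Calling a trained classifier then looks roughly like this; a sketch using the ibm-watson Python SDK instead of raw REST calls, where the API key, service URL and classifier ID are placeholders:

```python
from ibm_watson import NaturalLanguageClassifierV1
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Placeholders: use your own API key, service URL and trained classifier ID
authenticator = IAMAuthenticator("YOUR_API_KEY")
nlc = NaturalLanguageClassifierV1(authenticator=authenticator)
nlc.set_service_url("https://api.us-south.natural-language-classifier.watson.cloud.ibm.com")

# Ask the trained model for the most likely component of a new ticket
response = nlc.classify(
    classifier_id="YOUR_CLASSIFIER_ID",
    text="Printer spooler crashes when bursting large jobs",
).get_result()

print(response["top_class"], response["classes"][0]["confidence"])
```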
Concerns and Problems
During my investigation I understood that traditional natural language processing wasn't the right choice for what I was looking for, because I didn't want to measure moods or sentiments or extract keywords from a customer description. My task was to classify new problems into existing classes; in our case the classes are our software components (see Picture 1). So my decision went to NLC.
I tried to upload all the data to the NLC service, but I ran into the limitation that the length of a text is capped at 1000 characters. I guess for most applications this length is fine, but in my case we have complex problem descriptions that are much longer. I decided to split the descriptions into chunks of 1000 characters. Unfortunately, the result was not very accurate. I assume splitting the descriptions caused that problem, because you have to consider the description as a whole.
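The chunking itself was trivial; something along these lines, with the 1000-character limit of the service:

```python
def chunk_description(text, max_len=1000):
    """Split a long problem description into pieces that fit the 1000-character limit."""
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]

chunks = chunk_description("very long customer problem description ... " * 100)
print(len(chunks), "chunks, longest:", max(len(c) for c in chunks))
```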
The conclusion was to have a look at machine learning and AI, because all the other out-of-the-box solutions had failed.
3. Machine Learning and AI
After my illusion of simply feeding WATSON and getting a valuable result had burst, I decided to look into machine learning and deep learning approaches. The start is quite easy if you follow the tutorials of TensorFlow and scikit-learn. My personal recommendation is to use Keras on top of TensorFlow to get started faster.
3.1. Preprocessing is the key to success.
What does preprocessing mean in our context? We have to understand that a customer opens a ticket against IBM in human-written language (we call that the technical customer voice). That means we have to deal with non-English cases, stack traces, typos, acronyms and old legacy templates.
Believe me when I say that preprocessing is the key to success, especially when you have to work with unstructured human text from all over the world.
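To make this concrete, here is a minimal sketch of the kind of cleaning I mean; which template phrases and stack-trace patterns you strip depends on your own tickets, so the rules below are only illustrative assumptions:

```python
import re

# Hypothetical legacy template phrases that add no signal
BOILERPLATE = [
    "please describe your problem",
    "steps to reproduce",
]

def clean_ticket(text):
    """Normalize a raw ticket description before vectorization."""
    text = text.lower()
    # drop stack-trace style lines such as "at com.ibm.Foo.bar(Foo.java:42)"
    text = re.sub(r"^\s*at\s+\S+\(.*\)\s*$", " ", text, flags=re.MULTILINE)
    # remove template boilerplate
    for phrase in BOILERPLATE:
        text = text.replace(phrase, " ")
    # keep only letters and digits, collapse whitespace
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    return re.sub(r"\s+", " ", text).strip()

print(clean_ticket("Steps to reproduce: Job FAILS!\n  at com.ibm.print.Spooler.burst(Spooler.java:42)"))
```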
Let's continue. We all know that a machine can only understand 1s and 0s. So the problem with human language is that we have to transform it in a way the machine can understand. There are two common ways:
- Bag-of-words: TF/IDF
- Sequence vectors: word embeddings
TF/IDF: Counts the frequency of terms in a single document and compares it with the overall term frequency across all other documents. That gives you a weight for each term and an idea of how important a certain term is. But it doesn't consider the term order.
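With scikit-learn, turning cleaned descriptions into TF/IDF features takes only a few lines (the example texts are made up):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

descriptions = [
    "printer spool job failed while bursting pages",
    "ldap login rejected because the certificate expired",
]

vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=1, sublinear_tf=True)
features = vectorizer.fit_transform(descriptions)   # sparse matrix: documents x terms
print(features.shape)
```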
Sequence vectors: Try to tackle the syntactic problem. They consider the order and the relationship of the terms. This works pretty well for big corpora, as Google demonstrated with Word2vec, but it fails for a small corpus.
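For the sequence-vector route, the Keras utilities handle tokenization and padding; a sketch with an arbitrary vocabulary size and sequence length:

```python
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

descriptions = [
    "printer spool job failed while bursting pages",
    "ldap login rejected because the certificate expired",
]

tokenizer = Tokenizer(num_words=20000)          # keep the 20k most frequent terms
tokenizer.fit_on_texts(descriptions)
sequences = tokenizer.texts_to_sequences(descriptions)
padded = pad_sequences(sequences, maxlen=200)   # fixed-length input for an embedding layer
print(padded.shape)
```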
3.2. Choose the right Machine Learning Model.
The next step was to find a good machine or deep learning approach to solve the problem. At the beginning I started to look into the traditional machine learning algorithms:
- Logistic Regression
- Naive Bayes
- k-nearest Neighbor
- Random Forest
- Support Vector Machine
- Stochastic Gradient Descent
- Multilayer Perceptron
I don't want to spend much time explaining each of these, because there are many good articles on these topics. I only want to share some experiences that may help you build your own approach.
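A simple way to get a first feeling for these models is to run a few of them over the same TF/IDF features with cross-validation; a sketch on invented data (in practice you would use a labelled sub-sample of your tickets):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Invented mini dataset standing in for ticket descriptions and component labels
texts = ["printer spool failed", "pages bursted wrong", "ldap login rejected", "certificate expired"] * 10
labels = ["printing", "printing", "security", "security"] * 10

candidates = {
    "LogisticRegression": LogisticRegression(max_iter=1000),
    "NaiveBayes": MultinomialNB(),
    "LinearSVC": LinearSVC(),
    "SGD": SGDClassifier(),
}

for name, model in candidates.items():
    pipeline = make_pipeline(TfidfVectorizer(), model)
    scores = cross_val_score(pipeline, texts, labels, cv=3)
    print(f"{name}: {scores.mean():.2f}")
```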
The problem with Random Forest is that it is very sensitive to an unbalanced distribution among the classes. I tried to fix this with over- and undersampling methods, but the result stayed the same; I assume the variety of features was not high enough after sampling.
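The over- and undersampling I tried can be reproduced with the imbalanced-learn package; a sketch on dummy data that stands in for the vectorized tickets:

```python
import numpy as np
from imblearn.over_sampling import RandomOverSampler
from sklearn.ensemble import RandomForestClassifier

# Tiny stand-in for vectorized tickets: 90 samples of class 0 vs. 10 of class 1
X = np.random.rand(100, 20)
y = np.array([0] * 90 + [1] * 10)

# duplicate minority-class samples until the classes are balanced
X_balanced, y_balanced = RandomOverSampler(random_state=42).fit_resample(X, y)

forest = RandomForestClassifier(n_estimators=200, class_weight="balanced", random_state=42)
forest.fit(X_balanced, y_balanced)
print(dict(zip(*np.unique(y_balanced, return_counts=True))))
```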
For k-nearest Neighbors I used a weighted distance to deal with the unbalanced dataset. But the problem is that this approach takes very long for a single prediction because of the high data volume and the dimensionality of the data.
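In scikit-learn the distance weighting is a single parameter; a sketch on dummy data (the real blocker was that prediction time grows with the size of the training set):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.random.rand(1000, 50)                 # stand-in for vectorized tickets
y = np.random.randint(0, 5, size=1000)       # stand-in for component labels

# closer neighbours count more, which softens the class imbalance a bit
knn = KNeighborsClassifier(n_neighbors=15, weights="distance")
knn.fit(X, y)
print(knn.predict(X[:1]))
```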
I decided to go ahead with the Support Vector Machine (SVM) and the Stochastic Gradient Descent (SGD) models. Both models scored an equal 72 percent. I then ran a parameter search to tweak the models: for SVM I got 82 percent and for SGD 78 percent. You have to know that running such a search takes a lot of time, depending on which parameters you want to test. But the SGD search can run in parallel, which is a big benefit compared to SVM, which is single-threaded. So be aware that SVM can run for a very long time if you test too many parameters; be careful and try to determine only the important ones.
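The parameter test was an ordinary grid search; roughly like this, with invented data and a deliberately small, illustrative grid (setting `n_jobs=-1` lets the search candidates run on all cores):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

texts = ["printer spool failed", "pages bursted wrong", "ldap login rejected", "certificate expired"] * 10
labels = ["printing", "printing", "security", "security"] * 10

svm_search = GridSearchCV(
    Pipeline([("tfidf", TfidfVectorizer()), ("clf", LinearSVC())]),
    {"clf__C": [0.1, 1, 10]},
    cv=3,
    n_jobs=-1,
)
sgd_search = GridSearchCV(
    Pipeline([("tfidf", TfidfVectorizer()), ("clf", SGDClassifier(loss="modified_huber"))]),
    {"clf__alpha": [1e-4, 1e-3], "clf__penalty": ["l2", "elasticnet"]},
    cv=3,
    n_jobs=-1,
)
svm_search.fit(texts, labels)
sgd_search.fit(texts, labels)
print("SVM:", svm_search.best_score_, "SGD:", sgd_search.best_score_)
```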
4. The Secret of Success!
The first secret is to take your time to prepare the data. Software products have a very short life cycle: components are deprecated and new components are added, and all of this causes a lot of noise within your data. The trick is to use a defined time frame per product family so that you automatically exclude legacy components. In my project I chose a time frame of 3 years, which means data older than 3 years is removed from the dataset.
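With the tickets in a pandas DataFrame the cut-off is a one-liner; a sketch where the column names are assumptions:

```python
import pandas as pd

# Hypothetical ticket export with an opening timestamp and a component label
tickets = pd.DataFrame({
    "opened": pd.to_datetime(["2016-05-01", "2019-03-12", "2020-01-20"]),
    "component": ["legacy-print", "security", "printing"],
})

# keep only the last 3 years, measured from the newest ticket in the export
cutoff = tickets["opened"].max() - pd.DateOffset(years=3)
recent = tickets[tickets["opened"] >= cutoff]
print(recent["component"].tolist())   # the legacy component drops out automatically
```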
The second key to success was to build an ensemble of both classifiers, because with that it was possible to boost the accuracy to 92%, which is extremely high for that variety of data. To decide which engine to trust, I first compared the two answers: when they were equal I used the result; when they were unequal I used the one with the higher confidence. I think this point has the best potential to increase that number even further.
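The decision rule itself is simple. Here is a sketch of the agree-or-highest-confidence logic on invented data; I use a calibrated LinearSVC and an SGD classifier as stand-ins for the two engines:

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

texts = ["printer spool failed", "pages bursted wrong", "ldap login rejected", "certificate expired"] * 10
labels = ["printing", "printing", "security", "security"] * 10

# LinearSVC has no native probabilities, so wrap it in a calibrator to get confidences
svm = make_pipeline(TfidfVectorizer(), CalibratedClassifierCV(LinearSVC()))
sgd = make_pipeline(TfidfVectorizer(), SGDClassifier(loss="modified_huber"))
svm.fit(texts, labels)
sgd.fit(texts, labels)

def route_ticket(description):
    """Agree -> take the shared answer; disagree -> the more confident model wins."""
    svm_proba = svm.predict_proba([description])[0]
    sgd_proba = sgd.predict_proba([description])[0]
    svm_class = svm.classes_[np.argmax(svm_proba)]
    sgd_class = sgd.classes_[np.argmax(sgd_proba)]
    if svm_class == sgd_class:
        return svm_class
    return svm_class if svm_proba.max() >= sgd_proba.max() else sgd_class

print(route_ticket("spool job failed while bursting"))
```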
The third and last important point is to build a feedback loop that shows how well the algorithms are performing. On the platform we use for support, we track whether an engineer sends the case to a different component. That gives us a regular measurement of the models and additionally allows us to retrain them with fresh data.
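Measuring the models from that signal can be as simple as counting reroutes; a sketch where the log format is an assumption:

```python
import pandas as pd

# Hypothetical routing log: what the model predicted vs. where the case finally ended up
log = pd.DataFrame({
    "predicted_component": ["printing", "security", "printing", "database"],
    "final_component":     ["printing", "security", "security", "database"],
})

log["rerouted"] = log["predicted_component"] != log["final_component"]
print("routing accuracy:", 1 - log["rerouted"].mean())

# misrouted cases with their final component become fresh training data for the next retrain
retrain_candidates = log[log["rerouted"]]
```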
5. Finally
The following picture illustrates the big-picture architecture of a routing system:
It shows the opening and triaging process in steps 1 and 2, which were explained in the article above. Steps 3 and 4 illustrate the feedback loop and the automatic retraining process, which I covered only briefly in section 4. If you are interested in a detailed view of how to design a good feedback and retraining process, let me know in the comments.
6. Take Away:
In this chapter I want to summarize the insights of this article. For any data science project I highly recommend:
- Get familiar with your data.
- Clean up your data.
- Pre-process your data.
- Try different models (deep learning & traditional machine learning models).
- Run sub-samples of your data to save time when determining the right parameters for your model.
- Combine different models in an ensemble to bypass the weaknesses of the individual models.
- Build a continuous feedback loop.
- Create an automatic retrain system.
- Monitor key metrics.