Improving Rasa's results by 30% with artificial training data: Part II

Increasing bot accuracy has never been so easy. How? By generating artificial training data: not manually, but through auto-generated query variations. We have benchmarked Rasa and other platforms, and their accuracy reaches up to 93% thanks to Bitext's artificial training data technology.

One of the main problems with chatbots is their need for large amounts of training data, as discussed in the post Improving Rasa’s results: Part I.

As mentioned there, a chatbot can recognize a specific intent only if its training data includes a large number of sentences for that intent. Until now, these sentences have been written manually, a process that is inefficient and time-consuming.

To solve this problem, Bitext offers a technology that creates artificial training data: from a single query, it automatically generates many different variants with the same meaning as the original, automating the creation of a bot's training set.
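To make the idea of expanding one query into many variants concrete, here is a minimal sketch in Python. It is not Bitext's method, which this post does not describe; it only assumes that variants can be produced by combining interchangeable phrase fragments. The fragment lists and the function name `generate_variants` are hypothetical.

```python
from itertools import product
from typing import List

# Hypothetical, hand-written alternatives used only for illustration.
# A real generation system would rely on much richer linguistic resources.
POLITENESS = ["", "please ", "can you "]
ACTION_VARIANTS = ["switch on", "turn on", "put on"]
PLACE_PATTERNS = ["in the {place}", "inside the {place}"]


def generate_variants(obj: str, place: str) -> List[str]:
    """Expand one seed query into several variants with the same meaning."""
    variants = []
    # Every combination of prefix, action wording, and place wording
    # yields one variant of the seed query.
    for prefix, action, pattern in product(POLITENESS, ACTION_VARIANTS, PLACE_PATTERNS):
        variants.append(f"{prefix}{action} the {obj} {pattern.format(place=place)}")
    return variants


variants = generate_variants("lights", "living room")
print(len(variants))  # 3 prefixes x 3 actions x 2 place patterns = 18
print(variants[0])    # switch on the lights in the living room
```

Even three short lists multiply into 18 sentences, which is why a generated training set can be so much larger than a hand-written one.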

One of our aims was to prove that a well-known chatbot platform such as Rasa could benefit from this approach. We did so by comparing a bot trained with hand-tagged sentences against one trained with sentences generated by our technology (referred to there as NLG).

Our tests show that a bot trained with automatically generated sentences improves considerably, reaching 93% accuracy. By contrast, a bot trained with just 1 or 2 sentences per intent performs terribly (3% accuracy).

What’s more, even training with 10 sentences per intent yields only mediocre results (68% accuracy). To reach high accuracy, it is necessary to generate thousands of sentence variations for each intent (automatically, with our artificial training data technology, for instance).

Let’s take a closer look at the test

We ran two tests (A and B). Both use five intents related to controlling the lights in a house, and both use the same five types of slots (action, object, place, percentage, and hour):

Switch on the lights (switch on the lights in the living room)
Switch off the lights (switch off the lights in the living room)
Change the color of the lights (change the lights to blue)
Dim the lights (dim the living room lights to 20%)
Program lights for a specific hour (program the garden lights for 21:00)
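As an illustration of what a hand-tagged sentence with slots might look like, the sketch below marks up one of the example sentences above using the bracket-and-parenthesis style, `[value](slot)`, that Rasa's training data formats use for entity annotation. The post does not show its actual training files, so the helper `annotate` and the exact slot assignments are assumptions made for illustration.

```python
from typing import Dict


def annotate(sentence: str, slots: Dict[str, str]) -> str:
    """Wrap the first occurrence of each slot value in [value](slot_name) markup.

    Simplification: this assumes no slot value is a substring of another
    value or of a slot name, which holds for the example below.
    """
    annotated = sentence
    for name, value in slots.items():
        annotated = annotated.replace(value, f"[{value}]({name})", 1)
    return annotated


# Slot names come from the list in this post; the values are read off
# the "dim the lights" example sentence.
tagged = annotate(
    "dim the living room lights to 20%",
    {"action": "dim", "place": "living room", "object": "lights", "percentage": "20%"},
)
print(tagged)
# [dim](action) the [living room](place) [lights](object) to [20%](percentage)
```

Each tagged sentence tells the bot both which intent it belongs to and which words fill which slots, which is what makes hand-tagging slow.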

First test: just a few sentences per intent

In the first test (A), we trained two bots. The first model (A1) was trained with only 12 hand-tagged sentences, while the second (A2) was trained with a set of 455 sentences, auto-generated as variants of the A1 sentences using the Bitext artificial training data system.

We used the same 114 independent sentences to evaluate both models and obtained the following results for both intent identification and slot filling:

[Figure: artificial training data results]
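For readers who want to run a similar comparison, here is a minimal sketch of how intent identification and slot filling could be scored on a held-out evaluation set. The post does not state how its scores were computed, so this assumes the simplest definitions: the share of sentences with the correct intent, and the share of sentences whose slots all match exactly. The function `evaluate` and the four toy examples are invented for illustration and are not the data or results from these tests.

```python
from typing import Dict, List, Tuple

# One evaluation item: (intent label, {slot name: slot value}).
Example = Tuple[str, Dict[str, str]]


def evaluate(gold: List[Example], predicted: List[Example]) -> Tuple[float, float]:
    """Return (intent accuracy, exact-match slot accuracy) over paired examples."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and the same length")
    intent_hits = sum(g[0] == p[0] for g, p in zip(gold, predicted))
    slot_hits = sum(g[1] == p[1] for g, p in zip(gold, predicted))
    return intent_hits / len(gold), slot_hits / len(gold)


# Toy data, invented for this sketch.
gold = [
    ("switch_on", {"place": "living room"}),
    ("switch_off", {"place": "kitchen"}),
    ("dim", {"percentage": "20%"}),
    ("program", {"hour": "21:00"}),
]
predicted = [
    ("switch_on", {"place": "living room"}),  # intent and slots correct
    ("switch_on", {"place": "kitchen"}),      # wrong intent, correct slots
    ("dim", {"percentage": "20"}),            # correct intent, wrong slot value
    ("program", {}),                          # correct intent, missing slot
]

intent_acc, slot_acc = evaluate(gold, predicted)
print(intent_acc, slot_acc)  # 0.75 0.5
```

Scoring both models on the same independent sentences, as done here with 114 of them, is what makes the two accuracy figures comparable.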

Second test: up to hundreds of sentences

In the second test (B), only the number of sentences in the training and evaluation sets changed. This time, the first bot (B1) was trained with a hand-tagged training set of 50 sentences (10 per intent).

The second one (B2) was trained with 906 variants generated by the Bitext system. We used the same 226 independent sentences to evaluate both models. Now let’s take a look at the results below:

To sum up, the Bitext artificial training data system lets you create huge training sets with ease. If you only want to write one or two sentences per intent, our system can generate the remaining variants needed to go from inaccurate, unreliable results to high precision.

Even if you write dozens of variants per intent yourself, our system will still increase your accuracy substantially.

Although we ran these tests only with Rasa, our conclusions are relevant for other ML-based bot platforms, such as Microsoft LUIS, Amazon Lex, Wit.ai, or Google Dialogflow. You can check our experiment in detail by clicking here.

Now let’s get down to business and see how your platform can benefit from this technology. Try our test and see for yourself how Bitext can improve your training corpora.
