Increasing bot accuracy has never been so easy. How? By generating artificial training data, not manually, but with auto-generated query variations. We benchmarked Rasa and other platforms, and their accuracy reaches up to 93% thanks to Bitext artificial training data technology.
One of the main problems with chatbots is their need for large amounts of training data, as discussed in the post "Improving Rasa’s results. Part I".
As mentioned there, a chatbot can recognize a specific intent only if its training set includes a large number of sentences expressing that intent. Until now, these sentences have been written by hand, a process that is inefficient and time-consuming.
To solve this problem, Bitext offers a technology that creates artificial training data: from a single query, it automatically generates many different variants with the same meaning as the original, which automates the creation of a bot training set.
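To make the idea of expanding one query into many variants concrete, here is a minimal Python sketch. It is not Bitext's method: it only combines a few hand-written alternatives, and the lists, the `generate_variants` helper, and the `lights_on` intent name are all hypothetical.

```python
from itertools import product

# Hypothetical hand-written alternatives for the seed query
# "turn on the lights in the kitchen". Illustrative only, not Bitext data.
PREFIXES = ["", "please ", "can you ", "I need you to "]
VERBS = ["turn on", "switch on"]
OBJECTS = ["the lights", "the light"]


def generate_variants(place, intent):
    """Combine every prefix, verb and object into one labeled example each."""
    examples = []
    for prefix, verb, obj in product(PREFIXES, VERBS, OBJECTS):
        examples.append({
            "text": f"{prefix}{verb} {obj} in {place}",
            "intent": intent,  # hypothetical intent label
            "slots": {"action": verb, "object": obj, "place": place},
        })
    return examples


variants = generate_variants("the kitchen", "lights_on")
print(len(variants))        # 4 * 2 * 2 = 16
print(variants[0]["text"])  # turn on the lights in the kitchen
```

Every variant keeps the intent and slot annotations of the seed, so the generated sentences arrive already tagged. A real generator would need far richer linguistic variation than this combination of fixed lists.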
One of our aims was to prove that a well-known chatbot platform such as Rasa could benefit from this approach. We did that by comparing a bot trained with hand-tagged sentences against another one trained with sentences generated by our technology (referred to there as NLG).
Our tests show that a bot trained with automatically generated sentences improves substantially, reaching 93% accuracy. However, if you just add 1 or 2 sentences per intent, the results are terrible (3% accuracy).
What’s more, even if you train it with 10 sentences per intent, the outcome is only mediocre (68% accuracy). To get high accuracy, it is necessary to generate thousands of sentence variations for each intent (automatically, with our artificial training data technology, for instance).
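As a point of reference for accuracy figures like these, the sketch below assumes the usual definition: the share of evaluation sentences whose predicted intent matches the hand-assigned one. The post does not spell out its scoring, so treat this as an assumption. The intent names and the toy data are hypothetical.

```python
from collections import defaultdict


def intent_accuracy(gold_intents, predicted_intents):
    """Fraction of evaluation sentences whose predicted intent matches the gold one."""
    if len(gold_intents) != len(predicted_intents):
        raise ValueError("gold and predicted lists must have the same length")
    if not gold_intents:
        return 0.0
    correct = sum(g == p for g, p in zip(gold_intents, predicted_intents))
    return correct / len(gold_intents)


def per_intent_accuracy(gold_intents, predicted_intents):
    """Same measure, broken down by gold intent."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for g, p in zip(gold_intents, predicted_intents):
        totals[g] += 1
        hits[g] += int(g == p)
    return {intent: hits[intent] / totals[intent] for intent in totals}


# Toy evaluation set with hypothetical intent names.
gold = ["lights_on", "lights_on", "lights_off", "dim_lights"]
predicted = ["lights_on", "lights_off", "lights_off", "dim_lights"]

print(intent_accuracy(gold, predicted))      # 0.75
print(per_intent_accuracy(gold, predicted))  # lights_on: 0.5, the others: 1.0
```

The per-intent breakdown is useful when one intent has too few training sentences, since a single overall number can hide it.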
We ran two different tests (A and B). Both use five different intents related to managing the lights in a house, and both use the same five types of slots (action, object, place, percentage, and hour).
In the first test (A), we trained two different bots. The first model (A1) was trained with only 12 hand-tagged sentences, while the second one (A2) was trained with a set of 455 sentences. These sentences were auto-generated variants of the A1 sentences, produced with the Bitext artificial training data system.
We used the same 114 independent sentences to evaluate both models and obtained the following results for both intent identification and slot filling:
In the second test (B), only the number of sentences in the training and evaluation sets was different. In this case, the first bot (B1) was trained with a hand-tagged training set of 50 sentences (10 per intent).
The second one (B2) was trained with 906 variants generated by the Bitext system. We used the same 226 independent sentences to evaluate both models. Now let’s take a look at the results below:
To sum up, the Bitext artificial training data system allows you to create huge training sets with ease. If you just want to write one or two sentences per intent, our system can generate the rest of the variants needed to go from inaccurate and unreliable results to great precision.
Even if you want to write dozens of variants per intent, our system will also increase your accuracy in an impressive way, reaching excellent results.
While we ran these tests only with Rasa, our conclusions are relevant for other ML-based bot platforms, such as Microsoft LUIS, Amazon Lex, Wit.ai or Google Dialogflow. You can check our experiment in detail by clicking here.
Now let’s get down to business and check how your platform can benefit from this brand-new technology. Try our test and see for yourself how your training corpora can improve with the help of Bitext.