slide image

Getting It Right From the Start: Chatbot Implementation

By Martyn Redstone, CEO

Recently, we've seen a couple of examples of bad chatbot implementations hitting the news - whether social or mainstream - and it does frustrate me.

Most recently, it was [UK delivery service] DPD's turn, where Twitter user Ashley Beauchamp was able to prompt the bot to produce a poem about how terrible the company was and also was able to get it to swear at him. This made mainstream news over the weekend.

Immediately, the recruitment world jumps up and down.


Well, I'm sorry to say that this isn't representative of chatbots, but just of a poor implementation of a chatbot.

Simply, it looks like DPD implemented a GPT-powered chatbot. They launched a bot-in-a-box style of chatbot, which tends to be cheap and easy (you'll see them on sites like etc.). Just point them to a knowledge store - either file uploads or URL addresses - and hey presto, you have a chatbot that's able to talk about your data.

But here's the problem. This style of chatbot is still susceptible to the same problems that a large language model is - prompt injections, hallucinations etc. And that's what has happened here. A chatbot with no guardrails was prompted to do what the user asked of it.

In short, there's no shortcut to implementing a chatbot. It's not a quick task that your marketing or IT intern should just get on with. It's a project - just like building a website or app. Except the interface is conversational.

Since the weekend, I've been contacted by several people who asked "how do we implement a chatbot using generative AI/large language models that work well"?

There are a few answers. But I am a big fan of the hybrid NLU/LLM model. Implementing this reduces the "I'm sorry but I don't understand" style of response you tend to see in chatbots because we use LLMs to help classify what the user is trying to say.

The harmony of using both an NLU (Natural Language Understanding) engine and an LLM (Large Language Model) classifier together in a chatbot system can significantly enhance the user experience in several ways:

Complementary Strengths: NLU engines are often rule-based or employ machine learning techniques specifically tuned for understanding user intents and extracting entities from the user's input. They excel at handling structured queries and common interaction patterns. LLM classifiers, on the other hand, can handle more complex, nuanced, and conversational language due to their extensive training on diverse datasets. Combining the two can provide a robust solution that handles a wide variety of user inputs effectively.

Improved Understanding: The NLU can quickly decipher common user requests and commands with high accuracy. For inputs that are less structured or more conversational, the LLM can step in to interpret the intent and context, filling in the gaps where the NLU might struggle.

Efficiency and Scalability: By using an NLU engine for the more straightforward tasks, the system can conserve computational resources since rule-based or simple machine learning models are generally less resource-intensive than LLMs. The LLM can then be reserved for more complex queries, optimising the system's efficiency and scalability. As well as reducing the cost of using an LLM from a token perspective.

Error Handling and Recovery: When the NLU engine fails to understand a query, the LLM classifier can serve as a second line of defense. It can provide a different approach to understanding and even assist in dialog repair strategies if the initial intent is not clear, thus enhancing the chatbot's ability to recover from misunderstandings without frustrating the user.

Continuous Learning and Improvement: LLM classifiers can continue to learn from interactions and can be fine-tuned based on the specific domain and data they encounter. This can feed back into the NLU to refine its rules and models, creating a system that improves over time and adapts to its users.

User-Centric Interaction: With both systems in place, the chatbot can handle a range of interactions from simple FAQ-type questions to more complex discussions. This flexibility ensures that the user feels understood and that the chatbot can provide meaningful and contextually appropriate responses.

Reduced Time to Resolution/Outcome: By effectively handling a wider range of queries, the combined system can reduce the time it takes for a user to get a resolution/outcome.

Human Escalation When Necessary: If both systems are unable to understand or accurately respond to the user's input, the process includes a mechanism to escalate the query to a human operator. This ensures that the user is not left with an unresolved issue, maintaining trust in the service.

Enhanced Security and Reduced Vulnerability: When an NLU engine is paired with an LLM classifier, the system benefits from the structured, rule-based security measures inherent in traditional NLU systems. These systems can be designed with specific guardrails that prevent the execution of unauthorised commands or the sharing of sensitive information, mitigating the risks associated with prompt injection attacks.

In summary, using both an NLU engine and an LLM classifier in a chatbot system creates a layered approach to understanding user inputs, offering a balance between efficiency and depth of comprehension. This not only enhances the user experience by providing accurate, context-aware responses but also ensures that the system can handle a broader range of interactions with greater resilience. It's also more economic in its token usage and better for the environment 🙂