Why Chatbots Need the Right Data to Get Chatty

A Conversation About Conversational AI with Pypestream’s Chief Product Officer

If you’ve surfed the web recently, you may have been inundated with advertising for language learning solutions. “The App That Teaches You a Language in 20 Minutes a Day” is the typical headline vying for your attention. It certainly sounds promising; who wouldn’t want to learn a new language in no time flat? Whether or not these claims are valid remains to be seen, but whatever these apps are doing, they must have been built on some extraordinary fundamental teaching techniques. After all, without the proper training, how are you going to learn?

The same can be said for the rise of another language tool, this one artificial in nature. Driven by the promise of 24/7 customer service, virtual assistants, or chatbots, have become increasingly popular solutions for companies. In fact, a recent study found that 53% of organizations expect to use chatbots within 18 months, a 136% growth rate that suggests businesses see real value in these tools. Given some of the figures, this investment should come as no surprise.

Every year, there are a reported 265 billion customer requests. According to Chatbots Magazine, businesses spend nearly $1.3 trillion to service these requests, and chatbots are estimated to reduce customer service costs by up to 30%.

Don’t Forget About the Data

But while companies are spending big money on developing and building chatbot technology, the investment in the data needed to teach these virtual assistants can’t be overlooked. Like the apps promising to teach us a second language, chatbots need a foundational learning system to ensure they understand how to talk to customers.

“One of the biggest barriers to conversational AI is data,” explained Pypestream’s Rahul Garg. As the Chief Product Officer heading up product and engineering, he leads the team responsible for building conversational AI platforms and products. And while the future for conversational AI looks bright, it’s not without its challenges.

“Conversational AI is not some magic bullet that can instantly start conversing with customers; it requires foundational data to understand utterances and how humans speak.” Garg says this data is what’s needed to feed the AI models that are powering the chatbots. “Most of our customers lack the foundational data needed to guide the development of these conversational solutions,” added Garg. “And for those that do have the data, most don’t have data that can be easily used to build out a conversational model.

“While organizations have tons of transcriptions from call logs, they’re not usually very precise, labeled, or relevant as they are from voice transcripts. People tend to speak very differently when they type versus what they say on the phone. So, we spend a lot of time either cleansing the data or trying to acquire data that addresses their specific use case.”

Conversational AI requires the right data to be fed into the right models. “Data really helps build a great natural language understanding model using algorithms like a support vector machine,” said Garg. “At the end of the day, you give it 50 examples of how somebody says something, and then it creates a model which understands with some confidence what a customer is trying to say. It’s not a keyword system because underneath it there’s an algorithm that’s based on words and articles that has a semantic understanding of how those words relate to their dialogue. For example, a sentence where you use the word blue – indicating a feeling versus a color – the algorithm uses that semantic dictionary underneath it to understand the difference and create a custom model.”
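To make the idea of “examples in, confidence out” more concrete, here is a minimal sketch in Python. It is not Pypestream’s system, and it is not a support vector machine: to stay dependency-free it swaps in a much simpler technique, comparing an utterance’s word counts against a per-intent word-count profile using cosine similarity. The intent names, the training utterances, and the functions `tokenize`, `train`, `cosine`, and `predict` are all hypothetical, and unlike the approach Garg describes, this toy has no semantic dictionary, so it could not tell “blue” the feeling from “blue” the color.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical labeled utterances; a real project would use far more
# examples per intent, drawn from its own customers' typed messages.
EXAMPLES = [
    ("I want to check my balance", "check_balance"),
    ("what is my account balance", "check_balance"),
    ("how much money do I have", "check_balance"),
    ("I need to reset my password", "reset_password"),
    ("I forgot my password", "reset_password"),
    ("help me change my password", "reset_password"),
]

def tokenize(text):
    """Lowercase an utterance and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def train(examples):
    """Sum the word counts of every example utterance, grouped by intent."""
    profiles = defaultdict(Counter)
    for utterance, intent in examples:
        profiles[intent].update(tokenize(utterance))
    return dict(profiles)

def cosine(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def predict(model, utterance):
    """Return (best_intent, confidence); confidence is the cosine score."""
    vector = Counter(tokenize(utterance))
    scores = {intent: cosine(vector, profile) for intent, profile in model.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

model = train(EXAMPLES)
intent, confidence = predict(model, "show me my account balance")
print(intent, round(confidence, 2))
```

In a sketch like this, a low confidence score is the signal to ask the customer to rephrase or to hand the conversation to a human agent; where to set that threshold is a judgment call that depends on the data.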

Ultimately, conversational AI is built on the right data. “It’s very important to understand the market you’re in and obtain the right data sets in advance to be able to understand and train your models to succeed with conversational AI,” said Garg.