The AI in Conversational AI: Analysis

Last time I discussed how we use AI to probe in Conversational AI chats and generate large volumes of rich, open-ended feedback. Once we’ve gathered all that unstructured data, the challenge is how to analyse it, particularly when project timelines are tight. So this week I’ve partnered with our CTO and AI expert, Josh Seltzer, to explain the second way in which AI works in Conversational AI: clustering and theming verbatim feedback.

Natural language processing (NLP) is the key ingredient of inca's Conversational AI capabilities. You can think of it as using machines to understand human language, which, as we'll see, can in this context be used to automatically cluster texts into meaningful themes (or, in market research terminology, 'codeframes').

A nice explanation of NLP is given by IBM: “NLP combines computational linguistics - rule-based modeling of human language - with statistical, machine learning, and deep learning models. Together, these technologies enable computers to process human language in the form of text or voice data and to ‘understand’ its full meaning, complete with the speaker or writer’s intent and sentiment.”

Moving beyond this basic definition of NLP, however, the way in which it is applied differs depending on the use case. Many applications of NLP make use of a large 'training set' of labelled text or voice data, and use machine learning to classify new (unseen) text or voice data into those predefined labels. This works well in certain contexts, for example call centre conversations, where common problems and questions predominate across many different calls.
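To make the contrast concrete, here is a minimal sketch of that supervised approach in Python, using scikit-learn with a couple of made-up call-centre labels. The point is that the classifier can only ever place new text into the labels it was trained on.

```python
# A minimal sketch of the supervised approach: train on labelled examples,
# then classify new text into those same predefined labels.
# The labels and example texts here are entirely hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "I was charged twice this month",
    "My bill is higher than expected",
    "The app crashes when I log in",
    "I can't sign into my account",
]
train_labels = ["billing", "billing", "technical", "technical"]

# Vectorise the text and fit a simple classifier on the predefined labels.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

# New, unseen text can only be mapped onto labels seen during training
# (here, most likely 'billing', given the shared vocabulary).
print(model.predict(["why did my payment go through twice?"]))
```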

With market research verbatim data, however, it is not as straightforward. Market research surveys cover a wide variety of topics across different projects, each with a substantially smaller sample size than might be seen in other contexts (think hundreds instead of thousands, or even millions). Different approaches used in the market research context have different ways of dealing with these constraints; many try to focus on common topics that generalise across many surveys, and amass as much training data as possible corresponding to those topics.

inca, however, makes use of unsupervised clustering techniques, fine-tuned on troves of open-ended market research data, which look for semantic relationships between all the verbatims within a survey and group them accordingly into themes. From there, a representative sentence is chosen from each theme, which describes that theme in the participants' own words.
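To illustrate the general idea, here is a generic sketch (not inca's actual pipeline) that embeds verbatims with an off-the-shelf sentence-transformers model, clusters the embeddings with k-means, and picks the verbatim nearest each cluster centre as that theme's representative sentence. The model choice, the number of clusters and the example verbatims are all illustrative.

```python
# A generic sketch of unsupervised theming: embed each verbatim, cluster the
# embeddings, then pick the verbatim closest to each cluster centre as that
# theme's representative sentence.
import numpy as np
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

verbatims = [
    "It's way too expensive for what you get",
    "I can't afford that right now",
    "The packaging looks really dated",
    "The design feels old fashioned",
]

# Encode verbatims as semantic embeddings (model choice is illustrative).
model = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = model.encode(verbatims, normalize_embeddings=True)

# Cluster into a small number of themes; a real pipeline would choose k
# (or use a method that infers it) from the data.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(embeddings)

for theme_id in range(kmeans.n_clusters):
    members = np.where(kmeans.labels_ == theme_id)[0]
    # Representative sentence = the member verbatim nearest the centroid.
    dists = np.linalg.norm(
        embeddings[members] - kmeans.cluster_centers_[theme_id], axis=1
    )
    rep = verbatims[members[np.argmin(dists)]]
    print(f"Theme {theme_id}: '{rep}' ({len(members)} verbatims)")
```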

Each verbatim is chunked, so that if a participant mentions multiple ideas, each one might be assigned to a different theme. The end result is that inca produces "bottom-up" themes which don't need to conform to a preconceived codeframe, so that unique and context-specific ideas won't be discarded in favour of the few topics which recur across many surveys. In other words, inca's thematic clustering has been trained on tons of market research surveys, but it isn't limited to themes that it has seen before, and can therefore generate themes for whatever people are saying in your survey!
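Chunking can be illustrated in the same spirit. The sketch below splits a verbatim on simple clause boundaries and assigns each chunk to its nearest theme independently; the theme names, centroids, splitting rule and model are simplified stand-ins for whatever a production pipeline would actually use.

```python
# A simplified sketch of idea-level chunking: break a verbatim into clauses,
# then assign each chunk to its nearest theme on its own. The themes here
# are hypothetical, and the splitting rule is deliberately naive.
import re
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical theme centroids, as might come from a clustering step like
# the one sketched above.
theme_names = ["price concerns", "unclear messaging"]
theme_centroids = model.encode(
    ["it costs too much", "I don't understand what the ad is about"],
    normalize_embeddings=True,
)

verbatim = "It's too pricey, and I also didn't really get what the program is"

# Naive chunking on clause boundaries; one verbatim can yield several ideas.
chunks = [c.strip() for c in re.split(r",|\band\b|\bbut\b", verbatim) if c.strip()]

for chunk in chunks:
    emb = model.encode([chunk], normalize_embeddings=True)[0]
    best = int(np.argmax(theme_centroids @ emb))  # cosine similarity (normalised)
    print(f"'{chunk}' -> {theme_names[best]}")
```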

There's one other aspect of inca's thematic clustering that makes it particularly smart. Although it still takes into account keywords and other common features such as sentiment, we all know that language is incredibly complicated, and that there are a lot of different ways of expressing the same idea. For that reason, inca looks beyond what is called 'lexical similarity' (i.e. sentences that contain the same words), and instead represents sentences based on 'semantic similarity' (where the underlying meaning of the participants’ utterances is represented).

As a concrete example, even though the phrases “oh I can’t afford that” and “it’s way too expensive” don't use any of the same words, the idea is the same, and so with semantic similarity they can be grouped together into the same theme.
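A quick way to see the difference is to compare a lexical measure (word overlap) with an embedding-based one on those two phrases. The snippet below uses an off-the-shelf sentence-transformers model purely for illustration; it is not the particular similarity measure inca itself uses.

```python
# A rough illustration: the two phrases share essentially no words (near-zero
# lexical overlap), yet sit close together in embedding space (noticeably
# higher semantic similarity). Model choice is illustrative.
from sentence_transformers import SentenceTransformer, util

a = "oh I can't afford that"
b = "it's way too expensive"

# Lexical similarity: Jaccard overlap of word sets.
words_a, words_b = set(a.lower().split()), set(b.lower().split())
jaccard = len(words_a & words_b) / len(words_a | words_b)

# Semantic similarity: cosine similarity of sentence embeddings.
model = SentenceTransformer("all-MiniLM-L6-v2")
emb_a, emb_b = model.encode([a, b])
cosine = util.cos_sim(emb_a, emb_b).item()

print(f"lexical (Jaccard): {jaccard:.2f}")
print(f"semantic (cosine): {cosine:.2f}")
```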

This process happens in real time, so that all of inca’s open-ended data is provided in themes on the dashboard as soon as fieldwork is finished. An example is shown below: in this case the AI classified the verbatims for a question about “how to improve the ad” into 11 themes. The verbatims for each theme are shown on the right-hand side; here, the verbatims associated with the second (highlighted) theme are shown. The AI selects what it judges to be a representative verbatim to use as the title for each theme.

As you can see from the example above, the clustering into themes is good but not perfect. Therefore, inca includes a feature called Quick Tag, which allows the researcher to quickly and easily edit the themes. For example, the researcher may want to title one of the themes differently: theme 3 would be clearer if it were titled “more clearly explain what the program is”. The researcher can simply overwrite the theme title to make this change.

Or the researcher may feel that the 5th verbatim shown above, i.e. “Some people might lose interest if they don’t truly know what the program is”, would fit better in the 3rd theme. To make this change, the researcher simply drags and drops the verbatim into the other theme.

Generally we find that the NLP is very good at identifying the key themes. This provides a great analysis tool for the researcher, who can quickly understand the key patterns in the data and dive into the verbatims for each theme to find good examples that help tell the story from the data.

Often this process is enough for the researcher to make the most of the rich verbatim data from inca, identify key insights and tell a compelling story. However, if the researcher wants to build the best possible codeframe, we typically find that the AI gets us 80-90% of the way there. The researcher then only needs to spend a little time with Quick Tag to finalise the codeframe.

Hopefully this blog and my last one have explained how AI is used in Conversational AI to gather rich, insightful verbatim data and to theme the verbatims so that the researcher can quickly identify the key storyline(s) for their analysis. Next time I’m going to turn my focus to more typical quant survey questions and illustrate how Conversational AI can bring these questions to life for participants in a fun and engaging manner.