The Death Of The Survey?: The Future of Surveys in a World of Synthetic Data
2023: The Year Of Generative AI
2023 was the year of generative AI, and its impact on market research was arguably as big as in any other field. In a short time, researchers were using generative AI applications throughout the market research process. From using ChatGPT to develop hypotheses or to generate survey or discussion guide content, to using prompts to summarize and code large volumes of unstructured data, to using natural language queries to interrogate data held in knowledge management systems, AI gave researchers new powers, driving efficiencies that cut costs and increased productivity.
The Controversial Development: Synthetic Data
So far so good, but as the year developed a much more controversial use case of generative AI emerged: synthetic data and, with it, the creation of synthetic respondents for quantitative research and synthetic personas for qualitative research. As others, e.g. Ray Poynter [1], have pointed out, the use of synthetic data, essentially data created by computer models, is nothing new in market research; established applications include imputing missing data and conjoint models.
However, what is new is the emergence of Large Language Models (LLMs) and, with them, the ability of AI to predict natural language answers to questions by mining and modeling the huge volumes of data held in the LLMs. It is these developments that have led to the emergence of start-ups offering synthetic respondents, synthetic users and synthetic personas. Not surprisingly, these developments have been controversial and have left many market researchers feeling threatened.
Peak controversy was reached with Mark Ritson’s Marketing Week article [2] at the end of October, where he opined: “The ability of AI to answer accurately for – and instead of – actual consumers has gigantic implications”, and, here’s the really controversial bit, said: “Certainly, the market research industry is already responding to the suggestion of synthetic data with all the grace and humility of their cigar chomping forebears from the 19th century American railroad industry”.
The Death Of The Survey?
So what's going to happen? Will the emergence of synthetic data mean the death of the survey? Well, expect providers of synthetic respondents to publish lots of case studies in 2024 heralding the remarkable similarity of results from synthetic respondents compared to quantitative surveys conducted in parallel with online panelists, for applications like idea, concept and creative testing. And also expect lots of commentary about why such comparisons may be flawed.
Nevertheless, what I can confidently predict is that many clients will increasingly use synthetic data for certain research use cases. Consequently, synthetic data will impact on the revenues of survey software suppliers, online panel companies and market research agencies, as clients start to use synthetic data for research studies that they would previously have conducted using online surveys.
The fact that synthetic data is quick and cheap makes this inevitable if buyers believe the data to be “good enough”. This was highlighted to me recently when I presented on the use cases of generative AI for market research at a webinar organized by The Global Research Business Network.
In the discussion that followed, one leader of an insights team at a large, multinational CPG company said something along the lines of: “If I believe that I can get a good steer on which of a large set of concepts to take forward to the next stage of development based on synthetic data then I’m all in”.
The Future Of The Survey Is Qual
Giving this article the title of “The Death Of The Survey” was a deliberately provocative move. However, whilst it might not kill the survey, synthetic data, as explained above, will have an impact, and I expect that impact to be greater for quantitative research use cases than for qualitative research use cases. That’s despite the fact that ‘Synthetic Personas’ are being widely promoted by some vendors for use cases that mimic qualitative research.
The reason I expect qualitative research to be less impacted is that I find it hard to believe that synthetic data will be able to replace the essence of what qual delivers: the deep human insight that comes from talking to real people in real time to get a rich understanding of their behaviors, beliefs, motivations and emotions through probing and projective techniques.
So whilst I expect synthetic personas to have a role to play in qualitative research going forward, I expect them to be a complementary approach, for example to build hypotheses and stress test ideas that will then be refined and assessed through qual with real people.
This distinction provides a route map for how surveys can survive and thrive. In short, for surveys to have a future role in the age of synthetic data they need to become more qual-like. The good news is this is already possible. Conversational AI technology enables the survey to be re-imagined as an interaction between chatbot and participant. The chatbot can ask typical quantitative questions but it can go much further, engaging participants in a dialogue by using generative AI to probe the answers they give to open-ended questions.
Not only does this deliver much deeper insight than traditional online surveys, it also drives participant engagement, as people feel they are being listened to and heard. And more engaged participants pay better attention and provide better quality data.
What’s more, the gamification enabled by conversational design means that digitized versions of projective techniques can be used to help participants better articulate feelings and emotions. With the increasing power of generative AI, conversational methodologies will enable people to answer and be probed using voice or video and, in time, realistic avatars will be used to interview people at scale and in depth.
A New Model For Participant Engagement?
It’s an obvious point that the quality of verbatim data captured by Conversational AI is a function not only of the technology but also of the quality of the sample. We’ve found some panels to be better than others, but even with the best panels that provide us with participants who give insightful data and respond well to probing, we have to reject a portion of the sample.
More widely, the declining quality of sample is well documented, with increasing problems from respondent fraud and bot farms. I hope that an increase in the use of synthetic data could lead to a new model for participant engagement. When it comes to sample quality, the only way I see to beat the bots is for panels to have authenticated relationships with panelists whom they can guarantee are real people. And that will cost more money.
However, if buyers of research are saving money by diverting a portion of their research spend to synthetic respondents, could they be encouraged to invest that saved money in better quality sample? If more money from buyers is on the table, that will encourage panel providers to invest in delivering better quality, validated panelists. Now wouldn’t that be a great thing?
References
1. Ray Poynter, blog: Synthetic data is the future of a large part of market research and insights – but the road from here to there might be bumpy | NewMR
2. Mark Ritson, Marketing Week: Synthetic data suddenly makes very real ripples (marketingweek.com)