The Death Of The Survey?: The Future of Surveys in a World of Synthetic Data

2023: The Year Of Generative AI 

2023 was the year of generative AI, and the impact on market research was arguably as big as in any other field. In a short time, researchers were using generative AI applications throughout the market research process. From using ChatGPT to develop hypotheses or to generate survey or discussion guide content, to using prompts to summarize and code large volumes of unstructured data, to using natural language queries to interrogate data held in knowledge management systems, AI gave researchers new powers, driving efficiencies that cut costs and increased productivity.

The Controversial Development: Synthetic Data 

So far so good, but as the year developed a much more controversial use case of generative AI emerged: synthetic data and, with it, the creation of synthetic respondents for quantitative research and synthetic personas for qualitative research. As others, e.g. Ray Poynter [1], have pointed out, the use of synthetic data, essentially data created by computer models, is nothing new in market research, with a number of established applications such as imputing missing data and conjoint models.

However, what is new is the emergence of Large Language Models (LLMs) and, with it, the ability of AI to predict natural language answers to questions by mining and modeling the huge volumes of data held in the LLMs. It is these developments that have led to the emergence of start-ups offering synthetic respondents / synthetic users / synthetic personas. Not surprisingly, these developments have been controversial and have left many market researchers feeling threatened.

Peak controversy was reached with Mark Ritson’s Marketing Week article [2] at the end of October, where he opined: “The ability of AI to answer accurately for – and instead of – actual consumers has gigantic implications”, and, here’s the really controversial bit, said: “Certainly, the market research industry is already responding to the suggestion of synthetic data with all the grace and humility of their cigar chomping forebears from the 19th century American railroad industry”.

The Death Of The Survey?  

So what's going to happen? Will the emergence of synthetic data mean the death of the survey?  Well, expect providers of synthetic respondents to publish lots of case studies in 2024 heralding the remarkable similarity of results from synthetic respondents compared to quantitative surveys conducted in parallel with online panelists, for applications like idea, concept and creative testing. And also expect lots of commentary about why such comparisons may be flawed.  

Nevertheless, what I can confidently predict is that many clients will increasingly use synthetic data  for certain research use cases. Consequently, synthetic data will impact on the revenues of survey software suppliers, online panel companies and market research agencies, as clients start to use  synthetic data for research studies that they would previously have conducted using online surveys.  

The fact that synthetic data is quick and cheap makes this inevitable if buyers believe the data to be  “good enough”. This was highlighted to me recently when I presented about the use cases of generative AI for market research at a webinar organized by The Global Research Business Network. 

In the discussion that followed, one leader of an insights team at a large, multinational CPG company said something along the lines of: “If I believe that I can get a good steer on which of a large set of concepts to take forward to the next stage of development based on synthetic data then I’m all in”.

The Future Of The Survey Is Qual  

Giving this article the title of “The Death Of The Survey” was a deliberately provocative move. However, whilst it might not kill the survey, synthetic data, as explained above, will have an impact, and I expect that impact to be greater for quantitative research use cases than for qualitative research use cases. That’s despite the fact that ‘Synthetic Personas’ are being widely promoted by some vendors for use cases that mimic qualitative research.

The reason I expect qualitative research to be less impacted is that I find it hard to believe that synthetic data will be able to replace the essence of what qual delivers, that is, the deep human insight that comes from talking to real people in real time to get a rich understanding of their behaviors, beliefs, motivations and emotions through probing and projective techniques.

So whilst I  expect synthetic personas to have a role to play in qualitative research going forward, I expect them to be a complementary approach, for example, to build hypotheses and stress test ideas that will  then be refined and assessed with qual with real people.  

This distinction provides a route map for how surveys can survive and thrive. In short, for surveys to  have a future role in the age of synthetic data they need to become more qual-like. The good news is  this is already possible. Conversational AI technology enables the survey to be re-imagined as an  interaction between chatbot and participant. The chatbot can ask typical quantitative questions but  it can go much further, engaging participants in a dialogue by using generative AI to probe the  answers they give to open-ended questions.  

Not only does this deliver much deeper insight than traditional online surveys, it also drives participant engagement, as people feel they are being listened to and heard. And more engaged participants pay better attention and provide better quality data.

What’s more, the gamification enabled by conversational design means that digitized versions of projective techniques can be used to help participants better articulate feelings and emotions. With the increasing power of generative AI, conversational methodologies will enable people to answer and be probed using voice or video and, in time, realistic avatars will be used to interview people at scale and in depth.

A New Model For Participant Engagement? 

It’s an obvious point that the quality of verbatim data captured by Conversational AI is a function not  only of the technology but also of the quality of the sample. We’ve found some panels to be better  than others, but even with the best panels that provide us with participants who give insightful data  and respond well to probing, we have to reject a portion of the sample.  

More widely, the issues of declining sample quality are well documented, with increasing problems from respondent fraud and bot farms. I hope that an increase in the use of synthetic data could lead to a new model for participant engagement. When it comes to sample quality, the only way I see to beat the bots is for panels to have authenticated relationships with panelists whom they can guarantee are real people. And that will cost more money.

However, if buyers of research are saving money by diverting a portion of their research spend to synthetic respondents, could they be encouraged to invest that saved money in better quality sample? If more money from buyers is on the table, that will encourage panel providers to invest in delivering better quality, validated panelists. Now wouldn’t that be a great thing?

References

1. Ray Poynter, NewMR blog: Synthetic data is the future of a large part of market research and insights – but the road from here to there might be bumpy

2. Mark Ritson, Marketing Week: Synthetic data suddenly makes very real ripples (marketingweek.com)