Ghostwork: the invisible world of work behind AI

Photo credit: Martijn Arets

Claartje ter Hoeven (Utrecht University) and her research team reveal the hidden world of European data workers, or ‘ghostworkers’. They are often highly educated and seeking flexibility, but their working conditions are usually poor. What drives them? What impact do they have on algorithm development, and vice versa?

They are invisible, ubiquitous and indispensable for the development of artificial intelligence (AI): ‘ghost workers’. Millions of people worldwide annotate, check and translate texts and images so that AI can understand and process the information. Who are these people and what drives them? What about their well-being? And what impact do their poor working conditions have on the development of AI? To learn more about this, I sat down with researcher Claartje ter Hoeven of Utrecht University for The Gig Work Podcast from the WageIndicator Foundation. She is conducting research into this phenomenon with a European Research Council (ERC) grant.

Annotating, checking and correcting so AI can learn

Ter Hoeven and her team are researching the working conditions and well-being of so-called ‘ghostworkers’ finding work through online platforms. They build on the work of Mary Gray and Siddharth Suri, ‘Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass’. While Gray and Suri researched working conditions of ghostworkers in the United States and Asia, Ter Hoeven and her research team focus on Europe and examine the relationship between working conditions and worker well-being. It is a five-year study and the team is now about halfway through.

‘We are investigating the hidden labour behind AI,’ she says. ‘Ghostwork is a catchy term, but in science these days we prefer to call it ‘datawork’. Dataworkers make texts and pictures readable to AI in order for the machine to learn from them. For example, they indicate what a lamppost or a bicycle is, so that the algorithm of a self-driving car learns to recognise these objects. They also check and correct the output of AI models and algorithms.’

Claartje ter Hoeven. Photo credit: Martijn Arets

Low-paid

Data workers often do this from home through online platforms such as Amazon Mechanical Turk, Clickworker or Microworkers. They usually get paid per mini-task or ‘microtask’: an annotation, check or translation. The pay is often low and data workers have to search for the microtasks themselves on various platforms. Searching takes time, but they do not get paid for that. They often earn less than the minimum wage.

There are also companies that employ data workers, so-called Business Process Outsourcing organisations (BPOs). Their workers are based in physical office locations, are often paid by the hour and are supplied with the tasks. Although they have no unpaid search time, their pay, too, is often below the poverty line.

We still know quite little about these data workers, and many big tech companies prefer not to talk about the contribution of humans to the development of AI, because it does not fit the narrative that AI is ‘self-learning’. This is not only a shame, but also detrimental to the development of responsible AI. It is therefore valuable that Ter Hoeven and her team are researching datawork through platforms in Europe.

Highly educated with a distance to the labour market

Ter Hoeven used various research methods to discover how the working conditions of dataworkers affect their well-being. It started with a short survey, which the team distributed as a microtask on various platforms in Europe to get as many responses as possible. In the end, more than 5,000 people completed the survey.

‘A striking result was that many data workers are highly educated,’ says the researcher. ‘They often have a certain distance to the labour market. Think of migrants or people who combine work with caring responsibilities for family members.’

💡
On 28 March 2025, WageIndicator will host the webinar ‘The Ghost Workers: Do you Know Who's Behind Your AI?’. During this webinar, researchers and experts by experience will address questions about the workers behind AI, the effects of poor working conditions on the quality of AI and possible solutions. You can register via this link.

Four drivers

Based on her survey of over 5,000 respondents, Ter Hoeven distinguishes four groups of data workers working via platforms according to their motives: explorers, enthusiasts, supplementers and dependents.

  • Explorers: this is the largest group (73.48%). They often do the work temporarily or sporadically. This includes older people who see it as a kind of pastime. For them, it is similar to Candy Crush: they do simple tasks, earn some money and at the same time learn something about automation (e.g. gain new insights about self-driving cars).
  • Enthusiasts: like the ‘explorers’, this category of workers (10.46%) has only recently started platform work. Unlike the ‘explorers’, this group is more likely to have an above-average household income and be relatively old. This category has been renamed ‘enthusiasts’ because they do not seem to depend on this work for their income.
  • Supplementers: these people (8.08%) use data work platforms to supplement their income, for example because they do not have a full-time job or do not earn enough.
  • Dependents: they (7.98%) are completely dependent on datawork as their main source of income. They suffer most from poor working conditions.

Rather an algorithm than a human as boss

Ter Hoeven and her team conducted 137 face-to-face interviews with data workers from Europe. She discovered all kinds of motivations. ‘Those who work through platforms are dependent on algorithms,’ she says. ‘Algorithms determine whether you get a task and sometimes what you earn with it. There are all kinds of drawbacks to this. It is often unclear how algorithms make decisions, and platforms make it almost impossible to complain or discuss decisions. Yet many data workers told me during interviews that they would rather work for an algorithm than a human manager.’

For example, she spoke to a migrant in Germany who had bad experiences with nasty bosses. Thanks to microwork, he could at least work from home and decide his own working hours. Another interviewee was a neuroscientist with a medical condition, which meant she needed more time to get up in the morning. She had to stop working at the university as a result. Thanks to datawork, she can still earn money. Ter Hoeven: ‘So our research not only says something about microwork, but also raises questions about the way we organise more traditional work.’

Need for colleagues and appreciation

The researchers present their findings not only on paper, but also in a documentary. They invited six European dataworkers to participate in video recordings. ‘We asked them questions and brought them together for panel discussions,’ says the researcher. ‘This cinematic research offered very interesting insights.’

Trailer of the film 'Ghost Workers' by Lisette Olsthoorn in collaboration with Erasmus University Rotterdam. This film was funded by the Erasmus Initiative Societal Impact of AI and by the European Research Council (ERC) as part of the European Union's Horizon 2020 research and innovation programme (grant agreement no. 101003134).


While most dataworkers indicated during the interviews that they generally did not miss having colleagues, it became clear during the recordings that they did in fact need them. ‘Suddenly they could complain, brainstorm and share experiences with like-minded people,’ says Ter Hoeven. ‘They had simply never had a dataworker colleague before. Furthermore, I saw their self-confidence grow during the filming now that they were the ones in the spotlight. Dataworkers may need contact and appreciation more than they themselves sometimes think.’

Data quality

Datawork raises a lot of questions. These are not only about the well-being of workers, but also about data quality, which matters all the more as AI has an increasing impact on our daily lives. Ter Hoeven: ‘Some dataworkers annotate medical procedures. For example, they have to indicate whether a doctor's hand is shaking during an operation. But these people usually have no medical background. So how reliable is that data?’



What consequences does it have if AI learns from people without sufficient expert knowledge and information about the context of the data to be ‘translated’? To improve quality, it may make sense to better match workers' skills to tasks and provide better guidance. This is only possible if you invest more in data workers.

More transparency

In her book The Tech Coup, Marietje Schaake discusses how tech companies are conducting real-time experiments with user data to optimise their platforms, often without users' knowledge. This can have serious consequences for privacy, democracy and personal freedom. That is why she calls for stricter regulation and more transparency.

The same applies here. In my opinion, organisations should be more transparent about the contributions of data workers and the risks involved. I therefore hope that European legislation like the Corporate Sustainability Due Diligence Directive (CSDDD) will also apply to how companies develop their AI. After all, this smart technology is increasingly affecting all kinds of processes. In short, the conversation with Ter Hoeven once again leaves me with many new questions. I will be seeking answers to those in this podcast in the coming months.

Want to know more? Listen to the podcast episode on Ghostwork here. This blog was also published on Gigpedia.org.

The GHOSTWORK project received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement no. 101003134).