Crowdsourced Production of AI Training Data – How Human Workers Teach Self-Driving Cars How to See

Choir room


Dr. Florian A. Schmidt, Professor for Conceptual Design and Media Theory, University of Applied Sciences HTW Dresden

Division: Business & Economics


Since 2017, the automotive industry has developed a strong demand for ground-truth data. Without this data, the ambitious goal of producing fully autonomous vehicles will remain out of reach. The self-driving car depends on self-learning algorithms, which in turn have to undergo extensive supervised training. This requires vast amounts of manual labour in data annotation, performed by crowdworkers across the globe. The crowdworkers both train AI systems and are trained by them. Humans and machines work together in ever more complex structures.
An end of work is not in sight: according to expert interviews conducted for the study, demand for this type of labour will continue to grow rapidly in the foreseeable future. However, as the study also shows, while this type of labour creates a new class of skilled crowdworkers, the work remains highly precarious, because individual tasks are continuously at risk of being automated or outsourced further to even cheaper regions of the world. In 2018, for example, hundreds of thousands of crowdworkers from Venezuela who specialise in these tasks joined the platforms. On some new platforms, this group now makes up 75 per cent of the workforce. These recent geographical shifts in the supply of labour are a symptom of deeper structural changes within the crowdsourcing industry that are reshaping work in 2019:
Prototypical microtasking platforms such as MTurk, here described as »established generalists«, typically serve as intermediaries that allow their clients to pitch any kind of task directly to a distributed crowd. While the platform does exert some influence on the organisation of work, its preferred role is that of an infrastructure provider that does not want to be held responsible for the quality of the results or the working conditions. These established generalists, or »legacy platforms«, cannot deliver the degree of accuracy required by automotive clients. This has led to the emergence of a number of crowdsourcing platforms designed to cater almost exclusively to clients at the intersection of the automotive industry and AI research. Prominent examples of these »new specialists« are Mighty AI, Hive (.AI), Playment, Scale (.AI) and […]. The new specialists are well funded and fast growing, and they have quickly gathered substantial crowds of several hundred thousand workers each.
Crucially, they guarantee their clients at least 99.x per cent accuracy of the data. To achieve this, they must invest in new, often AI-enhanced, special production tools that both support and control the workforce. They must furthermore invest in the pre-selection and training of crowdworkers, in more community support for the workers, and in complex layers of quality management and sub-outsourcing.
For clients, this offers an expensive but reliable full-service package. For crowdworkers, it offers somewhat better working conditions, because they no longer have to deal with various clients and their heterogeneous tools and demands, and they are reliably paid directly by the platform. However, this new arrangement raises far-reaching questions regarding the classification of the workers as independent contractors.
Some established generalists have also begun to reorient their services towards producing AI training data, for example Appen (which now also owns Figure Eight, previously CrowdFlower), CloudFactory, Samasource and Alegion, while MTurk, clickworker and Crowd Guru continue to follow a generalist approach.
More and more digital labour platforms now market themselves as AI companies, while the term »crowd« is being pushed into the background. This development is also reflected in the fact that the new specialist platforms appear »Janus-faced«: they have a client-facing company name, website and appearance focussed on »AI«, and an entirely different crowd-facing name, platform and appearance that promises prospective workers easy money through microtasks. Because clients and workers now access separate websites, it has become easier to analyse the composition of the respective workforces and their fluctuation between platforms.

The report is based on direct observation of the platforms, their communication with crowdworkers, their community forums, their press releases and advertising, and their tools; this was partly done by logging in as a crowdworker. Other sources were trade shows, business reports, journalistic articles, and interviews in publications such as Wired and TechCrunch. Most importantly, it is based on six qualitative interviews with CEOs of crowdsourcing platforms in the field: Daryn Nakhuda of Mighty AI, Kevin Guo of Hive, Siddarth Mall of Playment, Christian Rozsenich of clickworker, Marc Mengler of […], and Hans Speidel of Crowd Guru. In addition, five qualitative interviews were conducted with crowdworkers from Venezuela, Brazil and Italy.

Dr. Florian A. Schmidt, born 1979 in Berlin, has been studying digital labour platforms since 2009. A designer by training, he is also a prolific writer and researcher. He received his PhD from the Royal College of Art in London in 2015. His doctoral thesis, a critical analysis of the design of crowdsourcing platforms, their history and their mechanics, was published as the book Crowd Design in 2017. Schmidt was also involved in developing the first version of the website by the German labour union IG Metall. Since 2018, Schmidt has been professor for conceptual design and media theory at the University of Applied Sciences HTW Dresden.

Recent reports and studies on Digital Labour:
• Schmidt: Crowd-Produktion von Trainingsdaten [Crowd Production of Training Data], Hans-Böckler-Stiftung, 2019.
• Schmidt: Crowd Design – From Tools for Empowerment to Platform Capitalism, Birkhäuser, 2017.
• Schmidt & Kathmann: Der Job als Gig – Digital vermittelte Dienstleistungen in Berlin [The Job as a Gig – Digitally Mediated Services in Berlin], ArbeitGestalten, 2017.
• Schmidt: Digital Labour Markets in the Platform Economy, Friedrich-Ebert-Stiftung, 2017.


