India’s Data Workers: The Human Labour Making Machines Learn

Technological advancements reshape models of interaction among individuals, firms, governments, and across these groups. These changes often influence the fundamental nature of work, along with several dynamics of the labour market. Lin (2011) used historical US Census data for the period 1965-2000, which drew on the Dictionary of Occupational Titles, to identify the novel job titles it captured, for example, “web developer”, “chat room host”, and “radiopharmacist”. Using Lin’s approach, Autor et al. (2021) found that over 60% of employment in 2018 was under job titles that did not exist in 1940.

However, while technology, such as automation, can displace workers from existing jobs or tasks, it also creates new work, reinstating demand for workers with specific expertise (Acemoglu and Restrepo 2018). Identifying these new occupational titles is especially crucial for developing countries like India, which have a large, informal and vulnerable workforce and are experiencing rapid technological advancements.

Technological innovations, demographic changes, and macroeconomic fluctuations have enabled the proliferation of several forms of non-standard employment (NSE) worldwide. These include part-time/on-call work, temporary agency work/multi-party employment arrangements, and disguised employment/dependent self-employment (International Labour Organization (ILO), 2016). The Covid-19 pandemic further accelerated this trend.

Developing countries, marked by a high degree of casual employment, have also experienced this change in the nature of employment. A 2019 global survey found that 80% of the respondents preferred flexible work opportunities, and 65% of businesses reported cost-efficiency gains from such flexibilisation (International Workplace Group (IWG), 2019).
India’s flexi-staffing industry grew by 15.3% during 2023-24, driven mainly by the FMCG (fast-moving consumer goods), e-commerce, manufacturing, healthcare, retail, logistics, banking, and energy sectors (Indian Staffing Federation (ISF), 2024). This surge raises concerns about increasing informality and vulnerable employment, especially amid the rapid expansion of the gig and platform economy.

The human labour behind artificial intelligence

While broad issues around India’s gig and platform economy have gained prominence, the emerging category of “data workers” (new work that is vital for Artificial Intelligence (AI) systems) remains largely overlooked in the discourse. Since the term AI was coined by a group of computer scientists (McCarthy, Minsky, Rochester, and Shannon) in 1956, it has evoked a mix of hope, fear, and uncertainty about the future of work. After several earlier efforts, the ongoing AI revolution is now witnessing a global race for AI leadership. Generative AI (GenAI) models such as OpenAI’s ChatGPT are swiftly becoming popular: ChatGPT became the fastest-growing web application in history, reaching 100 million monthly active users within 2.5 months and later 500 million users in a short span of time (Hadi and Najm 2023, Paris 2025).

The term “generative” underscores the fact that these AI systems can create or generate new material autonomously without human input (Feuerriegel et al. 2023). However, a huge amount of human labour goes into the development of these AI systems. Many of them (including ChatGPT, Google’s Gemini, and DALL-E, among others) are based on a complex “human-in-the-loop” (HITL) model (Rani and Dhir 2024). HITL uses the judgement of human data workers for annotating, labelling, and categorising raw data (like text files, images or videos) to train machine learning models (IBM, 2025).
Data curators, labellers, content moderators, validators, and human feedback providers work to ensure that AI does not perform poorly or dangerously (for instance, in autonomous cars). The accuracy of these data is crucial for the efficiency, predictability, and performance of AI models. Thus, data workers are the backbone of AI systems, ensuring their functionality, accuracy, and safety, while ironically working in precarious, fragmented, and often invisible conditions themselves.

Why is this an important contemporary and future concern for India? To meet cost-efficiency goals, businesses increasingly rely on gig workers in the AI supply chain, often outsourcing tasks to crowd workers via digital labour platforms (DLPs) or to smaller firms employing data workers.

For instance, at the announcement of Amazon’s Mechanical Turk or MTurk (a virtual labour marketplace/crowdwork platform) in 2006, Amazon’s CEO Jeff Bezos referred to it as “artificial artificial intelligence”. The phrase signified that the “Human Intelligence Tasks” (HITs) available on the platform were microtasks (often simple and repetitive) to be performed by a reserve army of cheap labour.

A prominent outcome of data workers (called ‘turkers’) using MTurk was the release in 2009 of the ImageNet dataset, the largest labelled image dataset, from a project by Jia Deng et al. It was fuelled by the work of millions of workers across the globe, who manually labelled a million images for very low wages.

A 2016 survey of almost 3,000 turkers from the US by the Pew Research Center revealed that over 50% of all workers reported hourly earnings below US$5 (Pew, 2016). Unsurprisingly, a large majority of data workers are in the Global South, where wages are significantly lower.
For instance, in Kenya, data workers mostly receive hourly wages of only US$2, while in Argentina hourly wages go as low as US$1.7.

In addition, companies often bind workers with Non-Disclosure Agreements (NDAs), further invisibilising their contributions to AI systems (Dachwitz 2024). Besides low wages, concerns have also been raised about adverse mental health outcomes for data workers engaged in content moderation. Content moderators are regularly exposed to traumatising content, which has long-term psychological implications, sometimes even leading to drug dependency (Gebrekidan 2025).

India’s role in global AI supply chains

According to the European Commission, India registered one of the fastest rates of digitalisation (11%) during the period 2011-2019, similar to China. This makes the National Industrial Classification (NIC) (2008) used by labour surveys too dated to capture most digitally driven new work. Where, then, were gig, platform, and data workers captured? India’s annual Periodic Labour Force Survey (PLFS) uses the National Classification of Occupations (NCO) (2015), in which “data entry clerks” are captured by “Family 4132” (Figure 1). Essentially, the categories include traditional clerical data input roles and do not explicitly cover modern AI-related data work. Overall, therefore, these workers remain statistically invisible, as is the case for digital platform-based gig workers.

Platform gig and data work in AI supply chains are key present-day illustrations of the “reinstatement effect” (Acemoglu and Restrepo 2019) of technology. Job advertisements (see Figures 2, 3, and 4 below) for roles involving data work, like data validation and data annotation, list key competencies including data analysis, excellent written and verbal communication skills, and attention to detail, among others.
Figure 5 illustrates the rising demand for data workers in India, which is now emerging as a key hub for data annotation, powered by a diverse workforce that produces high-quality datasets for global use. In 2024, an estimated 50,000 Indian freelance annotators were active on international digital platforms, alongside 20,000 full-time annotators within India, according to an Economic Times report (citing data from TeamLease).

The same report also states that the global market for data annotation is valued at an estimated US$8.22 billion, and is expected to grow swiftly at nearly 26.2% annually until 2028. From US$250 million in 2020-21, India is expected to service over US$7 billion of the global annotation market by 2030 (National Association of Software and Service Companies (NASSCOM), 2021). Even India’s ‘National Strategy for Artificial Intelligence’ identifies data annotation work as having the potential of “absorbing a large portion of the workforce that may find itself redundant due to increasing automation” (NITI Aayog, 2018).

Among other issues, however, one concern is that the HITL model may de-skill workers who perform repetitive tasks to train or improve AI systems. Additionally, while location-based gig work has gained regulatory attention through collectivisation efforts (Tiwari 2025, Jain 2025, Elizabeth 2024), often supported by informal labour unions and widespread public discussions, AI data workers remain largely absent from mainstream discourse.

Figure 1.
Occupations under ‘Family 4132 - Data Entry Clerks’ group from NCO-2015

Group Code | Occupation Title | Description
4132.0401 | Data Entry Machine Operator | Enters alphabetic, numeric, or symbolic data into computer, and verifies it.
4132.0402 | Domestic Data Entry Operator | Electronically enters data (daily/hourly work reports) on client or office sites.
4132.0600 | Coding Machine Operator | Handles coding machines to print codes on different materials.
4132.0800 | Duplicating Machine Operator/Photocopier | Operates and monitors photocopying machines.
4132.0900 | Embossing Machine Operator | Operates power-driven embossing machines.
4132.1000 | Addressing Machine Operator | Operates electrically driven printing machines.
4132.1300 | Book Keeping Machine Operator | Records business transactions using computer software, and performs general clerical duties.
4132.1400 | Bill Processing Clerk | Prepares bills and statements, and calculates payrolls and other amounts, using computer software.
4132.9900 | Data Entry Clerks, Other | Operates book-keeping and computing machines not elsewhere classified.

Figure 2. Data labelling – permanent

Figure 3. Data annotator – freelance

Figure 4. Classification data annotation – freelance

Figure 5. Demand for data workers

Policy directions

Although India ranks 14th in AI research globally, with a share of 1.4% during 2018-2023 compared to the US’s share of 30.4% and China’s share of 22.8%, it has already come into focus as a global market for AI technologies, recently emerging as the second largest, and among the fastest growing, markets globally for ChatGPT. As the future implications of the ongoing AI revolution remain unclear for all, India stands at a pivotal moment to shape its role in the global AI supply chain. To fully leverage AI’s economic and (decent) employment potential, a coordinated policy approach is needed.
While a national AI strategy lays down a blueprint, it is also crucial to update the NCO to encompass AI data-related jobs (including crowdsourced microtask work), establish AI-focused skill development hubs, regulate gig work in the AI supply chain, and promote AI-related research and development in an equitable and inclusive manner.

There is a need to identify such gig work via digital labour registries, promote the upskilling of workers, and ensure the accountability of platforms throughout the chain. The uncertain AI era needs proactive measures to avoid continued polarisation of skills and jobs (Kuriakose and Iyer 2020) in India’s labour market. Moreover, the declining labour share of income (Karabarbounis and Neiman 2013), owing to technological advancements and the popularity of work fragmentation, needs immediate regulatory, civil society, and legislative responses. Building a resilient, ethical AI workforce requires both innovation and inclusion. As a ‘hub’ for AI supply chain labour, India has an opportunity as well as a responsibility to improve labour market conditions for these data workers, who must not be disconnected from the wider benefits they generate.

This article was originally published on Ideas For India.