How Data in India Went From Being a Tool of Economic Planning to Big Data Aggregation

The following is an excerpt from Lives of Data: Essays on Computational Cultures from India, edited by Sandeep Mertia, foreword by Ravi Sundaram, Institute of Network Cultures (Amsterdam, 2020). This book is published under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-SA 4.0) license and is available here.

It is not difficult to see what is wrong with official statistics in India. There is gap between theory and practice. There is gap between the means and the end in the absence of any clearly perceived purpose.
~ P. C. Mahalanobis, Statistics as Key Technology, 1965

Data is its own means. It is an unlimited non-rivalrous resource. Yet, it isn’t shared freely. What began as a differentiator is now the model itself.
~ Nandan Nilekani, Why India needs to be a Data Democracy, 2017

Data shadows our situation. Many believe it can determine our situation. There were enthusiastic claims that ‘Big Data’ would lead to a ‘fourth industrial revolution’ and the ‘end of Theory’, and that it would ‘transform how we live, work, and think’. Arguably, much of the early 2010s hype around the big data revolution has already been replugged into popular narratives of artificial intelligence (AI). The media infrastructures that enliven digital data and the fast-moving claims of data revolution are now evidently more globalized and capitalized than ever before. If we look a little under the hood, techniques such as data mining have moved from the margins of techno-scientific practice to normative centers of global computing in less than two decades. How did data become so powerful, pervasive, and relatable in the first place? To understand the global momentum of the data revolution, it is crucial to inquire into the many lineages, affinities, and relations of data in context-sensitive ways.

Data Revolution(s) in Context

The contrast between the two epigraphs above is a good place to begin tracking lives of data. The first epigraph is from a lecture in 1965 at the 125th Annual Meeting of the American Statistical Association by P. C. Mahalanobis, founder of the Indian Statistical Institute (ISI) and a member of the Planning Commission, a powerful body at that time. In this lecture, he emphasized the need to establish a ‘purposive’ view of statistics as a ‘fully developed technology of a multi-discipline character’. This was especially so in the ‘underdeveloped countries’ where the ‘principle of authority’ of the government reigned supreme over ‘independent’ statistical analysis and interpretation. Mahalanobis made these observations at a time when the ISI and India’s official statistics and economic planning system were receiving global recognition for pioneering work in research, training, sample-survey methods, and economic planning. He clearly placed statistical knowledge production in the service of postcolonial nation-building. The desire to perceive a clearly defined ‘purpose’ when the ISI was already at the cutting edge of large-scale data collection and processing stands in puzzling contrast to contemporary modes of data-driven governance which claim ‘data is its own means’.

The second epigraph is from an opinion piece by Nandan Nilekani, co-founder of Infosys and founding Chairman of the Unique Identification Authority of India (UIDAI), the government body responsible for the world’s largest biometric database, Aadhaar. In this article he argues for the value of big data and artificial intelligence for disrupting existing patterns of information management, and cautions against ‘data colonization’ by state and global platforms. It is important to note that what we now know as Aadhaar actually began in 1999 as an identity card project for citizens living in border states.
The Rangarajan Commission, set up in January 2000 to look into the ‘growing concern regarding the quality of data’ in the entire statistical system, recommended the creation of a ‘centralized database of citizens (population register)’ in which every citizen would have a unique identification number. Within a few years of the UIDAI being set up in 2009, Aadhaar became a primary key linking databases of bank accounts, mobile phones, income tax returns, payment apps, email IDs, and so on, even if such a linking is not mandated by the law. Aadhaar has afforded the development of application programming interfaces (APIs), and web and mobile applications with payment interfaces demanding Aadhaar verification for government and private services across domains. Perhaps nobody in 2009 could have imagined connecting biometric data to mobile phone SIM cards. Anumeha Yadav (Chapter 7) draws on her detailed field reports to show how the project grew from select pilot implementation in 2011 to a national legal and policy imperative by 2017. She notes a growing public alertness to the importance of enrolling with Aadhaar to ensure the ratification of rights, irrespective of the unclear legal status and the widespread technological glitches in the everyday functioning of the project. The story of Aadhaar raises questions about what counts as data, who can design its purposes, and how its means and ends are discovered. It is a story that is at once expansionist and contingent: in India, the evolution of Aadhaar indicates that we need to reflect on computational culture without prefiguring the object of computation and its potential relationship to taxonomies of social control.

To understand the shift that has taken place between data in the mid-20th-century statistical regime of economic planning and big data aggregation and prediction in the contemporary, we need to re-examine the history of computing in India, which has been largely tethered to the IT revolution.
We examine different techniques and affordances of computation in different media ecologies consisting of human computers and mass-media such as telecom in the decades before the emergence of the internet. In Chapter 1, I explore the role of the ‘first computers’ of India, both human and electronic, from the 1930s to 1960s in generating official statistics. In Chapter 2, Karl Mendonca analyses the role of computerization in the 1980s at a major advertising company involved in the cinema business, and how the company later repurposed its cinema distribution network into a courier company. In different ways, both chapters challenge the notion of a clear and stable rationale for the evolution of computers and big data.

It was not until the early 2000s that database practitioners began to seriously look at data mining as a mode of knowledge production. New concepts of scale and computational processing power emerged and developed through trade-offs and reconfigurations of statistical accuracy, localized data storage and retrievability, hardware and software load balancing, and electricity consumption. Of particular importance was the shift from ‘relational’ (structured design) to ‘non-relational’ (distributed design) database management systems. Here, we must not forget the co-production of affordances, users, and publics. After all, a computer database is only one specific instance of a wider set of relationalities made durable by the thoroughly material and well-constructed craft of software engineering, even if it is widely imagined to be abstract and mystical. In the Indian context, while the IT industry has become symbolic of a new middle-class imaginary of technology and social mobility, the epistemic cultures of software engineering and their relations with global developments are yet to be adequately unpacked.
We do not know how India’s political and infrastructural conditions affect Aadhaar’s database design or the development of high energy-consuming data centers for ‘data sovereignty’, to name but two examples.

In a post-colony like India, any critical engagement with data-driven knowledge production has to consider the persistent role of colonial biopolitics. It is well established that statistics (formerly termed ‘political arithmetic’) have played a key role in the production of people, identity, and nation-states. From the construction of Enlightenment ideas such as the ‘individual’, national populations in Europe, and the ‘citizen’ in the USA, the intended and unintended consequences of counting and categorizing people run far and wide. European colonies became sites for exotic and imperious enumerative and classificatory systems framed by orientalist pedagogies that displaced and serialized existing social orders. From the inventions of fingerprinting and the enumeration of complex traditions of faith and social difference into the fixities of religious identity and objectification of caste, such a biopolitics sought to make populations knowable and governable.

Post-independence India saw an expansion of bureaucracy, official statistics, and planning. Subsequently, government and transnational businesses used data modelling of the economy and populations to understand citizenship entitlements and consumer profiles. The intersections of state and market interests after economic liberalization in 1991 transformed the national political economy as well as the everyday cultural conditions of governance. In particular, the entry of private digital technology vendors and consultants in state and international development projects afforded new means and incentives for collecting and analyzing data. Supporters of the Aadhaar project often claim that the state is a much more benign collector of data than companies such as Google and Facebook.
Putting questions of veracity aside, the implications of this distinction are suggestive. The purported commensurability between data imaginaries and practices of India’s welfare state and those of big technology companies widens the scope of inquiry into the politics of data-driven governance and bureaucracy. From state-owned biometrics to state-promoted transnational mobile apps, the contemporary (surveillance-friendly) road between the ideology of the state and that of popular digital media is punctuated by diverse and distributed data-driven pathways.

Representative image of an Aadhaar card. Photo: PTI

At one level, the shift from colonial fingerprinting to contemporary biometric technologies shows some continuity in terms of tactics of governance and subjectification of bodies. If we look closely though, the machinic-readability of fingerprints opens new analytical challenges for theorizing governmentality. The contemporary modes of data-driven subjectification are deeply entangled with the proliferation of digital technologies of identification in governance, finance, media, and consumer products across developmental and business models. How can we map this expansion and proliferation in sociotechnically specific ways? From navigating the nudge marketing of discount codes on mobile payment apps to facing new determinations of citizenship and identity through myriad paper-based and digital documents, among other things, the emergent mutations of power, subjectivity, and data demand a closer look into the design and material form of media. This is particularly challenging in conditions of fragmented digital infrastructures, where diverse intermedial forms emerge and coalesce in everyday practices for bypassing the lack of end-to-end connectivity and formal access.

Sandeep Mertia is a PhD Candidate at the Department of Media, Culture, and Communication, and Urban Doctoral Fellow at New York University.
He is an ICT engineer by training, and former Research Associate at The Sarai Programme, Centre for the Study of Developing Societies.

Lives of Data emerged from research projects and workshops at the Sarai Programme, Centre for the Study of Developing Societies. It seeks to better understand the status of data objects, relationalities, and difference in computational cultures. A critical focus on India necessitates pluralistic vantage points for examining the contemporary global discourse of data revolution in relation to the enduring legacies of colonialism and 20th-century modernisation programs. From state-supported technological boosterism of its ‘digital superpower’ status to everyday lives of over a billion people in one of the most diverse and unequal societies in the world, India’s sociotechnical conditions assemble deeply contrasting lives of data.

This collection of essays features a diverse group of interdisciplinary scholars and practitioners, engaging the emergence, limits, potentialities, politics, practices, and consequences of data-driven knowledge production and circulation. Encompassing history, anthropology, science and technology studies (STS), media studies, civic technology, data science, digital humanities, and journalism, the essays open up possibilities for a truly situated global and sociotechnically specific understanding of data, computing, and society.