Searching for Sanskrit Speakers in the Indian Census

This is the first of a two-part series. It discusses Sanskrit through an in-depth analysis of its relative rankings in the Indian census results, focusing on the last two censuses, 2011 and 2001. The second part will discuss how Sanskrit is operationalised for strategic soft power applications related to the under-appreciated realm of faith-based development.

As far as historical linguists and sociologists are concerned, Sanskrit became a second language around the beginning of the post-Vedic Period (ca. 500 BCE). This is when it entered a post-vernacular phase, regardless of the theological presumption that the perceived devabhasha is eternal; as emeritus professor of Sanskrit, Madhav Deshpande, explains, it has undergone significant historical changes.

One of the key assets used to justify the use of Sanskrit as a tool for the development of society is its perceived linguistic purity. The argument, however, is that only a “pure” Sanskrit can deliver the utopian world it is used to inspire.

What, exactly, might a pure Sanskrit sound like? 

This is a particularly vexing question, not only for the descriptive linguist, who is aware that even the earliest layers of the Vedic corpus (Ṛgveda) contain hundreds of loan words from other languages and language families, but also for the metaphysics of a political theology based on a moral framework that is indelibly flawed. The Sanskrit promoted and spoken today is not “pure”.

Put another way, its linguistic fuel, or, dare I say, śabda śakti, is more like the fuel pumped into the tank of one’s Bajaj Discover motorbike than the fuel that will be used to power India’s first rocket to the sun, Aditya-L1, later this year. How, then, could this reclaimed hybrid form of Sanskrit, spoken today, possibly be used to generate the intended moral reformation?

It was during the post-Vedic period that vernacular Sanskrit, otherwise known as bhāṣā, began to show significant changes, simplification and loss of archaic forms. In the contemporary revival movement, simplification is essential to increase its acquisition as a second language. This is the point at which vernacular Sanskrit can feasibly be equated with a Prakritised version, or a Sanskritised form, of, say, Hindi.

Many readers would be aware of the rumours about a village, somewhere in India, in which all the inhabitants apparently only speak fluent Sanskrit. There are countless lists on the internet waiting to confirm anyone’s bias about the truth of these claims. What are we to do, then, with these lists and these claims?


While, certainly, there is an intangible right enshrined in the UN Declaration of Human Rights and the Indian constitution that enables people to speak, learn and promote their heritage language, there are other issues in relation to Sanskrit, which might require further inspection.

Over the past 16 months, I have gone over the available census data (released late 2018) to find which districts and sub-districts returned the highest numbers of tokens related to Sanskrit as a “mother tongue” (L1), and as a second (L2) and third (L3) language. All these data are found in the C-16, C-17, and ST-15 tables on the government’s Census website, as Excel spreadsheets.

While the data do not prove that people actually speak Sanskrit, they show us where those who have an affinity to, and an aspiration for, Sanskrit might be. It is, in some ways, a map of affect in relation to Sanskrit.

Before we get into comparing the 2011 and 2001 census results, this first table is based on archival research. It shows the total mother tongue L1-Sanskrit tokens returned for every Indian census, beginning in 1881.

Based on the 2011 tokens, this map shows the total mother tongue L1-Sanskrit tokens for each state. Maharashtra (3,802), Bihar (3,388), Uttar Pradesh (3,062), Rajasthan (2,375) and Madhya Pradesh (1,871) make up the top five states.
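To put the top five in proportion, here is a minimal sketch that uses only figures quoted in this article. It ranks the five states and works out their combined share of the 2011 national L1-Sanskrit total (24,821, cited further on). The variable names are illustrative, and the calculation assumes the state and national figures come from the same census table.

```python
# 2011 mother tongue (L1) Sanskrit tokens for the top five states,
# as quoted in the article.
top_five = {
    "Maharashtra": 3802,
    "Bihar": 3388,
    "Uttar Pradesh": 3062,
    "Rajasthan": 2375,
    "Madhya Pradesh": 1871,
}

# National 2011 L1-Sanskrit total quoted in the article (assumed comparable).
national_total = 24_821

# Rank states from highest to lowest token count.
ranked = sorted(top_five.items(), key=lambda item: item[1], reverse=True)

top_five_sum = sum(top_five.values())
share = top_five_sum / national_total * 100

for state, tokens in ranked:
    print(f"{state}: {tokens:,}")
print(f"Top five combined: {top_five_sum:,} ({share:.1f}% of national total)")
```

On these numbers, the five states together account for a little under three-fifths of all L1-Sanskrit tokens returned in 2011.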

Comparing the better-performing states in 2011, notice the dramatic changes that have occurred since 2001. Did Uttar Pradesh have a mass exodus of Sanskrit speakers to other states? While it was clearly the highest-ranking state in 2001, and suffered a significant 57% reduction, it still ranks third in 2011. This table shows that Sanskrit tokens are predominantly located in the Hindi belt.

Having compared the 2011 and 2001 censuses, it is clear that, from one census to the next, a district might lose upwards of 90% of its tokens. This is, in some ways, even more curious. Let’s look more closely at Uttar Pradesh. This next map shows the district-level mother tongue tokens from 2011. Kanpur Nagar, Sitapur and Sultanpur are the top three districts. Interestingly, places one might associate with Sanskrit, such as Benares, do not have high numbers of tokens.

Let’s zoom in more closely on Sitapur district. This map shows the location of Sitapur district within Uttar Pradesh. In the following map, Sitapur district’s sub-districts (tehsils) are compared.

In 2001, Sitapur district returned the highest number of L1-Sanskrit tokens (4,222) across the nation. However, its fortunes have since turned. In 2011, it only returned 722. Where did all the alleged Sanskrit speakers in Biswan sub-district go?

The following table ranks the nation’s 2011 top 12 districts. Notice that Sitapur district has dropped to fifth place. Maharashtra has four districts, Bihar has three districts, Uttar Pradesh has two districts, and Madhya Pradesh, Rajasthan and Karnataka each have one district. Also, 52% of the top ten total is urban. The main distinction between rural and urban is whether a community has a population over 5,000 residents.

The next table ranks the nation’s 2011 top ten sub-districts. While Uttar Pradesh and Maharashtra both have three sub-districts, Rajasthan, Madhya Pradesh and Bihar each have one sub-district. At this administrative level, the predominance of urban tokens is even more pronounced, at 58%. In Karnataka, 56% of the state’s total is located in the Bangalore sub-district.

Let’s move across to Madhya Pradesh and focus, first, on Jhiri, which is known as the “Jurassic Park” of Sanskrit villages. As part of my Imagining Sanskritland project, I have written several articles and made a few short films about Jhiri. It is located in Sarangpur sub-district, Rajgarh district. Supposedly the 976 villagers only speak fluent Sanskrit, all the time. They shifted from the kheti bhasha, Malvi, after inviting Samskrita Bharati to come and run a Sanskrit training programme almost 20 years ago. Today, however, the sub-district barely returns any L1-Sanskrit tokens. While the sub-district total is 18 (sixth in MP), the district total is 61 (22nd in MP). Yet, this hamlet supposedly remains “lost in time” and indelibly mentioned on India’s Sanskrit village lists.

This next map shows all the L1-Sanskrit returns at the district level.

Yet Pipariya sub-district and Hoshangabad district are, respectively, the highest-ranked sub-district (26% of MP’s total) and district (28% of MP’s total) in the state. A quarter of the district’s total, and 5% of the state total, comes from L1-Sanskrit Scheduled Tribes (ST) tokens.

The L2 languages are Bhili, Bhilodi, Gondi, etc. But between the two censuses their numbers have declined, in some cases by over 50%. For example, the following table shows L3-Sanskrit’s position in this district in relation to both Bhili/Bhilodi and Gondi as the L1.

How is it that these villages in Hoshangabad district, which supposedly have mother tongue speakers of Sanskrit, are not on any Sanskrit village list on the internet? If we glance back up to the top ten sub-district table, we notice that Pipariya is not the highest-ranked sub-district. Both Dighalbank, Bihar (558) and Pachpahar, Rajasthan (531) return higher token counts. Both are also overwhelmingly rural. Logically, this means that these L1-Sanskrit tokens are to be found in villages. Yet they, too, do not find any mention on any Sanskrit village list on the internet.


While the L1-Sanskrit total rose from 14,135 in 2001 to 24,821 in 2011 (an increase of about 76% on the 2001 figure, with the new tokens making up 43% of the 2011 total), the L2 and L3 numbers dropped by 9% and 48%. It is possible that the L2 and L3 figures reflect a more real-world situation. My position is that the mother tongue Sanskrit figure is possibly more aspirational than literal. If that is the case, then this is a worrying sign for Sanskrit’s future.
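How large the L1 increase looks depends on the base it is measured against. The sketch below uses only figures quoted in this article, and the helper names are illustrative. Measured against the 2001 total, the rise is about 76%. The figure of 43% corresponds to the increase as a share of the 2011 total. The same helper, applied to Sitapur district’s fall from 4,222 to 722 tokens, gives a decline of roughly 83%.

```python
def pct_change(old: int, new: int) -> float:
    """Percentage change from old to new, measured against the old value."""
    return (new - old) / old * 100


def increase_as_share_of_new(old: int, new: int) -> float:
    """The absolute increase expressed as a percentage of the new value."""
    return (new - old) / new * 100


# National L1-Sanskrit totals quoted in the article.
l1_2001, l1_2011 = 14_135, 24_821

national_change = pct_change(l1_2001, l1_2011)
increase_share = increase_as_share_of_new(l1_2001, l1_2011)

# Sitapur district L1-Sanskrit tokens, 2001 and 2011, as quoted in the article.
sitapur_change = pct_change(4_222, 722)

print(f"National change on 2001 base: {national_change:.1f}%")
print(f"Increase as share of 2011 total: {increase_share:.1f}%")
print(f"Sitapur change on 2001 base: {sitapur_change:.1f}%")
```

Stating the base explicitly matters here because the two readings of the same pair of totals differ by more than 30 percentage points.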

Also, Sanskrit clusters in such a predictable pattern that it is surprising if variations of the Hindi-English-Sanskrit cluster do not feature. This H-E-S clustering predominates regardless of Sanskrit’s position as an L1, L2 or L3. For example, considering Sanskrit as the L1, the table below shows how Hindi and English combine to give respective L2 and L3 totals of 69% and 77%. In several cases, this rises to above 90%.

What this data tells us is that it is very difficult to believe the notion that Jhiri is a “Sanskrit village” where everyone only speaks fluent Sanskrit at a mother tongue level. It is also difficult to accept that the lingua franca of the rural masses is Sanskrit, when the majority of L1, L2 and L3 Sanskrit tokens are linked to urban areas.

The predominance of Sanskrit across the Hindi belt also shows a particular cultural/geographic affection that does not spread equally across the rest of the country. In addition, the clustering with Hindi and English, in the majority of variations possible, also suggests that a certain class element is involved.

Essentially, people who identify as speakers of Sanskrit appear to be urban and educated, which possibly implies that the affiliation with Sanskrit is related in some way to at least some sort of Indian, if not Hindu, nationalism.

Patrick McCartney, PhD, is a Research Affiliate at the Anthropological Institute at Nanzan University, Nagoya, Japan. He is trained in archaeology, anthropology, sociology and historical linguistics. His research agenda focuses on charting the biographies of Yoga, Sanskrit and Buddhism through a frame that includes the politics of imagination, the sociology of spirituality, the anthropology of religion, and the economics of desire. His social media handle is Patrick McCartney