India's Endangered Languages Need to Be Digitally Documented

Two hundred and twenty Indian languages have died in the last 50 years. With minimal and ineffective government efforts, what can we do to save our languages?

Every two weeks, a language dies, and with it a wealth of knowledge is lost forever. In India alone, there are more than 780 languages. The rate at which languages are dying here is extremely high: over 220 languages have died in the last 50 years. In India, 197 languages are categorised as endangered. UNESCO further divides these into four subcategories – vulnerable, definitely endangered, severely endangered and critically endangered.

Out of these 197 languages, only two – Boro and Meithei – have official status in India, while many others do not even have a writing system. When one takes into account the fact that 7.8 million Indians are visually impaired, there is a pressing need to use digital tools to preserve and grow India’s endangered languages. While there has been some effort to do this for the 22 recognised official languages of India, the remaining languages have not received any attention.

The recent death of a language like Eyak confirms that more often than not, a language dies with the death of the older members of a tribe.

The endangered languages – which account for 96% of the total number of languages in the country – and indigenous languages of India largely lack accessibility tools. In fact, the accessibility tools that exist for most Indian languages are proprietary and not affordable.

India’s Internet user population is growing by leaps and bounds – the total number of Internet users in this country will reach 450-465 million by the end of this month. Of India’s 1.27 billion people, more than 30% are illiterate, and only 10-30% understand English, which is predominantly the language of the Internet. A recent Google-KPMG report states that more than 70% of India’s Internet users trust content in their native language over English. The lack of native language content and the lack of electronic accessibility tools therefore play an important role in stopping a large number of people from accessing information and contributing to the knowledge commons.

When confronted with a problem of this magnitude, there are a few vital things that must be done to preserve dying languages. Creating audio-visual documentation of some of the most important socio-cultural aspects of the language, such as storytelling, folk literature, oral culture and history, is a start. When native speakers make these recordings and annotate them in a widely spoken language such as English or Hindi, the result is a digital resource in the language. These resources can be used to create content and linguistic tools to grow the languages’ reach.

Sadly, there is little that the central government is doing for these languages. Several organisations, however, are making some effort to document native languages.

There is something that every individual who speaks a less-spoken language, or is in contact with a native speaker of an endangered or indigenous language, can do. Languages that are dying need digital activism to grow educational and accessibility tools. That can happen when more public and open repositories, such as dictionaries, pronunciation libraries and audio-visual content, are created.

However, not many people know how to contribute in a form that can be used by others to grow resources in a language. Especially in India, contributing to a language is largely equated with producing and promoting literature. But in a country where more than 30% of the population is illiterate and a large number of languages are primarily oral, it is important that the language content is predominantly audio-visual and not just text. More importantly, there is a need for openness so that the whole idea of growing languages is not jeopardised by proprietary methods and standards.

But if not literature, what does one really contribute to?

There are plenty of things one can do to contribute towards documenting a language, depending on one's skill set. Every language has a wealth of oral literature, which is the most crucial thing to document for a dying language. Several cultural aspects, such as folk storytelling, folk songs, and other narratives about cooking, local festival celebrations, performing art forms and so on, can be documented in audio-visual form.

Thanks to cheaper smartphones and an ocean of free and open source software, one can now record audio, take pictures and shoot videos in really good quality without spending anything on gear. There are open toolkits that aggregate open source tools, educational resources and sample datasets that one can modify and use for one's own language.

In the age of AI and IoT, one can indeed build resources that will make one's language more user friendly. As noted earlier, screen reader software that visually impaired or illiterate people could use does not exist for most of these languages because of the lack of good quality text-to-speech engines. Creating pronunciation libraries of words in a language can help in building both text-to-speech and speech-to-text engines, which can eventually improve screen readers and other electronic accessibility solutions. Cross-language open source tools like LinguaLibre, Kathabhidhana and Pronuncify help record large numbers of pronunciations. Similarly, for languages with an alphabet, educational resources for language learning can be created with open source tools like Poly and OpenWords.

Building these resources might not transform the state of many endangered languages quickly, but it will certainly help to gradually improve the way many people access knowledge in their language.

The work of groundbreaking initiatives like the Global Language Hotspots by the Living Tongues Institute for Endangered Languages and National Geographic can be used to start language documentation projects. But it is always recommended to make the work output available with open standards so that others can build solutions on top of existing interventions.

In 1969, the Indian government established the Central Institute of Indian Languages (CIIL) in Mysore to further research and documentation of Indian languages. In 2014, the Ministry of Human Resource Development introduced a scheme called “Protection and Preservation of Endangered Languages of India” to enable CIIL to initiate projects for endangered language conservation.

However, there has not been much in the way of actual outcomes from government-led activities for endangered language documentation, especially when it comes to open access to any published works. The People’s Linguistic Survey of India (PLSI), a non-government-led survey, was conducted during 2012-13 under the leadership of Ganesh Devy.

A few years back, Gregory Anderson, founder of Living Tongues, and K. David Harrison, associate professor at Swarthmore College in Pennsylvania, US, discovered a hidden language called Koro spoken in Arunachal Pradesh. In 2014, Marie Wilcox, the last living speaker of Wukchumni, a North American language, created a dictionary to keep her language alive.

Imagine where these languages would have ended up if Anderson, Harrison and Wilcox had not taken those baby steps back then. India’s linguistic and tech community must start doing the same.

Subhashish Panigrahi is a Bangalore-based communications, partnership and community strategist, educator, and a long-time Free/Libre and Open Source advocate and contributor. He has worked for over six years at global nonprofits like Mozilla, the Centre for Internet and Society, and the Wikimedia Foundation. Follow him on Twitter at @subhapa