Daily News

LDC-IL launches 16 datasets to drive AI research in Indian languages

Published January 10, 2024

The Linguistic Data Consortium for Indian Languages (LDC-IL), which operates under a Ministry of Education scheme, creates digital corpora in Indian languages. At the 8th Project Advisory Committee meeting at the Central Institute of Indian Languages (CIIL) in Mysuru, chaired by CIIL director Shailendra Mohan, LDC-IL introduced 16 new datasets in Indian languages. The initiative aims to advance research in Artificial Intelligence (AI) and Machine Learning (ML) by providing curated language resources.

These datasets, described as the first of their kind, are designed to support the development of Indian-language technologies such as Automatic Speech Recognition and Live Voice Translation, and to improve the precision and efficacy of Indian-language tools. They cover 12 scheduled languages: Hindi, Bengali, Tamil, Marathi, Kannada, Malayalam, Odia, Assamese, Konkani, Maithili, Urdu, and Nepali. They also include two variants of Indian English: the Bengali variant and the Kannada variant.

In a notable move, the institute also released datasets for Chhattisgarhi, which has traditionally been grouped with Hindi. This reflects the government’s commitment to advancing education and technology for all mother tongues in India, in line with the recommendations of the National Education Policy 2020.

The datasets are available on the LDC-IL Data Distribution Portal at https://data.ldcil.org. LDC-IL, the largest repository of curated text and speech resources in Indian languages, now holds a total of 57 datasets covering 21 Indian languages.

These datasets are distinct in that they consist of real-world data collected from verified sources rather than crowd-sourced data. They serve as crucial resources for training and benchmarking AI and Generative AI-based technologies. The applications built on these datasets are expected to promote and strengthen linguistic diversity in India.

Apeejay Newsroom
