Language and literacy constraints have left millions on the margins of India’s digital story, denying them a range of speech-based assistive technologies. Researchers at the Indian Institute of Science (IISc) are fine-tuning a multilingual speech-to-text project that could help democratise speech technologies in the country.
REcognising SPeech in INdian languages (RESPIN) was conceived to address the issue of inadequate speech data in Indian languages and cater to the large majority of users who can communicate only through spoken languages and dialects. Launched in May 2021, RESPIN is set for completion by the end of 2023.
IISc has been collecting voice samples in nine languages (Bengali, Bhojpuri, Chhattisgarhi, Hindi, Kannada, Magadhi, Maithili, Marathi, and Telugu), along with their dialects. These samples could be used to develop accurate speech-to-text and natural language processing technologies for two domains: agriculture and finance.
“The work on the project is nearly complete. We are in the advanced stages of data curation and are looking at a release by the end of the year,” Prasanta Kumar Ghosh, associate professor at the Department of Electrical Engineering, IISc, told DH.
The researchers are collecting about 1,000 hours of speech samples in each of the nine languages. About 2,000 native speakers of each language are asked to read out sentences relevant to the two domains. The recorded data, to be released as open source, can be used to train machine learning models to handle users' requests and questions. In their advanced form, these models can respond to queries even when they are phrased in informal, spontaneous language.
Tech for social inclusion
India's digital push has reached large sections of its population, introducing them to a host of tech-enabled services, from farming expertise to insurance schemes to healthcare, but these technologies have also been limited by language barriers.
A user who has a smartphone and mobile data but can speak only in his or her native dialect could still end up unable to avail of the desired service, whether it is an informed recommendation on the right fertiliser to use or reporting the loss of an ATM card. Apps built on RESPIN could bridge this gap.
The project is funded by the Bill and Melinda Gates Foundation and is being implemented in partnership with conversational AI startup Navana Tech and IISc’s ARTPARK (AI and Robotics Technology Park).
Ghosh said extensive work has gone into collecting and curating the samples to get the dialects right. In Kannada, dialects from regions including Ballari, Kalaburagi, and Mysuru are being incorporated.