Introduction
The Government of India has selected Bengaluru-based Sarvam AI as the first startup to develop an indigenous foundational Large Language Model (LLM) under the INDIAai Mission. This initiative is part of a broader strategy to enhance India's AI capabilities and democratize AI innovation and computing access.
INDIAai Mission
- Announced with a budget of ₹10,372 crore (~$1.25 billion) in March 2024.
- Aims to democratize AI innovation, enhance data quality, and position India as an AI powerhouse.
- Plans include establishing a high-end, accessible computing facility with 18,693 GPUs.
Sarvam AI's Role
- Sarvam AI will be given access to 4,096 Nvidia H100 GPUs for six months through providers such as Jio, CtrlS, Yotta, and Tata Communications.
- The model aims to handle sophisticated reasoning and voice-first interactions across 22 Indian languages as well as English.
- It is expected to compete with global LLMs and to be ready for population-scale deployment within six months.
Challenges in Development
Speech Babel
- Difficulties in obtaining and curating large datasets that represent India's linguistic diversity.
- Complexity in building models that recognize diverse grammars, structures, and syntaxes of Indian languages.
- Need to address biases related to gender, religion, and caste.
Content and Data Issues
- Data cleansing is time-consuming.
- Copyright and licensing present significant hurdles.
Talent and Workforce
- Building LLMs requires skilled researchers, engineers, and linguists, who are scarce and hard to retain.
- Success depends on engaging researchers and industry experts to build applications on these models.
Technological and Infrastructural Challenges
- Interoperability with diverse devices and platforms is challenging.
- Adapting to evolving technologies and optimization techniques is crucial.
Infrastructure and Data Initiatives
India's contribution to global AI research is relatively small and its cloud infrastructure is still developing, but efforts are under way to boost AI funding and close the data gap. The AIKosh platform was recently launched to improve access to datasets.
Collaborations and Global Engagement
- Formal government exchanges and partnerships with global universities can help reduce India's lag in AI research.
- Multinational corporations have established around 2,975 Global Capability Centers (GCCs) in India, employing 1.9 million professionals and generating $65 billion in revenue.