Knowledge Without Consent
The Ministry of Science & Technology has partnered with SARAL AI to convert research publications into ‘user-friendly’ social media content. But without clear parameters in scope, the initiative leaves ambiguity for the rights of researchers, academics, writers, publishers and more.
Have you ever scrolled past a social media creator who successfully explains a complex idea, summarising it into conversational bits fit for a short Instagram reel, as if it were common knowledge? Now what if that summarisation were performed by an AI system? It doesn’t seem too harmful at first glance, right? In fact, it might even be perceived as a natural extension of the content we already consume, one that makes life easier for those in research and academia.
Except, there may be more to the story. On April 13, the Ministry of Science & Technology announced the launch of SARAL AI under the Anusandhan National Research Foundation (ANRF), an AI system designed to interpret research papers and translate them into ‘user-friendly’ social media content, podcasts, and videos in 18 Indian languages. ANRF has evaluated nearly 20,000 research applications in just the last four months, outputs that represent an expanding national research space whose knowledge the new system is designed to transform and circulate. The Ministry said the initiative will make knowledge and research more accessible through large-scale digital circulation.
Making research accessible and funding knowledge production by building an AI system to circulate and transform that research sounds good, right? But there is one major caveat: India does not have any specific AI policy, and yet the state has funded an AI system that will likely be trained on the intellectual contributions of the country’s scholars. Does this system operate with the consent of those whose work it relies on? Are there any mechanisms for compensation when that work is used as training data? Is attribution guaranteed when derivative outputs are circulated? The Ministry’s announcement does not offer answers to these questions.
As research by American linguist Emily M. Bender and her co-authors has pointed out, LLMs are trained on vast amounts of text drawn from publicly available sources, often assembled through large-scale web scraping with limited transparency about what is included. This means that if you are a writer, an academic, a researcher, or a journalist based in India today, there is a real possibility that an AI system is reading, processing, and redistributing your work as simplified social media content. This is not a hypothetical situation for the near future, but already a pertinent reality of India’s AI ecosystem.
The official Ministry statement frames SARAL AI as a tool for simplifying research outputs. However, it does not set out any clear parameters restricting its scope to academic publishing alone. At a time when large-scale Text and Data Mining is foundational to AI training, the absence of defined limits opens the possibility of similar models extending beyond academic publications into journalism and literary writing as well. This lack of clarity becomes more consequential when viewed against the scale of India’s research ecosystem. India is among the world’s largest producers of research, publishing over 300,000 research papers annually, yet its Gross Expenditure on Research and Development (GERD) has remained at approximately 0.6-0.7 per cent of GDP, with the private sector contributing 36 per cent of that funding. The result is vast research output from a system that lacks sufficient institutional and financial backing.
At the same time, research in India remains confined within a higher education system largely organised around evaluation and credentialing. Only a small proportion of institutes prioritise research-based learning; the majority remain focused on classroom teaching, with academic output defined by marks and exams rather than sustained curiosity. In such a system, the value of research is already unevenly supported, making its unregulated transformation and redistribution through AI systems a matter of policy concern rather than merely a new technological advancement.
These actions have direct implications for research incentives. When intellectual work can be systematically processed, simplified, and redistributed without clear safeguards for consent, attribution, or compensation, original publications risk losing their rightful incentives in an already underfunded system. The outcomes begin to mirror those of piracy and uncompensated mass circulation, the only difference being that the extraction here is performed by a state-backed AI system. Without accountability or clearly articulated safeguards, what begins as a tool for research accessibility could transform into a system operating on a far broader scale.
The EU offers one model. Under its Digital Single Market Directive (Directive (EU) 2019/790), Article 4 permits Text and Data Mining but allows creators to opt out of such use. In the US, ongoing legal challenges such as The New York Times v. OpenAI have brought concerns about unauthorised training data into focus, with media institutions arguing that AI systems are built on the uncompensated use of their archives, forcing courts to examine what qualifies as ‘fair use’ and what does not.
These mechanisms may not be perfect, but they acknowledge that intellectual and creative contributions are the result of sustained work and cannot be treated as freely extractable resources.
These global developments highlight what is missing in the Indian context. While other jurisdictions are attempting to define the terms under which intellectual labour can be used, India’s current approach foregrounds deployment without a comparable emphasis on safeguards.
India, however, instead of offering similar protective mechanisms in its AI initiatives or existing legal structures, has produced a state-backed AI system that will be trained on the country’s intellectual contributions, further normalising large-scale data extraction without protections for the people who produced that work.
The Indian Copyright Act, 1957, in Section 52(1)(a), permits “A fair dealing of any work… including research purposes.” Whether this provision extends to AI-driven summarisation, interpretation, and format conversion at scale remains ambiguous. When writers’ intellectual property is systematically processed and circulated at scale, is it still ‘fair dealing’ for research?
Along with SARAL AI, ANRF has also launched research-driven programmes across sectors such as climate resilience, agriculture, infrastructure, and public health, while onboarding nearly 250 institutions to streamline research processes. This indicates a system rapidly scaling both knowledge production and its circulation, all without a framework governing how that knowledge is reused once digitised and transformed. The initiative signals a priority on access, scale, and infrastructure development: SARAL AI is positioned as a solution for democratising research without a proper articulation of the rules governing its datasets.
If systems like SARAL AI become embedded within social media content, the line between the original work and its simplified derivative begins to blur. Simplified summaries and AI reinterpretations lose the nuances of a research paper, making the work susceptible to going viral out of context, particularly on platforms whose algorithms are designed for accelerated audience reach. In such a scenario, the visibility of original authors is compromised, especially when attribution is not structurally guaranteed. This raises ethical and practical concerns for creative and research professionals.
If AI systems are going to mediate how knowledge circulates, then the terms of that mediation matter. Initiatives like SARAL AI, presented as tools that make “research accessible”, risk normalising a model in which intellectual work is processed and circulated without clear safeguards for consent, attribution, or compensation. For writers and researchers, this directly affects control over their work, the visibility of their authorship, and their ability to derive value from what they produce.
And the questions remain unanswered: Does this system operate with the consent of those whose work it relies on? Are there any mechanisms to compensate them for their work? Is attribution guaranteed irrespective of the form the output takes? And most importantly, whose work is being made accessible, and at what cost?
Until these questions are answered in policy terms, the cost of making knowledge accessible will continue to be borne by those who produce it.
***
Madhuri Kankipati is a reader, writer, translator and independent researcher based in Khammam, Telangana. Her work explores the intersections of literature, culture, gender, pop culture and technology. She translates from Telugu to English, with a particular focus on women writers. Currently, she is independently researching the impact of GenAI on Indian publishing, authorship and creative labour from a policy perspective. Previously, her writing has appeared in Borderless Journal and Muse India. She also runs an Instagram account dedicated entirely to arts and literature: @withlovemadhuu.