
NASSCOM Backs Text & Data Mining Exemptions for AI in India

MEDIANAMA
Edited by Global AI News Editorial Team
Reviewed by Senior Editor
Published Dec 30, 2025

IT industry body National Association of Software and Service Companies (NASSCOM) has called for a blanket exemption from copyright infringement for training AI models, in its submission responding to a working paper on generative AI and copyright released by a committee under the Department for Promotion of Industry and Internal Trade (DPIIT).

It proposed introducing a text and data mining (TDM) exception to India’s copyright laws that would allow AI firms to train their models on copyrighted works as long as the content is accessed lawfully and “a good faith knowledge safeguard is met”, solely for the training and input-processing stage of machine learning.

NASSCOM counts Big Tech companies such as Microsoft and Google among its members. Co-founders of several Indian AI startups, including Fractal Analytics and Sarvam AI, serve on the IT industry lobby group’s executive council. While their role in the submission remains unclear, it would not be surprising if the proposal reflects the views of NASSCOM’s members and executives.

Here is a deep dive into the core arguments NASSCOM made in its submission to the government panel.

According to NASSCOM, India should permit TDM for training AI systems for both commercial and non-commercial purposes. However, the TDM exception should be without prejudice to applicable laws that protect specific categories of data, including personal and confidential data, it noted.

The IT industry body also called for a good faith knowledge safeguard to protect AI developers who pass the lawful access test: the safeguard would apply so long as the TDM user “does not know” that the source of the data is infringing.

On the flip side, it argued that TDM protection should not apply where developers knowingly rely on a source that is infringing or poses a heightened risk of unlawful access.
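
To make the structure of this proposed safeguard concrete, here is a minimal sketch in Python of how the two conditions NASSCOM describes (lawful access plus a good faith knowledge test) might combine. The function name and inputs are illustrative assumptions, not part of the submission, and the sketch deliberately flattens what would be nuanced legal questions into booleans.

```python
def tdm_exception_applies(lawfully_accessed: bool,
                          knows_source_is_infringing: bool,
                          heightened_risk_of_unlawful_access: bool) -> bool:
    """Illustrative sketch of NASSCOM's proposed safeguard, not legal advice.

    The exception would cover TDM only where the content was accessed
    lawfully AND the developer neither knows the source is infringing
    nor knowingly relies on a source posing a heightened risk of
    unlawful access.
    """
    if not lawfully_accessed:
        return False  # fails the lawful access test
    if knows_source_is_infringing or heightened_risk_of_unlawful_access:
        return False  # good faith knowledge safeguard not met
    return True


# Lawfully accessed data, no knowledge of infringement: exception applies.
print(tdm_exception_applies(True, False, False))  # True
# Developer knowingly relies on an infringing source: exception does not apply.
print(tdm_exception_applies(True, True, False))   # False
```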

In its 24-page written submission, NASSCOM said copyright owners should be provided clear statutory protection against TDM. To safeguard rights holders, it proposed a dual-track approach to opting out: machine-readable opt-outs for publicly available online content and contract-based opt-outs for content that is not publicly accessible. Additionally, the IT industry body called for a legal framework that would allow the government to override copyright owners’ opt-outs in certain cases. For instance, if the data is used for research, cultural heritage activities, or other “public interest” uses, access should be considered lawful, NASSCOM said. However, this override would not apply to content that is not publicly accessible, it added.

While advocating this dual-track model, NASSCOM also flagged challenges with implementing such opt-outs industry-wide.
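
As a rough illustration of what a machine-readable opt-out can look like in practice, the sketch below uses Python’s standard urllib.robotparser to check a robots.txt file. The crawler name “ExampleAIBot” and the robots.txt entries are invented for illustration; real opt-out conventions (robots.txt directives, TDM reservation protocols, metadata tags) vary across crawlers and publishers, which is part of the implementation challenge NASSCOM flags.

```python
from urllib import robotparser

# Hypothetical robots.txt: "ExampleAIBot" stands in for an AI-training
# crawler that the site wants to opt out of; other crawlers stay allowed.
robots_txt = """
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
""".splitlines()

parser = robotparser.RobotFileParser()
parser.parse(robots_txt)

url = "https://example.com/articles/some-article.html"
print(parser.can_fetch("ExampleAIBot", url))   # False: site has opted out of this crawler
print(parser.can_fetch("SomeSearchBot", url))  # True: ordinary crawlers remain allowed
```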

Currently, companies such as OpenAI, Meta, and Anthropic are embroiled in lawsuits worldwide, including in India, for allegedly using copyrighted materials without authorisation to train their AI models. The debate centres on whether AI’s conversion of source works into mathematical representations, producing statistically derived outputs with no direct, traceable link to the original work, constitutes copyright infringement.
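
For readers unfamiliar with what “mathematical representations” means here, the toy sketch below turns a sentence into a simple word-count vector. It is a deliberate oversimplification (real generative models use learned token embeddings and billions of model weights, not word counts), intended only to show how text becomes numbers from which the original wording is not directly stored.

```python
from collections import Counter

# Toy illustration only: not how any particular AI model represents text.
text = "the quick brown fox jumps over the lazy dog"
vocab = sorted(set(text.split()))
counts = Counter(text.split())

# The "mathematical representation": a vector of word frequencies.
vector = [counts[word] for word in vocab]
print(vocab)
print(vector)  # [1, 1, 1, 1, 1, 1, 1, 2] -- word order, and much else, is gone
```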

In its submission to the DPIIT committee, NASSCOM argued that TDM is “at most a technical infringement” and not a real copyright violation. “AI training requires accessing and, in some form, copying and storing data, but without showing, distributing, or transmitting it to people. These are purely machine-only steps to help models learn patterns. Interpreting the law to treat each technical step as an infringement would create legal uncertainty or require mandatory licences never intended,” it said.

NASSCOM contended that, to foster AI innovation, it is essential to recognise that mere access, copying, and storing can serve a new purpose unrelated to communication or transmission and should not be treated as infringement. It cited examples from foreign jurisdictions such as the US, EU, UK, Israel, Japan, and Singapore, which already allow TDM.

The Centre is currently reviewing the adequacy of the Copyright Act, 1957, to address challenges arising from the use of generative AI, Minister of State for Commerce and Industry Jitin Prasada told the Lok Sabha on December 17.

In its working paper, the DPIIT committee noted that a majority of stakeholders from the tech and AI industry advocated a blanket TDM exception enabling training of generative AI on copyright-protected works. However, the committee rejected this policy approach, saying such an exception for commercial purposes would undermine copyright and leave creators powerless to seek compensation for the use of their works in AI training.

The committee also examined a TDM exception with an opt-out right for copyright holders. This model was found inadequate, with the committee arguing that while it may benefit large content industry players, it would leave small creators largely unprotected due to a lack of awareness, bargaining power, and mechanisms to verify whether their content had been scraped despite opting out.

“Moreover, opt-out functionality does not prevent downstream reuse once the content is stripped of its metadata and transformed; hence, control over the data is irrecoverably lost. This model may also limit the availability of broad and representative datasets for AI training, especially if many rights holders choose to opt out. This could compromise the quality of AI systems,” the DPIIT committee said.

In light of the above, the DPIIT committee decided to adopt a hybrid approach and recommended a mandatory blanket licence in favour of AI developers, allowing them to train their models on copyright-protected works as long as the content is lawfully accessed.

Under this framework, copyright owners would not have the option to withhold their works from AI training. However, developers would be required to pay a certain percentage of the revenue generated from AI models trained on copyrighted content as royalties to creators.
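
The working paper does not fix the royalty rate or the distribution mechanism, so the following sketch is purely hypothetical arithmetic: it assumes an illustrative revenue share and splits the resulting pool evenly among rights holders, just to show the shape of the calculation such a blanket licence would involve.

```python
# Hypothetical figures only: the DPIIT working paper does not specify a
# royalty rate or how the pool would be divided among rights holders.
model_revenue = 10_000_000   # annual revenue attributed to the AI model (assumed)
royalty_rate = 0.03          # assumed 3% revenue share, for illustration only
num_rights_holders = 500     # assumed number of participating copyright owners

royalty_pool = model_revenue * royalty_rate
per_rights_holder = royalty_pool / num_rights_holders

print(f"Royalty pool: {royalty_pool:,.0f}")                          # 300,000
print(f"Per rights holder (even split): {per_rights_holder:,.0f}")   # 600
```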

The government panel said that this model aims to ensure the availability of all lawfully accessed copyrighted content for AI training as a “matter of right,” provide “fair compensation” to copyright owners, and reduce transaction costs and compliance burdens for AI developers.

However, it raises a critical question: wouldn’t this approach effectively take away creators’ free will by denying them the choice of whether their work is used for AI training, or the ability to negotiate prices independently? After all, forced consent cannot be treated as consent.


Primary Source

MEDIANAMA