The Copyright Play: AI, Section 52, and the Acts of Fair Dealing

Written as a five-act drama, this paper analyzes AI training under Section 52. It argues for recognizing tokenization as transformative research, focusing on the training process, which is distinct from the generated output, and contending that the process is a non-infringing necessity that encourages innovation.

Ritesh Karale & Ritaja Chattopadhyay*

March 15, 2026

Act I: Setting the Stage

AI algorithms today are trained on humanity’s vast library of creation: every book ever written, every image uploaded, every line of code shared online. Large Language Models (“LLMs”) are built on these enormous digital datasets and are celebrated as tools that could transform culture, business, and knowledge itself. The question that remains unanswered, therefore, is whether AI training is simply an innovation or an act of infringement.

The contemporary relevance of the same is reflected in the pending litigation before the Delhi High Court in ANI Media Pvt. Ltd. v. OpenAI (“OpenAI Case”), where the use of copyrighted news content for training LLMs has been directly challenged. While no judicial determination has been made so far, the dispute foregrounds unresolved questions at the intersection of copyright and artificial intelligence, including the legal characterisation of storing copyrighted works for model training, and the scope of Section 52 to accommodate such uses within its ‘fair dealing’ framework. 

This analysis does not attempt to engage with the merits of the pending litigation; it draws on the dispute to argue that the process of training an AI model should be treated as a separate, non-expressive use, distinct from the outputs the model generates, and that fair dealing must be interpreted in light of this functional distinction.

Section 14 of the Copyright Act, 1957 (“the Act”) grants authors exclusive rights of reproduction, including storage in any medium by electronic means. On the other hand, Section 52(1)(a)(i) creates an important exception: fair dealing for “research”. The challenge lies in the interpretation. 

The act of scraping and storing copyrighted material to train AI models may prima facie appear to be copyright infringement. Scholars suggest that permitting AI training on copyrighted works without consent under the fair dealing exception could give AI a kind of blanket immunity that undermines authors’ rights. While these concerns are valid, they overlook a crucial distinction: the process of training a model, which involves the collection, storage, and processing of data, is technical and non-expressive in nature and does not in itself reproduce the original work. Expression arises only in the outputs generated by the model.

Recognising this distinction allows Section 52 to accommodate AI training without weakening the protections the Act grants to authors. However, a detailed inquiry is necessary to understand how the rigid structure of Section 52 can accommodate advancements in technology without legislative intervention. The scope of the analysis is, firstly, the question of whether the act of scraping and storing data to train an AI model amounts to copyright infringement; and secondly, the need to protect such training under the current copyright regime to fulfil the aim of the Act, which is to foster innovation and creativity. The courts may be required to address this issue through a progressively interpretive approach.

Act II: The Old Tests Falter

Indian courts have traditionally looked for “substantial” copying of protectable expression. In R.G. Anand v. Deluxe Films, the Supreme Court held that only such copying amounts to infringement. Later, in Eastern Book Co. v. D.B. Modak, the Court rejected the “sweat of the brow” approach and required “skill and judgment with a modicum of creativity” for originality. However, this standard presumes a human author. An AI model that trains on massive corpora does not exercise judgment or creativity; it processes data statistically. 

Further, India’s statutory “fair dealing” exceptions carved out in Section 52 of the Act, like the other exemptions to infringement under that section, are similarly ill-suited to AI training. Although they cover personal use, research, criticism, review, and reporting, they mention nothing about AI training or new technological advancements. While the Act does not define “fairness”, the Kerala HC in Civic Chandran v. Ammini Amma laid down a three-factor test to determine it. That test, however, must be interpreted in the context of AI training.

Taken together, these doctrines leave the training of LLMs in a legal limbo. This calls for a progressive interpretation of Section 52 of the Act that brings the training of LLMs within its fold and accounts for recent technological advancements.

Act III: Section 52 Takes the Stage

Section 52 of the Act enumerates an exhaustive and rigid list of acts that do not constitute infringement, though this rigidity is sometimes diluted through judicial interpretation. In contrast, Section 107 of the US Copyright Act is designed to provide broad statutory flexibility to accommodate technological advancements. Because the US model relies on inherent legislative open-endedness, whereas the Indian model relies on court-led adaptation, placing reliance on US jurisprudence risks steering the interpretation of Section 52 off course. Hence, the following analysis of Section 52 primarily relies on Indian jurisprudence, with only minimal reference to foreign authorities for comparative insight.

Section 52(1)(a)(i) of the Act outlines a statutory exemption for the fair dealing of any work, excluding a computer program, for private or personal use for research purposes. However, the Act lacks a definition of the term “research”. While research is commonly understood as a human activity requiring cognitive analysis, such a narrow interpretation is at odds with the approach taken by the Apex Court. As established in State of Maharashtra v. Dr. Praful B. Desai, statutes must be interpreted as “always speaking” to accommodate advancements that were unforeseen at the time of drafting. In this light, the term “research” in Section 52(1)(a)(i) should not be restricted to biological cognition but should include the functional extraction of patterns, syntax, and semantics.

An LLM’s processing of datasets into numerical tokens is the technological equivalent of a researcher internalising a library to identify linguistic structures. Nor do the commercial motivations behind LLM development automatically disqualify the training phase from being classified as research. Indian jurisprudence, particularly in Chancellor Masters v. Narendera Publishing House, has clarified that commerciality is not an absolute bar to fair dealing if the use is highly transformative.

Therefore, the term “research” should be interpreted in an expansive light, not narrowly restricted to the common understanding of the word. An LLM conducts research by scraping and storing the work to internalise patterns such as the structure, syntax, and semantics of language, which it retains as tokens rather than in the original format. The storage of a single copy thus functions merely as a technical necessity: the copy is converted into tokens that enable the model to identify elements like grammar, sentence construction, and word associations.
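A toy sketch may make this concrete. The snippet below (a deliberately simplified illustration, not the tokenizer of any actual LLM, which typically operates on sub-word units) shows how text is reduced to numeric token IDs: what the model’s training pipeline holds is a sequence of integers, not the human-readable expression.

```python
# Toy illustration only: real LLM tokenizers (e.g. byte-pair encoding)
# are more sophisticated, but the principle is the same -- text becomes
# numeric IDs, not stored prose.

def build_vocab(corpus):
    """Assign an integer ID to each unique word in the corpus."""
    vocab = {}
    for word in corpus.split():
        if word not in vocab:
            vocab[word] = len(vocab)
    return vocab

def tokenize(text, vocab):
    """Convert text into a list of numeric token IDs."""
    return [vocab[w] for w in text.split()]

text = "the quick brown fox jumps over the lazy dog"
vocab = build_vocab(text)
ids = tokenize(text, vocab)
print(ids)  # -> [0, 1, 2, 3, 4, 5, 0, 6, 7]
```

Note that the repeated word “the” maps to the same ID (0) at both positions: what survives tokenization is structural regularity, not the expressive form of the sentence.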

Furthermore, Section 52(1)(b) and (c) specifically provide an exemption for transient storage, with the condition that it shall not be expressly prohibited by the right holder. In MySpace Inc. v. Super Cassettes Industries Ltd., the Delhi High Court clarified that for storage to be considered “transient” and thus non-infringing, it must satisfy two distinct criteria: it must be temporary, and it must be subordinate to a process of greater significance. Training an LLM meets this dual test. Firstly, the raw data is held only as a technical prerequisite for conversion into tokens; the original expressive format is no longer required for the model’s functioning. Secondly, the reading performed by the LLM is not the final act of consumption but is subordinate to the greater process of training. At the same time, Section 65A of the Act prohibits accessing work protected by a technological measure, such as a paywall or access-control codes. Hence, the training of LLMs may be restricted to works that are freely or readily available in public domains.

Another critical aspect of Section 52(1)(a)(i) is the inclusion of “private or personal use” in the section. The use of LLMs to generate outputs may appear to be commercial use. However, where a work is used for a transformative purpose, courts have interpreted Section 52 in a manner that carves out an exception to infringement. While the Copyright Act does not define fairness or transformative use, Indian courts, specifically in cases like Civic Chandran v. Ammini Amma, have developed a de facto three-factor test that closely resembles several elements of the US fair use analysis.

Since the bare text of the Act in terms of fair dealing is narrower than the US fair use standard, courts remain divided on whether the said standard can meaningfully guide interpretation. Some courts read Section 52 narrowly, limiting fair dealing to uses that strictly match the statutory purposes, like research or criticism. Others have treated fair dealing more flexibly. 

The Civic Chandran case shows that Indian courts can adopt a purposive, three-factor approach that mirrors some elements of the US test. For training LLMs, this wider reading is necessary; a narrow reading runs counter to the principles of interpretation discussed above. The following analysis interprets the three-factor test laid down in Civic Chandran in the light of training LLMs.

Act IV: Rewriting the Three-Factor Test

Factor 1: The Purpose and Character of the Use

This first factor is primarily concerned with how and why the work was used. The copyrighted work, when used for training an LLM, is put to a fundamentally non-infringing and transformative use. A large volume of data, including copyrighted work, is scraped from publicly available sources, converted into tokens, and used to train the model to predict the next word.

This aligns with the transformative purpose of the test, where the copyrighted work is merely accessed to build a large dataset and is not expressed in its original form. In University of Cambridge v. B. D. Bhandari, the Delhi High Court ruled that creating a guidebook from a textbook served a “substantially different purpose” that did not compete with the textbook’s expressive purpose. The purpose of AI training is still more transformative than creating a guidebook, as it converts human-readable words into a set of numerical data points, or tokens, for building associations. While the expression of the work in an output responding to a user’s prompt may carry a commercial interest, courts have acknowledged that commercial use is not an absolute bar to fair dealing.
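What the model actually learns from the scraped text can be sketched in miniature. The following toy example (a simple bigram counter, far cruder than the neural networks behind real LLMs, but illustrative of the same “next-word prediction” objective mentioned above) shows that the artefact produced by training is a table of statistical associations, not a copy of the source text.

```python
# Toy illustration only: real LLMs learn next-token probabilities with
# neural networks, but the training objective -- predict what follows
# what -- is analogous to this bigram counter.
from collections import Counter, defaultdict

def bigram_counts(tokens):
    """Count which token follows which: the statistical associations
    the model internalises, rather than the work itself."""
    follows = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        follows[cur][nxt] += 1
    return follows

def predict_next(word, follows):
    """Predict the most frequent successor of a given word."""
    return follows[word].most_common(1)[0][0]

tokens = "the cat sat on the mat and the cat slept".split()
model = bigram_counts(tokens)
print(predict_next("the", model))  # -> "cat" ("cat" follows "the" twice, "mat" once)
```

The trained artefact here is `model`, a frequency table from which the original sentence cannot be read back verbatim; this is the sense in which the training phase is non-expressive while any later output is a separate event.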

Factor 2: The Amount and Substantiality of the Portion Used

The factor of amount and substantiality remains a challenge to bringing the training of LLMs within the ambit of the fair dealing doctrine, as the entire work of the author is utilised to train such models. However, the quantum cannot be assessed in a vacuum under the current standard: both conducting research and training an LLM require massive datasets. This was emphasised by the US courts, while dealing with the fair use doctrine, in Authors Guild v. Google (hereinafter “Authors Guild”), which permitted the scanning and scraping of entire books by machine-learning algorithms so that they could be indexed on Google Search. The reliance on US precedents like Authors Guild is not an attempt to import the open-ended Section 107 framework into India’s exhaustive Section 52; it is rather to apply a purposive interpretation that aligns with established Indian jurisprudence.

Indian courts have specifically de-linked the fairness of a dealing from the literal limitations of the statute, holding that if an act is necessary to achieve a transformative purpose, it fulfils the fairness requirement regardless of whether the entire work is utilised. On this reasoning, Authors Guild serves as a persuasive parallel: it identifies that scraping entire works for non-expressive indexing is a technical necessity for research that does not substitute the author’s market. Citing US jurisprudence in this context is therefore useful to define the scope of necessity within India’s own fair dealing doctrine.

Further, in Oxford University v. Rameshwari Photocopy Services, the Court implicitly permitted the use of a substantial part of copyrighted works to create course packs. The principle in that case was that the amount of copyrighted work used should be reasonable and necessary in the context of fair dealing. This is especially important for training LLMs, where the work is needed in its entirety to create large datasets and make the model efficient. The amount and substantiality of the portion used should therefore be interpreted in a contextual understanding of the training of LLMs.

Factor 3: The Effect of the Use on the Potential Market

The effect on the potential market remains a core factor in assessing the training of LLMs under the fair dealing doctrine. The primary concern of copyright law is to protect the economic interests of authors; hence, analysing the potential impact on the market is necessary to build a case for allowing the training of LLMs under fair dealing. The mere act of scraping data and using it for training has no effect on the market, as the work is not sold or made available to a human audience in its original human-readable format. The assessment of market harm must also be read alongside the statutory limits already imposed on access. As noted earlier, Section 65A of the Act prohibits the circumvention of technological protection measures, including paywalls, essentially confining LLM training to works that are freely or readily available in the public domain. The act of scraping and processing this data for training does not impair the market for copyrighted works, since paywalled or access-controlled content is excluded at the threshold. Training is thus a technical process that does not by itself produce a finished creative product of the kind that could infringe an author’s rights.

The potential market harm claimed by The New York Times, in The New York Times v. OpenAI, refers to the harm of making content hidden behind a paywall available to users upon a prompt. However, this is a separate legal event, which relates to the output and not the training of LLMs. Law addressing potential infringement at the output stage, where actual market harm occurs, will certainly be necessary; however, it should not affect the foundational process of training. The training process itself does not harm the market for the original works, and this satisfies the core test of fair dealing.

Act V: The Curtain Call

The analysis of the three-factor test is intended to shape the understanding of training LLMs in India. Whether AI training can be accommodated under existing copyright regimes remains an unresolved question across jurisdictions, and cases in which private entities or individuals challenge the use of their hosted data for training LLMs are on the rise. The pending OpenAI case before the Delhi High Court, the most notable of these, demands a clear judicial roadmap.

The outcome of the OpenAI dispute may hinge on whether the Court views the use of news content as a singular act of exploitation or as two distinct legal events. By applying the analysis, the Court could find that the initial scraping and tokenisation of ANI’s news archive for training is transformative and non-expressive, thus falling under the “research” exception of Section 52(1)(a)(i). 

Indian Courts should therefore interpret Fair Dealing progressively, which can safeguard the rights of authors and ensure that copyright law does not stifle innovation. The law must evolve to preserve its core purpose: to promote creativity and progress, not to hinder technological advancement. Legislative intervention may take time and political will; it will fall upon the courts to adopt a progressive interpretation.

*The authors are Year III students at MNLU Mumbai.
