Adobe is facing a proposed class-action lawsuit alleging the company used pirated books to train one of its artificial intelligence models. The suit, filed on behalf of Oregon-based author Elizabeth Lyon, claims Adobe relied on unauthorized copies of copyrighted works to develop SlimLM, a small language model designed for document assistance tasks on mobile devices.
According to the complaint, SlimLM was pre-trained on SlimPajama-627B, an open-source dataset released by Cerebras in 2023. The lawsuit alleges that SlimPajama is a derivative of the RedPajama dataset, which incorporated Books3, a collection of roughly 191,000 copyrighted books. Lyon claims her own works were included in this dataset without consent, credit, or compensation.
Books3 and RedPajama have been cited repeatedly in lawsuits targeting AI developers. Similar claims have been brought against Apple and Salesforce over alleged use of copyrighted material in training generative AI systems. The case against Adobe was first reported by Reuters.
The Adobe case follows a wave of high-profile copyright disputes involving AI developers and content owners. Disney has accused Google of training and deploying AI models that generate infringing images of its characters, while The New York Times has sued AI search startup Perplexity over alleged reproduction of copyrighted articles through retrieval-augmented generation. In the music industry, Warner Music Group recently settled a copyright lawsuit with AI startup Suno, signaling a shift toward licensing frameworks as companies seek to reduce legal risk around AI training and outputs.