During discussions of generative AI, we increasingly hear concerns being raised about copyright infringement. Indeed, a number of lawsuits have been brought over the last year regarding AI and copyright law. What’s this all about?

In a nutshell, generative AI, such as ChatGPT and Dall-E, doesn’t know anything by itself. It draws from millions, even hundreds of millions, of data points and “learns” based on what it finds. For example, the blog post you generated using ChatGPT didn’t come out of thin air. It’s the output of the AI model’s algorithm after having analyzed potentially millions of pieces of content, many of which may be copyrighted.

The same issue arises when you enter a prompt for an AI-generated image. When you generate a picture of a moody teenager in a crocheted hat scrolling on her phone, the AI platform generates that image by drawing on its understanding of the concept of “moody”; its analysis of millions of images of teenagers to determine which ones are and aren’t moody; and its analysis of millions of facial expressions, hairstyles, outfits, and body language, alongside millions of crocheted hats. It then takes all of those inputs (and more) and generates an image based on them.

So is that image of the scrolling, moody teenager with a crocheted hat truly unique? Or is it one one-millionth of every image AI uses as inspiration? Even if it is the latter, is that considered fair use?

What Does the U.S. Copyright Act Say?

According to the U.S. Copyright Act of 1976, four factors are taken into consideration when determining whether the use of a work, whether text or image, qualifies as fair use:

  • The purpose and character of the use of the copyrighted work.
  • The nature of the copyrighted work.
  • The portion of the work used.
  • The effect of the use on the market for, or value of, the original work.

For example, writers can quote a portion of a work as long as the quote is for editorial use and properly cited. But they can’t take a chapter from someone else’s novel and pass it off as their own. Likewise, artists can use someone else’s work as inspiration as long as the result is changed enough to be considered truly original.

How that applies to the world of AI is debated. While it might seem like common sense that content created from millions of inputs is “fair use,” the courts may not see it that way. As the law firm Baker Donelson notes: “As to the first factor, generative AI platforms are generally offered for a commercial purpose. Second, generative AI systems have the capacity to produce new works that closely resemble the originals. Third, generative AI technologies are trained with the whole of the copyrighted work. And, fourth, AI has the potential to generate an effective substitute for copyrighted work in the marketplace.”

Millions of Data Points

To quote the movie, “It’s complicated.” Certainly, the U.S. Copyright Act was passed in a world in which comparisons were between a generated work and a single original work, not millions of them. The sheer scale and scope of today’s datasets (on which the AI models are drawing) are mind-boggling.

The U.S. Copyright Office has launched a comprehensive review of the copyright system with a focus on generative AI, but it’s the government. Don’t hold your breath. In the meantime, companies like OpenAI have engaged in licensing deals with content owners (e.g., News Corp., Vox Media) to obtain permission to use their works for training. Just in case.

But is that overkill? It seems like common sense that using copyrighted material to train AI falls under “fair use,” especially when the AI model is analyzing millions of data points. But on narrower topics, the number of data points is much smaller. In these cases, it’s not unheard of for the AI output to look (or sound) an awful lot like the originals.

Can You Hold an Algorithm Responsible?

Even if you could prove that the works were similar, however, how would you establish intentional infringement? Simply because the output is similar? Even if you could prove that the content was drawn from another creator’s work, who should be held responsible?

Notes the law firm Crawford & Shultz: “When a generative AI system produces infringing content, be it an image of Mickey Mouse or Pikachu, courts will struggle with the question of who is initiating the copying. The AI researchers who gathered the training dataset? The company that trained the model? The user who prompted the model?”

Here’s another issue: Current U.S. copyright law requires human authorship for a work to be copyrightable. So if you can’t copyright AI works, then how can AI works be “responsible” for infringement?

If you’re thinking, “It’s a real mess,” you’re right. But there is a certain amount of common sense here, too. When someone creates a blog post using ChatGPT and publishes it without having added their own thoughts or tweaked the copy in any substantial way, do they really think they should claim it as their own? Likewise, if someone creates an image with an uncanny likeness to Lamar Jackson, quarterback for the NFL’s Baltimore Ravens, do they really think it’s a good idea to put it on a T-shirt?

Whether these legal cases get resolved quickly or not, it’s always best practice to stay as far away from potential copyright lines as possible. With art, it’s a little more gray. But for text content, there is a simple solution. Throw any AI-generated content into a plagiarism and AI checker. If your content doesn’t come back both 0% AI and 100% plagiarism-free, you might want to tweak it a little bit more.