Key Takeaways

ChatGPT has exhibited overfitting tendencies, which can lead to plagiarism issues.

The New York Times is suing OpenAI, alleging that GPT-4 can regurgitate its articles near-verbatim.

The answer to whether ChatGPT plagiarizes is complex, as it may unintentionally lift text without permission.

ChatGPT has been making waves over the last year for a number of reasons, particularly in educational and work-related contexts. It can be used to draft documents, write essays, make plans, and much more. However, if you're worried about whether ChatGPT can plagiarize, there's no simple answer. The short, cautious answer is yes. The more nuanced answer is that it's complicated.

ChatGPT has already been proven to plagiarize

In AI, we call this "overfitting"

Overfitting in AI, especially in relation to Large Language Models (LLMs) like ChatGPT, occurs when a model is trained too closely to the specifics of its training data. Think of it like a student who memorizes facts for an exam rather than understanding the concepts; they might do well on that specific test but struggle to apply the knowledge to different questions. Similarly, an overfitted LLM learns the training data's patterns and noise so well that it becomes great at predicting or generating responses for similar data but performs poorly on new, unseen data. This happens because the model has essentially memorized the training data, including its quirks and anomalies, rather than learning the underlying structures and generalizable knowledge.
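To make the memorizing-student analogy concrete, here is a minimal toy sketch in Python. It is not how an LLM is trained; the data points and the names memorizer, fit_line, and mse are all hypothetical and chosen purely for illustration. One "model" simply memorizes noisy training examples, while the other fits a straight line to them.

```python
# Toy illustration of overfitting: memorizing versus generalizing.
# All data and names are hypothetical; this is not how an LLM is trained.

# The underlying rule is y = 2x; the training labels carry noise of +1 or -1.
train = [(0, 1), (1, 1), (2, 5), (3, 5), (4, 9), (5, 9)]
# Unseen inputs with noise-free labels from the same rule.
test = [(0.4, 0.8), (1.4, 2.8), (2.4, 4.8), (3.4, 6.8), (4.4, 8.8)]


def memorizer(x):
    """Return the label of the closest training input, noise included."""
    _, nearest_y = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_y


def fit_line(points):
    """Ordinary least-squares fit of a straight line; returns a predictor."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mse(model, points):
    """Mean squared error of a model over (input, label) pairs."""
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)


line = fit_line(train)

for name, model in [("memorizer", memorizer), ("line", line)]:
    print(f"{name}: train error {mse(model, train):.2f}, "
          f"test error {mse(model, test):.2f}")
```

In this toy setup, the memorizer scores a perfect zero error on the data it has already seen but does noticeably worse on the unseen inputs, while the straight line makes small mistakes on the training data and holds up much better on new data. That gap between training and unseen performance is the signature of overfitting.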

ChatGPT has already been shown to have some overfitting tendencies, and that is central to The New York Times's lawsuit against OpenAI. In the suit, filed in the Federal District Court in Manhattan, The Times showed how GPT-4 could be prompted to regurgitate entire articles near-verbatim, which strongly suggests that those articles were in its training data; The Times says they were used without permission. If you were to paste a response like that into an essay, you would be plagiarizing an existing document on the internet without realizing it.

As such, whether ChatGPT plagiarizes is a difficult question to answer. ChatGPT doesn't plagiarize intentionally, but in severe cases like these, it could get a student or anyone else caught out for it. OpenAI is now aware of these tendencies and has taken steps to prevent them, but that doesn't mean it won't happen again. If you rely on ChatGPT not plagiarizing, nothing can really guarantee that for you. If it can reproduce entire articles from The New York Times, it can reproduce text from anywhere else, too, and at some stage it's very likely to do so.
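If you happen to have a suspected source on hand, one rough way to spot verbatim lifting is to measure how many short word sequences a ChatGPT response shares with it. The sketch below is a simple illustration under that assumption, not a real plagiarism detector; the function names and the five-word window are arbitrary choices made for this example, and it only compares against text you supply yourself.

```python
# Rough check for verbatim overlap between a response and a known source.
# Hypothetical helper names; the 5-word window is an arbitrary choice.

def word_ngrams(text, n):
    """Return the set of n-word sequences in text (lowercased, split on whitespace)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(candidate, source, n=5):
    """Fraction of the candidate's n-word sequences that also appear in the source."""
    candidate_ngrams = word_ngrams(candidate, n)
    if not candidate_ngrams:
        return 0.0  # candidate is shorter than n words
    shared = candidate_ngrams & word_ngrams(source, n)
    return len(shared) / len(candidate_ngrams)


source = "the quick brown fox jumps over the lazy dog near the river bank"
copied = "the quick brown fox jumps over the lazy dog"
fresh = "a completely different sentence about language models and their training data"

print(overlap_ratio(copied, source))  # every 5-word sequence is shared
print(overlap_ratio(fresh, source))   # no 5-word sequence is shared
```

A high ratio suggests that long stretches were copied word for word. A low ratio does not prove the text is original, since the check knows nothing about sources you haven't given it and ignores paraphrasing entirely.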

For what it's worth, ChatGPT isn't likely to be the only LLM that falls victim to overfitting; it's just the first and most high-profile case. It's very likely that we'll see similar issues crop up with Copilot and Gemini at some point. Until then, if you're really worried, you may be better off using one of those instead, though neither comes with a guarantee.