AI trained on AI produces nonsense


Large language models like those from OpenAI and Google require enormous amounts of training data to build. The latest versions of these models have already scoured much of the existing Internet, leading some to fear there may not be enough new data left to train future iterations. Some prominent voices in the industry, such as Meta CEO Mark Zuckerberg, have proposed a solution to that data dilemma: simply train new AI systems on the outputs of old ones.

But new research suggests that cannibalizing the outputs of previous models quickly degenerates into babbling AI gibberish and could ultimately lead to what’s being called “model collapse.” In one example, researchers fed an AI a benign paragraph about church architecture; over successive generations the text degraded until the final, most “advanced” model simply repeated the phrase “black-tailed jackrabbits” over and over.

In an investigation published in Nature this week, that AI-trained-on-AI scenario was put to the test. The researchers created their own language model, which they initially fed original, human-generated text. They then created nine more generations of the model, each trained on the text output of the generation before it. The end result in the last generation was surreal-sounding gibberish that had essentially nothing to do with the original text. Over successive generations, the researchers say, the model becomes “poisoned with its own projection of reality.”
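The recursive setup is easy to picture with a toy stand-in. Below is a minimal sketch, assuming a tiny word-level bigram model in place of an LLM and a hypothetical local text file (wiki_church_architecture.txt) as the human-written seed corpus; it is not the paper’s code, only an illustration of training each generation solely on the previous generation’s output.

```python
import random
from collections import defaultdict

def train_bigram(text):
    """Fit a word-level bigram table: word -> list of observed next words."""
    table = defaultdict(list)
    words = text.split()
    for a, b in zip(words, words[1:]):
        table[a].append(b)
    return table

def generate(table, length=200, seed=0):
    """Sample text from the bigram table, restarting at a random word on dead ends."""
    rng = random.Random(seed)
    word = rng.choice(list(table))
    out = [word]
    for _ in range(length - 1):
        nexts = table.get(word)
        if not nexts:                      # dead end: restart from a random word
            word = rng.choice(list(table))
        else:
            word = rng.choice(nexts)
        out.append(word)
    return " ".join(out)

# Hypothetical seed corpus of human-written text (placeholder filename).
human_text = open("wiki_church_architecture.txt").read()

model = train_bigram(human_text)
for generation in range(1, 10):            # nine further generations, as in the study
    synthetic = generate(model, length=500, seed=generation)
    model = train_bigram(synthetic)        # each model sees only its predecessor's output

print(generate(model, length=60))
```

Even in this crude stand-in, each generation can only learn word transitions its predecessor happened to emit, so rarer phrasings tend to drop out round by round; the paper documents the same qualitative drift at LLM scale.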

AI models forget meaning the more they train on their own output

The researchers call this strange case of AI seemingly imploding on itself “model collapse,” a degenerative process that unfolds in early and late stages. In the early stage, collapse begins when models several generations removed from the original training data forget the outliers and rarities in the original text, so the most likely outputs become ever more common. That would be a problem in the real world because it could squeeze out minority views and expressions. An LLM showing signs of early collapse would present a version of reality that lacks diversity and suffers from overwhelming sameness.
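That early-stage mechanism can be illustrated numerically. The sketch below is an illustrative assumption, not the paper’s experiment: each generation is refit to a finite sample of its predecessor’s output, so rare words that happen not to be sampled drop to zero probability and never return, while common words claim a growing share.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Human" data: a vocabulary where a few words are common and many are rare.
vocab_size = 1000
true_probs = np.arange(1, vocab_size + 1, dtype=float) ** -1.1   # Zipf-like rare tail
true_probs /= true_probs.sum()

probs = true_probs
for generation in range(10):
    sample = rng.choice(vocab_size, size=5_000, p=probs)    # "generate" a finite synthetic corpus
    counts = np.bincount(sample, minlength=vocab_size)
    probs = counts / counts.sum()                            # "retrain" on that corpus alone
    print(f"gen {generation}: surviving vocabulary = {np.count_nonzero(probs)} / {vocab_size}")
```

Once a word’s probability hits zero it is gone for good, which mirrors the observation that improbable events are the first casualties of recursive training.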


In the later stages of collapse, things get stranger. By then, models trained on models have drifted so far from the original training data that they forget its key features and lose the plot entirely. It is at this stage that they start to generate completely meaningless gibberish. When this happens, the researchers say, the model’s “random” self-cannibalization of its own previous outputs “causes irreversible defects in the resulting model.”

The researchers argue that this cascading effect, and eventual model collapse, is inevitable for large models trained on their own data. It is worth noting that the study focuses specifically on language models and does not address what might happen if multimodal models, such as image and video generators, were trained on their own output. It also examines only a model training on its own output; it is unclear what exactly would happen if one model, say Meta’s, were trained on output generated by OpenAI’s.

Preserving the original human text could prevent a collapse

The prospect of model collapse in the real world is not a far-fetched hypothesis. Already, countless websites are filled with articles and blog posts generated entirely by LLMs. In the race to build new models as quickly as possible, it is not inconceivable that much of that AI-generated content could end up in training sets.

One possible defense against the inadvertent inclusion of AI-generated content in training sets would be a watermarking standard, adopted across platforms, that clearly indicates whether content is authentic or machine-generated. Google, Adobe, and other major tech players are attempting just that with a “content credential” badge they are working to standardize through the Coalition for Content Provenance and Authenticity (C2PA).


But that would only apply to images. AI-generated text is much harder to watermark, or even to identify reliably with available detection software. A more realistic approach may require AI developers to vet material rigorously for signs of AI generation, and perhaps to pay reputable human sources for access to their high-quality data. Without such safeguards for human training data, the internet risks being flooded by a wave of AI vomit. Nobody wants that.
