Open-source training set LAION-5B found to contain "child sexual abuse content"; Stability AI "urgently distances itself"

2024-05-30


Shulou report, December 21 (Xinhua): According to Bloomberg, Stanford University recently studied the open-source model training dataset LAION-5B and found 3,000 items of "suspected child sexual abuse content" in it. The LAION project maintainers urgently took LAION-5B offline and said they had removed 1,008 items of "confirmed relevant content."

▲ Image source: Bloomberg (same below)

According to foreign media, the LAION-5B training set contains information on a total of 5.85 billion images. Stability AI has used the LAION dataset to train its own AI models in order to offer users a "text-to-image service."

It is worth noting, however, that Stability AI quickly responded to foreign media: although the Stable Diffusion model was trained on LAION-5B, it used a "filtered and fine-tuned" version of the training set, so the issue "will not affect the model output."

Stanford University said the LAION-5B incident suggests that a large number of datasets in the industry are likely to contain inappropriate content, and the researchers called on model trainers to carefully select "necessary training data sets."

