OpenAI wants to make DALL-E safe – and encounters an unexpected side effect

Image: OpenAI


OpenAI’s DALL-E 2 relies on a range of safety measures to prevent misuse. OpenAI has now published a detailed look at the training process.

In April, OpenAI gave a first look at DALL-E 2, the company’s new image-generating AI model. A closed beta test has been running since then, with impressive results. These results raise questions about the role of DALL-E 2 in the future of creative work and have some photographers fearing the death of photography.

A central goal of the closed beta phase is to prepare the AI system for release as a freely available product. In particular, OpenAI wants to ensure that DALL-E 2 does not generate violent or sexual content. So far, DALL-E 2 has proven fairly compliant.

To achieve this, the company has taken a number of measures: input and upload filters for the system’s input form, limits on the number of images that can be generated at once, a comprehensive content policy, and active monitoring of generated content, including human review of questionable content.

OpenAI automatically filters the training data

Apart from these measures, OpenAI focuses on mitigating potentially dangerous content in the training data set. To train DALL-E 2, OpenAI collected hundreds of millions of images and their associated captions from the Internet. Because it was collected automatically, the data set contained numerous images with undesired content.

To identify and remove this content, OpenAI uses a semi-automated process. A neural network for image classification is first trained on a few hundred images that have been manually classified as problematic. Another algorithm then uses this classifier to find images in the main data set that could improve the classifier’s performance. These images are reviewed by humans and, if appropriate, used for further training of the classifier. The process is repeated for several specialized classifiers.

OpenAI trains classifiers with human-labeled data. A learning algorithm helps to filter the data. | Image: OpenAI
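The selection step in this loop can be illustrated with a small sketch. It assumes uncertainty sampling, a common strategy that sends the images the classifier is least sure about to human reviewers. The article does not say which selection rule OpenAI actually uses, and the function name, image ids, and scores below are hypothetical.

```python
def select_for_labeling(scores, budget):
    """Return the ids of the `budget` images whose scores are closest to 0.5.

    `scores` maps an image id to the classifier's estimated probability
    that the image is problematic. A score near 0.5 means the classifier
    is unsure, so a human label there is assumed to be most informative.
    """
    ranked = sorted(scores, key=lambda image_id: abs(scores[image_id] - 0.5))
    return ranked[:budget]


# Hypothetical scores for five unlabeled images.
scores = {
    "img_a": 0.02,
    "img_b": 0.48,
    "img_c": 0.97,
    "img_d": 0.61,
    "img_e": 0.35,
}

# These ids would be sent to human reviewers, and the new labels
# would be used to retrain the classifier.
to_review = select_for_labeling(scores, budget=2)
print(to_review)
```

Images the classifier is already confident about, such as `img_a` and `img_c`, are skipped because a human label would add little information.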

The trained classifiers can then automatically filter problematic images out of hundreds of millions of images. According to OpenAI, filtering out problematic data takes precedence over preserving non-problematic data: it is much easier to refine a model later with more data than to make the model forget something it has already learned.

Because of this very cautious filtering process, about five percent of the entire training data set was discarded, including numerous images that do not show problematic content, the company said. Better classifiers could recover some of this lost data in the future and improve the performance of DALL-E 2 even more.
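A minimal sketch of what such a cautious filter could look like: every image whose score reaches a deliberately low threshold is discarded, even though some harmless images are lost as a result. The threshold value, image ids, and scores are illustrative assumptions, not figures from OpenAI.

```python
def filter_dataset(scored_images, threshold=0.2):
    """Split images into kept and discarded lists.

    `scored_images` is a list of (image_id, score) pairs, where the score
    is the classifier's estimated probability that the image is
    problematic. A low threshold favors removing too much over too little.
    """
    kept = [image_id for image_id, score in scored_images if score < threshold]
    discarded = [image_id for image_id, score in scored_images if score >= threshold]
    return kept, discarded


# Hypothetical scores. "statue" is harmless but gets a borderline score.
scored_images = [
    ("beach", 0.05),
    ("statue", 0.30),
    ("explicit", 0.95),
    ("street", 0.10),
]

kept, discarded = filter_dataset(scored_images)
print("kept:", kept)
print("discarded:", discarded)
```

In this toy run the harmless "statue" image is discarded along with the problematic one, which is the trade-off the article describes.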

To test the effectiveness of its approach, OpenAI trained two GLIDE models, one on filtered data and one on unfiltered data. GLIDE is a direct predecessor of DALL-E 2. As expected, the filtered model generated significantly less sexual and violent content.


Data filter increases bias in the AI model

However, the successful filtering process has an unexpected side effect: it creates or reinforces the model’s bias towards certain demographic groups. This bias is already a major challenge in its own right, but the otherwise beneficial filtering process makes the problem worse, according to OpenAI.

As an example, the company cites the input “a CEO”: the unfiltered model tends to generate more images of men than of women, and much of this bias is due to the training data. With the filtered model, this effect was amplified, and it showed almost exclusively images of men. Compared with the unfiltered data set, the frequency of the word “woman” in the filtered data set is reduced by 14 percent, while the frequency of “man” is reduced by only six percent.
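The percentages above are relative changes in how often a word appears. The following sketch shows one way such a figure could be computed from image captions; the captions are invented toy data, the helper names are hypothetical, and the resulting numbers are not meant to match OpenAI's.

```python
def word_frequency(captions, word):
    """Fraction of captions that contain `word` as a whole word."""
    hits = sum(1 for caption in captions if word in caption.lower().split())
    return hits / len(captions)


def relative_change(before, after):
    """Relative change from `before` to `after`; -0.14 means 14 percent lower."""
    return (after - before) / before


# Toy captions before filtering.
unfiltered = [
    "a woman reading", "a woman running", "a woman painting", "a woman cooking",
    "a man reading", "a man running", "a man painting", "a man cooking",
    "a dog", "a cat",
]
# Toy captions after filtering: two "woman" captions and one "man" caption removed.
filtered = [
    "a woman reading", "a woman cooking",
    "a man reading", "a man painting", "a man cooking",
    "a dog", "a cat",
]

for word in ("woman", "man"):
    change = relative_change(
        word_frequency(unfiltered, word), word_frequency(filtered, word)
    )
    print(f"{word}: {change:+.1%}")
```

Because the filter removes more "woman" captions than "man" captions, the relative frequency of "woman" falls further, which mirrors the imbalance described in the article.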

There are presumably two reasons for this. First, although men and women are represented roughly equally in the original data set, women may appear more often in sexualized contexts. The classifiers therefore remove more images of women, which increases the imbalance. Second, the classifiers themselves might be biased, through their class definitions or their implementation, and so remove more images of women.

OpenAI fixes bias by re-weighting the training data

However, the OpenAI team was able to significantly reduce this effect by re-weighting the remaining training data, for example by giving the now less common images of women a stronger influence on the training of the model. For tested words such as “woman” and “man”, the frequency deviations shrank to around one and minus one percent, compared with 14 and six percent before.

A simplified representation of how rebalancing training data can offset the bias effects of the filtering process. | Image: OpenAI
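The idea of re-weighting can be illustrated with a deliberately simplified sketch for a single keyword. It computes the weight that captions containing the keyword would need so that their weighted share matches a target share. This is an assumption made for illustration and is not a description of OpenAI's actual re-weighting scheme; the function names and numbers are hypothetical.

```python
def keyword_weight(filtered_fraction, target_fraction):
    """Weight for examples containing the keyword (all others get weight 1).

    With keyword share f in the filtered data and target share t, solving
    w * f / (w * f + (1 - f)) = t for w gives w = t * (1 - f) / (f * (1 - t)).
    """
    f, t = filtered_fraction, target_fraction
    return (t * (1 - f)) / (f * (1 - t))


def weighted_fraction(captions, word, weight):
    """Weighted share of captions containing `word` after re-weighting."""
    weights = [weight if word in c.lower().split() else 1.0 for c in captions]
    hit_weight = sum(
        w for w, c in zip(weights, captions) if word in c.lower().split()
    )
    return hit_weight / sum(weights)


# Toy filtered data: "woman" appears in 2 of 8 captions (25 percent),
# while the assumed share before filtering was 40 percent.
filtered = [
    "a woman reading", "a woman cooking",
    "a man reading", "a man painting", "a man cooking",
    "a dog", "a cat", "a tree",
]

weight = keyword_weight(filtered_fraction=0.25, target_fraction=0.40)
print(round(weight, 3))
print(round(weighted_fraction(filtered, "woman", weight), 3))
```

In training, such weights would typically scale each example's contribution to the loss, so that underrepresented examples count for more without any data being added.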

In a blog post, OpenAI also shows that models like GLIDE and DALL-E 2 sometimes memorize training images, i.e. they reproduce them instead of creating new images. The company identified images that are frequently repeated in the training data set as the cause. The problem can be solved by removing visually similar images.
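Removing visually similar images can be sketched as a greedy near-duplicate filter. The example assumes each image has already been turned into a feature vector by some image model, and it keeps an image only if it is not too similar to any image kept so far. The vectors, the similarity threshold, and the function names are hypothetical, and the article does not describe how OpenAI implements this step.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def deduplicate(features, threshold=0.95):
    """Keep an image only if it is below `threshold` similarity to all kept images."""
    kept = []
    for image_id, vector in features.items():
        if all(cosine_similarity(vector, features[k]) < threshold for k in kept):
            kept.append(image_id)
    return kept


# Hypothetical feature vectors. "logo_1" and "logo_2" are near-duplicates.
features = {
    "logo_1": [1.0, 0.0, 0.1],
    "logo_2": [0.99, 0.01, 0.1],
    "portrait": [0.0, 1.0, 0.2],
}

unique_ids = deduplicate(features)
print(unique_ids)
```

This pairwise comparison grows quadratically with the number of images, so a data set with hundreds of millions of images would need an approximate method such as clustering or nearest-neighbor search instead.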

Next, OpenAI wants to further improve the training filters, further reduce the bias in DALL-E 2, and better understand the observed memorization effect.
