OpenAI made waves in 2021 when they announced DALL-E, a text-to-image generative AI tool that let select beta participants generate images from text in real time. The results were crude, visually distinct as AI-generated, and clearly needed more time to mature. But despite the rough quality of the images, there was hope that the model could be refined. For many, this first generation of DALL-E was like a toddler first drawing human figures: no one expected perfection, but seeing the silhouette of the intended subject so clearly, generated entirely by a computer, was inspiring.
Just yesterday, OpenAI unveiled a new model dubbed “Sora,” which is capable of generating video clips from text input. For now, only a small group of testers has access to Sora while OpenAI determines where the safety limitations should be. Judging from the examples OpenAI has shared, some of these videos can already pass as real footage, particularly when the subject is a location, an animal, or an object. Take a look at the example below:
The prompt given to generate this 20-second video is “A litter of golden retriever puppies playing in the snow. Their heads pop out of the snow, covered in.” If you’ve ever used generative AI to create an image, you’ll know that short prompts tend to give strange results, while verbose prompts with specific imagery tend to produce something closer to the image in your head. But as impressive as this video is, there are still a handful of tells in this first iteration of the tool. The physics of the snow has an uncanny feel, appearing to move on its own in places.
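The same principle shows up if you generate images programmatically. Below is a minimal sketch using the OpenAI Python SDK’s image endpoint to compare a terse prompt against a verbose one; since Sora has no public API at the time of writing, DALL-E stands in, and the prompts themselves are my own illustrations, not OpenAI’s examples.

```python
# Minimal sketch, assuming the OpenAI Python SDK (v1.x) and an
# OPENAI_API_KEY set in the environment. The prompts are illustrative;
# Sora has no public API yet, so the DALL-E image endpoint is used to
# demonstrate the same terse-vs-verbose prompting principle.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

terse_prompt = "puppies in snow"
verbose_prompt = (
    "A litter of golden retriever puppies playing in deep snow, "
    "their heads popping out of a snowbank, fur dusted with powder, "
    "soft overcast daylight, shallow depth of field"
)

# Generate one image per prompt. The verbose prompt typically lands
# closer to the scene you pictured; the terse one leaves the model to
# fill in most of the details itself.
for prompt in (terse_prompt, verbose_prompt):
    response = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size="1024x1024",
        n=1,
    )
    print(f"{prompt[:40]!r} -> {response.data[0].url}")
```

Running this yields a URL for each generated image, making it easy to compare side by side how much of the final composition the model invented on its own under the terse prompt.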
However, I am not viewing these videos the way they are intended to be viewed. I am watching them with the intent of finding flaws in their presentation, because I opened them knowing full well they are AI-generated. I would assume that once the tool is fully released and these clips are used sparingly as stock video, most people would have trouble telling they were AI-generated. Even now, just over a year after ChatGPT’s release, people struggle to determine whether text is AI-generated, and the detection tools available fall short of being reliable.
While earlier generations of AI-generated content were obvious to the casual viewer who came across them without knowing they were AI-generated, I don’t have the same expectations going forward. With a US election year underway and AI-generated political misinformation on the rise, there are serious concerns about the ethical use of AI generation for OpenAI to consider before releasing this tool to the public. There is already precedent for using AI to swing elections, but will AI regulation be capable of reining it in, or will any legislation be too little, too late?
OpenAI’s Sora release can be found here, and the technical report can be found here.