OpenAI, which introduced the world to ChatGPT, has rolled out Sora Turbo, an AI video generation tool for creating, editing, and remixing videos from text prompts, images, and existing video. Videos can be created in a range of styles, resolutions, lengths, and aspect ratios. Sora Turbo, the commercially available version of the tool, is included at no additional cost for ChatGPT Plus and Pro subscribers.
First, AI intrigued us with words, and then still images, and now it’s video’s turn in the spotlight. While there are AI video generators available, OpenAI’s Sora, which was exhibited in February of this year, generated excitement. Now, after various internal updates and modifications, an updated version of the Sora video generator tool has been released to the public as a stand-alone product for ChatGPT Plus and Pro users. Referred to as Sora Turbo, it is more advanced and much faster than the earlier preview models (hence the name), according to OpenAI.
“We find that video models exhibit a number of interesting emergent capabilities when trained at scale. These capabilities enable Sora to simulate some aspects of people, animals, and environments from the physical world. These properties emerge without any explicit inductive biases for 3D, objects, etc.—they are purely phenomena of scale,” the company’s website states.
Sora works in a similar manner to text-generating AI foundation models. Users simply type a text description into the prompt, and Sora generates a video. The more detailed the description, the more closely the resulting video matches it. Users can also generate videos from images and existing videos.
The videos that are generated can range from the realistic to the stylistic. If the user is looking for a particular style, Sora Turbo is able to provide that—ranging, for example, from a Disney-like animation to a 35mm-style cinematic clip. Furthermore, Sora Turbo can generate videos with dynamic camera motion, so as the camera shifts and rotates, people and scene elements move consistently through the 3D space.
Sora Turbo can create a video from scratch or extend an existing video clip. It can therefore perform a range of image and video editing tasks, such as looping video, animating static images, and extending videos forward or backward in time. It can generate multiple shots within a single video and gradually interpolate between two input videos to create seamless transitions between videos with different subjects and scene compositions. It can also generate a video from an image accompanied by a prompt, including images generated by DALL·E 2 and DALL·E 3.
According to OpenAI, Sora Turbo can generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. It says the model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.
The text-based generative AI video tool can produce videos of variable durations, resolutions, and aspect ratios (widescreen, vertical, or square). OpenAI says Sora Turbo is capable of generating up to a minute of high-fidelity video. However, for the time being, those factors are limited based on the user’s plan, with the most encompassing Pro product plan allowing for videos up to 1080p in resolution and a maximum of 20 seconds in length.
Although Sora’s superpower is generating videos, it can also create images. OpenAI explains that it accomplishes this by arranging patches of Gaussian noise in a spatial grid with a temporal extent of one frame. The model can generate images of variable sizes, up to 2048×2048 resolution.
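To make the "temporal extent of one frame" idea concrete, here is a minimal sketch of how a grid of Gaussian-noise patches for a still image differs from one for a video only in its frame count. It is an illustration under stated assumptions, not OpenAI's implementation: the function names, patch size, channel count, and the small 64-pixel frame size are all hypothetical, since OpenAI has not published those details.

```python
import random


def noise_patch_grid(frames, height, width, patch=16, channels=3, seed=0):
    """Return Gaussian-noise patches indexed as grid[t][row][col].

    Every name and default here is hypothetical; the patch size and
    channel count are illustrative values, not published Sora settings.
    """
    if height % patch or width % patch:
        raise ValueError("height and width must be multiples of patch")
    rng = random.Random(seed)
    values_per_patch = patch * patch * channels
    rows, cols = height // patch, width // patch
    return [
        [
            [
                [rng.gauss(0.0, 1.0) for _ in range(values_per_patch)]
                for _ in range(cols)
            ]
            for _ in range(rows)
        ]
        for _ in range(frames)
    ]


def grid_shape(grid):
    """Return (frames, patch rows, patch columns, values per patch)."""
    return (len(grid), len(grid[0]), len(grid[0][0]), len(grid[0][0][0]))


# A short clip: several frames of noise patches.
video_noise = noise_patch_grid(frames=4, height=64, width=64)
# A still image: the same spatial grid with a temporal extent of one frame.
image_noise = noise_patch_grid(frames=1, height=64, width=64)

print(grid_shape(video_noise))  # (4, 4, 4, 768)
print(grid_shape(image_noise))  # (1, 4, 4, 768)
```

In a diffusion-style generator, a grid like this would be the starting point that the model progressively denoises. The sketch covers only the starting noise, not the denoising itself.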
While Sora Turbo is a marked improvement over the preview versions, it is not without issues, especially in its limitations as a simulator. OpenAI admits that it does not accurately model the physics of many basic interactions, like glass shattering, and that other interactions, like eating food, do not always yield correct changes in an object’s state. OpenAI also says that incoherencies can develop in long-duration samples and that objects can appear spontaneously.
“We believe the capabilities Sora has today demonstrate that continued scaling of video models is a promising path toward the development of capable simulators of the physical and digital world, and the objects, animals, and people that live within them,” the company states.
Sora Turbo is available to ChatGPT customers through their current plan at no additional cost. For $20 per month, ChatGPT Plus includes up to 50 priority videos (1,000 credits) at up to 720p resolution and 5 seconds in duration. ChatGPT Pro, at $200 per month, includes unlimited generations at the highest resolution, with up to 500 priority videos (10,000 credits) up to 1080p resolution and 20 seconds in duration. OpenAI says it is working on additional pricing, which will be available early next year.
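The plan figures above imply a per-video credit cost, which the short calculation below works out. It uses only the numbers quoted in this article. The `PLANS` table and the function name are illustrative, and the result is a derived average rather than an official OpenAI price, since the quoted video counts are "up to" figures.

```python
# Plan figures as quoted in the article; the derived numbers are not official prices.
PLANS = {
    "ChatGPT Plus": {"usd_per_month": 20, "credits": 1_000, "priority_videos": 50},
    "ChatGPT Pro": {"usd_per_month": 200, "credits": 10_000, "priority_videos": 500},
}


def credits_per_priority_video(plan):
    """Credits consumed per video if the full allowance yields the quoted count."""
    return plan["credits"] / plan["priority_videos"]


for name, plan in PLANS.items():
    per_video = credits_per_priority_video(plan)
    print(f"{name}: {per_video:.0f} credits per priority video")
# ChatGPT Plus: 20 credits per priority video
# ChatGPT Pro: 20 credits per priority video
```

Both plans work out to the same 20 credits per priority video, so on these figures the Pro plan's tenfold price buys ten times the priority allowance, plus the higher resolution and duration caps.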
Meanwhile, the company says it is focused on making the Sora model faster and more affordable. It also says it is developing safeguards to ensure that the product is used responsibly as this field develops. In the meantime, all Sora Turbo-generated videos carry C2PA metadata identifying them as coming from Sora. Visible watermarks are added by default, and the company says it is blocking particular forms of abusive and sexually exploitative generations.