OpenAI recently introduced a new generative AI system called Sora, which creates short videos from text prompts using a “diffusion transformer model”. Although Sora is not yet available to the public, the sample outputs published by OpenAI have drawn both enthusiasm and concern because of their high quality. These outputs include videos created from prompts such as “photorealistic closeup video of two pirate ships battling each other as they sail inside a cup of coffee” and “historical footage of California during the gold rush”.
Sora uses a combination of OpenAI’s text- and image-generation techniques to create videos with high-quality textures, realistic scene dynamics, camera movement, and frame-to-frame consistency. The transformer architecture handles how frames relate to one another, operating on tokens that represent small patches of space and time rather than tokens that represent text.
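To make the idea of “spacetime patch” tokens more concrete, the sketch below is a hypothetical illustration rather than OpenAI’s actual code: the clip dimensions and patch sizes are arbitrary assumptions, and the point is simply how a video array can be cut into non-overlapping blocks of space and time and flattened into token vectors for a transformer to process.

```python
# Illustrative sketch only (not OpenAI's implementation): turning a short
# video clip into "spacetime patch" tokens. All sizes here are arbitrary.
import numpy as np

# A toy clip: 16 frames of 64x64 RGB video.
frames, height, width, channels = 16, 64, 64, 3
video = np.random.rand(frames, height, width, channels)

# Patch size in (time, height, width); each patch becomes one token.
pt, ph, pw = 4, 16, 16

# Cut the clip into non-overlapping spacetime patches ...
patches = video.reshape(
    frames // pt, pt,
    height // ph, ph,
    width // pw, pw,
    channels,
).transpose(0, 2, 4, 1, 3, 5, 6)

# ... then flatten each patch into a single token vector.
tokens = patches.reshape(-1, pt * ph * pw * channels)
print(tokens.shape)  # (64, 3072): 64 spacetime tokens, each 3072-dimensional
```

Each resulting vector plays roughly the role a word token plays in a text model; the transformer then learns how these patches relate to one another across space and time.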
Sora is not the first model to generate videos from text prompts. Other models include Emu by Meta, Gen-2 by Runway, Stable Video Diffusion by Stability AI, and Lumiere by Google. Lumiere, released just a few weeks earlier, claimed to produce better video than its predecessors; however, Sora appears to be more powerful in several respects. Sora can generate videos at resolutions of up to 1920 × 1080 pixels, in different aspect ratios, and at lengths of up to 60 seconds. It can also create videos composed of multiple shots and perform video-editing tasks.
Sora has promising applications in entertainment, advertising, education, and prototyping. However, concerns have been raised about its societal and ethical impact. The ability to generate realistic video from text prompts could be used to spread fake news, influence elections, or burden the justice system with potential fake evidence. Video generators may also enable direct threats to targeted individuals, particularly through deepfakes.
Despite these concerns, Sora and other video generators have the potential to become capable simulators of the physical and digital world, with applications in scientific experiments and simulations.
OpenAI says it is “taking several important safety steps” before making Sora available to the public, including working with experts in “misinformation, hateful content, and bias” and “building tools to help detect misleading content”.
This article is a derivative work based on the following published articles and reports:
References:
1. Besançon, L., & Pooryousef, V. (2024, February 20). What is Sora? A new generative AI tool could transform video production and amplify disinformation risks. The Conversation. https://theconversation.com/what-is-sora-a-new-generative-ai-tool-could-transform-video-production-and-amplify-disinformation-risks-223850
2. Kim, J., Lee, Y., & Moon, J. (2023). T2V2T: Text-to-Video-to-Text Fusion for Text-to-Video Retrieval. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 5612-5617).
3. Zhou, P., Wang, L., Liu, Z., Hao, Y., Hui, P., Tarkoma, S., & Kangasharju, J. A Survey on Generative AI and LLM for Video Generation, Understanding, and Streaming.
4. Singh, A. (2023, May). A Survey of AI Text-to-Image and AI Text-to-Video Generators. In 2023 4th International Conference on Artificial Intelligence, Robotics and Control (AIRC) (pp. 32-36). IEEE.