Context: Google DeepMind has introduced Genie, a new model that can generate interactive video games from a single text or image prompt.
Genie
About: It is a groundbreaking world model trained on internet-sourced videos, as stated in the official Google DeepMind blog post.
Unsupervised Learning: The research paper 'Genie: Generative Interactive Environments' describes Genie as the first generative interactive environment trained in an unsupervised manner from unlabelled internet videos.
Technical Specifications: With 11B parameters, Genie comprises three components: a spatiotemporal video tokenizer, an autoregressive dynamics model, and a scalable latent action model.
Frame-by-frame Interaction: Genie lets users act in the generated environments on a frame-by-frame basis, even though it was trained without ground-truth action labels or other domain-specific requirements.
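To make the frame-by-frame idea concrete, here is a minimal Python sketch of how the three components could fit together at play time. It is a toy under stated assumptions, not Genie's actual implementation: the function names (tokenize, dynamics, detokenize, play), the pixel-quantizing "tokenizer", the token-rotating "dynamics model", and the action vocabulary size of 8 are all illustrative stand-ins for the learned networks.

```python
from typing import List

Frame = List[List[int]]   # toy grayscale image, pixel values 0-255
Tokens = List[int]        # flat list of discrete token ids

NUM_LATENT_ACTIONS = 8    # assumed size of the discrete action vocabulary
LEVELS = 4                # toy codebook size for the tokenizer


def tokenize(frame: Frame) -> Tokens:
    """Toy stand-in for the video tokenizer: quantize each pixel to a token id."""
    return [pixel * LEVELS // 256 for row in frame for pixel in row]


def detokenize(tokens: Tokens, width: int) -> Frame:
    """Map token ids back to pixel values and restore the 2D layout."""
    pixels = [t * 256 // LEVELS for t in tokens]
    return [pixels[i:i + width] for i in range(0, len(pixels), width)]


def dynamics(history: List[Tokens], action: int) -> Tokens:
    """Toy stand-in for the dynamics model.

    A real model would predict next-frame tokens from the whole history and
    the chosen latent action. Here we simply rotate the latest tokens to the
    right by `action` positions so that each action has a visible effect.
    """
    if not 0 <= action < NUM_LATENT_ACTIONS:
        raise ValueError(f"action must be in [0, {NUM_LATENT_ACTIONS})")
    latest = history[-1]
    shift = action % len(latest)
    if shift == 0:
        return list(latest)
    return latest[-shift:] + latest[:-shift]


def play(prompt: Frame, actions: List[int]) -> List[Frame]:
    """Generate one new frame per user action, starting from a prompt image."""
    width = len(prompt[0])
    history = [tokenize(prompt)]
    frames = [detokenize(history[0], width)]
    for action in actions:
        history.append(dynamics(history, action))      # predict next tokens
        frames.append(detokenize(history[-1], width))  # decode to an image
    return frames


# Usage: a 2x2 prompt image and two user-chosen latent actions.
frames = play([[0, 64], [128, 192]], actions=[1, 0])
```

The point of the sketch is the loop in play: the prompt image is tokenized once, and every subsequent frame is produced by feeding the token history plus one discrete action into the dynamics step, then decoding the result.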
What Does Genie Do?
Generative AI for All: The research paper suggests that Genie is a revolutionary generative AI, allowing anyone, including children, to immerse themselves in generated worlds resembling human-designed environments. Genie can generate a diverse range of interactive and controllable environments despite being trained solely on video data.
Playable Environments from Images: In simpler terms, unlike traditional generative AI models that focus on language, images, or videos separately, Genie stands out by creating playable environments from a single image prompt.
Why is Genie Important?
Genie's standout feature lies in its ability to learn and replicate controls for in-game characters exclusively from internet videos.
This is significant as internet videos lack labels indicating the actions performed or specifying which part of the image should be controlled.
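As a rough illustration of what learning controls without labels can mean, the Python sketch below infers a discrete "action" purely from how one frame changes into the next. It uses plain vector quantization of the centroid shift between two frames against a small codebook. This is an assumption-laden toy: the names CODEBOOK, centroid, and infer_latent_action are hypothetical, and the codebook is written by hand here, whereas a model like Genie would have to learn its action codes from data.

```python
from typing import List, Tuple

Frame = List[List[int]]  # toy grayscale image

# Hand-written codebook of motion vectors (dx, dy). This is an assumption
# for illustration; a learned model would discover its own codes.
CODEBOOK: List[Tuple[int, int]] = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]


def centroid(frame: Frame) -> Tuple[float, float]:
    """Return the intensity-weighted centre (x, y) of the frame."""
    total = sum(sum(row) for row in frame)
    if total == 0:
        raise ValueError("frame has no non-zero pixels")
    x = sum(v * c for row in frame for c, v in enumerate(row)) / total
    y = sum(v * r for r, row in enumerate(frame) for v in row) / total
    return x, y


def infer_latent_action(prev: Frame, nxt: Frame) -> int:
    """Pick the codebook index closest to the observed frame-to-frame motion.

    No action label is given: the "action" is whatever code best explains
    the change between the two frames.
    """
    px, py = centroid(prev)
    nx, ny = centroid(nxt)
    dx, dy = nx - px, ny - py
    distances = [(dx - cx) ** 2 + (dy - cy) ** 2 for cx, cy in CODEBOOK]
    return distances.index(min(distances))


# Usage: a bright pixel moves one column to the right between frames.
before = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
after = [[0, 0, 0], [0, 0, 9], [0, 0, 0]]
action_id = infer_latent_action(before, after)
```

Nothing in the input says which button was pressed or which part of the image is the character; the code index is recovered only from the visual change, which is the property the paragraph above highlights.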