GENIE (Syllabus: GS Paper 3 – Sci and Tech)

News-CRUX-10 28th February 2024


Context: Google DeepMind has just introduced Genie, a new model that can generate interactive video games from just a text or image prompt.

Genie

About: It is a groundbreaking world model trained on internet-sourced videos, as stated in the official Google DeepMind blog post.
Unsupervised Learning: The research paper 'Genie: Generative Interactive Environments' describes Genie as the first generative interactive environment trained in an unsupervised manner from unlabelled internet videos.
Technical Specifications: With 11B parameters, Genie comprises three components: a spatiotemporal video tokenizer, an autoregressive dynamics model, and a scalable latent action model.
Frame-by-frame Interaction: Genie lets users act in the generated environments on a frame-by-frame basis, even though it was trained without ground-truth action labels or other domain-specific requirements.
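
Illustration: The frame-by-frame interaction described above can be pictured as a simple loop, sketched here in Python. This is a toy sketch under stated assumptions, not DeepMind's implementation. The names `tokenize`, `dynamics_step` and `play`, the 16-token vocabulary, and the set of 8 latent actions are all hypothetical stand-ins chosen for illustration; in Genie these components are large learned neural networks.

```python
from typing import List

Tokens = List[int]

VOCAB_SIZE = 16          # assumption: toy token vocabulary, not Genie's real size
NUM_LATENT_ACTIONS = 8   # assumption: small discrete action set, for illustration only


def tokenize(frame: List[int]) -> Tokens:
    """Toy stand-in for the video tokenizer: turn raw pixel values into discrete tokens."""
    return [pixel % VOCAB_SIZE for pixel in frame]


def dynamics_step(history: List[Tokens], action: int) -> Tokens:
    """Toy stand-in for the dynamics model: predict the next frame's tokens
    from the frames so far and the latent action chosen by the player."""
    last = history[-1]
    step = len(history)  # depends on how many frames exist, mimicking autoregression
    return [(tok + action + step) % VOCAB_SIZE for tok in last]


def play(prompt_frame: List[int], actions: List[int]) -> List[Tokens]:
    """Start from a single image prompt and generate one new frame per player action."""
    history = [tokenize(prompt_frame)]
    for action in actions:
        if not 0 <= action < NUM_LATENT_ACTIONS:
            raise ValueError(f"unknown latent action: {action}")
        history.append(dynamics_step(history, action))
    return history


if __name__ == "__main__":
    # One prompt image (three "pixels") and two player actions give three frames in total.
    for frame_tokens in play([0, 17, 33], [1, 2]):
        print(frame_tokens)
```

The point of the sketch is the order of operations: the prompt image is tokenized once, and every subsequent frame is generated from the history of frames plus the action the player picks at that step.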

What Does Genie Do?

Generative AI for All: The research paper suggests that Genie is a revolutionary generative AI, allowing anyone, including children, to immerse themselves in generated worlds resembling human-designed environments. Genie can generate a diverse range of interactive and controllable environments despite being trained solely on video data.
Playable Environments from Images: In simpler terms, unlike traditional generative AI models that focus on language, images, or videos separately, Genie stands out by creating playable environments from a single image prompt.

Why is Genie Important?

Genie's standout feature lies in its ability to learn and replicate controls for in-game characters exclusively from internet videos.
This is significant as internet videos lack labels indicating the actions performed or specifying which part of the image should be controlled.