Serving tech enthusiasts for over 25 years.
TechSpot means tech analysis and advice you can trust.
In a nutshell: Microsoft has demonstrated Quake II running on a generative AI model for real-time gaming called WHAMM. While the game has full controller support, it predictably runs at very low frame rates. Microsoft says the demo showcases the model’s potential rather than presenting a finished gaming product.
Microsoft’s World and Human Action MaskGIT Model, or WHAMM, builds on its earlier WHAM-1.6B version launched in February. Unlike its predecessor, this iteration introduces faster visual output using a MaskGIT-style architecture that generates image tokens in parallel. Moving away from the autoregressive method, which predicted tokens sequentially, WHAMM reduces latency and enables real-time image generation – an essential step toward smoother gameplay interactions.
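WHAMM's code is not public, but the MaskGIT-style idea described above can be illustrated with a toy sketch (all names and numbers below are hypothetical, not Microsoft's implementation): predict every masked image token in parallel, commit only the most confident guesses, re-mask the rest, and repeat for a handful of passes instead of one pass per token.

```python
import random

MASK = None  # placeholder for a not-yet-decided image token

def toy_predict(tokens):
    """Stand-in for the real network: guess a (token, confidence)
    pair for every masked position at once. Purely illustrative."""
    return {i: (random.randrange(1024), random.random())
            for i, t in enumerate(tokens) if t is MASK}

def maskgit_decode(num_tokens=16, steps=4):
    """MaskGIT-style parallel decoding: predict all masked tokens in
    one pass, keep only the most confident guesses, re-mask the rest,
    and repeat. Autoregressive decoding would need num_tokens model
    calls; this needs only `steps` calls."""
    tokens = [MASK] * num_tokens
    for step in range(steps):
        guesses = toy_predict(tokens)
        if step == steps - 1:
            keep = len(guesses)         # final pass: commit everything left
        else:
            keep = num_tokens // steps  # commit a fixed share per pass
        best = sorted(guesses.items(), key=lambda kv: kv[1][1],
                      reverse=True)[:keep]
        for i, (tok, _conf) in best:
            tokens[i] = tok
    return tokens

frame = maskgit_decode()  # a full "frame" of tokens in 4 passes, not 16
```

The latency win comes from that last comment: the number of network calls scales with the number of refinement passes, not the number of tokens in the frame.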
The model’s training process also reflects substantial advancements. While WHAM-1.6B required seven years of gameplay data for training, the developers trained WHAMM on just one week of curated Quake II gameplay. They achieved this efficiency by using data from professional game testers focused on a single level. Output resolution also got a boost, going from 300 x 180 pixels to 640 x 360 pixels, improving image quality without significant changes to the underlying encoder-decoder architecture.

Despite these technological strides, WHAMM is far from perfect and remains more of a research experiment than a fully realized gaming solution. The model adapts impressively to user input, but it struggles with lag and graphical anomalies.
Players can perform basic actions such as shooting, jumping, crouching, and interacting with enemies. However, enemy interaction is notably flawed. Characters often appear fuzzy, and combat mechanics are inconsistent, with health-tracking and damage stat errors.

The limitations extend beyond combat mechanics. Because of its limited context length, the model forgets objects that leave the player’s view for longer than nine-tenths of a second. This drawback creates unusual gameplay quirks, such as teleporting or randomly spawning enemies when the camera angle changes.
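Microsoft has not published the exact mechanism, but the effect resembles a fixed-length rolling context window. This toy Python sketch (the frame rate and window size are illustrative assumptions, not WHAMM's real values) shows how anything outside the last ~0.9 seconds of frames simply drops out of the model's memory:

```python
from collections import deque

FPS = 10             # assumed frame rate -- illustrative only
CONTEXT_FRAMES = 9   # 9 frames / 10 fps = ~0.9 s of "memory"

# Old frames silently fall off the left end once maxlen is reached.
context = deque(maxlen=CONTEXT_FRAMES)

def observe(objects_in_frame):
    """Push the set of objects visible in the current frame."""
    context.append(objects_in_frame)

def model_remembers(obj):
    """An object exists only if it appears somewhere in the window."""
    return any(obj in frame for frame in context)

# An enemy is visible for one frame, then off-screen for a full second.
observe({"enemy"})
for _ in range(10):
    observe(set())
forgotten = not model_remembers("enemy")
```

Once the enemy's last sighting scrolls out of the window, the model has no record it ever existed, which is why re-entering a room can produce freshly "spawned" enemies.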
Additionally, the scope of WHAMM’s simulation is confined to a single level of Quake II. Attempting to progress beyond this point freezes image generation due to the lack of recorded data. Latency issues further detract from the experience when scaled for public use.
While engaging with WHAMM may be enjoyable as a novelty, Microsoft did not intend for it to replicate the original Quake II experience. Its AI developers were merely exploring machine-learning techniques they could use to create interactive media.
Microsoft’s team explored WHAMM’s possibilities amid broader discussions about AI’s role in creative industries. OpenAI recently faced backlash over its Ghibli-inspired AI creations, highlighting skepticism about whether AI can replicate human artistry.
Redmond has positioned WHAMM as an example of AI augmenting rather than replacing human creativity – a philosophy echoed by Nvidia’s ACE technology, which enhances lifelike NPCs in games like inZOI. While fully AI-generated games and movies remain elusive, innovations like WHAMM signal they could be right around the corner.
Looking ahead, Microsoft envisions new forms of interactive media enabled by generative models like WHAMM. The company hopes future iterations will address shortcomings while empowering game developers to craft immersive narratives enriched by AI-driven tools.