Don’t Let Bad Lip Sync Break the Spell

Your helicopter touches down on the deck. You’ve arrived at the outskirts of a magnificent city of the future – gleaming skyscrapers pierce distant clouds, flying robots of all kinds swerve in airborne traffic patterns, and a sunset so real you can’t believe it coats the entire scene in a rosy glow.

A policeman in ultra-sleek SWAT gear approaches to tell you the details of your new assignment here in Next Generation Consoleberg – but all you can focus on is his mouth, his lips moving up and down like some kind of animatronic fish. In an instant you’re pulled out of the game, and the spell of immersion is broken.

Like most technical aspects of videogames, lip sync has been in a continual state of evolution. Yet even as overall graphical fidelity has evened out in the current generation, the quality of character lip sync still varies significantly between studios and games. The problem may seem minor, but it is deceptively substantial: as graphical quality rises, so do players' standards, and the closer a face gets to realism, the more jarring any mismatch between voice and mouth becomes.
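
For context, the crudest form of runtime lip sync – the animatronic-fish flapping described above – simply drives mouth openness from the loudness of the dialog audio, with no knowledge of the actual speech sounds; more convincing systems map the audio to a small set of mouth shapes (visemes) and blend between them. Here is a minimal sketch of the flap approach in Python, assuming a 16-bit mono WAV file; the function name and the loudness scale are hypothetical, not taken from any particular engine:

    import math
    import struct
    import wave

    def lip_flap_curve(wav_path, fps=30):
        """Crude 'lip flap': one mouth-openness value in [0, 1] per animation frame."""
        with wave.open(wav_path, "rb") as w:
            assert w.getsampwidth() == 2 and w.getnchannels() == 1  # 16-bit mono assumed
            rate = w.getframerate()
            samples = struct.unpack("<%dh" % w.getnframes(), w.readframes(w.getnframes()))

        step = max(1, rate // fps)  # audio samples per animation frame
        curve = []
        for i in range(0, len(samples), step):
            chunk = samples[i:i + step]
            rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))  # loudness of this slice
            curve.append(min(1.0, rms / 8000.0))  # 8000: arbitrary loudness-to-openness scale
        return curve

Feed each value straight into a jaw-open blend shape and you get exactly the fish effect: loudness tells the mouth when to move, but never how.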

BioShock Infinite
Elizabeth's disjointed lip sync dampens the immersion in BioShock Infinite.

Take BioShock Infinite. Irrational's lush, sprawling world is so dense with architectural flourishes and environmental storytelling that whenever Elizabeth opens her mouth to initiate the Player/Companion AI Bonding Process™, her mediocre lip sync actively detracts from the fantastic world the developers took such pains to create.

In general, unless a developer makes a targeted, laborious effort to get lip sync right and head off these disruptions, the result is more likely to drag down the overall quality of the illusion.

But if only studios that pour time, money, and energy into lip sync and performance capture at the level of Quantic Dream (Heavy Rain) or Team Bondi (L.A. Noire) can secure performances that live up to the standard, should anyone else even bother?

Fortunately, there are plenty of ways to achieve compelling results without turning your studio into an animation house. The key lies in emphasizing your studio’s strengths instead of overreaching your capabilities. Some examples of how games have handled this challenge:

Place dialog sequences strategically. Silent Hill 2 is regarded as a paragon of complex, effective storytelling, yet technical barriers at the time prevented it from achieving realistic lip sync. The developers were nevertheless able to craft a fantastic experience by working around these restrictions whenever possible.

Silent Hill 2 Intro
Silent Hill 2's introductory sequence smartly works around its lip sync limitations.

In the game's first few minutes, the player character, James, stares at himself in a mirror, rubs a hand over his face, and sighs wordlessly – all in close-up. Only after the camera pulls back to a wide, top-down perspective does he begin the longer voiceover that sets the stage for the story.

Use establishing cinematics to plant representations of characters in the player's mind. The first Silent Hill faced even greater technical obstacles on the original PlayStation hardware, but the developers elegantly circumvented them with brief CG cinematics introducing each character as they appeared – with zero voiceover.

Instead, the short snippets planted images of far more detailed character models in the player's head to serve as a reference when the game reverted to its simpler, PS1-powered in-game models.

The first BioShock adopted a similar strategy. By partially obscuring the first few characters the player interacted with, the developers painted an incomplete picture for the player to finish, giving merely adequate lip sync no opportunity to taint the illusion.

“A lot of the things that we can crunch numbers on in a simulation, we do that on the computer. But a lot of other things that the computer is not well-suited for, we actually run that in the player’s imagination.” –Will Wright

Simulate in the player’s imagination, not the game. As Will Wright has famously said, there is much to be gained from choosing which parts of the experience to overtly include in the game and which parts to leave up to the player’s imagination.

For example, Thomas Was Alone nudges players to work with the game in creating realistic personalities for its crude square- and rectangle-shaped characters by leveraging humanity's innate tendency to anthropomorphize when given the slightest cue.

Thomas Was Alone
Thomas Was Alone relies on suggestions from the narrator to help turn shapes into memorable characters.

Of course, older, less technically advanced games used these techniques for years to draw players into a world. Establishing cinematics and animations, avatar portraits, and plain ol' text worked exceptionally well for RPGs and story-driven games long before 3D graphics became the standard.

Please Sync Responsibly

With a new batch of high-powered consoles on the way, developers will be eager to show off what they can do at the helm of the latest technology. Yet the more closely a studio's lip sync targets match its actual capabilities, the better the results will be.

And for smaller studios, there is plenty of success to be had in playing to their strengths. As Bastion, Braid, and Thomas Was Alone demonstrate, it's entirely possible to create amazing, memorable experiences with nothing but text, perhaps some voice acting, and the player's imagination.
