by Axel Stockburger

1 Sound in Early Computer Games

2 The Expansion of the Audiovisual Repertoire in Games with Music and Spatial Representations

3 The Introduction of Storage Media (CD-ROM, Cartridges) for the Playback of Recorded Sound Material and Digital Sound Effects

4 New Functions of Sound-Image Coupling in Contemporary Computer Games

4.1 Acousmatic Function

4.2 Motoric Function

5 Game Genres with Audiovisual Emphasis: Music Video Games

5.1 Rhythm Action Games

5.2 Electronic Instrument Games

6 Appropriation of Games in Media Art

7 Music Video Games in Interactive Art


Since their appearance in the 1960s, computer games have been defined by the interplay of events at the image and sound level. The connection between the two fulfills numerous functions at both the affective and the semantic level. There are several parallels between the role of sound and music in film and in computer games, but an essential difference is that in many cases the players have interactive control over the course of events in time. With so-called rhythm games and electronic instrument games, there are now genres that make the dynamic relation between sound, image, and interaction the focus of the game itself. Intensive research on innovative interfaces and interaction methods is currently being conducted. Several prototypes for successful music video games have been crucially influenced by multimedia artists.


1 Sound in Early Computer Games

The relationships between sound and image and the development of innovative sound effects have played an important role since the beginning of computer games. Even Spacewar (MIT 1962), generally regarded as a prototypical computer game, already featured simple sound effects in a subsequent enhanced edition.

The first commercial computer game with sound effects was Atari’s famous table-tennis game Pong (Atari 1972). Steven L. Kent reports that the game’s designer, Al Alcorn, added the sound almost by accident:

“The truth is, I was running out of parts on the board. Nolan [Bushnell] wanted the roar of a crowd of thousands – the approving roar of cheering people when you made a point. Ted Dabney told me to make a boo and a hiss when you lost a point, because for every winner there’s a loser. I said ‘Screw it, I don’t know how to make any one of those sounds. I don’t have enough parts anyhow.’ Since I had the wire wrapped on the scope, I poked around the sync generator to find an appropriate frequency or a tone. So those sounds were done in half a day. They were the sounds that were already in the machine.”[1]

The sounds of first-generation video games like Pong were still generated by dedicated electronic circuitry because storage capacity was limited. This meant that the options for integrating musical forms were severely restricted. Even so, the characteristic noises of this early phase fulfilled the important function of providing auditive feedback coupled with the visual events on the screen. Claus Pias, for instance, writes about the extremely reduced electronic sounds that can be heard when the paddle hits the ball in Pong:

“[T]he ‘pong’ sound of the collision detection seems like a reward for the right answer in a responsible game, and its steady recurrence makes audible the functioning of this ball game and thus couples man and game to the beat of a shared internal clock.”[2]

This illustrates one of the most important qualities of the relation between image and sound in interactive games: the audiovisual coupling of the players to the game system. This phenomenon is connected with Michel Chion’s concept of ergo audition,[3] referring to situations in which we hear ourselves doing something, or in which the listener is simultaneously the one who triggers the sound. For this reason, the permanent audiovisual feedback in games is the basis for the affective positioning of the players in a simulated world.

Even though the quality and complexity of audiovisual forms have clearly changed due to technological developments since the games of the early 1970s, this basic principle remains continuously effective.

2 The Expansion of the Audiovisual Repertoire in Games with Music and Spatial Representations

Unlike Pong, whose sounds were based on circuits of transistors and resistors developed specifically for single tones, an arcade game such as Space Invaders (Midway 1978) already had sound chips that made it possible to synthesize sounds. This expanded the sound spectrum, and the sounds moved from a clearly perceptible electronic character toward a more realistic reproduction of sound effects. At the same time, continuing improvements in electronic sound synthesis made music a more important element in computer games. The 8-bit sound chip of the Commodore Amiga, which entered the market in 1985, enabled the reproduction of short samples and thus marked the beginning of an increasing use of recorded sounds in computer games.

Another central transformation, especially with regard to generating spatial impressions, took place on two fronts. On the audio side, stereo sound was introduced in the mid-1980s (Amiga 1000); the arcade game Discs of Tron (Bally Midway 1983) deserves particular mention here. On the video side, display moved from sprite-based 2D to vector-based 3D. This was accompanied by a development from graphically abstract and reduced images toward the currently prevalent forms of representation derived from photo and film realism. The space game Elite (Acornsoft 1984) is considered the first representative of fully computed three-dimensional representation.

3 The Introduction of Storage Media (CD-ROM, Cartridges) for the Playback of Recorded Sound Material and Digital Sound Effects

The 16-bit sound chips integrated, for instance, in Nintendo’s Super Famicom (1990) and its Western counterpart, the SNES (1991), enabled sound generation in CD quality for the first time and thus brought another surge toward higher resolution and complexity of soundscapes.

With the spread of storage media such as the CD-ROM from the 1990s on, pre-recorded sound material could also be integrated to a far greater degree. This resulted in a stronger reliance on instrumental and sometimes even orchestral music that increasingly took its cue from movie scores.

In addition, the quality of sounds was also improved, since effects like echo, modulation, and even velocity changes could now be generated with the help of digital signal processing (DSP).

With the introduction of multi-channel 5.1 surround sound (Dolby Digital 5.1), familiar from DVDs, in the game consoles Sony PlayStation 2 (2000) and Microsoft Xbox (2001), it also became possible to position sounds in space very precisely.
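The idea of positioning a sound in space can be illustrated with a deliberately simplified case. The following sketch uses two-channel constant-power panning rather than 5.1 surround; the function name and the azimuth convention are assumptions made for illustration and do not describe how any particular console distributes sound across its channels.

```python
import math


def constant_power_pan(azimuth_deg):
    """Return (left_gain, right_gain) for a source at the given azimuth.

    azimuth_deg runs from -90 (hard left) to +90 (hard right); this
    convention is an assumption of the sketch.
    """
    # Clamp so that sources beyond the sides are treated as hard left/right.
    azimuth_deg = max(-90.0, min(90.0, azimuth_deg))
    # Map the azimuth onto an angle between 0 and pi/2.
    theta = (azimuth_deg + 90.0) / 180.0 * (math.pi / 2.0)
    # cos/sin keep left^2 + right^2 equal to 1, so loudness stays constant.
    return math.cos(theta), math.sin(theta)


left, right = constant_power_pan(30.0)
print(f"left={left:.3f} right={right:.3f}")
```

A surround system extends the same principle to more loudspeakers, distributing the gain of one source among the channels nearest to its direction.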

4 New Functions of Sound-Image Coupling in Contemporary Computer Games

In interaction with these technical innovations, central and medium-specific functions of the interplay between the sound and image levels in games crystallized. Sounds animate the fictitious game world and locate the players within the action; the auditive level expands the spatial dimension beyond the limited field of vision. In addition, sounds provide feedback and information about the dynamic situations in the game. The connection between image and sound in conjunction with the actions of the players generates a specific rhythm. Here sound and music open up an additional affective-emotional dimension. Against this background, the link between sound and image in games appears as a complex and dynamic interconnection. Whereas in film there is a fixed soundtrack, which fully defines the link between images and sounds, in computer games this link is regulated by the programming of the game engine and from then on depends essentially on various factors within the game situation.

Apart from pre-defined background sounds, sound effects and ambient noises relevant to the situation are dynamically generated by the game action. Although the relationships between visual objects and sounds are determined in the program, control over the time when they are triggered remains in the hands of the players.

Different sounds can be assigned to a single visual object, corresponding to the changing dynamic states of this object. The actions of the visual representation of the adversary in a classical first-person shooter, for example, can be signaled by different sounds: movement by the sound of steps, a shot by a bang, or a hit by a scream. The adversary does not necessarily have to remain in the field of vision, as the sound level can convey game-relevant information about the adversary’s location and damage status.
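The assignment of several sounds to one object can be sketched as a simple lookup from state to sound cue. The state names and file names below are hypothetical and are not taken from any actual game or engine.

```python
from enum import Enum, auto


class EnemyState(Enum):
    """Dynamic states of one visual object (hypothetical names)."""
    MOVING = auto()
    SHOOTING = auto()
    HIT = auto()


# One visual object, several sounds: each state has its own cue.
# The file names are placeholders invented for this sketch.
SOUND_FOR_STATE = {
    EnemyState.MOVING: "footsteps.wav",
    EnemyState.SHOOTING: "gunshot.wav",
    EnemyState.HIT: "scream.wav",
}


def sound_cue(state):
    """Return the sound that signals the given state."""
    return SOUND_FOR_STATE[state]


# The mapping is fixed in the program; when each state occurs, and thus
# when each cue is triggered, depends on the course of the game.
for state in (EnemyState.MOVING, EnemyState.SHOOTING, EnemyState.HIT):
    print(state.name, "->", sound_cue(state))
```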

Sounds can be attributed not only to actions, but also to objects and locations in the game, their auditive qualities defined by the respective fields of use. Noises attributed to actions or objects often have a signal character that shifts them into the foreground. Sounds attributed to locations, on the other hand, create a background and therefore tend to be designed more atmospherically. These different attributions between image and sound can be described as audiovisual functions that fulfill different tasks related to the game events.

4.1 Acousmatic Function

The acousmatic function[4] designates situations in which a sound unambiguously attributed to a visual object is audible without the corresponding object being seen. Michel Chion describes comparable phenomena in film with the term acousmatic:

“In a film an acousmatic situation can develop along two different scenarios: either a sound is visualized first, and subsequently acousmatized, or it is acousmatic to start with, and is visualized only afterwards.”[5]

In current 3D games, the players often have the task of establishing a transition from acousmatic to visualized sounds themselves by steering the viewing perspective in the direction of a sound, for instance, to see which object produced it. Unlike film, where these kinds of situations are clearly defined during the sound editing, the permanent oscillation between acousmatic and visualized sound sources, which is controlled by the players, is essentially responsible for dynamically locating the player in the simulated game space. In some cases this is even elevated to the predominant game principle, for instance in games of the Metal Gear Solid series (Konami 1998–2008), where players have to hide from adversaries whom they can only hear at first.
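Whether a sound is acousmatic or visualized at a given moment can be approximated by testing whether its source lies inside the player’s field of view. The following is a minimal sketch on a 2D plane; the function name, the default field of view of 90 degrees, and the omission of occlusion by walls are all simplifying assumptions.

```python
import math


def is_acousmatic(listener_pos, view_dir_deg, source_pos, fov_deg=90.0):
    """Return True if the source lies outside the horizontal field of view.

    Positions are (x, y) tuples; view_dir_deg is the viewing direction in
    degrees, with 0 pointing along the positive x axis.
    """
    dx = source_pos[0] - listener_pos[0]
    dy = source_pos[1] - listener_pos[1]
    angle_to_source = math.degrees(math.atan2(dy, dx))
    # Smallest signed difference between the two angles, in [-180, 180).
    diff = (angle_to_source - view_dir_deg + 180.0) % 360.0 - 180.0
    return abs(diff) > fov_deg / 2.0


# A source behind the player is heard but not seen ...
print(is_acousmatic((0, 0), 0.0, (-5, 0)))
# ... until the player turns the view toward it.
print(is_acousmatic((0, 0), 180.0, (-5, 0)))
```

Turning the view, as in the second call, is exactly the player-controlled transition from an acousmatic to a visualized sound described above.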

4.2 Motoric Function

A further qualitatively specific connection between sound and image, which can be called the motoric function,[6] refers to the joint occurrence of objects and movement noises. It is used in many racing games and flight simulators, where the vehicle steered by the player is permanently seen in the image while the background is animated. At the sound level, motor noises can be heard that are coupled with the interaction and dynamically modulated in frequency and volume. Motoric functions are generated by adaptive audio, which dynamically reacts to the game action. The sound designer Andrew Clark explains the scenario:

“[T]he archetypal example of adaptive audio is the sound of a car engine in a racing game – when the user steps on the gas, the effect must change to reflect the engine’s changing RPM (revolutions per minute). This type of sound adds a unique extra dimension to the challenge of game sound design. Whereas an engine sound for a linear AV medium only has to fit with two contexts (the visual and the mix), an adaptive engine sound must also respond dynamically to pseudo-random user input events.”[7]

In this case the dynamically adapted sound lends the visually rather static vehicle an appearance of motion, and the sound is perceived in the sense of Chion’s ergo audition as being controlled by the players. The car race simulations from the Gran Turismo series (Sony 1997–2008) provide an exemplary illustration of the motoric function between sound and image as explained above.
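The modulation of an engine sound in frequency and volume can be sketched as a mapping from RPM to a playback-rate factor and a gain. This is only an illustration of the principle: the function name, the RPM range, and the linear mapping are assumptions, and commercial racing games typically use considerably more elaborate models.

```python
def engine_sound_params(rpm, idle_rpm=800.0, max_rpm=7000.0):
    """Map engine RPM to (playback_rate, gain).

    All constants are illustrative assumptions, not values from any game.
    """
    # Clamp so that out-of-range input cannot produce extreme values.
    rpm = max(idle_rpm, min(max_rpm, rpm))
    # Normalized position in the RPM range: 0.0 at idle, 1.0 at maximum.
    t = (rpm - idle_rpm) / (max_rpm - idle_rpm)
    playback_rate = 1.0 + t   # pitch rises from 1.0x to 2.0x
    gain = 0.4 + 0.6 * t      # volume rises from 0.4 to 1.0
    return playback_rate, gain


# Simulated user input: the player steps on the gas, then lets go.
for rpm in (800, 3000, 7000, 1500):
    rate, gain = engine_sound_params(rpm)
    print(f"{rpm:5d} rpm -> rate {rate:.2f}, gain {gain:.2f}")
```

Because the function is evaluated anew for every input value, the resulting sound follows the player’s actions rather than a fixed timeline.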

5 Game Genres with Audiovisual Emphasis: Music Video Games

Although the significance of the relationship between sound and image is relevant in all games, especially for generating spatial simulation, there is a genre, the so-called music video games, that shifts this relation into the foreground. One of the first incarnations of this genre is the game Otocky (ASCII Corporation 1987) developed by the Japanese multimedia artist Toshio Iwai. Otocky is a side-scrolling shooter, where the players generate sounds by shooting adversary objects.

A new sound is generated with each shot, and some of the objects shot can change the pitch or key. The resultant tones are automatically quantized, meaning that the rhythm is adapted to the beat so that harmonious melodies result from playing.
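Quantization of this kind can be sketched as snapping the time of each shot to the nearest step of a rhythmic grid. The tempo, the grid resolution, and the function name below are assumptions for illustration; how Otocky itself implements quantization is not documented here.

```python
def quantize_onset(time_s, bpm=120.0, steps_per_beat=4):
    """Snap an onset time in seconds to the nearest grid step.

    bpm and steps_per_beat are illustrative assumptions; four steps per
    beat correspond to a sixteenth-note grid in 4/4 time.
    """
    step = 60.0 / bpm / steps_per_beat  # duration of one grid step
    return round(time_s / step) * step


# Irregular button presses ...
presses = [0.03, 0.53, 0.70, 1.22]
# ... are moved onto the grid, so that the tones fall on the beat.
print([quantize_onset(t) for t in presses])
```

With the assumed tempo of 120 beats per minute, one grid step lasts 0.125 seconds, so every tone sounds at a multiple of that duration regardless of when the button was actually pressed.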

Since then, a multitude of very different music video games have been developed, which explore the relationship between image, sound, and player interaction in the most diverse aspects. Music video games can be divided into two partially overlapping categories: rhythm action games and electronic instrument games.[8] In the first case, players largely follow the rhythm set by the game, whereas in the latter, the game can be seen as a veritable instrument for generating independent musical expressions, so that there is a greater degree of freedom. In addition to these two forms, there is a multitude of appropriative works and interactive installations by artists who take up and alter aspects of the computer game. As Toshio Iwai’s practice shows, for example, the boundaries between these forms often cannot be drawn unambiguously. Several projects started as commercial products in the game industry, whereas others are produced as multimedia installations within the framework of art exhibitions.

5.1 Rhythm Action Games

In so-called rhythm action games, such as FreQuency (2001), Amplitude (2003), or Rhythm Tengoku (Nintendo 2006), rhythm is the central factor and is thus responsible for the specific aesthetics of the game experience. These games center on the fact that sound can be used as feedback to actions in simulated environments, drawing the players into the rhythm of the system. The tremendously popular games Guitar Hero (Harmonix 2005) and Rock Band (Harmonix 2007), which allow players to slip into the role of rock musicians, require rhythmic and time-critical reactions to the challenges the game presents. In the highly original game Vib-Ribbon (NaNaOn-Sha 2000), a stick figure moves along a line that transforms itself according to music the players supply on a CD of their choice, producing various obstacles to be overcome. Here too, the musical rhythm becomes the starting point for the interaction.

However, rhythmical structures resulting from the audiovisual feedback coupling between human and machine during the game process have a broader-ranging significance in digital games. In these situations, the computer always remains superior to the human ability to react. In fact, the principle of most rhythm action games is precisely to adapt to a given rhythm, which becomes increasingly complex and faster from level to level, by pressing buttons as exactly as possible. The awarding of points, which makes these kinds of games competitive, thus depends on the players’ time-critical accommodation to the respective rhythm of the game system. The game PaRappa the Rapper (NaNaOn-Sha 1996), which is distinguished by its idiosyncratic aesthetics located between 2D and 3D, is also constructed in this fashion: the player has to operate the buttons of the controller in the right rhythm to make the main figure rap.

A game that is based on this same principle, but which allows the players a freer choice in relation to the tempo of the interaction, is the Japanese audio game Rez (2001). Accordingly, Rez goes beyond typical rhythm action games, offering an aesthetically independent, inter-modal experience, which opens up a somewhat greater scope of action.
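The dependence of scoring on time-critical button presses can be sketched with timing windows around each target beat. The window sizes, point values, and names below are assumptions for illustration and do not reproduce the scoring rules of any of the games mentioned.

```python
def judge_press(press_time, beat_time, perfect=0.05, good=0.12):
    """Rate one button press by its distance from the target beat.

    Times are in seconds; the window sizes and point values are
    illustrative assumptions.
    """
    error = abs(press_time - beat_time)
    if error <= perfect:
        return "perfect", 100
    if error <= good:
        return "good", 50
    return "miss", 0


# Target beats set by the game, and the player's actual presses.
beats = [1.0, 1.5, 2.0, 2.5]
presses = [1.02, 1.58, 2.30, 2.49]

total = 0
for press, beat in zip(presses, beats):
    rating, points = judge_press(press, beat)
    total += points
    print(f"beat {beat:.2f}: {rating}")
print("score:", total)
```

Narrowing the windows or shortening the interval between beats from level to level raises the demands on the players’ accommodation to the rhythm of the system.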

5.2 Electronic Instrument Games

Computer games that generate music and place the game principle outside the realm of point scoring and clear winning or losing conditions can be regarded as electronic instrument games. On the one hand, computer-supported music creation has become enormously successful since the first experiments by electronic pioneers in the 1950s and 1960s, and contemporary pop music can no longer be imagined without it. On the other hand, multimedia artists such as Toshio Iwai have started exploring and further developing complex interactive simulations for generating sound. Analog video/audio synthesizers can also be regarded as predecessors of electronic instrument games.

Many of these games, such as SimTunes (Maxis 1996) and Electroplankton (Nintendo 2005), can be understood less as rule-bound, goal-oriented games in the classical sense than as pointing in the direction of toys and new types of musical instruments in the broadest sense. Unlike popular software for sound production, these games often use metaphors at the visual level that are taken from dynamic systems. In Electroplankton, for instance, a biological system, specifically the movement of micro-particles in different simulated environments that players interact with, becomes the starting point for sound production.

6 Appropriation of Games in Media Art

An intensive engagement with the medium of the computer game began in the 1990s, especially on the part of media artists. This mainly evolved into practices of appropriating aesthetic content and of modifying (modding) existing games.[9]

In 2003, the Australian artist Julian Oliver, together with Steven Pickles, developed the project q3apd, which used a modified game environment from the popular game Quake III Arena (id Software 1999) for sound performances. The actions within the game environment were given new sound parameters and staged live. QQQ (2002) by the British artist Nullpointer (Tom Betts) also appropriated the code from the online shooter Quake III by modifying the parameters of the game engine that control sound and graphics, so that the actions of the online players were transformed into an abstract stream of images and sounds.

The work retroYOU r/c (1999–2001) by the Spanish artist Joan Leandre intervenes in the simulation of game physics and the graphical representation by altering the software code of a car racing game. This way, the audiovisual level of the game is turned into an interactive performance tool that can be controlled with a steering wheel, realizing an impressive dynamic-abstract collage of the various visual and auditive game elements (car fragments, explosions, elements of the race track, etc.).

These are examples of how artists and programmers develop innovative and autonomous installations and performance tools on the basis of existing hardware and software from the game industry. Unlike appropriation in twentieth-century modern art, this has less to do with a reference to the original context of the material used than with the experimental transformation of existing software in an attempt to automatically generate new types of audiovisual situations. The modified code of all the aforementioned works is made available for download by the artists on their web sites, which affords the audience the possibility of further experiments or collaborations.

7 Music Video Games in Interactive Art

The audiovisual installation Small Fish (1999) by Masaki Fujihata, Kiyoshi Furukawa, and Wolfgang Münch is based on computer games in its functionality, yet it is not a direct appropriation but rather a completely independent development in the field of interactive art.

Small Fish is an interactive graphical score that enables users to couple circles, dots, and lines with sounds by mouse click, and to set these visual elements in motion so that a musical structure emerges. Accordingly, the installation invites users to play with various graphic and sonic parameters. This type of algorithmically controlled generation of musical forms can also be called active score music—a term that is based on the title of a live performance event during the Ars Electronica Festival (2000), where Scribble (2000) by Golan Levin, Gregory Shakar, and Scott Gibbons and Small Fish Tales (2000) by Kiyoshi Furukawa were presented. Small Fish Tales uses the software developed for Small Fish for a performance before an audience. Scribble is based on Golan Levin’s Audiovisual Environment Suite (AVES), a collection of seven different interactive systems that were developed especially for the real-time performance of abstract computer-generated animations and sounds. The AVES instruments also represent an experimental investigation of innovative interfaces that, although they are intuitively accessible, provide great variability and countless individual settings for performers.

The music interface fijuu (2004) by Julian Oliver and Pix appears as an innovative and independent work that is aware of its proximity to computer games, but explores new approaches to the interaction between image, sound, and users beyond the realm of direct modding or appropriation.

Footnotes

[1] Steven L. Kent, The Ultimate History of Video Games: The story behind the craze that touched our lives and changed the world, New York, Three Rivers Press, 2001, 41–42.

[2] Claus Pias, Computer Spiel Welten, Munich, Diaphanes Verlag, 2000, 113. According to Pias, the pong sound is nothing other than the extremely amplified clicking of the line counter, in other words a noise inherent to the computer. See Claus Pias, “Die Pflichten des Spielers. Der User als Gestalt der Anschlüsse,” in: Martin Warnke, Thomas Coy, Georg C. Tholen (eds.): Hyperkult II. Zur Ortsbestimmung analoger und digitaler Medien, Bielefeld, transcript, 2004, 326.

[3] Cf. Michel Chion, Le Son, Paris, Nathan-Université, coll. “Cinéma et image,” 1998.

[4] Axel Stockburger, The Game Environment from an Auditive Perspective, DIGRA Level Up Conference (2003), University of Utrecht, Holland, 10, available online from http://www.stockburger.co.uk/research/pdf/AUDIO–stockburger.pdf

[5] Michel Chion, Audiovision. Sound on Screen, New York, Columbia University Press, 1994, 74.

[6] Axel Stockburger, The Rendered Arena: Modalities of Space in Video and Computer Games, PhD Thesis, University of the Arts, London 2006, 202, http://www.stockburger.co.uk/research/pdf/Stockburger_Phd.pdf.

[7] Andrew Clark, Designing Interactive Audio Content To Picture, 1999, 2, http://www.gamasutra.com/features/19991220/clark_01.htm.

[8] Martin Pichlmair, Fares Kayali, Levels of Sound: On the Principles of Interactivity in Music Video Games, Conference Proceedings, DIGRA Conference: Situated Play, 2007, 424.

[9] The Australian Internet resource http://www.selectparks.net is an outstanding source in this respect.
