Following early variants in the classical avant-gardes, the first art projects inviting spectators to interact with audiovisual systems date back to the 1950s and 1960s. Participatory assemblages, performance art, action art, kinetic art, and cybernetic art all called the traditionally object-oriented conception of the artwork into question, favoring a more process- and event-oriented understanding. This led to more active involvement on the part of the recipient of the work, as well as the incorporation of mechanical elements and electronic media. The first systems offering possibilities for technically supported interaction were based almost exclusively on acoustic input that generated movement, light, and/or sound as output. In the 1960s and 1970s, the spread of video technology created possibilities for real-time playback and manipulation of moving images, while advances in computer technology enabled real-time interaction between humans and computers as well as the first graphical images. This paved the way for digital systems with elaborately programmed feedback processes, such as those developed by Myron Krueger and David Rokeby in the 1970s and 1980s. While these artists still focused on the manipulation of either visual or acoustic information, since the 1990s interactive art projects have been created that involve the joint manipulation of acoustic and visual information by the users. Artists such as Toshio Iwai, Golan Levin, and Zachary Lieberman have since developed a range of interactive art projects based on mainly abstract, at times also associative relations between sounds, colors, and forms, which are activated, manipulated, or indeed newly created during the interactive process.
The term interaction is used in everyday language to refer to the phenomenon of inter-relations. As early as 1901, the Dictionary of Philosophy and Psychology defined interaction as the relation between two or more relatively independent things or systems of change which advance, hinder, limit, or otherwise affect one another, with reference both to body-mind relations and to interrelations between objects in the environment and between objects and the environment.
According to this definition—which still applies today—all sound-image combinations based on interplay between auditory and visual information are interactions. However, more specific uses of the term became established in various scientific disciplines over the course of the 20th century. Whereas in sociology, interaction generally refers to relations between people, in the computer and media sciences the term is typically discussed with reference to the human-machine interface (HMI). This contribution will use the term in the latter sense, focusing on artistic projects that invite audiences to interact with audiovisual systems.
The traditional object-oriented concept of the work of art was already called into question by the avant-garde artists of the first half of the twentieth century, who advocated a more process- and event-oriented understanding of the artwork. Futurists and dadaists demonstrated their opposition to the traditional concept of art in provocative manifestos and spectacles. Dadaism and surrealism relied on elements of chance during the genesis of a work of art, either through the incorporation of everyday materials or the psychic automatism of écriture automatique. Jackson Pollock’s action paintings then led these ideas into the realm of full abstraction, thus shifting the creative process during the genesis of the work to center stage.
This interest in processes and factors that cannot be controlled by the creator of the work anticipated a debate about the role of the public and the act of reception. In reference to the exhibition of his entirely monochromatic White Paintings, Robert Rauschenberg insisted in 1951 that the paintings were not passive but
The new interest in elements of chance and the process-related aspects of artworks also led to the use of technical equipment that, on the one hand, structured or mediated processes, and, on the other, served as an inexhaustible supplier of sounds (and later images). It was John Cage who first incorporated radios and tape recorders into his compositions and thus inspired the visual artists of the period. In the mid-1950s, Robert Rauschenberg began experimenting with technical components (lighting, fans, radios) in his Combine Paintings. His relationship with engineer Billy Klüver eventually led to the renowned 9 Evenings (Theater and Engineering) in 1966, a series of events in which performers, musicians, and visual artists designed and implemented elaborate multimedia performances together with engineers. Rauschenberg himself staged a tennis match entitled Open Score, in which the game both controlled the lights and served as an orchestra. Microphones attached to the tennis rackets picked up the vibration of the racket strings, and the signal was sent to loudspeakers via FM transmitters. Each sound emitted by the loudspeakers switched off one of the floodlights illuminating the court. Other artists involved in 9 Evenings also experimented with image-sound interactions. David Tudor’s Bandoneon!, for example, controlled light sources, video images, and sound using the instrument of the same name.
Up until then, interaction had been confined either to specially trained performers using technical equipment or to the incorporation of the recipients, in particular by means of an acoustic or visual reflection of their presence. However, in the 1960s Rauschenberg also created installation works that called for effective action on the part of the recipient. Oracle (1962–1965), for example, is a sculptural ensemble which plays radio frequencies that can be manipulated by visitors, while the extensive installation Soundings (1968) consists of three large panels of plexiglass standing one behind the other. The first is mirrored; the other two display photographs of chairs. Lights are installed between the panels, and their brightness varies depending on the level and pitch of the noise produced by visitors, thus allowing the viewer to see the motif more or less clearly through the reflective panel. Billy Klüver explains the effect of the work as follows: “Soundings places you in a semi-dark room looking only at your own reflection. To remove the darkness you have to talk out loud to yourself, which is an unpleasant thing to do in public.” The works of the 1960s based on active inclusion of technology are often termed intermedia, a word coined by Fluxus artist Dick Higgins, because they rupture the disciplinary boundaries both within the arts and between art and technology. Interactions (between people themselves, between people and technical systems, and within technical systems) are a central feature of these works.
While Rauschenberg’s interactive works are perfect examples of how audience participation can be realized with physically tangible pieces, action art calls the object-based work itself into question. Allan Kaprow and George Brecht created happenings and events with extremely open choreographies that involved the audience in a variety of ways and also incorporated multimedia elements. Even if in their case there was no technically induced causal interplay between visual and acoustic elements, the visitors often created both. In Brecht’s event score Motor Vehicle Sundown, for example, a group of volunteers was asked to sit in their parked cars and follow a predetermined score inviting them to turn on and off various headlights, acoustic signals, and the car engine, and to operate mechanical equipment such as windscreen wipers and windows.
Whereas in op art the movements of the observer in front of a painting created illusionistic flickering effects, in kinetic art (for example, that of Jean Tinguely) paintings, reliefs, and sculptures were themselves made to move. However, it was not until the arrival of cybernetic art that technology-based interactions between light and sound were created. From the mid-1950s onward, Nicolas Schöffer began building cybernetic spatiodynamic sculptures (CYSP) and towers, whose built-in microphones and photo-electric cells caused them to react with their own light and sound compositions to the noise and lighting conditions of the environment, or indeed to the creation or manipulation of these conditions by the user. Between 1953 and 1957, Gordon Pask designed and built a sophisticated Musicolour System which used acoustic input to manipulate a color projection. The sound produced was analyzed by means of frequency filters and rhythm and interference detectors, and the result in turn controlled light bulbs with colored projection wheels placed in front of them. The device also incorporated a learning mechanism that allowed it to alter the filter parameters as it was used.
The first systems incorporating technical interaction were based almost exclusively on acoustic input, while they generated movement, light, and/or sounds as output. It was not until the development of video technology in the 1960s and 1970s that real-time recording and manipulation of moving images became possible. As early as 1963, in his first solo exhibition, Exposition of Music – Electronic Television, Nam June Paik invited visitors to distort television pictures either directly with a magnet or by means of sounds amplified through a microphone; both methods manipulate the magnetic field of the cathode-ray tube. Around 1970, together with Shuya Abe, he developed one of the first video synthesizers, which allowed electronic montage, manipulation, and color-coding of videos. The manipulation of video images through sound was continued after Paik, for example in Steina Vasulka’s Violin Power and in David Stout’s work, although in these cases mostly without audience participation. Audiences were first invited to participate in video art in closed-circuit installations that recorded the image of the visitor and then reproduced it on screens, sometimes after a time delay. However, these works were mainly based on visual feedback loops.
By the 1960s, computer technology had advanced far enough that real-time interaction between humans and computers, as well as the first graphical representations, became possible. This was the prerequisite for real-time interaction with visual information, which artists were quick to take up. As early as the 1970s, American computer scientist Myron Krueger developed a system that recorded people’s movements with a video camera and immediately converted them into silhouettes that could interact on a screen with graphical objects. Krueger also experimented, in the different versions of this Videoplace system, with the use of sound to accompany the visual feedback. However, he emphasizes that at the time he was not able to satisfactorily resolve the challenge of creating works that brought forth a meaningful association between visual and acoustic outputs as well as an aesthetically successful outcome.
The true pioneer of the translation of human movement into sound may therefore be David Rokeby, who designed his Very Nervous System, likewise based on video recordings of body movements, in the 1980s. This system entirely dispenses with visual output, but analyzes human motion and reacts to it via synthetically generated sounds that imitate different musical instruments. While this kind of activation of sounds by means of human motion was increasingly used in performative media art, it was the exception to the rule in interactive installations. Much more commonly, the movements of the audience controlled visual feedback.
Attempts to use visual information in order to control sounds had already begun in the 1940s. At that time, the principle of the sonogram was reversed; instead of recording audio frequencies, visual information was interpreted and sonified as frequencies. This method, termed pattern playback in North American linguistic studies, was further developed with the help of computer technology. The UPIC, developed by composer Iannis Xenakis in 1977, was the first real-time system that directly sonified visual forms. In this system, figures are drawn on a graphics pad and their shapes then prescribe the pitch, while their positioning determines the tone sequence or tone variation. Likewise in the 1970s, the diffusion of video technology led to the development of various systems that used live video images as the input for the generation of sound, for example Erkki Kurenniemi’s Dimi-O systems and the Cloud Music project created by Robert Watts, David Behrman, and Bob Diamond (1974–1979), in which a video camera recorded cloud movements, and an analysis of the brightness at six points of the image was used to manipulate sound channels.
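The principle described for Cloud Music, reading the brightness at a handful of image points and letting each reading steer one sound channel, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the grayscale frame format, the name `channel_levels`, the positions of the six sample points, and the simple linear scaling. It does not reproduce the original analog video analyzer.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]  # grayscale image as rows of values 0..255 (assumed format)


def channel_levels(frame: Frame, points: Sequence[Tuple[int, int]]) -> List[float]:
    """Read the brightness at each (row, col) point and scale it to 0.0..1.0.

    Each returned value stands for the control level of one sound channel.
    """
    return [frame[row][col] / 255 for row, col in points]


# A tiny hypothetical frame of sky: brighter values represent clouds.
frame = [
    [0, 51, 102],
    [153, 204, 255],
]

# Six hypothetical sample points, one per sound channel.
points = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]

levels = channel_levels(frame, points)
print(levels)
```

As clouds drift past the sample points from frame to frame, the six levels change independently of one another, which is what lets a slowly moving image act as a multi-channel controller.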
It was not until the 1990s that participating visitors in interactive art projects were able to engage in joint manipulation of acoustic and visual information. One of the methods used to achieve image sonification was based on the principles of pattern playback. In 1997, Toshio Iwai created Piano as image media, an installation in which visitors use a trackball to draw shapes and patterns that are then projected onto a screen and interpreted as musical notation. The individual pixels of the patterns first move slowly line by line toward a real piano, accelerating from a particular threshold onward as they approach the keyboard, which then independently plays the corresponding note. The pixels now appear to traverse the keyboard, only—this time on a vertical projection screen—to stream out of the piano, changing into colored, geometric objects as they flow.
In his work audiovisual environment suite, Golan Levin also experimented with directly drawing the sounds, using a standard interface consisting of a mouse and a monitor. He is interested in the idea of a painterly metaphor for interfaces: This metaphor is based on the idea of an inexhaustible, extremely variable, dynamic, audiovisual substance which can be freely ‘painted,’ manipulated, and deleted in a free-form, non-diagrammatic context.
In the first application of the Yellowtail series, the sonification of the shapes drawn with the mouse and set into motion by the system is still achieved by means of an axis that repeatedly sweeps from the bottom to the top of the image, triggering a sound as soon as it makes contact with a pixel (the horizontal position determines the tone, the brightness determines the volume). In his subsequent project, Loom, Levin dispenses with this axis and generates the sound directly from the shape drawn by the user, mapping the time axis straight onto it. Thus, for example, a thicker line generates a louder note, while a change in direction increases the brightness of its timbre. The movement dynamics of the drawing are recorded and then played back repeatedly.
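The scanning principle described for the first Yellowtail application can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Levin's implementation: the function name `scan_image`, the frequency range, and the linear mapping from horizontal position to pitch are all hypothetical choices.

```python
from typing import List, Tuple


def scan_image(pixels: List[List[float]],
               low_hz: float = 220.0,
               high_hz: float = 880.0) -> List[List[Tuple[float, float]]]:
    """Sweep a horizontal scan line from the bottom row to the top row.

    pixels[0] is the top row; each value is a brightness between 0.0 and 1.0.
    Returns one list per scan step containing (frequency_hz, volume) pairs
    for every lit pixel the scan line touches at that step.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    steps = []
    for row in range(height - 1, -1, -1):  # bottom to top
        sounds = []
        for col, brightness in enumerate(pixels[row]):
            if brightness > 0.0:
                # Horizontal position determines the tone (assumed linear mapping).
                t = col / (width - 1) if width > 1 else 0.0
                freq = low_hz + t * (high_hz - low_hz)
                # Brightness determines the volume.
                sounds.append((freq, brightness))
        steps.append(sounds)
    return steps


# A diagonal stroke rising from bottom left to top right.
image = [
    [0.0, 0.0, 1.0],
    [0.0, 0.5, 0.0],
    [1.0, 0.0, 0.0],
]

for step, sounds in enumerate(scan_image(image)):
    print(step, sounds)
```

Because the stroke rises from left to right, each scan step meets a pixel further to the right, so the sketch yields a rising sequence of tones, with the dimmer middle pixel sounding more quietly.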
Another means for the visual manipulation of sound is its symbolic representation through objects activated within the framework of an interactive process. Golan Levin calls such objects interactive widgets.
Between 1992 and 1994, Toshio Iwai developed a system called Music Insects, in which visitors use a mouse to create drawings on a monitor. He assigned musical notes to the lines and shapes based on the colors in which they were drawn. Then he chose various insects to represent different musical instruments and programmed them to run across the screen. As soon as an insect makes contact with a drawing, the corresponding note is sounded, while white and gray color tones change the direction in which the insects move. The groundbreaking thing about this work is that it turns away from a linearly understood notation toward a system of notation organized in space. A similar direction is taken in Small Fish, created by Kiyoshi Furukawa together with Wolfgang Münch and Masaki Fujihata in 1998/1999. The fifteen different variants of this system almost all work on the basic principle that one or several pick-ups, usually in the form of simple dots, move across the screen and activate notes and change direction when they collide with each other, with sounding graphical elements, or with the boundaries of the window frame. The user can move the elements around in order to manipulate the composition.
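The idea of a notation organized in space rather than along a line can be illustrated with a small sketch in the spirit of Music Insects. It rests on several assumptions that are not taken from Iwai's work: the screen is reduced to a grid, colored pixels are represented directly by note names, white or gray pixels by a `TURN` marker that rotates the insect 90 degrees clockwise, and the insect wraps around at the edges. The function name `walk` is likewise hypothetical.

```python
from typing import List, Optional, Tuple

TURN = "turn"  # stands in for a white or gray pixel (assumed representation)

Grid = List[List[Optional[str]]]


def walk(grid: Grid, start: Tuple[int, int],
         direction: Tuple[int, int], steps: int) -> List[str]:
    """Move one insect across the grid and collect the notes it triggers.

    direction is (row_delta, col_delta); rows grow downward as on a screen.
    """
    rows, cols = len(grid), len(grid[0])
    (r, c), (dr, dc) = start, direction
    played = []
    for _ in range(steps):
        # Advance one cell, wrapping around at the edges (an assumption).
        r, c = (r + dr) % rows, (c + dc) % cols
        cell = grid[r][c]
        if cell == TURN:
            dr, dc = dc, -dr  # rotate 90 degrees clockwise on screen
        elif cell is not None:
            played.append(cell)  # a colored pixel sounds its note
    return played


grid = [
    ["C4", None, TURN],
    [None, None, "E4"],
    [None, None, "G4"],
]

# The insect starts at the top left and heads right.
print(walk(grid, start=(0, 0), direction=(0, 1), steps=4))
```

The order of the notes depends entirely on the path the insect takes through the drawing, so moving a single turn marker changes the melody without touching any of the notes themselves.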
Golan Levin criticizes many of these systems for the very limited freedom the user enjoys to influence the acoustic output, which he attributes in part to the fact that only the environment or direction of movement of the sound object, and not the sound object itself, can be altered interactively. With their Manual Input Workstation, he and Zachary Lieberman succeeded in developing a fully intuitive system in which the visitor can both create and manipulate shapes and notes at the same time by using hand gestures in a kind of shadow play. The use of human gestures means that no intervening layer of equipment creates a distance between input and output. Form and color are immediately generated by the hands.
Other systems proceed from the concept of the mixing console to visually create or manipulate sound. In particular, commercial audio software (software sequencers such as Digital Performer, for example) often imitates the look and functionality of analog mixing consoles, while more experimental systems attempt to improve on this functionality by means of other graphic forms of representation. Further improvements have been achieved by new forms of music tables that visually depict the sounds, frequencies, and rhythms that can be or have been created, and also give them a spatial association. One highly sophisticated and complex example of the many music tables that exist is the reacTable, a round table on which various marked building blocks are positioned and can be activated at the same time. These building blocks adopt the function of generators, audio filters, controllers, control filters, audio mixers, and global objects (e.g., a metronome), although the user does not in any way need to know or be able to identify their function in order to create sequences of sound. The positioning of the building blocks with respect to each other determines their reciprocal influence. While the individual sound components are still depicted by symbols, their interaction is shown through connecting lines that visualize frequencies and rhythms. What is interesting about the reacTable, in addition to its truly vast range of possibilities for intuitive, real-time musical production and visualization, is its potential for collaborative improvisation between several users.
Over the last ten years, the visualization of sound produced by recipients, which had already been experimented with in the 1950s and 1960s, has also been further developed. Thus, Levin and Lieberman added sound components to installations based on real-time analysis and projection of shadows. In their installation re:mark, a voice-recognition system attempts to transform visitors’ speech into writing which then—taking its cue from the visitors’ shadows—moves across a screen. The similarly constructed installation messa di voce (Voice Placement), on the other hand, converts sound into abstract shapes.
In 2002, the two artists together with the Ars Electronica Futurelab developed a completely new approach in their installation The Hidden Worlds of Noise and Voice. Spoken exchanges are made spatially visible: the voices of or noises made by different users sitting at a round table are converted into virtual sound sculptures by means of 3D technology. The forms that emerge can be observed, on one side, through special 3D spectacles, while on the other they are projected as shadows onto the table, so that observers standing next to the users can also follow the visualized process of communication.
Another type of translation of sound and image is produced when the sounds are interpreted at a symbolic level rather than visualized. For example, in Vincent Elka’s installation Shout, the projection of a woman’s face reacts to the acoustic input produced by the visitors. The system attempts to read emotions from their voices and induces the woman to react to them both through her facial expressions and her language. Here, Elka is referring back to concepts of linguistic communication between humans and technology that are relevant in AI (artificial intelligence) research but, because they are based on symbolic systems, cannot be considered sound-image transformations in the narrow sense. As has been shown, these transformations are mostly dedicated to abstract, and also often associative, relations between sounds, colors, and forms. As interactive art projects, they also invite the visitor to actively explore them.
 James Mark Baldwin, ed., Dictionary of Philosophy and Psychology, vol. 1 (London: Macmillan, 1901), 561.
 For a detailed treatment of the term interaction, see Katja Kwastek, “Interactivity—A word in process,” in The Art and Science of Interface and Interaction Design, eds. L. C. Jain, Laurent Mignonneau, and Christa Sommerer (Berlin/Heidelberg: Springer, 2008), 15–26.
 Cited in Lars Blunck, Between Object & Event. Partizipationskunst zwischen Mythos und Teilhabe (Weimar: VDG, 2003), 65.
 Letter from Marcel Duchamp to Jehan Mayoux, 1956, cited in Dieter Daniels, Duchamp und die anderen. Der Modellfall einer künstlerischen Wirkungsgeschichte in der Moderne (Cologne: DuMont, 1992), 2.
 Umberto Eco, The Open Work, trans. Anna Cancogni (Cambridge: Harvard University Press, 1989), 21.
 Robert Rauschenberg in the exhibition program, cited in Billy Klüver and Julie Martin, “Arbeiten mit Rauschenberg,” in Robert Rauschenberg—Retrospektive, eds. Walter Hopps and Susan Davidson (New York: Solomon R. Guggenheim Museum, 1997–1998; Ostfildern: Hatje Cantz, 1998), 315.
 Klüver and Martin, “Arbeiten mit Rauschenberg,” 319.
 Alfred Fischer, ed., George Brecht. Events. Eine Heterospektive (Cologne: Museum Ludwig, 2005), 87.
 Gordon Pask, “A comment, a case history and a plan,” in Cybernetics, Art and Ideas, ed. Jasia Reichardt (Greenwich, CT: New York Graphic Society, 1971), 76–99. Also see Margit Rosen, “‘The control of control’—Gordon Pasks kybernetische Ästhetik,” in Pask Present. An exhibition of art and design inspired by the work of Gordon Pask, eds. Ranulph Glanville and Albert Müller (Vienna: Echoraum, 2008), 131–91.
 Interview with the author, May 20, 2007.
 One example here is the performance group Palindrome, cf. http://www.palindrome.de (last access March 24, 2009).
 Cf. Haskins Laboratories, The Science of the Spoken and Written Word, http://www.haskins.yale.edu/featured/patplay.html (last access July 27, 2009).
 Cf. Golan Levin, “The Table Is The Score: An Augmented-Reality Interface for Real-Time, Tangible, Spectrographic Performance,” in Proceedings of the International Computer Music Conference 2006 (ICMC’06), New Orleans, November 6–11, 2006, http://www.flong.com/storage/pdf/articles/levin_scrapple_20060320_1200dpi.pdf.
 Cf. Titti Kallio et al., “Design Principles and User Interfaces of Erkki Kurenniemi’s Electronic Musical Instruments of the 1960s and 1970s,” in Proceedings of the 2007 Conference on New Interfaces for Musical Expression (NIME07), New York, 91, http://itp.nyu.edu/nime.old/2007//proc/nime2007_088.pdf.
 Cf. David Dunn, Eigenwelt der Apparate-Welt (Linz: Ars Electronica, 1992), 152–153.
 Golan Levin, Painterly Interfaces for Audiovisual Performances, M.S. Thesis MIT Media Laboratory, Cambridge 2000, 56, http://www.flong.com/storage/pdf/articles/thesis600.pdf.
 Levin, Painterly Interfaces, 41.
 See A Short History of the Works by Toshio Iwai, http://www.vanriet.com/doors/doors1/transcripts/iwai/iwai.html (last access July 27, 2009).
 Levin, Painterly Interfaces, 41f.
 E.g., in various Soundscape projects created by the Interval Research Corporation. See Levin, Painterly Interfaces, 37.
 Cf. the overview at http://reactable.iua.upf.edu/?related (last access July 27, 2009).
 See Sergi Jordà et al., The reacTable: Exploring the Synergy between Live Music Performance and Tabletop Tangible Interfaces, http://mtg.upf.edu/reactable/pdfs/reactable_tei2007.pdf.
 Cf. Golan Levin and Zachary Lieberman, “In-Situ Speech Visualization in Real-Time Interactive Installation and Performance,” in Proceedings of The 3rd International Symposium on Non-Photorealistic Animation and Rendering, Annecy, France, June 7–9, 2004, http://www.flong.com/storage/pdf/articles/messa_NPAR_2004_300dpi.pdf.
 See the project website: http://shout.emosmos.com (last access March 24, 2009).