
Interacting with the Drama: The Real-Time System

Vis-à-vis is called an "interactive monodrama" because the polyphonic logic of the piece is realized by means of a technology that enables close communication between the singer, the electronic score, and the computer-generated video. The computer functions like a virtual performer in a chamber ensemble: it listens to the changing musical events (for example, the singer's phrasing, dynamics, and register) and responds accordingly, on its own terms. The point is to capture, in this electronic work, something of the improvisatory spirit that is part of any live performance, and to create a sonic and visual environment in which the singer and computer can interact as a duo, responding to each other's musical decisions.

The electronic score of Vis-à-vis is performed using an interactive environment created in the programming language MAX/MSP. This custom MAX/MSP environment produces an audio landscape whose sounds are derived exclusively from the singing voice. A Macintosh computer running the program receives audio from a headset microphone worn by the singer. A variety of real-time processes are applied to the voice, shifting from scene to scene according to the tone of the text. Among them are spectral analysis/resynthesis (analyzing and recreating the overtone series of the voice), granular sampling (fragmenting and recombining the voice), harmonization (creating vocal polyphony), frequency shifting (shifting the overtone series of the voice), and envelope tracking (responding to the loudness of the voice). Surprisingly, one of the most difficult parts to bring off was the second long section of the piece, the discursive section beginning with the words "For example . . ." The real-time electronics for this part track the difference between the singer's consonants and vowels, recombining only the consonants into a percussive polyphony that manages to amplify the text without obliterating it.
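The text does not say how the patch separates consonants from vowels. One common, simple heuristic is to combine short-time energy with zero-crossing rate: noisy consonants such as fricatives change sign far more often than voiced vowels. The sketch below assumes that heuristic; the function names, frame size, and thresholds are hypothetical choices for illustration, not the piece's MAX/MSP implementation.

```python
import math
import random
from typing import List, Tuple


def frame_features(frame: List[float]) -> Tuple[float, float]:
    """Return (rms, zero_crossing_rate) for one frame of samples."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    # Count sign changes between neighbouring samples.
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return rms, crossings / (len(frame) - 1)


def classify_frames(signal: List[float], frame_size: int = 256,
                    silence_rms: float = 0.01,
                    zcr_threshold: float = 0.25) -> List[str]:
    """Label each non-overlapping frame as 'silence', 'consonant', or 'vowel'.

    Assumed heuristic: quiet frames are silence; loud, noisy frames (high
    zero-crossing rate) are consonants; loud, smooth frames are vowels.
    """
    labels = []
    for start in range(0, len(signal) - frame_size + 1, frame_size):
        rms, zcr = frame_features(signal[start:start + frame_size])
        if rms < silence_rms:
            labels.append("silence")
        elif zcr > zcr_threshold:
            labels.append("consonant")
        else:
            labels.append("vowel")
    return labels


def consonant_frames(signal: List[float], labels: List[str],
                     frame_size: int = 256) -> List[List[float]]:
    """Keep only the frames labelled 'consonant', in their original order."""
    return [signal[i * frame_size:(i + 1) * frame_size]
            for i, label in enumerate(labels) if label == "consonant"]


# Usage with synthetic test material: a sine "vowel", a noise "fricative", silence.
sr = 44100
rng = random.Random(0)
vowel = [0.5 * math.sin(2 * math.pi * 220 * n / sr) for n in range(1024)]
fricative = [rng.uniform(-0.5, 0.5) for _ in range(1024)]
signal = vowel + fricative + [0.0] * 512
labels = classify_frames(signal)
```

The extracted consonant frames could then be fed to granular or delay-based players to build the percussive layer; a real voice would need tuned thresholds and smoothing between frames.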

The overall structure of the system invites anthropomorphic allusions, for a number of "listener" agents function throughout: these agents analyze the pitch, amplitude, and timbre of the voice, while also classifying particular events (such as the incidence of consonants and vowels described above). The listeners produce data that is used, in turn, to drive other "players" (essentially DSP algorithms) that process and/or synthesize sound. It is this same data that is eventually passed on to the video control algorithms. Figure 1 maps out the components of the real-time system in the form of a virtual flow chart. (Fig. 1)
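The listener/player routing described above can be sketched as a small dataflow: each audio frame is analyzed by every listener, the merged feature data updates every player, and the same data is forwarded to the video side. All class names here are hypothetical, and the listeners and player are deliberately trivial stand-ins (RMS for amplitude, zero-crossing rate as a crude timbre proxy, a gain stage as the "DSP algorithm").

```python
import math
from typing import Dict, List

Features = Dict[str, float]


class AmplitudeListener:
    """Listener agent: reports the RMS level of a frame."""

    def listen(self, frame: List[float]) -> Features:
        return {"amplitude": math.sqrt(sum(x * x for x in frame) / len(frame))}


class BrightnessListener:
    """Listener agent: zero-crossing rate as a crude stand-in for timbre."""

    def listen(self, frame: List[float]) -> Features:
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
        return {"brightness": crossings / (len(frame) - 1)}


class GainPlayer:
    """Player (a trivial DSP stage): scales the voice by the last amplitude reading."""

    def __init__(self) -> None:
        self.gain = 0.0

    def update(self, features: Features) -> None:
        if "amplitude" in features:
            self.gain = min(1.0, 2.0 * features["amplitude"])

    def process(self, frame: List[float]) -> List[float]:
        return [x * self.gain for x in frame]


class VideoMapper:
    """Stand-in for the video side: records the feature data it is sent."""

    def __init__(self) -> None:
        self.received: List[Features] = []

    def update(self, features: Features) -> None:
        self.received.append(dict(features))


class RealTimeSystem:
    """Routes each frame: listeners -> players -> audio out, and listeners -> video."""

    def __init__(self, listeners, players, video) -> None:
        self.listeners = listeners
        self.players = players
        self.video = video

    def tick(self, frame: List[float]) -> List[List[float]]:
        features: Features = {}
        for listener in self.listeners:
            features.update(listener.listen(frame))
        for player in self.players:
            player.update(features)
        self.video.update(features)
        return [player.process(frame) for player in self.players]
```

Keeping listeners and players behind a shared feature dictionary means a new analysis or a new processor can be added without changing the router.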

FIG. 1: BLOCK DIAGRAM OF THE SYSTEM



Building a system on this type of architectural paradigm is, in itself, nothing new. In the case of Vis-à-vis, however, there is a constant level of "fuzziness" in the algorithmic relation between listener and player. The connections are designed so that the listener agents have a constantly changing degree of autonomy. At times they choose to listen, at other times not. When they are not listening, they can either generate their own data or do nothing at all. This changing relation between listener and player helps to create a sense of dialogue between human and machine, and it gives rise to varying levels of uncertainty (or what I like to call "obstinance") in the interactive electronics: an obstinance that defines the essence of what we mean by "interactivity." In Vis-à-vis such obstinance is manifested at several levels of the system: in the listener agents that derive data from the audio input, in the DSP algorithms that process and generate the audio, and in the video algorithms that control the visuals. One might also say that the live singer adds her own level of obstinance in the way she chooses to respond, or not to respond, to the real-time environment.
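One way to model a listener with a "constantly changing degree of autonomy" is to let an autonomy level wander as a clamped random walk, and at each step decide between the three behaviours the text names: listen, generate its own data, or do nothing. This is a hypothetical sketch; the class name, the random-walk drift, and the 50/50 split between generating and staying silent are all assumptions, not the piece's actual logic.

```python
import random
from typing import Optional, Tuple


class AutonomousListener:
    """Listener agent whose degree of autonomy drifts over time.

    autonomy = 0.0 -> always listens; autonomy = 1.0 -> never listens.
    When not listening it either invents its own data or stays silent.
    """

    def __init__(self, seed: Optional[int] = None, drift: float = 0.1,
                 autonomy: float = 0.0, generate_prob: float = 0.5) -> None:
        self.rng = random.Random(seed)
        self.drift = drift
        self.autonomy = autonomy
        self.generate_prob = generate_prob
        self.last_value = 0.0

    def _wander(self) -> None:
        # Random walk of the autonomy level, clamped to [0, 1].
        step = self.rng.uniform(-self.drift, self.drift)
        self.autonomy = min(1.0, max(0.0, self.autonomy + step))

    def respond(self, observed: float) -> Tuple[str, Optional[float]]:
        """Return (mode, value) where mode is 'listen', 'generate', or 'silent'."""
        self._wander()
        if self.rng.random() >= self.autonomy:
            self.last_value = observed          # listening: pass the analysis through
            return "listen", observed
        if self.rng.random() < self.generate_prob:
            # Not listening: invent data as a small step from the last value sent.
            self.last_value += self.rng.uniform(-0.05, 0.05)
            return "generate", self.last_value
        return "silent", None                   # not listening: do nothing at all
```

Because generated values continue from the last value actually heard, the agent's "obstinate" output stays loosely related to the singer rather than jumping arbitrarily.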

In concert performance, Vis-à-vis actually requires three Macintosh computers: the first runs the real-time MAX/MSP program; the second runs a small program (without audio processing) that gives directions (or mappings) to the video computer; and the third runs the video program. Like the audio score, the video for Vis-à-vis is controlled in real time, in response to the events of the different audio “scenes.” This visual score was in fact realized using a second custom program, an algorithmic video controller, written in a video programming environment called Onadime. As in the audio score, all the material for the video is derived from raw footage, captured mostly of the singer, sometimes speaking, sometimes thinking. The only exception is a single visual "quotation," a lonely woman looking through a windowpane, taken from an image by the turn-of-the-century Parisian photographer Eugène Atget. During performance, the video program controls the choice of visual material, its varied combinations (cutting and cross-fading), the rate of change, and the different processing algorithms. But there is, in effect, no video "track" for the piece: in each performance the visual score is created anew.
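The video controller's decisions (which material, cut or cross-fade, how fast to change) can be sketched as a mapping from incoming audio data to a stream of events. This does not use Onadime; it is a language-neutral illustration in Python. The clip labels, the amplitude thresholds, and the duration formula are hypothetical assumptions. Seeding the random generator differently for each performance reflects the point that no fixed video "track" exists.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence

# Hypothetical clip labels; the actual footage library is not documented here.
CLIPS = ["singer_speaking", "singer_thinking", "singer_closeup", "singer_profile"]


@dataclass
class VideoEvent:
    clip: str
    transition: str   # "cut" or "crossfade"
    duration: float   # seconds before the next change


def plan_events(amplitudes: Sequence[float], seed: int) -> List[VideoEvent]:
    """Turn a stream of amplitude readings (0..1) into video decisions.

    Assumed mapping: louder singing -> hard cuts and faster changes;
    quieter singing -> cross-fades and slower changes.
    """
    rng = random.Random(seed)
    current = rng.choice(CLIPS)
    events = []
    for amp in amplitudes:
        amp = min(max(amp, 0.0), 1.0)
        transition = "cut" if amp > 0.5 else "crossfade"
        duration = 1.0 + 7.0 * (1.0 - amp)
        # Never repeat the clip that is already on screen.
        current = rng.choice([c for c in CLIPS if c != current])
        events.append(VideoEvent(current, transition, duration))
    return events


def crossfade(frame_a: Sequence[float], frame_b: Sequence[float], t: float) -> List[float]:
    """Linear blend of two frames of pixel intensities; t=0 gives a, t=1 gives b."""
    return [(1.0 - t) * a + t * b for a, b in zip(frame_a, frame_b)]
```

Calling `plan_events` with the same amplitude stream but a new seed yields a different clip sequence, while the transition types and pacing still follow the singer.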