|
Interacting with the Drama: The Real-Time System
Vis-à-vis is called an "interactive monodrama" because
the polyphonic logic of the piece is realized by means of a technology
that enables close communication between the singer, the electronic score
and the computer-generated video. The computer functions like a virtual
performer, as in a chamber ensemble, listening to the changing musical
events (for example, the phrasing, dynamics, and register of the singer)
and responding accordingly, in its own terms. The point is to capture, in
this electronic work, something of the improvisatory spirit that is part
of any live performance, and to create a sonic and visual environment
in which the singer and computer can interact as a duo, responding to
each other's musical decisions.
The electronic score of Vis-à-vis is performed using an interactive
environment created in the programming language MAX/MSP. This custom MAX/MSP
environment produces an audio landscape whose sounds are exclusively derived
from the singing voice. A Macintosh computer, running the program, receives
audio from a headset microphone worn by the singer. A variety of real-time
processes are used on the voice, shifting from scene to scene, according
to the tone of the text. Among the real-time processes used are spectral
analysis/resynthesis (analyzing and recreating the overtone series of
the voice), granular sampling (fragmenting and recombining the voice),
harmonization (creating vocal polyphony), frequency shifting (shifting
the overtone series of the voice), and envelope tracking (responding to
the loudness of the voice). Surprisingly, one of the most difficult parts
to bring off was the second long section of the piece, the discursive
section beginning with the words: "For example . . ." The real-time
electronics for this part track the difference between the singer's consonants
and vowels, recombining only the consonants into a percussive polyphony
that manages to amplify the text without obliterating it.
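The text does not say how the Max/MSP patch tells consonants from vowels. As a rough illustration only, the sketch below assumes a common heuristic: noisy, unvoiced consonants tend to have a high zero-crossing rate, while voiced vowels have a low one. The function names, frame size, and thresholds are all hypothetical and are not taken from the piece.

```python
import math
import random


def frame_features(frame):
    """Return (rms, zero_crossing_rate) for one frame of samples in [-1, 1]."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a < 0.0) != (b < 0.0)
    )
    return rms, crossings / (len(frame) - 1)


def classify_frame(frame, silence_rms=0.01, zcr_threshold=0.25):
    """Label a frame 'silence', 'consonant', or 'vowel'.

    Both thresholds are illustrative guesses, not values from the piece.
    """
    rms, zcr = frame_features(frame)
    if rms < silence_rms:
        return "silence"
    return "consonant" if zcr > zcr_threshold else "vowel"


def keep_consonants(frames):
    """Keep only the frames labelled as consonants, ready for recombination."""
    return [f for f in frames if classify_frame(f) == "consonant"]


def sine_frame(freq_hz, n=512, sample_rate=16000, amp=0.5):
    """A vowel-like test frame: a steady low-frequency tone."""
    return [
        amp * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n)
    ]


def noise_frame(n=512, amp=0.5, seed=0):
    """A consonant-like test frame: broadband noise."""
    rng = random.Random(seed)
    return [rng.uniform(-amp, amp) for _ in range(n)]
```

Passing a mixed list of frames to `keep_consonants` returns only the noisy ones, which a granular process could then scatter into a percussive texture. A real vocal signal would need more robust features than these two.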
The overall structure of the system invites anthropomorphic allusions,
for there are a number of "listener" agents that function throughout:
such agents are employed to analyze the pitch, amplitude, and timbre of
the voice, while also classifying particular events (such as the incidence
of consonants and vowels described above). These listeners produce data
that is used, in turn, to drive other "players" (essentially
DSP algorithms) that process and/or synthesize sound. And it is
this data that eventually gets passed on to the video control algorithms.
Figure 1 maps out the components of the real-time system in the form of
a virtual flow chart. (Fig. 1)
FIG. 1: BLOCK DIAGRAM OF THE SYSTEM

Building a system on this type of architectural paradigm is, in itself,
nothing new. However, in the case of Vis-à-vis, there is a constant
level of "fuzziness" in the algorithmic relation between listener
and player. The connections are designed so that the listener agents have
a constantly changing degree of autonomy. At times they will choose to
listen, other times not. When they are not listening, they have the option
of generating their own data or of doing nothing at all. This changing relation
between listener and player helps to create a sense of dialogue between
human and machine, and gives rise to varying levels of uncertainty—or
what I like to call "obstinance"—in the interactive electronics:
an obstinance that defines the essence of what we mean by "interactivity."
In Vis-à-vis such obstinance is manifested at several levels of
the system, from those listener agents that derive data from the audio
input, to the DSP algorithms that process and generate the audio, to the
video algorithms that control the visuals. One might also say that the
live singer adds her own level of obstinance, in the way that she chooses
to respond, or not to respond, to the real-time environment.
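The behaviour described above can be sketched in a few lines. This is a minimal illustration, not the Max/MSP patch itself: it assumes that "autonomy" is a single probability which drifts over time, and that a silent listener leaves its player holding the previous value. The class and function names are hypothetical.

```python
import random


class ObstinateListener:
    """A listener agent that does not always listen.

    On each step it either passes on the analyzed input, invents its own
    value, or stays silent (returns None).
    """

    def __init__(self, autonomy=0.5, seed=None):
        # 0.0 means the agent always listens; 1.0 means it never listens.
        self.autonomy = autonomy
        self.rng = random.Random(seed)

    def drift(self, max_step=0.1):
        """Random-walk the autonomy level, clamped to the range [0, 1]."""
        delta = self.rng.uniform(-max_step, max_step)
        self.autonomy = min(1.0, max(0.0, self.autonomy + delta))

    def step(self, analyzed_value):
        if self.rng.random() >= self.autonomy:
            return analyzed_value        # choose to listen
        if self.rng.random() < 0.5:
            return self.rng.random()     # generate its own data in [0, 1)
        return None                      # do nothing at all


def player_gain(control, previous):
    """A stand-in 'player': hold the previous gain when the listener is silent."""
    return previous if control is None else control
```

Calling `drift()` between steps makes the agent more or less attentive as the performance unfolds, so the same sung phrase need not produce the same response twice.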
In concert performance, Vis-à-vis actually requires three Macintosh
computers: the first runs the real-time MAX/MSP program; a second Macintosh
runs a small program (without audio processing) that gives directions
(or mappings) to the video computer; and the third runs the video program.
The video for Vis-à-vis is, like the audio score, controlled in
real-time, in response to the events of the different audio “scenes.”
This visual score was in fact realized using a second custom program—an
algorithmic video controller—written in a video programming environment
called Onadime. As in the audio score, all the material for the video
is derived from raw footage captured mostly of the singer, sometimes speaking,
sometimes thinking. The only exception is a single visual "quotation,"
a lonely woman looking through a windowpane, taken from an image by the
turn-of-the-century Parisian photographer Eugène Atget. During
performance, the video program controls the choice of visual material,
its varied combinations (cutting and cross-fading), the rate of change,
and the different processing algorithms. But there is, in effect, no video
"track" for the piece: in each performance the visual score
is created anew.
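How the mapping program translates audio data into video directions is not specified in the text. The sketch below is one hypothetical reading: it assumes that a listener's amplitude sets the cross-fade position between two images and that pitch selects a clip. The input ranges, the function names, and the choice of parameters are invented for illustration and have nothing to do with Onadime's actual interface.

```python
def scale(value, in_lo, in_hi, out_lo, out_hi):
    """Linearly map value from [in_lo, in_hi] to [out_lo, out_hi], clamped."""
    if in_hi == in_lo:
        raise ValueError("input range must not be empty")
    t = (value - in_lo) / (in_hi - in_lo)
    t = min(1.0, max(0.0, t))
    return out_lo + t * (out_hi - out_lo)


def crossfade(pixels_a, pixels_b, mix):
    """Blend two equal-length pixel lists: mix=0.0 gives A, mix=1.0 gives B."""
    return [(1.0 - mix) * a + mix * b for a, b in zip(pixels_a, pixels_b)]


def video_controls(amplitude, pitch_hz, n_clips):
    """Turn listener data into video directions (assumed mapping).

    amplitude is expected in [0, 1]; pitch is assumed to lie between
    100 Hz and 1000 Hz. Both ranges are illustrative guesses.
    """
    mix = scale(amplitude, 0.0, 1.0, 0.0, 1.0)
    clip = min(n_clips - 1, int(scale(pitch_hz, 100.0, 1000.0, 0.0, n_clips)))
    return {"clip": clip, "mix": mix}
```

Because the controls are recomputed from live input on every step, no two runs produce the same sequence of images, which is the sense in which there is no fixed video track.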
|