Lukas Körfer – Marlon Schumacher

ByLukas Körfer

Common Lisp, OM-SoX, OpenMusic, Wellenfeldsynthese

Wave field synthesis with OM-SoX

Abstract: This final project was created at the end of the winter semester 2023/24 as part of the course “Symbolische Klangverarbeitung und Analyse/Synthese” (eng. Symbolic Sound Processing and Analysis/Synthesis) of the MA Music Informatics. An application for sound spatialization was developed in the program OpenMusic using the library OM-SoX implementing Steinberg and Snow’s “acoustic curtain”, a technique for wave field synthesis.

Responsible: Lukas Körfer

Wave field synthesis

Wave field synthesis (WFS for short) is the spatialization of virtual sound sources using a high-density loudspeaker array. This spatialization technique attempts to reproduce a physical soundfield over an extended area in a way to provide for multiple non-conicident listening positions a congruent impression of the localization of sound sources. This is achieved by generating a wave field consisting of a large number of individual sound sources that are synchronized in such a way that a coherent sound wave is created, for which given certain constraints it should be possible to localize a virtual sound source in the room.

For a better understanding of how WFS works, the subject can be approached via the physical phenomenon of interference pattern formation behind an obstacle with openings. When a wave encounters one or more slits, it is diffracted through the openings and propagates behind the obstacle. This leads to the formation of a pattern of wave interference on the other side of the obstacle. Similarly, wave field synthesis uses an array of loudspeakers to generate a coherent sound wave. This requires precise calculation and control of the phase and amplitude relationships of the sound waves emanating from each speaker. These calculations are dependent on the distances of each individual loudspeaker in the array relative to the position in space of the respective virtual sound source.

Project description

For this project, a program was to be created with the general goal of ultimately obtaining a multi-channel audio file that can be used for wave field synthesis with a loudspeaker array through certain influence and adjustments by a user. To achieve this, it was first necessary to design which parameters should be set and influenced by the user of the program.

User input

In addition to the audio file, which is to be used for spatialization, the user must specify certain information about the loudspeaker array on the one hand and the position or positions of one or more virtual sound sources relative to the loudspeaker array on the other. In order to make the configuration of the program as simple and intuitive as possible, I have decided to mainly use a picture object in which the structure can be recorded. The positions of the loudspeakers can be specified by drawing a rectangle and those of the virtual sound sources with circles. One or more circles can be drawn, with each circle representing a sound source. The loudspeakers can be specified in two different ways. If only a single rectangle is drawn in the picture object, this represents the area of a loudspeaker array. In order to be able to determine the specific positions of the individual loudspeakers in the next step of the program, two additional pieces of information are required. Firstly, the length of the loudspeaker array in meters; this also influences the scale for the complete drawn setup. Secondly, the number of loudspeakers in the drawn area must be specified. As soon as more than one rectangle is specified by the user, each individual rectangle represents an individual loudspeaker. In order to be able to specify a scale for the drawn structure in this variant – which was previously possible by specifying the length of the loudspeaker array – the width/height of the area of the complete picture object can now be specified. The first variant, where the loudspeaker array can only be drawn with a rectangle, makes the application much less complicated, but also requires the loudspeakers to be linear and evenly spaced.

Calculating distances

Once all the graphics of the picture object have been read out, they must be divided into rectangles and circles for further processing. If only one rectangle is found, the position and dimension of the rectangle and the two specifications for the length and number of loudspeaker arrays can first be used to determine the position of each individual loudspeaker within the array in meters. If there are several rectangles, this step is not necessary and the center points of all specified rectangles are simply determined. It is then possible to calculate the Euclidean distance from all sources to each individual loudspeaker on the same scale using another Lisp function. It should be noted that all graphics drawn by the user in the Picture object that do not correspond to a rectangle or a circle are ignored and not taken into account for the further calculations. As any number of virtual sound sources can be specified for the application, all circles that exist in the picture object are also captured in this step, whereby the order is irrelevant.

Sound processing

Sound processing is implemented in the next section of the program. Basically, a multi-channel file is created with the sound file specified by the user together with the previously calculated distances, which can be used for the intended loudspeaker array. This process takes place in a nested OM loop with two levels.

In the first level, it is first iterated over each element within the distance list. Each of these elements corresponds to a list that belongs to a virtual sound source, which contains the distances to each loudspeaker. Before the process enters the second level of the loop, further calculations are performed in a Lisp function using the current distance list.

This function iterates over each distance and determines the time delay, volume reduction and a cutoff frequency for a lowpass filter to calculate the air absorption of high frequencies and collects them in a list. In the next step, the result of this Lisp function is used to enter the second level of the loop.

Here, the respective SoX effect is applied to the calculated value; SoX level for volume reduction, SoX lowpass for air absorption and SoX pad for the time delay. The resulting audio file is saved for each iteration. Each of the three lists has as many values as the previously calculated distances from the current sound source to the speakers. This means that each audio file saved in this loop represents one channel of the subsequent multi-channel file for the current sound source.

The multi-channel file can now be created in the next step in the first layer with SoX-Merge and stored temporarily at the end of the loop. This process is repeated for all remaining virtual sound sources (if existing) and are collected as the output of this upper loop. All multi-channel files of the respective sound sources are then merged with a SoX-Mix.

If only one virtual sound source is specified by the user, the output of the outermost loop will only consist of a single multi-channel file for this one source. In this case, the SoX-Mix is not required and it would even lead to an error during the evaluation of the program if the input of the SoX-Mix consisted of only one audio file. The OM-If therefore avoids the use of the SoX-Mix as soon as the output of the patcher, in which the distances are determined, only consists of one list, which means that only one circle for a virtual sound source has been drawn in the picture object.

Finally, silence can be added to the multi-channel file using the SoX pad, depending on preference, if the selected audio file is particularly short, for example. At the same time, the final multi-channel file is saved in Outfile as “wfsOutFile.wav”.

ByLukas Körfer

Binauralix, Electromagnetic Tracking, Extended Reality, Gestural Control, Human-Computer Interaction, Max, Motion Capture

Speaking Objects

Abstract

In this project, an audio-only augmented reality sound installation was created as part of the course „Studienprojekte Musikprogrammierung“ (“Study Projects Music Programming”) at the Karlsruhe University of Music. It is important for the following text to distinguish the terminology from virtual reality (VR for short), in which the user is completely immersed in the virtual world. Augmented reality (AR for short) is the extension of reality through the technical addition of information.

Motivation

On the one hand, this sound installation should meet a certain artistic standard, on the other hand, my personal goal was to bring AR and especially auditory AR closer to the participants and to get them excited about this new technology. Unfortunately, augmented reality is very often only understood as the visual representation of information, as is the case with navigation systems or smartphone applications, for example. However, in my opinion, it is important to sensitize people more and more to the auditory extension of reality. I am convinced that this technology also has enormous potential and that there is a lot of catching up to do in terms of public awareness compared to visual augmented reality. There are already numerous areas of application in which the benefits of auditory AR have been demonstrated. These range from areas in which many applications of visual AR can already be found, such as education, increasing productivity or purely for entertainment purposes, to specialist areas such as medicine. Ten years ago, for example, there were already attempts to use auditory AR to enhance the sense of hearing for people with visual impairments. By sonifying real objects, it was possible to create a purely auditory orientation aid.

Methodology

In this project, participants should be able to move freely in a room in which objects are positioned and although these do not produce sounds in reality, the participants should be able to perceive sounds through headphones. In this sense, it is an extension of reality (“augmented reality”), as information is added to reality in auditory form using technical means. Essentially, the areas for implementation extend on the one hand to the positioning of the person (motion capture) and binauralization and on the other hand in the artistic sense to the design of the sound scene by positioning and synthesizing the sounds.

Figure 1

The motion capture in this project is realized with the Polhemus G4 system. The direction and position of a micro-sensor, which is attached to a pair of glasses worn by the participant, is determined by a magnetic field generated by two transmitters. A hub, which is connected to the micro-sensor via a cable, sends the motion capture data to a USB dongle connected to a laptop. This data is sent to another laptop, on which the binauralization takes place and which is ultimately connected to the wireless headphones.

Figure 2 shows two of the six objects in one variant each (angles of 45° and 90°). The next illustration (Fig. 3) shows the over-glasses (protective glasses that can also be worn over glasses) that are used in the sound installation. These goggles have a wide nose bridge to which the micro-sensor is attached with a micro-mount from Polhemus.

Figure 2

Figure 3

As previously explained, various decisions have to be made before the artistic aspect of the sound installation can be realized. This involves the positioning of the objects / sound sources and the sounds themselves.

Figure 4

Figure 5

Figure 4 shows a sketched top view of the complete structure. The six blue-colored circles mark the positions of the objects in the room and, of course, the sound sources of the scene in Binauralix, which can be seen in Figure 5. The direction and angle of the sources can be taken from the colorless areas (in Fig. 4), at either 45° or 90° angles, around the sound sources.

The completely wireless position detection and data transmission enables the participants to immerse themselves fully in this experience of the interactive reality-expanding sound world. The sound synthesis was carried out using the SuperCollider software. The sounds were mainly created through various tapping and clicking noises recorded by the SoundIn object, and finally changes and alienation of the sounds through amplitude and frequency modulation and various filters. By routing the sounds to a total of 6 output channels and “s.record(numChannels:6)”, I was able to create a two-minute multi-channel audio file in SuperCollider. When playing the file in Binauralix, the first channel is automatically mapped to source one, the second channel to source 2 and so on.

Technical implementation

The technical challenge for the implementation of the project initially consisted of receiving and reformatting the data from the sensor so that it could be used in Binauralix. The initial problem was that Binauralix is only available for MacOS and the software for the Polhemus G4 system is only available for Windows and Linux. As I had a MacBook and a laptop with Ubuntu Linux as my operating system at the time, I installed the Polhemus software for Linux.

After building and installing the Polhemus G4 software on Linux, the five applications “G4DevCfg”, “CreateSrcCfg”, “g4term”, “g4display” and “g4export” were available. For my project, all devices used must first be connected and configured with “G4DevCfg”. The terminal application “g4export” can be used to transmit the sensor data via UDP by specifying the previously created source configuration file, the local IP address of the receiver device and a port. The source configuration file is a file in which the position and orientation of the transmitter are defined by a “virtual frame of reference” and settings can be made for the entry hemisphere into the magnetic field, floor compensation and source calibration file. To run the application, the transmitters and the hub must be switched on at this point, the USB dongle must be connected to the laptop and the sensor to the hub, and the hub must be connected to the USB dongle. If the MacBook is now in the same network as the Linux laptop, the data can be received by specifying the previously used port. This is done with my sound installation in a self-created MaxMSP patch.

Figure 6

In this application, the appropriate port must first be selected on the left-hand side. As soon as the connection is established and the messages arrive, you can view them in raw form under the selection field. The six values that can be seen at the top in the middle of the application are the values for position and orientation that have been separated from the raw message. Final settings for the correct calibration can now be made in the action field below. There is also the option to mirror the axes individually or to change the Yaw value if unexpected problems should arise when setting up the sound installation. Once the values have been formatted into messages that can be used by Binauralix (visible at the bottom right of the application), they are sent to Binauralix.

The following videos provide a view of the scene in Binauralix and an auditory impression as the listener — driven by the sensor data — moves through the scene.

Author Archive Lukas Körfer