Tag Archive: Human-Computer Interaction

By Marlon Schumacher

DIRAC – a distributed interactive radial audio cluster

DIRAC (Distributed Interactive Radial Audio Cluster) is a multi-purpose spatial audio device consisting of a cluster of networked loudspeakers combined with embedded computing units in a radial configuration (speakers facing outwards from each of the 12 vertices of a regular icosahedron, see Fig. 1). It was conceived, designed, and realized by Prof. Schumacher with the support of music technology, mechanical engineering, IT, and industrial design companies and individuals, including RobotUnits, Augmented Instruments, Sonible, Janis Streib, and Garvin Schultheiß. It can be considered a hybrid device between a compact spherical speaker array, an interactive sound sculpture, and a multichannel loudspeaker arrangement.

 

Background

Today loudspeakers are so commonplace in contemporary music that they can virtually be considered part of the instrumental catalogue of electroacoustic composers and sonic artists. Although approaches to their artistic use are manifold, they can be categorized along a continuum between two functions: as a sounding object with an intended, distinct sonic identity and directivity pattern (sound radiation), and as an acoustically neutral element of a speaker array, often in a surrounding configuration, used to create sonic envelopment (sound projection). While the former approaches often treat the loudspeaker as a component embedded within and interacting with its surrounding environment (e.g. wall reflections), the latter typically aim to minimize or “neutralize” acoustic interactions with the room, which are considered unintended sound reproduction artefacts.

 

Illustration of a continuum from sound radiation to sound projection. The upper line represents artistic practices, the lower line technical developments and installations.

 

 

DIRAC was born from the idea of developing a flexible framework and technological infrastructure that supports using individual speakers like instruments with distinct acoustic signatures, serving as a generic radial sound projector (for the reproduction of sound directivity patterns), and working in interdisciplinary sonic art and installation contexts. Its audio rendering capabilities range from the synthesis and reproduction of channel-based material (in the tradition of sound diffusion) up to Ambisonics rendering and acoustic beamforming. There is today a notable body of research into spherical loudspeaker arrays (some of it targeting commercial markets, see References below); however, most developments are either lab prototypes relying on custom components and special hardware, or proprietary products relying on third-party support and maintenance.

Rather than building a one-size-fits-all solution, which might be costly to build, restrictive in its adaptability, and difficult to maintain and extend, the key factors here were modularity (adaptability for research and artistic purposes), sustainability via interchangeability of components (using off-the-shelf commercial hardware), and extensibility.

To maximize flexibility, our design is based on the concept of “audio nodes”: audio sinks (loudspeakers) combined with computing units that are aware of their surroundings through shared sensor data and other information, communicating in a distributed network cluster using open standards and computing platforms. To minimize installation time and facilitate artistic practice, we aimed to keep the necessary technical infrastructure and logistic effort to a minimum.

Accordingly, rather than developing custom hardware, the design builds on technologies from the ubiquitous computing and smart device fields, providing the functionality that allows the cluster to adapt to the conditions of its surrounding environment (e.g. for sound projection purposes) and to facilitate artistic practices involving human-computer and other forms of interaction.
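To make the node concept concrete, here is a minimal sketch of one node sharing its state with its peers. This is a toy example: the multicast transport, group, port, and field names are illustrative assumptions, not DIRAC's actual protocol.

```python
import json
import socket
from dataclasses import dataclass, asdict

# Illustrative multicast group and port - not DIRAC's actual transport.
GROUP, PORT = "239.0.0.42", 9000

@dataclass
class NodeState:
    node_id: int
    position: tuple       # speaker direction on the icosahedron
    proximity_cm: float   # e.g. a proximity-sensor reading

def broadcast_state(state: NodeState) -> None:
    """Share this node's sensor data with all peers in the cluster."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    sock.sendto(json.dumps(asdict(state)).encode(), (GROUP, PORT))

if __name__ == "__main__":
    broadcast_state(NodeState(node_id=3,
                              position=(0.0, 0.526, 0.851),
                              proximity_cm=120.0))
```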

 


Mechanical Structure and Mounts

The mechanical support structure and mounts are built from CNC-machined aluminum profiles from the robotics and machine engineering domains, chosen for their modularity, precision manufacturing, extensibility (construction-kit system), light weight, and convenient cable management. The figures below illustrate the designs of the support structure.

For synthesizing directivity patterns it is convenient to use compact spherical loudspeaker arrays in the shape of Platonic solids, whose symmetrical geometry does not favour particular regions or directions. A common approach to directivity synthesis uses spherical harmonics, which can be combined to produce multiple sound lobes in specific directions. Previous research has determined that a configuration of 12 regularly placed speakers on a sphere (a dodecahedron) provides the best compromise among the Platonic solids between the number of channels, sound power, and the complexity of the controllable patterns.

Although arbitrary arrangements are technically possible (such as a dodecahedral setup), we opted for a configuration in which the speakers are placed at the corners of three mutually perpendicular golden-ratio rectangles (the vertices of an icosahedron) to provide maximum flexibility: this symmetrical speaker configuration allows for the reproduction of various audio formats, including horizontal (stereo, quad, hexaphonic) and periphonic (cube/hexahedron) formats.

NB: prototypes for different numbers of nodes in various geodesic configurations exist (e.g. hemispherical, icosahedral, etc.).
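As an illustration of this geometry and of a simple first-order directivity pattern synthesized on it, here is a minimal Python sketch. It is not part of the DIRAC codebase; the cardioid-style weighting is a textbook zeroth-plus-first-order example rather than DIRAC's actual rendering.

```python
import numpy as np

PHI = (1 + 5 ** 0.5) / 2  # golden ratio

def icosahedron_vertices():
    """Vertices of a regular icosahedron: the corners of three mutually
    perpendicular golden-ratio rectangles, i.e. all cyclic permutations
    of (0, +/-1, +/-PHI), normalized onto the unit sphere."""
    verts = []
    for s1 in (1.0, -1.0):
        for s2 in (1.0, -1.0):
            verts += [(0.0, s1, s2 * PHI),    # rectangle in the yz-plane
                      (s1, s2 * PHI, 0.0),    # rectangle in the xy-plane
                      (s1 * PHI, 0.0, s2)]    # rectangle in the zx-plane
    v = np.array(verts)
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def first_order_gains(target, alpha=0.5):
    """Speaker gains for a first-order beam towards `target`: a mix of
    the zeroth-order (omni) and first-order (dipole) spherical
    harmonics; alpha=0.5 yields a cardioid-like lobe."""
    v = icosahedron_vertices()
    t = np.asarray(target, float) / np.linalg.norm(target)
    return alpha + (1 - alpha) * v @ t

if __name__ == "__main__":
    g = first_order_gains([1.0, 0.0, 0.0])
    print(np.round(g, 2))  # 12 gains, largest for speakers facing +x
```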

Fig. 1: Three perpendicular golden-ratio rectangles form a regular icosahedron with 12 vertices.

 

Fig. 2: Technical Drawing of Aluminum Structure and Mounts at the Vertices of a Regular Icosahedron.

Fig. 3: Rendering of 3D-Model of Speakers Mounted on Structure.


Tech Configuration and Connectivity

Each cluster “node” consists of a Smart IP speaker unit (a digital loudspeaker with a programmable DSP unit) and a dedicated embedded computer (Bela or Raspberry Pi) mounted to the speaker via a custom 3D-printed casing, offering additional processing and connectivity to various peripherals (microphones, sensors, transducers, etc.). All nodes are connected to a shared Dante and local area network for the transmission of audio signals and generic communication. To facilitate electrical circuit design, custom 3D-printed mounts for breadboards were developed. In its current configuration the cluster uses 12 Genelec 4410A smart speakers and their respective embedded computers, all connected via Netgear AV Line switches, delivering power (Power over Ethernet), audio, configuration, and network communication over a single Ethernet cable per node. Fig. 4 shows an early prototype DIRAC setup (breadboards were added in a later design stage).

Fig. 4: Photographic Image of DIRAC on vertical support structure. Human hands/arms shown for scale.

 


Control and Current Developments

Besides the proprietary software (Smart IP Manager by Genelec), the internal DSP settings can be configured via an API. Development of a generic, open-source networking software with auto-discovery and management features using generic OpenSoundControl messaging is ongoing (see this git repository). In addition, since both Dante and AES67 are supported for audio-over-IP transmission, there are ongoing developments to add support for both Bela and Raspberry Pi to extend and interact with the audio network using additional transducers (microphones, audio exciters, etc.), see this and this repository. There are also ongoing developments to integrate sensing to make the cluster aware of its surroundings, e.g. via proximity sensing; see this repository.
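As a sketch of what such OSC-based control might look like, the snippet below uses the python-osc package. The node address and the entire OSC namespace ("/dirac/...") are hypothetical placeholders; the actual message format is defined in the repositories mentioned above.

```python
# pip install python-osc
from pythonosc.udp_client import SimpleUDPClient

# Hypothetical node address and port, for illustration only.
NODE_IP, OSC_PORT = "192.168.1.101", 9000
client = SimpleUDPClient(NODE_IP, OSC_PORT)

# Example: set the playback level and one parametric-EQ band of a node.
client.send_message("/dirac/node/3/level", -12.0)                   # dB
client.send_message("/dirac/node/3/eq/band1", [250.0, -3.0, 0.7])   # freq, gain, Q
```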

 


Audio Rendering

We are currently evaluating and experimenting with a number of different approaches: channel-based material (using direct loudspeaker feeds); panning/spatialization algorithms such as VBAP and DBAP; Ambisonics (e.g. using AllRAD decoders); virtual microphones; and beamforming. In combination with networked sensing, this covers a wide array of applications, from adaptive rendering to interactive synthesis.
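As one concrete example from this list, here is a minimal DBAP sketch, with the speaker layout and rolloff chosen purely for illustration: each speaker's gain falls off with its distance to the virtual source, and the gains are normalized for constant total power.

```python
import numpy as np

def dbap_gains(source_pos, speaker_pos, rolloff_db=6.0, spat_blur=0.1):
    """Distance-Based Amplitude Panning: gains proportional to
    1/d^a, normalized so the summed power stays constant."""
    a = rolloff_db / (20.0 * np.log10(2.0))   # rolloff exponent
    d = np.linalg.norm(speaker_pos - source_pos, axis=1)
    d = np.sqrt(d ** 2 + spat_blur ** 2)      # avoid blow-up at a speaker
    g = 1.0 / d ** a
    return g / np.linalg.norm(g)              # power normalization

if __name__ == "__main__":
    # Four illustrative speakers on a horizontal square.
    speakers = np.array([[1, 0, 0], [0, 1, 0], [-1, 0, 0], [0, -1, 0]], float)
    print(np.round(dbap_gains(np.array([0.5, 0.2, 0.0]), speakers), 3))
```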

Related Literature:

  • Pasqual, A. M., Arruda, J. R., & Herzog, P. (2010). A comparative study of platonic solid loudspeakers as directivity controlled sound sources. In Proceedings of the 2nd International Symposium on Ambisonics and Spherical Acoustics.
  • Pasqual, A. M. (2010). Sound directivity control in a 3-D space by a compact spherical loudspeaker array. Doctoral dissertation, Universidade Estadual de Campinas.
  • Avizienis, R., Freed, A., Kassakian, P., & Wessel, D. (2006). A compact 120 independent element spherical loudspeaker array with programmable radiation patterns. In Audio Engineering Society Convention 120.
  • Freed, A., Schmeder, A., & Zotter, F. (2008). Applications of environmental sensing for spherical loudspeaker arrays. IASTED Signal and Image Processing.
  • Schmeder, A. (2009). An exploration of design parameters for human-interactive systems with compact spherical loudspeaker arrays. In Proceedings of the Ambisonics Symposium.
  • Farina, A., & Chiesi, L. (2016). A novel 32-speakers spherical source. In Audio Engineering Society Convention 140.

 

By admin

Music and Installation Chair @IEEE IoS 2024

Marlon Schumacher will serve as music and installation co-chair, together with Esther Fee Feichtner, for the IEEE

5th International Symposium on the Internet of Sounds

held at the International Audio Laboratories Erlangen from 30 September to 2 October 2024. Follow this link to the official IEEE website:

“The Internet of Sounds is an emerging research field at the intersection of the Sound and Music Computing and the Internet of Things domains.  […] The aim is to bring together academics and industry to investigate and advance the development of Internet of Sounds technologies by using novel tools and processes. The event will consist of presentations, keynotes, panels, poster presentations, demonstrations, tutorials, music performances, and installations.”

 

The Internet of Sounds Research Network is supported by an impressive number (> 120) of institutions from over 20 countries, with a dedicated IEEE committee for emerging technology initiatives. Partners from Germany include:

By Lukas Körfer

Speaking Objects

Abstract

In this project, an audio-only augmented reality sound installation was created as part of the course „Studienprojekte Musikprogrammierung“ (“Study Projects in Music Programming”) at the Karlsruhe University of Music. For the following text it is important to distinguish this from virtual reality (VR), in which the user is completely immersed in a virtual world; augmented reality (AR) is the extension of reality through the technical addition of information.

 

Motivation

On the one hand, this sound installation was to meet a certain artistic standard; on the other hand, my personal goal was to bring AR, and especially auditory AR, closer to the participants and to get them excited about this technology. Unfortunately, augmented reality is very often understood only as the visual overlay of information, as in navigation systems or smartphone applications. In my opinion, however, it is important to sensitize people to the auditory extension of reality as well. I am convinced that this technology also has enormous potential and that, in terms of public awareness, there is a lot of catching up to do compared to visual AR. There are already numerous application areas in which the benefits of auditory AR have been demonstrated, ranging from fields where visual AR is already common, such as education, productivity, or entertainment, to specialist areas such as medicine. Ten years ago, for example, there were already attempts to use auditory AR to enhance the sense of hearing of people with visual impairments: by sonifying real objects, a purely auditory orientation aid could be created.

 

Methodology

In this project, participants can move freely in a room in which objects are positioned; although these objects produce no sound in reality, the participants perceive sounds from them through headphones. In this sense it is an extension of reality (“augmented reality”), as information is added to reality in auditory form by technical means. The implementation essentially covers two areas: on the technical side, the positioning of the person (motion capture) and binauralization; on the artistic side, the design of the sound scene through the positioning and synthesis of the sounds.

Figure 1

Motion capture in this project is realized with the Polhemus G4 system. The direction and position of a micro-sensor, attached to a pair of glasses worn by the participant, are determined by a magnetic field generated by two transmitters. A hub, connected to the micro-sensor via a cable, sends the motion-capture data to a USB dongle connected to a laptop. From there the data is sent to another laptop, on which the binauralization takes place and which is ultimately connected to the wireless headphones.
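Binauralix handles the actual binaural rendering, but the geometric core of the task can be sketched in a few lines: for each source, the renderer needs its direction relative to the tracked head pose. The following is a simplified 2-D (top-view) illustration, not Binauralix's internal code.

```python
import numpy as np

def relative_azimuth(listener_pos, listener_yaw_deg, source_pos):
    """Top-view sketch: azimuth of a virtual source relative to the
    listener's head, i.e. the world-space bearing minus the head yaw.
    A binaural renderer derives this kind of quantity from the
    motion-capture data before applying HRTFs."""
    dx, dy = np.asarray(source_pos, float) - np.asarray(listener_pos, float)
    world_az = np.degrees(np.arctan2(dy, dx))      # bearing in world frame
    return (world_az - listener_yaw_deg + 180.0) % 360.0 - 180.0

if __name__ == "__main__":
    # Listener at the origin facing +x (yaw 0); source to the front left.
    print(relative_azimuth((0, 0), 0.0, (1, 1)))   # 45.0 degrees
```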

Figure 2 shows two of the six objects, in one variant each (angles of 45° and 90°). The next illustration (Fig. 3) shows the over-glasses (protective glasses that can also be worn over normal glasses) used in the sound installation. They have a wide nose bridge to which the micro-sensor is attached with a micro-mount from Polhemus.

Figure 2

 

Figure 3

As previously explained, various decisions had to be made before the artistic aspect of the sound installation could be realized: the positioning of the objects/sound sources and the design of the sounds themselves.

Figure 4

 

Figure 5

Figure 4 shows a sketched top view of the complete setup. The six blue circles mark the positions of the objects in the room and, correspondingly, of the sound sources in the Binauralix scene shown in Figure 5. The direction and opening angle (either 45° or 90°) of each source can be read from the colorless areas around the sound sources in Fig. 4.

The completely wireless position detection and data transmission enable the participants to immerse themselves fully in the experience of this interactive, reality-extending sound world. The sound synthesis was carried out in SuperCollider. The sounds were mainly created from various tapping and clicking noises recorded via the SoundIn object, then altered and defamiliarized through amplitude and frequency modulation and various filters. By routing the sounds to a total of six output channels and calling “s.record(numChannels: 6)”, I created a two-minute multichannel audio file in SuperCollider. When the file is played back in Binauralix, the first channel is automatically mapped to source one, the second channel to source two, and so on.

 

Technical implementation

The technical challenge in implementing the project initially consisted of receiving and reformatting the data from the sensor so that it could be used in Binauralix. The initial problem was that Binauralix is only available for macOS, while the software for the Polhemus G4 system is only available for Windows and Linux. As I had a MacBook and a laptop running Ubuntu Linux at the time, I installed the Polhemus software on Linux.

After building and installing the Polhemus G4 software on Linux, five applications were available: “G4DevCfg”, “CreateSrcCfg”, “g4term”, “g4display”, and “g4export”. First, all devices used must be connected and configured with “G4DevCfg”. The terminal application “g4export” can then transmit the sensor data via UDP by specifying the previously created source-configuration file, the local IP address of the receiving device, and a port. The source-configuration file defines the position and orientation of the transmitters via a “virtual frame of reference” and holds settings for the entry hemisphere into the magnetic field, floor compensation, and a source-calibration file. To run the application, the transmitters and the hub must be switched on, the USB dongle must be connected to the laptop, the sensor to the hub, and the hub to the USB dongle. If the MacBook is in the same network as the Linux laptop, the data can be received by specifying the previously used port. In my sound installation this is done in a self-made Max/MSP patch.
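For illustration, receiving this UDP stream outside of Max/MSP could look like the following Python sketch. The actual wire format of g4export's packets is not reproduced here; the parsing below assumes one datagram per frame with six whitespace-separated values (position and orientation), purely for the sake of the example.

```python
import socket

PORT = 7234  # must match the port given to g4export on the Linux laptop

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", PORT))  # listen on all local interfaces

while True:
    data, _addr = sock.recvfrom(1024)
    # Assumed frame layout (illustrative): six whitespace-separated
    # values - x, y, z position and azimuth, elevation, roll.
    try:
        x, y, z, az, el, roll = map(float, data.split())
    except ValueError:
        continue  # skip frames that do not match the assumed format
    print(f"pos=({x:.2f}, {y:.2f}, {z:.2f}) ori=({az:.1f}, {el:.1f}, {roll:.1f})")
```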

Figure 6

In this patch, the appropriate port must first be selected on the left-hand side. As soon as the connection is established and messages arrive, they can be viewed in raw form below the selection field. The six values at the top center of the patch are the position and orientation values separated from the raw message. Final settings for correct calibration can then be made in the action field below: there are options to mirror individual axes or to offset the yaw value, should unexpected problems arise when setting up the sound installation. Once the values have been formatted into messages that Binauralix can use (visible at the bottom right of the patch), they are sent to Binauralix.
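The patch's mapping logic, sketched in Python for illustration (the Binauralix target address and OSC path are assumptions; the real messages are the ones built by the Max/MSP patch):

```python
# pip install python-osc
from pythonosc.udp_client import SimpleUDPClient

# Hypothetical address of the MacBook running Binauralix and an
# illustrative OSC path - not Binauralix's documented API.
client = SimpleUDPClient("192.168.0.50", 9001)

MIRROR = (1, -1, 1)   # per-axis mirroring, as offered in the patch
YAW_OFFSET = 90.0     # degrees, compensating the sensor's mounting

def forward_listener_pose(x, y, z, yaw, pitch, roll):
    """Apply the calibration steps described above and forward the
    listener pose to Binauralix."""
    x, y, z = x * MIRROR[0], y * MIRROR[1], z * MIRROR[2]
    yaw = (yaw + YAW_OFFSET) % 360.0
    client.send_message("/listener/pose", [x, y, z, yaw, pitch, roll])
```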

The following videos provide a view of the scene in Binauralix and an auditory impression as the listener — driven by the sensor data — moves through the scene.

 

 

Past performances of the sound installation

The sound installation as a contribution to the EFFEKTE lecture series of the Wissenschaftsbüro-Karlsruhe

 

 

The sound installation as the subject of a workshop for the Kulturakademie at the HfM-Karlsruhe