
By Lukas Körfer

Wave field synthesis with OM-SoX

Abstract: This final project was created at the end of the winter semester 2023/24 as part of the course “Symbolische Klangverarbeitung und Analyse/Synthese” (eng. Symbolic Sound Processing and Analysis/Synthesis) of the MA Music Informatics. An application for sound spatialization was developed in the program OpenMusic using the library OM-SoX and the method of wave field synthesis.

Responsible: Lukas Körfer

Wave field synthesis

Wave field synthesis (WFS for short) is the spatialization of virtual sound sources using a loudspeaker array. This advanced audio technology attempts to reproduce sounds in such a way that they give the impression that they are coming from a specific position in space. This is achieved by generating a wave field consisting of a large number of individual sound sources that are synchronized in such a way that a coherent sound wave is created, with which it should be possible to localize a virtual sound source in the room.

 

For a better understanding of how WFS works, the subject can be approached via the physical phenomenon of interference pattern formation behind an obstacle with openings. When a wave encounters one or more slits, it is diffracted through the openings and propagates behind the obstacle. This leads to the formation of a pattern of wave interference on the other side of the obstacle. Similarly, wave field synthesis uses an array of loudspeakers to generate a coherent sound wave. This requires precise calculation and control of the phase and amplitude relationships of the sound waves emanating from each speaker. These calculations are dependent on the distances of each individual loudspeaker in the array relative to the position in space of the respective virtual sound source.

Project description

The goal of this project was a program that, based on a few settings made by the user, produces a multi-channel audio file that can be played back over a loudspeaker array for wave field synthesis. The first step was therefore to decide which parameters the user of the program should be able to set.

User input

 

In addition to the audio file to be spatialized, the user must provide information about the loudspeaker array and the position of one or more virtual sound sources relative to it. To keep the configuration as simple and intuitive as possible, I decided to rely mainly on a picture object in which the setup can be drawn. Loudspeakers are drawn as rectangles and virtual sound sources as circles, with each circle representing one sound source. The loudspeakers can be specified in two different ways. If only a single rectangle is drawn in the picture object, it represents the area of the whole loudspeaker array. To determine the positions of the individual loudspeakers in the next step of the program, two additional pieces of information are then required: the length of the loudspeaker array in meters, which also sets the scale for the complete drawn setup, and the number of loudspeakers in the drawn area. As soon as more than one rectangle is drawn, each rectangle represents an individual loudspeaker. Since the scale can no longer be derived from the length of the array in this variant, the width/height of the complete picture object is specified instead. The first variant, in which the loudspeaker array is drawn as a single rectangle, makes the application much simpler to use, but it also requires the loudspeakers to be arranged in a line and evenly spaced.

Calculating distances

 

Once all the graphics of the picture object have been read out, they are divided into rectangles and circles for further processing. If only one rectangle is found, its position and dimensions, together with the specified array length and number of loudspeakers, are used to determine the position of each individual loudspeaker within the array in meters. If there are several rectangles, this step is not necessary and the center points of all rectangles are simply determined. Another Lisp function then calculates the Euclidean distance from every source to each individual loudspeaker on the same scale. All graphics drawn in the picture object that are neither rectangles nor circles are ignored in the further calculations. As any number of virtual sound sources can be specified, all circles in the picture object are captured in this step; their order is irrelevant.
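
The two steps described above can be illustrated with a short Python sketch. It is not the Lisp code used in the patch: the function names, the horizontal orientation of the array and the assumption that loudspeakers sit on both ends of the drawn rectangle are all hypothetical choices made for illustration.

```python
import math

def array_speaker_positions(x_start, x_end, y, array_length_m, n_speakers):
    """Evenly spaced speaker positions (in meters) along a horizontal array.

    x_start, x_end and y are picture coordinates of the drawn rectangle's
    center line (assumed layout); array_length_m sets the scale for the
    whole drawing.
    """
    scale = array_length_m / (x_end - x_start)  # meters per picture unit
    if n_speakers == 1:
        return [((x_start + x_end) / 2 * scale, y * scale)]
    step = (x_end - x_start) / (n_speakers - 1)  # assumed: speakers on both ends
    return [((x_start + i * step) * scale, y * scale) for i in range(n_speakers)]

def distances(sources, speakers):
    """One list per source with the Euclidean distance to every speaker."""
    return [[math.dist(src, spk) for spk in speakers] for src in sources]

# Example: a 4 m array with 5 speakers and one source 3 m behind its center
speakers = array_speaker_positions(0, 100, 50, 4.0, 5)
dist_lists = distances([(2.0, 5.0)], speakers)
```

The result has the same shape as the distance list used in the sound processing stage: one inner list per virtual sound source, with one distance per loudspeaker.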

Sound processing

 

Sound processing is implemented in the next section of the program. From the sound file specified by the user and the previously calculated distances, a multi-channel file is created that can be played back on the intended loudspeaker array. This process takes place in a nested OM loop with two levels.

 

The first level iterates over each element of the distance list. Each of these elements is itself a list that belongs to one virtual sound source and contains its distances to each loudspeaker. Before the process enters the second level of the loop, further calculations are performed in a Lisp function using the current distance list.

This function iterates over each distance and determines the time delay, the volume reduction and the cutoff frequency of a lowpass filter that models the air absorption of high frequencies, and collects these values in three lists. The result of this Lisp function is then used to enter the second level of the loop.
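
The text does not state which formulas the Lisp function uses, so the following Python sketch only shows one plausible way to derive the three lists. The speed of sound, the inverse-distance gain law and the linear cutoff mapping (including all default values) are assumptions and not taken from the patch.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at about 20 degrees Celsius

def speaker_parameters(dists, ref_distance=1.0, max_cutoff=20000.0,
                       min_cutoff=2000.0, max_distance=50.0):
    """Return (delays in s, gains in dB, lowpass cutoffs in Hz) for one source."""
    delays, gains_db, cutoffs = [], [], []
    for d in dists:
        # Time of flight from the virtual source to this loudspeaker
        delays.append(d / SPEED_OF_SOUND)
        # Assumed 1/r law: about -6 dB per doubling of distance beyond ref_distance
        gains_db.append(-20.0 * math.log10(max(d, ref_distance) / ref_distance))
        # Assumed linear mapping: the farther away, the lower the cutoff
        frac = min(d / max_distance, 1.0)
        cutoffs.append(max_cutoff - frac * (max_cutoff - min_cutoff))
    return delays, gains_db, cutoffs

delays, gains_db, cutoffs = speaker_parameters([1.0, 2.0, 343.0])
```

Each of the three returned lists has one entry per loudspeaker, which matches the way the second loop level consumes them.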

 

Here, the respective SoX effect is applied with each calculated value: SoX level for the volume reduction, SoX lowpass for the air absorption and SoX pad for the time delay. The resulting audio file is saved for each iteration. Each of the three lists has as many values as the previously calculated distances from the current sound source to the speakers. This means that each audio file saved in this loop represents one channel of the subsequent multi-channel file for the current sound source.

In the next step, back in the first level, the multi-channel file is created with SoX-Merge and stored temporarily at the end of the loop. This process is repeated for all remaining virtual sound sources (if there are any), and the resulting files are collected as the output of the outer loop. All multi-channel files of the respective sound sources are then combined with a SoX-Mix.

If only one virtual sound source is specified by the user, the output of the outermost loop will only consist of a single multi-channel file for this one source. In this case, the SoX-Mix is not required and it would even lead to an error during the evaluation of the program if the input of the SoX-Mix consisted of only one audio file. The OM-If therefore avoids the use of the SoX-Mix as soon as the output of the patcher, in which the distances are determined, only consists of one list, which means that only one circle for a virtual sound source has been drawn in the picture object.

Finally, silence can be added to the multi-channel file using the SoX pad, depending on preference, if the selected audio file is particularly short, for example. At the same time, the final multi-channel file is saved in Outfile as “wfsOutFile.wav”.

By Florian Simon

Interspaces – Acousmatic study with OM-SoX

Interspaces juxtaposes sounds from human civilization with sounds from nature. Four pairs of field recordings are presented; following the principle of a vocoder, each recording is filtered according to the spectrum of a section of its counterpart.

Responsible: Florian Simon

Interspaces shows the following four pairs (format: total recording – source of the spectrum):

  1. Chirping Arctic terns – Vowel “E” called by humans
    Lively market, people talking and calling – Arctic tern call

  2. Rippling of a river – Accelerating car
    Main road – rushing of a river

  3. Forest scenery, rustling leaves and birds – Train horn
    Station concourse – chirping of a songbird

  4. Thunderstorm – clinking of cutlery
    Business in a restaurant kitchen – thunder

The field recordings come from the FreeToUseSounds library.

Interspaces uses an equilateral octagonal loudspeaker arrangement, whereby the two channels of the source material are each placed at opposite points in the array. The two recordings of a pair are also offset by 90 degrees from each other by default, so that four sound sources can be perceived.

Each recording is divided into several sections of random size within a certain frame and concatenated again in randomized order with short crossfades. The number of sections increases with each pair of recordings: 4, 9, 16 and finally 23. With each new section, the two sound sources also “move” in the array by 0.25 channels in a certain direction. Since the number of sections is the same for both recordings of a pair, but not the position of the cuts, deviations from the base of a 90-degree spacing and a greater variety of sounds are created. Interspaces is designed as an installation to allow free exploration of the stereo fields.
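
The sectioning and movement scheme described above can be sketched in Python as follows. This is only an illustration and not the OpenMusic maquette itself: the way the random section sizes are drawn (`min_frac`) and all function names are assumptions, and the crossfades are left out.

```python
import random

def random_sections(total_dur, n_sections, min_frac=0.5, rng=random):
    """Split total_dur into n_sections (start, duration) pairs of random size.

    min_frac (assumed parameter) limits how much the sizes may differ.
    """
    weights = [rng.uniform(min_frac, 1.0) for _ in range(n_sections)]
    total = sum(weights)
    durs = [total_dur * w / total for w in weights]
    starts = [sum(durs[:i]) for i in range(n_sections)]
    return list(zip(starts, durs))

def reorder_with_positions(sections, start_channel=0.0, step=0.25,
                           n_channels=8, rng=random):
    """Shuffle the sections and move the source by `step` channels per section."""
    order = list(sections)
    rng.shuffle(order)
    return [(start, dur, (start_channel + i * step) % n_channels)
            for i, (start, dur) in enumerate(order)]

rng = random.Random(1)
sections = random_sections(60.0, 9, rng=rng)   # second pair: 9 sections
plan = reorder_with_positions(sections, rng=rng)
```

Because the cut positions are drawn independently for the two recordings of a pair, running the sketch twice with different seeds shows how the sources drift away from the basic 90-degree spacing.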

Interspaces was created in OpenMusic using functions from the OM-SoX library. The underlying program consists of two parts. The first is used to create the manipulated recordings by spectral analysis (sox-dft), splitting the source material into up to 4096 frequency bands (sox-sinc), adjusting their volume levels according to the generated spectrum (sox-level) and reassembling them (sox-mix).

The second part of the program uses the synthesis patch of a maquette to control the division into sections (sox-trim) and their spatialization (sox-remix) and final alignment (sox-splice) for each of the eight generated audio files, and finally to organize the finished blocks in terms of time (sox-pad and sox-mix). In the last step, the time saved by the crossfades must be taken into account and subtracted from the onset value/x position in the maquette.

Audio (binaural mixed to stereo):


Unfortunately, this vocoder method has the disadvantage that the individual frequency bands are initially very quiet and therefore artefacts in the form of noise occur when applying the gain and the final normalization. Conversely, clipping occurs when certain frequencies are strongly represented in both source recordings. If you lower the gain values accordingly to avoid this, quieter sections in the result may be barely audible, depending on the size of the dynamic difference. The noise can be easily eliminated by selecting higher gain values, but this increases the clipping problem. In the above version of Interspaces, the best compromise between the two effects was sought for all eight audio clips.


 

 

By Florian Simon

PixelWaltz: Sonification of images in OpenMusic

Abstract: The OpenMusic program PixelWaltz can be used to convert images into symbolic representations of music (pitches and onset times). Options for image manipulation are available with which the result can be additionally influenced.

Responsible: Florian Simon

Mapping: Pitch

The pixels of the image are traversed line by line and the respective red, green and blue values (between 0 and 1) are mapped to a desired pitch range. This means that each pixel always yields three pitch values in midicents. As two adjacent pixels are similar in many cases, this mapping method often results in repeating patterns every three notes. This is the reason for the title of the project.
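
As a minimal sketch of this mapping, assuming a linear scaling of each color value onto the pitch range (the text does not specify the curve) and hypothetical parameter names:

```python
def pixels_to_midicents(pixels, low=4800, high=8400, max_notes=None):
    """Map (r, g, b) pixels with values in 0..1 to pitches in midicents.

    Every pixel contributes three notes in the order red, green, blue.
    The linear mapping and the default range are assumptions.
    """
    pitches = []
    for r, g, b in pixels:
        for value in (r, g, b):
            pitches.append(round(low + value * (high - low)))
    # Optional limit on the number of notes that are output
    return pitches if max_notes is None else pitches[:max_notes]

# Two identical pixels produce the repeating three-note pattern
notes = pixels_to_midicents([(0.0, 0.5, 1.0), (0.0, 0.5, 1.0)])
```

Two similar neighboring pixels yield almost the same three pitches twice in a row, which is the waltz-like pattern the title refers to.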

It is also possible to limit the number of note values output.

Mapping: Onset times

A constant value can be set for the start times and note durations. A humanizer effect can also be switched on, which randomly shifts each note forwards or backwards within a specified range. Starting from the basic tempo, accelerandi and ritardandi can be created by passing lists of three numbers. These represent the start note, end note and speed of the tempo change. (20 50 -1) creates an accelerando from note 20 to note 50, in which the intervals per note become one millisecond shorter. A positive third value corresponds to a ritardando.
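
The tempo changes can be illustrated with the following Python sketch. It assumes that the interval between two onsets changes by the third value (in milliseconds) for every note inside the given range and then stays at the value reached; the humanizer is left out. The function name and the lower limit of 1 ms are illustrative choices.

```python
def onsets_with_tempo_changes(n_notes, base_interval_ms, changes=()):
    """Onset times in ms; changes is a list of (start_note, end_note, delta_ms)."""
    onsets = []
    interval = float(base_interval_ms)
    t = 0.0
    for i in range(n_notes):
        onsets.append(t)
        for start, end, delta in changes:
            if start <= i < end:
                interval += delta  # negative delta: accelerando
        interval = max(interval, 1.0)  # assumed safeguard against zero intervals
        t += interval
    return onsets

# Accelerando from note 1 to note 3, 10 ms shorter per note
example_onsets = onsets_with_tempo_changes(5, 100, [(1, 3, -10)])
```

With (20 50 -1) and a base interval of 200 ms, the same function would shorten the interval by one millisecond per note between notes 20 and 50.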

Dynamics

Different random ranges for “red”, “green” and “blue” notes can be defined for the volume or velocity. The values generated in this way can also be modulated sinusoidally so that, for example, the volume can rise and fall over longer periods of time. This requires the specification of a wavelength in the number of notes and the maximum deviation factor.
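
A possible reading of this dynamics scheme is sketched below in Python. It assumes that the deviation factor scales the random base velocity multiplicatively and that the result is clipped to the MIDI range 1 to 127; both points are assumptions, as are the names.

```python
import math
import random

def velocities(n_notes, ranges, wavelength, max_dev, rng=random):
    """ranges holds three (low, high) velocity ranges for red, green, blue notes."""
    out = []
    for i in range(n_notes):
        low, high = ranges[i % 3]          # note color follows the r, g, b order
        base = rng.randint(low, high)      # random velocity within the range
        # Sinusoidal modulation with a period of `wavelength` notes
        factor = 1.0 + max_dev * math.sin(2 * math.pi * i / wavelength)
        out.append(max(1, min(127, round(base * factor))))
    return out

vels = velocities(12, [(60, 60), (80, 80), (100, 100)], wavelength=12, max_dev=0.25)
```

In this example the ranges are collapsed to single values so that only the sinusoidal rise and fall over twelve notes remains visible.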

Accompaniment

PixelWaltz offers the option of generating an accompanying voice, which consists of individual additional tones that occur at a fixed interval of a desired number of notes. If this number is not divisible by 3, a polymeter often results. The pitch is determined randomly and can be between 3 and 6 semitones below the respective “accompanied” note.

Image processing

In order to create further variation, the sonification section of PixelWaltz is preceded by tools for manipulating the input image. In addition to adjusting the image size, brightness and contrast, it is also possible to shift the color values and thus recolor the image. The changes in the musical translation are immediately noticeable: More brightness leads to a higher average pitch, more contrast reduces the number of different pitch values. With a blue-dominated image, the last notes of the triplet will usually be the highest.

Sound results

The tonal results naturally differ depending on the input – but photographed material in particular often leads to the same wave-like overall structure, which winds irregularly and at a slow tempo chromatically, sometimes upwards, sometimes downwards. The accompaniment supports this effect and can form a counter-pulse to the main voice.

By Laura Peter

Whitney Music Box with OMChroma/OMPrisma in OpenMusic

The Whitney Music Box is a sonified and/or visual representation of a series of interrelated sound elements. From a musical point of view, these elements can be related chromatically or harmonically, for example. In the visual representation, each of these elements is represented by a circle or dot (see Figure 1). These dots circle around a common center point depending on their own assigned frequency. The higher the frequency, the smaller the radius of the orbiting circle and the higher the orbital speed. Each sound element represents a multiple of a fixed fundamental frequency in a harmonic series. As soon as an element has completed a revolution around the center point, the sound is triggered with the frequency it represents. Due to the mathematical relationship between the individual elements, there are moments during the performance of the Whitney Music Box in which certain elements are triggered simultaneously and phases in which the elements can be perceived consecutively. At the beginning and at the end, all elements are triggered simultaneously.

Figure 1: Whitney Music Box – visual representation

In this project, OMChroma is used to synthesize the individual sound elements (see Figure 2). The synthesis classes of OMChroma inherit from OpenMusic’s class-array object. The columns in the array describe the individual components within the synthesis. The rows represent parameters that can be assigned locally to the individual components or globally to the entire process. For the Whitney Music Box, elements are needed that implement the individual pitch gradations and the temporal offset of the individual pitch gradations. An OMChroma matrix is regarded as an event. Such an event represents a pitch and the sound repetitions within the global duration of the Whitney Music Box. The global duration is defined at the beginning and also describes the round trip time of the lowest frequency or the previously defined start frequency. Each matrix represents a frequency that is a multiple of the start frequency. The round trip time of a sound element is calculated using the formula

duration(global) / n

Here, n is the index of the individual sound elements or matrices. The higher the index, the higher the frequency and the shorter the round trip time. The repetitions of the sound elements are defined by the parameter e-dels. Each component of a matrix is given a different entry delay. These entry delays are spaced at regular intervals of duration(global) / n.
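
The timing scheme can be made concrete with a small Python sketch that computes, for each element, its frequency and the list of entry delays. It is independent of OMChroma; the function name is hypothetical, and including the final trigger at the end of the global duration is an assumption based on the statement that all elements sound together at the beginning and at the end.

```python
def whitney_events(global_dur, start_freq, n_elements):
    """Return one (frequency, entry_delays) pair per sound element."""
    events = []
    for n in range(1, n_elements + 1):
        period = global_dur / n            # round trip time of element n
        # Triggers at 0, period, 2 * period, ... up to the global duration
        delays = [k * period for k in range(n + 1)]
        events.append((start_freq * n, delays))
    return events

events = whitney_events(60.0, 110.0, 3)
```

For a global duration of 60 seconds, the third element is triggered every 20 seconds, the second every 30 seconds, and all of them coincide at 0 and 60 seconds.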

Figure 2: Application of OMChroma

Without spatialization, the Whitney Music Box with OMChroma sounds like this:


Figure 3 shows how the collected matrices or sound events are spatialized with the OMPrisma library. This was based on the visual representation of the Whitney Music Box. Sound elements with a low frequency are further away from the center and sound elements with a high frequency circle closer to the center. With OMPrisma, this representation is to be implemented in spatial sound. This means that sounds with a low frequency should sound further away and sounds with a high frequency should sound closer to the listener. In the OpenMusic patch, elements with an even index were also positioned further to the front and further to the right and, similarly, elements with an odd index were positioned further to the left and back in order to distribute the sounds evenly in the room. The OMPrisma classes also offer presets for the attenuation function, air-absorption function and time-of-flight function. These were used to create an even greater sense of spatiality in addition to the positioning in the room.

Figure 3: Application of OMPrisma

In stereo, for example, the Whitney Music Box sounds like this:


Figure 4 shows how the collected OMChroma and OMPrisma matrices are merged using the chroma-prisma function. The list of all collected matrices is returned via an om-loop and rendered as a sound using the synthesize function (see Figure 5).

Figure 4: chroma-prisma

Figure 5: loop and synthesize

The OpenMusic patch and sound samples can be downloaded from the following link: https://github.com/lauraptrcodes/Whitney-music-box

By Moritz Reiser

Markov processes for controlling harmonics in OpenMusic and Common Lisp

Abstract: A project on the use of random processes in a musical context. Basically, two different models are used. These generate chord sequences, which are then provided with a rhythm and an overlying melody.

Responsible: Moritz Reiser

 

Overview

The overall structure of the program, which corresponds to the content of the main patch, can be seen in Figure 1. At the top is the selection of the algorithm to be used for chord progression generation. This can be selected via the selection field at the top left. The two input fields of the subpatches can be used to specify the desired length and the starting chord or the key of the composition.

This is followed by a random determination of the respective tone lengths. Here you can set the tempo in BPM as well as how frequently each tone length, given as a multiple of a quarter note, should occur. The respective start times of the chords are calculated from the generated durations using a “dx→x” function. When using the program, note that OpenMusic calculates new random numbers in both branches because the output is used twice, as a result of which the relationship between start time and tone duration is lost. This can be remedied by locking the subpatches for chord progression and tone length generation with “Lock Eval” after running the program once and then running it again to adjust the start times to the now saved tone durations (see information panel in the main patch). The third major step in the overall process is the generation of a melody that lies above the chord sequence. Here, a note is selected from the underlying chord and shifted up an octave. You can set whether this should always be a random chord tone or whether the tone closest to or furthest away from the preceding melody tone should be selected.
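
The “dx→x” step is a running sum over the durations. The following Python sketch mirrors that behavior (the function name `dx_to_x` is only a stand-in for the OpenMusic box) and also shows why the durations must be generated once and then reused for both outputs.

```python
from itertools import accumulate

def dx_to_x(start, intervals):
    """Turn a list of intervals into absolute values, beginning at `start`."""
    return list(accumulate(intervals, initial=start))

# The durations are computed ONCE and reused, which corresponds to
# locking the subpatch with "Lock Eval" in the patch.
durations_ms = [500, 1000, 500]
onsets_ms = dx_to_x(0, durations_ms)[:-1]  # the last value is the total length
```

If the durations were drawn again for the onset calculation, `onsets_ms` and `durations_ms` would no longer belong together, which is exactly the problem described above.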

The result is then visualized at the bottom in a multi-seq object.

Figure 1: Overall structure of the composition process

 

Chord progression generation

Two algorithms are available for generating the chord sequence. The desired length of the sequence, which corresponds to the number of chords, and the starting chord or the key are transferred to them.

Harmonic chord sequence using Markov chain

The sequence of the first algorithm can be seen in Figure 2. The subpatch “Create Harmonic Chords” generates the basic set of chords that will be used in the following. This corresponds to the usual scale degrees of counterpoint theory and, in addition to the tonic, subdominant, dominant and their relative chords, contains a diminished chord on the seventh degree, a sixte ajoutée of the subdominant and a dominant seventh chord. The “Key” input adds a value corresponding to the desired key to these chords.

Figure 2: Subpatch for generating a harmonic chord sequence using a Markov chain

The “Create Transition Matrix” subpatch generates a matrix with transition probabilities for the individual chords. For each chord degree, it specifies the probability of moving to each of the other chords. The probability values were chosen by hand, guided by the usual progressions in counterpoint theory, and then adjusted experimentally. The aim was a result that follows the conventions of counterpoint theory and returns to the tonic frequently so that the tonic remains the focus. The exact transition probabilities are listed in the following table, in which the starting chords are listed in the left-hand column and each row contains the transitions from that chord.

Table 1. Transition probabilities of the harmonies of corresponding chord levels

The generation of the chord sequence finally takes place in the patch “Generate Markov Series”, which is shown in Figure 3. This initially only works with the numbering of the chord steps, which is why it is sufficient to pass it the length of the chord list. The Lisp function “Markov Synthesis” now generates a chord sequence of the desired length using the transition matrix. As it is not guaranteed that the last chord in the sequence generated in this way corresponds to the tonic, another Lisp function is used, which generates further chords until the tonic is reached. As the steps have only been numbered so far, the chords valid for the respective steps are finally selected in order to obtain the finished chord sequence.
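
The procedure can be sketched in Python as follows. The transition matrix below is purely illustrative and is not the table used in the project; the function name is hypothetical as well.

```python
import random

def markov_series(matrix, length, start=0, tonic=0, rng=random):
    """Generate chord degree indices; continue past `length` until the tonic."""
    seq = [start]
    while len(seq) < length or seq[-1] != tonic:
        row = matrix[seq[-1]]  # transition probabilities of the current degree
        seq.append(rng.choices(range(len(row)), weights=row, k=1)[0])
    return seq

# Illustrative 3-state matrix (tonic, subdominant, dominant), NOT the project's values
MATRIX = [
    [0.2, 0.4, 0.4],
    [0.6, 0.1, 0.3],
    [0.7, 0.3, 0.0],
]
degrees = markov_series(MATRIX, 8, rng=random.Random(3))

# Final step: replace the degree numbers with actual chords (midicents here)
CHORDS = [[6000, 6400, 6700], [6500, 6900, 7200], [6700, 7100, 7400]]
progression = [CHORDS[d] for d in degrees]
```

The loop condition combines both Lisp functions mentioned above: it first fills the requested length and then keeps appending chords until the last one is the tonic.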

Figure 3: Subpatch for generating a chord sequence using Markov synthesis

 
Chromatic chord progression using a tone net

In contrast to the harmonic chord progression, all 24 major and minor chords of the chromatic scale are used here (see Figure 4). The special feature of this algorithm lies in the choice of transition probabilities. These are based on a so-called tone network, which is shown in Figure 5.

Figure 4: Subpatch for generating a chord sequence based on the tone net representation

Figure 5: Tonnetz (image source: https://jazz-library.com/articles/tonnetz/)

Within the tone net, individual tones are arranged and connected to each other. On the horizontal lines, the tones are each a fifth apart, while the diagonal lines show minor thirds (from top left to bottom right) and major thirds (from bottom left to top right). The resulting triangles each represent a triad; for example, the triangle of the notes C, E and G results in the chord C major. All major and minor chords of the chromatic scale can be found in this way. The Tonnetz representation is mostly used for analysis purposes, as it shows directly how many tones two different triads share. One example is the analysis of classical music of the romantic and modern periods as well as film music, as the harmonic counterpoint rules used above are often neglected here in favor of chromatic and other previously unusual transitions. The distance between two chords in the tone net can serve as a measure of whether the transition from one chord to the other sounds smooth or rather unusual. It is calculated from the number of edges that have to be crossed to get from one chord triangle to another. In other words, it corresponds to the degree of adjacency between two triangles, where two triangles are directly adjacent if they share an edge. Figure 6 shows an example of this: To get from the chord C major to the chord F minor, three edges have to be crossed, resulting in a distance of 3.

Figure 6: Example of determining the distance in the tone network using the transition from C major to F minor

As part of the project, the transition probabilities are now calculated on the basis of the distances between chords in the tone network. It is only necessary to distinguish whether the active triad is a major or minor chord, as the same distances to other chords result for all keys within these two classes. This means that every transition can be calculated from C major or C minor and then shifted to the desired key by adding a value. Starting from both variants (C major and C minor), the distances to all other triads were first recorded in the tonal network:

Distances from C major:

Distances from C minor:

In order to obtain probabilities from the distances, all values were first subtracted from 6 to make larger distances less probable. The results were then used as the exponent of the number 2 in order to give greater weighting to closer chords. Overall, this results in the formula

P=2^(6-x) ; P=probability, x=distance in the grid

to calculate the transition weights. These result in the following matrix for all possible chord combinations, from which 342 probabilities result when divided by the row sum.
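
The weighting and normalization can be checked with a few lines of Python. The distance row used here is a made-up example and not one of the rows recorded for C major or C minor.

```python
def transition_probabilities(dist_row):
    """Turn Tonnetz distances into probabilities via P = 2^(6 - x)."""
    weights = [2 ** (6 - x) for x in dist_row]
    total = sum(weights)                 # row sum used for normalization
    return [w / total for w in weights]

# Hypothetical row: three direct neighbors, one chord at distance 2, one at 3
probs = transition_probabilities([1, 1, 1, 2, 3])
```

Each additional edge halves the weight, so a chord at distance 1 is four times as likely as a chord at distance 3.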

Within the patch, the Lisp function “Generate Tonnetz Series” first determines whether the active chord is a major or minor triad. As with the harmonic procedure, only the numbers 0-23 are used initially, so this can be determined using a simple modulo-2 calculation. Depending on the result, the respective probability vector is used, a new chord is determined and finally the previous step is added. If the result is a number greater than 23, 24 is subtracted in order to always remain within the same octave.

The process ends once the sequence has reached the previously specified length. There is no return to the tonic as in the previous section, as the chromaticism means that the tonic is not as pronounced as in the harmonic chord sequence.

Determining the tone lengths

After a chord progression has been generated, random lengths are calculated for the individual triads. This is done in the “Calculate Durations” subpatch, which is shown in Figure 7. In addition to the desired BPM number, a list of note lengths is passed in as multiples of quarter notes. More probable values occur more frequently in this pool, so that a corresponding selection can be made via “nth-random”.
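
In Python, the same idea of a pool with repeated entries could look like this. The pool contents and the conversion to milliseconds are illustrative assumptions; `rng.choice` plays the role of “nth-random”.

```python
import random

def random_durations_ms(n_chords, bpm, pool, rng=random):
    """Pick one note length (in quarter notes) per chord and convert it to ms."""
    quarter_ms = 60000.0 / bpm
    return [rng.choice(pool) * quarter_ms for _ in range(n_chords)]

# Quarter notes are three times as likely as whole notes in this example pool
POOL = [1, 1, 1, 2, 2, 4]
durations = random_durations_ms(8, 120, POOL, rng=random.Random(7))
```

Repeating a value in the pool is a simple way to weight a uniform random choice without a separate probability table.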

Figure 7: Subpatch for random determination of note durations

Melody generation

The basic melody generation process has already been described above: A tone is selected from the respective chord and transposed up an octave. This tone can be selected at random or according to the smallest or largest distance to the previous tone.
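
The three selection modes can be summarized in a short Python sketch. It assumes pitches in midicents (so an octave corresponds to 1200) and measures the distance on the already transposed tones; the mode names are hypothetical.

```python
import random

def melody_note(chord, previous=None, mode="random", rng=random):
    """Choose a chord tone, transposed up an octave, as the next melody note."""
    candidates = [p + 1200 for p in chord]   # one octave up in midicents
    if mode == "random" or previous is None:
        return rng.choice(candidates)
    distance = lambda p: abs(p - previous)
    if mode == "nearest":
        return min(candidates, key=distance)
    return max(candidates, key=distance)     # mode "farthest"

c_major = [6000, 6400, 6700]
near = melody_note(c_major, previous=7900, mode="nearest")
far = melody_note(c_major, previous=7900, mode="farthest")
```

The first melody note has no predecessor, so the sketch falls back to a random chord tone in that case.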

 

Sound examples

Example of a harmonic chord sequence:

 
 

Example of a tone net chord sequence:

 
 

 

By Andres Kaufmes

Transient Processor


SKAS symbolic sound processing and analysis/synthesis

Prof. Dr. Marlon Schumacher

Intermediate project by Andres Kaufmes

HfM Karlsruhe – IMWI (Institute for Music Informatics and Musicology)

Winter semester 2022/23

_____________

For this interim project, I worked on the implementation of a transient processor in OpenMusic with the help of the OM-SoX library.
A transient processor (also known as a transient designer or transient shaper) can be used to influence the attack/release behavior of the transients of an audio signal.

The first hardware device of this kind, the SPL TD4, was introduced by SPL in 1998 as a 19″ rack unit and is still available today in an updated version.

Transient Designer from SPL. (c) SPL

Transient Designers are particularly suitable for processing percussive sounds or speech. First, the transients must be isolated from the desired audio signal; this can be done using a compressor, for example. A short attack time “ducks” the transients and the signal can be subtracted from the original. The audio signal can then be processed with further effects in the course of the signal chain.

Transient processor patch. FX chain of the two signal paths (left “Transient”, right “Residual”).

At the top of the patch you can see the audio file to be processed, from which, as just described, the transients are isolated using a compressor and the resulting signal is subtracted from the original. Now two signal paths are created: The isolated transients are processed in the left-hand “chain”, the residual signal in the right-hand one. After both signal paths have been processed with audio effects, they are mixed together, whereby the mixing ratio (dry/wet) of both signal paths can be adjusted as desired. At the end of the signal processing there is a global reverb effect.
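
The separation into the two signal paths can be illustrated on a plain list of samples. The following Python sketch uses a very simple envelope follower as a stand-in for the compressor; the threshold and the attack/release coefficients are arbitrary assumptions, and none of this reproduces the OM-SoX effects used in the patch.

```python
def split_transients(signal, threshold=0.2, attack=0.5, release=0.01):
    """Return (transients, residual) so that both add up to the original."""
    env = 0.0
    transients, residual = [], []
    for x in signal:
        level = abs(x)
        coeff = attack if level > env else release   # fast attack, slow release
        env += coeff * (level - env)
        gain = 1.0 if env <= threshold else threshold / env
        ducked = x * gain                 # compressor output: transients ducked
        residual.append(ducked)
        transients.append(x - ducked)     # original minus compressed signal
    return transients, residual

def mix(transients, residual, transient_gain=1.0, residual_gain=1.0):
    """Recombine both paths with an adjustable mixing ratio."""
    return [transient_gain * t + residual_gain * r
            for t, r in zip(transients, residual)]

signal = [0.0] * 10 + [1.0] + [0.1] * 10   # silence, one click, quiet tail
trans, resid = split_transients(signal)
```

With both gains set to 1.0, `mix` returns the original signal again; raising or lowering `transient_gain` then emphasizes or softens the attacks.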

“Scope” view of the two signal paths. Sketches of the possible signal path and processing.

Sound examples:

Isolated signal:

Residual signal:

By admin

BAD GUY: An acousmatic study

Abstract:

Inspired by the “Infinite Bad Guy” project, and all the very different versions of how some people have fueled their imaginations on that song, I thought maybe I could also experiment with creating a very loose, instrumental cover version of Billie Eilish’s “Bad Guy”.

Supervisor: Prof. Dr. Marlon Schumacher

A study by: Kaspars Jaudzems

Winter semester 2021/22
University of Music, Karlsruhe

To the study:

Originally, I wanted to work with 2 audio files, perform an FFT analysis on the original and “replace” its sound content with content from the second file, based only on the fundamental frequency. However, after doing some tests with a few files, I came to the conclusion that this kind of technique is not as accurate as I would like it to be. So I decided to use a MIDI file as a starting point instead.

Both the first and second versions of my piece only used 4 samples. The MIDI file has 2 channels, so 2 files were randomly selected for each note of each channel. The sample was then sped up or slowed down to match the correct pitch interval and stretched in time to match the note length.
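
The relationship between pitch interval, playback speed and the subsequent time stretch can be written down in a few lines of Python. This is a general sketch assuming equal temperament and pitch shifting by resampling; the function and parameter names are hypothetical and not taken from the patch.

```python
def speed_and_stretch(sample_pitch, note_pitch, sample_dur, note_dur):
    """Return (speed factor, time-stretch factor) for one note.

    Pitches are MIDI note numbers, durations are in seconds.
    """
    # Equal temperament: one semitone corresponds to a factor of 2^(1/12)
    speed = 2.0 ** ((note_pitch - sample_pitch) / 12.0)
    dur_after_speed = sample_dur / speed   # faster playback shortens the sample
    stretch = note_dur / dur_after_speed   # factor needed to reach the note length
    return speed, stretch

# A 2 s sample recorded at C4 (60), needed as a 0.5 s note at C5 (72)
speed, stretch = speed_and_stretch(60, 72, 2.0, 0.5)
```

In the example, playing the sample at double speed raises it by an octave and shortens it to one second, so it still has to be compressed to half its length.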

The second version of my piece added some additional stereo effects by pre-generating 20 random pannings for each file. With randomly applied comb filters and amplitude variations, a bit more reverb and human feel was created.

Acousmatic study version 1

Acousmatic study version 2

The third version was a much bigger change. Here the notes of both channels are first divided into 4 groups according to pitch. Each group covers approximately one octave in the MIDI file.

Then the first group (lowest notes) is mapped to 5 different kick samples, the second to 6 snares, the third to percussive sounds such as agogo, conga, clap and cowbell and the fourth group to cymbals and hats, using about 20 samples in total. A similar filter and effect chain is used here for stereo enhancement, with the difference that each channel is finely tuned. The 4 resulting audio files are then assigned to the 4 left audio channels, with the lower frequency channels sorted to the center and the higher frequency channels sorted to the sides. The same audio files are used for the other 4 channels, but additional delays are applied to add movement to the multi-channel experience.
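
The grouping by pitch can be sketched in Python as follows. The band edges of 110, 220 and 440 Hz are the channel ranges reported further below for channels 2 to 4; treating everything under 110 Hz as the first group, and the conversion from MIDI note numbers with A4 = 440 Hz, are assumptions.

```python
import bisect

BAND_EDGES_HZ = [110.0, 220.0, 440.0]  # boundaries between the four groups

def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number (A4 = 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def group_of(note):
    """Group index 0..3: kicks, snares, percussion, cymbals/hats."""
    return bisect.bisect_right(BAND_EDGES_HZ, midi_to_hz(note))

def split_into_groups(notes):
    groups = [[], [], [], []]
    for note in notes:
        groups[group_of(note)].append(note)
    return groups

groups = split_into_groups([36, 45, 60, 69, 84])
```

Each of the four resulting lists would then be rendered with its own pool of samples, for example the 5 kick samples for the lowest group.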

Acousmatic study version 3

The 8-channel file was downmixed to 2 channels in 2 versions, one with the OM-SoX downmix function and the other with a Binauralix setup with 8 speakers.

Acousmatic study version 3 – Binauralix render

Extension of the acousmatic study – 3D 5th-order Ambisonics

The idea with this extension was to create a 36-channel creative experience of the same piece, so the starting point was version 3, which only has 8 channels.

Starting point version 3

I wanted to do something simple, but also use the 3D speaker configuration in a creative way to further emphasize the energy and movement that the piece itself had already gained. Of course, the idea of using a signal as a source for modulating 3D movement or energy came to mind. But I had no idea how…

Plugin “ambix_encoder_i8_o5 (8 -> 36 chan)”

While researching the Ambix Ambisonic Plugin (VST) Suite, I came across the plugin “ambix_encoder_i8_o5 (8 -> 36 chan)”. This seemed to fit perfectly due to the matching number of input and output channels. In Ambisonics, space/motion is translated from 2 parameters: Azimuth and Elevation. Energy, on the other hand, can be translated into many parameters, but I found that it is best expressed with the Source Width parameter because it uses the 3D speaker configuration to actually “just” increase or decrease the energy.

Knowing which parameters to modulate, I started experimenting with using different tracks as the source. To be honest, I was very happy that the plugin not only provided very interesting sound results, but also visual feedback in real time. When using both, I focused on having good visual feedback on what was going on in the audio piece as a whole.

Visual feedback – video

Channel 2 as modulation source for azimuth

This helped me to select channel 2 for Azimuth, channel 3 for Source Width and channel 4 for Elevation. If we trace these channels back to the original input MIDI file, we can see that channel 2 is assigned notes in the range of 110 to 220 Hz, channel 3 notes in the range of 220 to 440 Hz and channel 4 notes in the range of 440 to 20000 Hz. In my opinion, this type of separation worked very well, also because the sub-bass frequencies (e.g. kick) were not modulated and were not needed for this. This meant that the main rhythm of the piece could remain as a separate element without affecting the space or the energy modulations, and I think that somehow held the piece together.

Acousmatic study version 4 – 36 channels, 3D 5th-order Ambisonics – file was too big to upload

Acousmatic study version 4 – Binaural render

By Veronika Reutz

Composing in 8 channels with OpenMusic

In this article I present my ideas, creative processes and technical data for the patch programmed for the class “Symbolic Sound Processing and Analysis/Synthesis” with Prof. Marlon Schumacher. The idea of this text is to show the technical solutions for my creative ideas and to share the knowledge gained to help readers with their own ideas. The purpose of this patch is to take sounds from everyday life and transform them into your own composition using several processes within OpenMusic.

Responsible: Veronika Reutz Drobnić, winter semester 21/22

Introduction, Iteration 1

The initial idea of the piece was to transform everyday sounds, for example the sound of a kettle, into a different, processed sound by implementing technical solutions in OpenMusic. This patch processes and merges several files into one composition. There are three iterations of the patch that I worked on during the semester. I will describe them in chronological order.

The original idea for the patch came from musique concrète. I wanted to make a 2-minute piece from concrete sounds (not synthesized in OpenMusic, but recorded). This patch consists of three subpatches that are connected to the maquette in the main patch.

The main patch


By admin

Acousmatic study by Christoph Zimmer

This article is about the three iterations of an acousmatic study by Christoph Zimmer, which were carried out as part of the seminar “Symbolic Sound Processing and Analysis/Synthesis” with Prof. Dr. Marlon Schumacher at the HFM Karlsruhe. It covers the basic concept, ideas, subsequent iterations and the technical implementation with OpenMusic.

Responsible persons: Christoph Zimmer, Master student Music Informatics at the HFM Karlsruhe

 

Basic idea and concept:

I usually work a lot with hardware for music, especially in the field of DIY. This often coincides with the organization and optimization of the workflow associated with this hardware. When we students were given the task of producing an acousmatic study in the form of musique concrète, I was initially disoriented. Up to that point, I had only dealt a little with “experimental” music genres. To be honest, I wasn’t even aware of the existence of musique concrète up to this point. So with this task I was thrown out of my usual workflow, sound synthesis with hardware, and therefore also out of my comfort zone. Now I had to use field recordings as samples.
 
My DIY attitude intuitively led me to the decision to record the samples myself. I wanted to focus on a variety of samples. However, I was still dismissive of the idea of completely cutting myself off from my previous work. I wanted to bring a “meta-connection” to my hardware-focused work into the piece. Based on this idea, the piece “chris builds a trolley for his hardware” was created.
 

The finished trolley for hardware. More pictures at: https://www.reddit.com/r/synthesizers/comments/ryyw8e/i_finally_made_a_proper_stand_for_my_synth_rack/

First iteration

The piece should therefore consist of samples that were not randomly produced or downloaded from the internet, but were created as a “by-product” of work that I actually carried out myself, in this case the construction of a trolley for music hardware. Over the course of two weeks, I used my smartphone to record the sounds that emerged as I went through the various work steps. As I made use of different materials and processing methods in these work steps, not only did a wide variation of sound textures emerge, but the macroscopic structure of the piece also formed by itself. It composed itself, so to speak. The desired meta-connection was thus created. Once the trolley was complete, it was time to start producing the piece.
 
The raw audio files of the recordings are each several minutes long. To simplify handling in OpenMusic, the individual sound elements were exported as .wav files. The DAW REAPER was used for this. The result was about 350 individual samples. These are available under the following link:
 
https://drive.google.com/file/d/1hRk4OZvNEJLkpo_bzSZxP1lwO0YlcpLy/view
 
Here are a few examples of the sound elements used:
 

 

With the samples prepared, the work in OpenMusic could now begin.
As is usual for musique concrète, the samples were to be processed with various effects to support the musical context. However, it was also important to me that these effects should not dominate in such a way that the sounds become unrecognizable and the context is lost. That’s why I had the idea of programming a workspace for the arrangement within an OpenMusic patch to make the samples dynamically editable. The “Maquette” object turned out to be ideal for this. Basically, this makes it possible to place other objects within an x-axis (time) and y-axis (parameterizable). These objects can then access their own properties in the context of the maquette. I then used these functions to create four different “Template Temporal Boxes” which use the parameterization of the maquette in different ways to apply effects to the respective samples. Using multiple templates further reduces complexity while maintaining a variation of modulation possibilities:
 
tempboxa
  • Position y –> Reverbance
  • Size y –> Playback speed
  • Random –> panning

OM Patch of the tempboxa

 
tempboxb
  • Position y –> Delay time
  • Size y –> Playback speed
  • Random –> panning

OM Patch of the tempboxb

 
 
tempboxc
  • Position y –> Tremolo speed
  • Size y –> Playback speed
  • Random –> panning

OM Patch of the tempboxc

 
 
tempboxd
  • Position y –> Lowpass cutoff frequency
  • Size y –> Playback speed
  • Random –> panning

OM Patch of the tempboxd
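How a temporal box might turn its position and size in the maquette into effect parameters can be pictured as simple linear scaling. The sketch below is not a transcription of the OM patches: the coordinate range of 0 to 100, the parameter ranges and the function names are hypothetical and serve only to illustrate the principle for a tempboxd-style box.

```python
import random


def scale(value, in_lo, in_hi, out_lo, out_hi):
    """Linearly map value from [in_lo, in_hi] to [out_lo, out_hi], clamping at the edges."""
    t = (value - in_lo) / (in_hi - in_lo)
    t = min(1.0, max(0.0, t))
    return out_lo + t * (out_hi - out_lo)


def tempboxd_params(pos_y, size_y, rng=random):
    """Effect parameters for a tempboxd-style box.

    All numeric ranges below are illustrative assumptions, not values
    taken from the original patch.
    """
    return {
        "lowpass_hz": scale(pos_y, 0, 100, 200.0, 8000.0),  # Position y -> cutoff
        "speed": scale(size_y, 0, 100, 0.5, 2.0),           # Size y -> playback speed
        "pan": rng.uniform(-1.0, 1.0),                      # Random -> panning
    }
```

The other three templates would differ only in which effect parameter the vertical position drives.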

 

With the creation of these boxes, the composition of the piece could begin.
As already mentioned, the macroscopic structure of the construction process was to be retained. In practice, certain samples of the sections (research, sketching, steel processing, welding, steel drilling, 3d printing, wood drilling, wood sanding, painting and assembly) were selected in order to process them with the parameterized tempboxes into interesting sounding combinations, which should describe the current work step.
 
 

Detail of the maquette with arrangement

 

The result of the first iteration:

 

Second iteration

 
My goal for the second iteration was to place accents on samples that represent anchor points of the piece. More precisely, the panning used in the first iteration was to be reworked by adding a provisional Haas effect (delay between the left and right channels) to the existing logic. For this purpose, the result of the previous panning is duplicated inversely and then extended with a delay (up to 8 ms) and level adjustment, which are dynamically related to the strength of the panning. Finally, both sounds are merged and output from the tempbox.
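The dependency of delay and level on the panning strength can be sketched like this. The 8 ms upper bound comes from the description above; the linear relation, the attenuation of up to 6 dB and the function name are assumptions made for illustration.

```python
MAX_DELAY_MS = 8.0  # upper bound stated in the text


def haas_copy(pan: float, max_cut_db: float = 6.0):
    """Return (pan, delay_ms, gain_db) for the mirrored, delayed copy.

    pan is in [-1.0, 1.0] (hard left to hard right). The linear relation to
    the pan strength and the max_cut_db value are illustrative assumptions.
    """
    strength = abs(pan)
    return (-pan, strength * MAX_DELAY_MS, -strength * max_cut_db)
```

A sound panned halfway to the right would thus get a copy panned halfway to the left, delayed by 4 ms and attenuated by 3 dB, before both are merged.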

OM Patch of the extended panning

 

The result of the second iteration:

 

Third iteration

For the third and final iteration, the task was to make the piece available for an arbitrarily selectable setup of 8 channels. The structure was not to be changed. This gave me the opportunity to work on the panning again. Instead of setting the limit of the panning randomizer to 8 channels, I came up with the idea of raising the macroscopic structure even further. I chose the following speaker setup for this:
 

Setup of the speakers (with numbering of the channels)

 
With this setup, it is possible to distribute the panning to two opposite speakers, depending on the sections of the piece. During the course of the piece, the sound should then move around the listener as a slow rotational movement.
 

Part 1 of macroscopic panning

 
 
 

Part 2 of macroscopic panning

 
 

Part 3 of macroscopic panning

 
This principle applies in parallel to the accentuation of some samples from the second iteration: while the other samples (depending on the section) are distributed to different pairs of speakers, the anchor elements remain on channels 1 and 2.
 
The final version is also available in 2-channel format:

 

Fourth iteration

In this iteration, the task was to spatialize the piece using the tools we learned in the course “Visual Programming of Space/Sound Synthesis” (VPRS) with Prof. Dr. Marlon Schumacher and Brandon L. Snyder.
 
“chris builds a trolley for his hardware” was already developed far enough at this point that I submitted it to Metamorphoses 2022 (a competition for acousmatic pieces). For this it was necessary to mix the piece for a 16-channel setup. Due to the imminent deadline, I had very little time to adapt the piece to the requirements. Therefore, the channels were simply doubled in REAPER and LFO panning was added to the respective pairs. Unfortunately, the piece was not accepted because its length did not meet the requirements. Since the spatialization also left a lot to be desired, I took the opportunity to use the newly learned tools to improve it.
 
I decided to discard the Metamorphoses 16-channel spatialization and return to the state of the third iteration. My goal was a spatialization that not only deals with the macroscopic structure (such as the steel processing, 3D printing…), but also with the microscopic structure, i.e. to make individual sounds more dynamic. The audio exported from OM (8 channel) served as the source material, which was then to be processed using the Ambisonics (IEM) VSTs.
 
The Ambisonics template for REAPER was used as a workspace template, as it already provided a setup for the audio busses to finally render a 5th order Ambisonics file and a binaural stereo downmix. In the first step, the 8-channel audio file was routed so that it could be processed separately. To do this, channels 1-2, 3-4, 5-6 and 7-8 were sent to new tracks and the master send was deactivated. These tracks were then defined as multi-channel tracks with 36 channels and the stereo encoder (IEM) was inserted into the effect chain. The parameters for the spatialization (azimuth, elevation, roll and width) were then added as envelopes to the REAPER timeline to enable their dynamic processing. Finally, all tracks can be merged into the Ambisonics bus. The binaural downmix was used as a monitoring output.
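The figure of 36 channels follows from the Ambisonics order: a full 3D signal of order N carries (N + 1)² channels. A small check, with a hypothetical function name:

```python
def ambisonic_channels(order: int) -> int:
    """Number of channels in a full 3D Ambisonics signal of the given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2
```

For order 5 this gives 36, which is why each track had to be defined with 36 channels before the encoder was inserted.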
 

A simplified representation of the routing in REAPER

 
In practice, points were inserted into the envelope tracks by hand, between which linear interpolation was then used to create dynamic changes in the parameters. I proceeded intuitively and listened to individual sections to get a basic idea of what kind of spatialization would emphasize this section. Then I looked at the individual sounds and their origins and tried to describe them with the help of the parameters. Examples of this are: an accelerating rotary movement when drilling, a jumping back and forth when the digital input of the 3D printer beeps or a complete mess when crumpling paper. I was already familiar with this type of workflow, not only when using DSP VSTs in the DAW, but also when programming DMX lights via the envelope.
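The linear interpolation between hand-placed envelope points can be sketched as follows. This only illustrates the principle and is not REAPER's implementation; the function name and the choice to hold the first and last value outside the covered range are assumptions.

```python
from bisect import bisect_right


def envelope_value(points, t):
    """Linearly interpolate an envelope at time t.

    points: list of (time, value) pairs sorted by time. Outside the covered
    range, the first or last point's value is held.
    """
    times = [p[0] for p in points]
    if t <= times[0]:
        return points[0][1]
    if t >= times[-1]:
        return points[-1][1]
    i = bisect_right(times, t)  # index of the first point later than t
    (t0, v0), (t1, v1) = points[i - 1], points[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)
```

An azimuth envelope with points at (0 s, -90°) and (4 s, 90°) would, for instance, pass through 0° at 2 s.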
 
When editing, I found the visual feedback of the EnergyVisualizer (IEM) very helpful for keeping an overview. I therefore decided to record it and add it to the binaural downmix:
 
 
All uncompressed files can be found under the following link: https://drive.google.com/drive/folders/1bxw-iZEQTNnO92RTCmW_l5qRFjeuVxA9?usp=sharing
 

By Lukas Körfer

Speaking Objects

Abstract

In this project, an audio-only augmented reality sound installation was created as part of the course „Studienprojekte Musikprogrammierung“ (“Study Projects Music Programming”) at the Karlsruhe University of Music. It is important for the following text to distinguish the terminology from virtual reality (VR for short), in which the user is completely immersed in the virtual world. Augmented reality (AR for short) is the extension of reality through the technical addition of information.

 

Motivation

On the one hand, this sound installation should meet a certain artistic standard; on the other hand, my personal goal was to bring AR and especially auditory AR closer to the participants and to get them excited about this new technology. Unfortunately, augmented reality is very often only understood as the visual representation of information, as is the case with navigation systems or smartphone applications, for example. However, in my opinion, it is important to sensitize people more and more to the auditory extension of reality. I am convinced that this technology also has enormous potential and that there is a lot of catching up to do in terms of public awareness compared to visual augmented reality. There are already numerous areas of application in which the benefits of auditory AR have been demonstrated. These range from areas in which many applications of visual AR can already be found, such as education, productivity or pure entertainment, to specialist areas such as medicine. Ten years ago, for example, there were already attempts to use auditory AR to enhance the sense of hearing for people with visual impairments. By sonifying real objects, it was possible to create a purely auditory orientation aid.

 

Methodology

In this project, participants should be able to move freely in a room in which objects are positioned. Although these objects do not produce sounds in reality, the participants should be able to perceive sounds through headphones. In this sense, it is an extension of reality (“augmented reality”), as information is added to reality in auditory form using technical means. Essentially, the implementation covers two areas: on the technical side, tracking the position of the person (motion capture) and binauralization; on the artistic side, the design of the sound scene by positioning and synthesizing the sounds.

Figure 1

The motion capture in this project is realized with the Polhemus G4 system. The direction and position of a micro-sensor, which is attached to a pair of glasses worn by the participant, is determined by a magnetic field generated by two transmitters. A hub, which is connected to the micro-sensor via a cable, sends the motion capture data to a USB dongle connected to a laptop. This data is sent to another laptop, on which the binauralization takes place and which is ultimately connected to the wireless headphones.

Figure 2 shows two of the six objects in one variant each (angles of 45° and 90°). The next illustration (Fig. 3) shows the over-glasses (protective glasses that can also be worn over glasses) that are used in the sound installation. These goggles have a wide nose bridge to which the micro-sensor is attached with a micro-mount from Polhemus.

Figure 2

 

Figure 3

As previously explained, various decisions have to be made before the artistic aspect of the sound installation can be realized. This involves the positioning of the objects / sound sources and the sounds themselves.

Figure 4

 

Figure 5

Figure 4 shows a sketched top view of the complete structure. The six blue-colored circles mark the positions of the objects in the room and, of course, the sound sources of the scene in Binauralix, which can be seen in Figure 5. The direction and angle of the sources can be taken from the colorless areas (in Fig. 4), at either 45° or 90° angles, around the sound sources.

The completely wireless position detection and data transmission enables the participants to immerse themselves fully in this interactive, reality-expanding sound world. The sound synthesis was carried out in SuperCollider. The sounds were mainly created from various tapping and clicking noises recorded via the SoundIn object, which were then altered and defamiliarized through amplitude and frequency modulation and various filters. By routing the sounds to a total of 6 output channels and using “s.record(numChannels:6)”, I was able to create a two-minute multi-channel audio file in SuperCollider. When playing the file in Binauralix, the first channel is automatically mapped to source 1, the second channel to source 2 and so on.

 

Technical implementation

The technical challenge for the implementation of the project initially consisted of receiving and reformatting the data from the sensor so that it could be used in Binauralix. The initial problem was that Binauralix is only available for macOS and the software for the Polhemus G4 system is only available for Windows and Linux. As I had a MacBook and a laptop with Ubuntu Linux as my operating system at the time, I installed the Polhemus software for Linux.

After building and installing the Polhemus G4 software on Linux, the five applications “G4DevCfg”, “CreateSrcCfg”, “g4term”, “g4display” and “g4export” were available. For my project, all devices used must first be connected and configured with “G4DevCfg”. The terminal application “g4export” can be used to transmit the sensor data via UDP by specifying the previously created source configuration file, the local IP address of the receiver device and a port. The source configuration file defines the position and orientation of the transmitters through a “virtual frame of reference” and holds settings for the entry hemisphere into the magnetic field, floor compensation and the source calibration file. To run the application, the transmitters and the hub must be switched on, the USB dongle must be plugged into the laptop, the sensor must be connected to the hub, and the hub must be linked to the USB dongle. If the MacBook is now in the same network as the Linux laptop, the data can be received by specifying the previously used port. In my sound installation, this is done in a custom MaxMSP patch.

Figure 6

In this application, the appropriate port must first be selected on the left-hand side. As soon as the connection is established and the messages arrive, you can view them in raw form under the selection field. The six values that can be seen at the top in the middle of the application are the values for position and orientation that have been separated from the raw message. Final settings for the correct calibration can now be made in the action field below. There is also the option to mirror the axes individually or to change the Yaw value if unexpected problems should arise when setting up the sound installation. Once the values have been formatted into messages that can be used by Binauralix (visible at the bottom right of the application), they are sent to Binauralix.
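The calibration step, mirroring individual axes and shifting the yaw value, can be sketched in Python as below. This is not the MaxMSP patch: the field order (x, y, z, yaw, pitch, roll), the use of degrees and the function name are assumptions, not the documented G4 output layout.

```python
def calibrate(pose, mirror=(False, False, False), yaw_offset=0.0):
    """Apply axis mirroring and a yaw offset to one sensor reading.

    pose: (x, y, z, yaw, pitch, roll) with angles in degrees. This field
    order is an assumption made for illustration.
    """
    x, y, z, yaw, pitch, roll = pose
    # Flip the sign of each position axis whose mirror flag is set.
    x, y, z = (-v if flip else v for v, flip in zip((x, y, z), mirror))
    # Wrap the corrected yaw into [-180, 180).
    yaw = (yaw + yaw_offset + 180.0) % 360.0 - 180.0
    return (x, y, z, yaw, pitch, roll)
```

Mirroring the x axis and adding 20° to a yaw of 170°, for example, yields an x value with flipped sign and a yaw of -170°.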

The following videos provide a view of the scene in Binauralix and an auditory impression as the listener, driven by the sensor data, moves through the scene.

 

 

Past performances of the sound installation

The sound installation as a contribution to the EFFEKTE lecture series of the Wissenschaftsbüro-Karlsruhe

 

 

The sound installation as the subject of a workshop for the Kulturakademie at the HfM-Karlsruhe