Tag Archive: Sonification

By Mads Ole Clasen

Sonification of Image Sequences

Abstract: This project deals with the design of an OpenMusic application that converts image sequences into a symbolic music representation consisting of three voices.

Responsible: Mads Clasen

 

Source material and preparation

The image sequence can consist of individual frames from a video, a series of images by an artist, or images you have compiled yourself. These must first be available in a folder on the computer in question. If the source is a video, it must first be broken down into its frames outside of OpenMusic. In Python, this can be done in just a few lines with the help of the OpenCV library.

Fig. 1: Splitting video in Python
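A minimal sketch of such a script; the file names, output folder and downsampling factor are illustrative, not taken from the original:

    import os
    import cv2  # OpenCV

    os.makedirs("frames", exist_ok=True)
    cap = cv2.VideoCapture("input.mp4")
    count = 1
    while True:
        ok, frame = cap.read()
        if not ok:          # no more frames
            break
        # Downsample to keep later processing in OpenMusic manageable.
        small = cv2.resize(frame, None, fx=0.25, fy=0.25,
                           interpolation=cv2.INTER_AREA)
        cv2.imwrite(f"frames/{count}.png", small)  # numeric names 1 to n
        count += 1
    cap.release()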

 

As can be seen in the code example, the resulting images should be named numerically from 1 to n and, depending on the computer's processing power and the number of images, downsampled. This, along with the installation of the Pixels library in OpenMusic itself, is necessary for the patch to work properly.

 

Sonification

Note values

After specifying the file path, file format and number of images, the desired images are loaded into a picture-lib object and collected into a list. The average R, G and B colour value is read from each image in this list and mapped to a corresponding note. The R, G and B values are treated as separate voices, each with its own note range within which it moves. This range is further subdivided into microtonal steps (eighth tones) onto which the values are finally mapped: a value of 1 corresponds to the highest possible note and 0 to the lowest. Each image thus yields one note for each of the three voices.

Fig. 2: Generation of a score line

 

The note ranges within which the colour values move can be adjusted using the corresponding chord objects; it is also possible to transpose all ranges uniformly by a desired value in midicents. The preset values correspond to the approximate note ranges of soprano (R), alto (G) and bass (B) voices. After running through the entire image sequence, there are three voices, each with as many notes as there are input images. These voices are processed independently of each other in the next step.

Fig. 3: Mapping the RGB values onto note values
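A minimal Python sketch of this mapping logic; the range bounds below are rough stand-ins for the patch's soprano, alto and bass presets, not its actual values:

    # Map a normalized mean colour value in [0, 1] to a midicent pitch
    # inside a voice's range, quantized to eighth tones (25 midicents).
    RANGES = {"R": (6000, 8100),  # soprano-like (assumption)
              "G": (5300, 7400),  # alto-like (assumption)
              "B": (4000, 6200)}  # bass-like (assumption)

    def to_midicent(value, voice):
        low, high = RANGES[voice]
        raw = low + value * (high - low)  # 0 -> lowest note, 1 -> highest
        return int(round(raw / 25) * 25)  # snap to the eighth-tone grid

    # One image yields one note per voice from its mean R, G and B values:
    mean_rgb = {"R": 0.62, "G": 0.40, "B": 0.55}
    print({voice: to_midicent(v, voice) for voice, v in mean_rgb.items()})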

 

Rhythm

As I decided to use voice objects because of the direct control they offer over rhythm, the rhythm trees they require are created first, or rather the two building blocks of a rhythm tree: time signatures and proportions. Based on the number of notes and the time signature, a list is created for each of these; the two lists are later merged into the required rhythm-tree form using a simple mat-trans object.

Fig. 4: Generating the rhythm tree
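In Python terms, the mat-trans step amounts to a simple list transposition that pairs each time signature with its proportion list; a sketch with placeholder values:

    # Build the two lists and transpose them into measures, mimicking
    # what mat-trans does in the patch. All values are illustrative.
    n_measures = 3
    signatures = [(4, 4)] * n_measures         # one time signature per measure
    proportions = [[1, 1, 1, 1]] * n_measures  # one proportion list per measure

    measures = list(zip(signatures, proportions))  # the mat-trans step
    # In OpenMusic notation this corresponds to: (? (((4 4) (1 1 1 1)) ...))
    print(measures)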

 

Before this, however, the list responsible for the proportions is enriched with rhythmic ornaments. For this purpose there is a second list containing various possible ornaments, from which one is selected at random and placed at a random position in the proportion list. As this increases the number of note values, new notes matching the ornament are inserted at the same position in the note list. Starting from the original note, a choice is made between constant, ascending or descending notes. The strength with which these ornaments are included in the voices can also be adjusted.

Fig. 5: Addition of rhythmic ornaments
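A hedged sketch of this step; the ornament patterns and the eighth-tone pitch offsets are illustrative, not the patch's actual tables:

    import random

    ORNAMENTS = [[1, 1], [1, 1, 1], [2, 1, 1]]  # subdivision patterns (assumed)

    def add_ornament(proportions, notes):
        pos = random.randrange(len(proportions))
        orn = random.choice(ORNAMENTS)
        # Replace one pulse with the ornament's subdivisions...
        proportions[pos:pos + 1] = orn
        # ...and insert matching pitches: constant, ascending or descending
        # steps starting from the original note (here in eighth tones).
        step = random.choice([0, 25, -25])
        base = notes[pos]
        notes[pos:pos + 1] = [base + i * step for i in range(len(orn))]

    proportions, notes = [1, 1, 1, 1], [6000, 6100, 6200, 6300]
    add_ornament(proportions, notes)
    print(proportions, notes)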

 

Dynamics

The edited proportions are then merged with the list determining the time signatures to form the rhythm tree. The corresponding note list goes through one more step, in which a random velocity value is assigned to each note.

Fig. 6: Creating dynamics
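The velocity step can be pictured as follows; the velocity range is an assumption:

    import random

    notes = [6000, 6425, 7050, 6800]                       # midicent pitches
    velocities = [random.randint(40, 110) for _ in notes]  # one per note
    print(list(zip(notes, velocities)))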

 

Finally, each note list is combined with its respective rhythm tree in a voice object, resulting in a total of three voices (R, G, B), which can then be combined in a poly object.

 

Image processing

In order to additionally influence the sonic result, the saturation of the colour values of individual images in the selected sequence can be edited. The original images are then mixed with the altered ones.

Fig. 7: Changing colour values
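A sketch of such a saturation edit with OpenCV, outside of OpenMusic; the saturation factor and blend weight are illustrative:

    import cv2
    import numpy as np

    def resaturate(path, factor=1.5, mix=0.5):
        img = cv2.imread(path)
        hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32)
        hsv[..., 1] = np.clip(hsv[..., 1] * factor, 0, 255)  # scale saturation
        altered = cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
        # Mix the altered image back into the original.
        return cv2.addWeighted(img, 1 - mix, altered, mix, 0)

    cv2.imwrite("frames/1_resat.png", resaturate("frames/1.png"))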

 

Example

In the following example, the frames of a video recorded during a bus journey, showing the passing surroundings, were used as the image sequence; every thirtieth frame was saved. The patch's colour-manipulation option was also used to make the sequence a little more varied.

Fig. 8: Edited image sequence

 

After evaluating the rest of the patch, a possible end result for this image sequence could look and sound as follows:

Fig. 9: Excerpt of symbolic representation

By admin

Music and Installation Chair @IEEE IoS 2024

Marlon Schumacher will serve as music and installation co-chair, together with Esther Fee Feichtner, for the IEEE 5th International Symposium on the Internet of Sounds, held at the International Audio Laboratories Erlangen from 30 September to 2 October 2024. From the official IEEE website:

“The Internet of Sounds is an emerging research field at the intersection of the Sound and Music Computing and the Internet of Things domains.  […] The aim is to bring together academics and industry to investigate and advance the development of Internet of Sounds technologies by using novel tools and processes. The event will consist of presentations, keynotes, panels, poster presentations, demonstrations, tutorials, music performances, and installations.”

 

The Internet of Sounds Research Network is supported by an impressive number of institutions (more than 120) from over 20 countries, along with a dedicated IEEE committee for emerging technology initiatives, and includes several partners from Germany.

By Florian Simon

PixelWaltz: Sonification of images in OpenMusic

Abstract: The OpenMusic program PixelWaltz converts images into symbolic representations of music (pitches and onset times). Image-manipulation options are available with which the result can be further influenced.

Responsible: Florian Simon

Mapping: Pitch

The pixels of the image are traversed line by line and their respective red, green and blue values (between 0 and 1) are mapped to a desired pitch range. Three pitch values in midicents are thus obtained from each pixel. Since two adjacent pixels are similar in many cases, this mapping method often produces patterns that repeat every three notes; this is the reason for the project's title.

It is also possible to limit the number of note values output.
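A sketch of this pixel walk in Python; the file name, pitch range and note limit are assumptions, not PixelWaltz's actual defaults:

    from PIL import Image

    def pixels_to_pitches(path, low=4800, high=8400, max_notes=300):
        img = Image.open(path).convert("RGB")
        pitches = []
        for r, g, b in img.getdata():      # pixels, row by row
            for channel in (r, g, b):      # three notes per pixel
                value = channel / 255      # normalize to [0, 1]
                pitches.append(round(low + value * (high - low)))
                if len(pitches) >= max_notes:
                    return pitches         # optional cap on the output
        return pitches

    print(pixels_to_pitches("input.png")[:12])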

Mapping: Onset times

A constant value can be set for the onset times and note durations. A humanizer effect can also be switched on, which randomly shifts each note forwards or backwards within a specified range. Starting from the basic tempo, accelerandi and ritardandi can be created by passing lists of three numbers: the start note, the end note and the speed of the tempo change. For example, (20 50 -1) creates an accelerando from note 20 to note 50 in which the interval between notes becomes one millisecond shorter per note; a positive third value correspondingly produces a ritardando.
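One plausible reading of this scheme as code; the exact interpolation the program uses is not documented here, so this is an assumption:

    def onsets(n_notes, base_ms, tempo_changes):
        """Onset times from a base inter-onset interval plus
        (start_note, end_note, delta_ms) tempo-change triples."""
        times, t = [], 0
        for i in range(n_notes):
            times.append(t)
            interval = base_ms
            for start, end, delta in tempo_changes:
                if start <= i <= end:
                    # each note in the span shifts the interval by delta ms
                    interval = base_ms + delta * (i - start)
            t += interval
        return times

    # (20, 50, -1): accelerando from note 20 to 50, 1 ms shorter per note
    print(onsets(60, 250, [(20, 50, -1)])[18:24])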

Dynamics

Different random ranges for the velocity of “red”, “green” and “blue” notes can be defined. The values generated in this way can additionally be modulated sinusoidally, so that, for example, the volume rises and falls over longer stretches. This requires specifying a wavelength as a number of notes and a maximum deviation factor.
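As a sketch, with illustrative ranges and modulation depth:

    import math
    import random

    def velocities(n_notes, lo=60, hi=100, wavelength=40, depth=0.3):
        out = []
        for i in range(n_notes):
            v = random.uniform(lo, hi)  # per-note random velocity
            # Sine modulation: wavelength in notes, depth = max deviation.
            mod = 1 + depth * math.sin(2 * math.pi * i / wavelength)
            out.append(max(1, min(127, round(v * mod))))
        return out

    print(velocities(8))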

Accompaniment

PixelWaltz offers the option of generating an accompanying voice consisting of individual additional tones placed at a fixed, user-defined interval of notes. If this interval is not divisible by 3, a polymetric effect often arises. The pitch is chosen at random and lies between 3 and 6 semitones below the respective “accompanied” note.
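A sketch of this accompaniment rule; the note interval is a placeholder:

    import random

    def accompaniment(main_pitches, period=4):
        """Every period-th main note gets an extra tone 3 to 6 semitones
        (300 to 600 midicents) below it. A period not divisible by 3 cuts
        across the pixel triplets, hence the frequent polymetric effect."""
        return [(i, p - random.randint(3, 6) * 100)
                for i, p in enumerate(main_pitches) if i % period == 0]

    print(accompaniment([6000, 6200, 6400, 6600, 6800, 7000, 7200, 7400]))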

Image processing

In order to create further variation, the sonification section of PixelWaltz is preceded by tools for manipulating the input image. In addition to adjusting the image size, brightness and contrast, it is also possible to shift the colour values and thus recolour the image. The changes in the musical translation are immediately noticeable: more brightness leads to a higher average pitch, more contrast reduces the number of distinct pitch values. In a blue-dominated image, the last note of each triplet will usually be the highest.

Sound results

The sonic results naturally differ depending on the input, but photographed material in particular often leads to the same wave-like overall structure, which winds chromatically at a slow tempo, sometimes upwards, sometimes downwards. The accompaniment supports this effect and can form a counter-pulse to the main voice.

By admin

Extension of the acousmatic study – 3D 5th-order Ambisonics

This article is about the fourth iteration of an acousmatic study by Zeno Lösch, which was carried out as part of the seminar “Visual Programming of Space/Sound Synthesis” with Prof. Dr. Marlon Schumacher at the HFM Karlsruhe. The basic conception, the ideas behind the iterations, and the technical implementation with OpenMusic are discussed.

Responsible: Zeno Lösch, master's student in Music Informatics at the HFM Karlsruhe, 2nd semester

 

Pixel

A Python script was used to obtain parameters for modulation.

This script makes it possible to scale any image to 10 × 10 pixels and save the respective pixel values in a text file: “99 153 187 166 189 195 189 190 186 88 203 186 198 203 210 107 204 143 192 108 164 177 206 167 189 189 74 183 191 110 211 204 110 203 186 206 32 201 193 78 189 152 209 194 47 107 199 203 195 162 194 202 192 71 71 104 60 192 87 128 205 210 147 73 90 67 81 130 188 143 206 43 124 143 137 79 112 182 26 172 208 39 71 94 72 196 188 29 186 191 209 85 122 205 198 195 199 194 195 204”. The values in the text file lie between 0 and 255. The text file is imported into OpenMusic and the values are scaled.
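Such a script could look roughly like this; the file names are placeholders, and the single value per pixel suggests grayscale, which is an assumption:

    import cv2

    # Scale an image to 10 x 10 pixels and write the grayscale values
    # (0-255) to a text file, one hundred values in total.
    img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
    small = cv2.resize(img, (10, 10), interpolation=cv2.INTER_AREA)
    with open("pixels.txt", "w") as f:
        f.write(" ".join(str(v) for v in small.flatten()))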

These scaled values are used as pos-env parameters.

Reaper and the IEM Plug-in Suite

 

With different images and different scalings, you get different results that can be used as modulation parameters. In Reaper, the IEM Plug-in Suite was used in post-production. These tools are designed for Ambisonics of different orders; in this case, 5th-order Ambisonics was used. One effect that was used frequently is the FDNReverb, a reverb unit that offers the possibility of applying an Ambisonics reverb to a multichannel file. The stereo and mono files were first encoded in 5th-order Ambisonics (36 channels) and then rendered down to two channels using the binaural decoder. Other post-processing effects (detune, reverb) were programmed by myself and are available on GitHub. The reverb is based on James A. Moorer's 1979 paper “About This Reverberation Business” and was written in C. The algorithm of the detuner was written in C following the HTML version of Miller Puckette's book “The Theory and Technique of Electronic Music”. The result of the last iteration can be heard here.