Abstract: This project uses OpenMusic and the OM-SoX library to encode mono audio files as 3D Ambisonics signals up to the third order.
Responsible: Alexander Nguyen (WS 2023/24)
Main text:
Ambisonics
Ambisonics is a method for describing a two- or three-dimensional sound field (in the following I shall restrict myself to 3D Ambisonics). Ambisonics uses a basis of orthogonal functions and the spherical coordinate system to describe the sound field that a sound source produces along a spherical surface. The simplest case is “Zeroth Order Ambisonics”, which resembles an ideal omnidirectional microphone: exactly one audio channel is used (also called the “W” channel, according to Furse-Malham naming). With “First Order Ambisonics” (FOA), three further channels (three basis functions) are added: these are the three “directional” components (also called the X, Y and Z channels). If an ideal point sound source is placed exactly on one of these axes, then, among the channels of this order, only the channel belonging to that axis contains the signal. In Ambisonics, the channels of lower orders are always included, i.e. the FOA signal consists of a total of four audio channels. In general, the number of channels of a 3D Ambisonics signal of the $n$th order is $(n+1)^2$ (i.e. for $n=0$: 1; for $n=1$: 4; for $n=2$: 9; for $n=3$: 16). Ambisonics signals with higher order numbers (2, 3, 4, …) are also referred to as Higher Order Ambisonics (HOA).
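The channel-count formula can be checked with a few lines of Python. This is only an illustrative sketch; the function name `ambisonics_channel_count` is hypothetical and not part of the OpenMusic implementation.

```python
def ambisonics_channel_count(order: int) -> int:
    """Number of channels of a 3D Ambisonics signal of the given order: (n+1)^2."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


# Orders 0 to 3, as listed in the text.
for n in range(4):
    print(n, ambisonics_channel_count(n))
```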
Channel Numbering
An HOA signal therefore consists of several components. There are several approaches to ordering the components in a multi-channel audio file. The ordering chosen for this project is “Ambisonic Channel Numbering” (ACN), in which each channel is assigned an integer number starting at zero (0). The first channel is therefore labeled “0”, the second channel “1”, the third channel “2” and so on. From this number, the ‘order’ ($l$) and the ‘degree’ ($m$) to which the component belongs can be determined. See Table 1 for an overview of all components of 3rd Order Ambisonics (3OA) and a comparison with an alternative labeling, “Furse-Malham” (FuMa).
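The mapping between an ACN number and its order and degree can be sketched in Python. The sketch assumes the commonly used closed-form relation $ACN = l^2 + l + m$, which is not spelled out in the text; the function names are hypothetical.

```python
import math


def acn_to_order_degree(acn: int) -> tuple[int, int]:
    """Return (l, m) for an ACN index, assuming ACN = l^2 + l + m."""
    if acn < 0:
        raise ValueError("ACN index must be non-negative")
    l = math.isqrt(acn)      # order: integer part of sqrt(ACN)
    m = acn - l * l - l      # degree: offset within the order, -l..l
    return l, m


def order_degree_to_acn(l: int, m: int) -> int:
    """Inverse mapping: ACN index for order l and degree m."""
    if abs(m) > l:
        raise ValueError("degree must satisfy -l <= m <= l")
    return l * l + l + m


# All 16 components of a third-order signal.
for acn in range(16):
    print(acn, acn_to_order_degree(acn))
```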
Normalization
The values $l$ (order) and $m$ (degree) are used to calculate a normalization factor for each audio channel. The normalization used here is called “Semi-Normalized 3D” (SN3D). See Table 2 for an overview of the normalization factors for all components of 3rd Order Ambisonics.
ACN together with SN3D normalization reflect a currently common convention called ambiX (Nachbar et al., 2011).
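As a rough illustration, the SN3D factor can be computed from $l$ and $m$ as follows. The sketch assumes the form $N_l^{|m|} = \sqrt{(2-\delta_m)\cdot\frac{(l-|m|)!}{(l+|m|)!}}$ with $\delta_m = 1$ for $m=0$ and $0$ otherwise. Some references include an additional constant factor $\frac{1}{4\pi}$ under the root; it is left out here so that the W channel has gain 1, which matches the FOA formulas given later in this text. The function name `sn3d_factor` is hypothetical.

```python
import math


def sn3d_factor(l: int, m: int) -> float:
    """SN3D normalization factor for order l and degree m (assumed form, no 1/(4*pi))."""
    am = abs(m)
    if am > l:
        raise ValueError("degree must satisfy -l <= m <= l")
    delta = 1 if m == 0 else 0
    return math.sqrt((2 - delta) * math.factorial(l - am) / math.factorial(l + am))


# Factors for all components up to third order, in ACN sequence.
for l in range(4):
    for m in range(-l, l + 1):
        print(l, m, round(sn3d_factor(l, m), 6))
```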
Encoding
To map a point sound source in Ambisonics, its audio signal is added to each of the audio channels, weighted using the normalization factor just described and an attenuation factor. The attenuation factor, which will be defined below, depends on the angle of incidence (described in the spherical coordinate system) and the ACN number (i.e. order and degree). An intuition (w.r.t. FOA): The attenuation is minimum (0 dB or multiplication factor 1, respectively) if the angle of incidence coincides with one of the axes in an ordinary 3-dimensional coordinate system ($x$, $y$ or $z$), maximum (-∞ dB or factor 0) if it is perpendicular to it.
In Ambisonics, the 3D coordinate system is usually defined as follows: the “front” (relative to the listener’s point of view) is the positive x-axis. Since the system is right-handed, the positive y-axis points to the “left” and the positive z-axis points “up”. For the transformation to the spherical coordinate system, 0° azimuth (θ) coincides with the positive x-axis, and the azimuth increases counterclockwise in the xy-plane. 0° elevation (ϕ) lies in the xy-plane, and the elevation reaches its positive maximum on the positive z-axis (see Figure 1), with:
$0≤θ≤2π$
$-π/2≤ϕ≤π/2$
To encode a time-dependent signal $S(t)$ of a point sound source with angles of incidence $θ, ϕ$ in Ambisonics, the Ambisonics signal component $B_l^m$ is calculated separately for each channel. To do this, the signal is multiplied by the attenuation factor $Y_l^m$:
$B_l^m (t) := S(t)\cdot Y_l^m (\theta, \phi)$
The formula for the attenuation factor is (see Nachbar et al., 2011):
\[
Y_l^m(\theta, \phi) :=N_l^{|m|} \cdot P_l^{|m|}(sin(\phi)) \cdot \begin{cases}
sin(|m|\theta) & \text{if } m < 0\\
cos(|m|\theta) & \text{if } m > 0\\
1 & \text{if } m=0
\end{cases}
\]
where $N_l^{|m|}$ is the normalization factor described above, $P_l^m$ is the “associated Legendre polynomial” of $l$-th order and $m$-th degree, and $P_l$ is the (unassociated) Legendre polynomial of $l$-th order (in the Rodrigues representation). The Legendre polynomials are defined as follows:
\[\begin{eqnarray*}
P_l(x) &:=& \frac{1}{2^l\cdot l!}\cdot \frac{d^l}{dx^l} \left[ (x^2-1)^l \right] \\
P_l^m(x) &:=& (1-x^2)^{\frac{m}{2}}\cdot \frac{d^m}{dx^m} \left[ P_l(x) \right] \\
&=& \frac{1}{2^l\cdot l!}\cdot (1-x^2)^\frac{m}{2}\cdot\frac{d^{l+m}}{dx^{l+m}} \left[ (x^2-1)^l \right]
\end{eqnarray*}\]
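The Rodrigues definitions above can be evaluated numerically by differentiating polynomial coefficient lists. The following Python sketch does this literally, without any sign factor, as in the definition above; all function names are hypothetical and the approach favours readability over efficiency.

```python
import math


def poly_derivative(coeffs):
    """Derivative of a polynomial; coeffs[k] is the coefficient of x**k."""
    return [k * c for k, c in enumerate(coeffs)][1:] or [0.0]


def poly_eval(coeffs, x):
    """Evaluate the polynomial at x using Horner's scheme."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def legendre_coeffs(l):
    """Coefficients of P_l(x) from the Rodrigues formula."""
    # Expand (x^2 - 1)^l with the binomial theorem.
    coeffs = [0.0] * (2 * l + 1)
    for k in range(l + 1):
        coeffs[2 * k] = math.comb(l, k) * (-1.0) ** (l - k)
    # Differentiate l times and scale by 1 / (2^l * l!).
    for _ in range(l):
        coeffs = poly_derivative(coeffs)
    scale = 1.0 / (2 ** l * math.factorial(l))
    return [c * scale for c in coeffs]


def assoc_legendre(l, m, x):
    """P_l^m(x) for 0 <= m <= l and -1 <= x <= 1, as defined above."""
    coeffs = legendre_coeffs(l)
    for _ in range(m):
        coeffs = poly_derivative(coeffs)
    # max() guards against tiny negative values caused by rounding.
    return max(0.0, 1.0 - x * x) ** (m / 2) * poly_eval(coeffs, x)


x = 0.5
print(assoc_legendre(2, 1, x), 3 * x * math.sqrt(1 - x * x))
```

The two printed values should agree up to rounding, since $P_2^1(x) = 3x\cdot(1-x^2)^{\frac{1}{2}}$.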
For example:
\[\begin{eqnarray*}
P_0^0(x) &=& (1-x^2)^\frac{0}{2}\cdot \frac{d^0}{dx^0} \left[ P_0(x) \right] \\
&=& 1\cdot P_0(x) = 1 \cdot 1 = 1
\end{eqnarray*}\]
\[\begin{align*}
P_2^1(x) &= (1-x^2)^\frac{1}{2}\cdot \frac{d^1}{dx^1} \left[ P_2(x) \right] \\
&= (1-x^2)^\frac{1}{2}\cdot \frac{d}{dx} \left[ \frac{1}{2^2\cdot 2!}\cdot \frac{d^2}{dx^2} [ (x^2-1)^2 ] \right] \\
&= (1-x^2)^\frac{1}{2}\cdot \frac{d}{dx} \left[ \frac{1}{8}\cdot \frac{d^2}{dx^2} [ x^4-2x^2+1 ] \right] \\
&= (1-x^2)^\frac{1}{2}\cdot \frac{d}{dx} \left[ \frac{1}{8}\cdot \frac{d}{dx} [ 4x^3-4x ] \right] \\
&= (1-x^2)^\frac{1}{2}\cdot \frac{d}{dx} \left[ \frac{1}{8}\cdot [ 12x^2-4 ] \right] \\
&= (1-x^2)^\frac{1}{2}\cdot \frac{d}{dx} \left[ \frac{3}{2} x^2 -\frac{1}{2} \right] \\
&= (1-x^2)^\frac{1}{2}\cdot \left[ \frac{3\cdot 2}{2} x \right] \\
&= (1-x^2)^\frac{1}{2}\cdot \frac{6}{2}x \\
&= 3x\cdot (1-x^2)^\frac{1}{2} \\
\end{align*}\]
Substituting $x = sin(ϕ)$ gives the elevation-dependent part of one of the spherical harmonics (see Table 3 for further examples):
\[\begin{align*}
P_2^1(sin(\phi)) &= 3\cdot sin(\phi)\cdot \sqrt{1-sin^2(\phi)} \\
&= 3\cdot sin(\phi)\cdot \sqrt{cos^2(\phi)} \\
&= 3\cdot sin(\phi)\cdot cos(\phi) \\
&= \frac{3\cdot sin(2\phi)}{2} \\
\end{align*}\]
The formulas for FOA are thus:
\[\begin{align*}
\text{ACN 0 / W:}\qquad &B_0^0(t) =S(t)\cdot Y_0^0(\theta, \phi)= S(t) \\
\text{ACN 1 / Y:}\qquad &B_1^{-1}(t) =S(t)\cdot Y_1^{-1}(\theta, \phi)= S(t)\cdot cos(\phi) \cdot sin(\theta) \\
\text{ACN 2 / Z:}\qquad &B_1^0(t) =S(t)\cdot Y_1^0(\theta, \phi)= S(t)\cdot sin(\phi) \\
\text{ACN 3 / X:}\qquad &B_1^1(t) =S(t)\cdot Y_1^1(\theta, \phi)= S(t) \cdot cos(\phi) \cdot cos(\theta) \\
\end{align*}\]
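The first-order formulas translate directly into code. The following Python sketch returns the four gains in ACN sequence (W, Y, Z, X) for a source direction given in degrees; the function name `foa_gains` is hypothetical and not part of the OpenMusic patches.

```python
import math


def foa_gains(azimuth_deg: float, elevation_deg: float) -> list[float]:
    """First-order gains in ACN sequence (W, Y, Z, X), following the formulas above."""
    theta = math.radians(azimuth_deg)
    phi = math.radians(elevation_deg)
    w = 1.0
    y = math.cos(phi) * math.sin(theta)
    z = math.sin(phi)
    x = math.cos(phi) * math.cos(theta)
    return [w, y, z, x]


print(foa_gains(0, 0))    # source at the front: only W and X carry the signal
print(foa_gains(90, 0))   # source on the left: W and Y (X is zero up to rounding)
print(foa_gains(0, 90))   # source above: W and Z (X is zero up to rounding)
```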
Design of an OM-SoX implementation
As part of this project, an implementation was created in OpenMusic (v7.3) with the OM-SoX library (v1.0.1) (tested under macOS 14.5).
The signal processing consists of the following steps:
- Given: the mono signal $S(t)$ of an audio file, azimuth $θ$, elevation $ϕ$ and the Ambisonics order $L$.
- Calculate the attenuation factors $Y_l^m (θ,ϕ)$ for all orders $0≤l≤L$ and all degrees $-l≤m≤l$.
- Create a multi-channel audio file with $n_{channels} := (L+1)^2$ audio channels.
- Each channel $ch$, $0≤ch<n_{channels}$, corresponds to the signal $S(t)$, multiplied by the corresponding attenuation factor and normalization factor.
In the case of several audio files, these are simply added channel by channel.
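The processing steps can be sketched outside of OpenMusic in plain Python, with signals as lists of samples. This is an illustration under several assumptions and not the OM-SoX implementation: the associated Legendre values are computed with a standard recurrence instead of the Rodrigues formula, the SN3D factor omits any $\frac{1}{4\pi}$ term, angles are given in radians, and the names `encoding_gains`, `encode` and `mix` are hypothetical.

```python
import math


def assoc_legendre(l, m, x):
    """P_l^m(x) for 0 <= m <= l via recurrence, without a sign factor."""
    s = math.sqrt(max(0.0, 1.0 - x * x))
    pmm = 1.0
    for k in range(1, m + 1):          # P_m^m = (2m-1)!! * (1-x^2)^(m/2)
        pmm *= (2 * k - 1) * s
    if l == m:
        return pmm
    pm1 = x * (2 * m + 1) * pmm        # P_{m+1}^m
    for n in range(m + 2, l + 1):      # upward recurrence in the order
        pmm, pm1 = pm1, ((2 * n - 1) * x * pm1 - (n + m - 1) * pmm) / (n - m)
    return pm1


def encoding_gains(azimuth, elevation, max_order):
    """Gain per channel in ACN sequence with SN3D normalization (angles in radians)."""
    gains = []
    for acn in range((max_order + 1) ** 2):
        l = math.isqrt(acn)
        m = acn - l * l - l
        am = abs(m)
        norm = math.sqrt((1 if m == 0 else 2)
                         * math.factorial(l - am) / math.factorial(l + am))
        if m < 0:
            trig = math.sin(am * azimuth)
        elif m > 0:
            trig = math.cos(am * azimuth)
        else:
            trig = 1.0
        gains.append(norm * assoc_legendre(l, am, math.sin(elevation)) * trig)
    return gains


def encode(signal, azimuth, elevation, max_order):
    """Return one list of samples per channel: the signal scaled by each gain."""
    return [[g * s for s in signal]
            for g in encoding_gains(azimuth, elevation, max_order)]


def mix(encoded_sources):
    """Add several encoded signals channel by channel (equal lengths assumed)."""
    return [[sum(samples) for samples in zip(*channels)]
            for channels in zip(*encoded_sources)]


a = encode([1.0, 0.5], math.radians(0), 0.0, 3)
b = encode([0.2, 0.2], math.radians(90), 0.0, 3)
print(len(mix([a, b])))  # number of channels of the third-order mix
```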
The following main functions (in the form of patches) have been defined in OpenMusic for this purpose:
- ambisonics-gains
Given a (maximum) order, the attenuation factors are calculated.
- ambisonics-encoder_simple
Given an audio file, azimuth, elevation and order, a multichannel audio file is generated and returned.
- ambisonics-encoder
Given a list of audio files (even if only one audio file) and either a) a list of azimuth-elevation tuples, or b) a list of xyz tuples, or c) a 3dc object (with a corresponding number of coordinates), a multichannel audio file is generated and returned.
The following auxiliary functions were defined:
- deg-to-rad
Given a number in degrees, the radian measure is returned. This function uses double-precision floats, in particular in order to be sufficiently precise for the calculations with the trigonometric functions in hoaenc.ambisonics-gains. Note: Double-precision floats cannot be used for signal processing with OM-SoX; the calculated gain values are converted back to single-precision floats.
- 3dc-to-spherical
Given a 3dc object, azimuth-elevation tuples are calculated and returned. The coordinate system is: 0° azimuth = positive x-axis, 90° azimuth = positive y-axis, 90° elevation = positive z-axis, 0° elevation = xy-plane.
- 3dc-translate
Given a 3dc object and an xyz tuple, the origin of the coordinate system is shifted by the given value. A coordinate (1 1 0) becomes (0 0 0) when shifted by (1 1 0).
- 3dc-rotate
Given a 3dc object and a ypr tuple (yaw, pitch, roll), the coordinate system is rotated by yaw along the z-axis, by pitch along the y-axis and by roll along the x-axis. If you imagine an airplane model with the tip along the positive x-axis and the left wing along the positive y-axis, then positive-yaw corresponds to a rotation of the tip along the horizon to the right, positive-pitch to a lifting of the tip and positive-roll to a lifting of the left wing.
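Two of these helpers, the conversion to azimuth and elevation and the translation, can be approximated in Python as follows. The sketch works on plain (x, y, z) tuples rather than 3dc objects, and the function names are hypothetical. The translation subtracts the shift so that (1 1 0) shifted by (1 1 0) becomes (0 0 0), as described above; the rotation is left out.

```python
import math


def cartesian_to_spherical(x, y, z):
    """Return (azimuth, elevation) in degrees.

    0 deg azimuth = positive x-axis, 90 deg azimuth = positive y-axis,
    0 deg elevation = xy-plane, 90 deg elevation = positive z-axis.
    """
    azimuth = math.degrees(math.atan2(y, x)) % 360.0
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return azimuth, elevation


def translate(points, shift):
    """Shift the origin by `shift`: every point is expressed relative to the new origin."""
    sx, sy, sz = shift
    return [(x - sx, y - sy, z - sz) for x, y, z in points]


points = [(1.0, 1.0, 0.0), (0.0, 2.0, 1.0)]
for p in translate(points, (1.0, 1.0, 0.0)):
    print(p, cartesian_to_spherical(*p))
```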
Example
In the following example, two audio files (amen-break.wav and noise-white.aif) are transformed into a 3rd order HOA signal. The audio files are assigned two different positions based on a list of XYZ coordinates. These coordinates are saved in a 3DC object and rotated by -30° along the roll axis, i.e. looking along the positive x-axis (“front”), the left side is moved downwards. The sound sources are then shifted by -1.5 units along the y-axis, i.e. by 1.5 to the right (while maintaining the “viewing direction”).
The following can be heard: Amen break signal on the left, noise signal on the front left.
Audio 1: amen-break.wav
Summary
As part of this project, research was carried out into how the Ambisonics formulas can be derived. With this knowledge, the required normalization factors for 3D Ambisonics up to third order were derived, following the ambiX convention (Nachbar et al., 2011), i.e. ACN sorting and SN3D normalization. Functions have been written in OpenMusic to encode (in the simple case) a mono sound source for 0th, 1st, 2nd and 3rd order Ambisonics and (in the general case) any number of mono sound sources. In the general case it is possible to specify either xyz coordinates, azimuth-elevation coordinates or a 3DC object. Functions have also been written to transform the coordinate system of the 3DC object (specifically: to shift or rotate it). Examples were created to demonstrate the use of these functions.
Alexander Nguyen, 2024.
References
Corcuera Marruffo, Andrea. “A Real-Time Encoding Tool for Higher Order Ambisonics,” December 5, 2014. http://repositori.upf.edu/handle/10230/22890.
Nachbar, Christian, Franz Zotter, Etienne Deleflie, and Alois Sontacchi. “AMBIX – A SUGGESTED AMBISONICS FORMAT” [with comments from 2016], Lexington, KY, 2011. https://ambisonics.iem.at/proceedings-of-the-ambisonics-symposium-2011/ambix-a-suggested-ambisonics-format.
Also: ambisonics.ch (via archive.org), Wolfram MathWorld, Wikipedia.