Immersive Audio: The Next Big Thing?
Content no longer has to be distributed over the air or through dedicated closed networks, since IP and mobile systems have become a significant distribution channel for media content. This has also been visible on the floors of the broadcast trade shows, where mobile and IT manufacturers have recently become an active presence. Multi-platform production and distribution is no longer a future trend but is becoming an established part of the broadcaster’s media production and distribution models. Along with that, VR and immersive productions are gaining momentum and quickly becoming the focus of future development.
Though new technologies have been the driving force for such transformation, the emergence of new business models has become inevitable. While in certain areas like immersive media, production continues to require capital expenditure (Capex), in general the broadcast structure is gradually shifting from Capex to an operating expense (Opex)-based, software-driven business model in order to respond to this transformation while seizing new opportunities.
Since its birth in 1978, Genelec has been at the very heart of the broadcast industry and has been cooperating closely with broadcasters. Many of our landmark products have been the result of such inspiring cooperation, and the recent introduction of the award-winning Genelec 8430A IP loudspeaker is another excellent example of Genelec working closely with our broadcast customers. While the benefit of IP will take some time to be fully realized, recent history has shown that the adoption time of new technologies is becoming shorter than ever. The fact that the whole broadcast industry is now working together on interoperability and open standards means the transition to IP-based infrastructures is now really beginning to gain momentum.
Genelec has also been at the cutting edge of immersive audio, and visitors to recent IBC shows have experienced our monitors playing back immersive audio material in a number of different formats. Later in this series of blogs we’ll go on to explain why the full bandwidth, compact size and mounting options of our monitors make them ideally suited to immersive applications - from broadcast to music to spatial audio research - with the point source characteristics and excellent off-axis response of The Ones three-way coaxial monitors representing a particularly powerful immersive solution.
We believe that immersive audio will bring an unprecedented listening experience to end users and will become the ‘next big thing’ in audio in the near future, and we also realize that more than ever, these complex new immersive formats demand that engineers make informed and accurate mix decisions.
In recognition of this, during this month’s IBC show we’ll be inviting top European broadcasters to an evening of immersive audio, showcasing The Ones in 7.1.4 Atmos mode, along with the latest version of our GLM loudspeaker manager application. We’ll demonstrate how GLM can be used to set up, fully calibrate and manage an immersive system, and how The Ones’ uncoloured imaging, large sweet spot and minimized listener fatigue will help engineers produce mixes they can rely on – whether they’re working in stereo, surround or full-blown immersive formats.
Joining us as guest speaker is Adam Daniel of London’s Point1Post. Having started his 18-year career at The Pinewood-Shepperton Group, Adam is now one of the UK’s leading authorities on mixing in Dolby Atmos, and will be discussing his approach to working with immersive – offering expert advice, and using The Ones to illustrate real-world examples of his acclaimed work. We’ll be documenting this event on film, and will be sharing it with you soon.
We believe that although the public now has a wealth of options, the broadcasting industry has a unique opportunity to secure its future by benefiting from these new technologies, renewing its business model and offering unique high quality audio experiences and services to its users.
Tomorrow’s Solutions, Today
Benefits Over Legacy Formats
Immersive audio formats not only surround the listener horizontally, they also envelop them in the height dimension. One way to understand the capability of an immersive audio system is to describe how many height layers an immersive playback system offers. The two-channel stereo and conventional surround formats offer only one height layer; this layer is located at the height of the listener’s ears, with all loudspeakers located at equal distance (in terms of acoustic delay) from the listener and playing back at the same level.
The channel layouts for immersive formats serve several purposes. One target is to create envelopment and a realistic sense of being inside an audio field. One height layer alone cannot create this sensation sufficiently realistically, because a significant portion of the listening experience is created by the sound arriving at the listener from above. The extra height layers of a true immersive system enable this, and therefore add a significant dimension to the experience.
The second aim for immersive systems used with video is to be able to localize the apparent source of audio at any location across the picture. This is the reason why the 22.2 format has three height layers, including the layer below the listener's ears: the UHDTV picture can be very large, extending from the floor to the ceiling, and the audio system has to support localizing audio across the whole area of the picture.
Current Industry Developments
Immersive sound monitoring is gaining momentum at increasing speed, and several systems are competing for dominance in the world of 3D immersive audio recordings. The front-runners are now the cinema audio formats, which are trying to increase their presence in the audio-only area and enter the television broadcast market too.
Whereas the cinema industry is always searching for the next ‘wow-effect’ to lure the audience from the comfort of their homes into theatres, the growth of immersive audio has been slightly slower in the world of television. But the pace is now picking up, with several companies studying 3D immersive sound as a companion to ultra-high definition television formats and the International Telecommunication Union (ITU) issuing recommendations about the sound formats to accompany UHDTV pictures. Japan’s own national broadcaster NHK is already starting to deliver 8K programming, with 22.2 audio, in preparation for the Tokyo 2020 Summer Olympic Games.
Modern immersive formats offer two or three height layers: current cinema formats offer two height layers, while the emerging broadcasting formats have three or more.
One of the height layers is always at the height of the listener’s ears - this typically creates a layout with backward compatibility to surround formats and even down to standard stereo. Typically, other layers are above the listener. For certain formats, layers can also be located below the listener in the front only, to enhance the sense of envelopment.
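To make the idea of height layers concrete, here is a minimal sketch that groups the loudspeakers of a 7.1.4-style layout by elevation. The labels and angles are nominal values assumed for illustration, not taken from any format specification, and the subwoofer is left out because it carries no directional information.

```python
from collections import defaultdict

# Illustrative 7.1.4-style layout: (label, azimuth_deg, elevation_deg).
# Labels and angles are assumptions for this sketch, not a format definition.
LAYOUT_7_1_4 = [
    ("L", 30, 0), ("R", -30, 0), ("C", 0, 0),
    ("Ls", 90, 0), ("Rs", -90, 0),
    ("Lrs", 135, 0), ("Rrs", -135, 0),
    ("Ltf", 45, 45), ("Rtf", -45, 45),
    ("Ltr", 135, 45), ("Rtr", -135, 45),
]


def height_layers(layout):
    """Group loudspeaker labels by elevation, lowest layer first."""
    layers = defaultdict(list)
    for label, _azimuth, elevation in layout:
        layers[elevation].append(label)
    return dict(sorted(layers.items()))


layers = height_layers(LAYOUT_7_1_4)
for elevation, labels in layers.items():
    print(f"{elevation:+d} deg: {', '.join(labels)}")
```

With these assumed positions the layout falls into two layers: seven loudspeakers at ear height and four above the listener. A format with a layer below the ears would simply add a group with a negative elevation.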
Certain encoding methods for broadcast applications can compress 3D immersive audio into a very compact data package for storage or transmission to the customer. These formats offer a very interesting advantage over many other immersive audio formats, since the channel count and the presentation channel orientations can be selected according to the playback venue or room. Essentially any number of height layers and density of loudspeaker locations can be used - and furthermore, this density does not need to be constant.
Creating the loudspeaker feeds dynamically from the transport format is called rendering. The compact audio transport package is decoded and the feeds to all the loudspeakers are calculated in real time while the immersive audio is played back in the user’s location. The compact delivery format and the freedom to adjust and optimize the number and location of the playback loudspeakers make these flexible formats very exciting.
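As a rough illustration of what a renderer does, the sketch below calculates loudspeaker gains for a single sound source from whatever loudspeaker positions happen to be available. It uses simple pairwise constant-power amplitude panning in the horizontal plane only, which is a deliberately simplified stand-in and not the rendering method of any particular immersive format. The function name, the azimuth convention and the example layout are assumptions made for this sketch.

```python
import math


def render_gains(source_az_deg, speaker_az_deg):
    """Return one gain per loudspeaker for a source at source_az_deg.

    The source is panned between the two adjacent loudspeakers that
    enclose it, using a sine/cosine (constant-power) law. Loudspeaker
    azimuths are assumed to be distinct.
    """
    n = len(speaker_az_deg)
    if n < 2:
        raise ValueError("need at least two loudspeakers")
    # Visit the loudspeakers in order of increasing azimuth around the circle.
    order = sorted(range(n), key=lambda i: speaker_az_deg[i] % 360)
    src = source_az_deg % 360
    gains = [0.0] * n
    for k in range(n):
        a = order[k]
        b = order[(k + 1) % n]
        span = (speaker_az_deg[b] - speaker_az_deg[a]) % 360
        if span == 0:
            continue  # duplicate azimuths, skip this pair
        offset = (src - speaker_az_deg[a]) % 360
        if offset <= span:
            frac = offset / span
            gains[a] = math.cos(frac * math.pi / 2)
            gains[b] = math.sin(frac * math.pi / 2)
            break
    return gains


def render_block(samples, gains):
    """Scale one block of mono samples into a feed per loudspeaker."""
    return [[g * x for x in samples] for g in gains]


# Assumed five-loudspeaker ring: L, R, C, Ls, Rs (azimuths in degrees).
speakers = [30, -30, 0, 110, -110]
gains = render_gains(15, speakers)
feeds = render_block([1.0, 0.5, -0.25], gains)
```

Because the gains are computed from the loudspeaker list at playback time, the same source position can be rendered to a different loudspeaker count or spacing without touching the transported audio. A real renderer would also handle elevation, moving sources and many simultaneous objects.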
The popular immersive audio playback systems typically share two assumptions about the loudspeaker layout and one assumption about the loudspeaker characteristics. Concerning layout, it is assumed that the same level of sound will be delivered to the listening location from all loudspeakers, and the time taken for the audio to travel from each loudspeaker to the listener will also be the same. This implies either equal loudspeaker distances, in the case where all loudspeakers are similar in terms of internal audio delay, or electronic adjustment of level and delay to align the system.
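The electronic adjustment can be sketched with a little arithmetic. The example below estimates, from loudspeaker distances alone, the delay and attenuation each loudspeaker would need so that all arrivals match the farthest one. It assumes a speed of sound of 343 m/s, identical loudspeakers with identical internal delay, and free-field inverse-distance level behaviour. Real rooms do not behave this ideally, which is why alignment in practice is based on measurement at the listening position. The function name and the distances are made up for illustration.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value, dry air at roughly 20 degrees C


def alignment(distances_m):
    """Return a (delay_ms, gain_db) pair for each loudspeaker distance.

    Nearer loudspeakers are delayed so that every arrival coincides with
    the farthest loudspeaker, and attenuated according to the
    inverse-distance law (free-field point-source assumption).
    """
    farthest = max(distances_m)
    result = []
    for d in distances_m:
        delay_ms = (farthest - d) / SPEED_OF_SOUND_M_S * 1000.0
        gain_db = 20.0 * math.log10(d / farthest)
        result.append((delay_ms, gain_db))
    return result


# Hypothetical distances from the listening position, in metres.
for delay_ms, gain_db in alignment([2.0, 2.0, 1.0]):
    print(f"delay {delay_ms:.2f} ms, gain {gain_db:.2f} dB")
```

In this example the loudspeaker at 1 m needs about 2.9 ms of added delay and about 6 dB of attenuation relative to those at 2 m, while the farthest loudspeakers are left untouched.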
Concerning loudspeaker characteristics, a fundamental assumption is the similarity of the loudspeaker frequency response for all the loudspeakers in the playback system. Sometimes this is taken to mean that all the loudspeakers in the playback system should be of the same make and model. In reality, loudspeaker sound is affected by the room in many ways. This can significantly change the character of the audio signal so that even when the same make and model of the loudspeaker is used throughout the system, the individual locations of the loudspeakers will change the audio in a way that renders the individual loudspeaker performance slightly different.
Genelec has the widest selection of Smart Active Monitors (SAM), and working in conjunction with the Genelec Loudspeaker Manager (GLM) software, users can configure highly accurate monitoring systems for immersive audio. In fact, since GLM today supports up to 45 loudspeakers and subwoofers in one room, Genelec’s solution for immersive audio monitor control covers all audio playback formats in existence today. Quite simply, SAM and GLM are future-proof tools for top-level audio professionals.
In order to fulfill the previously mentioned assumptions about how the playback system works, calibration and alignment of the monitoring system in the room is necessary. An increasing number of monitoring controllers with calibration are appearing on the market, but GLM is by far the most complete and cost-efficient solution for precise calibration of immersive monitoring systems.
GLM takes care of the essentials of calibrating an immersive audio playback system, with features to make the monitoring systematic and controlled, including the alignment of levels and time of flight at the listening location, subwoofer integration and compensation for the acoustical effects of loudspeaker placement – ensuring that all the loudspeakers in the system deliver a consistent and neutral sound character.
All of these can improve both the quality of the production and the speed of the working process. Additionally, one of the key concepts for immersive monitoring is achieving a standard sound level at the listening location: new recommendations about maintaining loudness in broadcast signals include a definition of the SPL at the listening location for monitoring these loudness-controlled signals.
As well as the wealth of configuration and calibration features already mentioned, our latest software version - GLM 3 - supports both preset levels and loudness-oriented level calibration in order to do just this.
So it’s clear that Genelec has tomorrow’s solutions today!
Immersive monitoring: A perceptive perspective
Hearing the world around us is so natural that we often only notice its importance once we lose the ability. Fortunately, such a loss is usually temporary, caused for instance by a cold, but one-sided hearing loss is more stressful and depressing than we generally tend to believe.
Localisation makes use of the most energy-consuming and fast-firing synapses of the brain, so the capability has been important for survival. Hearing, balance/acceleration and proprioception are our main look-ahead senses, without which the 0.4 s latency of our mind could get us hurt many times per day, for instance if we had to rely just on vision.
Hard-wired reflexes from the fast senses therefore play a crucial role, including when sound is accompanied by picture, conveying dimensionality, suspense and surprise. One of the first things a baby does is to localise, quickly and automatically turning eyes towards a sound. Until adolescence, we further learn and refine localisation using a system under construction. Ear canals and other structures of the outer ear ("pinnae") grow and reshape, constantly modifying spherical hearing, as we reach out and experience a fascinating world in return.
Pinnae continue to be entirely personal. To some extent, they are actually also under development throughout life, though the rate of change slows in adults. Sound is coloured by the pinnae, depending on its direction of arrival (azimuth), which is a highly important feature. Expert listeners constantly use it in combination with head movements; not only when evaluating immersive content but also to distinguish direct sound from room reflections.
Personal head-related transfer functions (HRTFs) drive localisation at frequencies above 700 Hz. That is the frequency range where interaural level difference (ILD) is of primary concern. From 50 Hz to 700 Hz, however, fast-firing synapses in the brainstem are responsible for localisation, employed in a phase-locking structure to determine interaural time difference (ITD). Humans can localise at even lower frequencies, but we will come back to that in a specific ultra low frequency blog.
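To give a feel for the size of the time differences involved, the sketch below estimates ITD with the classic Woodworth spherical-head approximation, ITD = (a / c) * (theta + sin theta). This formula is not part of the discussion above. It is a frequency-independent geometric model brought in here purely for illustration, and the head radius and speed of sound are assumed typical values rather than personal measurements.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed average head radius
SPEED_OF_SOUND_M_S = 343.0  # assumed value, dry air at roughly 20 degrees C


def itd_seconds(azimuth_deg):
    """Estimate ITD for a distant source in the frontal horizontal plane.

    Woodworth's rigid-sphere approximation: (a / c) * (theta + sin(theta)),
    valid for azimuths between -90 and +90 degrees. The result carries the
    sign of the azimuth.
    """
    if not -90.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must be within -90..+90 degrees")
    theta = math.radians(abs(azimuth_deg))
    itd = HEAD_RADIUS_M / SPEED_OF_SOUND_M_S * (theta + math.sin(theta))
    return math.copysign(itd, azimuth_deg)


for az in (0, 30, 60, 90):
    print(f"{az:3d} deg: {itd_seconds(az) * 1e6:.0f} microseconds")
```

Under these assumptions a source directly to one side produces an ITD of roughly 0.66 ms, and a source straight ahead produces none. The brainstem mechanism described above has to resolve differences well below that maximum in order to localise sources near the front.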
The ability to position sound sources with precision spherically is a key benefit of immersive systems. Another is the possibility to influence the sense of space in human listeners. For the latter, the lowest two octaves of the ITD range (i.e. 50-200 Hz) play an essential role, but may be compromised in multiple ways: microphones with not enough physical spacing during pick-up, synthesized reverb without the right kind of decorrelation, lossy codecs that collapse channel differences, loudspeakers with limited LF capability, bass management etc.
So where does this all lead, considering immersive reference monitoring? A well-aligned loudspeaker system in a fine room has the best chance of translating well to a variety of immersive playback situations. The sound engineer can make full use of outer ear features and head movements, with listener fatigue and "cyber sickness" minimised.
Headphone-based immersive monitoring needs to incorporate precise, personal HRTFs and head tracking around an n-channel virtual reference room. Even so, any static or temporal imperfection can lead to listener fatigue, and head movements in production are unlikely to produce anywhere near the same results as during reproduction across platforms.