How broadcasters can overcome the challenges of audio over IP

In the move to IP in broadcast production, a great deal of emphasis is placed on video transport, as it requires large amounts of bandwidth, which can be an issue for networks. Yet audio presents its own challenges, not only because of the substantially greater number of flows involved compared to video, but also because of the diversity of standards in use in production.

Professional audio over Ethernet has existed since before the turn of the century, with broadcast radio being an early adopter of standardized network technology. Over the years, several competing proprietary approaches and standards for audio over IP have emerged, including Dante, RAVENNA, MADI (AES10) and AES67. However, compatibility between the various approaches, and even between implementations of specific formats, has been a long-standing issue in audio transport and processing. As an increasing number of broadcasters move to IP in their facilities, ensuring audio compatibility has become critical.

With audio often being one of the most complex aspects of moving to IP, it’s vital that broadcasters consider these key aspects: the streaming plane, timing, the control plane, and the issue of protection.

The streaming plane

The streaming plane refers to the basic transport of the audio over the network. In that context, AES67 has become key. First issued in 2013, the AES67 standard has been adopted and integrated by most manufacturers, including providers of products based on proprietary approaches. Crucially, AES67 is also the basis of the recent SMPTE ST 2110-30 standard, which means that compatibility on the streaming plane between most of the popular solutions is largely assured.
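
To make the streaming plane concrete, the sketch below assembles the kind of SDP description used to advertise an AES67 / ST 2110-30 stream. It is a minimal illustration: the originator address, multicast group, port and PTP grandmaster identity are placeholder values, not taken from any particular product.

```python
# A minimal sketch of the SDP description used to advertise an AES67 /
# SMPTE ST 2110-30 audio stream. The originator address, multicast group,
# port and PTP grandmaster identity below are placeholders only.

def build_aes67_sdp(mcast_addr: str, port: int, channels: int,
                    ptime_ms: float, gm_identity: str) -> str:
    """Return an SDP body for a 48 kHz, 24-bit (L24) multicast audio stream."""
    return "\r\n".join([
        "v=0",
        "o=- 1 1 IN IP4 198.51.100.10",                   # session originator (example address)
        "s=AES67 example stream",
        f"c=IN IP4 {mcast_addr}/32",                      # multicast destination
        "t=0 0",
        f"m=audio {port} RTP/AVP 96",
        f"a=rtpmap:96 L24/48000/{channels}",              # linear 24-bit PCM at 48 kHz
        f"a=ptime:{ptime_ms:g}",                          # packet time in milliseconds
        f"a=ts-refclk:ptp=IEEE1588-2008:{gm_identity}:0", # PTP reference clock in use
        "a=mediaclk:direct=0",                            # media clock locked to PTP
    ]) + "\r\n"

# An ST 2110-30 Level A style stream: eight channels at a 1 ms packet time.
print(build_aes67_sdp("239.69.1.10", 5004, 8, 1.0, "AC-DE-48-11-22-38-00-01"))
```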

However, what is often overlooked is that the SMPTE ST 2110-30 standard defines three levels of conformance – not all of which are currently supported by vendors. The mandatory Level A provides support for 48 kHz streams with one to eight audio channels at a packet time of 1 ms. Level B adds support for a packet time of 125 µs. Level C increases the maximum number of audio channels allowed per stream to 64, at a packet time of 125 µs. The latter means that MADI, which continues to enjoy a lot of popularity, may be carried as-is over the audio network.
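
To illustrate what these levels mean on the wire, the back-of-the-envelope sketch below works out the audio payload carried in each RTP packet for 48 kHz, 24-bit (L24) audio; packet header overheads are deliberately ignored.

```python
# Back-of-the-envelope payload sizes for the ST 2110-30 conformance levels,
# assuming 48 kHz, 24-bit linear (L24) audio. IP/UDP/RTP header overhead is ignored.

SAMPLE_RATE = 48_000   # Hz
BYTES_PER_SAMPLE = 3   # 24-bit linear PCM

def payload_bytes(channels: int, packet_time_s: float) -> int:
    """Audio payload carried in one RTP packet for the given channel count."""
    samples_per_packet = int(SAMPLE_RATE * packet_time_s)
    return samples_per_packet * channels * BYTES_PER_SAMPLE

print(payload_bytes(8, 0.001))      # Level A: 1 ms, 8 channels    -> 1152 bytes
print(payload_bytes(8, 0.000125))   # Level B: 125 us, 8 channels  -> 144 bytes
print(payload_bytes(64, 0.000125))  # Level C: 125 us, 64 channels -> 1152 bytes,
                                    # still comfortably inside a standard Ethernet frame
```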

Many audio-over-IP systems are currently only able to handle the basic Level A. They may also have limitations in the total number of audio network streams supported, and in which combinations of channel count and stream count can be used. These limitations should be taken into careful consideration when selecting audio equipment, as they could restrict the flexibility of the overall workflow.

Timing

As part of implementing AES67 compatibility, manufacturers now use Precision Time Protocol (PTP) version 2, defined in IEEE 1588-2008, for network timing. This also fits with the SMPTE ST 2110-10 standard, which mandates the use of PTP v2. SMPTE has also published the ST 2059 standard, which generalizes the media clock concept of AES67 to any kind of periodic media clock, including video and timecode.
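
One practical consequence of a shared PTP timebase is that senders can derive their RTP timestamps directly from it, so devices from different manufacturers stamp the same instant with the same value. The sketch below shows the commonly used relationship for a 48 kHz media clock; it is a simplified illustration that ignores clock-rate offsets and leap-second handling.

```python
# Simplified sketch of how an AES67 / ST 2110 sender can derive its RTP
# timestamp from PTP time: the 48 kHz media clock counts samples since the
# PTP epoch, truncated to the 32-bit RTP timestamp field. Clock-rate offsets
# and leap-second handling are ignored here.

SAMPLE_RATE = 48_000  # Hz, the AES67 media clock rate

def rtp_timestamp_from_ptp(ptp_time_seconds: float) -> int:
    """Map PTP (TAI) seconds since the epoch to a 32-bit RTP timestamp."""
    samples_since_epoch = int(ptp_time_seconds * SAMPLE_RATE)
    return samples_since_epoch % 2**32

# Two devices locked to the same grandmaster stamp the same instant with the
# same value, which is what makes cross-vendor alignment work.
print(rtp_timestamp_from_ptp(1_700_000_000.000))
```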

Control plane

As mentioned earlier, in a typical production environment there are many more audio sources than video sources, and an even greater number of destinations. A major sports production could have thousands of audio channels travelling across the network, for example. So, while audio may not place high demands on network bandwidth compared to video, it certainly creates a challenge in terms of control and orchestration.

Audio engineers expect to be able to “plug and play” equipment and connect sources and destinations without worrying about protocols and standards. On the other hand, in a broadcast facility, inter-studio routing must be centrally controlled, not only for the integrity of signals but also for security and access control.

The apparent strength of some of the proprietary approaches is that they include a comprehensive control plane, whereas standards such as AES67 or indeed SMPTE ST 2110 do not define how the streams should be controlled.

Proprietary approaches

Although the proprietary control planes are effective on their own, they are not compatible with each other. More crucially, they are designed for a local studio environment (LAN) and are therefore not suited to a seamless distributed production environment, such as large-campus or inter-campus use, or remote production (over a WAN).

Furthermore, these control planes rely on audio being made seamlessly available to any equipment in the network by default, meaning no explicit routing of streams is required. This can be a security concern, especially in a distributed, multi-department or multi-organization environment.

In addition, the fundamental assumption behind this approach is that no controlled bandwidth management is needed because audio streams are comparatively small, but this assumption no longer holds as the size and complexity of the network increase.
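
Some rough numbers help to put this in perspective. The sketch below compares the approximate payload bandwidth of fully loaded audio streams with a ballpark figure for an uncompressed HD video flow; the exact rates depend on format and packetization.

```python
# Rough payload bandwidth per audio stream, ignoring IP/UDP/RTP overhead
# (a few percent at 1 ms packet times, considerably more at 125 us).

def audio_stream_mbps(channels: int, sample_rate: int = 48_000,
                      bits_per_sample: int = 24) -> float:
    """Approximate payload bit rate of one linear PCM stream in Mbit/s."""
    return channels * sample_rate * bits_per_sample / 1e6

print(audio_stream_mbps(8))    # ~9.2 Mbit/s  (a Level A stream)
print(audio_stream_mbps(64))   # ~73.7 Mbit/s (a Level C, MADI-sized stream)

# A single uncompressed HD video flow (ST 2110-20) is on the order of
# 1-3 Gbit/s, so one video essence outweighs hundreds of 8-channel audio
# flows; but a facility carrying thousands of audio channels still needs its
# aggregate audio bandwidth planned and policed rather than left to chance.
```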

MADI tielines

A pragmatic approach to overcome the issues with control plane interoperability, and address security and stability concerns, is to bridge different IP audio “islands” by using MADI baseband tielines. However, this adds complexity to the management of audio routing in the campus and reduces flexibility and agility. Essentially, this approach largely defeats the purpose and promise of using a converged media network in the first place.

NMOS

The Networked Media Open Specifications (NMOS), a family of specifications proposed by the Advanced Media Workflow Association (AMWA) to support the development of products and services within an open industry framework, offer a way to address endpoint control for audio that may deliver the true promise of distributed IP production. The specifications are now gaining traction in the industry, although uptake among audio equipment manufacturers is lagging behind that of video equipment vendors.
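
As an illustration of what NMOS endpoint control looks like in practice, the sketch below stages and activates a connection on an audio receiver through the AMWA IS-05 Connection API. The node address, receiver ID, sender ID and SDP transport file shown are placeholders; in a real system they would come from the IS-04 registry and the sender itself.

```python
# Sketch of staging and activating a route on an audio receiver through the
# AMWA NMOS IS-05 Connection API. The node address, API version, receiver ID,
# sender ID and SDP transport file are placeholders; real values would come
# from the IS-04 registry and the chosen sender.
import requests

NODE = "http://203.0.113.20"                            # hypothetical NMOS node
RECEIVER_ID = "00000000-0000-0000-0000-000000000000"    # placeholder UUID

staged = {
    "sender_id": "11111111-1111-1111-1111-111111111111",  # placeholder UUID
    "master_enable": True,
    "activation": {"mode": "activate_immediate"},
    "transport_file": {
        "type": "application/sdp",
        "data": "v=0\r\n...",                             # SDP of the chosen audio sender
    },
}

resp = requests.patch(
    f"{NODE}/x-nmos/connection/v1.1/single/receivers/{RECEIVER_ID}/staged",
    json=staged,
    timeout=5,
)
resp.raise_for_status()
print(resp.json())  # the staged parameters, now scheduled for immediate activation
```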

SDN

In the meantime, and indeed beyond, the most promising approach to control is to use software-defined networking (SDN) capabilities to control the audio flows across the IP network. This can be combined with standard control interfaces implemented both in endpoint equipment and, correspondingly, in the broadcast media network controller. This not only provides an easy way to connect diverse sources and destinations, but also adds a layer of predictability, performance guarantees and security by managing bandwidth and only allowing authorized destinations access to specific audio network flows.
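
What such a control interface looks like varies from vendor to vendor, so the sketch below is purely illustrative: a hypothetical northbound request asking an SDN controller to route one audio flow to one authorized destination with an explicit bandwidth reservation. The controller address and API endpoint are invented for the example and do not correspond to any real product.

```python
# Purely illustrative: a hypothetical northbound request to a broadcast SDN
# controller, asking it to deliver one audio flow to one authorized destination
# with an explicit bandwidth reservation. No real controller API is implied.
import requests

CONTROLLER = "http://198.51.100.50:8181"   # hypothetical controller address

route_request = {
    "flow": "239.69.1.10:5004",            # multicast group and port of the audio flow
    "destination": "stagebox-04",          # logical name of the receiving endpoint
    "bandwidth_kbps": 10_000,              # reservation to be policed at the network edge
    "protection": "st2022-7",              # request dual-path delivery
}

resp = requests.post(f"{CONTROLLER}/api/v1/routes",   # hypothetical endpoint
                     json=route_request, timeout=5)
resp.raise_for_status()   # an unauthorized destination would be rejected here
print(resp.json())
```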

Protection

As production moves from the LAN environment into the WAN, and IP audio networking is converged with video networking, audio signal protection is becoming an issue. The SMPTE ST 2022-7 dual-path protection standard has now been extended beyond video to cover any RTP media stream, and provides an effective way to ensure audio signal reliability. There may still be compatibility and network addressing issues where different parties need to exchange audio signals, e.g. between different organizations, or simply between an OB van and the live audio system. Broadcasters can address these concerns through IP Media Edge devices and/or SDN controlling which flows can cross the boundary and how – a better approach than using MADI tielines to bridge the gap.
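
The idea behind ST 2022-7 is straightforward to sketch: identical RTP packets are sent down two diverse paths and the receiver rebuilds a single stream, taking the first copy of each sequence number to arrive. The simplified sketch below illustrates the principle only; real implementations also bound the de-duplication window and handle sequence-number wrap-around.

```python
# Simplified sketch of ST 2022-7-style seamless protection at a receiver:
# identical RTP packets arrive over two paths, and the first copy of each
# sequence number wins, so a loss on either path goes unnoticed. Real
# implementations bound the de-duplication window and handle the 16-bit
# sequence number wrapping around.

from typing import Iterable, Iterator, Tuple

Packet = Tuple[int, bytes]   # (RTP sequence number, payload)

def merge_paths(arrivals: Iterable[Tuple[str, Packet]]) -> Iterator[Packet]:
    """Yield each packet exactly once, whichever path delivered it first."""
    seen = set()
    for _path, (seq, payload) in arrivals:
        if seq not in seen:
            seen.add(seq)
            yield (seq, payload)

arrivals = [("red", (1, b"a")), ("blue", (1, b"a")),
            ("blue", (2, b"b")),                      # packet 2 was lost on "red"
            ("red", (3, b"c")), ("blue", (3, b"c"))]
print(list(merge_paths(arrivals)))                    # [(1, b'a'), (2, b'b'), (3, b'c')]
```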

Looking ahead

With focus more commonly placed on how to handle high-bandwidth video, the complexity of handling audio in an IP-based facility, which can involve a very large number of streams and a multitude of formats, is often underestimated. However, the introduction of new standards and products, along with the right expertise and experience, can help broadcasters overcome these challenges and benefit from a fully functional IP facility.
