The image above was created by Jamie Tate from Rukkus Room; I stumbled upon it by chance on the web. It resonated deeply with me and inspired me to write this blog post.
With the advance of computers and the binary world, one thing that took a significant step back was audio quality. While the availability and choice of music have increased dramatically, quality has suffered, primarily because of the limits on storage capacity and internet speed over the years. I would usually also blame cheap headphones, which are the final nail in the coffin of music quality. The main focus of this post is the basics of digital audio: the basic terminology and how digital differs from analog. As a bonus, for the super curious, I've linked at the end an amazing article on how the popular music recognition app Shazam works.
Even today, vinyl still offers the best quality music and remains popular with people young and old. However, not all artists release on vinyl, and record players are far outnumbered nowadays by digital AV receivers in living rooms.
Then came the era of music streaming: Spotify, Pandora, Apple Music and others. Low prices gave every music pirate an incentive to switch to streaming, and the abundance of content supports the mantra of the 21st century: everything, everywhere. These services took the world by storm. As of the latest stats (31 Jul 2016), Spotify has 60 million paid users and 140 million in total, and Apple Music has 27 million in total.
So what is the state of the current music streaming scene? I've aggregated my findings in the table below.
|Highest Bitrate (kbps)||320||256||192||320*||256**||1411***|
|Offline playback||Mobile & Desktop||Mobile||Mobile||Mobile||Mobile & Desktop||Mobile|
* Not properly disclosed as it says “up to 320 kbps”
** Variable (256 kbps on average)
*** Varies with the music genre
To understand a bit better what these mean, let's delve into the details.
Codecs (Lossy vs Lossless)
Lossy codecs use inexact approximations and discard part of the data to reduce file size. In simple terms, they trade some quality for a smaller file that still sounds reasonably good. The prime example is the MP3 format. The idea is the same as in image compression, going from a RAW file (lossless) out of a DSLR camera to a compressed JPG: on a small screen you wouldn't be able to tell the difference, and it's only when you start zooming in on the photo that the difference in quality shows. The same goes for music: it's only when you turn the volume up quite high that the weakness of the lossy format shows. It also depends on the bitrate of the MP3 file: as the table above comparing the music services shows, bitrates differ, and a higher bitrate means a bigger file. The resulting file is a small fraction of the original, usually between 3 MB and 12 MB per song. The format was created by a consortium of companies some time ago; see the Wikipedia page on MP3 for the history.
Lossless, on the other hand, keeps the sound quality as it is, meaning the original can be perfectly reconstructed from the compressed file. Once you go MP3, you cannot get back to CD quality; with lossless formats such as WAV and FLAC, you absolutely can. The trade-off, of course, is file size: a single song in FLAC format ranges from 25 MB to 150 MB. Again, this varies a lot with the kind of sounds in the song (high, low, length, etc.).
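To make the "perfectly reconstructed" part concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: zlib stands in for a lossless codec (FLAC works differently inside), and throwing away the low bits of each sample stands in for a lossy one (MP3 is far smarter than that). The function names are made up for this example.

```python
import math
import struct
import zlib


def make_pcm(seconds=1.0, rate=44100, freq=441.0):
    """Return a mono 16-bit PCM sine tone as a list of integer samples."""
    n = int(seconds * rate)
    return [round(32767 * 0.5 * math.sin(2 * math.pi * freq * i / rate))
            for i in range(n)]


def lossless_roundtrip(samples):
    """Pack samples to bytes, compress with zlib, then decompress and unpack."""
    raw = struct.pack(f"<{len(samples)}h", *samples)
    packed = zlib.compress(raw, 9)
    restored = struct.unpack(f"<{len(samples)}h", zlib.decompress(packed))
    return list(restored), len(raw), len(packed)


def lossy_roundtrip(samples, keep_bits=8):
    """Crude stand-in for a lossy codec: discard the low bits of every sample."""
    shift = 16 - keep_bits
    return [(s >> shift) << shift for s in samples]


if __name__ == "__main__":
    tone = make_pcm()
    restored, raw_size, packed_size = lossless_roundtrip(tone)
    print("lossless identical:", restored == tone)
    print("bytes before/after:", raw_size, packed_size)
    print("lossy identical:", lossy_roundtrip(tone) == tone)
```

The lossless path hands back exactly the samples that went in, only stored in fewer bytes. The lossy path cannot: the discarded bits are gone for good, which is the "once you go MP3" problem in miniature.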
Seeing the file size difference between the two types, you understand why MP3 took over the world of personal audio. Nowadays storage is cheap, internet speeds are good and streaming high quality music is not as difficult as before, hence the growing interest in lossless. There are other lossy formats, such as Ogg Vorbis and WMA, which I consider the 2nd and 3rd most popular. For lossless there are FLAC, WAV and ALAC. I don't have statistics on actual usage, but these are, let's call them, the “mainstream” formats.
A new trendsetter is the HTML5 Audio & Video web standard: browsers can natively play formats such as WAV, MP3, MP4, AAC, Ogg Vorbis, WebM and FLAC in the <audio></audio> element, although exact support varies by browser. This pushes the adoption of these few formats and makes good quality audio available to everyone.
So how do we quantitatively measure the difference between lossy and lossless? The simplest answer is bitrate: the number of bits used per second of music in the file. That is what decides the file size, and it is why more data per second generally means more quality. The highest MP3 bitrate is 320 kbps, while CD quality goes up to 1411 kbps, which is about 4.4x more data per second.
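The numbers above can be checked with a little arithmetic. The sketch below assumes the standard CD parameters (44.1 kHz sample rate, 16 bits per sample, 2 channels), which is where 1411 kbps comes from; the 4-minute song length is just an example of mine, and the function names are made up.

```python
def pcm_bitrate_kbps(sample_rate_hz=44100, bits_per_sample=16, channels=2):
    """Bitrate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def file_size_mb(bitrate_kbps, seconds):
    """Approximate file size in megabytes for a constant bitrate."""
    bits = bitrate_kbps * 1000 * seconds
    return bits / 8 / 1_000_000


if __name__ == "__main__":
    cd = pcm_bitrate_kbps()
    song_seconds = 4 * 60  # assumed example: a 4-minute song
    print(f"CD quality: {cd:.1f} kbps")
    print(f"CD vs 320 kbps MP3: {cd / 320:.2f}x")
    print(f"4 min at 320 kbps: {file_size_mb(320, song_seconds):.1f} MB")
    print(f"4 min at CD quality: {file_size_mb(cd, song_seconds):.1f} MB")
```

A 4-minute song comes out at roughly 9.6 MB at 320 kbps and a bit over 42 MB as uncompressed CD audio, which lines up with the file size ranges mentioned earlier.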
There is a caveat here: some MP3s use VBR (Variable Bit Rate), and FLACs don't really have a fixed bitrate either, as they record 1:1. If there are 5 seconds of dead silence, the data needed there will be much less than 1411 kbps, compared to Adele's voice singing for 3 minutes non-stop. Bitrate is a very imperfect metric that can be gamed, but it is the most popular one we have for comparison at the moment. More data doesn't necessarily mean better quality: imagine static noise in the background of a song. That's a lot of data, yet not really better quality. The closest example I can give you is this: when you record a video at a concert and then play the recording at home, the audience noise doesn't mean you'll hear the song as you did during the concert, yet it means more data to store.
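Here is a rough way to see the silence-versus-noise point for yourself. Again this is a sketch with assumptions: zlib is only a stand-in for a real lossless audio codec, and the 5 seconds of CD-style audio (44.1 kHz, 16-bit, stereo) is an example size of mine.

```python
import os
import zlib

# Assumed example size: 5 seconds of 16-bit stereo audio at 44.1 kHz.
FIVE_SECONDS_BYTES = 44100 * 2 * 2 * 5


def compressed_ratio(raw: bytes) -> float:
    """Compressed size divided by original size (smaller means better)."""
    return len(zlib.compress(raw, 9)) / len(raw)


def silence(num_bytes: int = FIVE_SECONDS_BYTES) -> bytes:
    """Dead silence: every sample is zero."""
    return bytes(num_bytes)


def static_noise(num_bytes: int = FIVE_SECONDS_BYTES) -> bytes:
    """Pure static: random bytes carrying no musical information."""
    return os.urandom(num_bytes)


if __name__ == "__main__":
    print(f"silence compresses to {compressed_ratio(silence()):.4f} of its size")
    print(f"noise compresses to {compressed_ratio(static_noise()):.4f} of its size")
```

Silence shrinks to almost nothing, while static barely shrinks at all. The noise needs far more data per second, yet nobody would call it better quality.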
Digital to Analog Converter (DAC)
Do I really need a super expensive sound system or headphones? Will I hear the difference? This is a question I get on a daily basis.
The answer is simple: it depends on your source. Every digital music format is in the end just a bunch of binary 0s and 1s, which need to be converted to an electric current for your speakers. That job belongs to the Digital to Analog Converter (DAC). Yes, your mobile phone has one of those, and the better the DAC, the better the quality you will get out of your headphones. Even if you have FLAC music on your phone, if the DAC packed inside is the cheapest available, the format doesn't matter: you will hear awful sound. This brings me back to the image at the top: it's a chain of elements, and the sound you get is only as good as its weakest link. That's why LG started making a module for their phones with Bang & Olufsen; the LG G6, for example, has a Quad DAC, but only in the version sold in South Korea. AV receivers obviously have better quality DACs, thanks to their size. So for headphones: if you have a cheap phone, don't bother spending much on headphones. If you have a decent mobile phone and good headphones, it is highly unlikely you will hear a difference between a FLAC file and a Spotify 320 kbps stream, due to the volume. The good thing that comes out of this is that headphone manufacturers nowadays optimize their products for mobile use. If you change your source to a proper receiver, the story changes.
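For the curious, this is roughly what "converting 0s and 1s to electric current" means on paper. It is a toy model under my own assumptions: an idealised converter with a made-up reference voltage `v_ref`, and a "cheap" converter modelled simply as one that resolves fewer bits. Real DAC quality also depends on noise, clock jitter and the analog stage, none of which is modelled here.

```python
def ideal_dac(sample: int, bits: int = 16, v_ref: float = 1.0) -> float:
    """Map a signed PCM sample to an output voltage in [-v_ref, +v_ref)."""
    full_scale = 2 ** (bits - 1)
    return v_ref * sample / full_scale


def coarse_dac(sample: int, effective_bits: int, bits: int = 16,
               v_ref: float = 1.0) -> float:
    """Same mapping, but only the top `effective_bits` of the sample are used."""
    shift = bits - effective_bits
    truncated = (sample >> shift) << shift
    return ideal_dac(truncated, bits, v_ref)


if __name__ == "__main__":
    sample = 12345  # arbitrary 16-bit sample value
    print("ideal 16-bit output:", ideal_dac(sample))
    print("coarse 8-bit output:", coarse_dac(sample, effective_bits=8))
```

The file holds the same sample either way; the coarser converter just cannot reproduce it as precisely, which is the weakest-link idea in its simplest form.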
Wired vs Wireless
For phones that leave the DAC out of the headphone path, e.g. the iPhone 7 with no headphone jack, the DAC is now on the headphones' side. For example, if you play an MP3, the phone's processor converts it for transmission via Bluetooth, the headphones receive the digital signal and convert it to analog, and you get the sound in your ears. Due to size limitations, that converter is smaller and possibly of lower quality, which is why you might have noticed a difference in quality between wired and wireless headphones. For wired headphones, the DAC is on the phone side and most likely bigger than the one built into wireless headphones, which is why you get better quality. I'm no expert in the wireless area, but as an owner of AirPods, I'm impressed by the amount of technology and quality squeezed into such a tiny space, and by the battery life. Given that more than half of the sound production chain is in the earbuds themselves, and that wireless also has to worry about tons of other issues such as latency between the two buds, it is impressive by every count. The technology will develop further, and I'm watching closely for what's to come. Bluetooth 5 supports dramatically higher data throughput and might be the technology that turns some of the quality issues around, but at the time of writing, no headphones with BT5 support exist.
Firstly, to hear surround sound you need a home theatre with the 5 speakers (Front Left & Right, Center, Back Left & Right) and a subwoofer (the .1). The famous 5.1 and 7.1/7.2 can only be achieved if the video format supports it and the recording was done that way, i.e. you cannot get 5.1 from every video file. You can technically try to simulate it from a stereo file by splitting different frequency ranges between different speakers, but true surround sound cannot be achieved for a video unless the file format and your source support it and pass it to an AV receiver that knows what to do with it. Most 5.1 codecs are made by Dolby and DTS and are proprietary, which is why you have most likely seen their logos in most AV shops. Nowadays, almost all video streaming services support it, Netflix being a prime example of a good citizen. The way it works is nothing surprising: multiple audio tracks are passed to different speakers. End of story. The encoding and decoding, as mentioned above, are proprietary, so I haven't found any meaningful details. To record it you need multiple microphones, which is why your phone can record 4K video but not 5.1 sound yet (who knows, it might be the new feature just around the corner).
This concludes my brief digital music primer. I hope you enjoyed it.
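To give a feel for the "simulate it from a stereo file" trick, here is a deliberately naive sketch. It is my own illustration, not how Dolby or DTS do it (those are proprietary): the center is the average of left and right, the rear speakers get the left/right difference, and the subwoofer gets a low-passed copy of the mix. The function names and the `alpha` filter constant are made up for this example.

```python
def one_pole_lowpass(signal, alpha=0.05):
    """Very simple low-pass filter: each output moves a little toward the input."""
    out, prev = [], 0.0
    for x in signal:
        prev = prev + alpha * (x - prev)
        out.append(prev)
    return out


def naive_upmix(left, right, lfe_alpha=0.05):
    """Derive six speaker feeds (a fake 5.1) from a stereo pair of sample lists."""
    mono = [(l + r) / 2 for l, r in zip(left, right)]
    diff = [(l - r) / 2 for l, r in zip(left, right)]
    return {
        "front_left": list(left),
        "front_right": list(right),
        "center": mono,                              # what both sides share
        "lfe": one_pole_lowpass(mono, lfe_alpha),    # low frequencies for the sub
        "surround_left": diff,                       # what differs between sides
        "surround_right": [-d for d in diff],
    }


if __name__ == "__main__":
    feeds = naive_upmix([1.0, 0.5], [1.0, -0.5])
    for name, samples in feeds.items():
        print(name, samples)
```

All six feeds are computed from the same two channels, so no new information appears. That is exactly why a simulated 5.1 never matches a recording that was actually made with multiple tracks.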
As promised, for the more interested readers, the article on how Shazam works: http://coding-geek.com/how-shazam-works/