Good audio depends more than ever on software and how it integrates with hardware. Because the two rely so heavily on each other, audio has been a great way for me to understand those nuances and their importance in the realm of IoT (Internet of Things), in addition to data engineering across the audio spectrum.
I blew up my dad's large floor-standing speakers when I was 8 years old, playing what I can only imagine was the Pokemon soundtrack. As they were his first "married person" purchase, you can imagine his reaction. Ever since, though, I've been infatuated with how audio works and what I can do to fix and optimize it.
Paired with a recent understanding of my ADHD (auditory sensitivity, specifically), I've come to realize that my ears are innately capable of picking out subtle audio discrepancies that others can't hear for themselves. It's also why, when it's time for me to drown out the world, audio is my escape.
My first main point is that audio is deeply personal. What sounds good to one person may sound bad to another, which makes most comparisons pointless. If you're happy with your Apple AirPods, great. However, audio products, much like anything else, carry considerable brand recognition, and if you dive under the hood you can maximize your investment by learning what makes great audio and what doesn't.
Hardware: What Makes Sound (Super Simplified)
In the most basic sense, a sound system needs an input/source, an amplifier and a speaker/magnet. The source provides your music as a low-level signal and passes it to the amplifier, which controls how loud it gets by adjusting its output to the magnet or speaker.
Turntables provided consumers 2 "channels" of audio. That is, vinyl records could carry a signal for both a left speaker and a right speaker. When recording and editing for vinyl, music engineers had "2 products" to deliver in the form of grooves in the record that told the player what sounds to play from each speaker. The more expensive the production, the better the music could take advantage of these two variables, as with the Beatles, where multiple layers would be combined onto the 2 channels. The source is primarily responsible for SQ, or sound quality.
The amplifier is then responsible for the SPL, or sound pressure level, aka the volume we hear. A good amplifier will let you turn music up to the volume you desire while staying true to the source.
This is why analog vinyl players hold such a place in the hearts of many audiophiles, both young and old. It's audio in its purest form, and it allows an easy setup while still letting audiophiles use higher and higher end equipment to reach their sound goals.
The Subwoofer is Born; Let There Be BASS
Leave it to the hippies (or was it just audio junkies?) to express the desire to "feel" music. Embracing this movement shepherded in new audio sources that allowed additional bass effects to come into play. Because vinyl's grooves can't carry the needed "data bandwidth," new technologies like the cassette gave sound engineers more variables to deliver sound.
Using these new technologies required a new component: a processor. The processor's main responsibility was to direct sound to where it best needs to go. In this case it would direct bass signals to a dedicated subwoofer, which in turn could be further amplified. It also allowed other elements, such as treble, to be altered. You may know this as EQ adjustments.
The higher quality your source, the better the processor could direct and manipulate sound. Vinyl, lacking the data for bass, would oftentimes be processed anyway and, without that data, could end up sounding worse. Cassettes gave processors more data to play with, albeit analog, allowing for more sophisticated sound without degradation.
All of the above, though, was still processed using analog hardware, which was both expensive and difficult to understand and manipulate unless you were a pro.
Enter Digital & MP3
CDs and MP3s introduced digital audio to consumers for the first time, which allowed the processor to become digital as well. In effect, sound engineers could mix as they intended while aftermarket processors gained direct access to the data variables to further improve sound.
This is why true digital sources can be manipulated much more than analog sources. More data = more to adjust and fine-tune without distortion.
In the above diagram, users could now set an 80 Hz "filter" and direct the audio contained within a digital source to wherever it's desired, whereas analog equipment needed dedicated hardware to obtain the same effect. The cost of innovation or improvement dropped drastically, and hardware became more obtainable at lower price points.
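To make the 80 Hz "filter" idea concrete, here is a minimal Python sketch of a digital crossover. It is only an illustration under my own assumptions: a simple first-order (one-pole) filter, a 44.1 kHz sample rate, and hypothetical function names such as one_pole_crossover. Real processors use steeper, more carefully designed filters.

```python
import math


def one_pole_crossover(samples, sample_rate=44100, cutoff_hz=80.0):
    """Split samples into (low, high) bands around cutoff_hz.

    Assumption: a first-order low-pass (discretized RC filter); the high
    band is simply whatever the low-pass removed.
    """
    dt = 1.0 / sample_rate
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing coefficient of the low-pass

    low, high = [], []
    state = 0.0
    for x in samples:
        state += alpha * (x - state)  # low-pass follows the slow content
        low.append(state)             # send this to the subwoofer
        high.append(x - state)        # send the remainder to the main speakers
    return low, high


def sine(freq_hz, seconds=0.5, sample_rate=44100):
    """Generate a test tone at freq_hz."""
    n = int(seconds * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


def rms(samples):
    """Root-mean-square level, a rough stand-in for loudness."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


if __name__ == "__main__":
    for freq in (30.0, 1000.0):
        low, high = one_pole_crossover(sine(freq))
        print(f"{freq:>6.0f} Hz tone: low band {rms(low):.3f}, high band {rms(high):.3f}")
```

Feeding it a 30 Hz tone puts most of the energy in the low band (the part you would send to a subwoofer), while a 1 kHz tone lands almost entirely in the high band. The point is that the routing is now a few lines of arithmetic on data instead of a dedicated analog circuit.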
This is where Bose emerged as a major contender, allowing for better sound from more compact equipment. They invested in their own processing on top of digital sources to maximize the sound output of a speaker based on the data's attributes. Their price premium wasn't for better hardware; it was for the cost of the digital sound engineering that they used as a differentiator. This is why many enthusiasts refer to Bose as Buy-Other-Sound-Equipment. If you were savvy enough, you could purchase better hardware for the same price as Bose and be better off. Still, it's a great product lesson that continues to show UX is king.
Audio Innovation Pre-Sonos; More Channels or Manipulate What's There
With new formats such as DVDs and larger MP3 players came a fork in the road for sound innovation. You could either develop audio processing for sources already in existence (2 Channels) or you could add channels and, likewise, more variables for sound engineers to tweak.
I'll save the details for a dedicated home theater post, but Dolby Digital and DTS emerged as standards for 5.1-channel audio, allowing movie sound to envelop consumers inside their own homes. However, due to headphones and mobile storage limitations, 2-channel audio remained the standard for music production. More channels = more engineering, and thus music that is more expensive to create, so why bother pivoting away from the norm?
Looking towards product innovation, develop where most users already are; don't create a competing platform.
This is where the majority of sound processing and innovation has taken place: processing a 2-channel audio file and turning it into something more.
Lossless Audio & New Codecs
The average MP3 file is roughly 5 MB for a typical 3 minute song. However, that size is only a fraction of the actual mastered track, which may be as high as 200 MB or more. Because early digital sources didn't have that much storage, songs were compressed using MP3 or other codecs to reach smaller file sizes.
These file sizes reflect a song's bit rate, or the amount of data delivered from the source per second. The higher it is, the more there is to process for output to speakers. Lossless music retains all elements of the song, while MP3 and other lossy codecs keep only the frequencies we can most readily hear and leave out the rest.
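The relationship between bit rate and file size is simple arithmetic, sketched below in Python. The function names are my own, the 44.1 kHz / 16-bit / 2-channel defaults are the standard CD audio parameters, and container overhead and metadata are ignored, so treat the results as approximations.

```python
def pcm_bitrate_kbps(sample_rate_hz=44100, bit_depth=16, channels=2):
    """Bit rate of uncompressed PCM audio in kilobits per second."""
    return sample_rate_hz * bit_depth * channels / 1000


def file_size_mb(bitrate_kbps, seconds):
    """Approximate size in megabytes of audio at a constant bit rate."""
    bits = bitrate_kbps * 1000 * seconds
    return bits / 8 / 1_000_000  # 8 bits per byte, 1,000,000 bytes per MB


if __name__ == "__main__":
    song_seconds = 3 * 60
    print(pcm_bitrate_kbps())  # 1411.2
    for label, kbps in [("128 kbps MP3", 128),
                        ("320 kbps MP3", 320),
                        ("CD-quality PCM", pcm_bitrate_kbps())]:
        print(f"{label}: {file_size_mb(kbps, song_seconds):.1f} MB")
```

By this math, a 3 minute song is about 2.9 MB at 128 kbps, 7.2 MB at 320 kbps and roughly 32 MB as uncompressed CD-quality audio. Higher sample rates and bit depths push the uncompressed number up further.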
To many, the difference is hard to discern, and a well-compressed MP3 can sound every bit as good as a FLAC file, as most of the discarded "data" consists of things such as a drumstick hitting the floor: elements outside our auditory range and, likewise, not worth including.
Although we no longer have to worry about the 500 song limit on our iPod Classic, these limitations are still in effect across streaming services today. Tidal, for example, advertises 1411 kbps (Lossless Quality), whereas Spotify's highest streaming option is 320 kbps and defaults to even lower unless changed. Amazon and Tidal even offer Master quality, where streaming could be as high as 4000 kbps (1080P Movie!!).
These services, though, cost more than Spotify and reflect the increased cost of streaming such high quality files. Unless you have high-end equipment the difference will never be noticeable, which is exactly why Tidal and the others advertising this remain in pursuit of a very niche market while Spotify continues to dominate the industry.
It is worth noting that Sony has recently been pushing a 360-degree spatial audio format. To date only select Sony recording artists have used the technology, and although it does sound better, it's yet another format engineers need to develop for. Recycling the aforementioned product lesson: build where people are rather than creating something new.
Aaron Mahnke's recent 13 Days of Halloween Podcast highlights the better approach: add spatial audio to the pre-existing formats. I believe the Sony 360 format is another doomed Sony innovation.
The Innovation Path of the Future; Speaker Optimization
Have you noticed how Sonos has now become more premium than Bose? How about why Sonos is suing Google?
Sonos is using what's been around for years in the high-end audio space: time correction and bass correction using microphones. Knowing that sound processing has gotten as good as it can, let's optimize the speakers directly for their environment.
What they did was take what required multiple steps in a home theater or lab and reduce it to a few seconds when users first turn on a device. They integrated a microprocessor with a microphone so the device could tune itself to wherever you placed the unit, not maximizing the audio reproduction (like Bose's tech) but instead adjusting its speaker output based on the environment.
Looking at an audio graph, you may see some frequencies completely disappear and others amplified so that the perception of sound is improved (the only thing that matters when selling), not the actual source. Genius!
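The core idea of microphone-based tuning can be sketched in a few lines of Python. To be clear, this is not Sonos's (or anyone's) actual algorithm, which I have no access to; it is a simplified illustration under my own assumptions: the microphone gives us one level in dB per frequency band, the target is a flat response, and boosts and cuts are clamped to hypothetical limits to protect a small speaker.

```python
def correction_gains_db(measured_db, target_db=None,
                        max_boost_db=6.0, max_cut_db=12.0):
    """Per-band gain (in dB) that nudges a measured response toward a target.

    measured_db: {band_hz: level_db} as captured by the microphone.
    target_db:   desired level per band; defaults to a flat response at the
                 average measured level (an assumption for this sketch).
    """
    if target_db is None:
        average = sum(measured_db.values()) / len(measured_db)
        target_db = {band: average for band in measured_db}

    gains = {}
    for band, level in measured_db.items():
        wanted = target_db[band] - level
        # Clamp so we never ask the speaker for more than it can safely give.
        gains[band] = max(-max_cut_db, min(max_boost_db, wanted))
    return gains


if __name__ == "__main__":
    # Made-up room measurement: weak at 60 Hz, boomy at 120 Hz.
    room = {60: 70.0, 120: 92.0, 1000: 83.0, 8000: 83.0}
    for band, gain in correction_gains_db(room).items():
        print(f"{band:>5} Hz: {gain:+.1f} dB")
```

With the made-up measurement above, the 120 Hz room boom gets cut by 10 dB, while the 60 Hz dip would need a 12 dB boost but is capped at 6 dB. That cap is the "some frequencies disappear, others get amplified" trade-off in miniature: the correction chases perceived balance within what the hardware can do.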
It's a magnificent example of how a simplified UX and more integrated engineering result in better products, and a reason they became the dominant speaker of choice for many over the past decade.
This has only now become viable for speakers outside of $1k+ home audio receivers, as this type of processing needs enough power to allow the UX to come forth. As highlighted in red above, an additional non-audio processor needs to use a microphone to figure out improvements before setting them on the audio processor, which then yields the benefit.
As the lawsuit notes, both Amazon and Google use the same technology within their IoT-based smart speakers. Whether they stole the technology or simply cloned it is another matter entirely, but it's the type of technology that, once done for one device, can be applied to many if they use the same chipsets.
This is also the reason many third-party speakers boasting Google and Alexa smarts may simply not sound as good as others. Each brand is likewise responsible for its own tuning, and without this feature, better hardware (higher manufacturing cost) is going to sound worse than lower-end hardware that is software optimized.
The IP that makes this work is the embedded code processing the mic information and relaying it back to a compatible audio processor. Without this IP, any speaker in the marketplace is doomed to fail, as it simply can't compete with the tuning and the (very noticeable) audio quality boost.
If Sonos, Google and Amazon can make their tiny speakers sound great, why can't I?
Well, it comes down to cost and that aforementioned IP. This type of technology needs tightly integrated software and firmware. These companies have standardized on chipsets that will be used in products for many years to come, whereas other players in the space don't have the alternative revenue streams to support the easy UX.
Being the hacker I am, I attempted to figure out years ago what was making phones like the OnePlus 6 series so great in this department and how I could get it on my Google Pixel.
While browsing the forums of XDA Developers I stumbled upon the company responsible for the improvements across the Android landscape: Dirac Audio. Much to my disappointment, though, unlike other Android audio enhancement modifications, this technology wasn't in the form of an APK (Android app) or even a system-level service. It was firmly integrated into the chipset's sound processor. Good for Dirac, as this means the tech can't be pirated. It's truly embedded to the highest degree and shows the sophistication in their process.
While at CES I stopped by their booth for a demo and was blown away. They had me guess the woofer size of a bookshelf speaker. I guessed at least 5.25" given the bass it put out. Instead, it was a single 2" tweeter. I was in awe. Then they played me a tiny Bose Bluetooth speaker with the processing... instant goosebumps.
I knew I wanted this tech and immediately hightailed it to CES's dedicated home theater section. However, my excitement quickly diminished. Receivers start at $3k. That was well outside of my budget, and I knew that used components would be a rarity (and still pricey in this market segment). There's just not enough demand to drive down prices.
Going back to my Yamaha processing (still high end, I might add) and Schiit desktop processor, I simply couldn't help but wonder just how much I was missing out on.
Dreams Do Come True
Some 2.5 years later, I finally found the Dirac processor in the form of the $449 MiniDSP DDRC-24. I know, very engineeringly named. However, this is the most cost-effective way to achieve Dirac processing and, after selling my previous DAC, even more so.
I'm able to plug it directly into my computer via USB and have a true DAC. Without Dirac turned on it's a good processor. Maybe not as good as my Schiit, but that's not why I bought it. No, what I paid for is the tight firmware/hardware to Dirac software pairing.
Following the instructions, I took 9 separate measurements around my "tightly focused" listening area, using a tripod for accuracy.
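I don't know how Dirac combines the measurements internally. One common way to summarize several microphone positions, though, is to average them in the power domain rather than averaging the decibel values directly. Here is a minimal Python sketch of that idea, with hypothetical function names and made-up levels:

```python
import math


def power_average_db(levels_db):
    """Average decibel readings by averaging their underlying power."""
    # Convert each dB value to a relative power, average, then convert back.
    mean_power = sum(10 ** (db / 10) for db in levels_db) / len(levels_db)
    return 10 * math.log10(mean_power)


def average_positions(measurements):
    """Combine per-position {band_hz: level_db} readings into one response."""
    bands = measurements[0].keys()
    return {band: power_average_db([m[band] for m in measurements])
            for band in bands}


if __name__ == "__main__":
    # Two made-up microphone positions; a real session would have nine.
    positions = [
        {60: 80.0, 1000: 84.0},
        {60: 90.0, 1000: 84.0},
    ]
    for band, level in average_positions(positions).items():
        print(f"{band:>5} Hz: {level:.1f} dB")
```

Because decibels are logarithmic, the loud position dominates: 80 dB and 90 dB average to about 87.4 dB in power terms, not 85 dB. Taking several positions keeps one unlucky microphone spot from steering the whole correction.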
Once the measurements are taken, the real magic happens: Dirac's AI-driven processing kicks in and works out all the kinks in my listening setup. I have it paired with Polk's LSiM 703 bookshelf speakers powered by a self-refurbished Parasound 2250 THX (250w RMS) amplifier I picked up on eBay for 1/5 the brand-new price back in 2008. It's my dream high school desktop setup, thanks to many hours navigating the Polk Audio forums for the optimal pairing.
This processing made an already amazing setup go over the edge. I had goosebumps when listening.
A great amplifier and speaker pairing will give you a "wide soundstage," which was the case prior to tuning, but after... well, the soundstage sounds as if it goes away entirely, putting you right in the action. Even the most high-end open ear headphones I've tested haven't come close to this.
The best way I can describe it is to imagine being front stage at your favorite concert... for your entire Spotify playlist. Heaven.
Conclusion: Smart Speaker "Smarts" Need to Trickle Down; Legacy Audio Companies Need to Upskill
I now wonder why every speaker doesn't sound this good when it's "simply" software.
The reality is that it's too complex and too expensive to tightly pair all the needed components. Dirac Research has been around for years fine-tuning their AI processing. In order to stay afloat they have to make money too. It's a tough market, being so small.
They were back at CES this year and I demo'd their tech improving the most popular headphones on the market. It made Sony's XM3s sound even better, which I didn't think could be done. Sadly, it's been radio silence since, and I hope they can find an alternative revenue stream soon.
The reality is that brands such as Sony or automakers such as GM would rather build their own or use cheaper tuning to maximize profit, as most people don't need this type of fidelity, impressive as it may be. Coupled with an ongoing licensing deal, it's a tough sell.
However, as ambient computing spreads and consumers come to expect this new level of audio performance, legacy audio brands such as Polk Audio, Yamaha, Klipsch and countless others will need to unite on standards that allow this deep level of customization. If they don't, I fear we'll be living in a world dominated by the few key players who hold the expertise (software).
My hope is that we'll continue to see great hardware made better by software-enabled innovation, and not simply cheaper hardware using it as a way to improve production margins. Only time will tell, but you can already hear it in the new voice assistants.
In the meantime I'll keep enjoying the amazing sound and helping those who wish to achieve it do so within their budget. I promise you won't regret it.