It's Alive! Ultrasonic Spectra Isn't So Ultra Anymore

Andrew Hon
ashon_at_uclink.berkeley.edu
www.ocf.berkeley.edu/~ashon

Psychology 112
Fall 2000
Professor Ervin Hafter

When the consumer audio Compact Disc (CD) was introduced in the early 1980s, the Red Book CD format specified 16-bit word lengths and a 44.1 kHz sample rate. This specification was sufficient for 96 dB of dynamic range and a frequency response of up to 22 kHz (the Nyquist limit of half the sample rate). The 22 kHz limit was based on a consensus among audio engineers that the human ear can hear up to 20 kHz at most. Limitations of the technology - in data density and in Digital-to-Analog conversion circuitry - undoubtedly also influenced this decision.
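The 96 dB and 22 kHz figures follow from two short formulas, sketched below. The function names are illustrative only, and the dynamic range is taken here as the ratio of full scale to one quantization step, which is one common convention.

```python
import math

def dynamic_range_db(bits: int) -> float:
    """Ratio of full scale to one quantization step, in decibels."""
    return 20 * math.log10(2 ** bits)

def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Highest frequency a sampled system can represent: half the sample rate."""
    return sample_rate_hz / 2

print(round(dynamic_range_db(16), 1))  # 96.3
print(nyquist_limit_hz(44_100))        # 22050.0
```

Each extra bit adds about 6 dB, and doubling the sample rate doubles the representable bandwidth.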

Early CD players sounded horrible. Part of the reason for their sub-audiophile performance was the engineering learning curve that comes with any new technology: Digital-to-Analog Converters (DACs) that could extract only 14 of the 16 bits, and poor implementation of the filtering that follows the conversion stage. Now, though, well over a decade later, the audio industry can be considered to have mastered the technology, with a great number of very good-sounding yet inexpensive CD players available. Still, some audiophiles claim that the very best CD players, in absolute terms, still do not compare with the very best LP (record) players. In addition, one point of agreement among audiophiles is that there has yet to be a CD implementation that gives a convincing sonic illusion, in particular of a full orchestra. What gives?

One possible reason for CD's lack of ultimate fidelity is that it simply cannot encode the frequency response present in real world acoustics. Work done by James Boyk of Caltech (incidentally my EE/Music professor from when I was a student there, and the primary motivating force for my being involved with audiophilia) has documented this over-20 kHz spectral content in various musical instruments. After accounting for confounding factors, Boyk concluded that there is indeed acoustic energy extending as high as 100 kHz and perhaps beyond, limited only by his analyzing equipment. The following charts show Boyk's findings:

Instruments With Harmonics

Instrument                        SPL (dB)   Harmonics Visible   Percentage of Power
                                             To What Freq.?      Above 20 kHz
1. Trumpet (Harmon mute)          96         >50 kHz             0.5%
2. Trumpet (Harmon mute)          76         >80 kHz             2%
3. Trumpet (straight mute)        83         >85 kHz             0.7%
4. French horn (bell up)          113        >90 kHz             0.03%
5. French horn (mute)             99         >65 kHz             0.05%
6. French horn                    105        >55 kHz             0.1%
7. Violin (double-stop)           87         >50 kHz             0.04%
8. Violin (sul ponticello)        77         >35 kHz             0.02%
9. Oboe                           84         >40 kHz             0.01%

Instruments Without Harmonics

Instrument                        SPL (dB)   10 dB Above Bkgnd.  Percentage of Power
                                             To What Freq.?      Above 20 kHz
10. Speech Sibilant               72         >40 kHz             1.7%
11. Claves                        104        >102 kHz            3.8%
12. Rimshot                       73         >90 kHz             6%
13. Crash Cymbal                  108        >102 kHz            40%
14. Triangle                      96         >90 kHz             1%
15. Keys jangling                 71         >60 kHz             68%
16. Piano                         111        >70 kHz             0.02%
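A "percentage of power above 20 kHz" figure can be computed from a spectrum roughly as sketched below. This is not Boyk's actual procedure. The 204.8 kHz sample rate (which puts the top of the spectrum at 102.4 kHz), the function name, and the synthetic two-tone test signal are all assumptions made for illustration.

```python
import cmath
import math

def power_fraction_above(samples, sample_rate_hz, cutoff_hz):
    """Fraction of (non-DC) spectral power lying above cutoff_hz, via a plain DFT."""
    n = len(samples)
    total = above = 0.0
    for k in range(1, n // 2):  # positive-frequency bins, skipping DC
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        power = abs(coeff) ** 2
        total += power
        if k * sample_rate_hz / n > cutoff_hz:
            above += power
    return above / total

# Synthetic signal: an 8 kHz tone plus a half-amplitude 40 kHz tone.
RATE = 204_800
N = 256  # bin spacing is RATE / N = 800 Hz, so both tones fall exactly on bins
signal = [math.sin(2 * math.pi * 8_000 * i / RATE)
          + 0.5 * math.sin(2 * math.pi * 40_000 * i / RATE)
          for i in range(N)]

print(round(100 * power_fraction_above(signal, RATE, 20_000), 1))  # 20.0
```

Power scales with the square of amplitude, so the half-amplitude ultrasonic tone carries 0.25 / 1.25 = 20% of the total.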

Some instruments exhibit more ultrasonic energy than others. Percussive instruments stand out, in particular the crash cymbal, which has 40% of its energy above 20 kHz. This finding in part accounts for the cymbal's frequent mention in audiophile literature as a benchmark for audio systems' high frequency performance - whether or not the system can capture the unique "hiss" of a real-life cymbal. Needless to say, not very many systems can come close to portraying a cymbal naturally; most consumer systems produce something closer to white noise.

Additional work supporting James Boyk's findings was done by John Atkinson, editor of Stereophile magazine, the preeminent audiophile journal. In his October 2000 editorial he describes spectral analyses of audio recordings, all of which demonstrate more or less activity above 20 kHz. An interesting finding he reports is that it is not just acoustic instruments that exhibit ultrasonic activity - the electric guitar in bluegrass music, where intentional feedback produces rampant clipping and the characteristic electric guitar sound, also results in spectral content extending above 20 kHz. Furthermore, Atkinson noticed that even old analog recordings from the '60s and earlier have captured this ultrasonic content.

This ultrasonic energy is all well and good, one might argue impatiently, but what about the long-accepted 20 kHz limit of the human ear? Even sounds above 12 kHz are relatively indistinguishable and are lumped together as simply "high frequencies". The first response is that we may have to rethink our dogma of a hard perception limit at 20 kHz.

Recent work by Tsutomu Oohashi et al., published in June of 2000 in the Journal of Neurophysiology, shows that the brain may in fact be registering over-20 (or 22) kHz spectral energy. Titled "Inaudible High-Frequency Sounds Affect Brain Activity: Hypersonic Effect", their paper discusses their finding that sounds containing High Frequency Components (HFCs) above the audible range significantly affect the brain activity of listeners. They used the gamelan music of Bali, which is extremely rich in HFCs with a nonstationary structure, as a natural sound source, and divided it into two components: an audible low-frequency component (LFC) below 22 kHz and an HFC above 22 kHz. Brain electrical activity and regional cerebral blood flow (rCBF) were measured as markers of neuronal activity while subjects were exposed to sounds with various combinations of LFCs and HFCs.

The experimenters found that while subjects could not recognize (i.e. perceive in the common sense of the word) HFC when presented alone, their brain activity altered significantly when they were presented with music containing HFC in addition to LFC as compared to LFC alone. Psychological evaluation indicated that the subjects felt the sound containing an HFC to be more pleasant than the same sound lacking an HFC. These results suggest the existence of a previously unrecognized response to complex sound containing particular types of high frequencies above the audible range. Oohashi et al. term this phenomenon the "hypersonic effect."

One conclusion this research suggests is that the method used to determine the limit of human hearing is imperfect. The standard "report" method of psychology has been criticized (e.g. by UCB Professor Richard Ivry) as not being an accurate measure of internal representation. Specifically, accessing internal state for verbal report may result in information being discarded, as is commonly the case with any sort of attention-evaluation-selection-action cognitive pathway. What may have happened with the original research on the 20 kHz hearing limit, in keeping with Oohashi's recent findings, is that the ear/brain system registers high frequency content, but only as a complement to low frequency (audible) content and not strongly enough to be consciously reported. The effects of HFCs are subtle but not inconsequential.

As an aside, another criticism of standard methodology may be warranted. Great debate has raged in the audio community over subtle effects in amplifier quality, cable differences, and even mechanical resonance effects, with lines being drawn between "subjectivists" and "objectivists". A staple of the objectivist argument has been the double-blind test (DBT) or ABX test. Under DBT or ABX conditions many self-proclaimed golden-ear audiophiles (i.e. those sensitive to these subtle differences) have failed to identify differences to any statistically significant degree. Nevertheless, over the past twenty to thirty years, the threshold for what is accepted as an audible high fidelity characteristic has steadily been decreasing. Nowadays not many respected audiophiles would claim that there are no differences between amplifiers (tube versus solid state), between interconnect or speaker cables, and, to a lesser degree, in electro-mechanical resonance interactions, mainly in how well electrical components are mechanically damped. Jon Risch, a respected audiophile on the Internet with rigorous engineering principles, has suggested objective mechanisms for many of these subjectively-perceived differences. More importantly, he has thoroughly denounced standard DBT and ABX tests as inaccurate measurements of perception. Most forms of these tests, being rigid and timed, put undue psychological stress on the subject, thus worsening apparent perceptual abilities. It could be that the original tests that determined the supposed 20 kHz hearing limit were confounded by these effects.
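For readers unfamiliar with how an ABX result is judged "statistically significant", the usual yardstick is a one-sided binomial test against pure guessing. The sketch below assumes independent trials with a 50% chance of a correct guess. The function name and the 16-trial example are illustrative and are not drawn from any of the tests discussed above.

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` answers right by guessing alone."""
    return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

print(round(abx_p_value(12, 16), 3))  # 0.038
print(round(abx_p_value(10, 16), 3))  # 0.227
```

With 16 trials, 12 correct answers would pass the conventional 5% threshold while 10 would not, which illustrates how short test sessions can leave a real but subtle ability undetected.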

A second explanation, which does not require refuting the 20 kHz hearing limit, involves engineering details slightly beyond the scope of this class. A well-respected high fidelity digital audio company, dCS, has published a white paper describing the engineering issues involved with reproducing high-sample rate material and standard sample rate material. Due to what is called the Gibbs phenomenon, the sharp anti-aliasing filtering typically used for standard 44.1 kHz sample rate material (22 kHz bandwidth), as necessitated by the Nyquist theorem, results in a ringing transient response. The energy contained in this transient ringing "smears" or "defocuses" the sound, impairing the ability to localise sounds.
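The Gibbs phenomenon itself can be seen in a textbook setting: a square wave rebuilt from only its lowest harmonics, as if passed through an ideal brick-wall filter. The sketch below is that idealized case, not the dCS analysis or any real anti-aliasing filter, and the function names and grid size are arbitrary choices.

```python
import math

def square_wave_partial_sum(t: float, harmonics: int) -> float:
    """Sum of the first `harmonics` odd harmonics of a unit (+1/-1) square wave."""
    return (4 / math.pi) * sum(
        math.sin((2 * j - 1) * t) / (2 * j - 1) for j in range(1, harmonics + 1)
    )

def peak_value(harmonics: int, grid_points: int = 4000) -> float:
    """Largest value of the band-limited square wave over half a period."""
    return max(
        square_wave_partial_sum(math.pi * i / grid_points, harmonics)
        for i in range(1, grid_points)
    )

for harmonics in (10, 40):
    print(harmonics, round(peak_value(harmonics), 3))
```

Both peaks come out near 1.18 rather than 1.0. Adding harmonics (more bandwidth) makes the ringing narrower and pushes it closer to the edge, but the overshoot height stays at roughly 9% of the full jump.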

Higher sample rates mitigate this problem. dCS produces an ultra high-end upsampler and DAC that converts standard 16-bit/44.1 kHz CD material to interpolated 24 bits at 192 kHz, improving the sound no end by all subjective audiophile criteria - air, soundstage, imaging, ease. Given that there is no real information being added to the signal, the engineering explanation dCS offers gains credibility.

One could even argue that the dCS explanation supersedes the neuroimaging and EEG work done by Oohashi et al., because the playback equipment used by Oohashi et al. may be subject to the same engineering limitations and may in fact be responsible for the results they found.

Along the same lines, reader feedback to John Atkinson's editorial on high frequency spectral content took aim at the analysis of transients (from which much of the HFC is derived). The reader states that "spectral-content analysis shows the flaws inherent in concluding that the necessary frequency bandwidth and sampling rates of audio systems can be determined simply by analyzing the frequency response of the human ear. Because the Fourier Transform isn't valid for those dynamic, transient musical sounds and resulting signals, the assumption simply isn't so." He goes on to praise the merits of analogue-only LP systems, which, not being subject to invalid Fourier Transform analysis, always had to have frequency responses much higher than the 20 kHz limit of human hearing. FT analysis is only one way to look at acoustic waveforms, and as with all modes of perception, it carries along its own assumptions, some of which may not be applicable to every circumstance.

A consensus seems to be arising from this discussion, which is that whatever the cause, a higher-than-CD bandwidth would be beneficial to the ultimate fidelity of sound reproduction, due to the requirements of transient signals. These transients may be exhibiting high frequency spectral energy or they may merely be an artifact of attempting to apply Fourier Transform frequency analyses to mere impulses. Regardless, in terms of the engineering criteria involved, achieving higher frequency-response bandwidth in digital recordings is a Good Thing™.
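The link between impulses and high frequency content can be made concrete with a discrete Fourier transform of a single-sample click. The sketch below is a generic illustration with made-up names and sizes, and it takes no side on whether Fourier analysis is the right tool for musical transients.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Magnitude of each DFT bin from DC up to the Nyquist bin."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                for i, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]

# A single-sample click in an otherwise silent 64-sample frame.
impulse = [0.0] * 64
impulse[5] = 1.0

magnitudes = dft_magnitudes(impulse)
print(round(min(magnitudes), 6), round(max(magnitudes), 6))
```

Every bin has the same magnitude: an ideal impulse spreads its energy evenly across the whole analyzed band, however wide that band is. This is why sharper transients always show content up to the limit of the measuring equipment.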

The two major new digital formats, DVD-Audio (Digital Versatile Disc-Audio) and SACD (Super Audio CD), both provide substantially improved frequency bandwidths, though from differing engineering approaches. Sony's proprietary SACD format uses Direct Stream Digital (DSD) encoding, which samples analogue material roughly 2.8 million times per second, though in 1-bit increments. Despite recent debate at the 109th AES meeting (2000) about the true nature of SACD, this completely different paradigm for digital audio recording does away with the anti-alias filters needed for PCM (CD and DVD-A) analogue waveform reconstruction. Preliminary reports in the audiophile community indicate that SACD has a natural quality of sound that DVD-A has yet to demonstrate. Ironically, one of the descriptions of SACD, this radically new digital format, is that it sounds "like analogue", meaning like LPs, ancient technology. (LPs, i.e. vinyl records, are associated with smooth, relaxing presentations that, while sometimes not as impressive by audiophile standards, nevertheless can offer perfectly enjoyable music. The same is not always true, and in fact is seldom true, for CDs.)
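To give a feel for what "1-bit increments" means, here is a minimal first-order sigma-delta modulator together with the raw per-channel data rates. It is a deliberately simplified sketch. Real DSD converters use higher-order noise shaping, and the function name and test level are assumptions for illustration. The DSD rate used is 64 times the CD rate of 44.1 kHz, i.e. 2.8224 MHz.

```python
def sigma_delta_1bit(samples):
    """First-order sigma-delta modulator: inputs in [-1, 1] -> stream of +1/-1."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback            # accumulate the running error
        feedback = 1.0 if integrator >= 0 else -1.0
        bits.append(feedback)                 # the 1-bit output
    return bits

# A constant input level is encoded as the density of +1s versus -1s.
bits = sigma_delta_1bit([0.25] * 4000)
print(sum(bits) / len(bits))                  # close to the input level 0.25

# Raw data rates per channel.
dsd_bits_per_second = 64 * 44_100 * 1         # 2,822,400
cd_bits_per_second = 44_100 * 16              # 705,600
print(dsd_bits_per_second / cd_bits_per_second)  # 4.0
```

The signal is carried by the local average of the bit stream rather than by multi-bit sample values, so in principle a low-pass filter is enough to recover the waveform.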

If DVD-A is unable to achieve the same sublime description of "naturalness" as SACD does, especially within the next year when truly high-end implementations of DVD-A are released, then one may be tempted to give credence to the explanation offered by dCS. It may be that PCM D-to-A reconstruction is inherently flawed, that one will never be able to escape the Gibbs phenomenon manifest in reconstructing transient signals, no matter how high the sample rate. I believe it is the hope of the high-end members of the DVD-A consortium that a sufficiently high sample rate (perhaps 192 kHz) will mitigate this problem.

In any case, SACD seems to have gained a foothold in the market, and DVD-A will be practically guaranteed success if only via market piggy-backing on the success of video DVD, so the audiophile's dream of high resolution digital audio will inevitably be fulfilled. To what degree, by which format, and within what time frame the dream will be fulfilled is yet to be determined, but one estimates that the next year in high-fidelity audio reproduction will be truly exciting.

RESOURCES

John Atkinson, "What's Going On Up There?" October 2000 Stereophile http://www.stereophile.com/fullarchives.cgi?282

James Boyk, "There's Life Above 20 Kilohertz! A Survey of Musical Instrument Spectra to 102.4 KHz" http://www.cco.caltech.edu/~boyk/spectra/spectra.htm

Tsutomu Oohashi, Emi Nishina, Norie Kawai, Yoshitaka Fuwamoto, Hiroshi Imai, "High-Frequency Sound Above the Audible Range Affects Brain Electric Activity and Sound Perception." Audio Engineering Society preprint No. 3207 (91st convention, New York City). Abstract, page 2. http://jn.physiology.org/cgi/content/abstract/83/6/3548

Personal communication with Jon Risch, web page http://www.geocities.com/jonrisch/

dCS White Papers, "A Suggested Explanation For (Some Of The) Audible Differences Between High Sample Rate and Conventional Sample Rate Audio Material" http://www.dcsltd.co.uk/papers.htm

dCS White Papers, "Effects in High Sample Rate Audio Material" http://www.dcsltd.co.uk/papers.htm

Digital and Hi-Rez Digital Forums at the AudioAsylum, http://www.audioasylum.com