It's Alive! Ultrasonic Spectra Isn't So Ultra Anymore
Andrew Hon
Psychology 112

When
the consumer audio Compact Disc (CD) was introduced in the early 1980s,
the Redbook CD format specified 16 bit word lengths and a 44.1 kHz sample
rate. This specification was sufficient for 96 dB of dynamic range and
a frequency response of up to about 22 kHz (half the sample rate, per the
Nyquist principle). The limit of
22 kHz was based on consensus among audio engineers that the human ear
can only hear up to at most 20 kHz. Limitations of the technology - of
data density and in Digital-to-Analog conversion circuitry - undoubtedly
also influenced this decision.

Early CD
players sounded horrible. Part of the reason for their sub-audiophile
performance was the engineering learning curve that comes with any new
technology - Digital-to-Analog Converters (DACs) that could only extract
14 bits of the 16, and poorly implemented filtering following the conversion
stage. Now though, over a decade later, the audio industry can be considered
to have mastered the technology, with a great number of very good-sounding
yet inexpensive CD players available. Still, some audiophiles claim that
the very best CD players on absolute terms still do not compare with the
very best of LP (record) players. In addition, one point of agreement
among audiophiles is that there has yet to be a CD implementation that
gives a convincing sonic illusion, in particular of a full orchestra.
What gives? One possible reason for CD's lack of ultimate fidelity is that
it simply cannot encode the frequency response present in real world acoustics.
Work done by James Boyk of Caltech (incidentally my EE/Music professor from
when I was a student there, and the primary motivating force for my being
involved with audiophilia) has documented this over-20 kHz spectral content
in various musical instruments. After accounting for confounding factors,
Boyk concluded that there is indeed acoustic energy extending as high as
100 kHz and perhaps beyond, limited only by his analyzing equipment.

[Charts of Boyk's findings: "Instruments With Harmonics" and "Instruments Without Harmonics"]

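
Boyk's charts come from spectral analysis of recorded instruments. As a rough illustration only, the following sketch estimates what fraction of a signal's energy lies above 20 kHz using a naive discrete Fourier transform. The signal is synthetic (two sine tones chosen for this example), the 192 kHz sample rate is an assumption, and nothing here reproduces Boyk's actual measurement method or data.

```python
import cmath
import math

def energy_fraction_above(samples, sample_rate_hz, cutoff_hz):
    """Estimate the fraction of spectral energy at or above cutoff_hz.

    Uses a naive O(n^2) DFT over the positive-frequency bins, skipping DC
    and the Nyquist bin, which is adequate for a short illustrative signal.
    """
    n = len(samples)
    total = 0.0
    above = 0.0
    for k in range(1, n // 2):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        power = abs(coeff) ** 2
        total += power
        if k * sample_rate_hz / n >= cutoff_hz:
            above += power
    return above / total

# Hypothetical test signal: a 5 kHz tone plus a quieter 30 kHz tone.
SAMPLE_RATE_HZ = 192_000   # assumed; high enough to represent 30 kHz
N_SAMPLES = 960            # 5 ms, so both tones fall exactly on DFT bins
signal = [
    math.sin(2 * math.pi * 5_000 * i / SAMPLE_RATE_HZ)
    + 0.5 * math.sin(2 * math.pi * 30_000 * i / SAMPLE_RATE_HZ)
    for i in range(N_SAMPLES)
]

fraction = energy_fraction_above(signal, SAMPLE_RATE_HZ, 20_000)
print(f"Energy above 20 kHz: {fraction:.1%}")
```

The half-amplitude 30 kHz tone carries a quarter of the power of the 5 kHz tone, so the script reports 0.25 / 1.25 = 20% of the energy above 20 kHz. A real instrument recording would need a capture chain with bandwidth well beyond 20 kHz before a figure like this meant anything.
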
Certain
instruments more than others exhibit ultrasonic energy, with percussive
instruments and in particular the cymbal having 40% of its energy in >20
kHz frequencies. This finding in part accounts for the cymbal's frequent
mention in audiophile literature as a benchmark for audio systems' high
frequency performance - whether or not the system can capture the unique
"hiss" of a real-life cymbal. Needless to say, not very many
systems can come close to portraying a cymbal naturally; most consumer
systems produce something closer to white noise.

Additional
work supporting James Boyk's findings was done by John Atkinson, editor
of Stereophile magazine, the preeminent audiophile journal. In his October
2000 editorial he describes spectral analyses of audio recordings, all
of which demonstrate more or less activity above 20 kHz. An interesting
finding he reports is that it is not just acoustic instruments that exhibit
ultrasonic activity - the electric guitar in bluegrass music, where intentional
feedback produces rampant clipping and the characteristic electric guitar
sound, also results in spectral content extending above 20 kHz. Furthermore,
Atkinson noticed that even old analog recordings from the '60s and earlier
have captured this ultrasonic content.

All this
ultrasonic energy is well and good, one might argue impatiently, but what
about the long-accepted 20 kHz limit of the human ear? Sounds above 12
kHz, even, are relatively indistinguishable and are lumped together as
simply "high frequencies". The first response is that we may
have to rethink our dogma of the hard perception limit at 20 kHz. Recent work
by Tsutomu Oohashi et al., published in June of 2000 in the Journal of
Neurophysiology, shows that the brain may in fact be registering over-20
(or 22) kHz spectral energy. Titled "Inaudible High-Frequency Sounds
Affect Brain Activity: Hypersonic Effect", their paper discusses
their finding that sounds containing High Frequency Components (HFCs)
above the audible range significantly affect the brain activity of listeners.
They used the gamelan music of Bali, which is extremely rich in HFCs with
a nonstationary structure, as a natural sound source, and divided it into
two components: an audible low-frequency component (LFC) below 22 kHz
and an HFC above 22 kHz. Brain electrical activity and regional cerebral
blood flow (rCBF) were measured as markers of neuronal activity while
subjects were exposed to sounds with various combinations of LFCs and
HFCs. The experimenters found that while subjects could not recognize (i.e.
perceive in the common sense of the word) HFC when presented alone, their
brain activity altered significantly when they were presented with music
containing HFC in addition to LFC as compared to LFC alone. Psychological
evaluation indicated that the subjects felt the sound containing an HFC
to be more pleasant than the same sound lacking an HFC. These results suggest
the existence of a previously unrecognized response to complex sound containing
particular types of high frequencies above the audible range. Oohashi et
al. term this phenomenon the "hypersonic effect."

One conclusion
this research suggests is that the method used to determine the limit
of human hearing is imperfect. The standard "report" method
of psychology has been criticized (e.g. by UCB Professor Richard Ivry)
as not being an accurate measure of internal representation. Specifically,
accessing internal state for a verbal report may result in information
being discarded, as is commonly the case with any sort of attention-evaluation-selection-action
cognitive pathway. What may have happened with the original research on
the 20 kHz hearing limit, in keeping with Oohashi's recent findings, is
that the ear/brain system registers high frequency content,
but only as a complement to low frequency (audible) content and not strongly
enough to be consciously reported. The effects of HFCs are subtle but
not inconsequential.

As an aside,
another criticism of standard methodology may be warranted. Great debate
has raged in the audio community over subtle effects in amplifier quality,
cable differences, and even mechanical resonance effects, with boundaries
being drawn between "subjectivists" and "objectivists".
A staple of objectivist argument has been the double-blind test (DBT)
or ABX test. Under DBT or ABX conditions many self-proclaimed golden-ear
(i.e. sensitive to these subtle differences) audiophiles have failed to
identify differences to any significant statistical degree. Nevertheless,
over the past twenty to thirty years, the threshold of criteria for accepted
high fidelity audio characteristics has steadily been decreasing. Nowadays
not many respected audiophiles would claim that there are no differences
between the above-mentioned amplifiers (tube versus solid state), interconnect
or speaker cables, and to a lesser degree in electro-mechanical resonance
interactions, mainly with properly mechanically damped electrical components.
Jon Risch, a respected audiophile on the Internet with rigorous engineering
principles, has suggested objective mechanisms for many of these subjectively-perceived
differences. More importantly, he has thoroughly denounced standard DBT
and ABX tests as inaccurate measurements of perception. Most forms
of these tests, being rigid and timed, put undue psychological stress
on the subject, thus worsening apparent perceptual abilities.
It could be that the original tests that determined the supposed 20 kHz
hearing limit were confounded by these effects.

A second
explanation, one that need not refute the 20 kHz hearing
limit, entails engineering details slightly beyond the scope of this class.
A well-respected high fidelity digital audio company, dCS, has published
a white paper describing the engineering issues involved with reproducing
high-sample rate material and standard sample rate material. Due to what
is called the Gibbs phenomenon, the sharp anti-aliasing filtering
that the Nyquist theorem necessitates for standard 44.1 kHz sample rate
material (about 22 kHz of bandwidth) results in a ringing transient response. The energy contained
in this transient ringing "smears" or "defocuses"
the sound, impairing the ability to localise sounds. Higher sample
rates mitigate this problem. dCS produces an ultra high-end upsampler
and DAC that converts standard 16 bit/44.1 kHz CD material to interpolated
24 bits at 192 kHz, improving the sound by all subjective audiophile criteria
- air, soundstage, imaging, ease - to no end. Given that there is no real
information being added to the signal, the engineering explanation dCS
offers gains credibility.

One could
even argue that the dCS explanation supersedes the neuroimaging and EEG
work done by Oohashi et al., because the playback equipment used by Oohashi
et al. may be subject to the same engineering limitations and may in fact
be responsible for the results they found. Along the same lines, reader
feedback to John Atkinson's editorial on high frequency spectral content
took aim at the analysis of transients (from which much of the HFC is derived).
The reader states that "spectral-content analysis shows the flaws inherent
in concluding that the necessary frequency bandwidth and sampling rates
of audio systems can be determined simply by analyzing the frequency response
of the human ear. Because the Fourier Transform isn't valid for those dynamic,
transient musical sounds and resulting signals, the assumption simply isn't
so." He goes on to praise the merits of analogue-only LP systems which,
not being subject to invalid Fourier Transform analysis, always had to have
frequency responses extending well above the 20 kHz limit of human hearing.
FT analysis is only one way to look at acoustic waveforms, and as with all
modes of perception, it carries its own assumptions, some of which may
not apply in every circumstance.

A consensus
seems to be arising from this discussion, which is that whatever the cause,
a higher-than-CD bandwidth would be beneficial to the ultimate fidelity
of sound reproduction, due to the requirements of transient signals. These
transients may be exhibiting high frequency spectral energy or they may
merely be an artifact of attempting to apply Fourier Transform frequency
analyses to mere impulses. Regardless, in terms of the engineering criteria
involved, achieving higher frequency-response bandwidth in digital recordings
is a Good Thing.

The two
major new digital formats, DVD-Audio (Digital Versatile Disc-Audio) and
SACD (Super Audio CD), both provide substantially improved frequency bandwidths
though from differing engineering approaches. Sony's proprietary SACD
format uses the Direct Stream Digital (DSD) format, which samples analogue
material roughly 2.8 million times per second, though in 1-bit increments.
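
For comparison with the Redbook figures quoted at the start of this essay, the arithmetic behind the two kinds of format can be sketched as follows. The DSD rate used here, 64 × 44.1 kHz = 2.8224 MHz, is the commonly published figure rather than something taken from the sources cited in this essay. The stereo channel count is likewise an assumption, and the roughly 6 dB-per-bit rule applies only to ideal PCM quantization.

```python
import math

def nyquist_hz(sample_rate_hz):
    """Highest frequency representable at a given sample rate."""
    return sample_rate_hz / 2

def ideal_pcm_dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, in dB (about 6.02 dB per bit)."""
    return 20 * math.log10(2 ** bits)

def raw_bit_rate(sample_rate_hz, bits_per_sample, channels=2):
    """Uncompressed data rate in bits per second (stereo assumed by default)."""
    return sample_rate_hz * bits_per_sample * channels

CD_RATE_HZ = 44_100
DSD_RATE_HZ = 64 * CD_RATE_HZ  # 2,822,400 Hz; assumed standard DSD rate

cd_nyquist_hz = nyquist_hz(CD_RATE_HZ)
cd_range_db = ideal_pcm_dynamic_range_db(16)
cd_bits_per_s = raw_bit_rate(CD_RATE_HZ, 16)
dsd_bits_per_s = raw_bit_rate(DSD_RATE_HZ, 1)

print(f"CD Nyquist limit:       {cd_nyquist_hz:.0f} Hz")
print(f"CD ideal dynamic range: {cd_range_db:.2f} dB")
print(f"CD raw data rate:       {cd_bits_per_s} bit/s")
print(f"DSD raw data rate:      {dsd_bits_per_s} bit/s")
print(f"DSD / CD data rate:     {dsd_bits_per_s / cd_bits_per_s:.1f}x")
```

Under these assumptions a DSD stream carries four times the raw data of a CD stream. The sketch deliberately does not compute a dynamic range or usable bandwidth for DSD, since those depend on noise shaping rather than on word length alone.
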
Despite recent debate at the 109th AES meeting (2000) about the true nature
of SACD, this completely different paradigm for digital audio recording
does away with the anti-alias filters needed for PCM (CD and DVD-A) analogue
waveform reconstruction. Preliminary reports in the audiophile community
indicate that SACD has a natural quality of sound that DVD-A has yet to
demonstrate. Ironically, one of the descriptions of SACD, this radically
new digital format, is that it sounds "like analogue", meaning
like LPs, ancient technology. (LPs, i.e. vinyl records, are associated with
smooth, relaxing presentations that, while sometimes not as impressive
by audiophile standards, can nevertheless offer perfectly enjoyable music.
The same is not always true, and in fact is seldom true, for CDs.) If DVD-A
is unable to achieve the same sublime description of "naturalness"
as SACD does, especially within the next year when truly high-end implementations
of DVD-A are released, then one may be tempted to give credence to the
explanation offered by dCS. It may be that PCM D-to-A reconstruction is
inherently flawed, that one will never be able to escape the Gibbs phenomenon
manifest in reconstructing transient signals, no matter how high the sample
rate. I believe it is the hope of the high-end members of the DVD-A consortium
that a sufficiently high sample rate (perhaps 192 kHz) will mitigate this
problem.

In any case,
SACD seems to have gained a foothold in the market, and DVD-A will be
practically guaranteed success if only via market piggy-backing on the
success of video DVD, so the audiophile's dream for high resolution digital
audio will inevitably be fulfilled. To what degree and by which format,
not to mention within what time frame, the dream will be fulfilled is yet
to be determined, but one expects that the next year in high-fidelity
audio reproduction will be truly exciting.

John Atkinson, "What's Going On Up There?" October 2000 Stereophile http://www.stereophile.com/fullarchives.cgi?282

James Boyk, "There's Life Above 20 Kilohertz! A Survey of Musical Instrument Spectra to 102.4 KHz" http://www.cco.caltech.edu/~boyk/spectra/spectra.htm

Tsutomu Oohashi, Emi Nishina, Norie Kawai, Yoshitaka Fuwamoto, Hiroshi Imai, "High-Frequency Sound Above the Audible Range Affects Brain Electric Activity and Sound Perception", Audio Engineering Society preprint No. 3207 (91st convention, New York City). Abstract, page 2. http://jn.physiology.org/cgi/content/abstract/83/6/3548

Personal communication with Jon Risch, web page http://www.geocities.com/jonrisch/

dCS White Papers, "A Suggested Explanation For (Some Of The) Audible Differences Between High Sample Rate and Conventional Sample Rate Audio Material" http://www.dcsltd.co.uk/papers.htm

dCS White Papers, "Effects in High Sample Rate Audio Material" http://www.dcsltd.co.uk/papers.htm

Digital and Hi-Rez Digital Forums at the AudioAsylum, http://www.audioasylum.com