Some Basic Principles

All sounds are created by causing a medium to vibrate, be it wood, strings, vocal cords, or the wings of a cicada. Sound is propagated through mediums by causing adjacent particles to vibrate in a similar fashion-- the strings on a cello vibrate at a given frequency and thereby displace air molecules adjacent to the strings. This process continues, and eventually air particles in our ears bump into tiny hairs in our inner ear; these hairs send electrical impulses to our brain, which tells us that we are hearing a particular tone. Sound requires a medium; this is why in the lower-pressure atmosphere of an airplane it is more difficult to hear, and why the killing of a mime in outer space will be particularly silent.

The most popular analogy to sound wave propagation is the example of a rock dropped into a pond. The rock, on its initial descent into the water (thank you, gravity), produces ripples in the water, originating from the point source of the rock, spreading in all directions. Due to the mass of the water molecules, energy is used up in making the water ripple, and thus as the ripples travel further and further away from the point where the rock was dropped, the ripples lose intensity. Sound behaves in the same manner; as sound travels, it loses intensity-- to you, the interested listener, it gets softer. A law known as the inverse-square law describes how quickly the intensity falls off with distance-- in a free field, given a point source, doubling the distance quarters the sound intensity.
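The inverse-square law can be sketched in a few lines of Python. This is only an illustration under idealized assumptions (a perfect point source radiating equally in all directions in a free field); the function name and the 1-watt source are made up for the example.

```python
import math

def intensity_at(power_watts, distance_m):
    """Intensity (W/m^2) at a given distance from an ideal point source."""
    # Assumption: the source power spreads evenly over a sphere of
    # surface area 4 * pi * r^2, with no reflections or absorption.
    return power_watts / (4.0 * math.pi * distance_m ** 2)

near = intensity_at(1.0, 2.0)   # hypothetical 1 W source, heard from 2 m
far = intensity_at(1.0, 4.0)    # same source, heard from 4 m

# Doubling the distance leaves about one quarter of the intensity.
print(far / near)
```

Note that the ratio does not depend on the power of the source, only on the two distances.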

The rock-zum-Wasser analogy also helps describe the shape of the sound waves-- the ripples in the water are something of a sinusoidal curve, and physical and electrical models of sound waves follow the exact same construction.

From physics and mathematics, we know that a sine wave is periodic; that is, it takes a specific amount of time to get from the peak of one wave to the peak of the next wave. Period is expressed mathematically by the symbol T, and is usually measured in seconds. Sound waves are periodic in nature, but we rarely utilize the designation. Instead, we use the inverse of the period, called the frequency. The frequency is the number of complete cycles (complete periods) a sound wave completes in a given amount of time; we measure frequency in cycles-per-second, termed hertz and abbreviated "Hz." [n.b. older texts may use the designation "cps".] Sounds that vibrate many times per second are called "high-frequency" sounds, and those which vibrate less frequently are known as "low-frequency" sounds. Another term, called wavelength, is defined as the distance from a particular point of one wave (sound wave, mechanical wave) to the same point of the next wave. Wavelengths of audible sound in air range from less than an inch to more than fifty feet, depending on the frequency. Wavelength is expressed by the Greek symbol lambda; since we are using HTML, we'll just use "L" for the time being.

Uh-oh, Math:

Period: T(sec) = 1/ƒ, where ƒ is the frequency, in Hertz.
Frequency: ƒ(Hz) = 1/T, where T is the period, in seconds.
Wavelength: L = V/ƒ, where V is the velocity of sound, in meters-per-second or feet-per-second, and ƒ is the frequency, in Hertz. [Wavelength L will be measured in feet or meters, depending on system used.]
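For those who would rather let a computer do the math, here is a minimal Python sketch of the three formulas above. The function names are invented for the example, and the velocity of 343 m/s is an assumed value for air at roughly room temperature.

```python
def period(frequency_hz):
    """Period T in seconds: T = 1/f."""
    return 1.0 / frequency_hz

def frequency(period_s):
    """Frequency f in hertz: f = 1/T."""
    return 1.0 / period_s

def wavelength(velocity, frequency_hz):
    """Wavelength L = V/f, in the same length unit as the velocity."""
    return velocity / frequency_hz

V_AIR = 343.0  # assumed velocity of sound in air, in meters-per-second

print(period(440.0))             # about 0.00227 seconds per cycle
print(wavelength(V_AIR, 440.0))  # about 0.78 meters
print(wavelength(V_AIR, 20.0))   # about 17 meters for a very low tone
```

Feed the functions a velocity in feet-per-second instead and the wavelength comes out in feet.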

A waveform of a signal is a graphical representation of its amplitude versus time. Imagine you are in the water after the rock has been dropped. If you took a picture of the water from within the water-- that is to say, a cross-section of the water-- that would pretty much be a graphical representation of the amplitude (in this case, height) of the water versus distance (one side of the picture would be closer to the rock's ground-zero), which can easily be translated into time. Voilà! There's your waveform.

Phase is defined as how far along its cycle a given waveform is. As we mentioned before, sound waves are periodic, or cyclical. From your trigonometry class back in high school, you may remember some evil sine-wave graphs where one period equalled 360 degrees. It is possible to have two identical sound waves of the same frequency and amplitude, but one is delayed slightly-- we term this being "out of phase" with respect to each other. If you have one sound wave, it doesn't much matter how far along the sound wave is at a given instant. However, when you have multiple sound waves which are "out of phase," or delayed slightly with respect to one another, the waveforms will interact with each other in constructive and destructive ways. How much the waves interact, and at what frequencies they interact, depends on the waveforms involved, and how far out of phase they are-- two identical sine waves, 180 degrees out of phase with respect to each other, will cancel completely. Draw it on a piece of paper-- add one sine wave with a positive peak at 90 degrees and a negative peak at 270 degrees to another sine wave with a negative peak at 90 degrees and a positive peak at 270 degrees-- whaddya get? Nuttin'.

Conversely, if two identical waveforms, of the same frequency, shape, phase, and peak amplitude, are added, the resultant waveform is of the same frequency, phase, and shape, but has twice the original peak amplitude (sin x + sin x = 2 sin x). We call this state, where two waves line up exactly, being "in phase."
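If drawing sine waves on paper is not your idea of fun, the same experiment can be sketched in Python. This is just an illustration: the function name is made up, and it samples one cycle of two unit-amplitude sine waves at one-degree steps rather than doing anything clever.

```python
import math

def summed_peak(phase_offset_deg, samples=360):
    """Peak amplitude of sin(x) + sin(x + offset) over one full cycle."""
    offset = math.radians(phase_offset_deg)
    peak = 0.0
    for i in range(samples):
        x = 2.0 * math.pi * i / samples          # position along the cycle
        total = math.sin(x) + math.sin(x + offset)
        peak = max(peak, abs(total))
    return peak

print(summed_peak(0))    # in phase: about twice the original peak
print(summed_peak(90))   # partly out of phase: somewhere in between
print(summed_peak(180))  # 180 degrees out of phase: essentially nothing
```

Floating-point arithmetic leaves a microscopic residue in the 180-degree case, so expect a number vanishingly close to zero rather than a perfect one.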

Sound travels through some mediums faster than others, depending on the density and stiffness of the medium. For example, sound travels about four times faster in water than in air, mainly due to the molecular structure of those mediums; it travels about ten times slower in rubber. In air at 0°C at one atmosphere, sound travels at a speed of 331 m/s. Temperature will affect the speed of sound-- in air, the speed of sound increases approximately 0.60 m/s for each degree Celsius:

V(m/s) = 331 + 0.60T, where T is the air temperature, in degrees Celsius.
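The temperature relationship described above (331 m/s at 0°C, plus roughly 0.60 m/s per degree Celsius) can be sketched as a one-line Python function. The function name is invented for the example, and the linear approximation is only meant for ordinary air temperatures.

```python
def speed_of_sound(temp_c):
    """Approximate speed of sound in air, in m/s, at temp_c degrees Celsius."""
    # 331 m/s at 0 degrees C, rising about 0.60 m/s per degree Celsius.
    return 331.0 + 0.60 * temp_c

for temp in (0, 20, 35):
    print(temp, speed_of_sound(temp))
```

At 20°C this gives about 343 m/s, which is a handy figure to keep in your head for indoor work.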

Additionally, the speed of sound is influenced slightly by humidity. Humidity also affects how much sound the air absorbs as it travels, and air absorbs more high frequencies than low frequencies.

The pitch of a sound refers to whether it is perceived as high, like the sound of a violin, or low, like the sound of a cello or bass drum. Pitch is determined by a sound's frequency-- the lower the frequency, the lower the pitch. Humans hear from about 20 Hz to about 20,000 Hz (the audible range), but the range varies with age, individuals, and exposure. Frequencies above and below the audible range may be sensed by humans, but they are not necessarily heard. Bats, for instance, can hear frequencies around 100,000 Hz (100 kHz), and dogs as high as 50,000 Hz. Frequencies above 20,000 Hz are referred to as ultrasonic, and frequencies below 20 Hz are referred to as infrasonic. You may hear the word supersonic substituted for "ultrasonic," but technically that is incorrect when referring to sound waves above the range of human hearing; supersonic refers to a speed greater than the speed of sound. Similarly, you may hear subsonic substituted for "infrasonic," although that is also inaccurate; subsonic refers to speeds slower than the speed of sound.

The pitch of the sound influences the way we hear. Because of the construction of the human ear, humans have difficulty in associating a point-source with a low-frequency sound, but are quite accurate when locating the source of high frequencies. This factoid is important to remember when tuning and balancing a sound system for proper imaging. We know that high frequencies have shorter wavelengths than low frequencies; the wavelengths of high frequencies are generally shorter than the distance between human ears (i.e. the diameter of your head), and sounds above 1,000 Hz (1 kHz) cannot reach both ears at exactly the same time or intensity, so one ear is favored and provides directional information in the horizontal plane. The ear is generally less successful in calculating directivity in the vertical plane.

The initial vibration of a sound is called the fundamental, or fundamental frequency. In a purely physics-based sense, the fundamental is the lowest pitch of a sound, and in most real-world cases this model holds true. Additionally, the fundamental frequency is usually the strongest pitch we hear.

Life would be rather boring if all sounds were composed of just these fundamental frequencies-- how would we tell the difference between a violin playing an "A" at 440 Hz and a flute playing the same note? Luckily, most sounds are a combination of a fundamental pitch and various multiples of the fundamental, known as overtones, or harmonics. When overtones are added to the fundamental frequency, the character of the sound is changed; the character of the sound is called timbre.

For example, an instrument playing a note at a fundamental of 200 Hz will have a second harmonic at 400 Hz, a third harmonic at 600 Hz, a fourth harmonic at 800 Hz, ad nauseam. The study of psychoacoustics teaches us that even-numbered harmonics tend to make sounds "soft" and "warm," while odd-numbered harmonics make sounds "bright" and "metallic." Lower-order harmonics control the basic timbre of the sound, and higher-order harmonics control the harshness of the sound. For more detail, look in an acoustics text.
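The harmonic series in the example above is simple enough to sketch in Python. The function name is made up for the illustration, and it treats the fundamental as harmonic number one and assumes ideal whole-number multiples.

```python
def harmonics(fundamental_hz, count):
    """First `count` harmonics of a fundamental, the fundamental included."""
    # The n-th harmonic is n times the fundamental frequency.
    return [fundamental_hz * n for n in range(1, count + 1)]

series = harmonics(200, 4)
print(series)        # [200, 400, 600, 800]

# Split the overtones into even- and odd-numbered harmonics.
even = [f for n, f in enumerate(series, start=1) if n % 2 == 0]
odd = [f for n, f in enumerate(series, start=1) if n % 2 == 1 and n > 1]
print(even, odd)     # [400, 800] [600]
```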

One other term with which you should familiarize yourself is the octave. An octave denotes the interval between any two frequencies where the ratio between them is 2:1. Therefore, an octave separates the fundamental from the second harmonic as above: 400 Hz:200 Hz. Note that even though, as frequency increases, the linear distance between frequencies becomes greater, the ratio of 2:1 is still the same: an octave still separates 4000 Hz from 2000 Hz. In the musical world, two notes separated by an octave share the same note name: an "A" on a violin at 440 Hz is an octave below the "A" at 880 Hz. It is interesting to note that the sense of hearing is the only sense in which this sort of repetition occurs-- which is probably a good thing; imagine if we had to describe a color as an octave below middle red.
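Since an octave is a 2:1 ratio, the number of octaves between two frequencies is a base-2 logarithm of their ratio. Here is a small Python sketch of that idea; the function name is invented for the example.

```python
import math

def octaves_between(low_hz, high_hz):
    """Number of octaves (2:1 ratios) separating two frequencies."""
    return math.log2(high_hz / low_hz)

print(octaves_between(200, 400))     # one octave
print(octaves_between(2000, 4000))   # still one octave, despite the bigger gap
print(octaves_between(20, 20000))    # just under ten octaves
```

The last line suggests that the nominal audible range of 20 Hz to 20,000 Hz spans roughly ten octaves.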

In sound, we call the range of frequencies that an audio system will transmit within a level range the frequency response. For instance, a microphone may have a frequency response of 45 Hz to 15,000 Hz, ±3 dB, which means that it will adequately reproduce those frequencies with only a small amount of level deviation, usually at the extremes. Frequency response curves are a graphical representation of the frequency response, with frequency on a logarithmic scale on the x-axis, and amplitude in dB on the y-axis.

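Similar to pitch, loudness is a sensation produced in the human being. It is related to a measurable quantity, the intensity of the wave. The intensity of a wave depends on the amplitude of the wave. It is mathematically defined as follows:

Intensity: I(W/m²) = P/A, where P is the power carried by the wave, in watts, and A is the area through which that power passes, in square meters.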

The human ear can detect sounds with an intensity as low as 10^-12 W/m², and as high as 1 W/m². This is a huge range of intensity; presumably this is why we don't perceive loudness as proportional to intensity-- to produce a sound that sounds to humans about twice as loud requires a sound wave that has about ten times the intensity. Because of this disproportionality, we measure sound intensity using a logarithmic scale, using the decibel (dB) as the primary unit. The decibel is used to express power, but it doesn't measure power. It is in fact a ratio of two power levels:

b(dB) = 10 log (I/I0), where I is the intensity of the sound, in W/m², and I0 is the reference intensity at the threshold of hearing, 10^-12 W/m².

Notice that given this formula, the threshold of hearing is 0 dB: b = 10 log (10^-12 W/m²/ 10^-12 W/m²) = 10 log 1 = 0.
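The decibel arithmetic above can be sketched in Python as follows. The function and constant names are invented for the example; the reference intensity of 10^-12 W/m² is the threshold of hearing quoted in the text.

```python
import math

I0 = 1e-12  # reference intensity (threshold of hearing), in W/m^2

def intensity_level_db(intensity):
    """Intensity level in dB relative to I0: 10 * log10(I / I0)."""
    return 10.0 * math.log10(intensity / I0)

print(intensity_level_db(1e-12))  # threshold of hearing: 0 dB
print(intensity_level_db(1.0))    # loudest intensity quoted above: about 120 dB

# Ten times the intensity always adds about 10 dB.
print(intensity_level_db(1e-5) - intensity_level_db(1e-6))
```

So the enormous range from 10^-12 W/m² to 1 W/m² collapses into a far more manageable 0 to 120 dB.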

The decibel is equal to one-tenth of a bel, a measuring unit named after Alexander Graham Bell and first used in telecommunications, where signal loss is a logarithmic function of the cable length. Its logarithmic basis made it a convenient unit by which a slew of measurements are represented. However, it always requires a reference point. Thus, we append a letter after the "dB" designation:

From this equation, we learn that if one sound pressure is twice another, its level is 6 dB greater (20 log 2 is about 6); humans perceive SPL subjectively, but as a general rule, a sound that is 6 dB higher in level is perceived to be about twice as loud.
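The 6 dB figure can be checked with a short Python sketch. It assumes the usual pressure-ratio form of the decibel, 20 log (p1/p2), which is used because intensity goes as the square of pressure; both that form and the function name are assumptions made for this example.

```python
import math

def pressure_level_difference_db(p1, p2):
    """Level difference in dB between two sound pressures: 20 * log10(p1 / p2)."""
    return 20.0 * math.log10(p1 / p2)

print(pressure_level_difference_db(2.0, 1.0))   # double the pressure: about 6 dB
print(pressure_level_difference_db(10.0, 1.0))  # ten times the pressure: about 20 dB
print(pressure_level_difference_db(1.0, 2.0))   # half the pressure: about -6 dB
```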

WHAT?!!!

The human ear is a nonlinear device-- that is, input and output amplitudes don't necessarily have the same ratio at all signal levels (that would be a linear device), and thus, it introduces harmonic distortion, usually when subjected to sound waves above a specific loudness. Harmonic distortion is the production of harmonics that do not exist in the original signal. For example, when the ear hears a loud 1 kHz tone, it is perceived as a combination of tones at 1 kHz, 2 kHz, 3 kHz, ad nauseam. We've mentioned that a violin playing a 1 kHz note has a system of overtones; the ear perceives these tones, and if the level is loud enough, the ear will produce additional harmonics of its own, changing the perceived timbre of the instrument.

The ear's frequency response also changes with respect to loudness. Two researchers in the 1930s, one named Fletcher, and the other named Munson, were the first to measure and publish a set of curves showing the ear's sensitivity to loudness versus frequency; these curves are called the Fletcher-Munson equal-loudness contours. The curves show that the ear is most sensitive to sounds in the 3 kHz to 4 kHz area; thus, frequencies above and below 3 - 4 kHz must be somewhat louder in order to be perceived as equally loud. To equal the loudness of a 1.5 kHz tone at a level of 110 dB SPL, a 40 Hz tone has to be 2 dB greater in actual sound pressure level, while a 10 kHz tone must be 8 dB greater than the 1.5 kHz tone to be perceived as equally loud. It is from these curves that sound recording engineers established an average, optimum listening level of 85 dB SPL. The "loudness" button on home-audio preamplifiers, and the various bass-boost switches on portable compact disc players are also a result of correcting the ear's non-linearity-- they are designed to be used at low listening levels to compensate for the ear's intrinsic loss of low- and high-frequency perception. The loudness of a tone can also affect the perceived pitch of the sound. If the intensity of a 100 Hz tone is increased from 40 to 100 dB SPL, the ear will perceive a pitch decrease of about 10%.

Another effect experienced by the human listener is the interaction of tones with each other. Three situations can occur:

Beats - two tones that are separated only slightly in frequency (less than 30 Hz or so) and have approximately the same amplitude will produce beats-- literally, pulses of alternating reinforcement and cancellation of amplitude-- at a rate equal to the difference between the two frequencies. As the difference between the frequencies decreases, the speed of the beats decreases, too. Beats are the result of the ear's inability to separate closely-pitched notes.
Combination Tones - two loud tones that differ by more than 50 Hz will be interpreted by the ear as a complex set of tones, including the two originals, and an additional set of tones that are equal to the sum and the difference of the two original frequencies. For example, 1 kHz and 1.5 kHz tones produce a difference tone of 500 Hz, and produce a sum tone of 2.5 kHz. Difference tones are easier to detect.
Masking - the phenomenon which prevents the ear from hearing softer sounds underneath loud tones. The effect is more pronounced when the frequencies in question are relatively close together. For example, a loud 4 kHz tone will mask a softer 3.5 kHz tone, but will have little effect on a soft 1 kHz tone.
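The arithmetic behind beats and combination tones can be sketched in Python. The function names are made up for the example, and the sketch only computes the frequencies involved; it says nothing about how loud the ear will actually perceive them to be.

```python
def beat_frequency(f1_hz, f2_hz):
    """Beat rate, in beats per second, for two closely spaced tones."""
    return abs(f1_hz - f2_hz)

def combination_tones(f1_hz, f2_hz):
    """Return (difference tone, sum tone), in Hz, for two tones."""
    return abs(f1_hz - f2_hz), f1_hz + f2_hz

print(beat_frequency(440, 443))       # 3 beats per second
print(combination_tones(1000, 1500))  # (500, 2500)
```

Tuning two strings against each other works on the first of these: as the pitches converge, the beats slow down and finally disappear.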

All of these nonlinearities of the human ear are factors that we must consider when constructing, designing, and operating a sound system.

How do we localize? How do we know from which direction a train is approaching? How can we design a sound system that appears natural? We must examine some more characteristics of the human hearing system. Binaural localization, or the ability to use the two ears that most of us are given to determine where a sound is coming from, uses three cues:

Now that we've given you a background in sound terminology, it's time to catch you up on basic electricity principles. And then, maybe, we'll start talking about sound.

Comments, Questions, and Additions should be addressed via e-mail to Kai Harada. Not responsible for typographical errors.
http://www.harada-sound.com/sound/handbook/basicterms.html - © 1999 Kai Harada. 07.11.1999.