It’s in the timing


Why is it so important to set the distances between each of the loudspeakers and the listeners in a home theatre system? Because, writes Stephen Dawson, the surround image is all in the timing.

I was at a church service recently during which worship was being conducted with considerable enthusiasm. But, from my position in the hall, there was a problem. Someone off to the side of the congregation was playing a tambourine … out of time.

Well, they were in time in the sense that the beats were at the same rate as the beats of the worship band, but they were delayed by a discernible amount. What was wrong with the woman wielding the tambourine?

Nothing. It’s just that sound takes time to travel. That has implications for aligning speakers in home theatre systems, as well as for professional audio. And can delay the sound of a tambourine.

Hundreds of metres per second

Now as we all know, sound travels very fast through air. How fast depends on the air temperature, but the range in normal temperatures is not wide. At 20°C the speed is 343.5 metres per second. At 10°C it’s 337.5, and at 30°C it’s 349.5. For most practical purposes, the 343.5m/s figure is close enough.
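The three figures above are consistent with a common linear rule of thumb, c ≈ 331.5 + 0.6 × T, with T in degrees Celsius. That formula is an assumption used here for illustration (the text does not say where its numbers come from), and it only holds for ordinary room temperatures. A minimal sketch:

```python
def speed_of_sound(temp_c: float) -> float:
    """Approximate speed of sound in air, in metres per second.

    Assumption: the linear rule of thumb c = 331.5 + 0.6 * T, chosen
    because it reproduces the figures quoted in the text.
    """
    return 331.5 + 0.6 * temp_c


for temp in (10, 20, 30):
    print(f"{temp} °C: {speed_of_sound(temp):.1f} m/s")
# 10 °C: 337.5 m/s
# 20 °C: 343.5 m/s
# 30 °C: 349.5 m/s
```

Across that whole 20 degree span the speed changes by less than four per cent, which is why a single figure is good enough for setting up speakers.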

Indeed, take the reciprocal and you can calculate that – close enough – sound takes 2.9 milliseconds (thousandths of a second) to travel one metre. If you’re unfortunate enough to be of an age where imperial measurements make intuitive sense to you, or if you’re American, then that works out to about 0.89 milliseconds per foot.

Again, for practical purposes and for the kinds of distances we’re working on in home theatres, and probably even church auditoriums, it’s safe to round these off: 3 milliseconds per metre, 1 millisecond per foot.

So what was going on with the tambourine? Let’s assume that she was spot on, time wise, with the band from her point of view. That is, as the sound of a drum beat reached her from the front sound system, she was striking the instrument. Perfect timing to her ears. As it happened, I was roughly the same distance from those speakers as she was, so had I been able to see her clearly, I would have seen the tambourine striking in time with the music.

But she was about 20m from me. The sound of the tambourine thus took 20 x 3 = 60 milliseconds to get from her to me. That doesn’t seem like a very long time, but it is easily discernible. Expressive musical performance involves tiny departures from a regular beat, often just a few milliseconds. Your internal clock might not detect it as a delay, but you’ll hear it as part of the character of the performance.
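To see how little the rounding costs, here is a small sketch comparing the rule of thumb with the figure you get from 343.5 m/s. The function names are made up for this example.

```python
SPEED_OF_SOUND = 343.5  # metres per second at 20 °C, the figure used in the text


def travel_time_ms(distance_m: float, speed: float = SPEED_OF_SOUND) -> float:
    """Time in milliseconds for sound to cover distance_m metres."""
    return distance_m / speed * 1000.0


def rule_of_thumb_ms(distance_m: float) -> float:
    """The rounded estimate: 3 milliseconds per metre."""
    return distance_m * 3.0


print(f"{travel_time_ms(20):.1f} ms")    # 58.2 ms
print(f"{rule_of_thumb_ms(20):.0f} ms")  # 60 ms
```

The two answers differ by less than two milliseconds over 20 metres, so the rounded figure is fine for this kind of estimate.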

As for 60 milliseconds, I’ve made a sound file – available at – that has two ticks, one 60 milliseconds behind the other. You can hear them clearly. If the first were the band’s drum and the second were the strike of a tambourine, the latter would seem like it’s trailing.
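If you would like to make a similar test file yourself, the sketch below writes two short ticks 60 milliseconds apart using only Python's standard library. It is not the file mentioned above: the sample rate, the tick shape, the half second length and the file name are all assumptions chosen for illustration.

```python
import math
import struct
import wave

SAMPLE_RATE = 44100  # samples per second (assumed; the CD rate)


def tick(duration_ms: float = 5.0, freq_hz: float = 2000.0) -> list[int]:
    """A short decaying sine burst as 16-bit sample values."""
    n = int(SAMPLE_RATE * duration_ms / 1000)
    return [
        int(20000 * math.exp(-5.0 * i / n)
            * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n)
    ]


def two_ticks(gap_ms: float = 60.0, total_ms: float = 500.0) -> list[int]:
    """Silence with one tick at the start and a second one gap_ms later."""
    samples = [0] * int(SAMPLE_RATE * total_ms / 1000)
    burst = tick()
    offset = int(SAMPLE_RATE * gap_ms / 1000)
    for start in (0, offset):
        for i, value in enumerate(burst):
            samples[start + i] += value
    return samples


def write_wav(path: str, samples: list[int]) -> None:
    """Write mono 16-bit PCM samples to a WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))


if __name__ == "__main__":
    write_wav("two_ticks_60ms.wav", two_ticks())
```

Changing `gap_ms` to 20 or less gives you a way to hear for yourself the point at which the two ticks stop sounding like separate events.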

Milliseconds again aren’t really a unit for which we have intuitive understanding, so perhaps this will help. Sixty milliseconds is very nearly one sixteenth of a second (62.5ms is precisely one sixteenth of a second).


The human brain is a strange thing, and since hearing is principally an activity of the brain (your ears feed the signal, but your brain interprets it) hearing is a great deal more complicated than is typically thought. You hear – that is, recognise and understand and make use of – sounds which have been heavily processed by various functions of your brain.

This processing isn’t random. It has a very specific purpose: to provide information which humans can act on to survive and reproduce. That’s what evolution does. So what we hear tends to be useful, and that need not necessarily be “accurate” in the sense of an accurate recording.

If the music wasn’t going on in that church, but the woman was banging the tambourine anyway and I could see it, would I be aware that the sound didn’t match the vision, that the crash I could hear was 60ms behind the whack I could see? Remember, again for these practical purposes, the vision gets to our eyes instantaneously.

Maybe, maybe not. Our brains try to align what we hear with what we see. If I were ten metres away, the sound and vision would seem to be in sync. If I were fifty metres away, there’d be a clear delay. At twenty metres it comes down to what the sound is and individual differences. Some of us are more sensitive to delays than others. I think I’m less sensitive than average amongst those interested in things like lip sync delays. I suspect that our brains will push harder to align someone’s voice with their lips than they will to align the sight of one thing striking another with the sound of the strike.

Here’s something else where timing matters: the Haas effect, otherwise known as the Precedence effect. This relates to a discovery in the years after the Second World War that if a person hears two identical sounds very slightly apart in time and coming from different directions, it is going to sound as if there is just one sound and that it is coming from the direction of the first sound to arrive at the ears.

If the sounds are far enough apart in time (e.g. 60ms), then they will be perceived as separate sounds. But if they’re close enough in time – typically 20ms or less – then this merging and precedence effect takes place.

You can see why. In the real world we find it useful, just as our cave-dwelling ancestors did, to hear the precise direction from which possible dangers are coming. But if you’re in a cave, there will be sounds reflecting off walls and objects arriving just milliseconds after the first sound. The first sound is always the one emanating directly from the source of the sound and the others are always reflections. Those creatures (all this happened long before there were people, of course) which couldn’t tell the difference were more likely to perish.


So strong is this effect that it works even if the second sound is a great deal louder than the first one, up to around ten decibels.

Of course, sound engineers have put this feature of hearing to work. In that church auditorium there are two rows of speakers: three on the ceiling near the stage, and three more on the ceiling about half way down towards the back wall. If the timing were the same, to those sitting under or behind the rear-most row of speakers it would sound as though the music were coming from the speakers above them on the ceiling.

So the mixing panel incorporates a delay for the rear set of speakers, ensuring that the sound from them emerges slightly after the sound from the front speakers arrives. And so it sounds to everyone in the hall as though the sound were coming from the front, and only the front, while the rear of the hall still receives a good clear dose of sound reinforcement.
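As a rough sketch of the sum involved: the delay has to cover the extra time the front speakers’ sound takes to reach a listener at the back, plus a small margin so that the front sound gets there first. The distances and the 10 millisecond margin below are hypothetical values, not measurements from that hall, and the margin is simply chosen to sit inside the roughly 20 millisecond window in which the two sounds still merge.

```python
SPEED_OF_SOUND = 343.5  # metres per second at 20 °C


def fill_delay_ms(front_to_listener_m: float,
                  rear_to_listener_m: float,
                  margin_ms: float = 10.0) -> float:
    """Delay for the rear speakers so the front sound reaches the listener first.

    The extra path from the front speakers is converted to milliseconds,
    then an assumed margin is added so the rear speakers trail slightly.
    """
    extra_path_m = front_to_listener_m - rear_to_listener_m
    return extra_path_m / SPEED_OF_SOUND * 1000.0 + margin_ms


# Hypothetical listener: 15 m from the front row, 3 m from the rear row.
delay = fill_delay_ms(front_to_listener_m=15.0, rear_to_listener_m=3.0)
print(f"Delay the rear speakers by about {delay:.0f} ms")  # about 45 ms
```

One delay setting cannot be exact for every seat, so in practice it is a compromise tuned for the area the rear speakers are meant to cover.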

Another application is the inclusion in higher end car head units of a time alignment feature for all four channels. In a straight stereo system, sounds which should seem to be coming from a central position between a pair of speakers in fact sound to the driver like they’re coming from the right hand side of the car, just because that speaker is closer and so its sound arrives earlier. Such head units allow the sound on specified speakers to be delayed so that those central sounds seem to be coming from the centre of the sound stage.

Of course, that will tend to make the balance between the speakers worse for everyone else in the car, but there are some driver privileges!


Obviously all this has implications for surround systems, and channel balancing there. That is why the speakers which are closer to the listeners – the surround speakers in most systems, along with any ceiling speakers – must have their sound delayed. If they weren’t, then amongst other things, any “bleed” from the front channels into those closer speakers could result in the sense that all the front channel sound is coming from the rear of the room.

Which is not really what you want in a surround system.

Fortunately, almost all home theatre receivers allow you to dial in distance measurements to all the loudspeakers, rather than having to calculate the delays.
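For the curious, the conversion from distances to delays can be sketched as follows. This is an assumption about the general idea, not any manufacturer’s actual algorithm, and the speaker names and distances are hypothetical. Each speaker is delayed by the time sound would take to cover the difference between its distance and that of the farthest speaker.

```python
SPEED_OF_SOUND = 343.5  # metres per second at 20 °C


def speaker_delays_ms(distances_m: dict[str, float]) -> dict[str, float]:
    """Delay each speaker so its sound arrives together with the farthest one."""
    farthest = max(distances_m.values())
    return {
        name: (farthest - distance) / SPEED_OF_SOUND * 1000.0
        for name, distance in distances_m.items()
    }


# Hypothetical room: distances from the main listening seat, in metres.
layout = {
    "front left": 3.5,
    "centre": 3.3,
    "front right": 3.5,
    "surround left": 1.8,
    "surround right": 1.8,
}

for name, delay in speaker_delays_ms(layout).items():
    print(f"{name}: {delay:.1f} ms")
```

The farthest speakers get no delay at all, and the closest ones get the most, which matches the point above about the surround speakers.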

The post It’s in the timing appeared first on Connected Home – Trade.

Reference: Connected Home