Drop in on a religious service, and chances are you’ll hear music — chanting, singing, organs, maybe even drums, clapping, and dancing. But you’ll probably hear music if you’re at a birthday party too, or a barbecue, or a Saturday night get-together at a friend’s house. Maybe you’re even listening to music on earbuds now. Music is practically everywhere in human life, filling the air with rhythms and melodies that excite, influence, and bond us. Why do we love music so much, though? Two major forthcoming papers in the journal Behavioral and Brain Sciences tell distinct evolutionary stories about the origins of music. In a commentary to be published with the papers, I argue that music arises to solve problems that language creates.
The question of evolutionary origins arises because only humans seem to have the ability to jointly make music. Think about it — have you ever seen a pack of dogs stomping their paws in rhythmic time, or a flock of geese singing antiphonal counterpoint? No, you haven’t. Other animals just don’t take part in rhythmic, melodic group interactions the way we do.
This doesn’t mean nonhuman animals don’t have any musical or proto-musical traits. Many birds “sing” — we call them songbirds.* Plenty of critters, including non-birds, use other vocalizations or sounds to communicate. Dogs bark and growl. Chimpanzees hoot and drum on hollow logs. Honeybees do rhythmic solo “dances” to show each other where the nearest pollen supply is. And all animals are capable of periodic, or rhythmic, movement, like walking or beating wings.
But humans link these multiple capacities for rhythm, vocalization, and movement into a coherent, social behavior that no culture seems to lack, and which no other animal replicates.
This is an interesting, exciting biological puzzle. It should pique our curiosity and make us want to dive in and learn more. Why don’t cats — or giraffes, or ducks, or honey badgers — occasionally get together in groups and bob their heads or swing their tails in rhythm? Why don’t animals dance?
It Comes Down to Rhythm
Let’s talk specifically about rhythm, or entrainment: the ability to move in time to a regular beat, usually an audio beat. (We humans can also entrain to purely visual stimuli, such as a flashing or bouncing dot, but not quite as well.) Although it might seem easy, your brain is doing an incredible amount of work when you clap your hands to a simple rhythm.
First, you have to be able to hear the beat. That is, your brain needs to recognize the regularity of stressed beats in the audio signal, process them as a single recurring stimulus, and “hear” them as a steady, unified rhythm.
Already at this point, we’ve left most other animals behind. In humans, brain activity spikes at the peak of each stressed beat in a metrical audio rhythm, but most other animals show no corresponding pattern. So while your dog might look like she’s enjoying that Pharrell Williams track — and maybe she is! — she’s not hearing it the way you do: as a patterned, rhythmic, coherent amalgamation of rhythm, melody, and harmony.
After decoding the beat, the brain needs to somehow link up your body’s actions. Fortunately, the brain’s motor systems are already involved in internally reconstructing the pattern of the rhythm. Those spikes in brain activity that correspond to the beats? Most scientists who study music and the brain believe they represent an internal, predictive “model” that uses parts of the cortex and limbic systems that control motor responses. In other words, if you’re hearing a beat, then your motor system is literally simulating it. Hearing a beat is simply what this dynamic, motor-driven simulation “feels” like.
Another clue that the motor system plays a crucial role in perceiving auditory rhythms is the well-known urge to groove. For many people, it’s hard to just sit perfectly still and listen to music. You might find that your foot idly taps to the music you’re listening to as you read a book. Or your head might bob as you listen to earbuds on the train. This semi-conscious response to a good beat emerges from — and feeds back into — the motor system’s central role in perceiving and decoding rhythms.
Of course, if your favorite song comes on while you’re huddled with big-name clients in a four-star restaurant, haggling out a contract, you’d be ill-advised to jump up and start head-banging right then.** We often need to subtly dampen or inhibit our intrinsic urge to move based on what’s socially appropriate. Intentionally keeping time to a beat — clapping or stomping to the rhythm — is at least partly a matter of releasing those inhibitions.
Keeping time to a beat, then, is complicated and tricky. Many different cognitive processes are taking place all at once, linking together audio perception and reward processing, timing simulations, and motor control.
Practice Makes Perfect
But all this isn’t quite enough. We also need to practice coupling our movements to the rhythms we hear, an activity that usually begins early in childhood. Even very young infants seem to get excited by rhythmic music, bouncing and moving energetically when they hear drum-heavy sounds. But they can’t keep the beat very well — their movements and the rhythm don’t coincide. In fact, it takes until about age 4 or 5 for kids to get good at keeping a beat. And kids who don’t practice take longer, if they ever master it at all. (Everyone knows someone who, despite being a full-fledged grownup in all other respects, can’t keep a beat any better than a lobster can write a treatise on aerodynamics.)
The developmental role of practice is reminiscent of how songbirds learn their songs. You might think that a robin’s warble is raw instinct, but actually it learns its song by hearing and then imitating its parents. Once it privately masters the songs, it finally feels comfortable performing out in the open.
Learning birdsong by imitation and practice takes a time-delayed, call-and-response structure. The young fledgling hears the song, then goes off to some corner of the woods to practice quietly (an adorable behavior called “subsong”). But when little humans learn how to synchronize to a beat, they have to move their hands (or feet) at the exact same time as they hear the music. That’s why it’s “synchronized.”
While rhythm processing and motor entrainment are biologically rare, a few other animals, including cockatoos and sea lions, have learned to perceive regular beats in recorded music, and to dance or bob their heads to the rhythm. One famous cockatoo loved to head-bang to Queen’s “Another One Bites the Dust.” Ronan, a sea lion at UC Santa Cruz in California, can keep the beat to a wide variety of songs, including ones she hasn’t heard before.
But humans seem to emerge from the womb with a strong proclivity for rhythm and dance. We have an urge for it — one that begins even before we can smile. This powerful, innate urge to synchronize might be the only aspect of rhythmic entrainment that’s completely unique to humans.
Where Does Music Come From?
How did we get the combination of musical urge and skills? Let’s return to the forthcoming articles in Behavioral and Brain Sciences. The first, led by comparative musicologist Patrick E. Savage, argues that, in evolutionary terms, music in all its forms serves one overarching function: social bonding. From all-night trance dances to cooing lullabies, the authors see music as forging links between people, encouraging trust, and building relationships. Since relationships are the most important currency of survival for Homo sapiens, musical abilities — including timekeeping and entrainment — would be rewarded by evolution, and so would spread through the population over time. Savage and his co-authors argue that this was actually a process of gene-culture coevolution: as we developed more musical skills, the scope of culture expanded, which in turn rewarded more effective forms of musical bonding.
The second article, lead-written by evolutionary psychologist Sam Mehr, posits instead that musical abilities evolved for signaling purposes. Specifically, Mehr and his co-authors argue that rhythmic group dancing evolved as a display of coalition strength. The more tightly bonded a group was, the better it was at synchronizing and coordinating during dances. Audience members from other groups might be intimidated or might want to join the successful dancers — either way, the successful group of dancers would come out on top. In this way, people with strong musical and especially rhythmical abilities became common in the population over the millennia of evolution.
Mehr and his colleagues also argue that lullabies, or “infant-directed song,” evolved as a credible signal of direct parental attention to babies. Together, these two basic forms — beats or dancing and modulated vocal pitch — form the basis for all other kinds of music.
Mehr and colleagues’ argument sees music as something that signals properties that already exist, like group cohesion or parental love. Savage and his colleagues instead see music as something that creates properties like social bonds, trust, and alliances.
In my commentary on these articles, I come down mostly on the latter side. Music has many concrete effects on emotions and social behavior that seem to extend beyond mere signaling functions (although it can also serve those functions). A hefty body of research shows that music and shared rhythm actively strengthen social bonds, activate positive emotional states, and trigger the brain’s reward responses. The functional roles of music seem more extensive than intergroup competitive displays.
Music and Collective Intentionality
Here’s a more detailed observation that I wasn’t able to completely fit into the commentary. Music and dance allow us to coordinate and synchronize our bodily actions at very fine-grained timescales, with remarkable efficiency. We can sync to frequencies faster than 5 Hz. The rhythmic nature of synchrony makes actions very easy to control and predict. Everything about synchrony is seemingly optimized to enable very streamlined, highly precise nonverbal interactions.
Crucially, it does so in a way that allows us to release inhibitions on low-level motor impulses. Remember that perceiving a beat intrinsically makes us want to move along with it. Well, when we actually sync up with others, we let that impulse express itself.
By contrast, most other forms of social motor coordination inherently require inhibition. We have to carefully constrain our behavior to adapt to roles, norms, and top-down expectations. Evolutionary anthropologist Michael Tomasello calls this level of coordination “collective intentionality.” It produces the rules and expectations that apply across society, even in anonymous interactions. Collective intentionality is how we know that doctors are supposed to write prescriptions, while pharmacists fill them. This process depends on language, because its categories — “doctor,” “prescription,” etc. — are fundamentally abstract.
When you visit the doctor’s office, you and the doctor coordinate your actions in particular ways. You sit up straight while she listens to your breath through a stethoscope. You flex your wrist while she checks for pain in the elbow. Each person’s actions are fairly tightly regulated by the context, constrained by the other person’s behavior. We have implicit scripts prescribing how each person is supposed to act. Those scripts are top-down, rule-like.
Music and dance, by contrast, allow us to coordinate our bodily actions in a way that’s more granular and bottom-up. Everyday roles and social categories become less important for guiding behavior. A doctor claps in time to a beat the same way as a nurse or plumber. This might be why musical expressiveness is often associated with low-status people or groups: dancing and rhythm are ways of blurring the boundaries between roles, including hierarchical roles.
Ritual and Music in Evolution
This tension extends to religion. It’s a curious fact that higher-status religious believers are often drawn to fairly formalized, structured rituals. For example, T.S. Eliot, literary lion and old-money scion, embraced smells-and-bells Anglo-Catholicism. Meanwhile, lower-status believers often prefer expressive, music-driven worship — think of the wildfire spread of Pentecostalism across the Global South.
The expressiveness and disinhibition of charismatic rituals are often especially appealing to people for whom the social rules are oppressive. An evolutionary view of music helps us partly understand why: rhythm and synchrony bond us by blurring our roles, by linking us together at the level of the physical and bodily rather than the abstract and categorical. If the social world is oppressive for you, music and dance can provide a literal escape from it.
Evolution, then, probably selected for our musical abilities and drives in part because social structures divide us as much as they unite us. A doctor and a patient have only so much they can talk about together. Members of different tribes or clans might seem like aliens to us. But through music and dance, we can bond in ways that transcend those symbolic or category divides, giving us a common experience of togetherness and literal rhythmic unity.
Without that ability, human societies might bubble away into nothing. But without language, we’d have no way to split into differentiated roles, to create the norms and institutions that solve our complex problems. Could music and language have co-evolved, each correcting for the other’s blind spots?
This Thursday, June 17th, I’ll be giving a Zoom lecture through Scholarium on the evolution of music. I’ll be linking up ritual studies and comparative anthropology and cognitive neuroscience to dive deeper into the ideas I’ve surveyed here — a real-life example of the interdisciplinary research we’re trying to exemplify. Tune in at 7:00 pm. RSVP here to get the Zoom URL. See you then.
* Crows and other corvids are technically included in the category of songbirds, though, which seriously stretches the colloquial definition of “song.”
** This would depend on what kind of clients and what kind of contract we’re talking about, of course.