
shytone
book reviews
Steven Mithen: The Singing Neanderthals: the origins of music, language, mind and body
(Weidenfeld & Nicolson: 2005)
“Music and language
are universal features of human society. They can be manifest
vocally, physically, and in writing; they are hierarchical,
combinatorial systems which involve expressive phrasing,
and are reliant on rules that provide recursion and generate
an infinite number of expressions from a finite set of
elements. Both communication systems involve gesture and
body movement. In all of these regards, they may well
share what Douglas Dempster calls some ‘basic cognitive
stuff’. Yet the differences are profound. Spoken
language transmits information because it is constituted
by symbols, which are given their full meaning by grammatical
rules; notwithstanding formulaic phrases, linguistic utterances
are compositional. On the other hand, musical phrases,
gestures, and body language are holistic; their ‘meaning’
derives from the whole phrase as a single entity. Spoken
language is both referential and manipulative; some utterances
refer to things in the world, while others [aim to] make
the hearer think and behave in certain ways. Music, on
the other hand, is principally manipulative, because it
induces emotional states and physical movement by entrainment.
So, where does this leave us with regard to the relationship
between music and language? ...Music is too different
from language to be adequately explained as an evolutionary
spin-off [- or vice versa - and ]...while music and language
have their own unique properties, they still share more
features than one would expect from entirely independent
evolutionary histories. The remaining possibility is that
there was a single precursor for both music and language:
a communication system that had the characteristics that
are now shared by music and language, but that split into
two systems at some date in our evolutionary history....
It is my task in this book...not only to explain the origin
of music and language, but also to provide a more accurate
picture of the life and thought of our human ancestors.”
(Mithen, pp.25-6)
Whilst the evolutionary background of many of our complex
behaviours has been the subject of much debate of late
- language being only the most obvious case - music until
very recently has been largely ignored, or dismissed as
an inconsequential by-product of language. Yet, as Mithen
convincingly argues, this is extremely unlikely, given
its incredible hold upon our emotional lives...for emotions
are phylogenetically old - key determinants of our choices - and anything capable
of stirring them so deeply must have its own evolutionary
raison d’être, even if the likes of Steven Pinker cannot recognize it...
But, of course, Pinker is a linguist - and a good Chomskyan,
at that - and so trained to dismiss an enormous amount
of evidence which does not accord with his prejudices...including
quite a lot of important elements of language itself,
elements which Steven Mithen would argue are central to
the evolution of both language and music. For this book offers the best argument yet for
the nature and development of what is usually described
as “proto-language”...albeit, as Mithen argues,
this term neglects the strongly musical nature of such,
and hence encourages the neglect of much of importance.
However, to properly deal with his arguments, we had better
start with the definitions...and their lacunae:
“Bruno Nettl,
the distinguished ethnomusicologist, defined music as
‘human sound communication outside the scope of
language’. That is perhaps as good a definition
as we can get.... The definition of language is, perhaps,
more straightforward: a communication system composed
of a lexicon - a collection of words with agreed meanings
- and a grammar - a set of rules for how words are combined
to form utterances. But, even this definition is contentious.
Alison Wray, the champion of holistic proto-language,
has argued that a considerable element of spoken language
consists of ‘formulaic’ utterances - prefabricated
phrases that are learnt and used as a whole. Idioms are
the most obvious example, such as ‘straight from
the horse’s mouth’ or ‘a pig in a poke’.
Unlike the other sentences within this paragraph, the
meaning of such phrases cannot be understood by knowledge
of the English lexicon and grammatical rules. Wray and
certain other linguists argue that the ‘words and
rules’ definition of language places undue emphasis
on the analysis of written sentences, and pays insufficient
attention to the everyday use of spontaneous speech, which
often contains very little corresponding to a grammatically
correct sentence.... [Moreover,] traditional linguistics
has neglected to study the rhythms and tempos of verbal
interaction - the manner in which we synchronize our utterances
when having a conversation. This is a fundamental and
universal feature of our language use, and has an evident
link with communal music-making.... [Furthermore,] even
when sign language is taken out of the equation, many
would argue that it is equally artificial to separate
language from gesture. Movements of the hands or the whole
body very frequently accompany spoken utterances...[and]
the majority are quite spontaneous. Speakers are often
unaware that they are gesticulating, and many find it
difficult to inhibit such movements - in a manner similar
to people’s inability to stop moving their bodies
when they hear music.”
(Mithen, pp.11-17)
“The majority
of spontaneous gestures used by modern humans are iconic,
in the sense that they directly represent whatever is
being verbally expressed, and...one striking finding is
that everyone appears to use a similar suite of spontaneous
gestures, irrespective of what language they speak. [Moreover,]
gestures play a complementary role to spoken utterances,
rather than being merely derivative or supplementary.
So, gestures are not used simply to help the speaker retrieve
words from his or her mental dictionary; they provide
information that cannot be derived from the spoken utterance
alone...[and] are particularly important for conveying
information about the speed and direction of movement,
about the relative position of people and objects, and
about the relative size of people and objects.... The
critical role of gesture in human communication...is perhaps
best expressed in the words of David McNeill, whose 1992
book, Hand and Mind,
pioneered the notion that gesture can reveal thought.
McNeill explained that ‘Utterances possess two sides,
only one of which is speech; the other is imagery, actional
and visuo-spatial. To exclude the gestural side, as has
been traditional, is tantamount to ignoring half of the
message out of the brain.’ Thus body movement appears
to be as crucial to language as it is to music...[even
though] we are very poor at consciously attending to and
using body language today. I suspect that this was very
different for our non-linguistic ancestors.... The true
significance of body language can perhaps be appreciated
by recognizing that whereas speaking is an intermittent
activity - it has been estimated that the average person
talks for no more than twelve minutes a day - our body
language is a continuous form of communication.... [For
our ancestors, such movements] would have placed different
nuances of intent or meaning onto the same basic holistic
utterance/gesture. [Rudolf] Laban gives the simple example
of the expressive range of gestures that can accompany
the word ‘no’. He explains that one can ‘say’
this with movements that are pressing, flicking, wringing,
dabbing, thrusting, floating, slashing, or gliding, each
of which says ‘no’ in a quite different manner.
Once such gestures are integrated into a sequence of body
movements and vocalizations, once some are exaggerated,
repeated, embedded within each other, one has both a sophisticated
means of self-expression and communication, and a pattern
of movements that together can be observed as pure dance
alone.”
(Mithen, pp.155-7)
Amid the formalist obsession with grammar that has plagued
mainstream linguistics since Chomsky, such aspects of
language are relegated to the sidelines...with the consequence
that many of the most interesting - and, arguably, ancient
- aspects of language are then perversely ignored by those
attempting to theorize its evolution. And, when the evidence
speaks loudly of the priority of prosody in the creation
of meaning - as follows here - the usual tactic is to
return to the ineffability of grammar, without attempting
to answer the question:
“Both music and
language have the property of expressive phrasing. This
refers to how the acoustic properties of both spoken utterances
and musical phrases can be modulated to convey emphasis
and emotion. It can apply to either a whole utterance
or phrase, or to selected parts. The word ‘prosody’
refers to the melodic, [timbral,] and rhythmical nature
of spoken utterances; when the prosody is intense, speech
sounds highly musical. Prosody plays a major role in the
speech directed towards infants; indeed, whether the utterances
‘spoken’ to very young babies should be considered
as language or as music is contentious. [Significantly,]
although the content of language can be used to express
emotion, it is subservient to the prosody. I can, for
instance, state that ‘I am feeling sad’. The
words alone, however, may be unconvincing. If I say that
‘I am feeling sad’ in a really happy voice,
priority will be given to the intonation, and the inference
drawn that I am, for some unknown reason, being ironic.”
(Mithen, p.24)
One of the most fascinating - and revealing - studies
to be cited by Mithen examines so-called “musical
savants”, whose outstanding abilities developed
either without language, or in concert with severely restricted
language abilities. Such people tend to share many similarities
- perfect pitch (also extremely common in pre-linguistic
infants), a strong tendency to echolalia (combining extreme
sensitivity to sounds with an inability to attach symbolic
meanings to same) and, perhaps most surprisingly, an aversion
to rote playing, and a highly sophisticated level of musical
understanding right across all areas tested, easily comparable
to well-trained professional musicians. As Mithen underlines,
this radically separates them from other types of savants,
whose skills appear to be strongly circumscribed, suggesting
that music as a whole may have far deeper evolutionary
roots than other savant skills...
However, this does not mean that musical awareness cannot
be dismembered by brain damage, for it is clear from the
clinical record that timbre, rhythm and melody are all
dissociable with specific types of brain damage - meaning
that they are to a large extent ‘modular’
in nature, as is normal with regards to the earlier stages
of sensory inputs into the brain. However, this should
not be confused with modular claims re higher brain functions - although
it often is - for which there is little neurobiological
support. Interestingly, although Mithen has championed
one version of the latter - his "cognitive fluidity" hypothesis
- and still argues for it here, its status in this book
is rather of a fifth wheel, since he has now come around
to supporting other, more strongly based hypotheses re
the inventive stagnation of erectus cultures (particularly
Donald's "mimetic culture") which obviate the need for
this claim. And prosody, lest we forget, is a key element
in the making of mimetic culture...
“[It is] the variations
in intonation - dynamics, speed, timbre and so forth -
that infuse speech with emotional content, and often influence
its meaning. Prosody, as this is called, can sound very
music-like, especially when exaggerated, as in the speech
used to address young children. And it has a musical equivalent
in melodic contour - the way pitch rises and falls as
a piece of music is played out.... A study published in
1998 by Isabelle Peretz and her colleagues is particularly
important, because it explicitly attempts to identify
whether the same neural network within the brain processes
sentence prosody and melodic contour, or whether independent
systems are used.... The study examined two individuals
who were suffering from amusia, but who appeared to differ
in their abilities to perceive prosody in speech and melodic
contour in music.... Peretz and her colleagues designed
a very clever set of tests.... They began with sixty-eight
spoken sentences, recorded as pairs, that were lexically
identical, but differed in their prosody and hence their
meaning. For instance, the sentence ‘he wants to leave now?’
was spoken first as a question, by stressing the final
word, and then as a statement, ‘he wants to leave
now’. These are known as ‘question-statement’
pairs. ‘Focus-shift’ pairs were also used,
in which the emphasis of the sentence was altered - for
example, ‘take the train to Bruges, Anne’ was paired with ‘take the
train to Bruges, Anne’.... Sentences of a third type formed ‘timing-shift’
pairs, where the location of a pause was varied so as
to alter the meaning. For instance, ‘Henry, the
child, eats a lot’ was paired with ‘Henry,
the child eats a lot’.... [From the results of their
tests,] Peretz and her colleagues concluded that there
is indeed a stage where the processing of language and
of melody utilize a single, shared neural network...[which
is] used for holding pitch and temporal patterns in short-term
memory.”
(Mithen, pp.55-8)
This result re prosody, however, is atypical, in that
most aspects of musical awareness appear to be separate
from their counterparts in language - as revealed by lesion
studies - despite the undoubted fact that brain-imaging
results suggest that they do share the same networks! What this strongly suggests,
I would claim, is exactly what Mithen is arguing in this
book - that the two communication systems were originally
one, that it was much closer to music than to language,
and that they have only comparatively recently divided...with
language, in consequence, “colonizing” neighboring
neural areas where its structural needs differed sufficiently.
And, once we remove Chomsky’s blinkers, and admit
the full range of language-related behaviour as evidence, such a model
looks to be very strongly supported by the evidence...
“‘Baby-talk’,
‘motherese’, and ‘infant-directed speech’
(IDS) are all terms used for the very distinctive manner
in which we talk to infants who have not yet acquired
full language competence - that is, from birth up until
around three years old. The general character of IDS will
be well known to all: a higher overall pitch, a wider
range of pitch, longer ‘hyperarticulated’
vowels and pauses, shorter phrases, and greater repetition
than are found in speech directed to older children and
adults. We talk like this because human infants demonstrate
an interest in, and sensitivity to, the rhythms, tempos,
and melodies of speech long before they are able to understand
the meanings of words. In essence, the usual melodic and
rhythmic features of spoken language - prosody - are highly
exaggerated, so that our utterances adopt an explicitly
musical character.... In general, the exaggerated prosody
of IDS helps infants to split up the sound stream they
hear, so that individual words and phrases can be identified.
In fact, mothers of young children fine-tune the manner
in which they use prosody to their infants’ current
linguistic level. One would be mistaken, however, to believe
that the prosody of IDS is primarily intended to help
children accomplish the truly astounding task of learning
language.”
(Mithen, pp.69-71)
“Ann Fernald has
identified four developmental stages of IDS, of which
only the last is explicitly about facilitating language
acquisition.... For newborn and very young infants, IDS
serves to engage and maintain the child’s attention,
by providing an auditory stimulus to which it responds.
Relatively intense sounds will cause an orienting response;
sounds with a gently rising pitch may elicit eye opening;
while those with an abrupt rising pitch will lead to eye
closure and withdrawal. With slightly older infants...it
now begins to modulate arousal and emotion. When soothing
a distressed infant, an adult is more likely to use a
low pitch and falling pitch contours; when trying to engage
attention and elicit a response, rising pitch contours
are more commonly used. If an adult is attempting to maintain
a child’s gaze, then her speech will most likely
display a bell-shaped contour. Occasions when adults need
to discourage very young infants are rare; but when these
do arise, IDS takes on a similar character to the warning
signals found in non-human primates - brief and staccato,
with steep, high-pitched contours. As a child ages, IDS
enters its third stage, and its prosody takes on a more
complex function: it now not only arouses the child, but
also communicates the speaker’s feelings and intentions....
Owing to its exaggerated prosody, IDS is a more powerful
medium than adult-directed speech for communicating intent
to young children...[as] in IDS ‘the melody is the
message’....[Finally, when] young children begin
to understand the meaning of words, further subtle changes
to IDS occur...[in which] the specific patterns of intonation
and pauses facilitate the acquisition of language itself.”
(Mithen, pp.71-2)
“The idea that
IDS is not primarily about language is supported by the
universality of its musical elements. Whatever country
we come from, and whatever language we speak, we alter
our speech patterns in essentially the same way when we
talk to infants.... If the exaggerated prosody of IDS
were no more than a language-learning device, one would
expect to find the IDS of peoples speaking Xhosa, Chinese
and Japanese [- all voiced languages -] to be quite different
from that of those speaking English, German, and Italian.
That this is not the case strengthens the argument that
the mental machinery of IDS belongs originally to a musical
ability concerned with regulating social relationships
and emotional states.... [Furthermore, there is a similarly]
striking degree of cross-cultural unity in the melodies,
rhythms, and tempos [of lullabies]”
(Mithen, pp.72-9)
Personally, I find the evidence from IDS to be one of
the most compelling portions of Mithen’s argument,
since it so clearly situates a holistic, near-musical
communication system as developmentally prior to full-blown
language, yet also - significantly - independent of it.
Combined with the extremely strong parallels with the
communication systems of our most vocal evolutionary cousins,
I find it difficult to see how anyone bar what we might
term “linguistic snobs” could fail to see
the overwhelming support for a holistic proto-language
- especially since there is precisely zero evidence for
anything else....
“Studies of gelada
monkeys have been important in understanding how acoustically
variable calls, many of which sound distinctly musical,
mediate social interactions.... [These include] ‘fast
rhythms, slow rhythms, staccato rhythms, glissando rhythms;
first-beat accented rhythms, end-accented rhythms; melodies
that have evenly spaced musical intervals covering a range
of two or three octaves; melodies that repeat exactly,
previously produced, rising or falling musical intervals;
and on and on: geladas vocalize a profusion of rhythmic
and melodic forms.’ ...After making a detailed description
of their use, and exploring the contexts in which they
arose, [Bruce Richman] concluded that they performed much
the same function as the rhythm and melody that is found
in human speech and singing. In essence, the geladas used
changes in rhythm and melody to designate the start and
end of an utterance; to parse an utterance, so allowing
others to follow along; to enable others to appreciate
that the utterance was being addressed to them; and to
enable others to make their own contribution at the most
appropriate moment. In fact, Richman’s interpretation
of how geladas use rhythm and melody appears strongly
analogous to its use in the early and non-linguistic stages
of infant-directed speech.”
(Mithen, pp.109-10)
Moreover, other studies have also found a nearly exact
match between the type of pitch changes which mark the
emotional expressions of humans and macaque monkeys -
further evidence for evolutionary continuity in the prosodic
aspects of speech.
“The communication
systems of monkeys and apes remain little understood. It was once
believed that their vocalizations were entirely involuntary,
and occurred only in highly emotional contexts...[however,
it is now known they] are often deliberate, and play a
key role in social life.... There are some common features.
First, none of the vocalizations or gestures are equivalent
to human words. They lack consistent and arbitrary meanings,
and are not composed into utterances by a grammar that
provides an additional level of meaning.... They are holistic.
Secondly, the term ‘manipulative’ is also
generally applicable...[since] monkeys and apes probably
simply do not appreciate that other individuals lack the
knowledge and intentions that they themselves possess.
Rather than being referential, their calls and gestures...are
trying to generate some form of desired behaviour in another....
A third feature may be applicable to the African apes
alone: their communication systems are multi-modal, in
the sense that they use gesture as well as vocalization.
In this regard, they are similar to human language....
Finally, a key feature of the gelada and gibbon communication
systems is that they are musical in nature, in the sense
that they make substantial use of rhythm and melody, and
involve synchronization and turn-taking. Again, depending
on how one would wish to define ‘musical’,
this term could be applied to non-human primate communication
systems as a whole. The holistic, manipulative, multi-modal,
and musical [acronym: ‘Hmmmm’] characteristics
of ape communication systems provided the ingredients
for that of the earliest human ancestors, living in Africa
6 million years ago, from which human language and music
ultimately evolved.”
(Mithen, pp.120-1)
And so, with holistic communication systems firmly in
place in both the evolutionary and developmental tracks
- and with a range of other evidence, as we have seen,
all pointing in the same direction - Mithen, in the second
half of his book, goes on to flesh out the evolutionary
story of what he (unfortunately) wants to call “Hmmmm”,
an acronym for Holistic, Manipulative, Multi-Modal &
Musical communication...a term I am fairly certain will
not catch on, being extremely awkward to pronounce, as well as difficult
to distinguish from its immediate evolutionary successor
amongst Early Humans, “Hmmmmm”...in which
“Mimetic” is added to the list!
Still, acronyms aside, the arguments are strong, the evidence
(particularly when all factors are considered) surprisingly
clear, and the result is by far the most impressive approach
to communications amongst our ancestors we have yet seen.
“We can think
of sounds emitted from the mouth as deriving from ‘gestures’,
each created by a particular position of the so-called
articulatory machinery - the muscles of the tongue, lips,
jaw, and velum (soft palate). When we say the word ‘bad’,
for instance, we begin with a gesture of the lips pursed
together, whereas the word ‘dad’ begins with
a gesture involving the tip of the tongue and the hard
palate. So, each of our syllables relates to a particular
oral gesture. The psychologist Michael Studdert-Kennedy
argues that such gestures provide the fundamental units
of speech, just as they form the units of ape vocalizations
today, and hominid vocalizations in the past. As motor
actions, such gestures ultimately derive from ancient
mammalian capacities for sucking, licking, swallowing,
and chewing. These began the neuroanatomical differentiation
of the tongue that has enabled the tongue tip, tongue
body, and tongue root to be used independently from each
other.... Consequently, even though we should think of
hominid vocalizations as holistic in character, they must
have been constituted by a series of syllables derived
from oral gestures. These, therefore, had the potential
ultimately to be identified as discrete units...which
could be used in a compositional language.”
(Mithen, p.129)
“We should envisage
each holistic utterance as being made of one, or more
likely a string, of [these] vocal gestures...expressed
in conjunction with hand or arm gestures, and perhaps
body language as a whole.... In addition, particular levels
of pitch, tempo, melody, loudness, repetition, and rhythm
would have been used to create particular emotional effects
for each of these ‘Hmmmm’ utterances. Recursion,
the embedding of one phrase within another, is likely
to have become particularly important, in order to express
and induce emotions with maximum effect.”
(Mithen, pp.149-150)
“[There exist]
two differing conceptions of proto-language - compositional
and holistic...[and] monkey and ape vocalizations...are
holistic, and provide a suitable evolutionary precursor
for the type of holistic proto-language proposed by Alison
Wray. But they provide no foundation for a [compositional]
‘words without grammar’ type of proto-language,
as proposed by Derek Bickerton.... [Furthermore,] while
[the latter] may have been adequate for communicating
some basic observations about the world, it would have
been unsuitable for what Alison Wray describes as ‘the
other kind of messages’ - those relating to physical,
emotional, and perceptual manipulation. It would not,
for instance, have been suitable for the type of subtle
and sensitive communication that is required for the development
and maintenance of social relationships...the principal
selective pressure for the evolution of vocal communication
in early hominids. It is important to appreciate that
Homo ergaster would have lived in socially intimate communities within
which there would have been a great deal of shared experience
and knowledge...[and] relatively slight demands for information
exchange, compared with our experience today.... [Therefore,]
there would have been limited, if any, selective pressure
within their society for a ‘creative language’,
one that could generate new utterances in the manner of
the compositional language upon which we depend.”
(Mithen, pp.147-8)
As Mithen argues - dovetailing nicely with the work of
both Jonathan Kingdon and Frank R. Wilson - the earliest
shifts away from ape standards were simply by-products
of the australopithecines’ erect stance, with no need
for specific selection re communicative capabilities.
However, the more open ground they (very) gradually ventured
onto would have eventually forced them to congregate in
larger groups (like geladas, incidentally), placing increased
pressure on vocal communication and social intelligence.
The results, however, were not straightforward, since
many intertwined factors were involved:
“The increased
range and diversity of vocalizations made possible by
the new position and form of the larynx, and changes in
dentition and facial anatomy in general, would certainly
have enhanced the capacity for emotional expression and
the inducing of emotions in others. But the musical implications
of bipedalism go much further than simply increasing the
range of sounds that could be made. Rhythm, sometimes
described as the most central feature of music, is essential
to efficient walking, running and, indeed, any complex
coordination of our peculiar bipedal bodies. Without rhythm,
we couldn’t use these effectively: just as important
as the evolution of knee joints and narrow hips, bipedalism
required the evolution of mental mechanisms to maintain
the rhythmic coordination of muscle groups.... The
key point is that, as our ancestors evolved into bipedal
humans so, too, would their inherent musical abilities
evolve - they got rhythm. One can easily imagine an evolutionary
snowball occurring as the selection of cognitive mechanisms
for time-keeping improved bipedalism, which led to the
ability to engage in further physical activities that
in turn required time-keeping for their efficient execution....
It may, indeed, be in this connection that the phenomenon
of entrainment - the automatic movement of body to music
- arose.”
(Mithen, pp.150-3)
“Whereas we should
imagine the vocal communications of the australopithecines,
Homo habilis and Homo rudolfensis
as more melodious versions of those made by non-human
primates today, those made by members of the Homo ergaster
species, such as the Nariokotome boy, must have been very
different, with no adequate analogy in the modern world....
I must, however, be careful not to exaggerate the musicality
and communication skills of Homo ergaster,
as this species marks only the beginning of an evolutionary
process. The holistic phrases used by Homo ergaster
- generic forms of greetings, statements, and requests
- are likely to have been small in number, and the potential
expressiveness of the human body may not have been realized
until later, bigger-brained species of Homo had evolved.
Moreover, Homo ergaster
certainly lacked the anatomical adaptations for fine breathing
control that are necessary for the intricate vocalizations
of modern human speech and song.... The Nariokotome specimen
did have a relatively large brain, compared to the 450
cubic centimetres of living African apes and australopithecines,
but this is primarily a reflection of that specimen’s
large body size. It is not until after 600,000 years ago
that the brain size of Homo increases significantly...[which]
can best be explained
by selection pressures for enhanced communication, resulting
in a far more advanced form of ‘Hmmmm’ than
that used by Homo ergaster.”
(Mithen, p.158)
“As well as imitating
how animals move, Early Humans could have imitated their
calls, along with the other sounds of the natural world.
We know that traditional peoples, those living close to
nature, make extensive use of onomatopoeia in their names
for living things...[whilst] the study of animal names
provides another clue to the nature of Early Human ‘Hmmmm’
utterances, by virtue of the phenomenon of...‘sound
synaesthesia’...the mapping from one type of variable
- size - onto another - sound. Sound synaesthesia was
recognized by Otto Jespersen in the 1920s.... Jespersen
noted that ‘the sound [i] comes to be easily associated
with small, and [u,o] with bigger things’.... [Moreover,]
onomatopoeia and sound synaesthesia may not be the only
universal principles at work in the naming of animals.
The bird names of the Huambisa tend to have a relatively
large number of segments of acoustically high frequency,
which appear to denote quick and rapid motion, or what
[ethnobiologist Brent] Berlin calls ‘birdness’.
In contrast, fish names have lower frequency segments,
which have connotations of smooth, slow, continuous flow
- ‘fishness’.... In general, it appears that
we can intuitively recognize the names belonging to certain
types of animals, in languages that are quite unfamiliar
to us, by making an unconscious link between the sound
of the word and the physical characteristics of the animal.
This finding challenges one of the most fundamental claims
of linguistics: that of the arbitrary link between an
entity and its name...[and] the implications for Early
Human ‘Hmmmm’ utterances are profound.”
(Mithen, pp.169-71)
“The
key feature of [Early Human pre-linguistic communications]
is that they would not have been constructed out of
individual elements that could be recombined in
a different order and with different elements, so as
to make new messages. Each phrase would have been an
indivisible unit, that had to be learned, uttered, and
understood as a single acoustic sequence [like animal
calls]. As Wray points out, the inherent weakness of
a communication system of this type is that the number
of messages will always be limited...[and] if holistic
phrases were used with insufficient frequency, they
would simply drop out of memory, and be lost. Similarly,
the introduction of new phrases would be slow and difficult,
because it would rely on a sufficient number of individuals
learning the association....The ‘Hmmmmm’
communication system would, therefore, have been dominated
by utterances descriptive of frequent and quite general
events...[and] would instigate and preserve conservatism
in thought and behaviour in a manner that a language
constituted by words and grammatical rules would not.”
(Mithen, pp.172-3)
One useful aspect of Mithen’s approach, here, is
that he incorporates key aspects of the best current theories
in this area, which - unfortunately - are all too often
presented separately with no attempt at synthesis. Thus Merlin
Donald’s work on mimetic culture, Geoffrey Miller’s
arguments regarding sexual selection, Ellen Dissanayake’s
theories on IDS, and William Benzon’s approach to
group bonding through music all make their way into Mithen’s
synthesis, as he sifts through the evidence, looking for
how different aspects of the evolutionary story may have
played out. However, the key theorist throughout remains
Alison Wray, and her ideas are particularly important
with regard to the emergence of language and music from
their holistic precursor....
“[Alison] Wray
uses the term ‘segmentation’ to describe the
process whereby humans began to break up holistic phrases
into separate units, each of which had its own referential
meaning and could then be recombined with units from other
utterances, to create an infinite array of other utterances.
This is the emergence of compositionality, the feature
that makes language so much more powerful than any other
communication system. Wray suggests that segmentation
may have arisen from the recognition of chance associations
between the phonetic segments of the holistic utterance,
and the objects or events to which they were related.
Once recognized, these associations might then have been
used in a referential fashion to create new, compositional
phrases.... The feasibility of Wray’s process of
segmentation [is] enhanced when her own characterization
of holistic proto-language is replaced by the rather more
complex and sophisticated perspective I have developed,
in the form of ‘Hmmmmm’. [For] the presence
of onomatopoeia, vocal imitation, and sound synaesthesia
would have created non-arbitrary associations...[and]
significantly increased the likelihood that particular
phonetic segments would eventually come to refer to the
relevant entities, and hence to exist as words.... The
likelihood would have been further increased by the use
of gesture and body language, especially if a phonetic
segment of the utterance regularly occurred in combination
with a gesture pointing to some entity in the world. Once
some words had emerged, others would have followed more
readily, by means of the segmentation process Wray describes.
The musicality of ‘Hmmmmm’ would also have
facilitated this process, because pitch and rhythm would
have emphasized particular phonetic segments, and thus
increased the likelihood that they would become perceived
as discrete entities with their own meanings.... This
[is] the case with regard to language acquisition by infants:
the exaggerated prosody of IDS helps infants to split
up the sound stream.... The musicality of ‘Hmmmmm’
would, moreover, have also ensured that holistic utterances
were of sufficient length, so that the process of segmentation
would have some raw material to work with.... Further
confidence in the process of segmentation derives from
the use of computer models to simulate the evolution of
language...[for Simon] Kirby’s simulations show
that...the process of learning itself can lead to the
emergence of grammatical structures. Hence, if there is
such a thing as ‘Universal Grammar’, it may
be the product of cultural transmission through a ‘learning
bottleneck’ between generations, rather than of
natural selection during biological evolution; ‘poverty
of the stimulus’ becomes a creative force rather
than a constraint on language acquisition.”
(Mithen, pp.253-6)
“Together, Wray
and Kirby have helped us to understand how compositional
language evolved from holistic phrases. However, they
have also posed us with an unexpected problem: why did
this only happen in Africa after 200,000 years ago? ...There
are two possibilities, one relating to social life, and
one to human biology. As regards the first, we should
note initially that Kirby found holistic languages remain
stable in those simulations in which learning-agents
hear so much of the speaking-agent’s utterances
that they learn every single association between symbol
string and meaning. In other words, there is no learning
bottleneck for language to pass through, and hence, no
need for generalization.... This would indeed have been
quite likely in the type of hominid and Early Human communities
I have outlined.... The kick-start for [wider social and
economic ties] may have been a chance genetic mutation
- the second possible reason.... This may have provided
a new ability to identify phonetic segments in holistic
utterances.... We have already seen that some aspects
of language are dependent on the possession of the specific
gene FOXP2, the modern human version of which seems to
have appeared in Africa soon after 200,000 years ago....
Indeed, it may be significant that those members of the
KE family that were afflicted with a faulty version of
the FOXP2 gene had difficulties not only with grammar,
but also with...the segmentation of what sounded to them
like holistic utterances.”
(Mithen, pp.257-8)
“The compositional
utterances that emerged from holistic phrases by a process
of segmentation would have begun as mere supplements...the
holistic utterances providing a cultural scaffold for
the gradual adoption of words and new utterances structured
by grammatical rules. Moreover, the first words may initially
have been of primary significance to the speaker as a
means to facilitate their own thought and planning, rather
than a means of communication.... Talking to oneself is
something that we all occasionally do, especially when
we are trying to undertake a complex task. Children do
this more than adults, and their so-called ‘private
speech’ has been recognized as an essential part
of cognitive development...[and] private speech may have
been crucial in the development of a compositional language
to sufficiently complex a state for it to become a meaningful
vehicle for information exchange...a supplement to ‘Hmmmmm’
and, eventually, the dominant form of communication....
The brains of infants and children would have developed
in a new fashion, one consequence of which would have
been the loss of perfect pitch in the majority of individuals,
and a diminution of musical abilities. Once the process
of segmentation had begun, we should expect a rapid evolution
of grammatical rules, building on those that would have
been inherited from ‘Hmmmmm’. Such rules would
have evolved by the process of cultural transmission in
the manner that Kirby describes, and perhaps through natural
selection leading to the appearance of genetically based
neural networks enabling more complex grammatical constructions.”
(Mithen, pp.259-60)
Rather than a “big bang”, therefore, this
theory would predict a gradual evolution of language-enabled
modern behaviour...with this only becoming universal after
a considerable transitional period had ended, due to the
demographic shift provided by much denser populations
- and hence social ties. This, as it happens, is exactly
what the African record reveals. But, what of music in
all of this...Mithen’s original concern in researching
this book, before the densely interwoven histories of
language and music took over?
“Music emerged
from the remnants of ‘Hmmmmm’, after language
evolved. Compositional, referential language took over
the role of information exchange so completely that ‘Hmmmmm’
became a communication system almost entirely concerned
with the expression of emotion, and the forging of group
identities, tasks at which language is relatively ineffective.
Indeed, having been relieved of the need to transmit and
manipulate information, ‘Hmmmmm’ could specialize
in these roles, and was free to evolve into the communication
system we now call music. As the language-using modern
humans were able to invent complex instruments, the capabilities
of the human body became extended and elaborated...[but,
still,] throughout history, we have been using music to
explore our evolutionary past.... [However,] technological
developments have served both to democratise the availability
of music, and to create a musical elite...[through] musical
complexity and then exclusion. When the technical level
of what is defined as musicality is raised, some
people will be defined as unmusical, and the very nature
of music will become defined to serve the needs of an
emergent musical elite.”
(Mithen, pp.266-71)
“Music...maintains
many features of ‘Hmmmmm’, some quite evident,
such as its emotional impact and holistic nature, others
requiring a moment’s reflection. It is now apparent,
for instance, why even when listening to music made by
instruments rather than the human voice, we treat music
as a virtual person, and attribute to it an emotional
state and sometimes a personality and intention. It is
now also clear why so much of music is structured as if
a conversation is taking place within the music itself,
and why we often intuitively feel that a piece of music
should have a meaning attached to it, even though we cannot
grasp what that might be.... [And] if IDS is one remnant
of ‘Hmmmmm’, then another is the use
of spontaneous gestures when speaking...[even if] the
listener/watcher may be quite unaware that some of the
information he/she is receiving is coming from the gesture
rather than the words being heard. Spontaneous gestures
maintain the key features of ‘Hmmmmm’ - they
are holistic and often both manipulative and mimetic.
Had we not evolved/developed language, we might be far
more effective at inferring information from such gestures,
and would have grown up in a culture where such gesturing
was recognized as a key means of communication, rather
than as a curious hangover from our evolutionary past....
Perhaps of most significance, however, is our propensity
to use holistic phrases whenever the possibility arises....
One might argue that we use such formulaic phrases simply
to reduce the mental effort.... But, to my mind, their
frequency in our everyday speech reflects an evolutionary
history of language that for millions of years was based
on holistic phrases alone: we simply can’t rid ourselves
of the habit.”
(Mithen, pp.275-7)
Steven Mithen’s The
Singing Neanderthals is the essential book on the
interlaced histories of language and music, and - in combination
with the works of Merlin Donald, William Benzon, and Terrence
Deacon - makes clear how we evolved such complex and deeply
paradoxical skills in the first place. In direct contrast
to the theories in vogue within mainstream linguistics,
these writers are not afraid to explore all the relevant
evidence, and their arguments make very real sense of
the archaeological record...hardly surprising in Mithen’s
case, we should note, as he is an archaeologist himself...
And, this evolutionary history proves itself to be highly
relevant to a proper understanding of all of our communications,
rather than simply of antiquarian interest. For, just
as music is more than harmony and melody, so too language
is (much) more than grammar and semantics. The impoverished
notions of mainstream musicology and linguistics may attempt
to convince us otherwise but, as Mithen shows us, these
two forms are enormously richer than that, and - when
this richness is properly assessed - it becomes clear
just how we should understand both them, and their evolutionary
forerunner. And, for this, and much else, we have Steven
Mithen to thank...
“‘Hmmmmm’
communication would have involved dance-like performance,
and this might explain an intriguing feature of the Neanderthal
archaeological record. When either the whole or a substantial
part of a Neanderthal-occupied cave is excavated, the
debris they left behind is typically found in a very restricted
area. Paul Mellars, a Cambridge archaeologist with a particularly
detailed and extensive knowledge of Neanderthal archaeology,
has remarked upon this pattern...[and] provides two possible
interpretations for each case: either the ‘empty’
areas were used for sleeping, or the groups within the
caves had been very small. There is, of course, a third:
those empty areas could have been used for performance....
[But] trying to understand the...world of a Neanderthal
is challenging, owing to the limitations of our imaginations,
the inevitable speculation involved, and the restricted
evidence on which these speculations must be based. Also,
I believe that all modern humans are relatively limited
in their musical abilities, when compared with the Neanderthals.
This is partly because the Neanderthals evolved neural
networks for the musical features of ‘Hmmmmm’
that did not evolve in the Homo sapiens
lineage, and partly because the evolution of language
has inhibited the musical abilities inherited from the
common ancestor we share with Homo neanderthalensis.
Occasionally, however, we have an intense musical experience
that may capture some of the richness that was commonplace
to the Neanderthals...[and] other experiences might also
remind us of how ‘desensitized’ we are to
the music-like sounds around us. And so...I would like
to quote for a second time how his teacher described her
walk with Eddie, the music savant.... ‘I found that
a walk with Eddie is a journey through a panorama of sounds.
He runs his hand along metal gates, to hear the rattle;
he bangs on every lamp post, and names the pitch if it
has a good tone; he stops to hear a car stereo; he looks
into the sky, to track airplanes and helicopters; he imitates
the birds chirping; he points out the trucks rumbling
down the street...If it is aural, Eddie is alert to it,
and through the aural he is alert to so much more.’”
(Mithen, pp.242-5)