Learning

Having discussed the biological basis of mind and behavior in the brain and the rest of the nervous system, we're now in a position to discuss what it is that the mind does. To be brief, the purpose of the mind is to know the world: to form internal, mental representations of the external world, and to plan and execute responses to the objects and events we encounter there.

In terms of human information processing, the mind performs a sequence of activities:

picking up information through sensory and perceptual processes;
storing this information in memory;
transforming it through thought;
communicating it through language; and
executing relevant actions through the skeletal musculature.

Traditionally, this activity was described in terms of the formation of associations of three types:

between environmental events;
between environmental events and the organism's responses to them; and
between the organism's own actions and their effects on the environment.

Reflexes, Taxes, and Instincts

Some of these associations are innate or inborn, part of the organism's native biological endowment.

Reflexes

The reflex is the simplest possible connection between an environmental stimulus and an organismic response. Examples are:

the patellar reflex, the "knee-jerk" reflex commonly tested during routine physical examinations;
the eyeblink reflex, where the eye closes in response to a puff of air striking the cornea;
the other spinal reflexes, such as those that are preserved in cases of paraplegia.

Reflexes are automatic, in that they occur inevitably in response to an adequate stimulus, and occur the first time that stimulus is presented. They do not require the involvement of the higher centers in the nervous system: they persist even when the spinal cord is severed from the brain. Most reflexes are fairly simple, but even fairly complicated activities can be reflexive in nature.

The 19th-century French physiologist Flourens conducted a series of classic studies of reflexes in the decorticate pigeon. He removed both lobes of the cerebral cortex in the bird, and then attempted to determine which patterns of behavior remained in the repertoire. Certain behaviors were preserved:

the animal righted itself when its equilibrium was disturbed;
it walked when it was pushed, and flew when thrown into the air;
it would swallow when water was introduced into its beak;
and when irritated, it would move away from the stimulus.

However, other behaviors disappeared:

it did not flee from the irritation;
it did not avoid obstacles placed in its path;
it showed no voluntary action (i.e., behavior in the absence of any apparent stimulus);
and it showed no signs of emotionality.

Thus, Flourens characterized the decorticate pigeon as a reflex machine that merely reacted to external stimulation by means of reflexes but displayed no spontaneous or self-initiated behavior.

Human beings also come "prewired" with a repertoire of reflexes: automatic responses to stimulation that appear soon after birth, before the infant has had any opportunity for learning.

Some of these are reflexes of approach, elicited by weak stimuli, and which have the effect of increasing contact with the stimulus.

Among these is rooting: when the infant's cheek is touched, it will turn its head in the direction of the touch and open its mouth; if its mouth makes contact with any object, it will close and begin to suck (this reflex will occur even if the infant is asleep or comatose).

Another reflex of approach is grasping: if the palm of the hand is touched, the fingers will flex and close around the object; the grasping reflex can be very strong.

Similarly, if the sole of the foot is touched, the response will be "plantarflexion": the toes will stretch and turn downward.

Other stimulus-response patterns are reflexes of avoidance, which are elicited by intense or noxious stimuli, and have the effect of decreasing contact with the stimulus.

For example, the infant's eyes will close automatically in response to a bright light, and the mouth will close at the introduction of an unpleasant taste (e.g., quinine).

If the palms or soles are scratched, pinched, or pricked, there will be spreading of the fingers or toes, and withdrawal of the hands or feet (in the case of the feet, the toes will also show "dorsiflexion", or turning upward -- the "Babinski reflex").

A very interesting set of behaviors is the stepping reflex. Infants appear to "learn to walk", but this appearance is deceiving. If the infant's body is supported, and it is moved forward along a flat surface, it will show synchronized stepping. If its toes strike the riser of a set of stairs, it will lift its feet. Neonates don't need to learn to walk: they can't yet walk because their skeletal musculature has not matured enough to support them.

Despite the large repertoire of reflexes, infants do not show much initiation of directed activity. The behaviors of the young infant are pretty much confined to reflexes, which are gradually replaced with voluntary action.

Reflexes are an important part of the organism's behavioral repertoire, but they have their limitations.

They permit only a small number of responses to be elicited by stimulation.
They do not permit action to be controlled by internal goals or intentions.

With subsequent development, reflexes tend to disappear. But they are not abolished entirely: the knee-jerk and rooting reflexes can be elicited in adults; and adult paraplegics display a full repertoire of reflexes. However, adult behavior is dominated by voluntary action, and reflexes slip into the background.

Reflexes involve relatively small portions of the nervous system. In principle, the reflex arc requires only three neurons -- though in practice, spinal reflexes involve entire afferent and efferent nerves, as well as the spinal cord. Other innate stimulus-response connections consist of more complicated action sequences that involve larger portions of the nervous system and skeletal musculature.

Taxes

A taxis (plural, taxes) is a gross orientation response: after presentation of a stimulus, the whole organism turns and moves. Taxes come in two forms:

In positive taxes, the organism moves toward the stimulus. A common example is a moth flying into a candle.
In negative taxes, the organism moves away from the stimulus. A common example is a cockroach scurrying out of the light (actually, it is responding not to the light but to the slight breezes created by motion, such as a human entering a room and turning on a light).

Phototaxes involve responses to light; geotaxes involve responses to gravity (these can be observed in worms and ants as they move up and down inclines).

There are actually lots of other taxes, which can be observed mostly at the cellular level:

Chemotaxes, or responses to the presence of certain chemicals in the environment;
Thermotaxes, or responses to warmth or cold;
Rheotaxes, or responses to the movement of fluids;
Magnetotaxes, or responses to magnetic fields; and
Electrotaxes, or responses to electrical fields.

Taxes are not simple reflexes, because they involve the entire skeletal musculature of the organism. But they are still innate, and involuntary.

Taxes and Reflexes in the Neonate Kangaroo

The behavior of the newborn kangaroo illustrates an effective combination of reflexes and taxes. The kangaroo, like all marsupials (e.g., the opossum), has no placenta. The female gives birth after one month of gestation, and carries the developing fetus in a pouch. But how does the fetus get into the pouch?

Immediately after birth, the newborn climbs up the mother's abdomen -- perhaps by virtue of a negative geotaxis. If it reaches the opening of the pouch, it reverses its behavior and climbs in -- maybe a positive geotaxis. If it does not encounter the opening of the pouch, it will continue climbing until it reaches the top, stop there -- or maybe fall off -- and eventually die. The mother kangaroo has no way of helping the infant: the appropriate behaviors simply aren't in her instinctual repertoire, and -- a point I'll expand on later -- she has no opportunity to learn them through trial and error.

Once in the pouch, if the neonate encounters a nipple, it will attach to it and begin to nurse -- probably a variant on the rooting reflex. If not, it will simply stop at the bottom of the pouch and eventually die.

Assuming that all goes well, the baby kangaroo emerges from the pouch after about six more months of gestation.

Note that the neonate gets in the pouch by its own automatic actions, with no assistance from its mother. The behavior is entirely under stimulus control, and if it fails to contact the appropriate stimulus it will simply die.

Instincts

Other innate behaviors involve more complicated action sequences, and more specific, discriminating responses. These are known as instincts or fixed action patterns. Instincts have several important properties. As a rule, they are:

complex, stereotyped patterns of action,
rigidly organized,
innate,
unmodified by learning,
species-specific (i.e., some species show them but others do not); and
universal within the species (i.e., every member of the species shows the behavior under appropriate conditions).

Instincts are studied by ethology, a branch of behavioral biology devoted to understanding animal behavior in natural environments, viewed from an evolutionary perspective. As a biological discipline, ethology asks four questions about behavior -- all of them variants on the question "Why does an animal behave the way it does?"

Causation: What are the mechanisms by which the behavior works?
Function: What is the survival value of the behavior?
Ontogeny: How does the behavior arise in the life of the individual?
Phylogeny: How did the behavior arise in the evolution of the species?

Note, however, the focus of ethology on behavior -- and, in particular, on natural behavior. Ethologists analyze animal behavior in its ecological and evolutionary context; they do experiments, but their experiments are performed under field conditions (or something very closely resembling them), not in the sterile confines of the laboratory. Ethologists are not really psychologists, because they are interested only in behavior, not in mind per se. Nevertheless, psychology is a big tent, and many ethologists have found their disciplinary home in a department of psychology, as well as in departments of biology (especially integrative biology as opposed to molecular and cellular biology).

An observation by Konrad Lorenz illustrates the role that evolutionary thinking played in the ethologists' analysis of behavior ("The Evolution of Behavior", Scientific American, December 1958):

[I]s it not possible that beneath all the variations of individual behavior there lies an inner structure of inherited behavior which characterizes all the members of a given species, genus or larger taxonomic group -- just as the skeleton of a primordial ancestor characterizes the form and structure of all mammals today?

Yes, it is possible! Let me give an example which, while seemingly trivial, has a bearing on this question. Anyone who has watched a dog scratch its jaw or a bird preen its head feathers can attest to the fact that they do so in the same way. The dog props itself on the tripod formed by its haunches and two forelegs and reaches a hind leg forward in front of its shoulder. Now the odd fact is that most birds (as well as virtually all mammals and reptiles) scratch with precisely the same motion! A bird also scratches with a hind limb (that is, its claw), and in doing so it lowers its wing and reaches its claw forward in front of its shoulder.

One might think that it would be simpler for the bird to move its claw directly to its head without moving its wing, which lies folded out of the way on its back. I do not see how to explain this clumsy action unless we admit that it is inborn. Before the bird can scratch, it must reconstruct the old spatial relationship of the limbs of the four-legged common ancestor which it shares with mammals.

A Nobel Prize for Ethology

Three important ethologists, Konrad Lorenz, Nikolaas (Niko) Tinbergen, and Karl von Frisch, won the 1973 Nobel Prize in Physiology or Medicine for their pioneering research on instincts (four years earlier, in 1969, Tinbergen's brother Jan had shared the first Nobel Prize in Economics for his pioneering research on econometrics). For an intellectual biography of Tinbergen, see Niko's Nature: A Life of Niko Tinbergen and His Science of Animal Behaviour by H. Kruuk (2003).

The concept of instinct is well illustrated in Konrad Lorenz's research on imprinting in newly hatched ducks and geese. Once out of the egg, the hatchling follows the first moving object it sees. This is usually the mother, but the hatchling will also follow a wooden decoy, a block of wood on wheels, or even a human -- provided that it is the first moving object that the bird sees. The emphasis on the "first" moving object is somewhat overstated, because there is a critical period for imprinting: the imprinted object must be present soon after hatching; if exposure to a moving object is delayed for several hours or days, imprinting may not occur at all. If imprinting occurs, the imprinted object will be followed even under adverse circumstances, over or around barriers, etc. When the imprinted object is removed from the bird's field of vision, the bird will emit a distress call. If imprinting has occurred to an unusual object, that object will be preferred to the bird's actual parent, or any other conspecific animal.

Link to a film of Lorenz demonstrating imprinting.

Link to "Mallards on a Mission", a more recent YouTube video showing the strength of the imprinting response.

The power and perils of imprinting are vividly illustrated by an incident that occurred in Spokane, Washington, in 2009. George Armstrong, a banker, had been watching a female duck nesting on a ledge outside his office window. In the usual course of events, the ducklings would hatch, imprint on their mother, and then follow her as she led them to water. But -- they're on a ledge! And they can't fly yet! The mother duck knew nothing of this. She's built to wait until her eggs have hatched, and then go to water; and ducklings are built to follow her. The mother jumped off the ledge and -- she's built for that, too -- flew down to the street. The ducklings were stranded. Armstrong went out on the street, stood below the ledge, and caught each of the ducklings as they stepped off the ledge, instinctually following their mother (actually, he had to collect a couple from the ledge). Then he served as a crossing guard while the mother collected her young and led them to water. The power of imprinting is that the ducklings will follow their mother -- or Konrad Lorenz -- everywhere. The peril of imprinting is that the behavior has been selected for a particular environmental niche -- in the case of ducks, the grassy area near water where they usually nest; if that environment changes, for whatever reason, the instinctive behavior may be very maladaptive.

Link to a video of Armstrong catching the ducks (sorry about the ad).

There are actually two kinds of imprinting. What we've been discussing is filial imprinting, which concerns the relationship between adult and youngster. There is also sexual imprinting, which concerns the relations between males and females. At the International Crane Foundation, in Baraboo, Wisconsin, there once lived a whooping crane named Tex (now deceased), who imprinted on the Foundation's manager, George Archibald. In order to get Tex to breed, Archibald had to perform an imitation of the crane's mating dance!

Imprinting is extremely indiscriminate: basically, the bird imprints on the first object that moves within the critical period. However, other instincts are much more discriminating.

Another good example of an instinct is the alarm reaction in some birds subject to predation by other birds (studied by Tinbergen). If an object passes overhead, the birds will emit a distress call and attempt to escape. However, these birds do not show alarm to just any stimulus: it must have a birdlike appearance; moreover, birdlike figures with short (hawk-like) necks elicit alarm, while those with long (goose-like) necks do not (the length and shape of the tail and wings are largely irrelevant).

Imprinting and the alarm reaction involve, basically, only one organism. Other instincts involve the coordinated activities of two (or more) species members.

A good example is food-begging in herring gulls (studied by Tinbergen). Hatchling birds don't forage for their own food, but must be fed a predigested diet by their parents. But the parents do not do this of their own accord. Rather, the chick must peck at the parent's bill: the parent then regurgitates food, and presents it to the chick; the chick then grasps the food and swallows it. But the chick will not peck at any bird-bill. Rather, the bill must have a patch of contrasting color on the lower mandible. The precise colors involved do not matter much, so long as the contrast is salient. Food-begging exemplifies the coordination of instinctive behaviors: the patch is the releasing stimulus for the hatchling to peck; and the peck is the releasing stimulus for the parent to present food.

An excellent example of a complex, coordinated sequence of instinctual behaviors is provided by the "zig-zag" dance, part of the mating ritual of the stickleback fish (Tinbergen).

A male stickleback, when it is ready to mate, develops a red coloration on its belly.

He then establishes his territory by fighting off other sticklebacks. But he fights only sticklebacks, not other species of fish; and only males; and only males who display red bellies and enter his territory in the head-down "threat posture" (other colorations indicate that the other male is not ready to mate; other postures indicate that the other male is only passing through the territory; in either case, there is no territorial fighting).

Experiments by Tinbergen, employing "dummy" models of fish, show that it actually doesn't matter much whether the other fish looks like a stickleback, so long as it has a red-colored belly. Sticklebacks without red bellies may enter the male's territory, because they don't constitute threats.

Other experiments, in which fish were enclosed in capsules to control their orientation, show that a male who elicits aggression when it enters a territory with its head down will not elicit aggression if it enters the territory with its head level -- perhaps indicating that it is just "passing through".

After the territory has been cleared of threatening males, the male builds a nest out of weeds.

Then he entices a female into the nest -- but only a female stickleback who enters his territory with a swollen abdomen, and in the head-up "receptive posture".

The female enters the nest only if the male displays a red belly, and performs a "zig-zag" dance.

Once in the nest, the female spawns eggs -- but only if she is stimulated at her hind quarters.

Once the eggs are laid, the female leaves the nest and the territory.

The male fertilizes the eggs, fans them to maintain an adequate oxygen supply around them, and cares for the young after hatching (until they're ready to go off to school).

When the young are hatched, the red belly fades, and the male no longer incites males or attracts females -- until the next mating cycle starts.

Notice the serial organization of this pattern of stickleback behaviors. It is as if each act is the releasing stimulus for the next one. There is no flexibility in this sequence: once initiated, it does not stop, provided that the appropriate releasing stimulus is present. If any element in the sequence is left out, the entire sequence will stop abruptly. All three parties go through this pattern of behaviors, even if one of them doesn't remotely resemble a stickleback. For example, a female, ready to mate, will enter the nest if she observes a tongue depressor, painted red on one half, imitating the zig-zag dance!
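The serial organization just described can be pictured as a chain in which each act serves as the releasing stimulus for the next. What follows is only a rough illustrative sketch in Python, not a model drawn from Tinbergen's work: the step labels are simplified paraphrases of the sequence above, and the names COURTSHIP_CHAIN and run_chain are hypothetical.

```python
# Illustrative sketch only: the step labels paraphrase the stickleback
# sequence described above, and all names here are hypothetical.
# Each entry pairs a releasing stimulus with the act that it releases.
COURTSHIP_CHAIN = [
    ("female appears with swollen abdomen", "male performs zig-zag dance"),
    ("male performs zig-zag dance", "female enters nest"),
    ("female enters nest", "female is stimulated at hind quarters"),
    ("female is stimulated at hind quarters", "female spawns eggs"),
    ("female spawns eggs", "male fertilizes eggs"),
]


def run_chain(chain, first_stimulus, missing=frozenset()):
    """Return the acts performed, in order, before the chain halts.

    An act occurs only if its releasing stimulus is the event that just
    happened. Acts listed in `missing` never occur, so everything that
    depends on them drops out as well.
    """
    performed = []
    stimulus = first_stimulus
    for releaser, act in chain:
        if stimulus != releaser or act in missing:
            break  # no releaser, no response: the sequence stops abruptly
        performed.append(act)
        stimulus = act  # each act is the releasing stimulus for the next
    return performed


if __name__ == "__main__":
    start = "female appears with swollen abdomen"
    print(run_chain(COURTSHIP_CHAIN, start))
    # Leave out one element and the rest of the sequence never happens.
    print(run_chain(COURTSHIP_CHAIN, start, missing={"female enters nest"}))
```

Note that the sketch has no "goal" of reproduction anywhere in it: whether the chain runs to completion depends entirely on whether each releasing stimulus turns up, which is the sense in which the behavior is under stimulus control.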

Tinbergen observed a similar pattern in the mating behavior of the black-headed gull ("The Courtship of Animals", Scientific American, November 1954).

An unmated male settles on a mating territory. He reacts to any other gull that happens to come near by uttering a "long call" and adopting an oblique posture. This will scare away a male, but it attracts females, and sooner or later one alights near him. Once she has alighted, both he and she suddenly adopt the "forward posture". Sometimes they may perform a movement known as "choking". Finally, after one or a few seconds, the birds almost simultaneously adopt the "upright posture" and jerk their heads away from each other. Now most of these movements also take place in purely hostile clashes between neighboring males. They may utter the long call, adopt the forward posture and go through the choking and the upright posture.

The final gesture in the courtship sequence -- the partners' turning of their heads away from each other, or "head-flagging" -- is different from the others: it is not a threat posture. Sometimes during a fight between two birds we see the same head-flagging by a bird which is obviously losing the battle but for some reason cannot get away, either because it is cornered or because some other tendency makes it want to stay. This head-flagging has a peculiar effect on the attacker: as soon as the attacked bird turns its head away the attacker stops its assault or at least tones it down considerably. Head-flagging stops the attack because it is an "appeasement movement" -- as if the victim were "turning the other cheek". We are therefore led to conclude that in their courtship these gulls begin by threatening each other and end by appeasing each other with a soothing gesture.

Sounds a little like Shakespeare's The Taming of the Shrew. Or any of a number of screwball comedies from the 1930s.

For a good treatment of instinctual behavior, see N. Tinbergen, The Study of Instinct (1969).

For a positive treatment of sociobiology, see E.O. Wilson, Sociobiology: The New Synthesis (1975).

For extensions of sociobiology to psychology, see The Adapted Mind: Evolutionary Psychology and the Generation of Culture, edited by Jerome H. Barkow, Leda Cosmides, and John Tooby (1992), and Evolutionary Psychology: The New Science of the Mind (1999) by David M. Buss.

Instincts in Humans?

Taxes and instincts are important elements in behavior, especially of invertebrates, birds, and reptiles. Some psychologists and behavioral biologists argue that much human behavior is also instinctual in nature. One of the first to make this argument was McDougall, who argued that human behavior was rooted in instinctual behaviors related to biological motives. One of his examples, which is offered here without comment (except to note that similar descriptions could be made of the behavior of men), is reminiscent (at least in tone) of what Tinbergen discovered in sticklebacks:

The flirting girl first smiles at the person to whom the flirt is directed and lifts her eyebrows with a quick, jerky movement upward so that the eye slit is briefly enlarged. Flirting men show the same movement of the eyebrows. After this initial, obvious, turning toward the person, in the flirt there follows a turning away. The head is turned to the side, sometimes bent toward the ground, the gaze is lowered, and the eyelids are dropped. Frequently, but not always, the girl may cover her face with a hand and she may laugh or smile in embarrassment. She continues to look at the partner out of the corners of her eyes and sometimes vacillates between looking at, and looking away.

Among modern biological and social scientists, this point of view is expressed most strongly by the practitioners of sociobiology, especially E.O. Wilson, who argue that much human social behavior is instinctive, and part of our genetic endowment. More recently, similar ideas have been expressed by proponents of evolutionary psychology such as Leda Cosmides, John Tooby, and David Buss. At their most strident, evolutionary psychologists claim that our patterns of experience, thought, and action evolved in an environment of evolutionary adaptedness (EEA) -- roughly the African savanna of the Pleistocene epoch, where Homo sapiens first emerged about 300,000 years ago -- and have changed little since then. Although this assertion is debatable, to say the least, the literature on instincts makes it clear that evolution shapes behavior as well as body morphology. Many species possess innate behavior patterns that were shaped by evolution, permitting them to adapt to a particular environmental niche. Given the basic principle of the continuity of species, it is a mistake to think that humans are entirely immune from such influences -- although humans have other characteristics that largely free us from evolutionary constraints. For a discussion of evolutionary psychology, see the lectures on Psychological Development.

Meanings of "Instinct"

The concept of instinct has had a difficult history in psychology, in part because early usages of the term were somewhat circular: some theorists seemed to invoke instincts to explain some behavior, and then to use that same behavior to define the instinct. But, in the restricted sense of a complex, discriminative, innate response to some environmental stimulus, the term has retained some usefulness. For example, the psychologist Steven Pinker has referred to language as a human instinct.

Nevertheless, the term instinct has evolved a number of different meanings, as outlined by the behavioral biologist Patrick Bateson (Science, 2002):

present at birth (or at a particular stage of development);
not learned;
developed before it can be used;
unchanged once developed;
shared by all members of the species (at least those of the same sex and age);
organized into a distinct behavioral system (e.g., foraging);
served by a distinct neural (brain) module;
adapted during evolution;
differentiated across individuals due to their possession of different genes.

Bateson correctly notes that one meaning of the term does not necessarily imply the others. Taken together, however, the various meanings capture the essence of what is meant by the term "instinct".

From Instinct to Learning

Innate response tendencies such as food-begging can be very powerful behavioral mechanisms, especially for invertebrates and non-mammalian vertebrate species. In their natural environment, some species seem to live completely by virtue of reflex, taxis, and instinct.

Limitations on Innate Behaviors

But at the same time, these innate behavioral mechanisms are extremely limited. They have been shaped by evolution to enable the species to fit a particular environmental niche, which is fine so long as the niche doesn't change. When the environment does change, evolution requires an extremely long time to change behavior (or body morphology, for that matter) accordingly -- much longer than the lifetime of any individual species member.

Consider, for example, the behavior of newborn sea turtles. Female turtles lay their eggs on the beach above the tide line, and these eggs hatch at night in the absence of the parents. As soon as they have hatched, the hatchlings begin walking toward the water (what you might call a "positive aquataxis"): when they reach it, they begin to swim (another innate behavior), and live independently. However, the young turtles are not really walking toward the water: they are walking toward the reflection of the moon on the water (thus, a positive phototaxis). This hatching behavior evolved millions of years ago. Since then, however, the beaches where the turtles hatch have become crowded with hotels, marinas, oil refineries, and other light sources. Accordingly, these days, the hatchling turtles will also move toward these light sources, and die before they ever reach water. The animals' behavior evolved when the only light in the environment was from the sun and the moon, and they just don't know any better. In order to prevent a disaster, beach-side hotels and oil refineries now take steps to employ different kinds of light, or block their lights entirely.

Hatching Behavior in Ridley Sea Turtles
Here's how hatching works in Ridley Sea Turtles. Images from the 2014 calendar of Sea Turtle Inc., an organization devoted to education and recovery efforts.
Female sea turtle lays eggs above the high-water line.
Here's a clutch of newly hatched sea turtles, before they make their run for the water.
And here they are, headed toward the water -- or, more properly, the light.
Here's a close-up view.
Notice that they're not distracted by the presence of people (in this case, onlookers at a release of hatchlings fostered by the organization). This is pure instinct. All they care about is the reflected light, and they don't even care about that. It's just a mindless, innate behavioral response.

Now, perhaps there is some subtle difference (like polarization) between moonlight and electric light. If so, individual animals who can make this distinction, moving toward one and not the other, will survive, reproduce, and, over time, generate more individuals who can make this distinction. But again this takes time -- assuming that any individual can make the distinction in the first place. But even so, each individual gets only one chance. If it makes the right "choice", this behavioral tendency will pass on to successive generations, and the species may eventually come to distinguish between "good" and "bad" light -- provided that the species doesn't go extinct first. But that just illustrates the point that evolved behavior patterns take a very long time to change.
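To get a rough feel for why this takes so long, here is a toy calculation in Python. It is only a sketch under strong assumptions that are not part of the discussion above: the ability to tell "good" from "bad" light is treated as a single trait inherited intact by offspring, the population is large, and every survival probability is an invented number, not data about real turtles. The function names are hypothetical.

```python
# Toy selection model. All numbers are invented for illustration and
# nothing here is an estimate for any real turtle population.
# Survival probabilities are assumed to be greater than zero.

def next_frequency(p, survival_discriminators, survival_others):
    """Share of discriminators among survivors after one generation.

    p is the current share of hatchlings that can tell moonlight from
    artificial light; the two survival arguments are the (assumed)
    chances that each kind of hatchling reaches the water.
    """
    mean_survival = p * survival_discriminators + (1 - p) * survival_others
    return p * survival_discriminators / mean_survival


def generations_until(p0, survival_discriminators, survival_others,
                      target, max_generations=100_000):
    """Count generations until discriminators reach the target share.

    Returns None if the target is not reached within max_generations,
    for example when discriminating confers no survival advantage.
    """
    p = p0
    for generation in range(max_generations + 1):
        if p >= target:
            return generation
        p = next_frequency(p, survival_discriminators, survival_others)
    return None


if __name__ == "__main__":
    # Hypothetical case: 1 hatchling in 100 can discriminate.
    print(generations_until(0.01, 0.20, 0.10, target=0.5))  # large advantage
    print(generations_until(0.01, 0.11, 0.10, target=0.5))  # small advantage
```

Even in this idealized setting, a rare variant with a small survival advantage needs many generations to become common, and each generation is years long. Learning, by contrast, operates within the lifetime of a single individual.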

In June 2011, a group of diamondback terrapins caused the temporary shutdown of Runway 4 Left at New York's Kennedy International Airport. And it's happened before. The runway crosses a path that the turtles take from Jamaica Bay on one side to lay their eggs on the sandy beach on the other side. Usually, in egg-laying season, the runway is not in frequent use, due to prevailing winds. But that day was an exception, and the turtles brought takeoffs and landings to a halt for about an hour until they could be moved to their destination (we don't know what happened when they tried to get back in the water). It's another example of the difficulty that animals have in adjusting evolved patterns of behavior to rapidly changing environmental circumstances. (See "Delays at JFK? This Time, Blame the Turtles" by Andy Newman, New York Times, 06/30/2011).

Here's another example: seabirds, like albatrosses, feed their young through the same sort of instinctual food-begging shown by herring gulls. Adult albatrosses forage over open water, dive to catch fish swimming near the surface, and then regurgitate the fish into the mouths of their young. But it's not only fish that are near the surface. There's a lot of garbage in the ocean, as well. The birds don't know the difference -- they're operating solely on reflex. That garbage is of relatively recent vintage, so there hasn't been enough time -- assuming it were even possible -- for the birds to evolve a distinction between fish and garbage. The result is that adult albatrosses pick up garbage and regurgitate it into the bills of their chicks, who promptly die of starvation -- such as this albatross chick photographed on Midway Atoll in the Pacific.

And here's yet another example, a little closer to home. Wind farms like the one in Altamont Pass produce a large amount of electrical energy for California, reducing carbon emissions from coal-fired plants, and our dependence on Middle East oil. But they also create a hazard for birds, especially raptors, who like to forage for small mammals over open areas. Never mind that wind farms are built where there is strong, steady wind, and therefore often on migratory flight paths. The result is that a large number of raptors and other birds are killed every year because they run into the blades of the windmills.

In general, we can identify several limitations on innate response patterns:

The releasing stimulus must be physically present in the current environment. There is no way for the animal to respond to an image or idea or memory of a releasing stimulus.
Instincts and similar fixed action patterns only permit responses to be elicited by external stimuli; they do not permit action to be directed by internal goals.
Because the response patterns are built in over evolutionary time, the organism cannot respond flexibly to new stimuli, or quickly generate new behaviors in response to old or new stimuli.

Thus the problem: everyday life requires many organisms to go beyond simple, innate patterns of behavior, and acquire new responses to new stimuli in their environment.

Evolutionary Traps

Ecologists and evolutionary biologists are becoming increasingly aware of the problems caused by rapid environmental change. The United Nations Summit on Sustainable Development, held in Johannesburg, South Africa, in 2002, drew international attention to the fact that "nature", far from being "natural", has in fact been remade by human hands. According to Andrew C. Revkin, "People have significantly altered the atmosphere, and are the dominant influence on ecosystems and natural selection" (see his article, "Forget Nature. Even Eden is Engineered", and other articles in a special section on "Managing Planet Earth", New York Times, 08/20/02). Even in the early part of the 20th century, Revkin notes, the geochemist Vladimir I. Vernadsky had suggested that "people had become a geological force, shaping the planet's future just as rivers and earthquakes had shaped its past". Now in the 21st century, with the growth of megacities, the increase in population, and the disappearance of the forests, to name just a few trends, we are beginning to recognize, and deal with, the impact of human activity on the environment.

The human impact on the environment doesn't just affect the conditions of human existence. Nature is a system, and what we do affects animal and plant life as well, and sometimes in non-obvious ways.

In a recent paper in Trends in Ecology & Evolution (10/02), Paul W. Sherman and his colleagues, Martin A. Schlaepfer and Michael C. Runge, detail a number of "evolutionary traps", mostly caused by the impact of human activity which alters the natural environment -- activity which goes beyond the simple destruction of habitat, which would be bad enough. More subtle changes alter the environment in such a way that a species' evolved patterns of behavior are no longer adaptive, reducing the chances of individual survival and reproduction, and eventually leading to the decline and extinction of the species as a whole. As Sherman puts it, "Evolved behaviors are there for adaptive reasons. If we [disrupt] the normal environment, we can drive a population right to extinction" ("Trapped by Evolution" by Lila Guterman, Chronicle of Higher Education, 10/18/02).

The concept of evolutionary trap is a variant on the more established notion of an ecological trap, in which animals are misled, through human environmental change, to live in less-than-optimal habitats, even though more suitable habitats are available to them. For example, Florida's manatees have progressively moved north, attracted by the warm water discharged by power plants; but when the plant goes down for maintenance, the water cools to an extent that they can no longer survive in it.

Some examples of evolutionary traps:

The male buprestid beetle (Julodimorpha bakewelli) of Australia recognizes the female of its species as a brown, shiny object with small bumps on its surface. However, this is also what some Australian beer bottles look like. Accordingly, males will frequently be found attempting to mate with beer bottles, instead of with more appropriate partners. The solution is to get Australians not to litter.
American wood ducks, Aix sponsa, build nests in the cavities of dead trees. When wildlife managers constructed nesting boxes for them, in an attempt to help them meet the demands of habitat loss, the population actually declined. The reason is that female wood ducks adapted to the loss of natural nesting places by following each other to the few sites that were still available. When the artificial nesting boxes appeared, they all gravitated to the same ones, and laid too many eggs in individual boxes to incubate properly. The solution was to hide the boxes in the woods, increasing the likelihood that individual ducks would find their own nesting sites.
Male Cuban tree frogs, Osteopilus septentrionalis, attempt to mate with females that are actually roadkill (at least they don't move!). Not only does this increase the chance that they themselves will be run over by cars and trucks, but of course the exercise yields no offspring.
Due to global warming, yellow-bellied marmots, Marmota flaviventris, come out of hibernation too early in the season for food to be available, and so many will starve.
Insects are famously attracted to light, and this positive phototaxis includes artificial light at night (ALAN). Entomologists and ecologists have become increasingly concerned about the effect of ALAN on insect populations, because ALAN threatens the survival of the species, with effects that may be felt throughout entire ecosystems.

Mayflies provide a startling case in point. Mayfly larvae sit on the water: after they hatch, females live only a few minutes (males live for a day or two), during which time they mate and lay eggs back on the water. But those that live near bridges are attracted to the light, missing the opportunity to mate closer to the water surface; and many of those who do mate drop their eggs on the reflective pavement instead of in the water. The result is fewer mayflies in the next generation. And since mayflies help control algae and serve as fish food, the effects of this interruption of the mating cycle are felt throughout their ecosystem. Similar effects of ALAN are affecting other insect species: it's been estimated that ALAN has reduced insect populations by 80% in some areas; and that because of ALAN, as many as 40% of insect species may be headed for extinction. (For more information, see "Fatal Attraction to Light at Night Pummels Insects" by Elizabeth Pennisi, Science, 05/07/2021.)

Learning Defined

In vertebrates, and especially mammalian species, everyday action goes beyond such innate behavior patterns. These organisms can also acquire new patterns of behavior through learning.

Psychologists define learning as: a relatively permanent change in behavior that occurs as a result of experience.

This definition excludes changes in behavior that occur as a result of insult, injury, or disease, the ingestion of drugs, or maturation. Learning permits individual organisms, not just entire species, to acquire new responses to new circumstances, and thereby to add behaviors to the repertoire created by evolution. In addition, social learning permits one individual species member to share learning with others of the same species (this is one definition of culture). The pace of social learning far outstrips that of evolution, so that learning provides a mechanism for new behavioral responses to spread quickly and widely through a population. Although all species are capable of learning, at least to some degree, learning is especially important in the natural lives of vertebrate species, and especially in mammalian vertebrates. Like us. And, it turns out, most human learning is social learning: we learn from each other's experiences, and we have even developed institutions, like libraries and schools, that enable us to share our knowledge with each other.

Classical Conditioning

One important form of learning, classical conditioning, was accidentally discovered by Ivan P. Pavlov, a Russian physiologist who was studying the physiology of the digestive system in dogs (work for which he won the Nobel Prize in Physiology or Medicine in 1904). Pavlov's method was to introduce dry meat powder to the mouth of the dog, and then measure the salivary reflex which occurs as the first step in the digestive process. Initially, Pavlov's dogs salivated only when the meat powder was actually in their mouths. But shortly, they began to salivate before the powder was presented to them -- just the sight of the powder, or the sight of the experimenter, or even the sound of the experimenter walking down the hallway, was enough to get the dogs to salivate. In some sense, this premature salivation was a nuisance. But Pavlov had the insight that the dogs were salivating to events that were somehow associated with the presentation of the food. Thus, Pavlov moved away from physiology and initiated the deliberate study of the psychic reflex -- not, as the term might suggest, something out of the world of parapsychology, but rather a situation where the idea of the stimulus evokes a reflexive response. Pavlov called these responses conditioned (or conditional) reflexes.

In honor of Pavlov's discovery, this form of learning is now called "classical" conditioning -- the term was coined by E.R. Hilgard and D.G. Marquis in their foundational text, Conditioning and Learning (1940). A classical conditioning experiment involves the repeated pairing of two stimuli, such as a bell and food powder. One of these stimuli naturally elicits some reflex, while the other one doesn't. With repeated pairings, the previously neutral stimulus gradually acquires the power to evoke the reflex. Thus, classical conditioning is a means of forming new associations between events (such as the ringing of a bell and the presentation of meat powder) in the environment.

The apparatus for Pavlov's experiments included a special harness to restrict the dog's movement, a tube (or fistula) placed in its mouth to collect saliva, a mechanical device for introducing meat powder to its mouth, and some kind of signal such as a bell.

Some writers have questioned whether Pavlov actually used a bell, as the myth has it. Pavlov was actually unclear on this detail in his own writing, and he probably used a buzzer, or a metronome, more often than he used a bell. And the bell was more like a doorbell than a handbell. He also used a harmonium to present different musical notes, and even electrical shock. A major problem is that the Russian words that Pavlov used to label his "bell" don't translate unambiguously into English. For an excellent biography which puts Pavlov's work in its social and scientific context, see Ivan Pavlov: A Russian Life in Science by Daniel P. Todes (2014).

In Phase 1 of the conditioning procedure, Pavlov presented the sound (or whatever) and the food separately. The dog would salivate to the food but make no response to the bell.
In Phase 2, Pavlov presented the sound immediately followed by food. The dog would still salivate to the food; but after several trials, it would begin to salivate to the sound as well.
In Phase 3, Pavlov presented the sound alone, no longer followed by food. After several more trials, the conditioned salivary response would eventually disappear.

The Sad Case of Edwin B. Twitmeyer

Actually, Pavlov wasn't alone in discovering classical conditioning. He had a co-discoverer in Edwin B. Twitmeyer (1873-1943), who reported on "Knee Jerks Without Stimulation of the Patellar Tendon" in his doctoral dissertation, completed in 1902 at the University of Pennsylvania (yay!), and at the 1904 meeting of the American Psychological Association, held in Philadelphia (Pavlov first reported his discovery at the 1903 International Medical Congress, in Madrid).

Just as Pavlov's discovery of the conditioned salivary reflex was accidental, so was Twitmeyer's. His dissertation was actually concerned with variability in the patellar reflex -- which, as an innate reflex, was supposed to be invariant. It wasn't. The patellar reflex is elicited, as everyone who has had a physical exam knows, by striking the patellar tendon (roughly on the kneecap) with a rubber hammer. Twitmeyer used a bell to warn his subjects that the hammer-blow was coming, and after about 150 pairings of bell and hammer, one of his subjects gave an involuntary knee-jerk response after hearing the bell, but before the hammer struck. Twitmeyer subsequently confirmed this observation in the remainder of his subjects.

Although Twitmeyer's dissertation was privately published, and thus not widely accessible, an abstract of his 1904 APA talk was published in the widely read Psychological Bulletin for 1905. The APA talk itself got a chilly reception: after Twitmeyer concluded his presentation, William James, who was chairing the session, adjourned the meeting for lunch, effectively precluding any substantive discussion. Twitmeyer himself never followed up on his discovery. Having already been appointed to the psychology faculty at Penn after receiving his bachelor's degree from Lafayette College in 1896, he rose to the rank of full Professor in 1914, and served as Director of Penn's Psychological Laboratory and Clinic. Twitmeyer identified himself as a clinical psychologist, and most of his subsequent research concerned speech disorders, especially in children.

The Basic Vocabulary of Classical Conditioning

The procedure just described illustrates the basic vocabulary of classical conditioning:

The unconditioned stimulus (or US) is a stimulus (like the presentation of meat powder) that reliably evokes a reflexive response (like salivation).
The unconditioned response (or UR) is the innate reflexive response that is reliably evoked by an unconditioned stimulus.
The conditioned stimulus (or CS) is a stimulus (such as the ringing of a bell) that does not itself reliably evoke any particular reflexive response. In classical conditioning, the CS is paired with the US.
The conditioned response (or CR) is the response that comes to be evoked by a previously neutral conditioned stimulus (CS), after many pairings between the CS and the US. The CR generally resembles the UR.

As with the "bell-buzzer" debate, Pavlov didn't actually call his stimuli and responses "conditioned" and "unconditioned". Rather, the term he used in Russian translates better as "conditional" and "unconditional". And as we'll see later, the suffix "-al" is actually more appropriate. To make a long story short, the conditioned response is conditional on presentation of the conditioned stimulus. But Pavlov's first translators used the terms "conditioned" and "unconditioned", which frankly sound better in English, and it's too late to change our vocabulary now.

The process by which a conditioned stimulus acquires the power to evoke a conditioned response is known as acquisition. In traditional accounts of conditioning, acquisition of the CR occurs by virtue of the reinforcement of the CS by the subsequent US. The strength of the CR is measured in various ways:

The magnitude of the CR (such as the number of drops of saliva or its liquid volume). The magnitude of the CR is typically limited by the magnitude of the UR.
The probability that the CR will occur at all (e.g., the likelihood that any amount of salivation will follow presentation of the bell).
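The two measures of CR strength can be made concrete with a small sketch. The trial record below is invented purely for illustration (it is not Pavlov's data), and the function names cr_magnitude and cr_probability are hypothetical labels, not standard terms.

```python
# Hypothetical record of test trials: drops of saliva observed after the CS
# on each trial. These numbers are made up for illustration only.
drops_per_trial = [0, 0, 2, 5, 0, 7, 8, 9]


def cr_magnitude(drops):
    """Average size of the CR across trials (here, mean drops of saliva)."""
    return sum(drops) / len(drops)


def cr_probability(drops):
    """Proportion of trials on which any CR at all was observed."""
    return sum(1 for d in drops if d > 0) / len(drops)


print("mean magnitude:", cr_magnitude(drops_per_trial))
print("probability of a CR:", cr_probability(drops_per_trial))
```

Note that the two measures can come apart: an animal might respond on nearly every trial but only weakly, or respond rarely but strongly.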

On the initial acquisition trial, when the CS and the US are paired for the very first time, there is only an unconditioned response to the US; there is no conditioned response to the CS.

On later trials, we begin to observe a response that resembles the UR, occurring after presentation of the CS but before presentation of the US. This is the first appearance of the CR.

Even later, we may observe the CR immediately after the presentation of the CS, well before the presentation of the US.

The characteristic "sigmoidal" or S-shaped curve portraying the acquisition of the CR is an ogive, in which there is a slow increase in response strength on the initial trials, followed by a rapid increase in middle trials, and a further slow increase in the last trials, ending in a plateau.

The learning curve is commonly characterized as negatively accelerated, and that's true so far as the middle and latter portions of the learning curve are concerned.
But the very early portions are more accurately characterized as positively accelerated.

If you look across a number of different textbooks, you will see two different versions of the “classic” learning curve. The general form of the learning curve, illustrated above, is presented again on the left-hand side of this image. Conditioning begins slowly, then speeds up (positive acceleration), then slows down (negative acceleration), and finally plateaus out. We obtain this curve when the organism is naive, has had no prior learning experience, or the behavior being learned is relatively complex. With simple behaviors, or organisms which have had some prior learning experience, as in the right-hand portion of the slide, we see only the negatively accelerated portion of the learning curve: the conditioned response accrues strength rapidly at first, and then slows down. It's as if the organism already knows about learning, or doesn't have much to learn.
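The difference between the two versions of the learning curve can be sketched numerically. The code below is only an illustration under assumed functional forms: a logistic function stands in for the S-shaped curve, and an exponential approach to a plateau stands in for the purely negatively accelerated curve. The parameter values (rate, midpoint, asymptote) are arbitrary choices, not estimates from any experiment.

```python
import math


def sigmoid_curve(trial, asymptote=1.0, rate=0.5, midpoint=10):
    """S-shaped (ogive) curve: slow start, rapid middle, slow approach to plateau."""
    return asymptote / (1.0 + math.exp(-rate * (trial - midpoint)))


def negatively_accelerated_curve(trial, asymptote=1.0, rate=0.3):
    """Rapid gains at first, then progressively smaller gains toward the plateau."""
    return asymptote * (1.0 - math.exp(-rate * trial))


def acceleration(curve, trial):
    """Second difference: how the trial-to-trial gain itself changes around `trial`.

    Positive means gains are getting larger (positive acceleration);
    negative means gains are getting smaller (negative acceleration).
    """
    return curve(trial + 1) - 2 * curve(trial) + curve(trial - 1)


for trial in (5, 15):
    print(trial,
          acceleration(sigmoid_curve, trial) > 0,
          acceleration(negatively_accelerated_curve, trial) > 0)
```

In this sketch the S-shaped curve is positively accelerated before its midpoint and negatively accelerated after it, while the second curve is negatively accelerated throughout.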

Actually, learning can occur even before a CS is paired with a US. When a novel stimulus (NS), such as Pavlov's bell, is presented for the very first time, the organism will show a reflexive orienting response (OR) -- perhaps a startle response -- to that stimulus. But if that stimulus is presented repeatedly, all by itself, the magnitude of the OR will progressively diminish. This is known as habituation. It counts as learning because there is a change in behavior -- in this case, a change in the OR -- that occurs as a result of experience. Habituation is the very simplest form of learning, and has been observed in animals as simple as protozoa (Penard, 1947) -- and since protozoa are one-celled creatures, you can't get any simpler than that!

Habituation is an example of nonassociative learning, because the organism is not acquiring an association between CS and US -- for the simple reason that there is no US. Actually, technically speaking, there isn't a CS, either. It's just a stimulus.
Classical conditioning "proper" is classified as associative learning, because the organism does form an association between the CS and US (or, according to some theories, between the CS and the UR).
But the label "nonassociative" is something of a misnomer. When psychologists were first studying learning, they believed that organisms acquired associations between environmental stimuli and behavioral responses. In habituation, though, there's just a stimulus, and there's no response -- at least, once habituation has occurred. But that's not the right way to think about it. Later, when considering the research of Leo Kamin, we'll revisit habituation and see that, if anything, habituation entails acquiring an association after all -- between a stimulus and nothing.

If the NS is now paired with a US, so that the NS becomes a CS, conditioning will occur. However, the CR will be acquired at a slower rate than if there had been no prior habituation trials. This phenomenon is known as latent inhibition (Lubow & Moore, 1959).

Extinction is the process by which the CS loses the power to evoke the CR. Extinction occurs by virtue of unreinforced presentations of the CS -- that is, presentation of the CS alone, without subsequent presentation of the US. When the CS is no longer paired with the US, the CR loses strength relatively rapidly.

On the first extinction trial, there is a strong CR: after all, the organism does not yet "know" that the US has been omitted.

On later trials, the magnitude of the CR falls off, until it disappears entirely.

On extinction trials, the CR loses strength relatively rapidly. But it is not lost entirely, and it is possible to demonstrate that the CR is still present, in a sense, even after it seems to have disappeared.

Habituation can be thought of as a special case of extinction, in that the organism learns not to respond to the NS.

Spontaneous recovery is the unreinforced revival of the conditioned response. If, after extinction has been completed, we allow the animal a period of inactivity, unreinforced presentation of the CS will evoke a CR. This CR will be smaller in magnitude than that observed at the end of the acquisition phase, but CR strength will increase with the length of the "rest" interval.

If we continue with unreinforced presentations of the CS, the spontaneously recovered CR will diminish in strength -- it is extinction all over again.

If we continue with new reinforced presentations of the CS, the CR will grow in strength. The reacquisition of a previously extinguished CR is typically faster than its original acquisition, a difference known as savings in relearning.

During extinction, formal extinction trials can continue after the CR has disappeared, a situation known as extinction below zero. Of course, there is no further visible effect on the CR -- it is already at zero strength. However, extinction below zero has two palpable consequences: spontaneous recovery is reduced (though not eliminated), and reacquisition is slower (but still possible).

Spontaneous recovery, savings in relearning, and extinction below zero, have important implications for our understanding of the nature of extinction. Extinction is not the passive loss of the CR: the organism does not "forget" the original association between CS and US, and extinction does not return the organism to the state it was in before conditioning occurred. Spontaneous recovery and savings in relearning are expressions of memory, and they show clearly that the association between CS and US has been retained, even though it is not always expressed in a CR. Rather, it seems clear that the CR is retained but actively suppressed. Extinction does not result in a loss of the CR, but rather imposes an inhibition on the CR. The strength of the inhibition grows with trials, producing the phenomenon of extinction below zero. The inhibition also dissipates over time, producing spontaneous recovery. Thus, reacquisition isn't really relearning. Rather, it is a sort of disinhibition. Both acquisition and extinction, learning and unlearning, are active processes by which the organism learns the circumstances under which the CS and the US are linked.
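The idea that extinction imposes an inhibition, rather than erasing the association, can be illustrated with a toy model. Everything quantitative here is an assumption made for the sake of the sketch: inhibition is assumed to grow by a fixed step on each unreinforced trial and to decay by a fixed fraction per rest period, and the observable CR is taken to be the association minus the inhibition. The function names and parameter values are hypothetical.

```python
def conditioned_response(association, inhibition):
    """Observable CR: the retained association minus the inhibition, floored at zero."""
    return max(0.0, association - inhibition)


def extinguish(inhibition, trials, step=0.2):
    """Assumed rule: each unreinforced CS presentation adds a fixed amount of inhibition.

    The association itself is left untouched, which is the point of the model.
    """
    return inhibition + step * trials


def rest(inhibition, periods, retention=0.5):
    """Assumed rule: inhibition dissipates over time, keeping a fraction per rest period."""
    return inhibition * retention ** periods


association = 1.0  # strength of the CS-US association at the end of acquisition

# Ordinary extinction: just enough trials to bring the visible CR to zero.
ordinary = extinguish(0.0, trials=5)
# Extinction below zero: trials continue after the CR has already disappeared.
below_zero = extinguish(0.0, trials=8)

recovery_short = conditioned_response(association, rest(ordinary, periods=1))
recovery_long = conditioned_response(association, rest(ordinary, periods=2))
recovery_below_zero = conditioned_response(association, rest(below_zero, periods=1))

print("CR right after extinction:", conditioned_response(association, ordinary))
print("after short rest:", recovery_short, "after long rest:", recovery_long)
print("after short rest following extinction below zero:", recovery_below_zero)
```

In this sketch the recovered CR is smaller than the original CR, grows with the length of the rest interval, and is reduced (but not eliminated) by extinction below zero. The model does not attempt to capture savings in relearning.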

Other major phenomena of classical conditioning can be observed once the conditioned response has been established. For example, the organism may show generalization of the CR to new test stimuli, other than the original CS, even though there have been no acquisition trials on which these new stimuli have been associated with the US. The extent to which generalization occurs is a function of the similarity between the test stimulus and the original CS.

The generalization gradient is an orderly arrangement of stimuli along some physical dimension (such as the frequency of an auditory stimulus). The more closely the test stimulus resembles the original CS, the greater the CR will be. The generalization gradient provides one check on generalization: having been conditioned to respond to one stimulus, the organism will not respond to any and all stimuli. Response is greatest to test stimuli that most closely resemble the original CS. In 1987, Roger Shepard, a cognitive psychologist at Stanford, proposed a universal law of generalization (ULG), and claimed that this constituted the first universal law -- like Newton's laws of physics -- in psychology. According to the ULG, the likelihood of generalization decreases as an exponential function of the distance between two stimuli in psychological space. If not precisely universal, the ULG is nearly so: it has been found to apply across a large number of conceptual domains and sensory modalities, and it holds for a variety of species.
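The exponential form of the ULG can be sketched in a few lines. This is only an illustration of an exponential-decay gradient: the sensitivity parameter and the distances are hypothetical, and distance here means distance in psychological space, not a physical unit such as cycles per second.

```python
import math


def generalization(distance, sensitivity=1.0):
    """Likelihood of generalizing from the CS to a test stimulus.

    Falls off exponentially with psychological distance from the CS;
    `sensitivity` (an assumed free parameter) sets how steep the gradient is.
    """
    return math.exp(-sensitivity * distance)


# A hypothetical gradient: test stimuli at increasing distances from the CS.
for d in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(d, round(generalization(d), 3))
```

One signature of an exponential gradient is that each equal step away from the CS reduces generalization by the same proportion, not by the same amount.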

Generalization, Frequency, and Musical Pitch

In discussing generalization of response among stimuli, it is easiest to use the example of the frequency of tones, because differences in frequency -- whether a tone is high or low -- are easy to appreciate. And the example is accurate so far as it goes. If you condition an animal to a tone CS of 250 cycles per second (cps; also known as hertz, abbreviated hz, after the physicist Heinrich Rudolf Hertz, 1857-1894), it will emit a stronger conditioned response to a tone of 300 hz than to one of 350 hz -- because a tone of 300 hz more closely resembles a tone of 250 hz than does a tone of 350 hertz.

With humans, though, things can get a little more complicated, because musical pitch is also related to the frequency of tones, but similarity among pitches is not just a matter of relative frequency. Thus, when tones are presented in the context of the diatonic scale familiar in Western music, the generalization gradient may be distorted by the vicissitudes of pitch similarities.

Tones that are an octave apart, such as Middle C and third-space C on the treble clef, are perceived as more similar than any other pair of tones.
Tones that are a perfect fifth apart, such as Middle C and second-line G on the treble clef, are also perceived as highly similar.
And tones that are a major third apart, such as Middle C and first-line E on the treble clef, are also perceived as similar, though not as similar as those separated by an octave or a perfect fifth.

Consider an experiment in which a subject is initially conditioned to respond to a tone of 262 hertz, roughly corresponding to Middle C. Such a subject may well show larger conditioned responses to tones of 524 hz (roughly 3rd-space C), 392 hz (second-line G), and 330 hz (1st-line E), than to either B-flat (233 hz) or D (294 hz), even though the former tones are more distant from the original CS, in terms of frequency, than the latter.

However, this may only occur if we establish a musical context for the tones in the first place -- for example, by embedding the C in the other pitches of the diatonic scale, or by beginning the experiment by playing a tune in the key of C major. There are some experiments to be done here (hint, hint).
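The frequencies involved in this example can be worked out with a short sketch. It assumes modern equal temperament with the A above Middle C tuned to 440 cycles per second, which is a common convention but an assumption here; the note labels (C4 for Middle C, and so on) and the helper names are likewise just illustrative choices.

```python
A4_HERTZ = 440.0  # assumed tuning reference (A above Middle C)

# Semitone offsets from A4 for the notes discussed in the text.
SEMITONES_FROM_A4 = {
    "B-flat below Middle C": -11,
    "Middle C": -9,
    "D above Middle C": -7,
    "first-line E": -5,
    "second-line G": -2,
    "third-space C": 3,
}


def frequency(note):
    """Equal-tempered frequency: each semitone multiplies frequency by 2**(1/12)."""
    return A4_HERTZ * 2 ** (SEMITONES_FROM_A4[note] / 12)


def distance_from_cs(note, cs="Middle C"):
    """Physical distance from the CS, in cycles per second."""
    return abs(frequency(note) - frequency(cs))


for note in SEMITONES_FROM_A4:
    print(f"{note}: {frequency(note):.1f} hertz, "
          f"{distance_from_cs(note):.1f} from Middle C")
```

By physical distance alone, B-flat and D are the nearest neighbors of Middle C, yet the musically related tones (E, G, and the octave C) are the ones expected to be perceived as more similar once a musical context is established.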

Discrimination provides a further check on generalization. Consider an experiment in which we present two previously neutral stimuli: one, the CS+, is always reinforced by the unconditioned stimulus; the other, the CS-, is never reinforced. As conditioning proceeds, the CS+ will come to elicit the CR, but the CS- will not acquire this power. If the CS+ and CS- are close to each other on the generalization gradient, both will initially elicit a conditioned response. But as conditioning proceeds, the CR to the CS+ will grow in strength, while the CR to the CS- will extinguish. The CR is only elicited by CSs that are actually associated with the US.

Conditioned responses can also appear to new stimuli that are very dissimilar to the original conditioned stimulus. Consider the phenomenon known as sensory preconditioning, which occurs before acquisition trials in which a CS is paired with a US.

In Phase 1 of a sensory preconditioning experiment, two neutral stimuli, CS1 and CS2, are initially presented together, without any reinforcing US. Neither of these CSs elicits any particular reflexive response. Because no US is involved, there will be no evidence of any CR being formed.
In Phase 2, the CS2 is reinforced by pairing it with a US, until a CR appears.
In Phase 3, we test CS1. If we have done the experiment right, the CR will also appear in response to CS1, even though CS1 has never been paired with the US.

Something similar happens in higher-order conditioning, except that the first two phases are reversed, so that higher-order conditioning occurs after acquisition trials in which CS is paired with US.

In Phase 1 of a higher-order conditioning experiment, a neutral stimulus, CS1, is paired with a reinforcing US, just as in the standard classical conditioning paradigm, until the usual CR appears.
In Phase 2, CS1 is preceded by another neutral stimulus, CS2, without any reinforcing presentation of any US.
In Phase 3, we test CS2. Again, if we have done the experiment right, the CR will also appear in response to CS2 -- even though CS2, like CS1 in sensory preconditioning, has never been paired with the US.

The Scope of Classical Conditioning

By means of acquisition, extinction, generalization, discrimination, sensory preconditioning, and higher-order conditioning, stimuli come to evoke and inhibit reflexive behavior even though they may not have been directly associated with an unconditioned stimulus. By means of classical conditioning processes in general, reflexive responses come under the control of environmental events other than the ones with which they are innately associated.

The phenomena of classical conditioning are ubiquitous in nature, occurring in organisms as simple as the sea mollusk and as complicated as the adult human being.

Aplysia, a marine mollusk also known as the sea hare, has only about 10,000 neurons in its entire nervous system, compared to 86 billion or more in the human nervous system; it doesn't have a brain as such, but its neurons are organized into nine ganglia (remember that the brain can be thought of as one huge ganglion). Nevertheless, the animal can display habituation and acquire simple conditioned responses. Prof. Eric Kandel of Columbia University shared the 2000 Nobel Prize in Physiology or Medicine for his work with Aplysia examining synaptic changes during learning (see Kandel, Science, 2001).
An even simpler organism, the Caribbean box jellyfish (Tripedalia cystophora), has only about 1,000 neurons distributed across its tentacles (roughly speaking), but no central nervous system as such. Nevertheless, the animal can learn to avoid obstacles as it moves around its environment, forming associations between visual and tactile stimuli (Bielecki et al., Current Biology, 2023).

Pavlov himself thought that all learning entailed classical conditioning, but this position is too extreme. Still, classical conditioning is important because, in a very real sense,

The laws of classical conditioning are the laws of emotional life.

Classical conditioning underlies many of our emotional responses to events -- our fears and aversions, our joys and our preferences.

Instrumental Conditioning

At roughly the same time as Pavlov was beginning to study classical conditioning, E.L. Thorndike, an American psychologist at Columbia University, was beginning to study yet another form of learning -- what has come to be known as instrumental conditioning. Beginning in 1898, Thorndike reported on a series of studies of cats in "puzzle boxes". The animals were confined in cages whose doors were rigged to a latch which could be operated from inside the cage. The animal's initial response to this situation was agitation -- particularly if it was hungry and a bowl of food was placed outside the cage. Eventually, though, it would accidentally trip the latch, open the door, and escape -- at which point it would be captured and placed back in the cage to begin another trial.

Over successive trials, Thorndike observed that the latency of the escape response progressively diminished. Apparently, the animals were learning how to open the door -- a learning which seemed to be motivated by reward and punishment.

On the basis of his studies of cats in puzzle boxes, Thorndike formulated a set of 8 Laws of Learning, of which three are particularly important for our purposes:

The Law of Readiness states that motivational states such as hunger arouse behavior.
The Law of Effect states that responses that lead to reward are strengthened, occurring more quickly and reliably, while responses that are unrewarded, or even punished, are weakened.
The Law of Exercise states that associations between stimuli (such as the puzzle box) and responses (such as tripping the latch) are strengthened by practice and weakened by disuse.

For the record, the other laws were:

The Law of Multiple Responses: organisms must be able to vary their responses to a stimulus, to give them the opportunity to stumble on the response which will be rewarded.
The Law of Set (or Attitude): an organism's momentary set or attitude will determine which rewards are effective (the opportunity to play tennis may not be rewarding to a golfer).
The Law of Prepotency of Elements: organisms must be able to distinguish between those elements of a situation that are really important, and those that are merely adventitious.
The Law of Response by Analogy: organisms respond to novel situations by drawing analogies to familiar situations.
The Law of Associative Shifting: a response that has been conditioned to a number of different stimuli will be likely to be given in response to a new stimulus.

The general principle of instrumental conditioning is that adaptive behavior is learned through the experience of success and failure. Instrumental learning is also sometimes called operant conditioning, because the organism "operates" on the environment, changing it in some way (for example, changing the cage from one whose door is closed to one whose door is open), and this behavior is "instrumental" in obtaining some desired state of affairs (like food or simply escape from confinement).

Beginning in the 1930s, the study of instrumental conditioning was taken up by B.F. Skinner, a radical behaviorist. Behaviorism was a school of psychology founded by John B. Watson, then at Johns Hopkins University, who believed that psychology could become a legitimate science only by eliminating references to hypothetical mental states (which cannot be publicly observed) and confining the analysis to the relations between publicly observable behavior and the publicly observable environmental conditions under which it is observed. (Watson was forced to resign from Hopkins over a sexual scandal, and went on to a career in advertising. He invented the notion of the "coffee break" as a promotion for Maxwell House Coffee.) Like Watson, Skinner thought that behavior could be, and should be, explained solely in terms of the associations between stimuli and responses, and without reference to hypothetical states (such as hunger) existing in a hypothetical mind of an organism (including humans). Thus the term S-R behaviorism. Skinner was something of a visionary, and he is famous for his utopian novel, Walden Two, which describes a community organized along behaviorist lines (he was an English major in college, contemplated a career as a writer, and indeed wrote some very beautiful stuff); and for his meditation on human nature, Beyond Freedom and Dignity. Both are very provocative books. A collection of Skinner's scientific papers, most of which are very readable, is entitled Cumulative Record.

A Note on Two "Functionalisms"

Tracing the relations between environmental stimuli (inputs) and organismic responses (outputs) is often called functional behaviorism, or simply functionalism, but this brand of functionalism (which is currently popular among some philosophers of mind and some theorists in artificial intelligence, a branch of cognitive science) should be clearly distinguished from the 19th-century "Chicago functionalism" of John Dewey and James Rowland Angell (Angell was, however, Watson's graduate mentor), which had its roots in the work of William James and which underlies this course.

Skinner refined Thorndike's apparatus into what has become known as the Skinner box, though Skinner himself did not use the term and actually disliked it. He preferred the term operant chamber. A generic operant chamber, intended to house an animal during learning trials, includes lights for presenting signals, levers or keys for collecting responses, a hopper for presenting food pellets, and a floor grid for presenting electrical shock.

In Phase 1 of a typical instrumental conditioning experiment a food-deprived animal is placed in the operant chamber. Notice that I did not describe the animal as "hungry". Like all behaviorists, Skinner abjured the use of mental-state terms like "hunger", as unobservable and unscientific. Instead, he defined states of the organism in terms of publicly observable external referents, like hours and days of food-deprivation. Anyway, in Phase 1 of a typical instrumental conditioning experiment a food-deprived animal, like a pigeon, is placed in the operant chamber. Unlike Pavlov's dogs, which were restrained by harnesses, Skinner's pigeons were able to move about freely, and so they displayed a wide variety of behaviors -- including pecking at the key (pigeons love to peck). Under these conditions, the experimenter observes the base rate of key-pecking behavior (or whatever other behavior is of interest) in the absence of reinforcement.
In Phase 2 of the experiment, the key is connected to the food hopper in such a way that pecking the key causes a food pellet to drop into the hopper; and the pigeon eats it. Pigeons, especially food-deprived pigeons, love to eat. During this phase of the experiment we observe an increase in key-pecking behavior over the baseline. The animal's key-pecking behavior leads to reward, and so, in accordance with Thorndike's Law of Effect, this behavior is strengthened. This is the acquisition phase.
In Phase 3 of the experiment we change the situation a little, so that key-pecks produce food only when the key is illuminated (alternatively, there may be two keys in the chamber, one illuminated and the other dark). When the light is on, key-pecking produces food; when it is off, key-pecking has no effect. During this phase of the experiment, the bird will peck only when the key is illuminated (or, alternatively, the bird will peck only at the key which is illuminated). This is discrimination learning.
In Phase 4 of the experiment we disconnect the key from the hopper entirely, so that key-pecking no longer leads to food at all. Under these circumstances key-pecking eventually returns to the baseline level. This is extinction. As in the case of classical conditioning, we can also observe the spontaneous recovery of an extinguished response, as well as savings in relearning if the key is reconnected to the hopper.

The "Superstition" Experiment

B.F. Skinner demonstrated the power of Thorndike's Law of Effect with the following classic "superstition" experiment. A food-deprived (remember, if you're a behaviorist you can't say hungry) pigeon was placed in an operant chamber. As pigeons are wont to do, it displayed a variety of random pigeon behaviors: it wandered around the chamber, it groomed itself, it flapped its wings and stretched its neck, it cooed, and it pecked at various locations. Every 30 seconds, a food pellet was dropped into the hopper of the operant chamber; this occurred regardless of the pigeon's behavior. Over trials, each bird developed a stereotyped pattern of behavior, but the precise nature of this pattern was different for each bird. The only regularity was this: whatever behavior had been emitted at the time that the first pellet dropped now began to occur more frequently.

This is a classic illustration of the Law of Effect. Initially, the association between behavior and reward was purely accidental. Nevertheless, following the principle that rewarded responses are strengthened, while unrewarded and punished responses are weakened, that particular behavior began to occur more frequently. Therefore, the bird was more likely to be displaying that behavior the next time a food pellet dropped into the hopper. So, that behavior was strengthened even more. Eventually, whatever behavior had originally coincided with reinforcement comes to dominate the behavior of that individual bird -- all because of an initially accidental link between behavior and reward.
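The feedback loop just described can be sketched as a toy simulation. This is only an illustration, not a model of Skinner's data: the behavior list, the one-behavior-per-tick assumption, and the `boost` parameter are all hypothetical choices made for the sketch.

```python
import random

def superstition_run(behaviors, n_ticks=3000, pellet_every=30, boost=1.0, seed=0):
    """Simulate accidental reinforcement on a clock-driven pellet schedule.

    Returns the final strength (weight) of each behavior. All parameter
    values are illustrative assumptions, not values from the experiment.
    """
    rng = random.Random(seed)
    weights = {b: 1.0 for b in behaviors}  # every behavior starts equally likely
    for tick in range(1, n_ticks + 1):
        # The bird emits one behavior per tick, in proportion to current strengths.
        current = rng.choices(list(weights), weights=list(weights.values()))[0]
        # Food arrives on the clock, regardless of what the bird is doing.
        if tick % pellet_every == 0:
            # Law of Effect: whatever happened to precede food is strengthened.
            weights[current] += boost
    return weights

behaviors = ["turning", "grooming", "wing-flapping", "neck-stretching", "pecking"]
final = superstition_run(behaviors)
dominant = max(final, key=final.get)
print(dominant, final)
```

Changing the `seed` changes which behavior pulls ahead, which is the point: the winner is determined by accident, not by any real connection between the behavior and the food.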

And the "Air Crib"

There's a kind of urban legend circulating that Skinner raised his children in an infant-sized Skinner box: it's not true. Skinner, an inveterate tinkerer, did invent what he called the "Air Crib", a climate-controlled environment which he hoped would ease some of the burdens of child-rearing and foster child development. The Air Crib looked like a regular, if somewhat large, crib. It had a ceiling, three opaque walls, and a glass pane which could be opened to move the infant in and out. There were controls for temperature and humidity, a canvas floor, and sheeting which could be removed and washed when soiled. In this way, the infant had considerable freedom of movement. Skinner publicized the Air Crib in an article in the Ladies Home Journal entitled "Baby in a Box: The Mechanical Baby-Tender" (1945). It has been estimated that at least 300 infants were raised in a version of the Air Crib (see Robert Epstein, "Babies in Boxes", Psychology Today, 1995). And contrary to rumors that Skinner's daughter Deborah eventually sued her father and committed suicide, she was alive and well in 2004, when she wrote a newspaper Op-Ed piece in the (Manchester) Guardian that was very positive about both Skinner and the device.

And a Surely Apocryphal Story

There is a story -- recounted by Martin E.P. Seligman in his autobiography, The Hope Circuit: A Psychologist's Journey from Helplessness to Optimism (2019) -- that the students in one of Skinner's undergraduate classes decided to look interested only when he moved toward the left corner of the stage, with the result that, eventually, he ended up stuck there.

The Vocabulary of Instrumental Conditioning

The experiment described above illustrates the basic vocabulary of instrumental conditioning, whose terms largely parallel those of classical conditioning -- though be careful, because their meanings sometimes change slightly.

Reinforcement (Rft) is an event which increases the strength (probability) of the behavior (the conditioned response) which preceded it.

Positive reinforcement is presented following the conditioned response;
Negative reinforcement is terminated following the conditioned response.

Note that "positive" and "negative" do not necessarily mean "pleasant" (e.g., food) and "aversive" (e.g., shock). As it happens, positive reinforcers are typically pleasant (presentation of food is a good thing if you're a food-deprived pigeon); but then again, so are negative reinforcers (termination of shock is also a good thing). Reinforcers always increase the probability of the behavior being reinforced. This is the hardest thing about instrumental conditioning to get straight, because it is the most counterintuitive use of language. Blame Skinner, don't blame me. When most people think of "negative reinforcement", they really mean "punishment". Punishment has a technical meaning in the literature on instrumental conditioning, as it entails the presentation of a negative reinforcer.

A conditioned response (CR) is the behavior which is strengthened by reinforcement. The strength of the CR is usually indicated by response rate, or the frequency with which the organism displays the behavior.

A conditioned stimulus (CS) is an environmental event which leads to the performance of a conditioned response. Put another way, the CS is a signal or cue that the CR will be reinforced. Sometimes, as in Phase 2 of the typical experiment described above, the CS is the operant chamber itself. That is, the presence of the pigeon in the chamber is a cue that key-pecking will produce food. Other times, as in Phase 3 described above, the CS is some discrete feature of the environment -- such as a lighted key, or a buzzer or tone.

These technical definitions of CS and CR give us the term stimulus-response (or S-R) learning theory. The animal learns that emitting the CR (key-pecking) in the presence of the CS (the illuminated key) leads to reinforcement (food in the hopper). Or, to be a strict, radical, Skinnerian, functional behaviorist, reinforcement of the CR in the presence of the CS leads to an increase in the rate of the CR.

Classical conditioning can also be described in S-R terms. The key is to remember how instrumental conditioning defines reinforcement -- as any stimulus that increases the likelihood of the conditioned response. Thus, in classical conditioning, the CR (e.g., salivating) is reinforced by the US (meat powder) in the presence of the CS (the bell). By virtue of this reinforcement, the CR comes to be emitted in the presence of the CS.

Note that in instrumental conditioning there is no discussion of unconditioned stimuli or unconditioned responses. This is because the behaviors in question are not reflexive in nature, as they are in classical conditioning. Rather, these behaviors are emitted spontaneously by the organism. They are what we ordinarily call voluntary, as opposed to the involuntary behaviors involved in classical conditioning -- except that radical behaviorists like Skinner didn't like to talk about "voluntary" responses, or anything else that smacked of "free will", because they felt that all behaviors were under control of environmental stimuli and reinforcements.

The Phenomena of Instrumental Conditioning

Similarly, the major phenomena of instrumental conditioning parallel the classical case.

There is the acquisition of a conditioned response by means of reinforcement;
the extinction of that response by withholding reinforcement;
the generalization of the CR across a generalization gradient as a function of the similarity between the test stimulus and the original CS;
and discrimination learning in response to a discriminative stimulus which indicates when the CR will be reinforced.

Schedules of Reinforcement

To a great degree, the major phenomena of instrumental conditioning parallel those observed in the classical case: acquisition, extinction, generalization, and discrimination. However, studies of instrumental conditioning also illustrate a new concept: schedules of reinforcement, each schedule resulting in a different pattern of behavior.

The term refers to the contingent relationship between the organism's emission of its response and the environment's delivery of reinforcement. In the continuous case, reinforcement is delivered after every CR. In the partial case, reinforcement is occasionally withheld. Partial reinforcement retards acquisition, but it also increases resistance to extinction.

Continuous and partial reinforcement are also terms that occur in the vocabulary of classical conditioning, and they have the same effects. But there is another category of reinforcement schedules, intermittent reinforcement, that is unique to instrumental conditioning. There are four general types of intermittent schedules of reinforcement.

In fixed ratio (FR) schedules, reinforcement is delivered after a specific number of CRs (thus, an FR7 schedule delivers reinforcement after the organism has made 7 CRs).
In variable ratio (VR) schedules, the ratio of responses to reinforcements varies randomly around some average (thus, in a VR7 schedule, the organism may be reinforced after 5, 6, 7, 8, or 9 CRs, etc., but the ratio will average 7 CRs to every reinforcement).
In fixed interval (FI) schedules, reinforcement is delivered following the first CR after a specific time interval has elapsed (thus, in an FI30 schedule, the organism is reinforced for the first CR it makes once 30 seconds have elapsed since the last reinforcement, but not before, regardless of the number of CRs it has emitted in the meantime).
In variable interval (VI) schedules, the required interval varies randomly around some average (thus, in a VI30 schedule, reinforcement might become available 20, 25, 30, 35, or 40 seconds after the last reinforcement, averaging out to 30 seconds).
Another schedule is the differential reinforcement of low rates (DRL), in which reinforcement is delivered only if a long interval (say, 30 seconds) elapses between CRs. In the differential reinforcement of high rates (DRH), reinforcement is delivered only if the interval is very short (say, 1 second).
Other schedules of reinforcement represent variations and combinations of these.
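For readers who think better in code, the four basic schedules can be sketched as small objects that decide, response by response, whether reinforcement is delivered. This is a minimal sketch under stated assumptions: the class names, the `respond(t)` method, and the `spread` and `seed` parameters are all invented for illustration, and the interval schedules here time from the last reinforcement.

```python
import random

class FixedRatio:
    """FR-n: reinforce every n-th response."""
    def __init__(self, n):
        self.n, self.count = n, 0
    def respond(self, t):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True
        return False

class VariableRatio:
    """VR-n: the required number of responses varies randomly around n."""
    def __init__(self, n, spread=2, seed=0):
        self.n, self.spread, self.rng = n, spread, random.Random(seed)
        self.count, self.required = 0, self._draw()
    def _draw(self):
        return self.rng.randint(self.n - self.spread, self.n + self.spread)
    def respond(self, t):
        self.count += 1
        if self.count >= self.required:
            self.count, self.required = 0, self._draw()
            return True
        return False

class FixedInterval:
    """FI-s: reinforce the first response made once s seconds have elapsed
    since the last reinforcement; earlier responses earn nothing."""
    def __init__(self, s):
        self.s, self.last = s, 0.0
    def respond(self, t):
        if t - self.last >= self.s:
            self.last = t
            return True
        return False

class VariableInterval:
    """VI-s: like FI, but the required interval varies randomly around s."""
    def __init__(self, s, spread=10, seed=0):
        self.s, self.spread, self.rng = s, spread, random.Random(seed)
        self.last, self.required = 0.0, self._draw()
    def _draw(self):
        return self.rng.uniform(self.s - self.spread, self.s + self.spread)
    def respond(self, t):
        if t - self.last >= self.required:
            self.last, self.required = t, self._draw()
            return True
        return False

fr7 = FixedRatio(7)
print([fr7.respond(t) for t in range(1, 15)].count(True))  # 2 reinforcements in 14 responses

fi30 = FixedInterval(30)
print([t for t in range(0, 100, 5) if fi30.respond(t)])  # [30, 60, 90]
```

Note the difference the usage lines bring out: on the ratio schedule, responding faster earns food faster; on the interval schedule, a bird pecking every 5 seconds earns no more than one pecking every 30.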

The Cumulative Record

In textbook figures that depict the effects of various schedules of reinforcement, the organism's cumulative responses are plotted as a function of time (plotted on the horizontal or X axis). This is known as a cumulative record of responses. Every time the organism makes a response, the line moves up a notch on the vertical (Y) axis. Thus, a horizontal tracing means that the organism has made no responses. The slope of the tracing indicates the response rate: shallow slopes indicate a slow rate of response (relatively few responses per unit time), while steep slopes indicate a relatively rapid response rate (relatively many responses per unit time).
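To make the relation between slope and response rate concrete, here is a small sketch that builds a cumulative record from a list of response times and reads the rate off as a slope. The function names and the session data are hypothetical, chosen only to show a shallow stretch, a flat stretch, and a steep stretch.

```python
def cumulative_record(response_times, duration):
    """Cumulative response count at each whole second from 0 to `duration`."""
    times = sorted(response_times)
    record, count, i = [], 0, 0
    for t in range(duration + 1):
        # Each response made by time t moves the tracing up one notch.
        while i < len(times) and times[i] <= t:
            count += 1
            i += 1
        record.append(count)
    return record

def response_rate(record, t_start, t_end):
    """Slope of the record between two times, in responses per second."""
    return (record[t_end] - record[t_start]) / (t_end - t_start)

# Made-up session: slow responding for 10 s, a 10 s pause, then a fast burst.
times = [2, 6, 10] + [20 + 0.5 * k for k in range(1, 21)]
rec = cumulative_record(times, 30)
print(response_rate(rec, 0, 10))   # 0.3  (shallow slope)
print(response_rate(rec, 10, 20))  # 0.0  (horizontal tracing: no responses)
print(response_rate(rec, 20, 30))  # 2.0  (steep slope)
```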

B.F. Skinner invented the cumulative record technique, and the term served as the title for a collection of his scientific papers.

Each schedule of reinforcement produces its own characteristic pattern of behavior. For example, DRL schedules typically produce a string of "ritualistic" responses that are ineffective in terms of controlling reinforcement but nevertheless effectively fill the long interval between reinforcements.

FR schedules produce a two-valued learning curve, showing a pause immediately after reinforcement, and then an abrupt shift to a very high response rate.
FI schedules produce a scallop-shaped learning curve, in which response rate diminishes immediately after reinforcement, and then gradually increases as the time for the next reinforcement approaches.

Both features are eliminated by switching from fixed to variable schedules, which produce constant, stable rates of response.

With VR, the organism displays a relatively high rate of responding.
With VI, the rate is somewhat lower.

Both VR and VI schedules are highly aversive for the organism being conditioned.

More on Schedules of Reinforcement

The Matching Law and the Monty Hall Problem

Animals (and humans) can also be put on concurrent schedules of reinforcement. For example, pecking a green key might be reinforced on a VI5 schedule, while pecking a red key might be reinforced on a VI10 schedule. In such cases, the organism will distribute its responses between the two keys in proportion to their rate of reinforcement -- for example, pecking the green key, which pays off twice as often, about twice as frequently as the red key. The fact that animals will distribute their responses in proportion to the rate at which those responses are reinforced is called the matching law, which was first announced by Richard Herrnstein (1970), B.F. Skinner's protege at Harvard; see also the review by Peter de Villiers (1977) -- who was, in turn, Herrnstein's protege.

The matching law, in turn, was one of the first contacts between experimental psychology and neoclassical economic theory, as it seemed to reveal a fundamental, perhaps universal, law governing rational choice.
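The arithmetic behind matching is simple enough to sketch. Assuming that the numbers in "VI5" and "VI10" are mean intervals in the same units, and that the animal responds often enough to collect reinforcers about as fast as they are scheduled, the reinforcement rate on each key is just the reciprocal of its mean interval, and strict matching predicts response shares in the same proportions. The function name below is made up for the illustration.

```python
from fractions import Fraction

def matching_shares(mean_intervals):
    """Predicted share of responses on each key under strict matching.

    `mean_intervals` maps each key to the mean interval of its VI schedule;
    the reinforcement rate is taken to be 1 / interval (an idealization).
    """
    rates = {key: Fraction(1, interval) for key, interval in mean_intervals.items()}
    total = sum(rates.values())
    return {key: rate / total for key, rate in rates.items()}

shares = matching_shares({"green": 5, "red": 10})
print(shares["green"], shares["red"])  # 2/3 1/3
```

So the key on the richer VI5 schedule should attract about two-thirds of the pecks, twice the share of the key on VI10.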

An interesting illustration of the matching law is provided when pigeons are confronted with a version of the Monty Hall problem, popularized by Let's Make a Deal, a television game show. The show's host, Monty Hall, would offer a contestant a valuable prize, such as a car or a vacation, which is hidden behind one of three closed curtains; behind another curtain is nothing; but behind the third curtain is a booby-prize, like a goat. After the contestant makes his choice, Hall opens one of the curtains to reveal nothing, and then offers the contestant the opportunity to change his mind. Note that, at this point, the prize lies behind one of the remaining curtains, while the goat is behind the other one.

Most contestants choose to stick with their original choice (pose this to your friends, and see what they do). But this is the wrong choice. The prior probability that the prize lies behind the contestant's original choice is 1/3. But that's the prior probability that the prize lies behind any one of the curtains, and Hall's revelation does not change it for the original choice. Accordingly, the probability that the prize lies behind the other curtain -- the one that the contestant did not originally choose -- has now doubled to 2/3. Many people don't get this, even after multiple trials with the problem. But it turns out that pigeons catch on pretty quickly -- they're really good at matching responses to reinforcement rates, perhaps because they don't over-analyze the problem, using erroneous theories that lead them to misestimate probabilities. We'll return to the liabilities of estimation later, in the lectures on "Thought and Language".
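If the 1/3 versus 2/3 split still seems wrong, it can be checked by brute force. The sketch below enumerates every combination of prize location and first choice, under the standard assumption that Hall always opens a curtain that is neither the contestant's choice nor the one hiding the prize. The function name is made up, and this is the textbook form of the puzzle rather than a transcript of the show.

```python
from fractions import Fraction
from itertools import product

def monty_hall_odds():
    """Exact win probabilities for staying vs. switching with three curtains."""
    curtains = range(3)
    stay_wins = switch_wins = cases = 0
    for prize, choice in product(curtains, repeat=2):
        # Assumption: Hall opens a curtain that is neither the contestant's
        # choice nor the one hiding the prize.
        opened = next(c for c in curtains if c not in (prize, choice))
        # Switching means taking the one closed curtain not originally chosen.
        switched = next(c for c in curtains if c not in (choice, opened))
        stay_wins += (choice == prize)
        switch_wins += (switched == prize)
        cases += 1
    return Fraction(stay_wins, cases), Fraction(switch_wins, cases)

stay, switch = monty_hall_odds()
print(stay, switch)  # 1/3 2/3
```

Of the nine equally likely combinations, staying wins in the three where the first choice was right, and switching wins in the other six.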

Animal Behavior Enterprises

It's probably obvious, but the principles of instrumental conditioning provide the means by which animals are trained -- whether circus elephants, dolphins at Sea World, or -- I kid you not -- pigeons directing guided missiles.

During World War II, Skinner himself had a contract with the War Department to develop a homing system for guided missiles. A television camera would be mounted in the nose cone, and the pigeon would be trained to peck at the target to keep the missile on course. Of course, the pigeon was also in the nose cone, but let's not go there.

Nothing ever came of Skinner's project, but two of his graduate students, Keller and Marian Breland, shunned academic careers and went into business as Animal Behavior Enterprises, training animals for entertainment and advertising. Eventually, they set up a sort of amusement park in Hot Springs, Arkansas, the IQ Zoo, where paying customers could watch animals perform amazing tricks, all by virtue of instrumental conditioning and the schedules of reinforcement. Perhaps the most famous was the Bird Brain, designed by Bob Bailey, one of their associates, featuring a chicken who appeared to play tic-tac-toe -- and defeated all opponents, even Skinner himself (photos from the University of Central Arkansas).

Later, and bringing things full circle, the Brelands and Bailey became involved in a Navy project to train dolphins for such tasks as mine-sweeping and retrieving tools that had been dropped by divers. Later, they worked with the CIA to train birds and mammals to serve as couriers, and also to implant listening devices.

Of course, various technological developments rendered the whole program moot. But the principles introduced by Skinner, and developed by the Brelands and their associates, are still at the core of training techniques used today at places like Sea World.

For the whole story of Bailey, the Brelands, and the CIA, see "Animal Intelligence" by Tom Vanderbilt, Smithsonian, 10/2013.

Schedules of Reinforcement, Your Smartphone, and You

The people who run Facebook, Twitter, Instagram, and other social media make deft use of schedules of reinforcement in order to keep subscribers checking their sites. In particular, social media "notifications" make use of variable-ratio and especially variable-interval schedules to create something like addictive behavior.

You get a notification that a message is waiting for you. You check your mail, and find it's an ad from Home Depot.
As soon as you turn your attention to something worthwhile, you get another notification: this time, it's from Office Max.
And then another one, but this time it's from your boyfriend (or, better yet, about your ex-boyfriend). You read it avidly, and then turn your attention elsewhere.
And then another one comes; maybe it's another one about your ex, or it's your boyfriend after all (why isn't he writing?).

And so it goes.... Notice that you're reinforced only on an intermittent schedule, and that the schedule is actually pretty much unpredictable. So you keep checking, and checking, and checking. Notice, in the graph earlier, that VR schedules produce very high rates of responding. And recall, from the discussion earlier, that VR and VI schedules are both highly aversive.

Sound familiar? Apparently, your cat does this, too. Probably your dog as well.

There are ways to overcome this problem. I experienced it myself when I retired and we put our house up for sale. Our real-estate agent, who was very good, insisted on communicating with us via text messaging. My wife and I had never texted: in fact, we had to go to the Geek Squad to find out how to do it. But for the next two months, until the sale of the house was complete, we were constantly receiving notifications that messages were waiting. And because of the way we had our notifications set up, we got notifications for every email, and everything else that came across our phones. It was awful. But it was also addictive, because we had to keep checking in case one of those messages was important. We were relieved when the house sold, and we could just turn off notifications altogether, and get back to our lives. Now we still never text, unless one of us is traveling.

So that's the first strategy for ridding yourself of the tyranny of notifications. Turn notifications off. Then put yourself on a fixed-interval schedule of reinforcement. Check your phone only a couple of times a day. Or, if you can't manage that, then only once an hour (like on the hour). Or set up "push" notifications, so that you only get notified when certain individuals send you a message.

For more on this topic, see "How to Stop Checking Notifications All the Time" by Smitha Milli, a graduate student in computer science at UC Berkeley, studying artificial intelligence and machine learning.

Also Hooked: How to Build Habit-Forming Products (2014) by Nir Eyal, which laid out for the social-media and gaming industries all the little tricks, beginning with good old-fashioned Skinnerian variable schedules of reinforcement, to "subtly encourage customer behavior" and "bring users back again and again". He called it the "Hook Model", which tells you all you want to know. Subsequently, Eyal produced another book, Indistractable: How to Control Your Attention and Choose Your Life (2019), as a kind of corrective to his earlier work. One tactic: silence "notifications" so there won't be so many interruptions to what you should be doing. Another: always sit next to someone who can see your screen. Eyal has been criticized for selling the disease, and he does seem to be trying to have it both ways; at the same time, he rejects the idea that social media and gaming are addictive. The fault, he thinks, is not in the technology, but in us. Still, his strategies and tactics for avoiding internet and social-media addiction seem useful.

This whole system has come to be known as the attention economy, which is an offshoot of the information economy. Herbert Simon, a psychologist who won the Nobel Memorial Prize in Economics (see more in the lectures on Thinking), noted that the amount of information available in the environment exceeds the limits of human comprehension. Obtaining information comes with a price, in that it consumes attention, which is a limited resource (see more in the lectures on Memory). It follows, Simon noted, that "a wealth of information creates a poverty of attention, and a need to allocate that attention efficiently among the overabundance of information sources that might consume it". Simon called for the development of a new profession of "information managers" to help us obtain and manage information.

But most of us don't have professional information managers, at the same time that advertisements, tweets, Facebook posts, blogposts, and the like keep coming at us from all quarters through our laptops, tablets, and smartphones. Every time your "notifications" go off, somebody is competing for your attention. Most likely, they accrue revenue each time you click on a link. Even if not, you're paying the price, by paying attention to the notification when you could be paying attention to something else much more interesting and/or productive.

For more on the information economy, see "The Information Economy" by Hal R. Varian, Scientific American, 09/1995.

For more on the attention economy, see "The Attention Economy" by Filippo Menczer and Thomas Hills, Scientific American, 12/2020.

Me, personally:
I don't blog and I don't tweet; I'm not on Facebook and I'm not Linked-in.

The Scope of Instrumental Conditioning

By means of instrumental conditioning in general, and schedules of reinforcement in particular, voluntary behaviors come under the control of environmental events. The phenomena of instrumental conditioning are ubiquitous, or nearly so: every vertebrate organism, and some invertebrates as well, is capable of acquiring behaviors under conditions of reward and punishment.

Thorndike and Skinner believed that most adaptive behavior is the product of instrumental conditioning. Again, their position is probably too extreme. But the laws of instrumental conditioning do appear to account for the acquisition, maintenance, and loss of both adaptive and maladaptive voluntary behavior -- habitual behaviors of all sorts, and actions performed under conditions of incentive.

Classical and Instrumental Conditioning Compared and Combined

In several respects, classical and instrumental conditioning appear to represent two quite different forms of learning.

Classical Conditioning	Instrumental Conditioning
Reinforcement is not contingent on the organism's behavior. The US is delivered following the CS, no matter what the organism does.	Reinforcement is contingent on the organism's behavior. The "reward" or punishment is not delivered unless the organism makes the response to be conditioned.
The response to be conditioned is elicited involuntarily by the US.	The response to be conditioned is spontaneously emitted by the organism as a "voluntary" behavior.
The response being conditioned is "involuntary" (or reflexive) in nature.	The response being conditioned is a "voluntary" (or spontaneous) response.
Because classical conditioning is limited to involuntary, reflexive responses, relatively few responses can be conditioned.	Because instrumental conditioning is open to any behavior (or combination of behaviors) the organism is capable of emitting, a large, possibly infinite, number of responses can be conditioned.

One Form of Learning After All?

The two forms of conditioning involve quite different procedures for studying learning:

In classical conditioning, the organism forms an association between two stimuli, the CS and the US.
In instrumental conditioning, the organism forms an association between a stimulus (the CS) and behavior (the CR).

Donahoe and Vegas (2004) have argued that these differences are more apparent than real, and that classical conditioning also entails an association between the CS and the CR.

On the other hand, it seems equally likely that in instrumental conditioning the organism is forming an association between two stimuli -- between the CS and the reinforcement.

Ultimately, as Donahoe and Vegas argue, it may be that classical and instrumental conditioning are simply two forms of the same underlying learning process. But for now, the procedural differences between them are great enough that we will continue to consider them to be different forms of learning. As will be argued later, in classical conditioning the organism learns to predict events; in instrumental conditioning the organism learns to control them.

Avoidance Learning

Although classical and instrumental conditioning appear (to me, anyway) to represent two different forms of learning, most examples of adaptive behavior appear to involve combinations of classical and instrumental conditioning. That is, through classical conditioning the organism learns to anticipate some future event; through instrumental conditioning it learns to cope with that event.

This sort of combination has been studied in the laboratory in the form of avoidance learning. The procedure in a typical avoidance learning experiment is as follows:

A dog is placed in a long apparatus known as a shuttlebox, consisting of two compartments separated by a low barrier.
A tone CS is followed by a shock US, as in a standard classical conditioning experiment.
If the dog moves to the other compartment during the shock, the tone and the shock are both terminated immediately. This is known as an escape response.
If the dog moves to the other compartment during the tone-shock interval, after the tone comes on but before the shock comes on, the tone is terminated immediately and the shock never comes on at all, until the next trial. This is known as an avoidance response.
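The difference between escape and avoidance comes down to when the response occurs relative to shock onset. Here is a minimal sketch of that rule. The function name, the use of seconds measured from tone onset, and the 10-second tone-shock interval in the usage lines are all hypothetical choices made for illustration, not details of any particular experiment.

```python
from typing import Optional


def classify_trial(response_time: Optional[float], shock_onset: float) -> str:
    """Label one shuttlebox trial by when the dog crossed the barrier.

    Times are in seconds from tone onset (an assumed convention).
    response_time is None if the dog never crossed on this trial.
    """
    if response_time is None:
        return "no response"
    if response_time < shock_onset:
        # Crossed during the tone-shock interval: the shock never comes on.
        return "avoidance"
    # Crossed while the shock was on: tone and shock both terminate.
    return "escape"


# Hypothetical latencies over training, with shock scheduled 10 s after the tone.
latencies = [None, 25.0, 14.0, 11.0, 8.0, 3.0]
labels = [classify_trial(t, shock_onset=10.0) for t in latencies]
print(labels)
```

Run over a sequence of trials, the labels trace the course of training described next: no response at first, then escapes of decreasing latency, then avoidance.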

Early in training, the animal neither escapes nor avoids, but (naturally) shows agitation when the shock is presented.

This agitated behavior leads, inadvertently, to escape -- much like Thorndike's cats inadvertently tripped the latch to open the door to their puzzle boxes. Over successive trials, the latency of the escape decreases.
Eventually the animal makes the "escape" response during the tone-shock interval, before the shock even comes on; this is, effectively, the first true avoidance response. Over further successive trials, the latency of the avoidance response decreases, until the animal makes it shortly after the tone is presented.

At this point, the experimenter may turn the shock off entirely. Even so, the animal will continue to make avoidance responses, as if the shock were still connected. In this sense, avoidance learning shows a failure of extinction.

The two-factor theory of avoidance learning proposed by O. Hobart Mowrer (1947) illustrates how avoidance combines classical and instrumental conditioning. According to Mowrer, by virtue of the pairing of the tone CS with the shock US two kinds of learning occur.

Because the unconditioned response to shock is fear, the animal acquires a classically conditioned fear response to the tone.
On the instrumental side, the escape response is reinforced by the termination of the shock (and the reduction of unconditioned fear), while avoidance is reinforced by the termination of the tone (and the reduction of conditioned fear).

As we will see later, Mowrer was somewhat wrong to attribute avoidance learning to the reduction of conditioned fear, but his essential point, that avoidance combines classical and instrumental conditioning, remains valid.

What is Learned in Conditioning?

So far, we have simply described the phenomena of conditioning -- acquisition, extinction, generalization, discrimination, reinforcement, and the like. But what actually happens in learning? Or, put another way, what is the organism learning from experience?

The Stimulus-Response Theory of Learning

Learning was once thought to be as automatic as reflexes, taxes, and instincts. Just as these are innate stimulus-response associations, part of the organism's biological endowment, so classical and instrumental conditioning were thought to represent acquired stimulus-response connections, formed as a result of experience but no less automatic.

As its name implies, S-R learning theory holds that what is learned in conditioning is an association between a stimulus and a response -- an association that is strengthened by reinforcement.

In the case of Pavlov's dogs, the association is between the bell CS and salivation, and the salivary CR is reinforced by the meat powder US.
In the case of Thorndike's cats, the association is between the puzzlebox CS and the lever-pressing, and the lever-pressing CR is reinforced by escape.
In the case of Skinner's pigeons, the association is between the key CS (or, perhaps, the illuminated key) and key-pecking, and the key-pecking CR is reinforced by food pellets.

Traditional stimulus-response theories of learning were based on four assumptions:

Association by Contiguity: associations are formed between events that occur close together in space and time. Or, put another way, the repeated co-occurrence of two events creates an association between them, so that the appearance of one evokes the idea of the other. In classical conditioning, the contiguity is between two events in the environment: the conditioned stimulus and the conditioned response induced by reinforcing the conditioned stimulus with the unconditioned stimulus. In instrumental conditioning, the contiguity is between the organism's behavior (the conditioned response) and the situation (the conditioned stimulus) in which it is reinforced.
Arbitrariness: By virtue of reinforcement, any stimulus can become associated with any response, so long as the stimulus can be sensed by the organism (a blind rat can't respond to a visual stimulus) and the response is in the organism's repertoire as a voluntary or involuntary action (a rat can't be conditioned to fly). The arbitrariness assumption is also known as equipotentiality, a term already introduced in the discussion of the functional specialization of the brain.
The empty organism: Behavior (remember that the proponents of the S-R theory were mostly behaviorists in the mold of Watson and Skinner) can be understood solely in terms of stimulus inputs to and response outputs from the organism. In order to understand learning and other aspects of behavior, we do not need to go "inside" the organism to understand its inner structures and functions. We need only focus on stimuli and responses, and can treat the organism as if it were empty. In other words, the organism can be thought of as a "black box" connecting stimuli and responses -- a black box that need never be opened.
The passive organism: The organism is not active during learning. Rather, all the "action" is in the environment, which "stamps in" associations between contiguous stimuli and responses. This assumption gives us the metaphor of "conditioning", and the idea that behavior (reflexes in the case of classical conditioning, non-reflexive behaviors in the case of instrumental conditioning) is under the control of environmental events. There is no notion of intentionality or free will in stimulus-response behaviorism, nor any valid distinction between "voluntary" and "involuntary" behaviors -- because the very notion of a behavior being "voluntary" smacks of free will and mentalism, both anathema to radical behaviorism.

The stimulus-response theory of learning, and the assumptions on which it was predicated, dominated the study of learning for more than 50 years since Watson. Beginning in the 1960s, however, experiments began to challenge this view of learning as a passive, associationistic process. These experiments showed that there were two broad types of constraints on what can be learned -- biological and cognitive. And in revealing these constraints, research overturned the four assumptions of S-R learning theory and completely changed our view of learning.

Biological Constraints on Learning

One important line of research challenged the arbitrariness assumption that organisms could learn to attach any response in their repertoire to any stimulus in the environment, by showing that some conditioned responses are easier to acquire than others.

This research begins with work by the American psychologist John Garcia and his colleagues on a phenomenon known as taste-aversion learning (or bait shyness). Garcia grew up on a sheep ranch in the American southwest, where ranchers routinely used poison to control coyotes and other predators. Garcia knew from this experience that when animals eat poisoned food or drink poisoned liquids, and nonetheless survive, they will avoid that substance later (hence the term, "bait-shyness"). Garcia and his associates developed a laboratory analogue of bait-shyness in an attempt to study the anticipatory nausea which some cancer patients develop in the course of receiving chemotherapy. Garcia's paradigm was a variant on classical fear conditioning:

Rats were exposed to a compound CS while drinking water. By "compound", we mean that the CS was not a simple stimulus, such as Pavlov's bell. Rather, it was characterized as "bright, noisy, sweet" water: the water was flavored with saccharin, and there was a flashing light and clicking sound in the background. The animals were exposed to all three elements of the compound CS simultaneously while water was made available for them to drink. And because they were all somewhat water-deprived, they all drank during exposure to the compound CS.
Exposure to the CS was followed by one of two unconditioned stimuli:

Foot shock: the delivery of an electrical shock through the floor grid of the test cage -- which elicits pain immediately as a UR.
A sub-lethal dose of X-rays, which induced nausea in the rats some time later. Note that X-rays cannot be sensed by the organism: they are invisible, make no sound, have no taste or smell, and cannot be felt. This fact is important, because it makes it clear that any association established is between the conditioned stimulus (bright, noisy, sweet water) and the unconditioned response (nausea), as traditional S-R theories of learning hold. There can be no association established involving an event that the organism cannot pick up through its sensory apparatus.

Garcia and his associates found that the animals' avoidance behavior depended on the US to which they had been exposed.

Later, learning was tested through an avoidance procedure. The animals were presented with two sources of water, and allowed to drink from either one.

From one source, the water was flavored with saccharin, but there were no sounds or lights presented.
From the other source, the water was unflavored, but drinking was accompanied by flashes and clicks.

If the US had been foot-shock, they avoided water associated with the bright, noisy CS and preferred the water associated with the sweet CS.
If the US had been X-rays, they avoided the sweet water and preferred the bright, noisy water.

In other words, the animals formed associations between shock and sight and sound, and between nausea and taste; but they made no connection between nausea and sight and sound, or between shock and taste. This outcome violates the arbitrariness assumption of traditional S-R theories of learning, because all elements of the compound CS occur at precisely the same time and place. Thus, they all have precisely the same spatial and temporal contiguity with respect to the US. Therefore, under the assumption of arbitrariness or equipotentiality, they should all have been equally powerful as CSs. But they were not.

This experimental outcome is commonly interpreted as indicating that the potency of a stimulus is related to the evolutionary history of the species. Rats are nocturnal animals, and under ordinary circumstances choose their food according to its taste. Therefore, their evolution has disposed them to form associations between the taste of food and its gastrointestinal consequences, but not between sight or sound and nausea. The explanation is supported by experiments on birds (like quail), who are sight-feeders. They quickly form associations between nausea and visual stimuli, but not between nausea and taste.

From Coyotes to Sheep to Wolves

Garcia became interested in bait shyness because of its use by sheepherders and other ranchers in the natural control of coyotes and other predators, but you don't have to be a predator to be susceptible to bait shyness.

In 2007, Morgan Doran, a farm advisor with the University of California Agricultural and Natural Resources Cooperative Extension, based in Davis, began a program of research on bait-shyness in sheep. Sheep and goats are often used for brush control and weed abatement -- you can see them, for example, in the Oakland and Berkeley Hills in an attempt to prevent wildfires from spreading through dry overgrowth. And vintners have been interested in using this same technique for weed control in vineyards.

That's all very good on paper, but the practical problem is how to get the sheep to eat the weeds, and not the very tasty tender shoots of young grapevines!

In Doran's study, a group of sheep is allowed to feed freely on vine leaves, and then they are fed a capsule filled with lithium chloride -- which, while not lethal, induces pretty severe nausea. A control group is also allowed to feed on the grape leaves, but gets a placebo capsule. Results from a pilot study indicate that the sheep will, in fact, avoid the grape leaves in the field, and focus their feeding on the weeds.

A similar project is underway in Marin County's dairyland, where cattle have been trained to prefer a particular kind of thistle.

Turning the tables, bait-shyness (and preparedness) has been enrolled in the effort to protect the Mexican wolf, which was hunted to near extinction by ranchers seeking to protect their cows and sheep from predation. An experiment with captive Mexican wolves shows promise in getting the animals to avoid sheep, and might be effective in wildlife management as well.

Who says that animal research has no practical significance!? Or that it's bad for the animals?

Contiguity versus Contingency in Conditioning

The principle of association by contiguity, already challenged by Garcia's experiments on taste-aversion learning, is further undermined by certain peculiarities of classical conditioning.

In what is known as the standard paradigm for classical conditioning, the CS precedes the US by a short interval, approximately 1 second, and the termination of the CS is simultaneous with the onset of the US. This situation usually yields excellent conditioning.

In delay conditioning, the duration of the CS is lengthened, although its termination is still simultaneous with the onset of the US. This situation also yields good conditioning, even though the temporal contiguity between CS and US onset has been degraded somewhat by the delay.

In trace conditioning there is also a delay, but in this case the CS goes off before the US comes on. In other words, whereas in delay conditioning there is a delay between CS onset and US onset, in trace conditioning there is a delay between CS offset and US onset. Nevertheless, trace procedures also yield good conditioning. Because of the interval between CS offset and US onset, trace conditioning must be mediated by something like a memory trace of the CS -- hence the name given to the procedure. But the important point is that, as in delay conditioning, trace conditioning gives good results despite the degradation in temporal contiguity between CS and US.

In simultaneous conditioning, the onset of the CS and the onset of the US occur at precisely the same time. Obviously, this situation optimizes temporal contiguity. Nevertheless, in contrast to the standard, delay, and trace paradigms, conditioning does not occur in the simultaneous paradigm -- even though there is perfect contiguity between the CS and the US.
In backwards conditioning, the onset of the CS actually follows the onset of the US. However, the temporal distance between the two stimuli is preserved -- for example, a US-CS interval of about 1 second. In other words, the CS and the US are still highly contiguous in terms of the spatial and temporal relations between them. Nevertheless, no conditioning occurs. In fact, there is evidence that the formation of the CR is actually inhibited in the backwards paradigm. For example, in a standard fear-conditioning experiment, where a tone CS precedes a shock US by about 1 second, the animal will quickly acquire a conditioned response of heart-rate acceleration to the tone (remember that one of the components of the fight-or-flight response, mediated by activation of the sympathetic branch of the autonomic nervous system, is an increase in heart rate). However, in backwards conditioning, where the shock precedes the tone, the animal will actually show heart-rate deceleration in response to the tone.
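The five paradigms just described differ only in the timing of CS onset, CS offset, and US onset. As a rough sketch, the rules can be written as a small function. The function name, the argument names, and the 1-second cutoff separating "standard" from "delay" are assumptions made for illustration; the text gives only "approximately 1 second" for the standard interval.

```python
def classify_paradigm(cs_on: float, cs_off: float, us_on: float,
                      standard_gap: float = 1.0) -> str:
    """Name the classical-conditioning paradigm from stimulus timing (seconds).

    standard_gap is an assumed cutoff: a CS-US onset interval no longer than
    this counts as the standard paradigm, anything longer as delay.
    """
    if cs_on > us_on:
        return "backwards"      # CS onset follows US onset
    if cs_on == us_on:
        return "simultaneous"   # onsets coincide
    if cs_off < us_on:
        return "trace"          # CS goes off before the US comes on
    # The CS is still on (or just ending) when the US comes on.
    if us_on - cs_on <= standard_gap:
        return "standard"
    return "delay"


print(classify_paradigm(cs_on=0.0, cs_off=1.0, us_on=1.0))   # standard
print(classify_paradigm(cs_on=0.0, cs_off=5.0, us_on=5.0))   # delay
print(classify_paradigm(cs_on=0.0, cs_off=1.0, us_on=3.0))   # trace
print(classify_paradigm(cs_on=0.0, cs_off=1.0, us_on=0.0))   # simultaneous
print(classify_paradigm(cs_on=1.0, cs_off=2.0, us_on=0.0))   # backwards
```

Note that nothing in these timing rules says which paradigms produce conditioning; that depends on whether the CS predicts the US.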

These kinds of results highlight the distinction between contiguity and contingency.

In contiguity, the CS co-occurs with the US: they are contiguous, or close together, in space and time.
In contingency, the CS predicts the US: the occurrence of the US is contingent on the prior occurrence of the CS.

Given the results just summarized, we can conclude several things about the role of contiguity and contingency in conditioning.

Conditioning is best when the CS and US are both contiguous and contingent -- as in the standard paradigm, where the CS predicts that the US will occur shortly.
Conditioning is also good when the CS and US are contingent but not contiguous -- as in delay and trace conditioning, where the CS predicts that the US will occur after some delay.
Conditioning is poor when the CS and US are contiguous but not contingent -- as in simultaneous conditioning, where the CS cannot predict the US because the two stimuli occur simultaneously.
Conditioning is actually inhibited in backwards conditioning, where the CS occurs close in time to the US, but the CS actually predicts the absence of the US.
Conditioning is also inhibited in extinction, where the CS no longer predicts that the US is forthcoming. In "extinction below zero", the conditioned inhibition is strengthened even further.

According to conventional S-R learning theory, associations are formed by virtue of the spatiotemporal contiguity between events in the environment, stimuli and responses, or actions and their outcomes. That is to say, associations are formed between two elements that occur closely together in space and time. However, an increasing body of evidence, including the outcomes of various classical-conditioning paradigms, indicates that contiguity is not the important element in learning. Rather, the important element is contingency: the degree to which one event (etc.) predicts another (etc.).

Put another way, conditioning occurs when the CS acts as a signal that the US is forthcoming. In backwards conditioning, however, the CS signals that the US is not forthcoming. In backwards fear conditioning, the CS actually serves as a safety signal -- informing the animal that the shock will not be forthcoming for a while. The CS has value as a signal only when there is a contingent relationship between the CS and the US, regardless of whether the CS and US are temporally and spatially contiguous. The conclusion is that contingency is more important than contiguity: conditioning occurs only when the CS predicts the US. When the CS is uninformative about the US, no conditioning occurs. And when the CS predicts the absence of the US, as in extinction or backwards conditioning, the CR is actually inhibited.

The Rescorla Experiment

A compelling demonstration of the role of contingency in classical conditioning was provided in a classic experiment by Robert Rescorla (1967), for his doctoral dissertation at the University of Pennsylvania (after many years at Yale, Rescorla returned to his alma mater in a faculty role). In this experiment, Rescorla varied the predictability of a shock US, given the presentation of a tone CS.

In one condition of the experiment, the CS was a perfect predictor of the US, in that the CS always immediately preceded the US (that is, within 1 second or so). No CS was ever presented that was not immediately followed by a US; and no US was ever presented that was not immediately preceded by a CS. Thus, expressed in terms of probabilities:

p(US | CS) = 1.0 [read this as "the probability that the US will occur given the prior occurrence of the CS is 1"]; and
p(US | no CS) = 0.0 [read this as "the probability that the US will occur given no prior occurrence of the CS is 0"].

This condition resulted in very good conditioning.

In another condition of the experiment, the CS was a less-than-perfect predictor, because Rescorla interspersed a number of unreinforced CSs -- that is, CSs that were not immediately followed by USs. Thus, of all the CSs that were presented, half were not followed by USs. However, the US never occurred unless it was immediately preceded by a CS. Again, expressed in terms of probabilities:

p(US | CS) = 0.5 and p(US | no CS) = 0.0.

This condition still resulted in fairly good conditioning.

In a third condition of the experiment, the CS was rendered ineffective as a predictor of the US, because Rescorla interspersed a number of unsignalled USs -- in fact, half of the USs -- USs that were not immediately preceded by CSs. Now, the situation was that CSs and USs occurred randomly, independently of each other. Expressed in terms of probabilities:

p(US | CS) = 0.5 and p(US | no CS) = 0.5.

Under these conditions, no conditioning occurred, even though the CS and US were frequently presented together in the same place at the same time.

The upshot of Rescorla's experiment, which stands as a modern classic in psychology, is that conditioning is not simply the formation of an association between spatially and temporally contiguous stimuli. Rather, conditioning occurs only when the CS provides information about the US. The amount of information provided may be estimated as the difference between two probabilities:

p(US | CS) - p(US | no CS).

In the first condition of Rescorla's experiment, this difference is 1.0, and results in good conditioning.
But in the second condition, this difference is reduced to 0.5: still positive, thus resulting in conditioning, but not as high as 1, thus not as good as in the first condition.
In the third condition, the difference is 0.0, and no conditioning results.
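To make the arithmetic concrete, here is a small sketch that computes the two conditional probabilities and their difference from a record of trials. The trial records are hypothetical: the counts are chosen only to reproduce the probabilities given above, not taken from Rescorla's data, and treating the session as a series of discrete intervals in which a CS and a US each either occur or do not is a simplifying assumption.

```python
def contingency(trials):
    """Return (p(US | CS), p(US | no CS), difference) for a trial record.

    trials: list of (cs_present, us_present) boolean pairs, one per interval.
    Assumes the record contains at least one interval with and without a CS.
    """
    with_cs = [us for cs, us in trials if cs]
    without_cs = [us for cs, us in trials if not cs]
    p_us_cs = sum(with_cs) / len(with_cs)
    p_us_no_cs = sum(without_cs) / len(without_cs)
    return p_us_cs, p_us_no_cs, p_us_cs - p_us_no_cs


# Hypothetical records matching the three conditions described above.
perfect = [(True, True)] * 10 + [(False, False)] * 10
partial = [(True, True)] * 5 + [(True, False)] * 5 + [(False, False)] * 10
random_ = ([(True, True)] * 5 + [(True, False)] * 5
           + [(False, True)] * 5 + [(False, False)] * 5)

for name, record in [("perfect", perfect), ("partial", partial), ("random", random_)]:
    print(name, contingency(record))
# perfect (1.0, 0.0, 1.0)
# partial (0.5, 0.0, 0.5)
# random (0.5, 0.5, 0.0)
```

Notice that the second and third records contain exactly the same number of CS-US pairings. Only the unsignalled USs in the third record drive the difference to zero.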

Conditioning occurs only if, and to the degree that, the CS is a reliable predictor of the US. Put another way, conditioning occurs only if the US is more likely following a CS than in the absence of the CS. What's amazing about this is that it appears that even organisms as simple as the white rat, or simpler, are in some sense computing the conditional probabilities involved. The computation is not necessarily conscious, of course -- the rats haven't taken Statistics 2, after all. But it is a computation nonetheless.

The Kamin Experiments

The importance of the predictive relationship between the CS and the US is underscored by two other phenomena discovered by Leo Kamin.

Kamin's first experiment concerned the phenomenon of overshadowing. Consider two standard conditioning preparations:

in the first, a bright light CS is followed by shock US;
in the second, a soft tone CS is followed by shock.

Both preparations yield good conditioning. And when we combine these two effective CSs into a single compound CS, bright light and soft tone, a compound presented simultaneously and followed by shock, just as Garcia did with his compound of "bright, noisy, sweet" water, what we find is good conditioning.

But what happens if, after we condition the organism to the compound, we test the two elements separately? When we do, we get a good CR to the light, but not to the tone. This is not a problem of differential preparedness, as in the Garcia experiment, because neither light nor tone is particularly prepared or contraprepared to serve as a signal for shock. Instead, once more, the result violates the assumption of association by contiguity. Both the light and the tone were equally contiguous with the shock. But it appears that the more salient, noticeable CS, in this case the bright light, overshadows the less salient or noticeable one. Both are contiguous with the shock, and both are good predictors of the shock as well, but conditioning occurs to the CS that is more salient.

The second experiment concerned the phenomenon of blocking. As background to this research, recall that in standard classical fear conditioning, a foot-shock US is preceded by a tone or light CS. Under these conditions, we get good conditioning of fear, as represented by such conditioned emotional responses as heart-rate acceleration, in response to previously neutral CSs.

We now give an animal acquisition trials with a compound CS, consisting of a tone and a light presented simultaneously, followed by a shock US in the usual manner. After 16 pairings of tone and light followed by shock, we test the animal's response to a variety of stimuli:

When we test the animal's response to the compound CS, we see evidence of fear conditioning, as expected.
When we test the animal's response to each element of the compound, presented individually, we also see evidence of fear conditioning to each of the elements presented alone.

But something different happens when the procedure is reversed, and conditioning trials with the compound CS are preceded by conditioning with only one element alone.

In Phase 1 of a blocking experiment, the animal receives 16 trials with an elementary CS, such as a noise followed by a shock; at the end of this phase, the animal will show a conditioned fear response to the noise.
In Phase 2 of a blocking experiment, the animal now receives 8 additional trials with a compound CS, in which the noise and light appear simultaneously, followed by shock.

What happens when we now test the animal's response to presentation of the light alone?

The first prediction is that the animal should now show fear conditioning to the compound CS. This does in fact occur.
However, the further prediction of association by contiguity is that light alone should now evoke the fear CR as well, because for eight trials it has appeared close together in space and time with the shock US. But, in fact, no conditioning accrues to the light. If we test the noise, however, the animal will continue to show conditioned fear.

Here are the actual results of some of Kamin's experiments.

When animals are conditioned to fear a noise, and then are tested with a light, there is little evidence of a conditioned response. After all, they've been conditioned to a noise, and don't know anything about lights.
When animals are conditioned to the noise/light compound, and then are tested with a light, they show a big conditioned response. After all, light has been paired with shock.
But when animals are first conditioned to the noise, and then receive further conditioning trials with the noise/light compound, testing with the light yields no conditioned response. It's as if they never received any pairings of the light and shock at all.

Apparently, the prior conditioning to the noise has "blocked" conditioning to the light. This surprising outcome is explained in terms of the information provided by the various CSs. In the case of the compound CS, the new element, light, is redundant with the noise. Expressed in terms of Rescorla's conditional probabilities:

p(shock | noise) = p(shock | noise + light) = 1.0.

Now, the outcome would be different under different conditions.

For example, if the light preceded the noise, which in turn preceded the shock, conditioning would accrue to the light as well as the noise: this is because the light predicts the noise which predicts the shock.
Similarly, if there was a change in the US, such as its latency or intensity, conditioning would also accrue to the light as well as the noise: this is because the light predicts this change, providing extra information about when it will occur, or how strong it will be.
Finally, consider an experiment in which the animal is conditioned to the noise, and then receives trials where the noise/light compound is not followed by shock. Ordinarily, unreinforced presentation of the noise would yield extinction of fear to the noise. But in this case, testing response to the noise alone yields a big conditioned fear response. The noise alone still predicts shock; the light, in combination with the noise, predicts the absence of shock.

This leads us to a clarification of the principle of association by contingency:

Conditioning occurs only when the CS signals a change in the US.

Kamin concluded, further, that conditioning only occurs when the US surprises the organism. In the presence of a surprising event, the organism then searches the environment for possible predictors of that event. Among these, it will pay attention to the most reliable predictor, which becomes the effective CS. If there is more than one reliable predictor, it will attend to the most salient predictor, leading to the phenomena observed in the "overshadowing" experiment. And it will ignore stimuli that lack predictive power, leading to the phenomena observed in the "blocking" experiment.
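Kamin's idea that learning is driven by surprise can be made concrete with a simple error-correction rule. The sketch below uses the Rescorla-Wagner update, a standard model from the learning literature that these notes have not introduced; it is offered as one possible formalization, not as Kamin's own account. On each trial, every cue that is present gains strength in proportion to its salience and to the "surprise", that is, the difference between what happened and what all the cues present jointly predicted. The learning rate and salience values are hypothetical, chosen only to make the pattern visible; the trial counts follow the blocking design described above.

```python
def rescorla_wagner(phases, salience, rate=0.3, asymptote=1.0):
    """Return each cue's associative strength after the given training phases.

    phases:   list of (cues_present, n_trials, shocked) tuples, run in order
    salience: dict mapping cue name -> salience weight (hypothetical values)
    """
    strength = {cue: 0.0 for cue in salience}
    for cues, n_trials, shocked in phases:
        target = asymptote if shocked else 0.0
        for _ in range(n_trials):
            # Surprise: the outcome minus what the cues present jointly predicted.
            surprise = target - sum(strength[c] for c in cues)
            for c in cues:
                strength[c] += rate * salience[c] * surprise
    return strength


equal = {"noise": 1.0, "light": 1.0}

# Blocking: 16 noise-shock trials, then 8 trials with the noise/light compound.
blocked = rescorla_wagner(
    [(("noise",), 16, True), (("noise", "light"), 8, True)], equal)

# Control: the 8 compound trials only, with no prior training on the noise.
control = rescorla_wagner([(("noise", "light"), 8, True)], equal)

# Overshadowing: a bright light and a soft tone, trained only as a compound.
overshadow = rescorla_wagner(
    [(("light", "tone"), 16, True)], {"light": 1.0, "tone": 0.25})

print("light after blocking:", round(blocked["light"], 3))
print("light in control:    ", round(control["light"], 3))
print("overshadowing:       ", {k: round(v, 3) for k, v in overshadow.items()})
```

Under these assumptions the light ends up with almost no strength in the blocking condition, because the noise already predicts the shock and leaves nothing to be surprised about, while the same eight compound trials give the light substantial strength in the control condition. In the overshadowing run, the more salient light takes most of the available strength at the expense of the tone.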

We can see now how habituation entails associative learning after all. In Kamin's view, the stimulus is novel, and so the organism searches the environment for anything that might be correlated with it. In the case of shock, it might find that a certain tone is a reliable predictor of when the shock will occur. But now suppose, as in habituation, the organism is presented only with the tone. Over the course of habituation, the organism learns that the tone is associated with nothing -- nothing else (like a CS) predicts the tone, and nothing else (like a US) follows from it. It doesn't have any meaning, so the organism just stops paying attention to it.

Kamin's experiments are important because they simultaneously undermine three assumptions of classical S-R learning theory.

Because the elements of the compound CS are equally contiguous with the US, but differ in terms of the degree to which they predict the US, the assumption of association by contiguity must be wrong.
Because conditioning occurs only when the US surprises the organism, the assumption of the empty organism must be wrong: in order to understand conditioning, we must know what is going on in the mind of the organism -- what it's expecting, and whether it's surprised.
And the assumption of the passive organism is also wrong: the surprised organism is actively searching its environment for predictors, and focusing its attention on some events to the exclusion of others.

Pretty good for one experiment. No wonder it's a classic.

Experimental Neurosis

The importance of predictability is vividly illustrated in some classic demonstrations of experimental neurosis in animals. In some early studies of discrimination learning, performed by Shenger-Kristovnikova in Pavlov's laboratory, dogs were conditioned to salivate to a circle or an ellipse, and then the axes of the stimulus were progressively changed, so that the circle became more elliptical, or the ellipse more circular. The result was that, at some point, the dogs became distressed -- seemingly anxious -- a phenomenon that became known as experimental neurosis. One explanation was that this increase in anxiety occurred because the animals could no longer predict the onset of the food US (Mineka & Kihlstrom, 1978).

Another illustration of the importance of predictability comes from the work of Norman R.F. Maier (1939, 1960) on frustration and fixation in discrimination learning. Maier employed an apparatus known as the Lashley jumping stand, in which a rat was perched on a shelf and forced (by a puff of air) to jump to one of two targets. If it chose the correct target (e.g., marked with a cross), a barrier fell and the rat could gain access to food. If it chose the incorrect target (e.g., marked with a circle), the barrier remained in place and the rat fell into a net. Like Pavlov's dogs, the rats learned to make this discrimination; but then, like Shenger-Kristovnikova, Maier made the discrimination increasingly difficult. At some point, the rats became fixated on one target or the other. Even when the discrimination was made easier, they persisted in making their original, maladaptive choice. In some cases, the frustration seemed to lead to epilepsy-like convulsive seizures. Maier's claim that frustration could cause convulsive seizures was disputed by Morgan & Morgan (1939), but later analyses showed that he was correct on that score, as well (Dewsbury, 1993).

Learned Helplessness

Similar considerations apply to instrumental conditioning. The behaving organism is searching for predictability, but it is also searching for control. It wants to know what to do about forthcoming events, not just where and when to expect them. In instrumental conditioning, the organism is acquiring these expectancies of control.

Recall that Mowrer's two-factor theory of avoidance learning, discussed above, predicts that avoidance learning will be facilitated if the organism has already undergone fear conditioning. The idea is that the organism already knows to fear the CS, and all it has to do is to learn to avoid the US.

The role of these expectations can be seen clearly in the phenomenon of learned helplessness, discovered by Martin E.P. Seligman, Steven Maier, and Bruce Overmier when they were graduate students at the University of Pennsylvania, working under Richard L. Solomon. To test Mowrer's theory, they performed a number of experiments involving dogs engaged in avoidance learning in a shuttlebox.

In Phase 1, a dog receives classical fear conditioning trials in which a tone CS is paired with a foot-shock US, until the animal reliably shows a conditioned emotional response.
In Phase 2, the same dog is placed in a shuttlebox for avoidance training. This is a long box divided into two sections by a small barrier, over which the dog can leap. A tone comes on, followed by foot-shock delivered through the floor of one section. If the animal leaps the barrier to the other section while the shock is on, the shock will be terminated. If the animal leaps the barrier to the other section while the tone is on, the shock will be eliminated for that trial. Then the procedure is repeated for several more trials.

In standard avoidance-learning situations, without Phase 1, animals learn the avoidance response readily, and shuttle nonchalantly back and forth from one section to the other as tones come on. However, Overmier & Seligman (1967) discovered that in their new situation, with Phase 1 inserted before Phase 2, avoidance learning was actually retarded.

The animals would passively accept the shock, stand or sit on the electrified grid, and show considerable signs of distress. In fact, the dogs looked somewhat depressed.
They made few inadvertent escape or avoidance responses, so they rarely received any reinforcement.

In a subsequent experiment, Seligman & Maier (1967) used a yoked-control design to ensure that animals in the two conditions received exactly the same amount of shock. In each pair, one dog could escape shock while the other received the same amount of shock as the first dog, no matter what it did. In a subsequent avoidance experiment, the "escape" animals responded like controls who had received no pretreatment of any kind, while the "yoked" animals showed considerable evidence of learned helplessness.

Proper avoidance responding can be established in dogs who have been pretreated with inescapable shock, but only by forcibly dragging the dogs from one side of the shuttlebox to the other.

Why does this happen? Seligman and his associates reasoned that learned helplessness reflects the acquisition of negative expectations of control. In classical fear conditioning, the shock is both inescapable and unavoidable. Tone is followed by shock, and there is nothing the animal can do about it, because in classical conditioning reinforcement is not contingent on the subject's behavior. It is only contingent on the CS. Accordingly, the animal in such a situation acquires a negative expectation that nothing can be done about the shock. This negative expectation, in turn, generalizes to the avoidance learning situation.

Learned helplessness is significant because it may underlie certain forms of clinical depression. But it also has great theoretical significance, because it shows that instrumental behavior is determined by the organism's expectancies, not by environmental events.

Helplessness at the World Trade Center

In the aftermath of the terrorist attacks of September 11, 2001, emergency-service workers at the World Trade Center employed "search and rescue" dogs to locate victims, living and dead, who might have been buried under the rubble. These animals were trained through instrumental conditioning procedures to sniff out human bodies: basically, when they found a body they received a reward (a similar training procedure is used for the "drug-sniffing" dogs employed by the police). At the WTC, however, there were very few such bodies to be found -- not because there weren't any victims, of course, but because the victims' bodies had been pulverized into dust by the collapse of the building. As a result, the search-and-rescue dogs became obviously depressed -- because they were not able to do the job they were trained to do. In the language of learned helplessness, the animals were not able to engage in behaviors that controlled reward. In order to maintain the animals' motivation for the job, emergency-service workers would sometimes lie down in the rubble -- just to give the dogs somebody to find -- or, in the language of learned helplessness, to maintain their sense of control.

The bottom line is that conditioning is the wrong metaphor for learning. A better metaphor might be computing. The learning organism is trying to figure things out, and it does this by, in some sense, computing conditional probabilities.

In classical conditioning, the organism is learning to predict its world, by computing which events (CSs) predict other events (USs).
In instrumental conditioning, the organism is learning to control its world, by computing which actions (CRs) lead to desirable, and undesirable, changes in the environment.

Prediction and control. Conditional probabilities. Signals. Information. That's what "figuring it all out" is all about.
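To make "computing conditional probabilities" concrete, here is a minimal sketch in Python. Everything in it is illustrative: the trial records are invented, the function name `contingency` is hypothetical, and the difference score (often written ΔP) is just one common way of summarizing a contingency, not something the experiments above calculated explicitly.

```python
def contingency(trials):
    """Return (P(US | CS), P(US | no CS), delta_p) from (cs, us) boolean pairs."""
    us_when_cs = [us for cs, us in trials if cs]
    us_when_no_cs = [us for cs, us in trials if not cs]
    p_us_given_cs = sum(us_when_cs) / len(us_when_cs)
    p_us_given_no_cs = sum(us_when_no_cs) / len(us_when_no_cs)
    return p_us_given_cs, p_us_given_no_cs, p_us_given_cs - p_us_given_no_cs


# Invented data: each pair is (CS present?, US delivered?).
# In the first set the CS is a good signal; in the second it carries no information.
predictive = [(True, True)] * 8 + [(True, False)] * 2 + [(False, False)] * 10
unpredictive = ([(True, True)] * 5 + [(True, False)] * 5
                + [(False, True)] * 5 + [(False, False)] * 5)

print(contingency(predictive))    # US is much more likely after the CS
print(contingency(unpredictive))  # US is equally likely with or without the CS
```

The same function applies to the instrumental case if each pair is read as (response made?, outcome occurred?): a positive difference means the action gives some control over the outcome, and a difference of zero corresponds to the helplessness situation, where outcomes are independent of behavior.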

The Role of Reinforcement

A similar point can be made with respect to the role of reinforcement in learning. The conventional view, expressed in Thorndike's Law of Effect, is that nothing is learned in the absence of reinforcement. In classical conditioning, the CS must be followed by a reinforcing US. In instrumental conditioning, the CR must be followed by reward or punishment. However, a number of experiments now make clear that reinforcement is not necessary for learning to occur.

More Vicissitudes of Classical Conditioning

Consider, for example, two phenomena of classical conditioning discussed earlier.

In sensory preconditioning, the CS1 elicits a CR even though it has never been paired with the US.
Similarly, in higher-order conditioning, the CS2 elicits a CR even though it has never been paired with the US.

In both instances, the animal has learned to respond to a stimulus even though its response to that stimulus has never been reinforced. However, we can explain these phenomena by an extension of the principle of association by contingency, which states that animals in conditioning experiments learn the predictive relationships among events in their environment.

In sensory preconditioning, the animal is learning not just that CS2 predicts the US, but that the CS1 predicts CS2. Therefore, by transitivity, the CS1 predicts the US as well.

In higher-order conditioning, the animal learns not just that the CS1 predicts the US, but that the CS2 predicts CS1. Therefore, again by transitivity, the CS2 predicts the US as well.

Latent Learning

A similar point is made with respect to instrumental learning by classic studies on latent learning performed by Edward C. Tolman of the University of California, Berkeley (after whom the Education/Psychology Building at Berkeley is named). Tolman's experiments involved a maze-learning procedure, in which hungry rats were placed in the start box of a maze, and food placed in the goal box. Over trials, the rats would learn, through trial and error, the route through the maze. In theory, these responses -- turn left here, turn right there, go straight, whatever -- were reinforced by the delivery of food in the goal box. Intuitively, this makes sense, but Tolman asked whether the reinforcement was really necessary for learning to occur.

According to UCB's Prof. Donald Riley, who was a student of Tolman's, Tolman used the same maze in almost all of his experiments on animal learning, from the 1930s up through the 1950s. It was designed to be extremely flexible, with doors and curtains that could block certain passageways, or obscure them from view; it was also automated, with a number of relays to control the animal's access to various portions of the maze. UCB Prof. Ervin Hafter says that the maze itself was designed by another UCB professor, Lloyd Jeffress. See also the article by M.H. Elliott in University of California Publications in Psychology (1928, 4, 20ff), which published much of Tolman's research -- including the famous Tolman & Honzik (1930) paper discussed below.

The experiment, by Tolman and Honzik, involved three groups of rats:

Group 1 was rewarded on every trial with food in the goal box. As expected, they showed a gradual reduction in errors.
Group 2 received no reward on any trial. They showed no reduction in errors, taking a relatively long time to make their way from the start box to the goal box on each trial.
For Group 3, reward was introduced on Trial 11, after 10 trials with no reward. These animals behaved, for the first 10 trials, like their counterparts in Group 2. However, on Trial 11, they showed an immediate reduction in errors, and subsequently behaved similarly to Group 1.

Tolman concluded that the animals in this group learned how to get from the start box to the goal box on the first 10 trials, but just needed a reason to do it. This reason was provided on Trial 11 and subsequent trials. In other words, Tolman's animals learned the maze without any reinforcement. Over 10 trials of exploration, they developed a "mental map" of their environment, which was subsequently available for use for a variety of purposes. However, they didn't perform a goal-directed response until the introduction of reinforcement established a goal.

Put another way,

Reinforcement controls performance rather than learning.

Curiosity and Intrinsic Motivation

A similar point was made in research on rhesus monkeys published in the early 1950s by Harry Harlow of the University of Wisconsin (later to become famous for his studies of "monkey love" and "motherless monkeys"). In one set of studies, Harlow presented his monkeys with a wooden "puzzle lock" consisting of a series of latches which, when moved in the right order, would open a door. Some animals were rewarded with food (rhesus monkeys love Froot Loops) for making correct moves; others received no reward at all. Harlow observed no difference in the monkeys' problem-solving behavior. In fact, if they were hungry, hunger appeared to interfere with solving the puzzle. If they were not hungry, but were "rewarded" with food anyway, they usually stored the food for later consumption. Harlow concluded that the monkeys were simply curious about the puzzle. In his view, curiosity is an aspect of intrinsic motivation, or the desire to perform an activity without the promise or prospect of reward. This is not to say that animals are not also motivated by extrinsic considerations such as hunger and thirst, only that these are not the only rewards. So far as extrinsic rewards such as food were concerned, Harlow's monkeys learned whether they were rewarded or not.

Here's another, more recent example. The Laboratory of Ornithology at Cornell University hosts a number of "bird-cams", where we can watch birds online. One of these views a site on the Hawaiian island of Kauai, where a number of Laysan albatrosses nest and raise their young. In 2015, attention focused mostly on a female albatross chick, Niaulani ("Niau", for short), whose nest was closest to the camera. As she matured, Niau discovered a lawn sprinkler, where she seemed to enjoy taking a shower every morning. As time went on, before she fledged, every morning she would climb a little mound in the nesting area, wait for the shower to come on, and then play in the spray. The shower met no biological need, so it's not at all clear that it was reinforcing in the usual sense that food pellets are reinforcing for Skinner's pigeons. But the important point is that she knew where the shower would spray, and when it would come on, and in the early morning, before the sprinklers were turned on, she would go to the mound and wait in anticipation of its onset. She learned what was going to happen, where, and when, all without benefit of any obvious reinforcement. As far as we can tell, her behavior was intrinsically motivated.

Which reminds me of an episode from my own history as an undergraduate psychology major. I was a college student at a time when it was customary for psych majors to get their own white rat, or pigeon, to train in a maze or "Skinner box". Colgate, where I went to college, was no exception, except that the professor there who specialized in animal learning, Nicholas Longo, was a comparative psychologist who specialized in fish (he had been a student of Morton Bitterman, who studied learning in invertebrates). So I got a goldfish to train. In the experiment, which was devised by Longo, fish were trained to press a paddle with their snouts in order to deliver a small worm as a reward -- a pretty basic Skinnerian setup. At the same time, a small light would illuminate the fish's tank. Of course, the fish learned to press the paddle to get the reward. But for one group of fish, in a control condition, pressing the paddle simply illuminated the tank -- there was no worm. Still, the fish learned to press the paddle -- as indicated by increased response rate in the cumulative record of responses, and cessation of responding whenever the light was disconnected from the paddle. Again, however, there was no traditional reward -- no food or anything like it. Nor had the light previously been associated with food for this group of fish (they were in a control condition). In retrospect, my interpretation is that the fish, like Harlow's monkeys, just liked pressing the paddle and changing their environment. We know, from Seligman's work on learned helplessness, that animals learn to control their environment. And maybe the fish just pressed the paddle, and turned on the light, to relieve boredom!

Now, of course, you could always say that this just reflects a misunderstanding of the nature of reinforcement. Remember how Skinner defined reinforcement: reinforcement is anything that increases the probability of behavior. For Niau, the albatross chick, being bathed by the sprinkler increased the probability of mound-sitting-on behavior. And for Nick Longo's fish, the light increased the probability of paddle-pressing behavior. But that's circular: Why does something increase the probability of some behavior? Because it's reinforcing. And why is it reinforcing? Because it increases the probability of some behavior! Noam Chomsky, a linguist whom we will meet again in the lectures on Language, pointed this out in his 1959 review of Skinner's 1957 book, Verbal Behavior. And he was right. In the 1930s and 1940s, Skinner's influence was such that people could ignore, or even discount, Harlow's, and even Tolman's, research. But in 1959, on the cusp of the Cognitive Revolution in psychology, people were ready for Chomsky's devastating critique of Skinner's whole system -- and especially its emphasis on the power of reinforcement.

Statistical Learning

The point of all of this is that organisms are built to learn from experience, and they do this naturally, in the ordinary course of everyday living, without requiring reinforcement, by computing the contingent probabilities among the objects and events that they observe in their environments. This learning mechanism is sometimes known as statistical learning, because the organism samples the environment and then makes probabilistic inferences about what is going on in it -- what are technically known as the transitional probabilities from one thing to another (Aslin & Newport, 2012).

Here's an example of statistical learning in the domain of language. As we'll see later in the lectures on Language and Communication, an early phase in language learning occurs when an infant learns to recognize the particular phonemes -- basic sound units -- and combinations of phonemes that occur in his or her native language. Saffran, Aslin, and Newport (1996) presented eight-month-old human infants with a steady stream of speech-like sounds consisting of four randomly ordered three-syllable nonsense words, such as:

pa bi ku go la tu da ro pi ti bu do go la tu ti bu do da ro pi pa bi ku pa bi ku da ro pi go la tu ti bu do.

Note that, in such a string, the transitional probability of syllables within words (e.g., pabi within the word pabiku) is a perfect 1.0, while the transitional probability of syllables across words (e.g., tuda between the words golatu and daropi) is only 0.33. They then tested the infants' recognition of individual words by presenting them with "legal" real words, like pa bi ku, and non-legal "part-words" like tu da ro.
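For readers who want to see where those numbers come from, the following Python sketch counts transitional probabilities directly from the short sample string quoted above. The function name `transitional_probabilities` is just a label chosen for this illustration, and the sample is far shorter than the two-minute stream the infants actually heard, so the counts are only indicative.

```python
from collections import Counter

# The sample syllable stream quoted in the text (trailing period removed).
stream = ("pa bi ku go la tu da ro pi ti bu do go la tu ti bu do "
          "da ro pi pa bi ku pa bi ku da ro pi go la tu ti bu do")


def transitional_probabilities(syllables):
    """Map each adjacent pair (a, b) to P(b follows a), estimated from counts."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])  # the last syllable has no successor
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


tp = transitional_probabilities(stream.split())
print(tp[("pa", "bi")])            # within the word pabiku: 1.0
print(round(tp[("tu", "da")], 2))  # across the words golatu / daropi: 0.33
```

In this sample, "pa" is always followed by "bi", whereas "tu" is followed by "da" on only one of its three occurrences, which is the sense in which word boundaries show up as dips in transitional probability.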

How do you test word-recognition in infants? One way is to give them an artificial nipple to suck on, and measure the rate at which they do so: when they're surprised, they stop sucking for a moment. In this experiment, the infants were placed in front of a blinking light, and changes in their looking behavior were used as an index of surprise.

Anyway, the upshot of the experiment was that, after only two minutes of exposure, the infants were able to discriminate between legal and non-legal words. Learning occurred, a very sophisticated learning at that, just by listening to the audio stream, without any reinforcement at all.

Other experiments have shown similar learning effects with sequences of musical tones as well as syllables; and in the visual domain, as infants learned the spatial arrangements of shapes in scenes.

And it's been shown that statistical learning extends to neonates as well as to infants.

Moreover, infants can generalize from the stimulus materials to which they've been exposed to novel stimulus materials. For example, infants who have been exposed to one set of pseudo-words in a pattern such as dadapi or pabibi also recognized novel pseudo-words arranged in the same AAB or ABB pattern, such as kikino or golala. In other words, they acquired something like a concept or a rule that went beyond the specific instances to which they had been exposed to cover novel elements or combinations of elements.
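The idea of a pattern that goes beyond specific syllables can be illustrated with a small Python sketch. It is only an illustration of what "AAB" and "ABB" mean, not a model of what infants do; the function name `abstract_pattern` is hypothetical, and the code assumes that each pseudo-word is written as a run of two-letter syllables.

```python
def abstract_pattern(word, syllable_length=2):
    """Relabel a word's syllables as A, B, C... in order of first appearance."""
    syllables = [word[i:i + syllable_length]
                 for i in range(0, len(word), syllable_length)]
    labels = {}
    pattern = ""
    for syllable in syllables:
        if syllable not in labels:
            labels[syllable] = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"[len(labels)]
        pattern += labels[syllable]
    return pattern


for word in ["dadapi", "pabibi", "kikino", "golala"]:
    print(word, abstract_pattern(word))
```

The familiar item dadapi and the novel item kikino share no syllables, yet both reduce to AAB; likewise pabibi and golala both reduce to ABB.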

In statistical learning, infants are doing exactly what Pavlov's dogs and Thorndike's cats and Rescorla and Kamin's rats were doing: learning the structure of the world, acquiring expectations about what goes with what and what is going to happen next, simply through observation.

The Bottom Line on Reinforcement

Learning occurs naturally in most behaving organisms. Some species are so well adapted to their environmental niches, and their environmental niches are so stable, that they have little need (or opportunity) to learn much more than where they are likely to find food. For other species, a capacity for altering behavior through learning is itself an important adaptation. Through the experience of various contingencies, organisms acquire information about events in their environment, and about the outcomes of behavior. Reinforcement merely motivates the organism to act on what it learns, in order to achieve certain outcomes, and avoid others.

Reinforcement plays a particularly limited role in language learning. Babies do not learn their native language through trial and error, mediated by reinforcement. Rather, they simply pick up language by being exposed to it. Human babies seem to be innately programmed to learn natural language, merely through exposure to a linguistic community.

Expertise

Reinforcement may not be necessary for learning, but practice is. Hardly anything is learned in a single trial, and that is especially true for complex motor and cognitive skills like learning to play a musical instrument or reading music. In a famous paper, Anders Ericsson and his colleagues (1993) interviewed musicians and determined that, by age 20, the best violinists had engaged in deliberate practice for a cumulative amount of more than 10,000 hours, compared to 7,800 hours for merely "good" violinists, and 4,600 hours for the least-accomplished group. Assuming that they began playing the violin at 5 years of age, that comes to more than 666 hours per year, or nearly two hours per day, every day, week in and week out. Findings such as these led Ericsson (2007) to conclude that "extended and intense practice" was the feature that most distinguished elite performers from "normal adults". Ericsson's research, in turn, formed the basis of the "10,000 Hour Rule" popularized by Malcolm Gladwell in his book, Outliers (2008). That is, it appears to take about 10,000 hours to become an expert at something. And indeed, when you examine the histories of elite performers, 10,000 hours seems about right -- the equivalent of about 250 40-hour workweeks.
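The arithmetic behind these figures is easy to check. The short Python sketch below simply restates the assumptions given in the text (10,000 hours accumulated between ages 5 and 20, and a 40-hour workweek); the variable names are chosen for the illustration.

```python
TOTAL_HOURS = 10_000      # cumulative deliberate practice by age 20
START_AGE = 5             # assumed starting age, as in the text
END_AGE = 20
HOURS_PER_WORKWEEK = 40

years = END_AGE - START_AGE
hours_per_year = TOTAL_HOURS / years
hours_per_day = hours_per_year / 365
workweeks = TOTAL_HOURS / HOURS_PER_WORKWEEK

print(f"{hours_per_year:.0f} hours per year over {years} years")
print(f"{hours_per_day:.1f} hours per day, every day")
print(f"{workweeks:.0f} forty-hour workweeks")
```

Spread evenly over 15 years, 10,000 hours works out to roughly 667 hours per year, or a little under two hours every single day.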

A dramatic example of the power of practice is provided by the Polgar sisters, Susan, Sofia, and Judit, who all became excellent chess players. Before Susan, the oldest, was born -- or even conceived, or for that matter before her parents were even married! -- her father-to-be announced to his bride-to-be, on their very first date, that he would raise his children to be geniuses. At the age of three, mostly by happenstance, Susan expressed an interest in a chess set, and so the three girls practiced chess, as much as eight hours a day. In the end, Susan became the first female grandmaster (and now makes a living as the coach of the chess team at Webster University); her younger sisters also achieved grandmaster status, and Judit, the youngest sister (who, perhaps not coincidentally, had the benefit of her older sister's expertise), is generally considered to be the best female chess player in history.

But it's not just practice. The Polgar sisters were raised in a family environment characterized by a great deal of love and support as well as disciplined practice, and this family support also appears to be important.

Of course, talent matters, too. A twin study by Mosing et al. found that individual differences in musical ability -- defined as the ability to make subtle discriminations of pitch, rhythm, and melody -- had a substantial genetic component, accounting for about 50% of population variance (for more on how such calculations are made, see the lectures on Psychological Development). Most of the remaining variance was accounted for by the nonshared environment. Somewhat surprisingly, Mosing et al. reported that music practice had no effect on musical ability. That is to say, there was no difference in test performance between monozygotic twins who differed in the amount of musical practice (e.g., between two twins, one of whom became an orchestra musician, and the other of whom became a brain surgeon). Interestingly, Mosing et al. also found a substantial genetic contribution to the amount of practice that their subjects engaged in, explaining about 69% of population variance.

Still further doubt on the 10,000 Hour Rule was cast by a meta-analysis of expertise studies by Macnamara et al. (2014). These investigators surveyed a large number of studies of the effects of practice on skilled performance, covering games, music, sports, education, and professional activities. Across 88 studies involving more than 11,000 subjects, they found that the average correlation between deliberate practice and performance was .35, explaining about 12% of total variance. This outcome, they claim, is inconsistent with Ericsson's claim that individual differences in performance are mostly explained by individual differences in practice.
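The step from a correlation of .35 to "about 12% of total variance" follows from the standard rule that the proportion of variance explained is the square of the correlation coefficient. A minimal Python check (the function name `variance_explained` is just a label for this illustration):

```python
def variance_explained(r):
    """Proportion of variance in one variable accounted for by the other."""
    return r ** 2


r = 0.35  # average practice-performance correlation reported in the text
explained = variance_explained(r)
print(f"explained:   {explained:.1%}")
print(f"unexplained: {1 - explained:.1%}")
```

On this reckoning, nearly 88% of the variance in performance is left to be accounted for by something other than the amount of deliberate practice.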

It has to be said that the claim that practice has no effect on expertise, and that all the action is in the genes -- which is what Mosing et al. expressly state in the title of their paper -- is implausible on the face of it.

In the first place, Mosing et al. assessed expertise in musical perception, not in their subjects' ability to sing or play an instrument. It is one thing to have an innate "ear" for pitch, melody, and rhythm, which is what they tested. It is another thing entirely to have innate "fingers" for the violin or clarinet.
And, for that matter, Macnamara et al. didn't take account of either talent or expertise. Nobody ever said that, just by virtue of practice alone, one could be concertmaster of the San Francisco Symphony. And, for that matter, one can still be a pretty good violinist, but still not perform at that level.
One can say, without fear of contradiction, that the ability to play the violin or the clarinet is not innate. The oldest known violin dates to 1555, and even if you take into account its ancestors, such as the Middle Eastern rebec or even the ancient Greek lyre, that's simply not enough time for a "violin" gene to have evolved. The clarinet is of even more recent vintage, even if you count its heritage in the recorder or the ancient flute.
Anyone who's ever played a musical instrument, or sung seriously (i.e., outside the shower, church, or karaoke bar) knows that it takes practice. Maybe that practice builds on some innate digital agility, or something like that. But at the very least, you've got to learn the fingerings, and the fine points of technique. And that takes practice -- probably about 10,000 hours worth, if you want to become good.
None of this means that practice is all there is, and that anyone can become an expert at anything, so long as he's willing to put in the time. But again, if you're willing to put in the time, there's no reason why you couldn't become pretty damn good.

Perceptual-Motor Learning

The development of expertise can be illustrated by the development of motor skill, such as grasping an object, descending a flight of stairs, playing a musical instrument, or shooting a basketball. As outlined by Fitts and Posner (1967), the acquisition of a motor skill (or, for that matter, probably mental skills as well) comprises three basic stages:

Cognitive: the person comes to understand the requirements of the task and how to approach it;
Associative: the person learns the movements required to perform the desired action;
Autonomous: through an extended period of practice, the motor skill is automatized, and the person doesn't have to think about it any longer.

The general idea is that the skilled performance starts out as conscious and deliberate, and ends up as unconscious and automatic. A familiar example is learning to drive a standard-shift car. At first, you're conscious of stepping on the clutch, shifting from neutral into first, then easing up on the clutch while gradually accelerating -- and then repeating the whole process again when shifting from 1st to 2nd gear, from 2nd to 3rd, 3rd to 4th, and so on. Then, after a while, it all happens automatically, without much thinking at all. Maybe if you've parked on a hill, or in a tight parking spot, you have to think about what you're doing. But mostly not, once the skill has been automatized.

As a motor skill is automatized, control gradually shifts from the cerebral cortex to the cerebellum.

A similar model was proposed later by Stuart and Hubert Dreyfus, two brothers who taught at UCB (1980; see also their 1986 book, Mind Over Machine).

A good laboratory model for perceptual-motor learning is visual adaptation, a popular experimental paradigm modeled on prism-adaptation research pioneered by George Stratton, founder of UCB's psychology department. Stratton fashioned eyeglasses (actually, a monocle, because a pair of eyeglasses holding the prisms proved to be too cumbersome) that inverted left and right and up and down. In this manner, an object which appeared in the lower left quadrant of his visual field was actually in the upper right, and so on. Stratton then studied his ability to move about, and reach for objects, in this altered visual world. With time, Stratton was able to make a remarkably good adjustment to these changed visual circumstances.

Interestingly, neurological patients with damage to the cerebellum are unable to adapt in this manner. UCB's Prof. Rich Ivry (who, not coincidentally, was a student of Posner's) and other researchers on visual adaptation suggest that the cerebellum is involved in the construction of an efference copy of successful movement -- such as reaching for a coffee cup, descending a flight of stairs, or shooting a basketball. When a subject makes a movement, feedback from the sensory system is compared to this efference copy. Any discrepancy between ideal and actual generates a sensory prediction error and instigates corrective activity. The result is a constant updating of the efference copy, as the person masters the motor skill. All of this goes on unconsciously, mediated by the cerebellum rather than the cortex.
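The error-correcting loop described here can be sketched as a very simple simulation. The Python below is a toy "delta rule", offered under stated assumptions rather than as the model used by Ivry or anyone else: the function name `simulate_adaptation`, the 10-degree displacement, and the learning rate of 0.2 are all hypothetical values chosen for illustration.

```python
def simulate_adaptation(displacement_deg, learning_rate=0.2, n_trials=20):
    """Return the reaching error (degrees) on each trial of an error-correction loop."""
    predicted_shift = 0.0  # internal estimate of the prism displacement
    errors = []
    for _ in range(n_trials):
        # Prediction error: the actual displacement minus what was predicted.
        prediction_error = displacement_deg - predicted_shift
        errors.append(prediction_error)
        # Update the internal estimate by a fraction of the error.
        predicted_shift += learning_rate * prediction_error
    return errors


errors = simulate_adaptation(displacement_deg=10.0)
print([round(e, 2) for e in errors[:5]])  # early trials: large but shrinking errors
print(round(errors[-1], 2))               # late trials: error close to zero
```

Each trial's error shrinks by a constant fraction, so the simulated reach gradually converges on the target, which is the qualitative pattern of adaptation described above.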

Interestingly, this unconscious cerebellum-based learning can conflict with the conscious control of action. For example, subjects in a prism-adaptation experiment can consciously redirect their pointing activities to adjust for the displacement by the prism. But over time, they actually get less accurate -- one of the few instances where "practice makes imperfect"! Apparently, this is because the unconscious learning, mediated by the cerebellum, conflicts with the subjects' conscious strategies.

If subjects are told that the prism is displacing the object by 45°, they can deliberately adjust their motor activity so that they point accurately, even on the first trial.
But over subsequent trials, they get less accurate, so that after 80-100 trials, they are missing the target by about 25°.

Apparently, the cerebellar learning process is independent of the cortical one, and eventually the former wins out over the latter. Again, patients with cerebellar damage don't have this problem -- which underscores the role of the cerebellum in unconscious perceptual-motor adaptation. By the same token, patients with frontal-lobe damage perform worse than neurologically intact subjects. Apparently, the frontal lobe exerts some degree of counter-control over the cerebellar process, without which the cerebellar process runs unchecked.
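One way to picture how two independent processes could produce "practice makes imperfect" is the toy simulation below. It is a sketch under loudly stated assumptions, not a reconstruction of the actual experiments: the function name `simulate_drift` is hypothetical, and the retention and learning-rate values are arbitrary numbers picked for illustration, not fitted to any data. The conscious strategy is modeled as a fixed aim opposite to the displacement, while an unconscious process keeps adjusting the movement to reduce the mismatch between where the subject aimed and where the displaced feedback appeared.

```python
def simulate_drift(rotation_deg=45.0, retention=0.98, learning_rate=0.02, n_trials=100):
    """Return the pointing error relative to the target (degrees) on each trial."""
    aim = -rotation_deg   # conscious strategy: aim opposite to the displacement
    implicit = 0.0        # unconscious adjustment, initially absent
    target_errors = []
    for _ in range(n_trials):
        hand = aim + implicit
        feedback = hand + rotation_deg     # where the displaced feedback appears
        target_errors.append(feedback)     # the target itself is at 0 degrees
        # Mismatch between where feedback appeared and where the subject aimed.
        prediction_error = feedback - aim
        # The unconscious process adjusts to shrink that mismatch, ignoring the target.
        implicit = retention * implicit - learning_rate * prediction_error
    return target_errors


drift = simulate_drift()
print(drift[0])             # first trial: the strategy puts the pointer on target
print(round(drift[-1], 1))  # later trials: the pointer has drifted off target
```

Because the unconscious adjustment responds to the aim-versus-feedback mismatch rather than to success at hitting the target, it steadily pulls the movement away from a target that the conscious strategy had already hit.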

Social Learning

Usually, we think of learning as entailing the direct experience of environmental events, organismal responses, and their outcomes. In classical conditioning, Pavlov's dog gets the food after hearing the bell. In instrumental conditioning, Thorndike's cats get freedom after pressing the latch. But can animals learn from the experience of other animals? This is the question of vicarious or observational learning.

Observational Fear Conditioning

The phenomenon of observational learning was first demonstrated convincingly in the laboratory by Susan Mineka, who was then at the University of Wisconsin (she is now at Northwestern University), in a study of snake fear in rhesus monkeys. Rhesus monkeys born and raised in the wild are universally afraid of snakes. This is quite adaptive: after all, the monkeys live in an environment where there are lots of deadly snakes, vipers as well as constrictors. Therefore, traditional theory has held that the fear of snakes in rhesus monkeys is innate, programmed by evolution in much the same way that instincts are. The only problem with the theory is that monkeys who are born and raised in laboratory conditions do not fear snakes. When exposed to a snake, they show no signs of fear. Therefore, it seems that snake-fear must be acquired through experience. But, if you think about it, it's not entirely clear how you learn from experience to fear a deadly snake. Because after the first encounter, you're dead (snakes are like that). Therefore, Mineka proposed that monkeys acquire their fear of snakes vicariously, from observing the reactions of other monkeys when they encounter snakes. Thus, snake fear is not innate, but a learned part of what might be thought of as "monkey-culture".

Mineka conducted an ingenious series of experiments to investigate the social learning of snake fear in rhesus monkeys. For her test of fear, she employed a piece of equipment known as the Wisconsin General Test Apparatus (WGTA), in which the monkey is seated in a restraining chair, something like a baby's high chair, while being presented with various stimuli and making responses. Mineka offered the monkeys a highly desirable food treat (Froot Loops are dandy for this purpose), but in order to obtain the treat the monkey had to reach past a snake or some other object. Response latency, or the time it took the animal to reach past the object, was the measure of fear: the longer the latency, the more fear.

Mineka's initial study compared monkeys reared in the wild and in the lab in their response to various test stimuli such as real, toy, and model snakes (the real snake was a small boa constrictor), black and yellow cords, and a painted wood block. As expected, the wild-reared monkeys were more afraid of the snakes than were the lab-reared monkeys.

For her first vicarious conditioning study, Mineka paired a (snake-phobic) wild-reared adult with a (non-snake-phobic) lab-reared adolescent (in her first study, the adult was actually the parent of the adolescent).

She pretested the adolescent in another apparatus, known as the Sackett Circus (after Gene Sackett, the researcher who invented it), which is a chamber with four compartments. Three of these compartments contained a real, toy, or model snake. The fourth compartment contained a wood block. The wild-reared adults avoided the compartments with snakes, but the adolescents were indifferent to them.

Then each adolescent was allowed to observe, for the first time, the reaction of the adult to a snake presented in the WGTA. After exposure to the fearful adult, the adolescents behaved very differently: now they strongly avoided the snake compartments.

In other words, the adolescents learned to fear snakes -- not from having unpleasant experiences with snakes themselves, but merely from watching an adult react negatively to them. They learned, from observing other monkeys behave fearfully, that snakes are things to be feared. One is reminded of "You've Got to Be Carefully Taught", from the Rodgers and Hammerstein musical South Pacific (1949). During World War II, Lt. Joe Cable has come to the base to conduct an espionage mission against the Japanese forces on a neighboring island. He falls in love with Liat, the daughter of Bloody Mary, but despairs of gaining acceptance for their biracial love back in the United States:

You've got to be taught to hate and fear
You've got to be taught from year to year
It's got to be drummed in your dear little ear
You've got to be carefully taught

You've got to be taught to be afraid
Of people whose eyes are oddly made
And people whose skin is a different shade
You've got to be carefully taught

You've got to be taught before it's too late
Before you are six or seven or eight
To hate all the people your relatives hate
You've got to be carefully taught
You've got to be carefully taught

Actually, it turns out that you don't have to be carefully taught. Kids will pick up on their parents' likes and dislikes, prejudices and aversions, just by observing their behavior. But "careful teaching" does make the process of social learning go faster.

Mineka performed a number of variants on this basic experiment, with increasingly sophisticated methods, to explore the parameters of observational conditioning.

She discovered that she could also obtain vicarious learning when an unrelated adult served as the model for the adolescent.
And she discovered that prior benign experience with snakes could immunize adolescents against the effects of later vicarious exposure.

In her most fascinating experiment, Mineka discovered that, despite the central role of vicarious experience, observational learning was also constrained by preparedness. In this study, she modified her apparatus, employing mirrors and video so that she could independently vary what the model and the target see. For example, an adult model might see a snake, and react fearfully, while the adolescent sees a flower rather than the snake. From the adolescent's point of view, then, the adult is reacting fearfully to the flower, not the snake. Will an adolescent who sees such a thing subsequently show fear of flowers?

The answer is no: Vicarious fear conditioning occurs only to snakes and snakelike objects. It does not occur to the flower.

Snake fear in rhesus monkeys is not innate, but it does appear to be highly prepared, so that it can be acquired with little vicarious experience.
Flower fear in rhesus monkeys, if indeed there is any such thing, is unprepared, or perhaps even contraprepared -- acquired with difficulty, if at all.

Some independent evidence for the preparedness of snake fear comes from a study by Shibasaki and Kawai (2009), who taught lab-reared macaque monkeys (who, of course, had never encountered snakes before) to detect snakes in an array of flowers, or flowers in an array of snakes. The animals learned both tasks, but they were significantly faster at finding the snakes. It's as if their brains were already wired to pick up on snake-like stimuli.

Vicarious or observational learning is fascinating, but it is also theoretically important, because it is another instance of learning in the absence of reinforcement. That is, the animal learns even though any reinforcement is provided only to the other animal.

Language Acquisition

In humans, perhaps the most powerful and dramatic example of observational learning occurs in the domain of language. By the time they are 4 or 5 years of age, every normal human child has become a fluent speaker of his or her native language -- that is, whatever language the child's parents and others speak in his or her presence.

Beginning at birth, and perhaps even in the womb, the infant learns to detect the particular sounds of his or her native language, and how they are combined to form words.
Before an infant can walk, he or she will be able to recognize many words. Toddlers, aged 12-24 months, learn about three new words per day; preschoolers about 5-8 new words a day; older children and adolescents about 10-15 words a day. By the time children are 5 years old they will have a vocabulary of about 10,000 words; this grows to about 70,000 words by adulthood.
In late toddlerhood, children begin to string words together to form sentences. And the sentences get longer and more complex. Again, by the time children are 4 or 5 years old, they are virtually complete masters of the grammar of their native language. But even before they can speak long, complex sentences, they can understand them when they are spoken to by others.
The learning of words and their meanings (which we call the semantics of language), and of the grammatical rules that string words together (which we call the syntax of language) actually feed off each other, so that children use their knowledge of words to infer grammatical rules, and use their knowledge of syntax to learn what new words mean.
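
The word-learning rates quoted above can be checked against the vocabulary totals with some rough arithmetic. The sketch below is only a back-of-the-envelope illustration: the age spans (one year of toddlerhood, three preschool years from age 2 to 5, and thirteen years from age 5 to 18) and the 365-day year are assumptions made for the calculation, and words learned before 12 months are ignored.

```python
DAYS_PER_YEAR = 365  # assumption: learning continues every day of the year


def words_learned(rate_per_day, years):
    """Words acquired at a constant daily rate over a span of years."""
    return rate_per_day * DAYS_PER_YEAR * years


# Rates come from the text; the age spans are illustrative assumptions.
toddler = words_learned(3, 1)            # 12-24 months, about 3 words/day
preschool_low = words_learned(5, 3)      # ages 2-5, about 5 words/day
preschool_high = words_learned(8, 3)     # ages 2-5, about 8 words/day
later_low = words_learned(10, 13)        # ages 5-18, about 10 words/day
later_high = words_learned(15, 13)       # ages 5-18, about 15 words/day

by_five = (toddler + preschool_low, toddler + preschool_high)
by_adulthood = (by_five[0] + later_low, by_five[1] + later_high)

print("Vocabulary by age 5:   ", by_five)
print("Vocabulary by adulthood:", by_adulthood)
```

Under these assumptions the estimate by age 5 runs from roughly 6,600 to 9,900 words, and the adult estimate from roughly 54,000 to 81,000, so the quoted figures of about 10,000 and about 70,000 words are at least roughly consistent with the quoted daily rates.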

The acquisition of language occurs effortlessly, and it occurs without reinforcement, before children are ever formally taught the rules of grammar in elementary school (which used to be called "grammar school", after all), and get graded for learning them. It all happens by the child hearing spoken language, and connecting what is said to what is going on in the world around them. In this sense, language acquisition is a lot like Tolman's latent learning.

By contrast, even our closest primate relatives, chimpanzees, have no ability to learn language. They may learn some "words" in the form of symbols, spoken or visual, that represent things like bananas. But even after years of effortful training, they have essentially no ability to use syntactical rules to form and understand meaningful sentences. When it comes to language, the "smartest" chimpanzee can't hold a candle to the dullest human 5-year-old.

In fact, human language learning is so effortless and automatic that many linguists speculate that there is an innate capacity for language -- a "language acquisition device" that is a product of evolution, and which is a unique feature of human nature. Knowledge of English or Chinese or Swahili or Farsi isn't innate, but the mechanism that allows children to learn these languages does appear to be.

Put another way, language acquisition is highly prepared in humans. Just as rhesus monkeys are highly prepared to learn to fear snakes, so human beings are prepared to learn language. In chimpanzees, the best we can say is that language learning is unprepared, and it may even be contraprepared -- which is why chimpanzees can't learn syntax no matter how much training they receive.

Social interaction is critical to language acquisition: the child needs models. Not only does the child require exposure to spoken language (and thus to the people who speak it), but the child needs to be exposed to what others are doing, and looking at, when they speak. You can't just play a CD of spoken English under the child's crib and expect the child to learn semantics and syntax (though he or she will learn the basic sound patterns). The child has to interact with other people. And these people don't even have to speak. Deaf children whose parents and teachers use sign language will effortlessly pick up the semantics and syntax of sign language, just as hearing children pick up whatever language their parents speak.

And this interaction has to occur within a particular interval of time -- roughly, before the onset of puberty. "Wild" children, who are raised in isolation from others until they reach adolescence, never really "get" language. Within the more normal range of human experience, children who are raised in a bilingual environment -- say, with parents who speak both English and Spanish -- will effortlessly learn both languages, and speak both without an accent. But if the learning of one language is delayed -- say, until high school or college -- it is very hard to gain facility in the second language, and the person is likely to speak it with a decided accent. So, as with imprinting, there appears to be a critical period in language learning.

The capacity to learn language appears to be innate, a gift of human evolution. And there is a critical period in language learning. But despite this innate component, language acquisition requires exposure to a linguistic environment. In this sense, it fits the true definition of learning as a change in knowledge that occurs as a result of experience. And instead of being taught deliberately, through the direct experience of rewards and punishments, it occurs vicariously -- just by virtue of observation, without any particular reinforcement.

Social Learning and Imitation

As language acquisition illustrates, observational learning is particularly important in humans. If you think about it, we do not learn all that much through the direct experience of trial and error, reward and punishment. Rather, most of our learning comes through interactions with others. To take a somewhat extreme example, physicians don't learn how to perform surgery by trial and error. Rather, they learn surgery by watching experienced surgeons perform, and by being taught by them. When a surgeon takes a scalpel to his or her first patient, he or she already knows what to do and how to do it.

Albert Bandura, of Stanford University, argues that human social learning takes two forms:

Learning by Example, in which we model our behavior on that of other people -- much like Mineka's rhesus monkeys.
Learning by Precept, in which we are deliberately taught by other people -- the kind of learning that goes on in school and college.

Language plays a particularly important role in learning by precept, as it provides a very flexible, efficient way of communicating our thoughts and knowledge to others. Humans have a far greater capacity for language than any other species, and so it is not surprising that so much of our social learning is accomplished through language.

Consciousness also plays an important role in learning by precept. To deliberately teach someone something presupposes that you are aware of it yourself. Without conscious awareness, there could be no conscious intent, and so no sponsored teaching of the sort that is critical to learning by precept.

Although most studies of learning performed before 1950 employed lower animals such as rats, dogs, and pigeons for subjects, the ultimate object of inquiry was humans. The major theories of learning assumed, explicitly or implicitly, that the same principles of learning adduced to explain simple behavior in these species would also be found relevant to complex human behavior. This program of application to the human case was pursued most prodigiously by B.F. Skinner, in his analyses of personality and social behavior (1953) and language (1957). According to Skinner, human behavior is performed under the conditions of stimulus control. Rather than focusing on internal dispositions such as traits and motives, or cognitive constructs such as expectation, a proper analysis of personality will focus on the individual's reinforcement history, as well as on discriminative stimuli and reinforcement contingencies present in the current environment. Human behavior is complex only insofar as the stimulus conditions in which it occurs are complex.

Other investigators also took up the Skinnerian program. For example, Staats and Staats (1963) attempted to apply the principles of learning to problems in personality, motivation, and social interaction, among other topics. Their work is not exactly Skinnerian in nature, because it attempts to come to grips with certain aspects of language that are outside the scope of Skinner's analysis. Nevertheless, the list of psychologists whom they cite as the inspiration for their efforts begins with Skinner, and includes most of the major figures identified with the behaviorist analysis of learning. Staats' most recent statement of his theory, in fact, is entitled Social Behaviorism (1975).

At the same time, it became clear that certain aspects of complex human behavior resisted conventional behavioral analysis. As one example, already discussed, language does not seem to be acquired through the principles of conditioning and reinforcement that are central to behaviorist analyses. The same is true of many human social behaviors. The problem of accounting for learning without direct experience of reinforcement ultimately led to the development of a different, cognitive theory of personality: cognitive social learning theory.

A step in this new direction was taken with the social learning theory of Miller and Dollard (1941). According to Miller and Dollard, personality consists of habits formed through learning. The learning process, in turn, is described in terms of a version of S-R learning theory proposed by Clark L. Hull. According to Hull, a habit represents a strong connection between some stimulus and some response. This association is acquired by virtue of drive-reduction: in the presence of the stimulus, the behavior has led to the satisfaction of some drive (you can see the connection to Thorndike's Law of Effect).

Although Hull conceived of these drives as biological in nature, Miller (1951) later added the concept of acquired (or secondary) drive. That is, through conditioning some external stimuli come to possess some of the properties of an internal drive state. For example, while fear is an innate drive, elicited by noxious stimulation, it can also be conditioned to previously neutral stimuli. Habits can be learned because they lead to fear reduction (a primary drive), and also because they eliminate fear stimuli (secondary drives). Drive-reduction theory thus provides the basic elements of personality viewed as a system of habits, in the form of principles of learning. A drive is any need which activates behavior. It can be innate, or it can be acquired through experience. However, drive itself does not give any particular direction to behavior. This directionality is given by the operation of other principles. Hull's theory, like Freud's, assumes that people are motivated to maintain homeostasis, eliminating states of tension. Drive-reduction serves to reward behavior. Responses are behaviors that lead to rewards. Finally, cues are stimuli that determine the selection of responses. Thus, personality can be viewed as a system of habits acquired and maintained through drive-reduction. Individual differences in habitual responses to environmental stimulation comprise the whole of personality.

Miller and Dollard argued that in order to understand human personality, it was necessary to understand the principles of learning. However, because the habits that comprise personality are social behaviors, it is also important to understand the social circumstances in which that learning takes place. Thus, Miller and Dollard called their approach social learning theory. In this regard, it is interesting to note that the theory represents the collaboration between Miller, a psychologist, and Dollard, a sociologist. Thus, personality becomes an interstitial field, combining different levels of analysis.

Like Skinner's stricter behavioral approach, social learning theory as stated would seem to imply that the person must have direct experience with reinforcement in order to establish habits. As noted, this is unlikely to be the case. In order to cope with this problem, Miller and Dollard postulated a drive of imitation. Imitation is a process by which similar actions are performed by two individuals in response to appropriate cues. At the start, imitation is a behavior which can be reinforced by the environment, just as other behaviors are. When rewarded regularly, however, it takes on the properties of an acquired drive. Thereafter, the individual is motivated to imitate the behavior of others -- to copy their behavior in order to obtain the same rewards that they receive from their actions. Imitation is widespread because the culture reinforces it strongly, as a means of maintaining social conformity and discipline. For this reason, although imitation is an acquired drive (and therefore optional in principle), it is almost a necessary consequence of socialization.

Miller and Dollard discussed two principal forms of imitation. In both forms, one person matches another's behavior.

In matched-dependent behavior, however, only the model recognizes the cues that elicit the behavior. A good example is crowd behavior, where people engage in certain actions (like applause or yelling) simply because other people are doing so, without knowing why.
Copying is a much more deliberate act, in which one person consciously conforms his or her behavior to that of another person. This entails awareness of the cues that elicit the behavior of the model.

Imitative behavior is central to social learning, and thus to personality. It is readily observed in even the youngest children, and indeed whenever one person possesses more authority or knowledge than another. Imitation, especially matched-dependent behavior, is the chief means by which patterns of behavior are passed from one individual to another.

Social Learning and Expectations

Although some social-learning theorists continued to embrace the tradition of functional behaviorism into the 1960s and 1970s, the break from the behaviorist view of social learning was apparent in Rotter's Social Learning and Clinical Psychology, which appeared in 1954 (see also Rotter, 1955, 1960; Rotter, Chance, & Phares, 1972). Where Staats and Staats (1963), writing almost a decade later, were still acknowledging the primary influence of Skinner and other functional behaviorists, Rotter (1954) acknowledged the influence of no behaviorists at all. Rather, he aligned himself with the dynamic psychologist Adler and the gestalt psychologists Kantor and Lewin (see also Rotter, Chance, & Phares, 1972, p. 1). From the beginning, Rotter intended his theory as a fusion of the drive-reduction, reinforcement learning theories of Thorndike and Hull with the cognitive learning theories of Tolman and Lewin. Although Rotter's version of social learning theory often uses behaviorist vocabulary, it does so with a clear cognitive twist.

In the first place, Rotter is less interested in behavior than in choice, an internal mental state which obviously manifests itself in behavior. Rotter's cognitive-social learning theory employs three basic concepts:

Behavior potential is the probability of a particular behavior occurring in some situation, given the available reinforcement contingencies.
Expectancy is the person's subjective probability that a particular reinforcement would occur as a function of his or her engaging in some specific behavior in some specific situation.
Reinforcement value refers to the degree to which the individual would prefer some outcome above all others, provided that the probabilities of the outcomes were equivalent.

These three terms are combined to yield the basic predictive formula (1954, p. 108):
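
In the form usually given in presentations of Rotter's theory, the relation among behavior potential, expectancy, and reinforcement value can be written as follows. The subscripted notation follows that usual presentation rather than a direct transcription, and should be checked against the cited page:

```latex
BP_{x,\,s_1,\,R_a} = f\left( E_{x,\,R_a,\,s_1} \;\&\; RV_{a,\,s_1} \right)
```

That is, the potential for behavior x to occur in situation s_1, in relation to reinforcement a, is some function of the expectancy that reinforcement a will follow behavior x in situation s_1, and of the value of reinforcement a in that situation. The form of the function f is left unspecified.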

Rotter's intellectual debt to the behaviorists is clear. Rather than being predicted in general, behavior is predicted only under certain conditions. When these conditions change, the behavior is likely to change as well. Moreover, the behaviorist construct of reinforcement is central to his theory. However, Rotter's departure from the behaviorists is equally clear: whereas behaviorists such as Skinner hoped to dispense with mental constructs entirely, Rotter places them at the center of his theory. Although the behaviorists defined reinforcements objectively in terms of their effects on behavior (Thorndike's empirical law of effect), Rotter defines them subjectively: the value attached to any potentially reinforcing event is subjective, and one person's meat can be another person's poison. Moreover, whereas behaviorists defined reinforcement contingencies objectively, in terms of the contingent probability of the event given a particular response, Rotter clearly defines them subjectively, in terms of the individual's cognitive expectations. Finally, Rotter defined the situation in psychological terms, as it is experienced by the individual, and as the individual ascribes meaning to it.
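
To make the idea of choice among behaviors concrete, here is a minimal sketch of how subjective expectancy and reinforcement value might jointly determine which behavior is most likely. It rests on a loud assumption: the two quantities are simply multiplied, a common simplification that the text does not attribute to Rotter. The class name, function names, and numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """One behavior available to a person in a given situation (hypothetical)."""
    behavior: str
    expectancy: float           # subjective probability (0 to 1) of the reinforcement
    reinforcement_value: float  # subjective value of that reinforcement, arbitrary units


def behavior_potential(option):
    # ASSUMPTION: expectancy and value combine multiplicatively.
    # The theory only says potential is some function of the two.
    return option.expectancy * option.reinforcement_value


def choose(options):
    """Return the option with the highest behavior potential."""
    return max(options, key=behavior_potential)


# Illustrative, made-up numbers: a highly valued outcome that seems unlikely
# can lose out to a less valued outcome that seems more attainable.
options = [
    Option("ask for a raise", expectancy=0.25, reinforcement_value=6.0),
    Option("work overtime", expectancy=0.5, reinforcement_value=4.0),
]
print(choose(options).behavior)
```

Because both inputs are subjective, two people facing the same objective contingencies could be assigned different values here and would therefore be predicted to choose differently, which is the point of the contrast with the behaviorists.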

Cognitive Social Learning Theory

Rotter labeled his approach a social learning theory, and employed some of the concepts and principles of reinforcement theory in it. Nevertheless, his approach is less a theory of learning than it is a theory of choice. That is to say, Rotter is primarily concerned with how expectancies and values govern the choices we make among available behaviors. However, the theory has relatively little to say about how those expectancies, values, and behavioral options are acquired -- except to say that they are acquired through learning. It remained for another social learning theorist, Albert Bandura (Bandura, 1971, 1977, 1985; Bandura & Walters, 1963) to add to the concept of expectancies an explicit theory of the social learning process. Like Miller and Dollard, Bandura stressed the role of imitation in social learning. However, his concept of imitation departs radically from theirs in that it no longer functions as a secondary drive. By emphasizing cognitive processes over reinforcement, observation over direct experience, and self-regulation over environmental control, Bandura took a giant step away from the behaviorist tradition and offered the first fully cognitive theory of social learning.

Bandura's behaviorist roots are seen most clearly in his earliest statement of social learning theory, Social Learning and Personality Development (Bandura & Walters, 1963). On the surface, this book seems to draw heavily on Skinnerian analyses of instrumental conditioning. For example, a great deal of attention is paid to the role of reinforcement schedules in the maintenance of behavior. Bandura and Walters argued that most social reinforcements are delivered on an intermittent schedule, and that most social systems operate on some combination of fixed- and variable-interval schedules of reinforcement. Family routines such as dining, parent-child interactions, shopping trips, and the like occur in a relatively unchanging cycle. Insofar as these activities can take on reinforcing properties, then, they are delivered on a fixed-interval schedule: the child cleans his plate at dinnertime during the week, and then gets to sit on his mother's lap during the family television hour on Saturday night. Other social reinforcements, however, seem to be delivered on a variable-interval schedule. When a child seeks her mother's attention, she may get it immediately, or at some time in the future when her mother doesn't have her hands full. Still other situations seem to involve the differential reinforcement of high or low rates of behavior. If a father pays attention to his child only when she kicks and screams, he is virtually guaranteeing that she will misbehave when she wants attention.
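
The difference between fixed-interval and variable-interval schedules can be illustrated with a small simulation. This is a sketch under stated assumptions, not a model taken from Bandura and Walters: time is measured in arbitrary units, a reinforcement is delivered for the first response made once the current interval has elapsed, and variable intervals are drawn uniformly between zero and twice the mean. All function names are hypothetical.

```python
import random


def fixed_interval(interval):
    """Yield the same waiting time before each reinforcement becomes available."""
    while True:
        yield interval


def variable_interval(mean_interval, rng):
    """Yield waiting times that vary around a mean (ASSUMPTION: uniform on 0..2*mean)."""
    while True:
        yield rng.uniform(0, 2 * mean_interval)


def reinforced_responses(response_times, waits):
    """Return the response times that earn reinforcement.

    A response is reinforced only if it is the first one made after the
    current waiting time has elapsed; the next wait then starts from it.
    """
    reinforced = []
    available_at = next(waits)
    for t in sorted(response_times):
        if t >= available_at:
            reinforced.append(t)
            available_at = t + next(waits)
    return reinforced


responses = list(range(1, 21))  # one response per time unit, from 1 to 20

fi = reinforced_responses(responses, fixed_interval(7))
vi = reinforced_responses(responses, variable_interval(7, random.Random(0)))

print("Fixed interval:   ", fi)
print("Variable interval:", vi)
```

On the fixed-interval schedule the reinforced responses fall at regular, predictable points, like the weekly family television hour; on the variable-interval schedule the same steady responding is reinforced at unpredictable moments, like a bid for attention from a busy parent.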

For a number of reasons, Bandura and Walters argued, most social reinforcements are dispensed on complex schedules combining variable ratios and variable intervals. In some respects, this complexity reflects the unreliability of social reinforcement. Often, the reinforcing agent is simply not present when the target behaviors occur -- in such a case, reinforcement must be deferred to a later time. And because humans are not automated machines, they will sometimes simply fail to deliver reinforcements that are due. Perhaps more important, the complexity of social reinforcement schedules reflects the complexity of social demands. It is rarely enough simply to perform a certain social behavior: it must be done in a particular way. A child asked to set the dinner table will not be rewarded simply for piling dishes and utensils; the forks have to be on the left side of the plate, and the blade of the knife turned inward. As Bandura and Walters note, effective social learning entails both adequate generalization and fine discriminations.

Social learning is also complex because of the wide variety of factors that affect the effectiveness of social reinforcements. For example, Bandura and Walters noted that children with strong dependency habits (note the phrase) are more susceptible to social reinforcement. Moreover, the prestige of the reinforcing agent is important, as is the match between the person and the agent on such attributes as gender. The person's internal states of deprivation, satiation, and emotional arousal are also important. The point is that social reinforcement is complex but not chaotic or haphazard. Social behavior is maintained by virtue of schedules of reinforcement, even if the precise nature of that schedule is sometimes hard to discern.

Although Miller's theory of displacement gained impressive support from analyses of animal behavior, Bandura and Walters were critical of its application to the case of human social behavior. For example, they argued that deliberate social learning also played a role in displacement. Thus, parents often direct their children's aggressive behaviors towards some targets rather than others, and displacement itself is maintained by contingencies of reinforcement. Clear examples of this may be found in scapegoating and other examples of prejudice towards minorities and other out-groups. By and large, these sorts of aggressive behaviors are not simply selected by the vicissitudes of the generalization gradient. Rather, children get their prejudices from their parents: as Rodgers and Hammerstein wrote in South Pacific, "You've got to be carefully taught" whom to hate and fear.

While agreeing on the importance of reinforcement in the control of behavior, Bandura and Walters differed most from their behaviorist predecessors over the manner in which behavior was acquired in the first place. Taken at their word, Skinner and other functional behaviorists actually appear to deny that new behaviors are learned at all. Rather, responses already in the organism's repertoire come to be elicited by certain environmental cues by virtue of the law of effect. What are acquired are new patterns of behavior, by virtue of shaping and successive approximations. That is, a piece of behavior is synthesized from more elementary behaviors already in the organism's repertoire. Bandura and Walters, while agreeing that shaping procedures can be effective, doubted that they were responsible for the acquisition of most complex human social behaviors. Like Miller and Dollard, Bandura argues that social learning is largely mediated by imitation.

On the basis of anthropological studies as well as informal observation, Bandura and Walters argued that socialization -- the acquisition of socially sanctioned beliefs, values, and patterns of behavior -- was largely mediated by imitative learning. In some cultures, for example, young boys and girls are provided with miniature replicas of the tools used by their parents, and they spend a great deal of time tagging along with their parents practicing their use -- thus preparing for their adult roles. Similarly, children in the United States (and other developed societies) are given toys that they can use to imitate adult behavior. In this way, for example, children in all cultures acquire behaviors consistent with the occupational roles deemed appropriate by their culture for persons of their gender.

Gender-role socialization is far from the only example of learning by imitation. In some tribal cultures, children even obtain their sex education by watching adults engage in various aspects of mating behavior. Certain aspects of language acquisition, such as the meanings and pronunciation of words, are learned largely through observation and imitation of other people. In addition, certain complex motor and cognitive skills appear to be acquired in this manner. Medical residents do not learn to perform surgery through a trial-and-error process. Rather, they learn by watching skilled practitioners operate, and by reading about the procedures in textbooks. In a very real sense, a surgeon knows how to do surgery before he or she ever puts a scalpel to a patient -- that is, before there can be any direct experience of trial and error. On a more mundane level, driver education courses in high schools make sure that students have acquired basic skills in handling an automobile before they ever take to the road.

In tribal cultures, parents and older siblings are probably the models for most imitation. They are, after all, the primary agents of socialization. However, this purpose may also be served by exemplary models sanctioned by the parents: children are constantly being encouraged to emulate various national heroes and mythological figures, as well as the children next door. In technologically advanced societies, models for imitation are provided by books, television, movies, and other media as well as by real life. One of the sources of the constant controversy over children's television viewing concerns the kinds of models presented to children in cartoons and action series. A major function of written and oral language is this kind of cultural transmission. By virtue of linguistic communication, we can tell someone what to do in a particular situation -- describe the behavior, and indicate when it should be performed -- instead of letting the person discover the relations between cues, acts, and outcomes for him- or herself. For this reason, social learning by imitation is highly efficient. In a complex, highly developed society, it also seems necessary.

While agreeing with Miller and Dollard that imitation is an important source of social learning, Bandura and Walters took issue with the theory that imitation -- either as a general tendency or of a specific act -- is acquired through reinforcement. For example, developmental studies show that children imitate others before they ever are reinforced for doing so. Very young infants, up to about four months of age, engage in pseudo-imitation, in which they repeat some simple act (like babbling) displayed by their caretaker. However, this imitation will not occur unless the infant him- or herself has just recently performed the same act. Somewhat older infants will engage in genuine imitation of others, in circumstances where they have not just performed the same act themselves. The extent to which this behavior occurs depends on the degree to which the child's sensorimotor operations have developed. For example, children cannot reliably stick out their tongues in imitation of adults until they have acquired some mental representation of their facial anatomy (Piaget, 1951; but see Meltzoff & Moore, 1977). Children are not reinforced for this: it simply happens, apparently as a reflection of an innate tendency to do so.

Even imitation of specific behaviors is not learned by virtue of reinforcement. The behaviorist model of imitation involves three elements: a discriminative stimulus (S^d) that serves as a cue, the response of imitating the model (R), and the reinforcing stimulus (S^r). By virtue of the law of effect, repeated reinforcement of the imitative behavior will make that behavior more likely to occur. However, a classic experiment on aggression by Bandura (1962) shows that this is not the case. Children watched a film in which a model displayed novel aggressive behaviors (that is, behaviors not previously in the children's repertoires) towards a "Bobo the Clown" doll. In one condition, the model was punished for this behavior; in another, he or she was rewarded; in a third condition, there were no consequences to the behavior of any sort. In a later test, children who viewed the punished model showed less imitative aggression than those who viewed the rewarded model; interestingly, those who viewed the unreinforced model displayed the same amount of aggression as those who saw the model rewarded. This first test was performed under conditions of no incentive. In a second test, the children were promised a reward for imitating the model: under these circumstances, the group differences disappeared. Thus, novel aggressive behaviors were acquired by the children even though they were not reinforced for imitating the behavior. However, the performance of these behaviors was under reinforcement control: those who saw the model punished were less likely to engage in the behaviors themselves, until instructed that the reinforcement contingencies had been changed.
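The distinction between acquisition and performance in this experiment can be made concrete with a toy sketch. This is only an illustration, not Bandura's own model: the Observer class, the numeric payoff codes, and the rule that a behavior is performed whenever its anticipated outcome plus any incentive is not negative are all assumptions introduced here.

```python
from dataclasses import dataclass, field

# Hypothetical payoff codes for what the child saw happen to the model.
PUNISHED, NO_CONSEQUENCE, REWARDED = -1, 0, 1


@dataclass
class Observer:
    repertoire: set = field(default_factory=set)         # acts the child has acquired
    expected_payoff: dict = field(default_factory=dict)  # act -> anticipated outcome

    def watch(self, act: str, model_outcome: int) -> None:
        # Acquisition: the act enters the repertoire whatever happened to the model.
        self.repertoire.add(act)
        # The observed consequence is stored only as an expectation.
        self.expected_payoff[act] = model_outcome

    def performs(self, act: str, incentive: int = 0) -> bool:
        # Performance: an acquired act is displayed only if the anticipated
        # outcome, plus any promised incentive, is not negative (assumed rule).
        if act not in self.repertoire:
            return False
        return self.expected_payoff[act] + incentive >= 0


def run_condition(model_outcome: int) -> tuple:
    """Return (performs without incentive, performs with incentive)."""
    child = Observer()
    child.watch("hit doll", model_outcome)
    return child.performs("hit doll"), child.performs("hit doll", incentive=1)


for label, outcome in [("punished", PUNISHED),
                       ("no consequence", NO_CONSEQUENCE),
                       ("rewarded", REWARDED)]:
    no_incentive, with_incentive = run_condition(outcome)
    print(label, no_incentive, with_incentive)
```

In this sketch the act is in the repertoire in all three conditions; only the punished-model condition withholds it, and only until an incentive is offered.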

In a later statement, Bandura (1977) argued that there are two forms of learning.

Learning by response consequences is the kind of trial-and-error acquisition of knowledge familiar from the operant behaviorism of Skinner. However, this learning is given a cognitive emphasis. Direct experience provides information concerning environmental outcomes and what must be done to gain or avoid them. As a result, the person forms mental representations of experience that permit anticipatory motivation and behavioral self-control.
Modeling involves learning through vicarious experience -- by observing the effects of others' actions. While a term such as "modeling" encompasses learning through example, Bandura also uses it to cover learning through precept -- deliberate teaching and learning, often mediated by linguistic communication.

Although Bandura goes beyond Rotter in discussing the process of social learning, his analysis of performance is similar to Rotter's in many respects. That is, Bandura agrees that the person's behavior is governed primarily by his or her expectancies concerning the future. Our responses to various situations are governed by information we possess concerning forthcoming events, and the outcomes of our actions. These expectancies are formed, respectively, through processes resembling classical and instrumental conditioning -- except that conditioning is given an active, cognitive interpretation as opposed to the conventional passive interpretation in terms of the laws of practice and effect. Moreover, conditioning is not the only -- or even the most important -- way that these expectancies can develop. Rather, they can be acquired vicariously through precept and example.

Expectations before the fact are, of course, subject to revision by the information gained subsequently. The actual consequences of an environmental event, for example, or of a person's actions, serve to confirm or revise the person's expectations. These consequences can be directly experienced by the person in question, or they may be experienced vicariously through observation or symbolic mediation. Moreover, in discussing the consequent determinants of behavior, Bandura stresses the role of aggregate as opposed to momentary outcomes. In his view, people are more influenced by what happens in the long run than by minor setbacks, delays, and irregularities. In large part, this is due to the cognitive capacities of humans, whose powerful memories permit them to transcend even long intervals, and integrate information from different points in time.

A unique feature of Bandura's social-learning theory is the active role played by the self. Behaviorist doctrine, of course, eschewed any reference to the self as an active organizer of experience or agent of action. Such talk was banned as mentalistic and ultimately beyond the pale of science. Insofar as the self was discussed at all, it was as (in Skinner's terms) a system of responses. As a cognitive theorist, however, Bandura (1977) permits the self to take an active, executive role in the regulation of behavior. In this way, the self plays a role as both an antecedent and a consequent determinant of behavior.

In the cognitive view offered by Tolman and by Rotter, outcome expectancies are vitally important determinants of behavior. That is, we tend to engage in behaviors that we expect will lead to outcomes we desire, and prevent outcomes we dislike. Bandura agrees that outcome expectancies are important. However, he has also added a new concept: self-efficacy expectations (Bandura, 1977, 1978). While it is obviously important that the individual expect that a particular behavior will lead to a certain outcome, it is equally important that the person have the expectancy that he or she can reliably produce the behavior in question. Note that the actual state of affairs is irrelevant here. It does not matter whether the person can, in fact, perform some particular action. What matters is whether the person thinks he or she can. Self-efficacy expectations are conceptually similar to the sense of mastery, and have important motivational properties, in that they determine whether the person will even attempt the behavior in question.
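The joint role of the two expectancies can be sketched in a few lines. The multiplicative rule, the threshold, and the names Person and will_attempt are assumptions made purely for illustration; Bandura does not commit to a particular formula. The point of the sketch is only that a low value on either expectancy blocks the attempt, and that actual ability plays no part in the decision.

```python
from dataclasses import dataclass


@dataclass
class Person:
    outcome_expectancy: float  # believed chance that the act yields the outcome (0 to 1)
    perceived_efficacy: float  # believed chance of being able to perform the act (0 to 1)
    actual_ability: bool       # whether the person really can perform the act


def will_attempt(person: Person, threshold: float = 0.25) -> bool:
    # Assumed rule: both expectancies must be reasonably high, so the
    # product must clear a threshold. actual_ability is deliberately
    # ignored: what matters is what the person believes.
    return person.outcome_expectancy * person.perceived_efficacy >= threshold


capable_but_doubtful = Person(outcome_expectancy=0.9, perceived_efficacy=0.1,
                              actual_ability=True)
incapable_but_confident = Person(outcome_expectancy=0.9, perceived_efficacy=0.8,
                                 actual_ability=False)

print(will_attempt(capable_but_doubtful))
print(will_attempt(incapable_but_confident))
```

Under these assumed numbers, the person who could succeed never tries, while the person who cannot succeed makes the attempt anyway.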

An example of self-efficacy can be found in the literature on learned helplessness. As a rule, dogs placed in a shuttlebox will acquire escape and avoidance responses fairly readily, shuttling back and forth in response to stimuli signaling forthcoming shock. However, dogs who have first received classical fear conditioning are retarded in learning escape and avoidance. In some instances, they simply sit and take the shock passively. Learned helplessness can also be produced in humans. For example, subjects who have been exposed to unsolvable anagram problems are retarded in completing subsequent problems that are solvable. Although the learned helplessness effect is quite complex, it appears to involve the subject's belief that he or she cannot master the situation. In fact, that is objectively not the case: the shock in the shuttlebox is avoidable, and the dog has in his repertoire the necessary behavior; the second set of puzzles is solvable, and the student has the intelligence to solve them. Yet, experience has taught the subject to believe otherwise (if we can speak of beliefs in lower animals), and this belief controls behavior.

Self-efficacy can serve as an example of how antecedent expectations develop through social learning. Obviously, one source of self-efficacy is performance accomplishments: the personal experience of success and failure. Repeated failure experiences will lower the person's expectancy that he or she can effectively control outcomes. But the same sorts of expectancies can be generated through vicarious experience. Observing other people's success or failure will lead to appropriate expectations about oneself -- at least to the degree that one perceives oneself to be similar to those other people. But perceived self-efficacy can also be shaped in the absence of any experiential basis whatsoever, merely through verbal persuasion. A person who is repeatedly told that he or she is incapable of accomplishing some goal, especially if that information comes from an authoritative source, may actually come to believe it about him- or herself. Perceived self-efficacy can also change on a moment-to-moment basis, depending on the person's emotional state. Feelings of elation may increase feelings of mastery (sometimes beyond all reason, as in the megalomania of a manic patient), while anxiety or depression may reduce them. Finally, self-efficacy can vary from one situation to another. Even though a person has not encountered a particular problem before, he or she may have a high degree of self-efficacy if it closely resembles some other problem that the person has been able to master in the past.

Another way in which Bandura departs radically from the behaviorist analysis of social learning is by embracing the concept of self-reinforcement. Recall that Skinner objected to self-reinforcement on the ground that it was ineffective as a means of behavioral control. However, Bandura acknowledged that people can effectively regulate their own behavior in the absence of, or in opposition to, schedules of external reinforcement. For example, a run-of-the-mill jogger can reward herself for finishing in the top half of a local road race, even though she will never get a medal for her performance. Alternatively, a college professor may feel remorse about flunking a student, even though he receives praise from his dean for upholding academic standards. It is so common to find writers, painters, and composers pursuing their own vision even though they are denied any professional recognition, that the image of the starving artist has become part of our cultural mythology. By means of goal-setting and self-reinforcement, people can free themselves from environmental control. This independence of the person from environmental control distinguishes Bandura's social learning theory from its behaviorist forebears.

In principle, self-reinforcement frees people from external control. As a practical matter, however, the essential first step in self-regulation, setting the standard, tends to be based on imitation. That is, we set standards for ourselves that are similar to those set for themselves by those we admire. These models may be our parents, teachers, or spiritual leaders. However, models may also come from other sources, such as books, films, and media. One important consequence of literacy, coupled with free access to books and magazines, is that we encounter potential models whose standards may be quite different from those whom we would otherwise meet. Modeling our standards on those individuals is another way in which we free ourselves from the constraints of our local social environment.

In addition to standard-setting, Bandura postulates three other component processes in self-regulation. The person must monitor his or her own performance, and evaluate it according to the standard set for him- or herself. The dimensions on which the performance is evaluated can vary widely, as can the precise standards. Very often, the individual will measure him- or herself against actual or assumed population norms; or, some single individual will serve as the standard of comparison; in other circumstances, the standard will be set by the person's own previous behavior. It is important, of course, not to set standards that cannot be met. Research in a variety of domains, from academic achievement to weight loss, indicates that people should set goals for themselves that are clearly specified, and of only moderate difficulty. Vague or ambiguous goals, of course, are not goals at all. Setting an unattainable goal obviously has motivational drawbacks, while setting a goal that is too easy to accomplish will yield little or no satisfaction in its accomplishment. (It should be noted that the same considerations apply to goals set by others, as when parents enforce standards for their children's behavior.)

Once the evaluation has been made, the person will reinforce his or her performance appropriately. These reinforcements come in two forms, tangible and symbolic. A student may reward herself with a movie for acing an exam, or punish herself by canceling a date for doing poorly; or she may just praise or censure herself. The effectiveness of self-praise or self-reproach, in the absence of tangible consequences, is currently subject to considerable debate. However, research clearly shows that people -- even young children -- who fail to meet their own performance standards will deny themselves reward. Apparently, such internal states as self-esteem and self-efficacy have their own motivating properties. While behavior that is controlled only by external contingencies will be unreliable in the absence of those contingencies, our selves are always with us. Thus, in principle self-reinforcement should lead to more effective behavioral regulation, because it is less subject to situational variation.

Moreover, human intelligence and consciousness permit us to project the consequences of our actions far into the future. Traditional behavioral theories, of course, assert that present behavior is under the control of past events, and that future prospects that have no parallel in the past are very weak determinants of behavior. However, this is clearly not the case. The emergence of political movements supporting environmental protection and nuclear disarmament is a clear example of the control of behavior by the future. We have had no experience of the greenhouse effect or nuclear winter, but the prospects of them in the future led us to try to protect the ozone layer, and reduce the number of nuclear warheads, today. The behaviorist analysis of future determinants is largely correct when it is applied to lower animals, with their limited cognitive capacities. Bandura's openness to such determinants is another mark of the extent to which social learning theory has embraced cognitivism, and abandoned its behaviorist roots.

Social Learning as the Cognitive Basis of Culture

Social learning is the cognitive basis of culture, which anthropologists define as the customary beliefs, social forms, and material traits of a racial, ethnic, or social group, transmitted through informal learning and formal training from one generation to the next. This inter-generational transmission cannot be accomplished through the genes: there is no inheritance of acquired characteristics. Instead, it must be accomplished by learning -- which is to say, social learning, through example and precept. It is through social learning, both informal modeling and in formal institutions (such as schools and libraries) organized for the purpose, that a society passes down its knowledge, beliefs, and attitudes from one generation to the next. In this way, each generation builds on the advances made by those who went before, and doesn't have to start "from scratch".

This raises the question of whether nonhuman animals have "culture" as well. Observations of animals behaving in their natural environments suggest that animals do indeed learn vicariously from observing the experiences of others, and in this respect possess sets of cultural traditions that are passed from one generation to the next.

Chickadees who watch another chickadee open a milk bottle learn more quickly to open it themselves.
Red squirrels who watch another red squirrel open hickory nuts learn to do that more quickly.
Israeli roof rats (no kidding -- that's a real species!) quickly learn how to open pine cones to obtain the seeds inside, if they have the opportunity to watch an older roof rat do so.
Chimpanzees living in a rain forest in Cote d'Ivoire (the former French colony of Ivory Coast) employed a hammer-and-anvil system to crack the extremely hard shells of the panda nut in order to obtain the high-calorie kernel inside (Mercader et al., Science 296:1452-1455, 2002). Using archeological methods, anthropologists discovered that this behavior had been going on for more than 100 years at a particular "anvil" site, to which the chimps brought both the nuts and rocks to be used as hammers. It takes the animals as long as seven years to learn how to crack a panda nut properly, but the important thing is that individual animals do not appear to start the learning process from scratch. Rather, the behavior is passed down, chiefly from mother to child, by a process of imitation, or vicarious learning by example. The proper nut-cracking technique has been observed only in some bands of West African chimps, suggesting that it is part of these groups' "ape culture", passed from generation to generation by social learning.

(A) After filial imprinting on the costumed human pilot of a microlight aircraft, young cranes followed the flight path of this surrogate parent, adopting it as a traditional migratory route. (B) Female fruit flies (left) that witness a male marked with one of two colors mating (top right) later prefer to mate with similarly colored males. This behavior is further copied by others, initiating a tradition. (C) Bighorn sheep translocated to unfamiliar locations were initially sedentary, but spring migration and skill in reaching higher-altitude grazing grounds expanded over decades, implicating intergenerational cultural transmission. (D) Groups of vervet monkeys were trained to avoid bitter-tasting corn of one color and to prefer the other. Later, when offered these options with no distasteful additive, both naïve infants and immigrating adult males adopted the experimentally created local group preference. (E) Young meerkats learn scorpion predation because adults initially supply live prey with stingers removed and later provide unmodified prey as the young meerkats mature. (F) A humpback whale innovation of slapping the sea surface to refine predation, known as "lobtail feeding," spread over two decades to create a new tradition in hundreds of other humpbacks.

Actually, that case has now been made convincingly by Andrew Whiten, a comparative psychologist at the University of St. Andrews in Scotland, from which the illustration above is taken ("The Burgeoning Reach of Animal Culture", Science, 04/02/2021). Whiten defines culture as "the inheritance of an array of behavioral traditions through social learning from others". I would add that we can be sure a species has culture when the behavior in question has been passed via social learning through three generations of individuals; is observed in distant relatives, or nonrelatives, as well as parents and offspring; and appears only in a geographically circumscribed group. In any event, Whiten summarizes compelling evidence for culture, acquired and transmitted through social learning, in a wide variety of species: not just chimpanzees and crows, who are the usual subjects in these kinds of studies, "but also in a rapidly growing range of animal species, from cetaceans to a diverse array of birds, fish, and even invertebrates".

For a photojournalistic account of the development of culture in Japanese snow monkeys, see "On the Origin of Culture" by Ben Craig in Smithsonian magazine (01-02/2021). The snow monkeys in Yamanouchi famously bathe in the local hot springs, but this behavior is relatively recent, having been first observed by Kazuo Wada, a primatologist at Kyoto University, in 1963. Until that time, the monkeys had avoided the spring, which was a tourist attraction (known as an onsen). But a group of primatologists had been giving the monkeys apples to eat as an extra food source. One day, an apple rolled into the spring, a monkey went into the spring to fetch it, and then went back in. Within a few months, the younger monkeys in the troop were regularly bathing in the hot springs, and their offspring (sorry) began to do so as well. By 1967, the scientists had to build a special onsen for the monkeys, to keep them away from the tourists. In another instance, researchers began giving another troop of monkeys extra provisions of grain and sweet potatoes. One day a young monkey dipped a sweet potato into a nearby stream before eating it. Her mother and a younger sibling soon followed suit, and before long every newborn had learned sweet-potato washing from its mother. Later, the monkeys switched from washing in the stream to washing in the sea -- presumably because the salt water made the potatoes taste better. When the researchers stopped supplying potatoes, that same young monkey began soaking grain in water to separate it from the sand -- and that behavior, too, quickly spread through the group.

Along with consciousness and language, the capacity for learning, and especially for social learning, which creates the capacity for culture, is one of the greatest gifts of evolution to the human species. Language may be unique to humans. Consciousness probably isn't. Nor is the capacity for social learning, and thus for culture. It is an open question whether individuals can learn from watching animals of other species. But these instances certainly leave open the possibility of learning vicariously, taking others as models for one's own behavior. In that sense, at least some nonhuman species have at least the rudiments of culture. So be kind to your web-footed friends.

Biological Bases of Learning

The principles of learning, such as association by contingency, and the emphasis on predictability (for classical conditioning) and controllability (for instrumental conditioning) are pretty well established at the psychological level of analysis -- the level of stimulus and response, expectancy, and the like. At the biological level of analysis, the ability to learn -- to change one's behavior as a result of experience -- obviously must reflect changes in the organism's nervous system, and indeed the ability to learn is an important example of the plasticity of the nervous system -- the ability of the nervous system to be modified. But what exactly is going on in the nervous system when an organism learns something? This is one of the important tasks of behavioral neuroscience.

In contemporary neuroscience, research on the molecular and cellular basis of learning and memory focuses on the synapse, which mediates the connection between one neuron and another. Logically, in order for learning to occur, the connection between two neurons, or more likely two ensembles of neurons, has to be modifiable. Neural plasticity, or the ability for neural function to change with experience, must be possible -- or learning couldn't occur at all.

The fact that at least some phenomena of classical conditioning can be observed in every organism that has a nervous system has allowed behavioral neuroscientists to gain important insight into precisely how the nervous system is modified when organisms learn something. In work that won the Nobel Prize in Physiology or Medicine in 2000 (shared with Arvid Carlsson and Paul Greengard), Prof. Eric Kandel of Columbia University has examined synaptic changes in the marine mollusk, Aplysia, as it acquired a simple conditioned response (Science, 2001). The task was made easier by the fact that Aplysia has only about 10,000 neurons in its entire nervous system, compared to 86 billion or more in the human nervous system; and they're huge, so it's relatively easy to watch individual neurons operate as the organism learns.

The most important of these synaptic changes is long-term potentiation, an increase in the sensitivity of a postsynaptic neuron as a result of repeated stimulation by a presynaptic neuron. This is the neural representation of a simple association -- an association between neurons that is created as a result of repeated pairing of CS and US.

In the simplest case, suppose that a single afferent Neuron A represents the stimulus, and a single efferent Neuron B represents the response. Synaptic transmission between them occurs via a neurotransmitter substance, such as glutamate, flowing from presynaptic A to postsynaptic B. NMDA receptors on the postsynaptic B allow calcium ions to enter B's cell body. This in turn recruits more AMPA receptors, increasing the sensitivity of B to glutamate contributed by A. This temporary change in sensitivity is made permanent by certain proteins. Voila! Long-term potentiation.

Of course, it's not that simple, because both the stimulus and the response are represented by whole clusters or "ensembles" of neurons activated simultaneously, not by a single neuron from each. But the neurons in each of these ensembles are, in turn, bound together by virtue of long-term potentiation. In both cases, "neurons that fire together, wire together".

And, in fact, there are two somewhat different mechanisms for neural plasticity:

Long-term potentiation: If neuron A synapses onto neuron B, and the two repeatedly fire together, B becomes more sensitive to A, and requires less neurotransmitter in order to discharge.
Presynaptic facilitation: If neuron A synapses onto neuron B, and the two repeatedly fire together, A comes to release more neurotransmitter into the synapse with B than it did before conditioning.

You get the idea.
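For readers who like to see the difference spelled out, here is a toy sketch of the two mechanisms. It is a deliberately crude illustration rather than a biophysical model: the Synapse class, the fixed increment per pairing, the firing threshold, and the idea that B's input is simply release multiplied by sensitivity are all assumptions introduced here.

```python
from dataclasses import dataclass

THRESHOLD = 1.5  # assumed input that neuron B needs in order to fire


@dataclass
class Synapse:
    release: float = 1.0      # transmitter released by presynaptic neuron A per spike
    sensitivity: float = 1.0  # response of postsynaptic neuron B per unit of transmitter

    def drive(self) -> float:
        """Input that B receives when A fires once."""
        return self.release * self.sensitivity


def potentiate(syn: Synapse, pairings: int, rate: float = 0.1) -> Synapse:
    # Long-term potentiation: each A-B pairing makes B more sensitive.
    for _ in range(pairings):
        syn.sensitivity += rate
    return syn


def facilitate(syn: Synapse, pairings: int, rate: float = 0.1) -> Synapse:
    # Presynaptic facilitation: each A-B pairing makes A release more transmitter.
    for _ in range(pairings):
        syn.release += rate
    return syn


naive = Synapse()
ltp = potentiate(Synapse(), pairings=10)
pf = facilitate(Synapse(), pairings=10)

for label, syn in [("naive", naive), ("LTP", ltp), ("facilitation", pf)]:
    print(label, syn.release, syn.sensitivity, syn.drive() >= THRESHOLD)
```

Either route ends with A driving B past threshold where it could not before; the two differ only in which side of the synapse has changed.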

The Nature of Learning

Behaving organisms are not just machines, operating by reflex, taxis, or instinct. Rather, even organisms with very simple nervous systems are able to modify their behavior in accordance with what they have learned. Much learning can be described in terms of classical and instrumental conditioning, and combinations thereof. But not all learning is of this sort: language learning is a particularly salient example of learning merely through exposure to others, without any reinforcement.

What is learned is not a simple connection between stimulus and response. Rather, the learning organism forms a mental representation of the world and its relation to it: of objects, events, its own behavior, and the contingent relations between them.

In light of modern experiments on predictability, controllability, and social learning, we should revise our definition of learning.

Learning is not a change in behavior that occurs as a result of experience. That definition is a holdover from the radical behaviorism of Watson and Skinner, who thought that notions of mind, mental life, and the like were not scientific, and that psychology could only be a science if it became a science of behavior.
Rather, learning is the acquisition of knowledge through experience -- either the direct experience of classical and instrumental conditioning, or the vicarious experience of social learning. This knowledge is then used to guide behavior.

We cannot understand learning solely by focusing on events outside the organism, tracing connections between stimuli and responses, and treating the organism as if it were empty. Rather, we must go inside the "black box", to see how the mind is structured, and how its structures operate. We need to understand the principles by which information about the world is acquired through sensation and perception, retained through memory, transformed through thought, and communicated by language. These matters are the province of cognitive psychology.

For a comprehensive survey of the psychology of learning, see The Psychology of Learning and Behavior by B. Schwartz and S.J. Robbins (Norton, 1978), and subsequent editions. The most up-to-date of these is Learning and Memory by B. Schwartz and D. Reisberg (1991).

For a thorough discussion of behaviorism, see Behaviorism, Science, and Human Nature by B. Schwartz and H. Lacey (Norton, 1982).

For the classic survey of theories of learning, see the various editions of Theories of Learning by E.R. Hilgard and G.H. Bower (1st ed. by E.R. Hilgard, published by Appleton-Century-Crofts, 1948; 5th ed. by G.H. Bower and E.R. Hilgard, published by Prentice-Hall, 1981).

This page last revised 10/04/2023.
