Thinking:

Reasoning, Problem-Solving, Judgment, and Decision-Making

Learning, perceiving, and remembering require more than forming associations between stimuli and responses, extracting information from environmental stimuli, and reproducing information stored in memory traces. Rather, the person is actively attempting to predict and control the environment by constructing mental representations of objects and events in the present world, and reconstructing episodes of past personal experience.

Learning is a process of generating and testing hypotheses, in which the person tries to figure out what predicts what, and what controls what, in the environment.
Perception is a constructive process, in which the person tries to figure out what is out there, where it is, what it is doing -- and, ultimately, what it means.
Memory is a reconstructive process, by which the person tries to figure out what happened in the past, where, when, and to whom -- and, again, what it means.

Thus, the basic functions of perceiving, learning, and remembering depend intimately on reasoning, inference, problem-solving, judgment, and decision-making -- in short, on thinking.

The term "thinking" covers a variety of mental activities, from the daydreaming of a high-school student in study hall to the kind of problem-solving that brought the astronauts of Apollo 11 safely back to Earth. McKellar (1957) distinguished between two general kinds of thinking:

Autistic thinking is the sort of thing we do when we are daydreaming. It's not to be confused with autism as a mental illness. Everyone engages in autistic thinking.
Imaginative thinking involves the mental manipulation of symbolic representations that is aimed toward solving a particular problem or achieving a particular goal.

Just as memory frees behavior from control by the immediately present stimulus environment, and thinking performs much the same kind of function. For Piaget, thinking allows us to create a model of the world in our minds; for Vygotsky, thinking allows us to restructure that model mentally to figure out ways that things could be different -- other than the way they are.

Thinking takes a variety of forms, including reasoning, problem-solving, judgment, and decision-making. Often, thinking ends up in a choice to engage in one action or another.

Thinking both acts on and creates mental representations of ourselves and the objects and events we encounter in the world around us. These mental representations can be based on perception of the current environment, or memories of the past, or our imagination. Bruner (1957) noted that representations come in three basic forms:

Enactive representations are closely tied to motor activities.
Iconic representations come in the form of visual and other mental images.
Symbolic representations exist in the form of words, and can represent both concrete objects and events (such as tigers and running) and abstract ideas (such as freedom and justice).

The doctrine of normative rationality serves as a backdrop for this discussion. According to this doctrine, human beings are rational creatures, and rational creatures think according to the rules of logic and make decisions according to a principle of rational choice.This is a classical philosopher's view of human thought, promoted from Aristotle to Descartes and beyond -- a sort of philosopher's prescription for how people should think. But psychology is an empirical science, not so much concerned with prescribing how people should think as with describing how they do think. In fact, psychological research shows that when people think, solve problems, make judgments, decisions, and choices, they depart in important ways from the prescriptions of normative rationality. These departures, in turn, seem to challenge the view of humans as rational creatures -- but do they really?

Concepts and Categories

One topic that neatly illustrates the role of "higher" mental processes in "lower" mental processes is perception, which is dependent on the perceiver's repertoire of concepts stored in memory. As Jerome Bruner (1957) has noted, every act of perception is an act of categorization. In identifying the form and function of objects (the basic task of perception), stimulus information makes contact with pre-existing knowledge (as Ulric Neisser has noted, perception is where cognition and reality meet). In perceiving we decide how, in general, to think about objects and events; how they are similar to other objects and events encountered in the past. When we classify two objects or events as similar to each other, we construe them as members of the same category, an act which reveals a concept which is part of our semantic memory.

Defining Categories

A category may be defined as a group of objects, events, or ideas which share attributes or features in common. A concept is the mental representation of a category, abstracted from particular instances. Concepts serve important mental functions: they group related entities together into classes, and provide the basis for synonyms, antonyms, and implications.

Technically, categories exist in the real world, while concepts exist in the mind. However, this technical distinction is difficult to uphold, and psychologists commonly use the two terms interchangeably. In fact, objective categories may not exist in the real world, independently of the mind that conceives them (a question related to the philosophical debate between realism and idealism). Put another way, the question is whether the mind picks up on the categorical structure of the world, or whether the mind imposes this structure on the world.

Some categories may be defined through enumeration: an exhaustive list of all instances of a category. A good example is the letters of the English alphabet, A through Z; these have nothing in common except their status as letters in the English alphabet.

A variant on enumeration is to define a category by a rule which will generate all instances of the category (these instances all have in common that they conform to the rule). An example is the concept of integer in mathematics, which is defined as the numbers 0, 1, and any number which can be obtained by adding or subtracting 1 from these numbers one or more times.

The most common definitions of categories are by attributes: properties or features which are shared by all members of a category. Thus, birds are warm-blooded vertebrates with feathers and wings, while fish are cold-blooded vertebrates with scales and fins. There are three broad types of attributes relevant to category definition:

perceptual or physical features help define natural categories like birds and fish;
functional attributes, including the operations performed with or by objects, or the uses to which they can be put, are used to define categories of artifacts like tools (instruments which are worked by hand) or vehicles (means of transporting things);
relational features, which specify the relationship between an instance and something else, are used to define many social categories like aunt (the sister of a father or a mother) or stepson (the son of one's husband or wife by a former marriage).

Of course, some categories are defined by mixtures of perceptual, functional, and relational features.

Still, most categories are defined by attributes, meaning that concepts are summary descriptions of an entire class of objects, events, and ideas. There are three principal ways in which such categories are organized: as proper sets, as fuzzy sets, and as sets of exemplars.

The Classical View: Categories as Proper Sets

According to the classical view of categories as proper sets, which has come down to us from Aristotle (his book on Categories, part of the Organon) the objects in a category share a set of defining features which are singly necessary and jointly sufficient to demarcate the category.

By singly necessary we mean that every instance of a category possesses that feature;
by jointly sufficient we mean that every entity possessing the entire set of defining features is an instance of the concept.

Examples of classification by proper sets include most mathematical and scientific categories:

Geometrical figures

All triangles have three features in common: they are 2-dimensional geometric figures with 3 sides and 3 angles.
All quadrilaterals also have three features in common: they are 2-dimensional geometric figures with four sides and four angles.

Animals:

All birds are warm-blooded vertebrates with feathers and wings.
All fish are cold-blooded vertebrates with scales and fins.

Geological time is divided into eons: we're currently in the Phanerozoic eon.

In some schemes, there are also supereons, which subsume eons.

Eons are divided into eras, like our current era, the Cenozoic.
Eras are divided into periods, like our current period, the Quaternary.
Periods are divided into epochs, like our current epoch, the Holocene.

(Although some geologists argue that we've entered a new epoch, the "Anthropocene", about which I'll say more in the lectures on Development).

Epochs are divided into ages or stages, like the Stone Age and the Little Ice Age.
Ages are divided into stages or chrons: We're currently in the Subatlantic stage.

In each case the features listed are singly necessary for category membership because every instance of the category possesses each of the features.

And in each case the features listed are jointly sufficient for category membership because any object which possesses all these features is an instance of the category.

According to the proper set view, categories can be arranged in a hierarchical system (hierarchies again!). Such hierarchies represent the vertical relations between categories, and yield the distinction between superordinate and subordinate categories.

Subordinate categories (subsets) are created by adding defining features: thus, a square is a subset of the geometric category quadrilateral (four-sided figure).
Superordinate categories (supersets) are created by eliminating defining features: thus, a quadrilateral is a superset which includes squares, rectangles, and parallelograms.

Such hierarchies of proper sets are characterized by perfect nesting, by which we mean that subsets possess all the defining features of supersets (and then some).

Thus, in Euclidean geometry, there are four types of geometrical figures:

points are geometrical figures with no dimensions;
lines are geometrical figures with 1 dimension;
planes are 2-dimensional geometrical figures; and
solids are 3-dimensional geometrical figures.

Nested underneath the superordinate category planes are subordinate categories defined in terms of lines and angles:

triangles are 2-dimensional figures with 3 sides (or edges) and 3 angles (or vertices);
quadrilaterals are 2-dimensional geometrical figures with 4 sides and 4 angles;
pentagons are 2-dimensional geometrical figures with 5 sides and 5 angles;
hexagons are 2-dimensional geometrical figures with 6 sides and 6 angles;
and so on.

Nested underneath the subordinate categories triangles and quadrilaterals are additional sub-subordinate categories defined in terms of equality of sides and angles.

For the category triangles:

equilateral triangles have 3 equal sides and 3 equal angles;
isosceles triangles have 2 equal sides and 2 equal angles;
scalene triangles have no equal sides and no equal angles.

Alternatively,triangles can be classified in terms of their internal angles:

right triangles have 1 internal angle of 90^o;
oblique triangles have no 90^o angles;

Therefore,oblique triangles can be further subdivided:

acute triangles have all internal angles smaller than 90^o;
obtuse triangles have 1 internal angle larger than 90^o.

And for the category quadrilaterals:

trapeziums have no parallel sides.
trapezoids have 2 opposite sides parallel to each other;
parallelograms have both pairs of opposite sides parallel;

And parallelograms can be further subdivided:

rhomboids have no right angles;
- rhombuses are a subcategory of rhomboids with all 4 sides equal in length;
rectangles have 4 right angles;

squares are a subcategory of rectangles with all 4 sides equal in length.

Another example of classification by proper sets is the biological taxonomy set down by Carl Linnaeus, a Swedish naturalist, in the 18th century (see "Organization Man" by Kennedy Warne,Smithsonian, May 2007).

Linnaeus first divided the natural world into three great kingdoms -- animal, vegetable, and mineral.
- Plants and animals were further subdivided into classes. (Minerals, or at least rocks, can be further divided into sedimentary, igneous, and metamorphic, but Linnaeus didn't do that; he was interested solely in various forms of life.)
  - Classes were further subdivided into orders.
    
    Orders were further subdivided into families.
    
    Within each family were various genera.
    
    Within each genus were various species.
    
    And within species, of course there were subspecies.

In the Linnean system, every living thing is known by its genus and species name -- thus, modern humans are known as homo (our genus)sapiens (our species)sapiens (our subspecies). Every species, is identified by a set of defining features that are singly necessary and jointly sufficient to identify an individual as a species member.

For plants, Linnaeus focused on sexual characteristics, identifying classes based their stamens, and orders depending on their pistils (Species Plantarum, 1753).
For animals, Linnaeus grouped mammals according to such features as teeth, toes, and teats; insects by their wings; and birds by their beaks and feet (Systema Naturae, 1736 and later).

There was a place for everything, and everything was in its place. In his Systema Naturae, he depicted himself as a new Adam, in the Garden of Eden, naming the plants and animals. As he said of himself, "Deus creavit, Linnaeus disposuit" --"What God created, Linnaeus organized". For a recent biography of Linnaeus, see The Man Who Organized Nature by Gunnar Broberg, reviewed by Kathryn Schulz in "You Name It", New Yorker 09/21/2023). By way of background, Schulz includes a brief but very interesting history of various attempts to organize nature, beginning with Aristotle. , underscoring Linneaus's achievements. She also points out that Linnaeus polite society with a description of plant reproduction "that looked less like heterosexual monogamy than like homosexuality, polygamy, miscegenation, and incest".

And not just Linnaeus. Linnaeus brought logical order to something that people have been doing since long before Aristotle. In Genesis (2:18-23), God gave to Adam the task of naming all the animals (see Adam's Task: Calling the Animals by Name (1986) by Vicki Hearne, a poet, philosopher, and animal trainer, which actually isn't so much about naming animals as it is about training them, but it's a beautiful book nonetheless). Categorization enables us to appreciate the world around us in all its variety -- what things are alike, and what things are different.

An important subdiscipline of anthropology studies folk taxonomy -- how different cultures organize the life -- flora and fauna -- they find around them. Cecil Brown, an anthropologist, has studied about 200 different folk taxonomies, and found that across a wide variety of cultures, people employ the same basic categories:

among animals: fish, birds, snakes, mammals, and what Brown calls "wugs" (worms, insects, and the like);
among plants: trees, bushes, vines, and herbs.
Across cultures, people generally form subcategories by using two-part names, like grizzly bear or live oak.

And even the words that represent these categories can sound alike from one language to another. Brent Berlin, an anthropologist and ethnobiologist (you'll meet him again in the lectures on Language), read pairs of names, one bird and one fish, taken from Huambisa, an indigenous language spoken in Peru, and found that American college students could guess which word named which kind of animal at levels significantly above chance. Apparently, some names just sounded "birdy", and others sounded "fishy". There may even be a part of the brain, in the temporal lobe, that deals specifically with categorizing things as living or nonliving (talk about localization of function!).

Linnaeus epitomizes what Adam Shorto called "the Age of Categories" (in a review of The Invention of Air by Steven Johnson,New York Times Book Review, 01/25/2009). In the Age of Reason and the Age of Enlightenment, a great deal of intellectual energy was employed in sorting everything in the natural world into its proper category. And also parts of the social world. It was at roughly this time, for example, that universities began to organize themselves into academic departments, with the philosophers in a different department from the theologians, and the physicists in a different department from the philosophers. And all of what has sometimes (maybe by Rutherford, in the same breath as his "stamp collecting" quote; maybe by Wordsworth, in "A Poet's Epitaph") been sneeringly called "botanizing" was predicated on a proper-set view of categorization, in which there was a place for everything and everything had a place.

Other proper sets can also be arranged into hierarchies of supersets and subsets:

Superset

People

Subsets

Male

Female

Sub-subsets

Boy

Youth

Man

Bachelor

Husband

Widower

Girl

Maiden

Woman

Wife

Widow

Superset

Federal Government Officials

Subset

Executive

Legislative

Judicial

Sub-subsets

President, Vice-President

Cabinet Secretary

Administrator

Senator

Representative

Supreme Court

Court of Appeals

District Court

Magistrate

In the classical view of categories as proper sets, the horizontal relations between adjacent categories, within a particular level of a vertical hierarchy, are governed by an "all or none" principle. Because category membership is determined by a finite set of defining features, and an object either has the features or doesn't have them, then an object is either in a category or not. There are no half-way points.

If you're a figure in Euclidean geometry, you're either a point, a line, a plane, or a solid, with no in between.
- If you're a plane, you're either a triangle, or a quadrilateral, or whatever.
  - If you're a triangle, you're either an equilateral triangle, an isosceles triangle, or a scalene triangle.
    - If you're a scalene triangle, you're either a right triangle or an oblique triangle.
      - And if you're an oblique triangle, you're either an acute triangle or an obtuse triangle.
  - If you're a quadrilateral, you're either a trapezium, or a trapezoid, or a parallelogram.

It follows from the classical view of categories as proper sets that categories have a homogeneous internal structure. Because all instances of a category share the same set of defining features, all instances are equally good representatives of the category. All quadrilaterals are alike with respect to their pertinent features.

The classical view of categories therefore yields two procedures for categorization:

If you want to define a category of objects, then determine the set of defining features shared by all members of the category.

On inspection, all triangles, of whatever kind, will have 2 dimensions, 3 sides, and 3 angles.
On inspection, all birds, of whatever kind, will be warm-blooded vertebrates with feathers and wings.

And if you want to categorize a new object into a familiar category, the procedure is only slightly more complicated:

Analyze, through perception, the features of the object.
Retrieve, from memory, the defining features of various candidate categories.
Match the features of the object to the defining features of the candidate categories.
If there is a match, assign the object to the appropriate category.

Thus, if you encounter a new animal, and you want to determine whether it is a bird of a fish,

Determine whether it is a vertebrate or not; whether it is warm- or cold-blooded; whether it has feathers or scales; and whether it has fins or wings.
If it is a warm-blooded vertebrate with feathers and wings, call it a bird.
If it is a cold-blooded vertebrate with scales and fins, call it a fish.

How We Categorize: The Classical View

The classical view of concepts formed the basis for the earliest research on "higher" mental processes: Hull's (1920) studies of concept-identification.

As psychology approached the cognitive revolution, interest in concept-identification was revived in classic research by Bruner, Austin, and Goodnow. Following the classical view, these investigators viewed categorization as assigned objects to proper sets. They constructed stimuli which varied on 4 dimensions: the number of objects in a rectangle, the number of borders around an object, the shape of the object, and the color of the object. Subjects were presented with stimuli one at a time, along with information as to whether the object belonged to some set the experimenter had in mind. And their task was to discover what the experimenter's concept was.

Bruner et al. identified two broad strategies for concept-identification, depending on whether they focused on the stimuli as a whole, or only parts of the stimuli.

Wholist: In this case, subjects took as their initial hypothesis that the concept consisted of all the features of the first positive instance. They then eliminated any features that did not continue to occur in future positive instances. If their initial classification was correct, they would maintain their hypothesis. If their classification was incorrect, they would abandon their hypothesis and form a new hypothesis based on the next positive instance. That is, the new hypothesis would consist of features common to the old hypothesis (when correct), plus the features of the current positive instance. The advantage of this strategy is that subjects never have to remember any instances. They only have to remember the features of the current hypothesis.
Partist: In this case, the subjects took as their initial hypothesis some subset of the features of the first positive instance, and then changed their hypothesis as trials went on so as to be consistent with the features present in all positive instances. If their classification was correct, they would maintain their hypothesis. If the classification was incorrect, they would choose a new hypothesis. The problem with this strategy is that a different subset of features was present in all instances. Thus, the strategy was disadvantageous, because subjects were forced to remember individual past instances.

Problems with the Classical View

The proper set view of categorization is sometimes called the classical view because it is the one handed down in logic and philosophy from the time of the ancient Greeks. But there are some problems with it which suggest that however logical it may seem, it's not how the human mind categorizes objects.

For example, some concepts are disjunctive: they are defined by two or more different sets of defining features.

An example is the concept of a strike in the game of baseball: a strike is a pitched ball at which the batter swings but does not hit; or a strike is a pitched ball which crosses home plate in the strike zone, which baseball's rule book defines as "that area over home plate the upper limit of which is a horizontal line at the midpoint between the top of the shoulders and the top of the uniform pants, and the lower level is a line at the hollow beneath the kneecap" -- assuming that the umpire calls it a strike; or a strike is a pitch hit into the foul zone, except that the third such hit, and any subsequent such hits, does not count as a strike.

Speaking of which, a strike is also a strike if the umpire calls it a strike. According to baseball legend, as retold by the postmodernist literary scholar Stanley Fish ("No Flag on the Play" by Nick Paumgarten,New Yorker, 01/20/03), the great major-league umpire Bill Klem was behind the plate at one game when "The pitcher winds up, throws the ball. The pitch comes. The batter doesn't swing. Klem for an instant says nothing. The batter turns around and says, 'O.K., so what was it, a ball or a strike?' And Klem says, 'Sonny, it ain't nothing 'till I call it'". Fish continued: "What the batter is assuming is that balls and strikes are facts in the world and that the umpire's job is to accurately say which one each pitch is. But in fact balls and strikes come into being only on the call of an umpire". Fish's story illustrates the postmodernist contention that many "facts" of the world do not exist independent of people's minds -- they are cognitive or social constructions.

Another example is jazz music, which comes in two broad forms, blues and swing, (or, depending on which critic you read, blues and standards), which are completely different from each other. Jazz can't be defined simply as improvisational music, because there are other forms of music, for example Bach's organ pieces and some rock music, which also involve improvisation.

Disjunctive categories violate the principle of defining features, because there is no defining feature which must be possessed by all members of the category.

Another problem is that many entities have unclear category membership. According to the classical, proper-set view of categories, every object should belong to one category or another. But is a rug an article of furniture? Is a potato a vegetable? Is a platypus a mammal? Is a panda a bear? We use categories like "furniture" without being able to clearly determine whether every object is a member of the category.

Consider the tomato. In 1883, Congress enacted a Tariff Act which placed a 10% duty on "vegetables in their natural state", but permitted duty-free import of "green, ripe, or dried" fruits. The Customs Collector in the Port of New York, seeing the prospects of increased revenues, declared that tomatoes were vegetables and therefore taxable. The International Tomato Cartel (honest, there was such an organization) sued, and the case (known as Nix v. Hedden)eventually reached the United States Supreme Court, which unanimously declared the tomato to be a vegetable, while knowing full well that it is a fruit. As Justice Gray wrote for the bench:

Botanically speaking, tomatoes are the fruit of a vine, just as are cucumbers, squashes, beans, and peas. But in the common language of the people, whether sellers or consumers of provisions, all these are vegetables which are grown in kitchen gardens, and which, whether eaten cooked or raw, are, like potatoes, carrots, parsnips, turnips, beets, cauliflower, celery and lettuce, usually served at dinner in, with, or after the soup, fish, or meats which constitute the principal part of the repast, and not, like fruits, generally, as dessert.

Nearly a century later, the Reagan administration, trying to justify cuts in the budget for federal school-lunch assistance, likewise declared tomato ketchup -- like cucumber relish -- to be a vegetable.

While tomatoes are commonly considered to be vegetables of a sort to be found on salads and in spaghetti sauce, and not fruits of a sort found on breakfast cereals or birthday cakes, Edith Pearlman did find Tomate offered as a dessert in a restaurant in Paris ("Haute Tomato",Smithsonian, July 2003). Nevertheless, as the British humorist Miles Kingston noted, that "Knowledge tells us that a tomato is a fruit; wisdom prevents us from putting it into a fruit salad" (quoted by Heinz Hellin,Smithsonian, September 2003).

And in 2005, the state of New Jersey considered a proposal to have the "Rutgers tomato" declared the official state vegetable, despite the fact that the state is far better known for its Jersey sweet corn (the tomato had already lost to the highbush blueberry in the contest to become the official state fruit). The tomato lobbyists, which included a class of fourth graders, actually cited the Supreme Court decision as justification ("You Say Tomato" by Ben McGrath,New Yorker, 03/21/05).

Arkansas solved this problem neatly: the South Arkansas Vine Ripe Pink Tomato has been officially designated as both State Fruit and State Vegetable.

And it's not just tomatoes. In the 16th century, or so the legend goes, missionary Catholic priests in Venezuela petitioned the Pope to classify the capybara, a large rodent (think very large nutria) with webbed feet which spends a lot of time in the water, as a fish. Apparently the locals didn't care for the traditional Catholic practice of abstaining from meat on Fridays, or for a whole month during Lent -- not least because the local food supply was pretty limited as it was. To this day, capybara is a favored food at Easter, much like turkey is popular at Thanksgiving in the US. And people who have tried it say it tastes a little fishy (see "In Days Before Easter, Venezuelans Tuck into a Rodent-Related Delicacy" by Brian Ellsworth, New York Sun, 03/24/2005).

In 2006, another court case emerged over whether burritos were properly classified as sandwiches. Panera Bread Company operated a bakery cafe (sort of like Boudin's in the Bay Area) in the White City Shopping Center in Shrewsbury, Massachusetts. When White City contracted with Qdoba Mexican Grill to open a franchise, Panera sued White City for violation of a contract which gave Panera exclusive right to sell sandwiches in the mall. Expert testimony by Chris Schlesinger, chef at the All Star Sandwich Bar in Cambridge, Massachusetts, pointed out that sandwiches were of European derivation (remember Lord Sandwich), while burritos are uniquely Mexican in origin. Moreover, a sandwich consists of two slices of leavened bread, while the single tortilla in a burrito is unleavened. (Never mind that many burritos have two or more tortillas. Schlesinger sells hot dogs at his sandwich bar, with the disclaimer that the item is "not a real sandwich, but a close friend"). In his ruling, however, Judge Locke -- like Justice Gray before him -- noted that "a sandwich is not commonly understood to include burritos, tacos, and quesadillas, which are typically made with a single tortilla and stuffed with a choice filling of meat, rice, and beans". ("Arguments Spread Thick: Rivals Aren't Serving Same Food, Judge Rules" by Jenn Abelson,Boston Globe, 11/10/2006).

As another example: American elementary-school students are commonly taught that there are five "Great Lakes" on the border between the United States and Canada -- Ontario, Erie, Michigan, Huron, and Superior. But in 1998, at the behest of Senator Patrick Leahy of Vermont, Congress voted to designate Lake Champlain, which lies on the border between Vermont and New York, as a sixth Great Lake. Leahy's logic seemed to be that the Great Lakes were all big lakes that were on political boundaries, and Lake Champaign was also a big lake on a political boundary, so Lake Champaign ought to be a Great Lake too (the designation was written into law, but later revoked).

Furthermore, it is difficult to specify the defining features of many of the concepts we use in ordinary life. A favorite example (from the philosopher Wittgenstein) is the concept of "game". Games don't necessarily involve competition (solitaire is a game); there isn't necessarily a winner (ring-around-the-rosy), and they're not always played for amusement (professional football). Of course, it may be that the defining features exist, but haven't been discovered yet. But that doesn't prevent us from assigning entities to categories; thus, categorization doesn't seem to depend on defining features.

And another example, this time from astronomy. Everybody knows the names of the nine planets -- Mercury, Venus, Earth, Mars, Jupiter, Saturn, Uranus, Neptune, and Pluto -- the last discovered in 1930, and named by Venetia (!) Burney Phair (1919-2009), then an 11-year-old British girl infatuated with astronomy (the name of the planet came first, and then the name of the Disney dog). Except some astronomers think that Pluto isn't really a planet; and, in fact, Pluto isn't represented, except by a crummy plaque, in the new Rose Exhibition Hall of the Hayden Planetarium in New York City. The problem, to some minds, is that Pluto is too small to qualify as a planet, and seems to be made mostly of ice -- and so it might better be considered to be some sort of asteroid, or even a comet.

This rethinking of Pluto's status was stimulated by astronomical findings that there are many icy "Kuiper Belt Objects" (KBOs), orbiting the sun between Neptune and Pluto, that are almost as large as Pluto is. Viewed from that perspective, Pluto more closely resembles a KBO than a planet -- put another way, the diameter of Pluto seems to be within 2 standard deviations of the mean of all KBOs. And there went its status at the Hayden Planetarium. However, new measurements, reported in 2004, have indicated that even the largest KBOs are a lot smaller than Pluto after all. Put another way, the diameter of Pluto seems to be well outside the distribution of KBO diameters after all. Viewed from that perspective, Pluto more closely resembles a planet than a KBO -- especially when you consider the heterogeneity of planets, with some (like the inner planets) mostly rocky, and others (like the outer planets) fairly gaseous. I'm not an astronomer, and so I'm not qualified to take a position on this matter -- though, frankly, having memorized a list of nine planets in third grade, and being infatuated with the story of Pluto's discovery by Clyde Thombaugh, I'm partial to planetary status. The important point is that Pluto shows that even in the natural world of geophysics, category membership can be on a continuum. Where the dividing line is between a body that is too small to be a genuine planet, on the one hand, and a body that is a genuine planet even though it's really really small, on the other, is in the end largely a matter of judgment.

The astronomical situation was worsened in 2005, with the discovery of yet another solar system object, larger than Pluto by about 50%, and even more distant from the sun (about 6.4 billion miles, compared to about 3.7 billion miles for Pluto). The object, formally designated 2003-UB313 (because it first appeared on images of the sky taken in 2003), and in some quarters already named for Xena, the Greek warrior princess of television fame. So, Xena was bigger than Pluto, and thus more planet-like; but then again, its orbital plane was even more eccentric than Pluto's, tilted by about 44^o off the ecliptic (the plane of Earth's orbit), about twice the angle of Pluto's eccentric orbit (which was another factor in thinking that Pluto wasn't a planet after all).

The situation was resolved, not to everyone's satisfaction, in 2006, when the International Astronomical Union voted to reclassify Pluto (and Xena, now renamed Eris, after the Greek goddess of discord), as dwarf planets. In some sense, the new classification scheme is constructed along classical, proper-set lines:

A planet is round, orbits the Sun, and has cleared its orbital zone of debris. Eight solar-system bodies fit this definition: the eight traditional planets, excluding Pluto.

There are two sub-categories of planets: rocky planets like Earth, and gaseous planets like Neptune.

And there are two sub-sub-categories under each of these.

A dwarf planet (a subset of the category planet) is round, orbits the sun, but has not cleared its orbital zone.

An exoplanet is a planet-like object that orbits a star other than the Sun.

But even this is an anomaly, because in classical proper-set structure, subsets are created by adding defining features, while supersets are created by subtracting them. Thus, we have the anomalous situation in which a subset actually has fewer defining features than the superset in which it is included. For the system to make sense, then, at least from a classical viewpoint, planets would have to be subsets of the superset dwarf planets, which of course doesn't make any linguistic sense. That, or planet and dwarf planet are at the same level in the taxonomy. But from the point of view of the International Astronomical Union, "dwarf planets" aren't really planets at all. Go figure.

The battle apparently isn't over. Even Alan Stern, the principal investigator (PI) for the New Horizons spacecraft set to encounter Pluto in 2015, calls the current classification of Pluto, and planets in general, stupid. "They really got it wrong", he told the New York Times ("NASA Spacecraft Get a Closer Look at Dwarf Planets Pluto and Ceres", by Kenneth Chang, 01/20/2015; see also "Plutonic Love" by Michael Lemonick, Smithsonian Magazine, 06/2015). In an article on New Horizons in Scientific American, Stern writes that "I, along with most other planetary scientists I know, refer to Pluto as a planet and do not use the International Astronomical Union planet definition, which excludes Pluto, in speech or research papers" ("Pluto Revealed", 12/2017).

Christopher Russell, the PI for another mission, to the dwarf planet Ceres, was quoted as saying that "If all of the scientific community starts referring to Vesta [another dwarf planet] and Ceres and Pluto as planets, then eventually everyone will come along". Then again, Stern believes that anything that is round but not a star should count as a planet -- which would make most moons into planets (actually, Stern would label them "secondary planets"), which would mean that we'd have secondary planets orbiting primary planets which orbit stars. That doesn't sound like an improvement. Watch the newspapers, when the IAU is sure to revisit this issue at some future meeting.

In fact, in 2025 National Geographic hosted just such an exchange on the planetary status of Pluto, timed to the 95th anniversary of the planet's (sic) discovery ("Are We Sure Pluto Isn't a Planet?" by Eric Alt, 06/2025). The participants were the above-mentioned Mike Brown, the Pluto-the-planet-killer, and Philip Metzger, a planetary scientist who thinks Pluto's reclassification was a scientific blunder. Brown, for his part, claims an "attachment" to Pluto; he just doesn't think that Pluto qualifies to be a planet. He points out that most of the "pro-Pluto" researchers are people who worked on the "New Horizons" fly-by of Pluto: ironically, when the mission was launched, Pluto was a planet; when the mission got there, it wasn't. Because this is not about astronomy, but classification, we'll give Metzger the last word: "Taxonomy is part of science, and the taxonomy that matters is the one that the scientists are using and finding useful".

The whole sordid story is told by Neil deGrasse Tyson in The Pluto Files: The Rise and Fall of America's Favorite Planet (2009). See also How I Killed Pluto and Why It Had It Coming (2010) by Mike Brown, who discovered Eris. Both authors enthusiastically supported Pluto's demotion. Neil deGrasse Tyson, as Director of the Rose Center for Earth and Science at the American Museum of Natural History, in New York City, went so far as to leave Pluto out of a display of the solar system.

See also Pluto Confidential: An Insider Account of the Ongoing Battles Over the Status of Pluto by Laurence A. Marschall and Stephen P. Maran (2009).

Perhaps the best account of the discovery and demotion of Pluto is provided by Discovering Pluto: Exploration at the Edge of the Solar System by Dale P. Cruikshank and William Sheehan, both avowed Pluto-lovers (2018; reviewed by Priyamvada Natarajan in "In Search of Planet X", New York Review of Books, 10/24/2019). Sheehan also documented the controversy over the "canals" on Mars in Planets and Perception, discussed in the lectures on Sensation and Perception.

And yet another example -- one not for the fainthearted. In the United States, the Humane Slaughter Act requires that animals are rendered unconscious before they are butchered -- presumably to minimize suffering on the part of cows, sheep, and pigs. Poultry is exempt from this law, however: maybe someone at USDA thinks that chickens and turkeys don't suffer when they're killed. But the Department of Agriculture classifies rabbits -- yes, those cuddly little mammals with the long ears and wiggling noses -- as poultry, thus exempting them from the provisions of the Humane Slaughter Act. Why anybody would do this isn't clear, though you can bet that the rabbit-slaughtering industry found it too expensive, or inconvenient, to comply with the law. Why the USDA didn't just specifically exempt rabbits (and other small animals) from the Act, instead of classifying them as poultry, isn't clear either, though you can bet that nobody from the USDA wanted to testify before Congress that the Eastern Bunny and the Velveteen Rabbit didn't have consciousness, feel pain, and suffer. Actually, if you take both bets, then the policy makes a sort of perverted sense, doesn't it?

Yet another problem is imperfect nesting: it follows from the hierarchical arrangement of categories that members of subordinate categories should be judged as more similar to members of immediately superordinate categories than to more distant ones, for the simple reason that the two categories share more features in common. Thus, a sparrow should be judged more similar to a bird than to an animal. This principle is often violated: for example, chickens, which are birds, are judged to be more similar to animals than birds.

The chicken-sparrow example reveals the last, and perhaps the biggest, problem with the classical view of categories as proper sets: some entities are better instances of their categories than others. This is the problem of typicality. A sparrow is a better instance of the category bird -- it is a more "birdy" bird -- than is a chicken (or a goose, or an ostrich, or a penguin). Within a culture, there is a high degree of agreement about typicality. The problem is that all the instances in question share the features which define the category bird, and thus must be equivalent from the classical view. But they are clearly not equivalent; variations in typicality among members of a category can be very large.

Pioneering work by UCB's Prof. Eleanor Rosch showed just how variable category instances are with respect to their typicality. In her 1975 study, she asked subjects to rate how "good" each instance was of a variety of categories, using a 1-7 scale. In each case, the items in question had been listed by subjects who were asked to generate instances of various categories. Here are some examples:

Fruits are the reproductive bodies of seed plants, but orange,apple, and banana were rated as good examples;olive,pickle, and squash were rated as very poor examples (tomato got a very low rating of 5.58, close to pumpkin and nut).

Vegetables are plants grown for an edible part, but pea,carrot, and green beans were rated as good examples;dandelion,peanut, and rice were rated as poor examples (tomato got a rating of 2.23 as a vegetable, close to greens and lima beans).

Sports are forms of recreation, but football,baseball, and basketball were rated as good examples;checkers,cards, and sunbathing were rated as poor examples.

Vehicles are means of transporting things, but automobile,station wagon, and truck were rated as good examples;wheelbarrow,surfboard, and elevator were rated as poor examples.

Even when the category in question has a clear and obvious proper-set structure, differences in typicality can occur. In a study inspired by Rosch's, Armstrong, Gleitman, and Gleitman (1983), asked subjects to provide goodness-of-example ratings for "prototype" categories, such as fruit,vegetable, sport, and vehicle, and got results very similar to hers. But they also got the same kind of results when subjects were asked to rate "well-defined" categories -- even from subjects who agreed, in advance, that the categories were proper sets:

Even number: the integers 4,8, and 10 were rated as better examples than 18,34, and 106.

Odd number: the integers 3 and 7 were rated as better examples than 23,57,501, and 447.

Female:mother and housewife were rated as better examples than princess,waitress,policewoman, and comedienne.

Plane Geometry Figure:square,triangle, and rectangle were rated as better examples than circle,trapezoid, and ellipse.

There are a large number of ways to observe typicality effects: when classifying objects, subjects react faster to typical than to atypical members; children and nonnative speakers of a language learn typical members before atypical ones; when generating instances, typical ones appear before atypical ones (this is the phenomenon of spew order); and typical members are more likely to serve as reference points for categorization than are atypical members. That is, you are much more likely to say that an object is a bird because it looks like a sparrow than that it is a bird because it looks like a chicken. A major-league baseball umpire is much more likely to call a strike if the pitch is below the batter's belt, even though many above-the-belt pitches fall well within the official strike zone (New York Times, 11/08/00)..

Typicality appears to be determined by family resemblance: typical members share lots of features with other category members, while atypical members do not. Thus, sparrows are small, and fly, and sing; chickens are big, and walk, and cluck.

Typicality is important because it is another violation of the homogeneity assumption of the classical view. It appears that categories have a special internal structure which renders instances nonequivalent, even though they all share the same singly necessary and jointly sufficient defining features. Typicality effects indicate that we use non-necessary features when assigning objects to categories. And, in fact, when people are asked to list the features of various categories, they usually list features that are not true for all category members.

The implication of these problems, taken together, is that the classical view of categories is incorrect. Categorization by proper sets may make sense from a logical point of view, but it doesn't capture how the mind actually works (this is a theme which runs through the course).

The Revisionist View: Categories as Fuzzy Sets

Recently, another view of categorization, introduced originally by the philosopher Ludwig Wittgenstein, has gained status within psychology: this is known as the probabilistic or the prototype view. According to this view, categories are fuzzy sets, in that there is only a probabilistic relationship between any feature and category membership. No feature is singly necessary to define a category, and no set of features is jointly sufficient.

Some features are central, in that they are highly correlated with category membership (most birds fly, a few, like ostriches, don't; most non-birds don't fly, but a few, like bats, do). Central features are found in many instances of a category, but few instances of contrasting categories.
Other features are peripheral: there is a low correlation with category membership, and the feature occurs with approximately equal frequency in a category and its contrast (birds have two legs, but so do apes and humans).

Fuzzy Sets and Fuzzy Logic

The notion of categories as fuzzy rather than sets, represented by prototypes rather than lists of defining features, is related to the concept of fuzzy logic developed by Lofti Zadeh, a computer scientist at UC Berkeley. Whereas the traditional view of truth is that a statement (such as an item of declarative knowledge) is either true or false, Zadeh argued that statements can be partly true, possessing a "truth value" somewhere between 0 (false) and 1 (true).

Fuzzy logic can help resolve certain logical conundrums -- for example the paradox of Epimenides the Cretan (6th century BC), who famously asserted that "All Cretans are liars". If all Cretans are liars, and Epimenides himself is a Cretan, then his statement cannot be true. Put another way: if Epimenides is telling the truth, then he is a liar. As another example, consider the related Liar paradox: the simple statement that "This sentence is false". Zadeh has proposed that such paradoxes can be resolved by concluding that the statements in question are only partially true.

Fuzzy logic also applies to categorization. Under the classical view of categories as proper sets, a similar "all or none" rule applies: an object either possesses a defining feature of a category or it does not; and therefore it either is or is not an instance of the category. But under fuzzy logic, the statement "object X has feature Y" can be partially true; and if Y is one of the defining features of category Z, it also can be partially true that "Object X is an instance of category Z".

A result of the probabilistic relation between features and categories is that category instances can be quite heterogeneous. That is, members of the same category can vary widely in terms of the attributes they possess. All of these attributes are correlated with category membership, but none are singly necessary and no set is jointly sufficient.

Some instances of a category are more typical than others: these possess relatively more central features.

According to the probabilistic view, categories are not represented by a list of defining features, but rather by a category prototype, or focal instance, which has many features central to category membership (and thus a family resemblance to other category members) but few features central to membership in contrasting categories.

Here's a nice demonstration of family resemblance: "The 5 Browns" -- Desirae, Deondra, Gregory, Melody, and Ryan, siblings from Utah who are classically trained pianists (all at the Julliard School, and all at the same time, yet). They perform individually and in various combinations, most spectacularly as a piano quintet -- meaning, in this instance, not a piano plus string quartet, but rather five pianos.

Here's another one. In 2013, the Democratic candidate for mayor of New York City was Bill de Blasio, a white man who was married to an African-American woman. Several commentators noted that his biracial son, Dante, looks just like him -- except for the skin tone and that glorious Afro.

And here's a very clever one. On the left is a photograph of the philosopher Ludwig Wittgenstein, to whom we owe the concept of family resemblance. On the right is a composite photograph created by superimposing Ludwig's portrait with those of this three sisters (for some reason his brother, Paul, famous for commissioning Ravel's "Piano Concerto for the Left Hand", was left out). Ray Monk, reviewing a book of photographs of Wittgenstein and his (very famous) family, writes: "the eyes, the nose, and the mouth look like they belong to the same person, enabling one to see directly the very strong family resemblances that existed between thee four siblings" ('Looking for Wittgenstein", New York Review of Books, 06/06/2013).

It also follows from the probabilistic view that there are no sharp boundaries between adjacent categories (hence the term fuzzy sets). In other words, the horizontal distinction between a category and its contrast may be very unclear.

Thus, a tomato is a fruit but is usually considered a vegetable (it has only one perceptual attribute of fruits, having seeds; but many functional features of vegetables, such as the circumstances under which it is eaten).
Dolphins and whales are mammals, but are usually (at least informally) considered to be fish: they have few features that are central to mammalhood (they give live birth and nurse their young), but lots of features that are central to fishiness.
Improvisation is a central feature of jazz, but jazz has other characteristic features that, while not definitive, tend to distinguish it from other musical forms. For example, the performer has a certain amount of freedom from the composer's intention, as might be represented in a notated score -- freedom about sonority, phrasing, rhythm, ornamentation, even the notes themselves. There is a lot more group interaction among players, as reflected in such features as "call and response", and the passing of solo "choruses" back and forth between performers.

Artistic Styles as Fuzzy Sets

Art historians and critics list various "styles" of art, such as Impressionism and Abstract Expressionism, but these styles are sometimes hard to define. This is especially the case for the style known as Art Deco, which flourished in the first half of the 20th century (the term itself was coined in the 1960s, during a revival of interest in the style). The difficulty of defining Art Deco was confronted by the curators of Art Deco: 1910-1939, an exhibition mounted by the Victoria & Albert Museum in London, and which was also on view in San Francisco and Boston. As discussed by Ken Johnson, in a review of the Boston show (New York Times, 08/20/04):

...Art Deco quickly proved to be amazingly adaptable and omnivorously appropriative. It could swallow up just about any style from any period and transform it into something cool, jazzy, and contemporary, and it could turn just about any commodity... into an object of desire.... The question is, how many different things can be called Art Deco before the term becomes meaningless?.... Ms. Benton and Mr. Benton [co-curators of the exhibition] opt for inclusiveness. Following an idea of the philosopher Ludwig Wittgenstein, they suggest that we think of Art Deco not as a singular style defined by a particular set of attributes, but as an extended family of styles related in many different ways but not all sharing the same defining characteristics.

A similar tack would probably be useful in defining other artistic styles and periods.

The Exemplar View

The probabilistic or prototype or fuzzy-set view of categories solves many of the problems that beset the classical or proper-set view, but it too has problems. Accordingly, some theorists now favor a third view, which denies that concepts are summary descriptions of category members. According to the exemplar view, concepts consist of lists of their members, with no defining or characteristic features to hold the entire set together. In other words, what holds the instances together is their common membership in the category. It's a little like defining a category by enumeration, but not exactly. The members do have some things in common, according to the exemplar view, but that's not important for categorization.

When we want to know whether an object is a member of a category, the classical view says that we compare the object to a list of defining features; the probabilistic view says that we compare it to the category prototype; the exemplar view says that we compare it to individual category members. Thus, in forming categories, we don't learn prototypes, but rather we learn salient examples.

Teasing apart the probabilistic and the exemplar view turns out to be fiendishly difficult. There are a couple of very clever experiments which appear to support the exemplar view, but most investigators are worried about it because it seems to uneconomical. The compromise position is that we categorize in terms of both prototypes and exemplars. For example, and this is still a hypothesis to be tested, novices in a particular domain categorize in terms of prototypes and experts categorize in terms of exemplars.

Beyond Similarity

The proper-set, prototype, and exemplar views of categorization are all based on a principle of similarity. What members of a category have in common is that they share some features or attributes in common with at least some other member(s) of the same category. However, recently it has been recognized that some categories are defined by theories instead of by similarity.

For example, in one experiment, when subjects were presented with pictures of a white cloud, a grey cloud, and a black cloud, they grouped the grey and black clouds together; but when presented with pictures of white hair, grey hair, and black hair, in which the shades of hair were identical to the shades of cloud, subjects grouped the white and grey hair together. Because the shades were identical in the two cases, grouping could not have been based on similarity of features. Rather, the categories seemed to be defined by a theory of the domain: grey and black clouds signify stormy weather, while white and grey hair signify old age.

What do children, money, insurance papers, photo albums, and pets have in common? Nothing, when viewed in terms of feature similarity. But they are all things that you would take out of your house in case of a fire. The objects listed together are similar to each other in this respect only; in other respects, they are quite different.

These observations tell us that similarity is not necessarily the operative factor in category definition. In some cases, at least, similarity is determined by a theory of the domain in question: there is something about weather that makes grey and black clouds similar, and there is something about aging that makes white and grey hair similar.

The theory-based approach to categorization holds that categories are not represented by either features (defining or characteristic) or exemplars (each with their own individual sets of features). Rather, categories are represented by a theory that explains why category members have the features they have. It is the theory that unites the features, rather than the features that unite the category members.

One way or another, concepts and categories have coherence: there is something that links members together. In classification by similarity, that something is intrinsic to the entities themselves; in classification by theories, that something is imposed by the mind of the thinker.

Prescriptions and Descriptions

"Every act of perception is an act of categorization", in Jerome Bruner's memorable phrase. Concepts permit us to go "beyond the information given" by the stimulus (another quote from Bruner), and make inferences about the features of the objects, and the implications of the events, in our immediate stimulus environment. Categorization is also of interest to psychologists because it illustrates one way in which the way people actually think departs from the way that logicians would have us think. From a strictly logical standpoint, categories are best viewed as proper sets, with lists of defining features showing what the members of various categories have in common. But from a psychological point of view, that's not what categories look like at all. The categories we use are fuzzy sets, united by characteristic features, or sets of exemplars with no unifying summary at all, or sets of objects united only by a theory of some domain or another. It's one thing for logicians to prescribe how we should think. Psychology seeks to describe how we actually do think. And those thought processes, so far as we can tell, depart radically from the prescriptions of normative rationality.

Social Categorization

People can be categorized, too, just like any other objects. Think of how your friends divide their acquaintances up into wonks and nerds, jocks and princesses (I know that you would never stoop to such thing!). And, in fact, categorization is critical to how we perceive and interact with other people. There's a whole literature on social categorization, which you'll encounter if you take an advanced course on Social Cognition, like the one I taught at Berkeley.

There's an old joke that there are two kinds of people: those who say that there are two kinds of people and those who don't. Dwight Garner, a book critic and collector of quotations (see his book, Garner's Quotations: A Modern Miscellany), has collected these quotes along the same lines ("Let's Become More Divided", New York Times, 01/31/2021).

Mankind is divisible into two great classes: hosts and guests. — Max Beerbohm

There are two kinds of people in this world: those who know where their high school yearbook is and those who do not. — Sloane Crosley, “I Was Told There’d Be Cake”

The world is divided into two types: the idle and the anti-idle. The anti-idle I hereby christen ‘botherers.’ — Tom Hodgkinson, “How to Be Idle”

There are two kinds of people in the world, those who leave home, and those who don’t. — Tayari Jones, “An American Marriage”

Either you’re a crunchy person or you’re not. — Marion Cunningham, “The Breakfast Book”

Instead of this absurd division into sexes they ought to class people as static and dynamic. — Evelyn Waugh, “Decline and Fall”

The world, as we know, divides unequally between those who love aspic (not too many) and those who loathe and fear it (most). — Laurie Colwin, “More Home Cooking”

The world is divided into two classes — invalids and nurses. — James McNeill Whistler

For me, all people are divided into two groups — those who laugh, and those who smile. — Vladimir Nabokov, “Think, Write, Speak”

The world is home to two kinds of folk: those who name their horses and those who don’t. — Téa Obreht, “Inland”

Freddie, there are two kinds of people in this world, and you ain’t one of them. — Dolly Parton, in “Rhinestone”

Perhaps there are two kinds of people, those for whom nothingness is no problem, and those for whom it is an insuperable problem. — John Updike, “Self-Consciousness”

There are only two kinds of people, the ones who like sleeping next to the wall, and those who like sleeping next to the people who push them off the bed. — Etgar Keret, “The Bus Driver Who Wanted to Be God”

“Sheep” and “goats” — The two classes of people, according to Hugh Trevor-Roper

“Cats” and “monkeys” — The two human types, according to Henry James

“Cleans” and “Dirties” — The two kinds of writers, according to Saul Bellow

“Hairy” and “Smooth” — The two kinds of playwrights, according to Kenneth Tynan

There are some who can live without wild things and some who cannot. — Aldo Leopold, “A Sand County Almanac”

What he failed to understand was that there were really only two kinds of people: fat ones and thin ones. — Margaret Atwood, “Lady Oracle”

There are two kinds of people in the world: the kind who alphabetize their record collections, and the kind who don’t. — Sarah Vowell, “The Partly Cloudy Patriot”

There are only the pursued, the pursuing, the busy, and the tired. — F. Scott Fitzgerald, “The Great Gatsby”

I divide the world into people who want to control something and those who want to make something. — Henri Cole, in The Paris Review

The world is divided into two types of fishermen: those who catch fish and those who do not. — Jacques Pépin, “The Apprentice”

There truly are two kinds of people: you and everyone else. — Sarah Manguso

There are two kinds of people, and I don’t care much for either of them. — Eric Idle, “Always Look on the Bright Side of Life”

There may be said to be two classes of people in the world; those who constantly divide the people of the world into two classes, and those who do not. Both classes are extremely unpleasant to meet socially. — Robert Benchley, in Vanity Fair

Similarity Judgments

Concepts and categories illustrate two types of judgment.

In induction, we draw a general conclusion from evidence about particulars -- e.g., by abstracting a category from knowledge of its exemplars (dogs and cats are alike in that both are warm-blooded, four-legged, furry mammals).
In deduction, we draw a conclusion about particulars from knowledge about general principles -- e.g., by assigning features to an object on the basis of the category to which it belongs (Mazdas are automobiles; automobiles have wheels; therefore Mazdas have wheels).

Most categorization, a form of deductive inference, proceeds by feature matching. Suppose that an observer wishes to categorize some entity, and that categories are construed as proper sets. In this case, The observer perceives the features of an entity, and retrieves the defining features of some plausible category. If the entity possesses every defining feature, it is assigned to the category in question.

If categories are construed as fuzzy sets, the process is slightly different. In this case, the concept is represented by a category prototype, which merely possesses many features that are central to category membership. Because no object is expected to possess all the features that are characteristic of the category, the entity can be assigned to a category even though it doesn't possess all the features held by the prototype. This makes clear that categorization is an act of judgment: the observer must decide whether an instance is similar enough to the prototype to count as an instance of the category, and whether it is more similar to the prototype of the target category than it is to the prototype of some contrasting category.

If categories are construed as collections of exemplars, with no summary prototype, the process is still judgmental in nature. The entity is compared to one exemplar after another, and the observer must decide whether the entity most closely matches some exemplar of the target category, or an exemplar of some contrasting category.

From a fuzzy-set point of view, the process of categorization begins with a query that specifies a target category -- e.g., "Is a tomato a fruit?". In response to this query, the judge retrieves a mental representation of the category "fruit" -- usually, an apple or some other prototypical instance. The judge then goes on to analyze the features of both the focal entity ("tomato") and the target prototype ("apple"), paying attention to the central features of the concept, and less attention to peripheral features. If there is sufficient overlap between tomato and apple, then tomato is assigned to the category fruit.

Note that the threshold for categorization is like the threshold for detection. It is not absolute, and varies as a function of task demands, expectation, and motivation. If a stranger were to ask you if a tomato were a fruit, you might say "no, it's a vegetable" without thinking. But if the same question were posed by your botany teacher, you might think about it a while.

Except for theory-based categories, all judgments of category membership are essentially judgments of similarity. Similarity judgments take the general form:

A (the subject) is similar to B (the referent).

For example, is a tomato similar to an apple? In categorization, the subject A is the entity, while the referent B is the category. For example, is a tomato (similar to) a fruit? Like categorization, similarity judgments proceed by feature matching. The judge compares the features of the subject A and the referent B, and if there is sufficient overlap, judges A to be similar to B.

However, similarity judgments are a little more complex than this. In particular, they are asymmetrical -- that is, the judgment may differ radically when the subject and the referent are reversed.

Consider, for example, the use of qualifiers like "virtually" or "almost":

Orange-red is almost pure-red.
100 is virtually 103.
An ellipse is almost a circle.

When these statements are reversed, they sound funny. And in fact, experiments show that orange-red is rated as more similar to red than red is to orange-red, etc.! As a rule, judged similarity is greatest when a category prototype served as the referent. But this shouldn't happen, because the features of the two objects remain the same regardless of which occupies the place of subject, and which the place of referent. Therefore, the amount of feature overlap should also remain constant. The fact that similarity judgments change indicates that they are based on more than a merely mechanical counting of features.

Consider, now, comparisons of objects that lie at the same level in a conceptual hierarchy:

North Korea is similar to China.
Canada is similar to the United States.

When these statements are reversed, again they sound funny. And, again, experiments show that North Korea is judged to be more similar to China than China is to North Korea! As a rule, judged similarity is greatest when the more salient entity serves as the referent. Again, this shouldn't happen if similarity judgments are based merely on feature overlap.

Now consider a figure of speech known as simile:

Life is like a play.

What this aphorism means is that in the course of everyday living, people play different roles, and enact scripts in front of audiences. But now consider the reversal of the simile:

A play is like life.

This means something different -- perhaps, that the theater captures the essence of reality. The effect of the simile, as a literary device, depends on the similarity of meaning between subject and referent. But often the original, and its reversal, communicate different meanings.

Or, the reversal is simply meaningless.Consider the following simile:

Life is just a bowl of cherries.

Again, this shouldn't happen if similarity judgments is based merely on feature overlap.

Now consider the phenomenon of transitivity. In mathematics:

If A = B and B = C then A = C; and
If A < B and B < C then A < C.

However, transitivity doesn't necessarily apply to similarity judgments. Consider the following statements:

Jamaica is like Cuba.
Cuba is like Russia.
Therefore, Jamaica is like Russia.

Note that the features of Jamaica, Cuba, and Russia don't change from one statement to the other. However, the wording of the statement focuses attention of different attributes of the entities in question: Jamaica and Cuba are alike with respect to climate, Cuba and Russia are alike with respect to politics. Thus, different features are being matched in each case -- proof positive that in making similarity judgments, people don't just count features; they select them as well.

Finally, consider the problem of similarity and difference. Similarity and difference are opposites. If similarity is a function of the number (or, more precisely, the proportion) of overlapping features, then difference is a function of the number (or proportion) of distinctive features. And since, the number of features is constant, the percentage of distinctive features is given by the percentage of overlapping features, subtracted from 100. Therefore, two objects that are highly similar cannot be highly different. In fact, however, this turns out not to be the case. In some experiments, a query about similarity is compared into a query about difference. In one condition, subjects are posed the following questions:

How similar are China and Japan?
How similar Ceylon and Nepal?

The result is that China is perceived as more similar to Japan than Ceylon is to Nepal.

But in another condition, subjects are posed superficially similar questions:

How different are China and Japan?
How different are Ceylon and Nepal?

The result is that China is perceived as more different from Japan, than Ceylon is from Nepal! This can't happen if similarity is the opposite of difference, and both judgments proceed merely by counting features.

Results such as these led the late Amos Tversky to propose the following model for similarity judgments.

In comparing two objects, the judge determines the features of both subject A and referent B.
Then the judge counts the number of attributes shared by A and B, the number of attributes distinctive to A (thus not shared by B), and the number of attributes distinctive to B (thus not shared by A).

In the ideal case, similarity is a function of shared attributes, so the number of distinctive attributes decreases similarity. But according to the model, the wording of the query leads to a differential attentional focus on the objects' attributes:

judgments of similarity focus attention on shared attributes, while
judgments of difference focus attention on distinctive attributes.

Moreover, the nature of the objects produces a differential weighting of distinctive attributes:

distinctive attributes of the referent are weighed more than those of the subject;
of a salient entity, more than a non-salient one; and
of a category prototype, more than a variant.

Thus, similarity judgments do not depend on mechanical countings, but also on how the judge deploys attention. The is yet another departure from normative rationality, which would have us judge similarity simply by counting the relative number of features that two objects have in common.

Algorithms and Heuristics in Judgment and Decision Making

According to the ideal model, similarity judgments proceed by fact retrieval. The judge retrieves facts about the subject and the referent, counts up shared and distinctive attributes, compares the figures, and outputs a judgment of similarity or difference. If the judge applies this simple rule, he or she will never make a mistake. Such a rule is known as an algorithm: a logical, systematic rule for problem-solving, judgment, and inference (the word is derived from the Arabic al-Khwarizmi, a famous Arab-Islamic mathematician). When a problem is soluble, application of the appropriate algorithm inevitably leads to the correct solution within a finite number of steps.

An Example: Algorithms for Estimation

The algorithmic approach to human judgment may be illustrated by the problem of estimation. Suppose you were a political analyst, and were asked to determine the percentage of Democrats and Republicans among undergraduate students on campus. The simplest solution is simply to poll every single Berkeley student, ask them whether they're a Democrat or Republican or neither, and tally the numbers. But that assumes that headcount, not an estimation, is better, and anyway some people wouldn't tell you their party affiliation, if indeed they had one, because after all it's none of your business. Moreover, even if everyone were willing to divulge their political leanings, it's just too hard to make an exact headcount. So you're going to have to get the answer some other way. Fortunately, we have a set of statistical algorithms that will do the job right: We draw a representative sample of the population, determine the frequency of Democrats and Republicans in this sample, and then extrapolate from this sample to the population at large. There are clear rules for creating unbiased samples, such as random sampling (in which every member of a population has an equal chance of being sampled) and stratified sampling (in which every subgroup of the population is sampled according to its relative size). And we have clear rules about what may be inferred from such samples, such as of the determine the confidence limits around the estimates, how to make the confidence limits as narrow as possible, and the like. If you follow these rules, you won't have to do a headcount, and you're very unlikely to make a mistake.

Estimation and the United States Census

The problem of estimation is confronted by the U.S. government every 10 years. The U.S. Constitution mandates a census of the population every 10 years, in order to provide information for the allocation of Congressional representatives, federal funds, and the like. In fact, the Constitution mandates a headcount ("enumeration") of the population, even though it is pretty clear that a headcount misses a lot of people. It wouldn't matter so much if these misses were random, but there is every reason to think that they are systematic: headcounts don't count people who don't want to be counted, such as undocumented immigrants, and they miss people who don't have addresses, like the homeless, or who move rapidly, such as migrant laborers and other people in poverty. California claims to have a lot of such people inside its borders, and it often complains that the census undercounts Californians, so that California doesn't get its fair share of Congressional representatives or federal dollars. One proposal to solve this problem is to estimate the population through the use of well-established statistical algorithms for sampling and extrapolation.

Pilot tests indicate that such statistical estimation procedures will, in fact, lead to a more accurate estimate of the population than the current headcount. But the Constitution specifically requires enumeration, and there is a legitimate debate over whether estimation is a constitutionally valid procedure for taking the census. It is possible that a switch from enumeration to estimation would actually require a Constitutional amendment. Certainly, at the very least, any estimation procedure would have to withstand scrutiny by the Supreme Court.

There are political barriers to adopting an estimation scheme, as well. For example, states like California with large populations would probably get even more of an advantage than they have now over states like South Dakota with small populations. Moreover, it is possible that giving additional Congressional representatives to states like California and New York, with large populations of ethnic minorities and urban poor, would effectively benefit the relatively "liberal" Democratic party.

Why then, do people depart from the ideal model, and fail to apply the correct algorithm? One problem is that people don't always have all the needed information available to them. In other cases, a judgment is required too soon to permit application of a lengthy, time-consuming algorithm; or the person is not motivated to expend the cognitive effort required to do so. These conditions are collectively known as judgment under uncertainty.

Under these circumstances, when algorithms cannot be applied, people rely on judgment heuristics: shortcuts, or "rules of thumb", that bypass the logical rules of inference. Heuristics permit judgments to be made under conditions of uncertainty. However, they are also applied when an algorithm is available. This is because they tend to be successful, despite the fact that they violate normative rules for inference. Thus, they inject substantial economies into the decision-making process. Unfortunately, reliance on heuristics also increases the probability of making an erroneous judgment or inference.

Studies of the mistaken judgments that people make have revealed a number of heuristic principles, among which are representativeness, availability, simulation, and anchoring and adjustment.

Heuristics are commonly called "rules of thumb" -- a characterization that has been criticized in some quarters as sexist and misogynistic. The argument is that the phrase, "rules of thumb", comes from an 18th-century legal ruling which permitted men to beat their wives, so long as the stick was no bigger than the width of their thumbs. This etymology is incorrect. According to Henry Ansgar Kelly, a linguist, the phrase originated in the 17th century with pretty much the meaning it has today -- as a practical method based on experience ("Rule of Thumb and the Folklaw of the Husband's Stick", Journal of Legal Education, 1994). Sir William Hope, in the Compleat Fencing Master (1692) wrote "What he doth, he doth by rule of Thumb, and not by Art". In 1721, Kelly's Scottish Proverbs, contained the injunction that "No Rule so good as Rule of Thumb".

Darwin, Franklin, and Judgment Under Uncertainty

When Charles Darwin was trying to decide whether to get married, he considered the effects that marriage, and children, might have on his scientific career, he made two lists, one of arguments for and one of arguments against. Surveying the two lists, he then wrote his decision: "Marry, Marry, Marry, QED [for Quid Est Demonstrandum, "that which was to be proved" -- the formula used at the end of geometric proofs]. But we have no idea how he compared the two lists.

A rather more rational technique was invented by Benjamin Franklin, in what the Founding Father called "Prudential Algebra". Franklin prepared two lists, as Darwin was to do, but he also assigned a numerical value to each item, and then eliminated items of equal value. As he wrote, "If I find a Reason pro equal to some two Reasons con, I strike out the three... and thus proceeding I find at length where the Balance lies". A pretty neat idea, except that Franklin didn't say how he assigned the values, or how he chose which "Reason pro" to counterbalance with which "Reason cons".

These two anecdotes are from Farsighted: How We Make the Decisions That Matter the Most (2019), a survey of "decision science" by Steven Johnson, a leading popular-science writer who specializes in the social and behavioral sciences. Reviewing the book in the New Yorker ("Choose Wisely", 01/21/2019), Joshua Rothman described the problem of judgment under uncertainty beautifully:

Ideally, we’d be omniscient and clearheaded. In reality, we make decisions in imperfect conditions that prevent us from thinking things through. This, Johnson explains, is the problem of “bounded rationality.” Choices are constrained by earlier choices; facts go undiscovered, ignored, or misunderstood; decision-makers are compromised by groupthink and by their own fallible minds. The most complex decisions harbor “conflicting objectives” and “undiscovered options,” requiring us to predict future possibilities that can be grasped, confusingly, only at “varied levels of uncertainty.” (The likelihood of marital quarreling must somehow be compared with that of producing a scientific masterwork.) And life’s truly consequential choices, Johnson says, “can’t be understood on a single scale.” Suppose you’re offered two jobs: one at Partners in Health, which brings medical care to the world’s neediest people, and the other at Goldman Sachs. You must consider which option would be most appealing today, later this year, and decades from now; which would be preferable emotionally, financially, and morally; and which is better for you, your family, and society. From this multidimensional matrix, a decision must emerge.

But that still doesn't tell us what values to assign to the pros and cons -- or even how to determine whether they really are pros or cons. Rothman continues:

"Decision theory"... has tended to hold that sound decisions flow from values. Faced with a choice -- should we major in economics or art history? -- we first ask ourselves what we value, then seek to maximize that value.... [T]he promise of decision theory is that there's a formula for everything.... Plug in your values, and the right choice comes out.

But (there's always a but):

In recent decades, some philosophers have grown dissatisfied with decision theory. They point out that it becomes less useful when we're unsure what we care about, or when we anticipate that what we care about might shift.

Which, I suppose, is another way of characterizing judgment under uncertainty.

The Representativeness Heuristic

Representativeness is a heuristic employed in categorization and other judgments of similarity. It is also employed in judging the probability of a forthcoming event, and in judgments of causality. The use of representativeness may be illustrated with the following examples of judgmental error.

The Birth Problem. In hospital birth records, which is the most likely sequence of boys and girls?

BBBBBB
GGGBBB
GBBGBG

Most people choose sequence #3. They reject #1 on the grounds that, if sex is determined randomly, there should be 1/2 boys and 1/2 girls. Moreover, they reject #2 on the grounds that boys and girls should be interspersed. Put another way, sex is determined randomly, and #3 looks more random than the others. But in fact, all these probabilities are equal. Assuming that the probability of any single birth being a boy = 1/2 and the probability of any single birth being a girl = 1/2, then the probabilities associated with each of these sequences is given by (1/2)6 = 0.156.

p(BBBBBB) = .5*.5*.5*.5*.5.*.5 = 0.0156.
p(GGGBBB) = .5*.5*.5*.5*.5.*.5 = 0.0156.
p(GBBGBG) = .5*.5*.5*.5*.5.*.5 = 0.0156.

The Gambler's Fallacy. On a fair roulette wheel, half the numbers are "red" and half the numbers are "black". Which of the following runs is most likely to end with a "red" number?

RBRRBRB_
BBBB_
BBBBBBBBBB_

Most people (except, perhaps, people who have just been exposed to the "birth problem" above!) choose #3. #1 looks random, and unpredictable. People are prepared to accept a short run of B, as in #2, but they expect that, following a long run of B, the random process will have to "straighten out" and produce an R. But the probability of getting a "black" number is 1/2, and the probability of getting a "red" number is also 1/2. Therefore, on any particular spin of the wheel, the probability of its stopping on a "red" number is 1/2,regardless of what has happened before.

p(R| RBRRBRB_) = .5
p(R| BBBB_) = .5
p(R| BBBBBBBBBB_) = .5

A long run of Bs (or Rs, for that matter) doesn't fit our intuitive notion of randomness, and so we expect that an R will come up soon, to make the sequence appear more random.

The gambler's fallacy relates to the birth problem in an interesting way. Some couples, after producing a string of all girls or all boys, decide to try "one more time" to get a child of the opposite sex. Even though the sex of this new child is completely independent of the sexes of their previous children, these couples seem to think that the series will "even itself out", so that their own particular family will look more like a random assortment of girls and boys. Of course, this tactic fails about 50% of the time. An interesting example: Mara Silverman and her husband, after giving birth to three boys, wanted a girl as well. So they tried again, this time using a variant on artificial insemination that was supposed to increase the chances of producing a child of a particular sex. The result was twins, both boys ("Surprise Delivery", by Mara Silverman and Gilliam Silverman, New York Times Magazine, 11/02/03).

A famous example of the Gambler's Fallacy occurred at the Casino in Monte Carlo on August 18, 1913. At one roulette wheel, the ball kept falling on a black number. Each time, increasing numbers of customers wagered increasing amounts of money that the next turn the ball would fall on red. But it actually continued to fall on black, 26 times in a row. This is why the Gambler's Fallacy is also known as the Monte Carlo Fallacy.

Another interesting example of the gambler's fallacy was reported in "The Ethicist", a weekly column by Randy Cohen that appeared in the New York Times Magazine (12/08/01). A woman gambled for hours at a slot machine. When she left the machine, her cousin began playing it, and on her first attempt hit the jackpot. The first woman demanded that her cousin split the proceeds with her 50-50, on the grounds that her unsuccessful plays had laid the foundation for the cousin's later win. But because each play of the machine is independent of all those that had gone on before, the first woman's losses had nothing to do with the second woman's win. The ethicist concluded that the second woman owed nothing to the first.

The Representativeness Heuristic may be defined as a strategy whereby judgments are based on the extent to which an event is similar in essential features to its parent population; or, alternatively, on the extent to which an event reflects the salient features of its presumed generating process. Representativeness is not a bad strategy -- judgment by resemblance is essentially how we make judgments of similarity, and judgments of category membership. However, applying the representativeness heuristic can lead to judgmental errors when we fail to appreciate the base rates of events or the prior odds of the event in question -- e.g., that the probability of a newborn being a boy is 1/2, regardless of who was born earlier in the day.

Justice Gray's categorization of a tomato as a vegetable rather than as a fruit, discussed earlier, illustrates the representativeness heuristic in categorization. From his point of view, tomatoes don't resemble fruits, at least so far as their functions in foods are concerned. Rather, they resemble vegetables. Justice Gray is not focusing on the defining features of the categories in question. Rather, he is focusing on non-necessary features that are highly correlated with category membership, such as whether tomatoes are served with the main course or with dessert. He made his judgment based on family resemblance: tomatoes have features resembling those of other vegetables, but they don't have features resembling those of other fruits. Certainly, the tomato is not a typical fruit; rather, it more closely resembles the prototypical vegetable.

We can also see the representativeness heuristic at play in other debates over public policy, in the general assumption that there should be a resemblance between a cause and its effect.

Many people oppose sex education in the schools, and the availability of birth control for minors, on the assumption that sex education and birth control will cause premature sexual activity, and thus pregnancy and abortion. In fact, any increase in teenage pregnancy occurring since these policies were introduced is caused by an increase in the numbers of teenagers. Sex education and birth control seem to keep teenage pregnancy and abortion down, if not teenage sex itself.
Many people oppose violence on television, movies, and video games because they believe that it causes people to behave violently in society. This idea is intuitively appealing, but in fact it has been very difficult to prove that violence in the media causes violent behavior.
Many people promote music and arts education in the schools because they believe that these programs enhance student achievement in other domains, such as math and science. While music and art education may make people look smarter, there is no evidence of any causal relationship between these programs and academic excellence. In fact, the true situation may well be the other way around -- that schools with a large proportion of intelligent students are forced to offer these students music and arts programs as well as the "basics".

The rise of "e-cigarettes", which provide nicotine as a vapor without cancer-causing tobacco smoke, has raised concerns among some anti-smoking advocates that e-cigarettes will serve as a kind of "gateway drug", leading people to shift from e-cigarettes to the real thing. Also, smoking e-cigarettes just looks very much like smoking real tobacco cigarettes. As a spokesman for the Campaign for Tobacco-Free Kids put it, "If it walks like a duck and it talks like a duck and it sounds like a duck and it looks like a duck, it is a duck" (quoted by Joe Nocera in "Two Cheers for E-cigarettes", New York Times, 12/07/2013).

In fact, studies indicate that most e-cigarette users are trying to quit their tobacco habit; very few people actually make the reverse transition.
And all the evidence indicates that e-cigarettes are less hazardous than the real thing.
OK, you're at risk for becoming addicted to nicotine (or already are), but the real harm from cigarettes comes from inhaling tobacco smoke.
Moreover, research indicates that "vaping" e-cigarettes actually promotes smoking cessation. One thing that ex-smokers miss is that cigarettes gave them something to do with their hands -- it gives them a comforting "look and feel" of smoking, which is what they really want. (Let's be clear about this: it's inhaling tobacco smoke that's harmful and cancer-causing, not inhaling nicotine vapor. And smokeless tobacco and is also going to kill you, through oral cancer. You may think that smoking anything looks ugly, and that "vapers", like smokers, ought to be able to control themselves. But that's a moral judgment, not a scientific one.)

Actually, it's not just the anti-smoking crowd that identifies e-cigarettes with the real thing. The e-cigarette manufacturers do, too. Big Tobacco owns the companies that produce e-cigarettes, and has resisted attempts by the Food and Drug Administration to regulate e-cigarettes as drug-delivery devices. Which is, of course, what they are. Instead, Big Tobacco wants e-cigarettes to be classified as tobacco products. Which, of course, they're not. But they do look like tobacco products. Maybe Big Tobacco hopes that, after all, e-cigarettes will serve as a gateway to the real thing. Or maybe they feel that, one way or the other, there's more money to be made from sales in 7-Eleven's than from sales through pharmacies.

No, I don't vape. Or smoke. The point here is that most objections to vaping are based on the fact that it looks like smoking, regardless of its health effects.

One way or another, these public policy positions are all examples of the intuitive idea that "like causes like". Something that looks like a cigarette is going to cause people to smoke cigarettes. That's what the representativeness heuristic is all about.

The Availability Heuristic

The availability heuristic is employed in judgments of frequency and probability. Here are a couple of examples of availability at work.

The Word Problem. Subjects are asked which is greater: the number of words in English that begin with the letter K,or the number of English words that have K as their third letter. The question is repeated for the letters L,N,R, and V. Most people choose the former, when in fact, for these particular consonants there are more of the latter, with a ratio of roughly 2:3 (based on a count of 3- to 7-letter words by Mayzner & Tresselt, 1965). But it's easier to think of words beginning with a letter than words with that letter in third position -- hence the error.

*Letter*	*Count in 1st Position*	*Count in 3rd Position*
K	152	283
L	717	873
N	478	993
R	494	1658
V	116	397
Total	1957	3310

The Committee Problem. Given a group of 10 people, how many different committees can be formed consisting of two individuals? How many consisting of eight individuals? Most people estimate that there will be more committees of 2 than committees of 8. But of course, every committee of 2 automatically creates another "committee" consisting of the 8 people remaining in the original group of 10: 10 - 2 = 8!

The number of committees of k people that can be formed from a group of N people is given by the binomial coefficient (^N_k).

For committees of 2: (¹⁰₂) = 45.
For committees of 8: (¹⁰₈) = 45.

So, there is exactly the same number of committees of two, and committees of eight. However, if you try this yourself, you will find that it is easy to generate lots of unique combinations of two people:1 and 2,2 and 3, 3 and 4,5 and 6, and so on. By contrast, it is harder to generate unique combinations of eight people:1,2,3,4,5,6,7, and 8,2,3,4,5,6,7,8, and 9,3,4,5,6,7,8,9, and 10,2,3,4,5,6,7,8, and 10, and so on -- you quickly lose track of where you are.

The Fame Problem. Two groups of subjects are presented with a list of 39 names.

List 1 consists of 19 famous women and 20 non-famous men.
List 2 consists of 19 famous men, and 20 non-famous women.

Later, when they are asked to estimate the frequency of famous names on the list, both groups estimate that there were more famous than non-famous names on the list they read. They also recall more famous than non-famous names. Apparently, subjects' estimates were biased by the ease with which they could recall list items.

The availability heuristic is a procedure that bases judgments of frequency and probability on the ease with which instances can be brought to mind. Once again, this is not a bad strategy: more frequent events ought to come more. But it ignores factors other than frequency that can affect fluency -- like the priming effects discussed in the lectures on memory.

"Becoming Famous Overnight". The role of priming in availability, leading to an incorrect judgment of frequency, is illustrated by a variation on the Fame Problem devised by Larry Jacoby and his colleagues. In Jacoby's experiment, subjects were presented with a list of 100 items consisting of 20 non-famous names presented once, and another 20 non-famous names presented four times each.

After a 24-hour retention interval, the subjects made "fame judgments" on a list of 100 names including 40 non-famous names from the list studied the previous day, 20 new non-famous names, and 60 new famous names. The subjects judged most of the new famous names to be famous, and most of the new non-famous names to be non-famous. For the old non-famous names, however, they were more likely to judge those that had been presented once as famous, compared to those that had been presented four times. Apparently, prior presentation of the non-famous names induced a priming-based feeling of familiarity. The subjects had fairly good explicit memory for the names presented four times, so this feeling of familiarity was correctly attributed to the previous study session. However, their explicit memory was poor for the names presented only once, so their feeling of familiarity was falsely attributed to fame. So, by virtue of priming, the feeling of familiarity, and the availability heuristic, 20 non-famous people became famous overnight.

The Simulation Heuristic

The simulation heuristic is related to availability. Like availability, it is used to make estimates of probability; but it can also be used to make judgments of causality.

The Undoing Problem. Imagine that there are two different travelers heading for the airport for two different flights that are scheduled to leave at the same time (as sometimes happens in big airports). They meet on the curb outside their hotel and decide to share a cab to the airport. The cab is caught in a traffic jam, and the driver tells them they should expect to miss their flights. In fact, the travelers get to the airport 30 minutes late. When they arrive at their departure gates:

Traveler A is told that his plane left on time;
Traveler B is told that his plane was delayed, but left 5 minutes ago.

Who is more upset? Most people judge that B is more upset than A, because it is easier to imagine ways he could have saved 5 minutes, than to imagine ways he could have saved 30 minutes. If he had gotten out of bed when his alarm rang, or taken less time in the shower, or hadn't stopped for that doughnut in the hotel lobby, he might have made his plane.

In the simulation heuristic, judgments are based on the ease with which plausible scenarios can be constructed. But, of course, there is no guarantee that the imagined scenario would have occurred.

Simulation is related to availability because both are based on fluency -- on the ease with which things come to mind. While availability is related to the ease of retrieval from memory, simulation is related to ease of imagination.

The simulation heuristic is an important determinant of certain emotional states known as the counterfactual emotions:

Frustration
Regret
Grief
Indignation

Each of these emotional states depends on a comparison between some actual outcome and "what might have been". The easier it is to imagine a plausible alternative scenario, the stronger the emotional reaction to "what might have been".

The American poet John Greenleaf Whittier (1807-1892) wrote, in "Maud Muller" (1856):

God pity them both! and pity us all,
Who vainly the dreams of youth recall;
For of all sad words of tongue or pen,
The saddest are these: "It might have been!"

To which another American poet, Bret Harte (1836-1902), replied, in "Mrs. Judge Jenkins" (1867):

If, of all words of tongue and pen,
The saddest are, "It might have been,"
More sad are these we daily see:
"It is, but hadn't ought to be."

The Anchoring and Adjustment Heuristic

Anchoring and adjustment is used in making estimates of various kinds.

The Extrapolation Problem. Subjects are asked to estimate (not to calculate) the product of the following multiplication problems.

1 x 2 x 3 x 4 x 5 x 6 x 7 x 8 = ____.
8 x 7 x 6 x 5 x 4 x 3 x 2 x 1 = ____.

People who are asked to estimate the ascending sequence generally give a lower result than those who are asked to estimate the descending sequence, even though the two answers are, of course, precisely the same -- 40,320. When you estimate the ascending sequence, the multiplication is easy to do in your head, and the intermediate results rise slowly. When you estimate the descending sequence, the intermediate results rise rapidly, and the arithmetic quickly becomes difficult to do without paper and pencil.

The United Nations Problem. In this experiment, subjects are asked to estimate the percentage of member states of the United Nations that come from the continent of Africa. Different groups of subjects are given different initial estimates as a starting point:

The subjects are first asked to state whether the percentage is higher or lower than the starting point, and then asked to give a final estimate. The finding is that subjects who receive the lower initial estimate give a lower final estimate than those who received the higher initial estimate. This is true even when the initial estimate is given by a purely random process, such as a spin of a roulette wheel!

In the anchoring and adjustment heuristic, people anchor their judgments on whatever information is available (such as an initial value suggested by the formulation of a problem, or by a partial computation) and then adjust their judgments in the appropriate direction. However, subjects often fail to adjust sufficiently from the initial value, so that the initial value serves as an anchor on the final estimate. In this way, final estimates are overwhelmingly influenced by initial estimates.

One real-life situation in which we see anchoring and adjustment at work is in the power of "first impressions". In social perception, our first impressions of a person tend to persist, even after we have come to know the person better. Our first impressions serve as an anchor on our final impressions, because we insufficiently adjust them to take account of later information.

Problem-Solving

Learning, perceiving, and remembering are special cases of problem-solving, where the problem is to predict and control environmental events, to construct a mental representation of the current environment, and to reconstruct a mental representation of the past. But what do we mean by a problem?

All problems have a few elements in common:

Conditions: the information given with the statement of a problem. These conditions include the specification of the initial state at which problem-solving begins.
Goal: the goal state toward which problem-solving is directed.
Operations: procedures and transformations that will get from the initial state to the goal state.
Obstacles: constraints on and impediments to these operations, also specified in the givens. Genuine problems always involve some obstacle: they cannot be solved in a single step. If they could be solved in a single step, then they wouldn't be genuine problems.

Actually, judgment and inference are special cases of problem-solving, in which the information given serves as the conditions; the goal is to make a specific inference or to render a specific judgment; and the algorithms and heuristics are the operations that achieve the goal.

Means-End Analysis

One algorithm for problem solving is known as means-end analysis, or difference reduction:

Form a mental representation of the current state and the goal state (at the beginning of problem solving, the current state is the initial state).
Calculate the difference between them.
Execute some action that reduces this difference.
Repeat steps 1-3 until the current state is identical with the goal state. At this point, the problem is solved.

The power of means-end analysis can be illustrated with the "Tower of Hanoi" problem devised by the French mathematician Edouard Lucas in 1883. According to the legend (which Lucas apparently invented) of the Tower of Hanoi:

In Hanoi there is a temple in which there is a tower of 64 sacred golden disks, trimmed with diamonds. The disks are stacked on top of each other, with the largest on the bottom and the smallest on the top. The monks must move the disks from one location to another, one at a time, such that a larger disk is never placed on top of a smaller disk. Besides the original location, and the new location, there is only one other place in the temple sacred enough to hold the disks. The legend holds that before the monks complete the task, their temple will crumble into dust and the world will end.

A laboratory model of the Tower of Hanoi problem involves three posts and three disks.

Imagine a tower consisting of three disks positioned on a peg such that the largest disk is on the bottom and the smallest disk on the top. There are also two other pegs, which contain no disks at all. The problem is to transfer all three disks from the first peg to the third. You may move only one disk at a time, and you may never put a larger disk on top of a smaller disk.

There are couple of ways to solve this problem: here is one:


(1) move the smallest disk from the left peg to the right peg;	(2) move the medium disk from the left peg to the middle peg;	(3) move the smallest disk from the right peg to the middle peg;

4) move the largest disk from the left peg to the right peg;	(5) move the smallest disk from the middle peg to the left peg;	(6) move the medium disk from the middle peg to the right peg;

(7) finally, move the smallest disk from the left peg to the right peg.

This solution will work no matter how many disks there are.

Returning to the actual legend, how long would it take to complete the full task, moving 64 disks from one location to another?

To start with, moving 64 disks in the manner specified will require

18 quintillion,
446 quadrillion,
744 trillion,
73 billion,
709 million,
551 thousand,
and 615 moves.

Now, you can calculate that there are

60 x 60 x 24 x 365 = 31,536,000 seconds in a year

So at a rate of one move per second, then it would take the monks more than

5 billion years to complete their task.

That's longer than the age of the Earth itself. So, if it takes the monks longer than 1 second to complete each move, which seems likely, then it seems we're safe for a while (after all, the Universe itself is about 14 billion years old).

A somewhat more complicated example of means-end problem solving is now known as the Hobbits and Orcs problem, after characters in the "Lord of the Rings" fantasy trilogy by J.R.R. Tolkein (The Fellowship of the Ring, The Two Towers, and The Return of the King).

This problem was known as "Missionaries and Cannibals" in the 19th century, but many modern authors prefer "Hobbits and Orcs" on grounds of political correctness. The problem is also known as "Jealous Husbands", after a scenario in which the missionaries and cannibals become husbands and wives, and the constraint is that no woman may be with another man unless her husband is present. The problem is very old, having been originally posed by Alcuin, an English monk who died in 804 CE. Alcuin's version was known as "Brothers and Sisters", following a scenario in which a Muslim woman cannot be in the presence of an unrelated man unless a male relative is also present.

On one side of a river are three hobbits and three orcs. Orcs eat hobbits when they outnumber them. The creatures have a boat on their side that is capable of carrying two creatures at a time across the river. The goal is to transport all six creatures to the other side of the river. At no point on either side of the river can orcs outnumber hobbits (or the orcs would eat them).

Employing means-end analysis, you would first create a representation of the current state.
Then you create a representation of the goal state.
Then you compare the initial state with the goal state.

	Then you reduce the difference by moving one hobbit and one orc to the other side.
	Now you recalculate the difference and take another step that will reduce the difference further.
	Then you recalculate again and take another step.
	And recalculate yet again And take still another step.
	And again, and again.

An algorithm like this will eventually solve the problem -- though there's a hidden obstacle, and a trick to solving it, that I'll leave up to you to find and solve for yourself!

In all these cases, the algorithm is a kind of recipe for solving the problem at hand. Like a recipe, it specifies all the necessary ingredients to make a particular dish, the amounts of each, the order in which they are to be combined, and the techniques the cook should use. Like a recipe, if you follow the steps closely, you will achieve the desired end.

An Algorithm for Rubik's Cube

Here's a more familiar puzzle that can be solved by algorithms: Rubik's Cube, invented in 1974 by Erno Rubik, a Hungarian professor of architecture -- students who attend the Budapest Semester in Cognitive Science at Eotvos University (ELTE), in Budapest, get a Rubik's Cube and a swell T-shirt as well as a wonderful introduction to the field). Most of us probably just hammer away at the cube, but Jessica Fridrich, a professor in the Department of Electrical and Computer Engineering at SUNY Binghamton, devised a set of 53 -- count 'em, 53 -- algorithms that form the basis of most attempts at "speedcubing".

In the Fridrich Method, the player first solves the top two layers of the three-layer cube, which is relatively easy (except, apparently, for me!).

Then, to solve the third layer, the player applies one of 40 algorithms in what is called the orientation phase.
Finally, in the permutation phase, the player applies one of 13 algorithms to complete the puzzle.

There's a similar set of algorithms for the 4-layer cube.

Experts like Prof. Fridrich (who came in 2nd in the 2003 world championship -- another person, also using her technique, came in 1st), not only know these algorithms, but are extraordinarily quick to recognize which algorithm should be applied -- as in chess, expertise is closely linked to memory, and the speed with which players can recognize familiar positions. Some speedcubers can solve Rubik's Cubes blindfolded, after inspecting the cube for less than a minute.

If you want the algorithms, you'll have to go to her website. But here's a sample, taken from the New York Times ("Specializing in Problems that Only Seem Impossible to Solve" by Bina Venkataraman, 12/16/2008).

Heuristics in Problem-Solving

In general, there are two types of problems.

In well-defined problems, all the components of a problem are completely specified -- for example,

If 3X + 3 = 12 Then X = ?????

There is only one way to represent this problem, and only one correct solution.

In ill-defined problems, there is uncertainty about one or more of the components -- for example,

If 3X + 3Y = 12 Then X = ?????

Ill-defined problems admit many possible representations, and permit many possible solutions.

While well-defined problems can be solved through the application of some algorithm, there are no algorithms available to solve ill-defined problems. Accordingly, when presented with an ill-defined problem, the problem-solver often applies some problem-solving heuristic.

As with perception, understanding a problem is an act of categorization. The problem-solver makes a judgment about the similarity between the problem at hand and other problems whose solution is known. But categorization in problem-solving shares the difficulties inherent in judgments of similarity.

For example, people often approach problem-solving through the representativeness heuristic, judging the best solution to a problem based on superficial similarity. This can lead to the einstellung phenomenon, or inappropriate problem-solving set. For example, a subject can be given experience with a series of problems, all of which can be solved in the same way. This will naturally induce a problem-solving set. Then, when presented with a problem that is superficially similar to the others, but can't be solved in the same way (or, alternatively, can be solved more easily in a different way), the problem-solver may focus on the inappropriate solution suggested by the earlier problems in the series.

As another example, problem-solvers often rely on the availability heuristic, judging the best solution on the ease with which solutions can be recalled. This can lead to functional fixedness. For example, if the elements of a problem all have familiar functions, but the task involves thinking of novel functions, the subject may focus on the inappropriate uses and thus fail to solve the problem.

In either case, einstellung or functional fixedness, the subject won't be able to solve the problem. In order to get unstuck, he or she must develop a new representation of the problem. This can happen when something occurs to break the subject's set, or when the subject is presented with clues about alternative functions.

Again, heuristics can mislead, but they aren't necessarily bad. Some problems can't be solved by algorithms, and so people must resort to heuristics. Others can be solved by algorithms, but this approach is uneconomical -- in which case, heuristics can be used to conserve time and effort. Again, the only problem is that people do not always appreciate the possibility that heuristics can lead to inappropriate solutions.

Heuristics, Biases, and Noise

The foregoing is often called the heuristics and biases approach because one consequence of relying on judgment heuristics such as representativeness and availability is to bias judgments in various directions -- for example, by promoting stereotyping according to race, ethnicity, or national origin, gender, or sexual orientation. But there's more to judgment error than systematic biases. There's also noise, or what we'd call, in statistical terms, variability or just plain error; or what the psychometricians call unreliability. To give a familiar example: if you step on a bathroom scale twice in rapid succession, you'd expect to get the same reading each time. To the extent that you don't, the scale reading is unreliable, variable, or just plain error-prone -- in a word, noisy. Noise is not the same as bias: if you really weigh 200 pounds, and the scale gives you readings of 199, 194, 191, and 196, it's biased towards underestimation, giving you an average reading of 195 when your true weight is 200; but it's also noisy, because it doesn't give you a reading of 195 each time you step on the scale. Bias can be corrected: if you've got a bad scale, you can simply add or subtract 5 pounds. Noise isn't so easy to correct, because you never know when you're getting your true weight.

Daniel Kahneman, who with Amos Tversky initiated the "heuristics and biases" approach to human judgment and decision-making, claim that noise, so defined, "is a large source of malfunction in society" (Noise: A Flaw in Human Judgment (2021) by Kahneman, Olivier Sibony, and Cass R. Sunstein; summarized by Kahneman et al. in "Bias is a Big Problem, But So Is 'Noise'", New York Times, 05/17/2021; see also "For a Fairer World, It's First Necessary to Cut Through the 'Noise'" by Steven Brill, New York Times Book Review, 05/30/2021). For example, Kahneman et al. cite a study in which real sitting judges who were asked to to judge (sorry) the appropriate sentences in a set of hypothetical cases. The average sentence meted out in these cases was 7 years in prison, but the average difference among the judges, given the same case, was 3.5 years -- an enormous amount of variability. The conclusion was that sentencing was, in part, a lottery, with the outcome given by the random assignment of case to judge. Put another way, "The judicial system is unacceptably noisy". Kahneman et al. cite a number of such studies involving insurance underwriters, radiologists, economic forecasters, and even fingerprint experts.

Kahneman et al. cite a number of sources of judgmental noise, including irrelevant factors such as the weather -- and even, in one notorious study, whether the judge has had lunch befoe he pronounced sentence! Across judges, individual differences in bias toward severity or leniency can introduce noise in sentencing judgments even if the whole group of judgments averages out to zero bias. Individuals also differ in terms of their judgment policy: one judge may take victim impact statements into account, while another may consider a perpetrator's history of poverty or abuse. Kahneman et al. propose that organizations engage in noise audits to identify various types of noise, such as occasion noise, in which judgments vary according to circumstances (like time of day), and system noise, which is built into the organization.

While society has taken steps to reduce if not eliminate bias (many symphony orchestras now hold blind auditions for new members, resulting in an increased representation of women and minorities), so far it has done little to reduce noise in human judgments -- a practice the authors call decision hygiene. Kahneman et al. propose that, as with heuristics and biases, simply being aware of noise is the first step towards reducing its impact. They also suggest that individual judgments be averaged, as in economic forecasting; or that judges follow sentencing guidelines, much as physicians follow formal practice guidelines for diagnosis and treatment; managers can use structured interviews when considering new employees. In many circumstances, practice guidelines may be preferable to strict algorithms, because they allow a decision-maker to take account of individual circumstances (judges hate sentencing guidelines because they're too strict). Still, it's important to keep track of serious departures from established guidelines -- which may indicate systematic bias. Noise is important, Kahneman et al. write, because it leads to error and even injustice.

Of course, the most efficient way to eliminate both bias and noise in human judgment is to take the human out of it -- employing statistical or computational algorithms. As noted in the lectures on Methods and Statistics, , in a variant on the statistical vs. clinical prediction

Hypothesis-Testing

Similar sorts of heuristics and biases are apparent in another aspect of thought,hypothesis testing. People are engaged in hypothesis testing all the time in the ordinary course of everyday living, just as scientists are in the laboratory.

On the basis of some theory, or perhaps on the basis of some initial observation, we formulate a hypothesis.
Then we test the hypothesis by seeking further evidence which bears on it.
If the evidence is consistent with our hypothesis, we retain it.
If the evidence is inconsistent with the hypothesis, we revise or reject it.

Confirmatory and Disconfirmatory Strategies

Essentially, there are two strategies for hypothesis testing: confirmatory and disconfirmatory.

In the confirmatory strategy, we actively seek evidence that is consistent with our hypothesis. Unfortunately, such a strategy cannot prove a hypothesis to be correct: lots of evidence may be consistent with a hypothesis, even though a single contrary example will disprove it. When we seek confirmatory evidence, we avoid contrary evidence.
In the disconfirmatory strategy, we actively seek evidence that is inconsistent with our hypothesis. This may seem counterintuitive, but a disconfirmatory strategy increases the probability of finding the contrary example that will prove our hypothesis incorrect. Accordingly, this is the logically correct way to test hypotheses, and it is the way trained scientists are supposed to go about their work.

Unfortunately, people rarely employ the disconfirmatory strategy, and have a strong tendency toward confirmatory hypothesis-testing.

Confirmatory Hypothesis Testing and the Iraq War

In 2002 and 2003, both the American and the British governments justified initiating a "war of choice" against Iraq in terms of Saddam Hussein's possession, and willingness to use, biological, chemical, and nuclear weapons of mass destruction against his own people and his neighboring countries. For example, in his 2002 presentation to the United Nations Security Council, Secretary of State Colin Powell offered some 29 different pieces of evidence that Saddam possessed WMDs and intended to use them. In the aftermath of the war itself, it proved difficult to locate any of these weapons. Nobody doubts that Saddam possessed and used chemical weapons during the Iran-Iraq war, and following Operation Desert Storm; the question is whether Saddam still possessed such weapons in 2003. This raised the question of how the Bush Administration came to rely on intelligence that proved so obviously wrong.

For example, in his 2003 State of the Union Address to Congress, President George W. Bush cited a report from British intelligence that Saddam had tried to buy "yellowcake" uranium from an unnamed African country -- despite the fact that this report had already been discounted by Ambassador Joseph Wilson, who had been commissioned by the CIA to investigate it. In a review of this incident, President Bush's own Foreign Intelligence Advisory Board concluded that while there was "no deliberate attempt to fabricate" evidence, the White House was so desperate "to grab onto something affirmative" about Iraq's nuclear program that it disregarded evidence that the "yellowcake" claim was at the very least questionable ("Bush's Claim Blamed on Eagerness to Find Weapons" by Walter Pincus, reprinted from the Washington Post, published in the San Francisco Chronicle, 12/24/03).

This is a classic example of confirmatory bias -- looking for evidence to support the hypothesis that Iraq was developing nuclear weapons, and ignoring evidence that it was not. As of the end of 2003, several committees of the House of Representatives and the Senate were investigating other aspects of the intelligence leading up to the Iraq War. It remains to be seen whether other instances of confirmatory bias will be uncovered.

Actually, disconfirmatory hypothesis-testing doesn't come naturally to trained professional scientists, either. In fact, for most of the history of science, scientists thought that they should proceed more or less inductively, by collecting evidence that would prove their theories to be correct. They were persuaded otherwise by Karl Popper (1902-1994), an Austrian-born English philosopher, whose treatise on The Logic of Scientific Discovery (1934) has since become a classic of the philosophy of science. In contrast to common practice, Popper argued that scientists should engage in a strategy of falsification, deducing hypotheses from their theories and then deliberately looking for evidence that would prove their theories false. As Popper put it: "No number of sightings of white swans can prove the theory that all swans are white. The sighting of just one black one may disprove it". Incidentally, Popper was a political philosopher as well as a philosopher of science, and he applied the falsification principle to politics as well. In The Open Society and Its Enemies (1945), he cautioned against political philosophies that claim to possess "certain knowledge". He argued that human institutions, like individual human beings, are fallible, and that they should be open to criticism and new ideas. Popper's argument, written as a critique of Plato, Hegel, and Marx, was also intended to apply to regimes such as the Soviet Union under Stalin, and provides the rationale for the Open Society Institute, founded by the Hungarian-born philanthropist George Soros to aid democratization and economic reforms in the nations of Eastern Europe after the fall of the Iron Curtain. And, of course, it applies to other social institutions as well, including institutional religions -- although Popper himself wasn't particularly open to criticism! At one point, he so enraged another Austro-English philosopher, Ludwig Wittgenstein, that Wittgenstein actually threatened him with a fireplace poker. The episode, which took place in 1946 at a meeting of Wittgenstein's Moral Science Club in a Cambridge University seminar room, in the presence of the philosopher Bertrand Russell (how's that for a seminar!) is the subject of a best-selling book,Wittgenstein's Poker (2002) by journalists David Edmonds and John Eidinow.

Just as disconfirmatory hypothesis-testing doesn't come naturally to scientists, it doesn't come naturally to laypeople either, as the following demonstrations indicate.

The Triads Problem (also known as the Generation Problem). Subjects are given a sequence of three numbers, and are asked to determine the rule that generated the sequence. They are to test their hypotheses about the rule by generating new sequences. The experimenter then gives them feedback as to whether the newly generated triad conforms to the rule. When they are satisfied they have enough evidence, the subjects are to state the rule itself.

Here is how a typical subject behaves when given the initial triad 2, 4, 6.

Test:8, 10, 12. Feedback:Yes, it conforms to the rule.
Test:14, 16, 18. Feedback:Yes.
Test:20, 22, 24. Feedback:Yes.
Hypothesis: The sequence is made up of adjacent even numbers. Feedback:Wrong, that is not the rule.
Test:1, 3, 5. Feedback:Yes.
Hypothesis:Add 2 to the previous number. Feedback:Wrong.
Test:2, 6, 10. Feedback:Yes.
Test:1, 7, 13. Feedback:Yes.
Hypothesis:Add a constant to the previous number. Feedback:Wrong.

-- At which point the subject may well give up, and learn that the generative rule is: Any ascending sequence of three numbers!

Why does it take so long for the typical subject to find the rule? Because he or she employs a confirmatory strategy for hypothesis-testing, seeking evidence that is consistent with the hypothesis. The problem is that the test results may also be consistent with other hypotheses as well. It would have been better to seek evidence which is inconsistent with one's hypothesis. If your hypothesis is "Add 2 to the previous number", and you test with the sequence 2, 5, 8, and learn that this sequence is in fact consistent with the rule, then you know immediately that your hypothesis is wrong.

Because it includes this critical test, the disconfirmatory strategy is the most efficient and logical way to test a hypothesis -- which is why it is taught in courses on scientific methodology. But people strongly prefer the less efficient, misleading confirmatory strategy -- like judgment heuristics, the confirmatory bias in hypothesis testing is another departure from normative rationality.

The Selection Problem (also known as the Card Problem):

Subjects are shown four cards, each of which has a letter on one side and a single digit on the other. The faces showing are A,M,6, and 3. The subjects are then asked which cards they would turn over to test the following hypothesis:

IF there is a vowel on one side, THEN there is an even number on the other.

The typical subject chooses to inspect the A card alone, or else the A card and the 6 card.

Apparently, such subjects are employing the confirmatory bias in hypothesis testing, by checking the reverses of the cards specified in the rule:

The A card should have an even number on the reverse.
The 6 card should have a vowel on the reverse.

But the hypothesis could be right even if the 6 has a consonant on the other side: vowels must have even numbers on the other side, but there is no such restriction on consonants. "IF" does not mean "IF and ONLY IF".

A better strategy for testing this hypothesis would be disconfirmatory in nature:

The A card must have an even number on the reverse.
The 3 card cannot have a vowel on the reverse.

Such a strategy is sometimes called diagnostic, because it simultaneously seeks evidence that is consistent with the hypothesis (i.e., whether A has an even number on the reverse)and inconsistent with the hypothesis (i.e., whether 3 has a vowel on the reverse).

Conditional Reasoning

The selection problem also illustrates the liabilities of conditional reasoning. Consider again the statement of the hypothesis to be tested:

IF there is a vowel on one side of the card,THEN there is an even number on the other.

The hypothesis states an antecedent condition P, that there is a vowel on one side of the card, and the outcome to be expected as a consequence,Q, that there is an even number on the other side of the card. Stated abstractly,

IF P(the antecedent)THEN Q (the consequent),

or, in the notation of formal logic,

P ----> Q.

Conditional reasoning is, in turn, a form of deductive reasoning. In deductive reasoning, we reach a particular conclusion from certain general premises, as in the famous example:

All men are mortal.
Socrates is a man.
Therefore, Socrates is mortal.

Stated in conditional terms:

IF someone is a man,	THEN he is mortal.
P	Q

Given this major premise (P---->Q) and the minor premise that Socrates is a man (P), then we can conclude that Socrates, like all men, is mortal, (Q). This much is straightforward enough, but it turns out that people have a great deal of trouble with such conditional arguments, in which two premises are followed by a conclusion.

In the discussion that follows, we assume throughout that the major premise,P---->Q, is true. Given information about the antecedent P and the consequent Q, what else can we conclude?

Affirming the Antecedent. First, if we know that P is true (that is, that some creature is a man), then, by an argument known as modus ponens (the term comes to us from Aristotle), or affirming the antecedent, we can conclude that Q is true as well -- that, indeed, the creature is mortal.
Denying the Consequent. Second, if we know that Q is not true (that is, that some creature is not) mortal, then by an argument known as modus tollens, or denying the consequent, we can conclude that P is not true either -- that, whatever it is, the creature is not a man.

Empirical studies have shown that people usually have no problem reasoning with modus ponens, or affirming the antecedent. However, people sometimes fail with modus tollens, or denying the antecedent. But there are two other errors in conditional reasoning:

Denying the Antecedent. If we know that P is not true, logically we cannot conclude anything about Q. If a creature is not a man, we cannot know whether it is mortal. Some other creatures, besides men, may also be mortal. So the fact that a creature is not a man does not mean that it is not mortal. Nevertheless, people often make the error of denying the antecedent, concluding from the fact that P is not true that Q is not true either.
Affirming the Consequent. If we know that Q is true, logically we cannot conclude anything about P. If a creature is mortal, we cannot know whether it is a man. Some other creatures, besides men, may also be mortal. So, the fact that a creature is mortal does not mean that it is a man. Nevertheless, people often make the error of affirming the consequent, concluding from the fact that Q is true that P is true as well.

Given these sorts of considerations, we can determine the best way to test a conditional hypothesis of the sort presented in the selection problem:

IF there is a vowel on one side,	THEN there is an even number on the other.
P	Q

We can construct a truth table for conditional reasoning. In abstract terms:

P	Q	IF P THEN Q
True	True	True
True	False	False
False	True	True
False	False	True

Made concrete:

Letter	Number	IF Vowel THEN Even
Vowel	Even	True
Vowel	Odd	False
Consonant	Even	True
Consonant	Odd	True

The truth table shows the best way to test the hypothesis specified in the selection problem:

Test P: by modus ponens, the A card must have an even number on the other side.
Test not Q: by modus tollens, the 3 card must not have a vowel on the other side.

If either test fails, the hypothesis is false. As the truth table makes clear, any other combination of vowels and consonants, and odd and even numbers, would be consistent with the hypothesis, even if the critical test showed that it was actually false.

Why do people perform poorly on the selection problem and other instances of conditional reasoning? In general, researchers have focused on three possibilities:

Matching hypotheses with data: because the hypothesis to be tested in the selection problem focuses on vowels and even numbers, people gravitate toward tests that include these same elements.
Representativeness: the matching strategy may be supported by the representativeness heuristic, which suggests that the solution resembles the problem, and therefore will also involve vowels and even numbers.
Availability: the matching strategy may also be supported by the availability heuristic, by which the statement of the problem, in terms of vowels and even numbers, would lead these elements of a possible solution to come more easily to mind.

But this is not all there is to it, because there are many cases of successful conditional reasoning. Consider the following conditional hypothesis:

IF it rains,THEN they'll cancel the game.

Presented with such a familiar scenario, people have few problems with conditional reasoning.

If told "They canceled the game" and asked "Did it rain?", they are less likely to affirm the consequent.
If told "It did not rain" and asked "Did they cancel the game?", they are less likely to deny the antecedent.
People know that games can be canceled for reasons other than rain.

So, reasoning is affected by the wording of the problem presented to subjects. People may not do all that well with abstract statements involving Ps and Qs, or with unfamiliar content such as cards with letters and numbers on them, but they may do just fine with concrete examples that match the real-world conditions they have encountered in the ordinary course of everyday living.

Conditional Reasoning in Psychotherapy:

Inferring Child Abuse and other Trauma from Adult Behavior

Child sexual abuse (CSA) -- indeed, any form of child abuse -- is a major problem in American society. Often, CSA is caught as it occurs, when the victim tells a trusted adult, or some alert adult catches on that something is wrong. But sometimes, CSA goes undetected.

Some clinicians believe that CSA can be inferred, retrospectively, from observations of certain features of adult personality and behavior: certain mental and behavioral symptoms, certain patterns of sexual behavior, certain patterns of physiological response, etc. When a patient presents these sorts of symptoms, the clinicians may conclude that he or she was probably abused as a child.But this inference is unwarranted.

First, there are empirical problems with inferring CSA from adult behavior.

CSA does not necessarily cause any particular symptoms. There are no symptoms shared in common by all abuse victims.
In particular, there are no pathognomonic symptoms of CSA -- that is, no symptoms that are exclusively associated with CSA.

But even if it were true that CSA caused certain symptoms, the inference of abuse from these symptoms would be unwarranted on logical grounds.

The conditional reasoning associated with inferring abuse from symptoms takes the following form:

If abuse occurred

Then the symptom will be present.

Assume, for illustrative purposes, that the argument is valid (which it is not, because there are no specific symptoms associated with abuse). What then could we conclude logically?

By modus ponens, or denying the antecedent, if we know that abuse did not occur, then we can conclude that the symptom will not be present.
And by modus tollens, denying the consequent, if we know that the symptom is not present, then we can conclude that the abuse did not occur.

But under the same assumptions, it would be a logical fallacy to draw either of the following conclusions:

If the abuse did not occur, we cannot conclude that the symptom should not be present. This would be an example of the fallacy of denying the antecedent. After all, the symptom might occur anyway, for reasons other than abuse.
If the symptom is present, we cannot conclude that the abuse occurred. This would be an example of the fallacy of affirming the consequent. Because the symptom could occur for other reasons, its presence does not necessarily imply that abuse occurred.

Inferring that CSA occurred, based on the presence of a behavioral symptom, is logically valid only in the case of a pathognomonic symptom -- one that is exclusively associated with CSA. Then, we would have conditional reasoning of the following form:

If and only if abuse occurred,

Then the symptom will be present.

But there are no such behavioral symptoms currently known to science. Therefore, inferring abuse from symptoms is logically invalid.

The issue goes beyond the matter of CSA (or other forms of child abuse or even the sexual abuse of adults). Many clinicians also infer that a patient has experienced some form of trauma from the fact that he or she presents some of the symptoms associated with post-traumatic stress disorder (PTSD). The inference takes precisely the same form, and is logically fallacious for the same reason. Given the assumption that trauma causes the symptoms of PTSD,

If trauma then PTSD symptoms,

we cannot conclude that a patient who shows the symptoms of PTSD has been traumatized. This would be another example of affirming the consequent -- a logically invalid inference because the symptoms associated with PTSD can also occur for other reasons.

Implications for Human Rationality

Research on categorization, judgment heuristics, biases in hypothesis testing, and problems in conditional reasoning suggest that human thinking is riddled with errors -- that, as Alexander Pope wrote (in his Essay on Criticism, 1711) "To err is human". These errors, in turn, seem to undermine the assumption, popular in classical philosophy and early cognitive psychology, that the decision-maker is logical and rational. According to this traditional model:

People reason about events by following normative principles of logic.
Their judgments, decisions, and choices are based on a principle of rational self-interest.
Rational self-interest is expressed in the principle of optimality: the desire to maximize gains and minimize losses.
And rational self-interest is also based on utility, by which people seek to optimize in the most efficient manner possible.

This normative model is enshrined in traditional economic theory as the principle of rational choice, an idealized prescription of how judgments and decisions should be made.

Rational Choice on the Titanic and the Lusitania

An interesting test of rational choice theory was offered by Benno Torgler and his colleagues (2009) in a comparison of survival rates for two nautical disasters -- the sinking of the Titanic in 1912, after it struck an iceberg on its maiden voyage across the Atlantic; and the sinking of the Lusitania in 1915, after it was torpedoed by a German U-boat. The two ships were quite comparable in terms of the number of passengers that they carried, and the mix of first-class and other passengers. But the Lusitania sank in only 18 minutes, forcing the passengers to act on their instincts, while the Titanic stayed afloat for three hours, allowing passengers time to reflect, and for cultural norms (like "women and children first") to guide their behavior.

Children had a 14.8% higher chance of surviving on the Titanic, compared to adults; on the Lusitania, children were 5.3% less likely to survive;
Adults accompanying children had a 19.6% higher chance;
Women had a 50% higher probability of surviving on the Titanic, compared to men, while on Lusitania they had a survival rate 1.1% lower;
First-class passengers had a 44% higher probability.

Of course, there were other factors in play. First-class passengers had a higher probability of surviving the Titanic sinking, in part, because "steerage" passengers were prevented from reaching the lifeboat deck.

Nevertheless, this "natural experiment" does seem to show that, in panic situations where people are behaving unthinkingly, the "every man for himself" logic of rational choice theory seems to prevail.

But the same study also shows that, given just a little time to reflect, cultural norms take over. Not to mention social pressure: the the crew of the Titanic following a "women and children first" policy, not to mention staying on board themselves (the ship's band ostensibly continued playing -- including the hymn, "Nearer, My God, to Thee" -- as the ship went down), it would have been harder for adult men to seize places in the lifeboats.

The theory of rational choice has its origins in the work of Jeremy Bentham, a British philosopher in the 18th century, but found its classic modern expression in the work of Johann von Neumann and Oskar Morgenstern, Theory of Games and Economic Behavior (1944), foreshadowed by von Neumann's earlier paper"On the Theory of Parlor-Games" (including poker).

Some authorities say that von Neumann was the inspiration for the character of Dr. Strangelove in the eponymous Stanley Kubrick film. His colleagues at Princeton's Institute for Advanced Study referred to him as one of "The Martians". For an unbelievably engaging biography which details his contributions to economics and other social sciences, including the psychology of judgment, reasoning, choice, and decision-making, sciences, see The Man from the Future by Ananyo Bhattacharya (2022; reviewed in "Fortress of Logic" by David Nirenberg, The Nation, 11/26/22). Also The Martian's Daughter, an autobiography by von Neumann's own daughter, Marina von Neumann Whitman. Nirenberg writes:

Today, game theory and its computational algorithms govern not only our nuclear strategy but also many parts of our working world (Uber, Lyft, and many others), our social lives (Meta, TikTok) and love affairs (Tinder), our access to information (Google), and even our sense of play. Von Neumann's ideas about human psychology provided the founding charter for the algorithmic "gamification" of the world as we know it.

Hastie and Dawes (2001) summarized the principles of rational choice as follows:

Rational choices are based on the decision-maker's current assets -- what he or she has in hand at the time the choice is made.
Rational choices are based on the possible consequences of each option available.
Uncertain consequences are evaluated by probability theory.
Rational choices are, therefore adaptive within the constraints of the probabilities and values associated with each possible consequence.

R. Duncan Luce (in Individual Choice Behavior: A Theoretical Analysis, 1959) added another principle, which is now called Luce's Choice Axiom: the relative probability of choosing one item over another should be independent of other available choices. If, given a choice between Thin Mints and Samoas, someone favors the Thin Mints 2/3 of the time, that ratio should remain constant even when a third choice, such as Do-Si-Dos, is introduced into the scenario.

Rational choice is a prescription for rational thinking by homo economicus -- "man the economist". But it becomes clear very quickly that people frequently depart from this view of how they should make their decisions.

Still, human beings depart systematically from the precepts of expected-utility theory. Hastie and Dawes outline a number of such departures:

Choosing out of habit.
Choosing on the basis of conformity.
Choosing on the basis of authorities.

But that turns out that this is just the beginning. Empirical evidence concerning departures from the prescriptions of rational choice have led to wholesale revisions of our views of economic decision-making, and of decision-making in other areas as well.

And this isn't just true of the ordinary person-on-the-street. Even experts make the same sorts of mistakes as novices. Kahneman and Tversky even collected some of their data concerning judgment errors at meetings of statisticians and economists! More generally, Philip Tetlock (who used to be on the faculty at UCB) analyzed the predictions of experts in political science and economics, and found that they were inaccurate most of the time (in Expert Political Judgment: How Good Is It? How Can We Know?, 2006). With funding from the Federal government, Tetlock and his colleagues (including his spouse Barbara Mellers, who also used to be on the UCB faculty) run the Good Judgment Project, intended to improve skills in forecasting and other probabilistic judgments. The fact that Tetlock and Mellers are enrolling expert decision-makers in this project, including experts drawn from the government intelligence community, tells you just how fallible decision-making can be. Every year, the GJP runs a "forecasting tournament" in which various groups of experts compete to make the most accurate predictions of political and economic events. contest

Philip Tetlock on expert decision-making: "If you were a dart-throwing chimpanzee, the long-term average of your forecast for binary problems would be 50/50. Our best forecasters are beating the intelligence agency analysis by as much as 30%, and fall somewhere between the chimpanzee and the omniscient being. The question is, how close can we get them?" (quoted in Penn Arts & Sciences Magazine, Summer 2014).

Sunk Costs

Consider, first, the following scenario:

Two people attend a concert.
A purchased a regular ticket for $75.
B bought his ticket from a "scalper" for $150.
Both tickets are nonrefundable.
The concert is terrible: the singer is lip-syncing, she's not lip-syncing in time to the music, and at one point she lip-synced a different song than her backup band was playing.
Both feel the urge to leave at intermission.

Which person is more likely to leave?

Most people think that A is more likely to leave than B.B will stay, in an attempt to make the expenditure worthwhile. But this is irrational. Because the tickets are nonrefundable, the money is lost whether he stays or leaves. So he might just as well leave and do something more worthwhile.

The problem of sunk costs was introduced by Tversky & Kahneman (1981), who posed the following problem (based on 1981 ticket-prices):

Two people have decided to see a play.
Tickets cost $10.
As A approaches the ticket booth, he discovers that he has lost a $10 bill.

Will he still buy the ticket?

B buys a ticket, but loses it before he enters the theater.

Will he buy another ticket?

The results illustrated what is known as a preference reversal. Almost everyone says that A will still buy the ticket, but a majority say that B will not. But both are out the same amount of money. It shouldn't matter why.

Arkes and Blumer (1985) documented sunk costs in a study of actual theater-goers. The subjects were subscribers to the theater season at Ohio University. The first 60 customers who approached the ticket window to buy a season's subscription to 10 events they were randomly assigned to one of three groups.

One group was sold the season's subscription at the regular price of $15.
The second group was told that, as part of a special promotion, their tickets would be discounted to $13.
The third group was told that, as part of a (really) special promotion, their tickets would be discounted to $8.

This was in the spring term, for performances over the next academic year. Arkes and Blumer then followed these subscribers, to see if the purchase price had an effect on their actual attendance. It did: for the first half of the season, those who had paid full price were much more likely to attend the scheduled performances. It didn't work out that way for the second half of the season: Over that interval of time, of course, things can come up that cause you to change your plans. But the first half of the season still showed a sunk costs effect. The money was already spent: why should the amount you paid determine whether you actually attend the performances?

The phenomenon of sunk costs is important for rational choice theories because rational choices should be based on current assets -- the situation as it stands now. It doesn't matter what you've already spent, or lost.

Amos Tversky himself was immune to the problem of sunk costs. In The Undoing Project (2016), his dual biography of Kahneman and Tversky, Michael Lewis says that when Tversky (who died in 1996) would go to the movies with his wife (Barbara, also a distinguished psychologist), if he didn't like the movie he'd leave her there, go home and watch Hill Street Blues on television, and then return to the cinema to pick her up. As he put it: "They've got my money, but they're not going to get my time!".

Sunk costs loom large in debates over public policy. From the Vietnam War in the 1960s and 1970s to the wars in Iraq and Afghanistan since the events of September 11, 2001, policy makers have justified more troops, and more money, by referring to the number of lives already sacrificed, and the amount of money already spent. But that blood and treasure is gone: Viewed from the perspective of rational choice theory, policy decisions should consider only current assets, and the value of the possible outcomes.

Another is the Congressional debate over federal funding for the Tennessee-Tombigbee Waterway Project in the 1980s. The Tennessee-Tombigbee Waterway Project was, perhaps, the largest waterway project in all of history, connecting two important rivers, and allowing barge traffic to go from the midsection of the United States to the Gulf of Mexico. It was originally estimated to cost about $323 million in 1970, when the project was begun. But, as so often happens, the project was plagued by delays and cost overruns, and became known as a classic example of pork-barrel spending in Congress. In 1981, after $1.1 billion had already been spent, on a project originally estimated to cost $323 million, there was a serious proposal to abandon the project. And congressional representatives from the states involved, both Republican and Democrat, argued that to terminate the project before it was completed would be an unconscionable waste of public money. So much money had been spent, the project just had to be completed. In the end, the project was completed, at a cost of $2 billion dollars, a 600% cost overrun, almost 600%. And the fact of the matter is, that commercial barge traffic on the waterway never approached what had been projected. Its proponents argued that, once the waterway was opened, it would flow at about 27 million tons of cargo in just its first year of operation. But in fact cargo traffic on the Tennessee-Tombigbee Waterway has never exceeded more than about eight billion tons. Whether that reduced tonnage is enough economic benefit to justify the expense of the Waterway -- that's an important question. But the worst reason to continue construction of a project, or to continue any project at all is that you've already spent a great amount of money on it. Those expenditures are in the past, that money's gone, rational choices are made based on the decision maker's current assets.

Expected Value Theory

The assumptions of rational choice are commonly reflected in expected value theory, the standard framework for decision-making according to neoclassical economic theory.

What is Neoclassical Economics?

Classical free-market economic theory emphasizes principles of rational choice because the plausible assumption that choices are made rationally simplifies economic analysis. Consider, for example, two nearby convenience stores. A sells milk for $1 per quart, while B sells the same quart of milk for $1.50. Who in their right mind would buy their milk from B? And once B notices that nobody's buying his milk, either he'll lower his price or stop stocking the item in his inventory. There's no reason, in theory, to think that larger markets don't work in exactly the same way.

Economics is in some sense a behavioral science, but in the neoclassical view, the behavior in question is not human behavior, or even the behavior of a human social institution. It is the behavior of a system. According to Adam Smith (1723-1790), whose Wealth of Nations is the bible of the neoclassical school, economic forces are natural and impersonal, and economic trends are guided by an "invisible hand" of human rational self-interest.

Classical economic theory holds that free markets are self-regulating, with wages, prices, savings, and investments all spontaneously adjusting to each other. Free markets are also self-optimizing, and reach equilibrium under conditions of full employment.

Neoclassical economics maintains, however, that a free market can reach equilibrium at any level of employment. For example, during the Great Depression that began in 1929, savings were not turned into investments because high unemployment meant that there was no demand for consumer goods. Accordingly, "free" markets occasionally need an outside stimulus, in the form of public investment, which operates through a "multiplier effect". It is the recognition of this need for an outside stimulus that distinguishes neoclassical economics from the "classical"laissez-faire capitalism of thinkers like Adam Smith, who argued that free markets were guided by the "invisible hand" of human rationality.

Largely the invention of the British economist John Maynard Keynes, neoclassical theory is sometimes called Keynesian economic theory (see "The End of Laissez-Faire, 1926, and The General Theory of Economics, Employment, and Money, 1936). Keynes argued that psychological and sociocultural considerations could help understand how even "rational" economies fail. For example, while classical economic theory assumes that the purpose of having money is to spend it, people have a tendency to hoard cash -- especially in times of economic downturn, which is precisely when the economy needs it most! When an economic system is stuck, as it was to be in the Great Depression of the 1930s, Keynes argued that no purpose was served by waiting for the economy to right itself in the long run (as Adam Smith would have suggested) because "In the long run we are all dead" (Tract on Monetary Reform, 1923). If we want the kind of life that a thriving economy makes possible, Keynes argued, people have to run the economy, instead of letting the economy run them. Keynes believed that public intervention in "free markets" was necessary because economic systems are not governed by economic principles. In this way, Keynesian economic theory laid the foundation for modern "behavioral economics", which injects psychological principles into our understanding of economic behavior. The core of the Keynesian view is that economic decision making takes place in an environment of inevitable uncertainty which cannot be reduced (rationalized) as "measurable risk".

Keynes gets a bad reputation from neoclassical economists, like Milton Friedman, who favor the "efficient markets hypothesis" that something like Adam Smith's "invisible hand" guides the market, and that, left to their own devices, decision-makers will always choose rationally. Someone once asked him 'If you're so smart, why aren't you rich?" -- to which Keynes casually replied, "As a matter of fact, I am rich!". See "The Remedist", a profile of Keynes published in the New York Times Magazine (12/14/2008) by Robert Skidelsky, author of John Maynard Keynes: 1883-1946: Economist, Philosopher, Statesman.

According to expected value theory, people chose the option that has the highest value (usually expressed in terms of monetary gains or losses). Value, in turn, is calculated as follows:

Value = outcome * probability of that outcome.

For example, consider a person who is offered a choice between two gambles:

A 1 in 3 chance of winning $75 (expected value = 1/3 * $75 = $25); and
B 1 in 2 chance of winning $40 (expected value = 1/2 * $40 = $20).

According to expected value theory, a rational decision-maker should chose the option which affords the greatest gain (or, alternatively, the least loss). In this case, he or she would choose A, the 1/3 chance of winning $75. Expected value theory is a kind of algorithm for making decisions and choices: calculate the expected values, and choose the option that affords the highest gain or the least loss.

But people's behavior routinely violates expected value theory. The clearest example is in the lottery. If you buy a lottery ticket for $1, and have a 1 in 1 million chance of winning $1,000,000, the expected value of the gamble is $1 -- an even break. But lotteries have worse odds than that, meaning that people buy lottery tickets where the expected value of the gamble is less than the amount of the bet! Why do people do this? For most people, $1 has relatively little value, and they think that it is worth investing for a chance, however small, to make a big gain. This is not bad reasoning, but it is completely irrational from the point of view of rational choice theory.

There are other common departures from expected value theory. For example, in the choice described above, many people will choose the bet with the highest odds -- thus, choosing the gamble with the lower expected value. But at least they have a better chance of winning something. In the context of gains, people are risk averse -- they prefer a sure thing, and 1/2 odds are closer to a sure thing than 1/3 odds are.

What's a Life Worth?

Expected-value theory, and other judgment theories like it, are often off-putting because they appear to attach a dollar amount to every decision and choice. In part, that's just a convenience: you've got to have something to measure, and dollars are as good as anything else. Somewhere there's a problem that takes the form of, "How many eggs would you trde for this bicycle seat?. You just need a measure, and in this case eggs are probably as good as dollars -- but then again, everybody's familiar with the value of dollars, and not so much with the value of eggs.

One of the thorniest problems in economics is: What value do we place on a human life? In one sense, of course, all life is precious. But for those who support the death penalty, some lives are more precious than others. And in tort law, such as cases of wrongful death, money damages are awarded based on the value of the life that was lost. This is different from criminal law -- which, at least ostensibly, values all lives the same.

For example, if a person is killed in a traffic accident, his or her survivors are entitled to compensation for loss of income, pain and suffering, any medical treatments related to the accident, and funeral costs. This is an example of compensatory justice, which aims to restore people to the position they would have been in, had the death not occurred.
As another example, after the terrorist attacks of September 11, 2001, the September 11th Victim Compensation Board retained Kenneth Feinberg, a lawyer who had an excellent reputation as an arbitrator, to determine compensation for the victims (or their families) based on the economic, dependent, and noneconomic value of each life lost. Feinberg set the noneconomic value (essentially, pain and suffering) at $250,000; dependent value was set at $100,000 for each spouse and another $100,000 for each dependent. The economic value was determined by the victim's age and anticipated future earnings at the time of death; because many of the victims earned multimillion-dollar incomes in the financial industry, Feinberg set a tentative cap of $231,000 on annual income, in order to avoid extremely large payments, which might raise concerns about economic inequality. He ended up distributing approximately $7 billion to more than 5,000 victims and families.

Feinberg was so good at this job that he was subsequently asked to fulfill the same function for the BP Deepwater Horizon Disaster Victim Compensation Fund;
and the fund established for victims of the 2013 Boston Marathon bombings;
and the fund established for victims of the Boeing 737Max crash;
and several other similar jobs.

In cases of government regulation, the value of a human life is generally set at $10 million. This figure is arrived at in various ways. Assume, for example, that the government is considering a new regulation that would save 1 out of every 100,000 lives per year.

In stated preference studies, people are asked how much they would be willing to pay to eliminate a mortality risk of 1/100,000 deaths per year. The average answer comes out to about $100 -- which, multiplied by the 100,000 people who would pay to save that one life, yields $10 million.
In revealed preference studies, economists study things like the wage and salary premiums which people actually receive for a jobs with high morality risk -- say,1/100,000 deaths per year. Again, the average answer comes out to about $100, which also yields close to the $10 million figure.

How such valuations are made is discussed in Ultimate Price: The Value We Place on Life by Howard Steven Friedman (2021), reviewed by Cass Sunstein in "What Price Is Right?", New York Review of Books, 07/09/2021 (from which the examples and quotes in this box are taken). Friedman criticizes the kinds of calculations described above, arguing that "The courts' reliance on economic losses suggests that a person's life is limited to a simple cash flow analysis", which is "inconsistent with basic principles of fairness and human dignity". Instead, he argues that we should "Value all lives the same" -- $10 million, or whatever.

But Sunstein, in his review, argues that the inequality of which Friedman complains isn't as unequal as it seems. He writes:

To be sure, the consequence is to give more to a family that has lost a forty-year-old breadwinner than to a family that has lost a seventy-year-old breadwinner. But that does not discriminate against old people; it simply reflects the goal of compensatory justice, which is o restore plaintiffs, to the extent possible, to the status quo ante. There's nothing unfair, or discriminatory, about compensating people for teir lost income. You might well also want to redistribute income from rish to poor, but wrongful death actions in court are not a sensible place for trying to do that....

...Friedman is right to raise questions about whether $10 million is the correct number; among other things, it does not even try to capture the profound impact on family and friends of losing someone they love. But he offers little guidance about what would be a better numbebr, or even about how we could go about identifying it....

Feinberg, for his part, wrote a long letter to the editor addressing questions about the September 11th Fund raised in both Friedman's book and Sunstein's review ("What's It Worth?", New York Review of Books, 09/23/2021). He points out that 97% of eligible families joined the fund he administered, forgoing the opportunity to pursue individual litigation -- which is pretty good evidence that his system actually did promote "fairness and human dignity". He further argued that his methods could serve as precedent fur future attempts to value the loss of individual human lives.

Expected Utility Theory

Problems like these indicate that expected value theory, however good it might appear from the point of view of normative rationality. These and other problems with expected value theory led to a new development, the expected utility theory of Von Neuman and Morgenstern (1944). According to expected utility theory, decisions are based on a choice among utilities, not values. Utilities, in turn, are described in terms of personal value, which may be different from monetary value. Thus, suppose that a person is presented with the choice between two gambles described above, but only needs $10 to go to the movies. The bet with the smaller value has the greater utility, because he only needs $10. The bet with the larger value has surplus value, but no greater utility, and it is deemed not worth the risk. In general, expected utility theory assumes that people are averse to risk. However, this is not necessarily the case. If the person needs $30 to buy some medicine for a sick pet, the riskier gamble may be more attractive, because if he wins he is closer to the amount of money he needs: it has greater utility. Utilities cannot be described objectively: they are subjective, and idiosyncratic to the individual decision-maker. Choice is not simply a matter of a blind calculation of values, but rather can only be understood by taking into account the circumstances of the individual decision-maker -- his or her goals, motives, and the like.

Expected utility theory is an important milestone in the psychology of judgment and decision making, because it injects psychological considerations into the process.

                                                                             Gambling and Life

For interesting excursions into the mind of homo economicus, and probabilistic thinking in general, see two books by Nate Silver, the famous elections prognosticator introduced in the lectures on Methods and Statistics. Silver, remember, is famous for successfully predicting the outcomes of elections based entirely on poll results. A rabid basketball fan, who uses a version of sabremetrics to analyze player performance, Silver is also a world-class poker player, who at one time ranked in the top 300 players listed in the Global Poker Index.

In The Signal and the Noise (2012), Silver wrote about the use of probability and statistics to predict real-world outcomes.

In 2016, he bet (based on expected-value theory) $100 that Donald Trump would win the Presidency in 2016, and won.

At the time, the betting markets were giving odds of 5:1 against Trump, based on a 17% chance that he would win.

Silver's own model gave Trump a 29% chance of winning.

Using Silver's results, the expected value of a $100 bet was $145 ($500 x 0.29).

And the expected value of a loss was $71 ($100 x 0.71).

$145-$71 = $74, which is the expected value of the gamble. If you've got $100 to spare, you might just as well make the bet.

Note that the logic holds even if the expected value of the gamble were only $1 -- so long as you can afford to lose $100!

Note, too, that it doesn't matter who Silver wanted to win. And it doesn't matter who Silver expected to win, either. He knew what the odds were, and his own calculation suggested that Trump's chances of winning were far less than 50-50 (for the record, he predicted that Clinton would win). But that doesn't matter in terms of betting. All that matters is the expected value of the gamble. Betting on Trump yielded a higher expected gain than the corresponding expected loss.

And the expected value of a Trump victory was also greater than that of a Clinton victory.

At 5:1 odds favoring Clinton, a $100 bet would yield a net gain of only $20, so the expected value of betting on a Clinton victory was (0.71 x $20) or $14.20.

Factor in the expected value of the loss ($100 x .29 = $29), and the expected value of the entire gamble is -$14.80. Don't do it.

In 2024, Silver published On the Edge: The Art of Risking Everything (reviewed by Idrees Kahloon in "The Power of Thinking Like a Poker Player", New Yorker, 09/09/2024), exploring a wide range of applications of expected-value theory, Bayes's Theorem, and the like.

Hedge funds do exactly what Silver did with Trump -- they hedge their bets by betting on predicted losers.

Venture capital doesn't invest in large, established firms like Microsoft and Facebook (although they might have done way back when those firms themselves were start-ups). Instead, they invest in small start-up companies that might be bought by Microsoft and Facebook.

According to Silver, poker is a model for life, which involves (these are Kahloon's words, with my interjections)) "properly calculating risk in the face of very imperfect information [which is what expected-value and expected-utility theories do] -- and recalibrating it as you gain more information [which is what Bayes's Theory does].

Kahloon further notes that expected-utility theory provides as methodology for the moral philosophy known as utilitarianism -- "that what is morally right can be determined by whatever maximizes the greatest good (or utility)".

But utility theory also has its problems. For one thing, it assumes that utilities are independent of the probabilities attached to outcomes. If I need $10 to go to the movies, that's what I need, regardless of whether the odds attached to the bet are 1 in 2 or 1 in 4 or 1 in 10. Same goes if I need $30 for medicine for my pet. This is because the individual's subjective interests are assumed to remain constant. Put another way: if individual differences in choice behavior are determined by individual differences in utilities, then utilities should remain constant despite changes in the probabilities attached to outcomes.

The Ellsberg Paradox and the Vietnam War

One of the problems with utility theory is illustrated by the "Ellsberg Paradox", discovered by Daniel Ellsberg in his economics PhD dissertation, completed at Harvard in 1962 (reissued in 2001 as Risk, Ambiguity, and Decision; Garland Press, 2001). Ellsberg, who subsequently served as an intelligence officer in the Marine Corps and worked as a defense policy analyst in the Pentagon and at the RAND corporation during the Vietnam War, became famous in 1971 for releasing the secret "Pentagon Papers" to the press.President Nixon's attempts to undermine Ellsberg's credibility led to the formation of the secret "plumbers unit" (to stop leaks) in the White House, and eventually to the Watergate scandal and Nixon's resignation under the cloud of impeachment.

In one version of the paradox, subjects are presented with two jars, A and B, each containing a mix of red and blue marbles. They are required to bet on the probability that a blue marble will be drawn randomly from one of the bins, and allowed to choose the bin from which the marble will be drawn. Further, the subjects are informed that Jar A contains 50 red and 50 blue balls, but is given no information about the distribution of colors in Jar B. The finding is that subjects generally prefer that the marble be drawn from Jar A rather than Jar B.

Such a preference is "irrational" according to expected utility theory, which holds that utilities are determined by the objective probabilities attached to various outcomes. The objective probability of randomly drawing a blue marble from Jar A is 50/100 or 0.5. But, in the absence of any other information, the objective probability of the same draw from Jar B is also 0.5 -- subjects have no reason to think otherwise. Yet they'll prefer A to B, and bet more on the draw from A than on the draw from B.

But it gets better. In another version of the Ellsberg paradox, the same person who was asked to select a jar from which a blue marble would be drawn, is now asked to select a jar from which a red marble will be drawn. Assume that the subject has already chosen Jar A over Jar B for the BLUE case, as most subjects do. Logically, that must mean that he or she thinks that Jar B is unlikely to contain more blue marbles than red marbles -- otherwise, the subject would have chosen B rather than A. Logically, then, the subject must also believe that Jar B is likely to contain more red marbles than blue marbles. However, subjects searching for red marbles still prefer Jar A to Jar B!

The full Ellsberg paradox, involving a choice for red followed by a choice for blue, reveals that what subjects are trying to maximize is not likelihood but rather uncertainty. With Jar A, you know precisely what the distribution of red and blue marbles is. Jar B may have the same distribution, or it may have a distribution biased toward red or biased toward blue. You just don't know. And you don't like it.

Ellsberg's eponymous paradox actually laid the foundation for his later actions in the Pentagon Papers case. For Ellsberg, his act was one of patriotism, not disloyalty or treason. He was an economist, and true to his grounding in the principles of rational choice, he believed that people can't make proper decisions unless they are given proper information; but that if they are given all the information they need, they will make the best decision. As an aide in Vietnam and the Pentagon, he knew that the Vietnam War was going badly, but he also believed that President Johnson had not gotten the right information from his military advisors -- and that President Nixon wasn't getting the right information from his advisors, either. If Johnson had known the full story, Ellsberg believed, he would have changed course in Vietnam. If Nixon only knew, then he would change course, as Johnson should have, and bring the war to an end.

But when Ellsberg gained access to the Pentagon Papers, at RAND, he quickly learned that his beliefs were wrong. President Johnson had been getting good information about Vietnam, but he kept making bad decisions anyway. Presumably, the same thing was happening with Nixon. What was happening, apparently, was that Johnson and Nixon were both blinded by ideological beliefs and political commitments -- such as the "domino theory", the desire to avoid a humiliating military defeat, concern about whether a permanent division of Vietnam would replicate the unsatisfactory situation in Korea (or, for that matter, Germany), and a belief that the application of more and more military might would force the North Vietnamese to surrender.

In Ellsberg's view, the only people who weren't fully informed about Vietnam were the American people themselves, who had been systematically lied to by government officials (including successive Presidents). Ever the theorist of rational choice, Ellsberg decided to make the papers public -- in the hope that the American electorate, given proper information, would make the right decision and force a change in government policy. Ellsberg tells the whole story in his memoir,Secrets: A Memoir of Vietnam and the Pentagon Papers (Viking, 2002). For Ellsberg, "secrets" and lies -- bad information, or no information at all -- are responsible for bad public policy.

Ellsberg's release of the Pentagon Papers may indeed have shortened the Vietnam War, but there is another lesson in this episode about how people reason to decisions and choices. Johnson and Nixon didn't base their decisions on poor (bad or missing) information. They based their decisions on a set of ideological beliefs and policy commitments. Given their stance, the war must continue. (However, neither Johnson nor Nixon believed that victory in Vietnam was so important that the military should be allowed free reign in Vietnam, or that nuclear weapons should be used against the North. Full-scale escalation might well have brought about some sort of victory in Vietnam. But, for domestic and international political considerations, they simply were unwilling to go that far. Thus, the military's constant complaint that it was being "hamstrung" by the government.)

Ellsberg didn't share their stance, and so he came to a different decision -- based on essentially the same information as Johnson and Nixon had. As Nicholas Lehmann put it in his review of Ellsberg's memoir ("Paper Tiger"New Yorker, 11/04/02):

American Vietnam policy mystified and enraged Ellsberg because its goal, preventing Vietnam from becoming a Communist-governed country, was much less valuable to him than it was to Congress, the public, or the various Presidents during the years when the American commitment was being ratcheted up to the level of full-scale war....

In the end, the Vietnam War can't be reduced to a problem of miscalculated probability. It is of the utmost importance right now [as the United States was contemplating going to war against Iraq] that we understand that the decision to go to war is ideological, not informational: the reason people disagree vehemently about war in Iraq is not that the facts on the ground or the true prospects of American military success are being kept hidden. What they disagree about is under what conditions and by what means the United States should try to affect the governance of other countries. It's not what we know but what we believe in that makes all the difference.

As Kahneman and Tversky might put it, Ellsberg and Nixon came to different decisions, not because they had different informational inputs into their choices, but because of the way their choices were framed.

Anyway, the Ellsberg paradox shows that psychological considerations affect risky choices. It's not the probability that matters, but the level of certainty (or uncertainty) attached to that probability. With Jar A, subjects are certain that the probability of drawing a blue marble is 0.5. With Jar B, the probability might be more, but it also might be less. In mathematical terms, the distribution of probabilities is known for Jar A, but unknown for Jar B. It's not the objective probabilities that matter -- it's the subjective probabilities, which are affected by the level of certainty attached to them.

Phenomena such as the Ellsberg Paradox suggested to economists and other financial theorists that economic behavior could not be predicted on the basis of abstract economic laws alone, as if the people making these choices didn't matter.

In this way, the Ellsberg Paradox laid the foundation for the rise of a new subfield,behavioral economics, which takes account of psychological theories of reasoning, choice, and decision-making. Behavioral economics is the cutting edge of economic theory these days, and its proponents are reaping the major awards in the field, including the Nobel Prize. It is essentially psychological in nature, although economists are not necessarily eager to concede this point!

The Allais Paradox

Problems with expected-utility theory began to crop up almost immediately, first in the form of the Allais paradox uncovered by Maurice Allais (1911-2010, and pronounced al-LAY), a French economist who won the Nobel Prize in Economics in 1988.

Imagine that you are offered the choice of the following gambles:

100% chance of winning $1 million

89% chance of winning $1 million;

1% chance of winning nothing;

10% chance of winning $5 million.

Most people will, rationally, choose A, because A involves a sure gain.

But now imagine that you are offered the choice of the following gambles:

C	89% chance of winning nothing; 11% chance of winning $1 million.
D	90% chance of winning nothing; 10% chance of winning $5 million.

Most people will, rationally choose D, whose expected outcome is much greater ($500,000 vs. $110,000).

And that's as predicted by expected utility theory. What's not predicted is that the same individual, who rationally chose A, would also choose D. Let's see why.

First, you should realize that the sure thing in 1A, decomposes into an 89% probability of winning $1 million plus an 11% probability of winning $1 million.

Therefore, both A and B choices offer, to begin with, an 89% chance of winning $1 million.
Disregarding these equal outcomes, then,B boils down to:

1% probability of winning nothing vs.
10% probability of winning $5 million.

Applying the same logic,D decomposes into an 89% probability of winning nothing plus a 1% probability of winning nothing.

Therefore, both C and D offer, to begin with, an 89% probability of winning nothing.
Disregarding these equal outcomes, then D boils down to:

1% probability of winning nothing vs.
10% probability of winning $5 million.

So here's the paradox. Choice A is identical to Choice C, and Choice B is identical to Choice D. An individual who, by reason of the utilities involved, prefers A should also prefer C.

Actually, this only happens when the outcomes are large -- expressed in the millions of dollars. When the outcomes are more reasonable, expressed in tens of dollars, things go fine. So, the Allais paradox reveals that people have a lot of difficulty rationally choosing between unfamiliar quantities. Perhaps the emotion associated with millions of dollars gets in the way?

Preference Reversals

The Allais paradox is one of a whole class of paradoxes involving preference reversals. Expected-utility theory holds that utilities should remain constant despite changes in the probabilities attached to outcomes. But this doesn't always hold true. Consider again the choice offered earlier:

For example, consider a person who is offered a choice between two gambles:

A 1 in 3 chance of winning $75 (expected value = 1/3 * $75 = $35); and
B 1 in 2 chance of winning $40 (expected value = 1/2 * $40 = $20).

Under expected-value theory, the person would be expected to choose A, because it has the higher expected value. But under subjective utility theory, the person might well choose B, because it has the higher probability attached to a positive outcome.

Now consider the following choice:

C 1/5 chance of winning $40 (expected value of $8); and
D 1/4 chance of winning $30 (expected value of $7.50.

Now assume, for the purposes of this example, that a person wishes to maximize expected value: he or she will choose C. But now let us double the probabilities attached to these outcomes:

E 2/5 chance of winning $40 (expected value of $16); and
F 2/4 chance of winning $30 (expected value of $15).

A person who wishes to maximize value (the normatively rational choice) will choose E, even though it has the lesser odds, because it has the greatest value. But now let us double the probabilities again:

G 4/5 chance of winning $40 (expected value of $32); and
H 4/4 chance of winning $30 (expected value of $30).

Under these circumstances, individuals who preferred gambles C and E will now shift to H. People in general are risk-averse, preferring sure gains to a risk, even if these gains do not maximize value. This is known as the certainty effect, and it compromises both expected-value and expected-utility theories, which assert that choices depend on values or utilities, regardless of the probabilities involved.

The certainty effect, in turn, is a special case of a general class of judgment behaviors known as preference reversals. Consider the following gambles:

I 11/12 chance of winning $12 (expected value = +$11) and1/12 chance of losing $24 (expected value = -$2);
J 2/12 chance of winning $79 (expected value = +$13) and10/12 chance of losing $5 (expected value = -$5).

When asked, people prefer these two gambles about equally, with perhaps a slight preference for I. But now, when asked how much they would sell a gamble for (where selling price is another index of preference), people generally ask for a higher price for J than for I -- even those who initially preferred I to J. Thus, preference is reversed.

Preference for a gamble and the selling price of a gamble are two alternative procedures for measuring choice. But in the preference reversal studies, people generally choose the gamble with the higher probability of winning, regardless of its value -- in the example above, gamble I, which has an 11/12 chance of winning something; but they will set a higher price for the gamble with the higher outcome, regardless of the probability attached -- in the example above, gamble J, which has a projected outcome of $79. Preference reversals are normatively irrational, because they illustrate the "Law of Contradiction", which states that reasoning procedures that reach contradictory conclusions from the same evidence are irrational. If you prefer I over J, then you shouldn't sell J for a higher price than I.

Choice behavior also frequently violates another principle of normative rationality, that of transitivity. Given three choices,A,B, and C, if you prefer A >B and B >C, then you should also prefer A >C. But this doesn't always occur.

Consider the following examples, which I owe to Prof. Reid Hastie of the University of Chicago School of Business.

Imagine that you have to choose a date for Friday night:

A is very intelligent, decent-looking, but a credit risk;

B is of normal intelligence, not too good looking, but has an income in the millions;

When asked to choose, most people prefer A to B. But now consider the choice between B and a third person,C:

C is pretty dumb, has smashing good looks, and earns a decent income.

When asked to choose between B and C, most people prefer B. But when asked to choose between A and C, most people prefer C to A -- a violation of transitivity.

Notice that if we calculated expected values, we wouldn't get into this problem. If we assign each highly desired attribute a value of 2, each "acceptable" attribute a value of 1, and each undesirable attribute a value of 0, the value of each choice is 3. So you shouldn't really have any preference at all among the three dates. Moreover, we wouldn't have this problem if you rely on overall global utility. Suppose that the most important thing is that your date be smart. Then you will prefer A >B, and B >C, and A to C. Same thing goes if your utility is focused on looks or money. The violation occurs because people focus on different attributes for each choice. A is preferred to B because A has more brains and better looks (2 out of three attributes are positive); B is preferred to C because B has better looks and better income (2 out of three attributes are positive, but it's a different two). But C is preferred to A because C has better looks and better income than A (another set of two positive attributes).

Now imagine that you are going to buy a new car:

A The salesperson offers you a price of $18,000, and you accept it.

B Then the salesperson offers you a special coating to protect the underbody for $400 more, and you agree to it. Thus, you prefer B to A.

C Then the salesperson offers you a top-of-the-line stereo system for $500 more, and you agree to that too. Thus, you prefer C to A.

D After a few more special offers, each of which you've agreed to (thus, effectively preferring D to C), the salesperson announces that the total price of the car is now $23,000. At this point, many people will revert to the stripped-down car for the original price of $18,000. Thus, they now prefer A to D.

Framing Effects

Preference reversals and sunk-cost effects occur because people do not simply calculate values or utilities. Instead, they focus their attention on certain aspects of a choice rather than others, and this attentional focus can be shifted by how the problem is framed. Framing also affects other aspects of judgment and decision making -- for example, the way a problem is framed may provide the initial value used in the anchoring and adjustment heuristic, or it may shape the choices made in hypothesis -testing. A great deal of attention has focused on the role of framing in judgments concerning risky prospects, as in the following problem.

The Coin-Toss Problem

Imagine that you are offered the following gamble:

If the outcome of a coin-toss is "tails", you lose $10.
If the outcome of a coin-toss is "heads", you will win a certain amount of money.
How much winnings do you have to be offered before you will take the gamble?

According to rational choice theory, you should take the gamble if the prospective winnings are any amount greater than $10. For example, if you will get $12 for heads, the value of the gamble is ($12 - $10) x .5 = $1; or, put another way, (-$10 x .5) + (+$12 x .5) = $1. Assuming that you've got a good use for that $1 (here were talking about expected utility theory), you should take the bet, because that's the expected value (or, for that matter, the expected utility) of the gamble.

But people don't do this. Offered the gamble as described, the average person wants the prospect of winning about $20 before he or she will take the risk of losing $10. Losses loom larger than gains. In fact, when people evaluate gambles, the potential gains must be about twice as large as the potential loss before they'll take a chance. This phenomenon is known as loss aversion.

The Disease Problem

Imagine that you are a public health official facing the impending outbreak of a deadly disease. Based on past experience, the disease is expected to kill 600 people. You have two alternative programs available:

If Program A is adopted, 200 people will be saved.
If Program B is adopted, there is a 1/3 probability that all will be saved, and a 2/3 probability that none will be saved.

Note that while Program A entails the certainty that some lives will be saved, Program B entails risky prospects: all might be saved, but all might be lost. Which program do you choose? Faced with this choice, people prefer Program A to Program B by a ratio of about 2:1.

But this preference is irrational from the perspective of rational choice. Recall how we are to evaluate choices according to rational choice theory:

Expected Value of Choice = Expected Outcome * Probability of Outcome.

Accordingly,

The expected value of Program A, where it is certain that 200 lives will be saved, is 200 x 1.0 = 200.
The expected value of Program B, where there is a chance that all 600 lives will be saved, is 600 x 1/3 = 200.

Viewed rationally, then the outcomes of the two choices are identical.

Now, it is possible that the 2:1 preference for Program A is a simple case of risk aversion. However, consider the choice between two other programs:

If Program C is adopted, 400 people will die.
If Program D is adopted, there is a 1/3 probability that none will die, and a 2/3 probability that all will die.

Note that while Program C entails the certainty that some lives will be lost, Program D entails risky prospects: all might be lost, but some might be saved. Which program do you choose? Faced with this choice, people prefer Program D to Program C by a ratio of about 2:1.

Note that this preference is also irrational from the perspective of rational choice.

The expected value of Program C, where it is certain that 400 lives will be lost is 400 x 1.0 = 400.
The expected value of Program D, where there is a chance that all 600 lives will be saved, is 600 x 2/3 = 400.

Viewed rationally, then, the outcomes of these two choices are identical. In fact, the four programs are normatively equivalent: in each case, we can expect to save 200 lives and lose 400 lives. Note too, that this is an instance of a preference reversal:

When the choice is stated in terms of lives to be saved, people prefer certainty over risk.
But when the choice is stated in terms of lives to be lost, people prefer risk over certainty.

Why, then, do people strongly prefer one program over another? According to Kahneman and Tversky, Programs A and B focus on gains -- people's lives to be saved. Under such circumstances, people prefer a sure gain, and are averse to risk. But Programs C and D focus on losses -- people who will die. Under these circumstances, people prefer to avoid a sure loss, and accept a certain level of risk.

A similar kind of effect appears in research by Paul Slovic (Judgment & Decision Making, 2007), employing scenarios relating to humanitarian relief.

In one experiment, subjects were presented with a water shortage in an African refugee camp: diverting scarce resources would be expected to save 4,500 lives. In one version of the scenario, the subjects were told that the population of the camp was 11,000 people. In another version, they were told that the population was 100,000 people. People were more likely to divert resources to the smaller camp than the larger one, even though the number of lives to be saved (the prospective gain) was the same in both instances.
In another experiment, subjects were asked to take the role of an executive at a foundation fighting disease. In one scenario, they were told that Disease A killed 20,000 people, and a $10,000,000 donation could save 10,000 lives. In another version, they were told that Disease B killed 290,000 per year, and the same donation would save 20,000 lives. People were more likely to give the grant to fight Disease A, even though giving the money to fight Disease B would save more lives.

In these scenarios, and others Slovic tested, judgments seemed to be based on the percentage of lives that would be saved, not the absolute number. It's a little like the Weber-Fechner Law, discussed in the lectures on "Sensation", where the same amount of change is perceived to be smaller or larger, depending on the baseline. The light of a candle makes a big difference in a dark room, but much less difference in a room that is brightly illuminated. It seems like 4,500/11,000 is smaller than 4,500/100,000, and it is - -percentagewise. But looked at another way, in either case 4,5000 lives are saved. The outcome is the same in either case, but judgments differ depending on the background context.

The Disease Problem and the problem of Sunk Costs illustrate framing effects in judgment and decision making. That is, judgment is not invariant over different descriptions of a problem. Rather, judgment depends on how the problem is framed -- i.e., whether there is a focus on gains or losses. Framing effects violate normative rationality, which holds that rational choice is determined by an abstract description of the problem at hand. The expected value, or expected utility, of a choice is a matter of algebra. Choices should not depend on the wording of a problem -- whether lives would be saved or lost, whether it is $25 or$150 that is already down the drain.

Bernoulli's Error

The concept of utility, essentially meaning "pleasure", was introduced by Daniel Bernoulli (1700-1782), a French mathematician who was among the first to ask how people make decisions under conditions of risk (this is the same guy who formulated Bernoulli's principle in physics, which states that as the speed of a moving fluid increases, the pressure of that fluid also increases).

Bernoulli considered the problem -- actually formulated by his cousin -- of a merchant whose ship is carrying goods from Amsterdam to St. Petersburg (this is known as "The St. Petersburg Paradox"). If there is a 5% probability that the ship will sink, how much insurance should he carry? In considering this question, Bernoulli focused on the merchant's "states of wealth":

How much wealth will the merchant have if the ship arrives safely?
How much will he have if the ship sinks?
How much will he have if he buys insurance? (and the ship doesn't sink)?
How much will he have if he doesn't (and the ship sinks)?

Bernoulli's emphasis on total wealth, or the pleasure (utility) it will buy, is a perfectly rational way to go about about deciding whether to buy insurance, and how much. Note, however, that the decision to buy insurance is the same no matter how much wealth the merchant has. It doesn't matter whether he has a thousand dollars (or francs, or whatever) or a million.

The problem is that people (including 18th-century French merchants) don't think that way. They don't focus on their total wealth. They focus on how much they are likely to gain and lose by a particular decision -- and losses loom larger than gains.

Kahneman and Tversky's work on judgment under uncertainty was largely ignored by economic theorists until they pointed out Bernoulli's error. At that point, Richard Thaler (now at the University of Chicago), Vernon Smith (now at George Mason University) and other economists began to notice what they were doing (see, for example, Exuberance is Rational. Or at Least Human", an article on Thaler by Roger Lowenstein,New York Times Magazine, 02/11/01). Behavioral economics was invented, focusing on how people actually make economic decisions, and psychological considerations were injected into what formerly had been abstract economic theorizing.

Prospect Theory

The phenomena of framing and other departures from rational choice are explained by the prospect theory offered by Daniel Kahneman and Amos Tversky. In order to deal with the problems that confront subjective utility theory, Kahneman and Tversky proposed a psychological alternative known as prospect theory. Prospect theory begins, as utility theory does, with the assumption that people base decisions on subjective utilities, not on the objective values associated with their choices. Then, prospect theory goes on to really take into account the psychology of the individual decision-maker. In this way, it seeks to explain a number of phenomena of decision making that are anomalous from the point of view of expected value theory and expected utility theory -- for example, the fact that losses loom larger than gains; or that first impressions can shape final judgments -- remember the anchoring and adjustment heuristic? Or that vivid examples can overshadow statistical summaries -- remember the representativeness heuristic? Prospect theory is always concerned with how people make judgments, decisions, and choices under conditions of risk -- that is, where outcomes are, to a greater or lesser extent, uncertain. The basic idea of prospect theory is that such judgments are not made merely by the kinds of calculations envisioned by the doctrine of human normative rationality, the theory of rational choice. Rather, they are greatly influenced by how problems and outcomes are framed. That's where prospect theory gets its name. Prospect theory emphasizes how judgments, decisions and choices are framed -- how they appear to the person making them, how the problem is perceived. It's kind of the other side of the coin in the constructivist view of perception. In the constructivist view, perception involves judgment and decision making. In prospect theory, judgment and decision-making depend of perception.

Among the phenomena that prospect theory attempts to explain are these:

losses loom larger than gains
first impressions shape later judgments
a single vivid example can outweigh an abstract statistical summary.

Prospect theory is concerned with how people make judgments, decisions, and choices under conditions of risk -- that is, where outcomes are to a greater or lesser extent uncertain. The basic idea of prospect theory is that such judgments are not made merely by the kinds of calculations (e.g., outcome x probability) envisioned by the doctrine of human normative rationality, or by the theory of rational choice; rather, they are greatly influenced by how problems and outcomes are framed. "Prospect theory" emphasizes how judgments, decisions, and choices are framed; how they appear to the person making them -- in other words, how the problem is perceived.

In prospect theory, decisions are made in two phases.

Editing, in which the judge sets a reference point which separates various outcomes into gains and losses.
Evaluation. in which the judge computes a value of one or more potential choices based on their outcomes and the probabilities attached to them.

The value function is asymmetrical, such that losses generally have larger value than gains.

Or, as Kahneman and Tversky put it, Losses loom larger than gains.

Prospect theory is a modification of utility theory, in that it shares with utility theory the assumption that people base their decisions on subjective utilities, not objective values.

However, utility is measured in terms of gains and losses, not absolute wealth.
Unlike utility theory, however, it does not assume that people are always risk averse.

People prefer sure gains when their attention is focused on gains.
But they prefer to take a risk when their attention is focused on loss.

Attention is focused by the way the choice is worded, resulting in framing effects.

In calculating the utility of a choice, people do not multiply utilities by the objective probability attached to them. Rather, they employ a psychological concept of probability, which tends to overweight very high and very low probabilities. Thus, probable (but not certain) gains look better, and probable (but not certain) losses look worse, than they really are.
People do not evaluate utilities in an absolute sense. Rather, the evaluation occurs against the background provided by some reference point -- such as the 600 lives projected to be lost in the disease problem. Framing, a focus on lives saved or lives lost, alters this subjective reference point. The result, again, is to make risky prospects look better or worse than they actually are.

Kahneman and Tversky's work on judgment heuristics and risky prospects injected psychological considerations into economic theory, continuing a tradition initiated by Herbert Simon. While rational choice is an idealized prescription of how economic decision makers ought to behave, prospect theory is an empirically based description of how people actually make judgments and decisions.

A Digression on the Psychology of Coincidence

A coincidence is a surprising concurrence of events, perceived as meaningfully related, with no apparent causal connection.

In fact, many coincidences have a hidden cause. In other cases, the coincidence is an illusory product of selective memory or perception. But in other cases, events that appear to be extremely unlikely coincidences are in fact highly probable, if you give proper consideration to the base rates of the events themselves. This illustrates "the law of very large numbers": with a large enough sample, any outrageous thing is apt to happen.

Consider, for example, Evelyn Adams, who won the New Jersey State Lottery twice within four months in 1986. Intuitively, such an event seems extremely implausible. For example, if the odds of an individual winning a lottery are 1 in a million (1/1,000,000), then the odds that an individual will win twice are 1 in a trillion. But now consider how many such lotteries there are, how many people play them each week, and how many different numbers are drawn by the average player. These considerations increase markedly the probability that someone,somewhere, will win some lottery twice in their lives. In fact, a proper calculation indicates that the odds of a double lottery winner somewhere in the United States, over any given four-month period, are roughly 1 in 30. Over a seven-year period, the odds are even, 1:2. Thus, what looks like an extremely long shot, then, comes closer to a sure thing.

Or, to take another example, a double hole-in-one in golf. The odds of this happening to someone are about 1.85 billion to 1; but when you consider how many golfers there are, and how many times they play, such a thing is likely to happen -- to someone, somewhere -- about once a year.

Here's an extreme example: Tsutomu Yamaguchi, an engineer who worked for Mitsubishi, was on a business trip to Hiroshima, Japan, on August 6, 1945: he stepped off a streetcar at the instant that the "Little Man" atomic bomb was exploded over that city. He suffered burns and ruptured eardrums, but returned to his home in Nagasaki. On August 9, he was telling his supervisor about his experience when "Fat Man", another atomic bomb, went off. What's the chance of being hit by two atomic bombs? Mr. Yamaguchi was the only officially recognized survivor of both atomic blasts, but it has been estimated that there were about 165 such individuals -- enough that they get a special name,nijyuu hibakusha ("twice-bombed people"). After the war he worked for the American occupation forces and as a schoolteacher, as well as for Mitsubishi. He wrote a memoir, spoke against nuclear weapons, and was featured in a documentary film about twice-bombed people. He died in 2010 at the age of 93. (see "Tsutomu Yamaguchi, 93; Survived 2 A-Bombs" by Mark McDonald,New York Times 01/07/2010; photograph from the Washington Post.

If you consider close, but not exact, coincidences, the probabilities involved increase even more. For example, the odds of two people in even a fairly small group sharing the same birthday are surprisingly high. They go even higher if they are required only to have a birthday within a day of each other, and even higher if the birthdays can occur within a week.

An important factor in the perception of coincidence is the phenomenon of "multiple end points". When what counts as a coincidence is not specified in advance, the random co-occurrence of any two chance events will qualify (e.g., that two strangers will share the same birthday, or the same first name, or the same hometown). Even though the likelihood of any one particular coincidence may be low, the likelihood that some coincidence will occur is very high. Those amazing psychics who advertise on TV capitalize on multiple end points to boost their reputations for accuracy. As one of their testimonials states: "Something will happen in the future to make everything make sense". But what that is isn't specified in advance.

An article in the New York Times Magazine,"The Odds of That", by Lisa Belkin (08/11/02), further illustrates the difficulty that people have in reasoning about coincidences.

For more on the mathematics and psychology of coincidences, see:

Beyond Coincidence: Amazing Stories of Coincidence and the Mystery and Mathematics Behind Them by M. Plimmer & B. King (2006.

This material is based largely on examples given by Medin and Ross in their textbook,Cognitive Psychology, to which the interested reader is referred for a more extensive discussion of rational choice theory and its alternatives.

Are People Just Plain Stupid?

The basic functions of learning, perceiving, and remembering depend intimately on judgment, inference, reasoning, and problem solving. In this lecture, I focus on these aspects of thinking. How do we reason about objects and events in order to make judgments and decisions concerning them?

According to the normative model of human judgment and decision making, people follow the principles of logical inference when reasoning about events. Their judgments, decisions, and choices are based on a principle of rational self-interest. Rational self-interest is expressed in the principle of optimality, which means that people seek to maximize their gains and minimize their losses. It is also expressed in the principle of utility, which means that people seek to achieve their goals in as efficient a manner as possible. The normative model of human judgment and decision making is enshrined in traditional economic theory as the principle of rational choice. In psychology, rational choice theory is an idealized description of how judgments and decisions are made.

But in these lectures, we noted a number of departures from rational choice:

the organization of concepts as fuzzy sets;
problems with judgments of similarity;
the application of judgment heuristics rather than algorithms;
a confirmatory bias in hypothesis testing;
difficulties with conditional reasoning; and
framing effects.

These effects seem to undermine the popular assumption of classical philosophy, and early cognitive psychology, that humans are logical, rational decision makers -- who intuitively understand such statistical principles as sampling, correlation, and probability, and who intuitively follow normative rules of inference to make optimal decisions.

The "Irrationality" of Deception

Another departure from normative rationality can be seen in deception. Ordinarily, we think of lying and cheating as something that people do in order to gain some advantage. Which is true, except that there is more to it than that. Dan Ariely, a psychologist who specializes in the behavioral economics pioneered by Simon and by Kahneman and Tversky, has developed a simple paradigm for studying dishonesty, in which people are given an opportunity to lie or cheat under conditions in which they do not think they can be caught (of course, they're wrong). He finds that the considerations of rational choice -- specifically, the amount of money to be gained, or the probability of being caught -- have no effect on whether people will lie or cheat.

Here are some factors that, in Ariely's research, tend to increase dishonesty:

Whether people are able to rationalize their dishonest behavior.
When they experience conflicts of interest.
If they are more creative (meaning, perhaps, that they can think of more and better ways of being dishonest!).
If they have committed previous immoral acts.
If their resources are depleted (OK, now that is a consideration for rational choice).
If other people will benefit from our dishonesty (and so might this one be -- except that it's not the subject himself who is benefiting).
If the subject has observed others behaving dishonestly.
If the subject belongs to a culture (like an organization) that provides examples of dishonest behavior.

And here are some factors that he finds increase honesty:

If the subject makes a pledge to behave honorably (like the honor codes at the military academies and some colleges).
If subjects sign a promise to be honest before they choose how to behave.
If subjects receive "moral reminders" (posting the Ten Commandments near the office coffee pot).
If subjects perform their duties under close supervision.

Ariely reports the results of his research in The (Honest) Truth About Dishonesty: Why We Lie to Everyone -- Including Ourselves (2012).

Some psychologists have taken these departures from normative rationality as grounds for what I have come to call the "People Are Stupid" School of Psychology (PASSP), a school of psychology that occupies a place in the history of the field alongside the structuralist, functionalist, behaviorist, and Gestalt "schools". The fundamental assumption of this group of psychologists is that people are fundamentally irrational: They don't think very hard about anything, and they let their emotions and motives get in way of their cognition. PASSP also believes that people usually operate on "automatic pilot", meaning that we don't pay too much attention to what is going on, what we are doing, so that we are swayed by first impressions, immediate responses. PASSP also believes that we usually don't know what we are doing, either: Our behavior is mostly unconscious, and our "reasons" little more than post-hoc rationalizations for our behavior. From the perspective of PASSP, consciousness is not necessarily a good thing, because it actually gets in the way of adaptive behavior!

Bounded Rationality and Satisficing

But do the kinds of effects documented here really support the conclusion that humans are irrational? Not necessarily. Normative rationality is an idealized description of human thought, a set of prescriptive rules about how people ought to make judgments and decisions under ideal circumstances. But circumstances are not always ideal. It may very well be that most of our judgments are made conditions of uncertainty, and most of the problems we encounter are ill-defined. And even when they're not, all the information we need may not be available, or it may be too uneconomical to obtain it. Under these circumstances, heuristics are our best bet. They allow fairly adaptive judgments to be made. Yes, perhaps we should appreciate more how they can mislead us, and yes, perhaps we should try harder to apply algorithms when they are applicable, but in the final analysis:

It is rational to inject economies into decision making, so long as you are willing to pay the price of making a mistake.

Human beings are rational after all, it seems. The problem, as noted by William Simon, is that human rationality is bounded. We have a limited capacity for processing information, which prevents us from attending to all the relevant information, and from performing complex calculations in our heads. We live with these limitations, but within these limitations we do the best we can with what we've got. Simon argues that we can improve human decision-making by taking account of these limits, and by understanding the liabilities attached to various judgment heuristics. But there's no escaping judgment heuristics because there's no escaping judgment under uncertainty, and there's no escaping the limitations on human cognitive capacity.

Simon's viewpoint is well expressed in his work on satisficing in organizational decision-making -- work that won him Nobel Memorial Prize in Economics. Contrary to the precepts of normative rationality, Simon observed that neither people nor organizations necessarily insist on making optimal choices -- that is, always choosing that single option that most maximizes gains and most minimizes losses. Rather, Simon showed that organizations evaluate all the alternatives available to them, and then identify those options whose outcomes are satisfactory (hence the name, satisficing). The choice among these satisfactory outcomes may be arbitrary, or it may be based on non-economic considerations. But it rarely is the optimal choice, because the organization focuses on satisficing, not optimizing.

Satisficing often governs job assignments and personnel selection. Otherwise, people might be overqualified for their jobs, in that they have skills that are way beyond what is needed for the job they will perform.

Satisficing also seems to underlie affirmative action programs. In affirmative action, we create a pool of candidates, all of whom are qualified for a position. But assignment to the position might not go to the candidate with the absolutely "highest" qualifications. Instead, the final choice among qualified candidates might be dictated by other considerations, such as an organizational desire to increase ethnic diversity, or to achieve gender or racial balance. Affirmative action works so long as all the candidates in the pool are qualified for the job.

And satisficing plays a role in more personal, intimate matters, like finding a mate.

Lori Gottlieb, in Marry Him: The Case for Settling (2010), points out that large numbers of women navigate the dating scene armed with checklists -- sometimes actually written out -- of the attributes of their fantasy husbands, searching for a perfect match in real-life (men do this too, but they're not the subject of Gottlieb's book). This sets them, and the men they date, up for failure as they go from one speed-dating party to the next, cruise internet dating sites, or even hire professional matchmakers. Instead, Gottlieb advises women (and men) to find someone who's good enough and settle for him (or her) -- and then, presumably, work at making it work.
Gottlieb's point is generalized by Renata Salecl, in Choice (also published in 2010 -- there must have been something in the air!), who argues that searching for the perfect choice is likely to only make us miserable), and we'd be better off just choosing, for God's sake, and then learning to live with and bounce back from the consequences of the sometimes-bad choices we make.
Along these lines, in Unintended Consequences: Why Everything You've Been Told About the Economy is Wrong (2012), Edward Conrad (a former colleague of Mitt Romney's at Bain Capital) suggests the following three-step heuristic for finding a satisfactory marital partner:

Estimate the number of potential mates in your geographic area.
"Calibrate" the marriage marketplace by dating as many different people as you can.
Select the first person you meet who is a better match than the best one you met during the calibration phase. This match is highly likely to be the best match you'll find.

In On Settling (2012), Robert Goodin, a philosopher, argues that decision-making is especially difficult when we face two acceptable choices. In a philosophical problem known as Buridan's ass, a donkey is placed equidistant between two equally desirable bales of hay, and starves to death because it cannot choose which one to eat. Goodin argues that, under such circumstances, if we just settle, we can focus our mental and physical energies on other things - -things where we really do have important choices to make. "Settling" in one domain allows for "striving" in other domains.

Jonah Lehrer, a science writer who wrote How We Decide, summarizing recent work on the psychology of choice, once described himself as "pathologically indecisive" ("Mind Games", interview with Deborah Solomon,New York Times Magazine, 12/04/2008). "I wrote the book because I would spend 10 minutes in the cereal aisle choosing between Honey Nut Cheerios and Apple Cinnamon Cheerios". But we become pathologically indecisive only if we try to make choices between things that aren't, really, all that different. Choosing classic Cheerios would be better for him (less sugar), and choosing Total or Product 19 would be as well (more nutritional). Following Simon's principle, we should first determine which options are the really good ones. Then, instead of trying to make distinctions among these, choose arbitrarily, or whimsically -- but choose!

"Fast and Frugal" Heuristics

Another way of stating Simon's principles of bounded rationality and satisficing is with the idea of "fast and frugal heuristics" proposed by the German psychologist Gerd Gigerenzer.

The bottom line in the study of cognition is that humans are, first and foremost, cognitive beings, whose behavior is governed by percepts, memories, thoughts, and ideas (as well as by feelings, emotions, motives, and goals). Humans process information in order to understand themselves and the world around them. But human cognition is not tied to the information in the environment.

We go beyond the information given by the environment, making inferences in the course of perceiving and remembering, considering not only what's out there but also what might be out there, not only what happened in the past but what might have happened.

In other cases, we don't use all the information we have. Judgments of categorization and similarity are made not by a mechanistic count of overlapping features, but also by paying attention to typicality. We reason about all sorts of things, but our reasoning is not tied to normative principles. We do the best we can under conditions of uncertainty.

In summary, we cannot understand human action without understanding human thought, and we can't understand human thought solely in terms of events in the current or past environment. In order to understand how people think, we have to understand how objects and events are represented in the mind. And we also have to understand that these mental representations are shaped by a variety of processes -- emotional and motivational as well as cognitive -- so that the representations in our minds do not necessarily conform to the objects and events that they represent.

Nevertheless, it is these representations that determine what we do, as we'll see more clearly when we take up the study of personality and social interaction.

Psychology and the Nobel Prize in Economics

In 2002, in an event long anticipated by his colleagues, Daniel Kahneman received the Nobel Memorial Prize in Economic Science, sharing the award with Vernon L. Smith of George Mason University (Amos Tversky, having died in 1996, was not eligible to share the award, but in his press statements Kahneman clearly acknowledged his collaboration with Tversky).

By the way, Kahneman, who was born in Tel Aviv, Israel, in 1934, has a long connection with Berkeley. He received his PhD from here in 1961, and he taught here from 1986 to 1994. He is now Eugene Higgins Professor of Psychology and Professor of Public Affairs at Princeton University, but he returns to Berkeley nearly every summer and has hosted summer conferences here. Kahneman's early work on perception and attention certainly influenced his later studies of how people perceive the choices that they face, and how they attend to different kinds of information when evaluating those choices.

Kahneman's award was for work in "psychological and experimental economics, and his part of the citation reads as follows"

'for having integrated insights from psychological research into economic science, especially concerning human judgment and decision-making under uncertainty'.

The citation continues:

Traditionally, much of economic research has relied on the assumption of a 'homo economicus' motivated by self-interest and capable of rational decision-making. Economics has also been widely considered a non-experimental science, relying on observation of real-world economies rather than controlled laboratory experiments. Nowadays, however, a growing body of research is devoted to modifying and testing basic economic assumptions; moreover, economic research relies increasingly
on data collected in the lab rather than in the field. This research has its roots in two distinct, but currently converging, areas: the analysis of human judgment and decision-making by cognitive psychologists, and the empirical testing of predictions from economic theory by experimental economists. This year's laureates are the pioneers in these two research areas.

Daniel Kahneman has integrated insights from psychology into economics, thereby laying the foundation for a new field of research. Kahneman's main findings concern decision-making under uncertainty, where he has demonstrated how human decisions may systematically depart from those predicted by standard economic theory. Together with Amos Tversky (deceased in 1996), he has formulated prospect theory as an alternative, that better accounts for observed behavior. Kahneman has also discovered how human judgment may take heuristic shortcuts that systematically depart from basic principles of probability. His work has inspired a new generation of researchers in economics and finance to enrich economic theory using insights from cognitive psychology into intrinsic human motivation.

Link to video of Nobel interview with Kahneman and Smith, 12/12/02 (requires RealPlayer).

Link to Kahneman's Nobel Prize Lecture, 'Maps of Bounded Rationality", 12/08/02 (requires Real Player).

Link to an interview with Daniel Kahneman.

For a dual biography of Kahneman and Tversky, see The Undoing Project: A Friendship that Changed Our Minds (2016) by Michael Lewis, the author of Moneyball (2003) -- maybe it, too, will be made into a movie; it should! "The Undoing Project" was K&T's name for their collaboration -- undoing the psychologically unrealistic portrait of judgment and decision-making which had been assumed by classical economic theory up to that time.

Actually, there's a deeper connection between K&T -- at least K -- and baseball. Baseball is notorious for its statistics: the whole point of Moneyball, book and movie, is how Billy Beane, the general manager of the Oakland Athletics, employed statistical analysis (which the baseball writer Bill James called sabermetrics, a term derived from the Society for American Baseball Research, or SABR) to develop a winning baseball team despite severe budgetary constraints (in 2002, the As -- no apostrophe, please! -- had a personnel budget of $44 million, compared to $125 million for the Yankees). Implicit in the book (actually, it's pretty explicit) is a critique of traditional practices in baseball, which rely a lot on subjective judgments and are prone to stereotyping and other forms of bias. The As didn't win the World Series, but they did get to the playoffs in 2002 and 2003. Billy Beane was more influenced by Bill James and sabermetrics than by Dan Kahneman, and Michael Lewis didn't learn about Kahneman until he read a review of his own book, but it turns out that Kahneman's 2011 book, Thinking Fast and Slow, is something like required reading for baseball managers, scouts, and coaches. According to Joe Lemire, a sports writer for the New York Times, once they've read it, "they never think of decisions the same way" ("This Book Is Not About Baseball. But Baseball Teams Swear by It", NYT 02/25/2021). They've learned about cognitive biases such as the representativeness heuristic (a good pitcher should look like a pitcher), stereotypes, and other liabilities of intuitive judgment, and how to avoid or overcome them.

John Davenport, a colleague when I was at the University of Wisconsin, used baseball as illustrative material in his statistics courses, especially when teaching the principles of good graphic presentations. His self-published books, Baseball Graphics (1979) and Baseball's Pennant Races: A Graphic View (1981) are still available on Amazon, and well worth purchasing, if you're an avid fan of either baseball or graphics.

The award of the Nobel Prize in Economics to Daniel Kahneman is the climax of a long process that acknowledges the impact of psychology on economic theory, and corrects the normative view of human rationality that underlies traditional economic thinking (for more details, click on the links to the "Nobel E-Museum".

In 1978, Herbert A. Simon (of Carnegie-Mellon University) was the first psychologist to receive the Nobel Prize in Economics, "for his pioneering research into the decision-making process within economic organizations" (though it must be said that Simon is much, much more than a psychologist, given that he holds simultaneous academic appointments in political science, public administration, and computer and information sciences as well as psychology, and he has made research contributions to the philosophy of science, applied mathematics and statistics, operations research, economics, and business administration; he was a pioneer in cognitive science and artificial intelligence, and one of the leaders of the cognitive revolution within psychology). Whereas the traditional economic "theory of the firm" assumed that there was no distinction between organizations (such as firms) and the individuals (such as entrepreneurs) in them (it is a cardinal principle of sociology, however, that the behavior of organizations cannot be reduced to the behavior of their individual members); individuals were assumed to be rational actors, interested in optimizing outcomes; and so organizations were assumed to be rational actors, optimizing outcomes, as well. Simon's 1947 book,Administrative Behavior, changed all that. Simon introduced the notion of judgment under uncertainty, and substituted the goal of "satisficing" for "optimizing". In the process, he articulated a wholly new, and more psychologically realistic, concept of human rationality that laid the foundation for the work of Kahneman, Tversky, and others like them.
In 1994, John L. Nash (the economist whose struggles with paranoid schizophrenia were portrayed by Russell Crowe in the movie "A Beautiful Mind", based on the book of that title by Sylvia Nassar) shared the prize with John C. Harsanyi, and Reinhard Selten for work (actually, his doctoral dissertation!) on experimental "games as the foundation for understanding complex economic issues". Nash made a fundamental distinction between cooperative and non-cooperative games, and arrived at a universal solution for the non-cooperative case, known as the "Nash equilibrium", in which all players get what they want. Games of the sort that Nash studied, such as the prisoner's dilemma, are actually laboratory models of dyadic social interaction, whether between two people or two groups, and as such are studied by social psychologists as well as economists.
In 2000, Daniel L. McFadden (of UC Berkeley) shared the prize with James J. Heckman (Chicago) for work in microeconomics. McFadden is an economist, not a psychologist, but his work has been heavily influenced by psychologists like Kahneman and Tversky. McFadden's work is on the theory of "discrete choice", which has to do with how people make decisions in such matters as occupation or place of residence -- exactly the sort of thing that Kahneman and Tversky have studied from the psychological side. For his part, the economist Heckman has focused on problems of sampling data from larger populations that are identical to those that confront psychologists (as you remember from our earlier discussion of research methods).
In 2001, the George A. Akerlof (also of UC Berkeley) shared the prize with A. Michael Spence (Stanford) and Joseph E. Stiglitz (Columbia) for work on markets with asymmetric information. While traditional economic theory assumes that actors have identical information about a choice or a bargain, the fact is that in the real world sellers usually have more information, or more accurate information, than buyers -- an asymmetry that can lead to the "adverse selection" of low-quality products. In an example often cited by Akerlof, if you buy a used car, the salesman knows whether the car is a lemon, but you do not -- at least until you get it off the lot. The problem of asymmetric information is a variant on the problem of "judgment under uncertainty", which Kahneman and Tversky argue is the central characteristic of most, if not all, judgments made in the real world. Akerlof, by the way, was once denied promotion at Berkeley, and his 'Lemons" paper, which was specifically cited by the Nobel committee, was rejected by several journals as "trivial". He is married to Janet Yellen, a UCB emeritus professor who in 2013 was appointed the first woman to chair the Federal Reserve Board (she was denied promotion at Harvard).

Akerlof, now retired from Berkeley, is promoting a variant on behavioral economics which he calls identity economics, which argues that people's self-concepts, and their racial, gender, and ethnic identities shape their economic decision-making more than the normatively rational prescriptions of homo economicus.

Also in 2001, Matthew Rabin, an economist at Berkeley, received the John Bates Clark Medal, awarded every year to the best economist in the United States under age 40 (earlier, Rabin had received one of the "genius" awards from the MacArthur Foundation). Rabin was cited expressly for his work incorporating psychological principles into economics -- for example, by showing that people make economic choices based on principles of fairness and reciprocity, not just self-interest. Rabin's award is generally considered to represent a major milestone for behavioral economics, and the incorporation of psychological principles into economic theory, because it is likely to encourage other young economists to follow the same path.
The 2002 Nobel Prize to Kahneman and Smith consolidated the triumph of psychology within economics (Richard Thaler probably should have shared the prize with Kahneman and Smith; but Nobel prizes can only be divided in three, and leaving the third potential slot may have been the Prize committee's way of acknowledging Tversky's contributions, even in his absence; in any event, Thaler himself got the Prize in 2017). Smith was recognized "for having established laboratory experiments as a tool in empirical economic analysis, especially in the study of alternative market mechanisms' What Smith did, essentially, was to introduce laboratory research into economics, thereby creating a field of "experimental" or "behavioral" economics that uses experimental paradigms very similar to those used in cognitive and social psychology to test new, alternative market designs before they are implemented in the real world. Smith's work thus recognizes that economics is not just a science of how economies behave in the abstract, but rather is a science of the economic behavior of individuals and groups -- a behavioral science, like psychology.
The 2005 Nobel Prize was shared by Robert J. Aumann and Thomas C. Schelling for their research using games such as the Prisoner's Dilemma to study conflict and cooperation. We'll talk about the Prisoner's Dilemma game later, in the lectures on Personality and Social Interaction.
The 2013 Nobel Prize was shared by Robert J. Schiller, of Yale University, and two professors from Chicago, Eugene Fama and Peter Hansen. Schiller's work is also in behavioral economics, emphasizing the irrationality of markets -- as illustrated by the phenomenon of "irrational exuberance", or unjustified optimism, concerning stock markets (interestingly, Fama won the award for theoretical work that argued precisely the opposite, that markets were efficient, and completely rational, over the long run). More recently, Schiller has emphasized the power of stories, and their viral spread through such media as the internet, to affect economic phenomena such as depressions and recessions, and bubbles (see his 2020 book, Narrative Economics: How Stories Go Viral and Drive Major Economic Events). Reviewing his book, Cass Sunstein (a law professor who, as discussed below, is also an important contributor to behavioral economics), notes that Schiller defines "narrative" or "stories" very broadly, to stand for a whole host of social and emotional influences on economic choices ("Once Upon a Time There Was a Big Bubble", New York Review of Books, 01/14/2021). Many of these are the subject-matter of psychology, particularly social psychology.
The 2017 Nobel Prize was given to Richard Thaler, who is generally recognized as the inventor of behavioral economics -- which, remarkably, he did as a member of the so-called "Chicago School" of economics, which emphasizes the "invisible hand" of market forces and assumes the essential rationality of economic decision-making. If the 2002 Prize recognized psychology's contribution to economics, the 2017 Prize recognized the contribution of economics to psychology.

The 2002 Nobel Prize to Kahneman (and, implicitly, Tversky as well) was recognized editorially by the Wall Street Journal and the New York Times as marking a major shift within economic theory, toward recognizing what the Times (10/12/02) called "the human element" in economics. And the kinds of behavioral anomalies identified by Kahneman and Tversky, Thaler, and others can no longer be dismissed as "noise" in an otherwise rational, self-regulating system. Economics is not just the science of disembodied markets; it is the science of buyers and sellers interacting in markets in the real world, and so psychology has to play a role in economic theory. As a writer in the Journal put it, "Economics isn't just about supply and demand curves anymore" ("Nobel winners for economics are new breed" by Jon E. Hilsenrath, 10/10/02). The Journal erred in citing Kahneman as "the first psychologist to win economics' highest honor" -- surely Simon counts; but he is not likely to be the last. And even if he is, he and Tversky (and Simon) have changed economics forever, by forcing economic theory to confront the vicissitudes of individual reasoning, judgment, decision-making, and problem-solving.

At the same time, it should be acknowledged that the psychological contributions of Kahneman and others were foreshadowed within economics itself. As noted earlier, utility theory already injected psychological considerations into economics by emphasizing the subjective value -- utility -- of a choice. Here are two other examples:

Thorstein Veblen (1857-1929), an early American economist (undergraduate student of John Bates Clark, he of the Clark Medal for early career contributions to economics) who coined the very term "neoclassical economics", wrote in A Theory of the Leisure Class: An Economic Study of Institutions (1899) about conspicuous consumption, in which economic choices -- such as, to take a modern example, what car to buy -- were dictated by their symbolic value rather than their actual value -- i.e., for their contribution to social comparison and what Veblen called "competitive display". In other words, by "wants" as opposed to "needs". For an intellectual biography of Veblen (who led a very interesting life outside the classroom), see Veblen: The Making of an Economist Who Unmade Economics (2020) by Charles Camic (reviewed by Kwame Anthony Appiah in "The Prophet of Maximum Productivity", New York Review of Books, 01/14/2021).

Influenced by Veblen, John Kenneth Galbraith argued in The Affluent Society (1958) and The New Industrial State (1967) that the assumption of rationality on which neoclassical economics was built was probably not valid. (It is probably not a coincidence that Galbraith based his books on his experience in the "real world " of agricultural economics, wartime economic controls, and service as President Kennedy's ambassador to India.) For example, Galbraith argued that rather than obeying laws of supply and demand, industry used advertising and marketing to persuade people to buy goods that industry wanted to produce. For details, see Happiness: Lessons from a New Science by Richard Ledyard (2005).

Next stop: political science, where theories of rational choice also predominate, as they once did in economics....

...Or maybe not. Many traditional rational-choice economists are highly resistant to the new behavioral economics, so much so that some departments threaten to split into two, with one economics department emphasizing rational choice and the other emphasizing behavioral economics, along with other sub-fields such as labor economics, the economics of development, economic history, and the like -- in other words, one department will study orthodox economic theory, while the other will study economics in the real world, often in a manner that challenges orthodoxy. This is almost certainly a bad thing for economics, because it means that the orthodox rational choicers and the heterodox behavioralists, among others, will not interact with each other. Good ideas flourish when they are challenged, but splitting economics into orthodox and heterodox departments seems designed to reduce the challenge to orthodox assumptions. Maybe traditional rational-choice economics isn't so rational after all!

Actually, intra-disciplinary pressures aren't unique to economics. They're everywhere. There are the theoreticians and the experimentalists within physics; in biology, the folks at molecular and cellular biology vs. the integrative biologists and ecologists; in anthropology, the physical anthropologists vs. the cultural. In psychology, we have at least two splits: science vs. practice, and those who are biologically inclined vs. those who lean more towards the social sciences. None of these splits is necessary, however, and it would be nice if all those who share a particular disciplinary commitment -- whether to understanding economic behavior, the origin.

Behavioral Economics and Public Policy

Kahneman, Tversky, and their confreres have had a palpable effect on public policy. For example, in 2009 Cass Sunstein, a prominent behavioral economist at the University of Chicago, joined the Obama administration with an express charge to bring the insights of modern behavioral economics to matters of public policy. His program, outlined in Nudge: Improving decisions about Health, Wealth, and Happiness (2008, co-authored with Richard Thaler) and Simpler: The Future of Government (2013) make the case for a variant of the liberal "nanny state", which would encourage people to do what the government thinks is good for them without resorting to mandates, prohibitions, and incentives to that might impinge on the individual freedom to do what he or she wants. These policies essentially capitalize on well-known judgment heuristics and biases of the sort discussed in these lectures. "Nudges" (also known as decision support technologies) employ interventions that encourage people to make particular choices, but in such a manner that people can avoid those choices cheaply and easily.

An example is organ donation. When you get your driver's license, you may be asked if you want to donate your organs in case of your death. If so, your license is earmarked to this effect. But the current practice is to require people to actively say "yes" to organ donation, resulting in relatively low rates of organ donation. Some other countries, however, make organ donation the default option: it's assumed that the person will agree to donate, but he or she is still given the choice to "opt out". Under these circumstances, rates of organ donation are significantly higher. And, as a general rule, opt out choices produce a higher rate of the target behavior than opt in choices. They create the nudge, encouraging the individual to do something that is beneficial to self and/or others, without actually requiring him or her to do so.

President Obama's executive order formally creating the Office of Information and Regulatory Affairs was entitled "Using Behavioral Science Insights to Better Serve the American People" (009/15/2015).Sunstein had to leave OIRA to return to his academic position at Chicago, but the work continued in other hands. We do not know whether OIRA will continue under the Trump administration.

A similar office now privatized as the "Behavioral Insights Team", known to insiders as "the nudge unit", has been set up in the United Kingdom. Very quickly, its projects saved the British government more than 10 times its operating costs. The World Bank, United Nations, and many individual countries have set up similar programs to harness social science, and in particular the modern psychology of judgment and decision-making, in the service of public policy. See "When Nudge Comes to Shove", The Economist, 05/20/2017.

On the other hand, some writers criticize the whole idea behind nudges as a threat to human freedom, because they involve mass manipulation verging on totalitarian mind control, rather than individual choice and decision-making. That is, people's behavior is directed in ways that are deemed "good for them" without their active assent or even complete knowledge of what is going on. Proponents reply that the programs don't threaten freedom, so long as they are put in place by democratically elected leaders, and give people genuine choices.

Opponents call it "the nanny state". Proponents call it "libertarian paternalism". The whole point of "nudge" programs is to change behavior without constraining it, and without using economic incentives. Thus, people are nudged in a particular direction, but still, in the words of the free-market economist Milton Friedman, "free to choose".

That is, unless the "nudgers" don't play fairly. Some of the potential for abuse is illustrated by an incident during the US presidential campaign of 2020 (see "How Trump Steered Supporters into Unwitting Donations" by Shane Goldmacher, New York Times, 04/04/2021). Shortly before the election, the Trump Campaign was running short of money. In an effort to raise money quickly, added to its online donation page a pre-checked box permitting automatic weekly contributions. So, unless they unchecked the box, people who intended to make a one-time donation of, say, $100 instead contributed $100 per week -- at least, that is, until their credit-card bills arrived. Later, the campaign added a second pre-checked box, known internally as the "money bomb", which doubled a person's contribution. If a donor left both boxes checked, someone who intended to make a one-time donation of $100 ended up, by default, making weekly contributions of $200. Goldmacher reports that some donors saw their credit cards overdrawn, or their bank accounts depleted. The Trump Campaign ended up issuing more than 530 thousand refunds totaling some $64.3 million, more than 10% of the money they raised during the fall. By contrast, the Biden campaign, issued only 37,000 refunds totaling $5.6 million, or about 2% of the total. The technology used by the Trump Campaign was, essentially, to "nudge" people into giving more than they intended, or that was good for them. The Trump Campaign was not the only marketing campaign to use pre-checked boxes to steer customers into making unwanted purchases. It's common when selling magazine subscriptions, for example. Both are examples of how "nudging" can be misused.

One thing for sure is that such nudges save money. A review co-authored with Thaler and Sunstein (Benartzi et al., Psychological Science, 2017) found that four different "nudge" programs were more cost-effective than traditional alternatives. For example, it is well known that most Americans do not save enough for their retirement. Simply informing employees about the importance of retirement savings, even when it emphasizes how little the typical worker saves, obviously doesn't work -- if it did, people would save more! Traditional programs to promote retirement savings typically rely on financial incentives, such as tax credits or matching contributions from employers. They work, but a simple check-off system in which new employees were required to indicate their preferred contribution rate (which could have been $0) was much more cost-effective. This "nudge" cost only $2, and increased the retirement savings rate by an average of $200 -- a benefit-cost ratio of 100:1. By contrast, simply providing information to employees cost $4.04, and yielded an increase of only $58.95 (a ratio of 15:1), and a program in which the employer matched 50% of employee contributions cost $82.40, and yielded an increase of only $244.50 (a ratio of 3:1). Similar advantages were found for "nudge" programs intended to increase college enrollment, conserve energy, and increase flue vaccinations. Nudges work, and they work better -- "more bang for the buck") than traditional educational or incentive programs.

More recently, Sunstein has backtracked a little -- but not really. As a regulator in the Obama Administration, he frequently argued in favor of requiring restaurants and even movie theaters to provide information about the caloric contents of the foods they sold -- leading one of his friends to complain that "CASS RUINED POPCORN". In Too Much Information: Understanding What You Don't Want to Know (2020) he argues that, while governments, industries, and other institutions should be transparent, people need not be told everything: we should only be given information that, if acted upon, would improve our lives. An example: would you want to know the exact date of your death? Or whether you were genetically predisposed to an incurable, fatal illness? Most people answer "No" to both questions. If you think about it, though, this position is actually consistent with the overall goals of the behavioral economists. The fundamental insight of behavioral economics is that, by virtue of natural cognitive limitations, people have difficulty processing all the information that is available to them. The availability of "too much information" often leads to decision-making paralysis, or simply to bad decisions. Sunstein, the government regulator, wants now to regulate the information available to people to emphasize only that which will foster good, adaptive decision-making.

See "" by Shane Goldmacher, NYT 04/04/2021.

For a look at the work of OIRA, especially its activities related to the water crisis in Flint, Michigan, see "Good Behavior" by Sarah Stillman, New Yorker, 01/23/2017).

Bayes's Theorem: A New Perspective on Rationality

Bayes's Theorem, initially derived by Thomas Bayes, an 18th-century English clergyman and further developed by Pierre-Simon Laplace, a 19th-century French mathematician, offers an alternative perspective on judgment and decision-making, especially under the conditions of uncertainty that dominate human experience, thought, and action in the real world. I discussed Bayes's Theorem in the lectures on Methods and Statistics in Psychology. Now, I bring it back in the context of judgment and decision-making.

Bayes's Theorem can be stated succinctly:

Old Belief + New Data = Better Belief.

In mathematical terms, Bayes's Theorem is stated as follows:

$P(A\mid B)=\frac {P(B\mid A) \cdot P(A)}{P(B)}$

Where A and B are two statements; p(A | B) is the probability that A is true given that B is true, etc.

Put in ordinary language:

p(evidence given the hypothesis) x p(hypothesis)

p(hypothesis given the evidence) = --------------------------------------------------------------------

p(evidence)

Here's a famous example of Bayes's Theorem in operation (slightly modified from Casscells et al. NEJM 1978):

A woman goes for a routine mammogram and receives a positive test. Epidemiologists know that about 1% of women aged 40-50 have breast cancer. Assume that a woman who actually has breast cancer has an 80% chance of receiving a true positive result -- that is, an indication that she does, in fact, have breast cancer. But a woman who does not have beast cancer has a 10% chance of receiving a false positive result -- that is, an indication that she has breast cancer when in fact she does not. What is the likelihood that a woman who has received a positive test actually has breast cancer?

When a group of physicians was asked this question (Eddy,1988), their average answer was 75%. But Bayes's Theorem gives quite a different answer: 8.33%.

Assume that A = breast cancer, and B = a positive test. Therefore:

                                                                                                    p(positive test given cancer) x p(cancer, with or without positive test)

                                                p(cancer given a positive test) = -------------------------------------------------------------------------------------------

                                                                                                                           p(positive test, with or without cancer)

Here's how it works (based on Hastie & Dawes (2010, Fig. 8.6).

First, cast the information provided into a 2x2 table. Imagine a sample of 1,000 women aged 40-50 who have a mammogram:

We know that 10 of these women (1%) will actually have breast cancer.

Of these, 8 (80% of 10) will have a true-positive mammogram.

The remaining 2 (20%) will have a false-negative mammogram.

The remaining 990 women (1,000 - 10) will have no cancer.

But 99 of these (10% of 990) will nonetheless have a false-positive mammogram.

And the remaining 891 women (90% of 990) will get a true-negative mammogram.

Cancer

No Yes

Test Result
Positive
99
8
107

Negative
891
2
893

990
10
1000

Therefore, the actual likelihood of having cancer, given a positive test result, is 8/(8+99) = 0.075, which is far less than 75%. Still, a positive test result increases a woman's likelihood of having breast cancer by roughly 750%.

I know this doesn't sound right, but it is -- and that's the point. People aren't very good at handling probabilities, and this is true even of physicians who administer and interpret these tests on a daily basis. The problem is inflated by base-rate neglect, also known as the base-rate fallacy: there is a tendency to focus on the 90% true-positive rate, but to ignore the base rates.

Note: This is emphatically not reason for women to forego mammography. That's what biopsies are for. Mammography is only an initial screening device, and a biopsy is needed to determine whether whatever anomaly appears in the scan is really a malignant tumor.

When you think about it, we're faced with problems like this all the time -- some of them, like the cancer example, very consequential indeed. But we don't always have our calculators at hand to compute the relevant probabilities. And, worse, we don't always have access to information about base rates, which would allow us to construct the 2x2 table used here. Still, there are things we can do. Reviewing a couple of recent books on human rationality in the New Yorker (including Rationality: What It Is, Why It Seems Scarce, Why It Matters by Steven Pinker and The Scout Mindset: Why Some People See Things Clearly and Others Don't by Julia Galef), Joshua Rothman offered the following advice ("Thinking It Through", 08/23/20121):

There are many ways to explain Bayesian thinking... but the basic idea is simple. When new information comes in, you don't want it to replace old information wholesale. Instead, you want it to modify what you already know to an appropriate degree. The degree of modification depends both on your confidence in your preexisting knowledge and on the value of the new data. Bayesian reasoners begin with what they call the "prior" probability of something being true, and then find out if they need to adjust it....

Bayesian easoning implies a few "best practices". Start with the big picture, fixing it firmly in our mind. Be cautious as you integrate new information, and don't jump to conclusions. Notice when new data points do and do not aler your baseline assumptions (most of the time, they won't alter them), but keep track of how often those assumptions seem contradicted by what's new. Beware the power of alarming news, and proceed by putting it in a broader, real-world context.

In a sense, the core principle is mise en place. Keep the cooked oonfirmation over here and the raw information over there; remember that raw ingredients often reduce over heat. But the real power of the Bayesian approach isn't procedural; it's that it replaces the facts in our minds with probabilities... [A] Bayesian assigned probabilities to [verbal] propositions. She doesn't build an immovable world view; instead, by continually updating her probabilities, she inches closer to a more useful account of reality. The cooking is never done.

For more on applications of Bayes's Theorem, see:

Rational Choice in an Uncertain World (2e, 2010) by Reid Hastie and Robyn Dawes (the best single introduction to the psychology of judgment and decision-making).
The Theory that Would Not Die, How Bayes' Rule Cracked the Enigma Code, Hunted Down Russian Submarines and Emerged Triumphant from Two Centuries of Controversy by Sharon Bertsch McGrayne (2011).

Intelligence

People in general might not be stupid, but some people appear to be smarter than others, and psychology has been interested in these individual differences in intelligence for well over 100 years. How do we measure these differences? What is the nature of human intelligence? Where do they come from?

The Origins of Mental Testing

Intelligence testing, and mental testing in general, has its origins in two rather different trends, elitist and egalitarian.

Sir Francis Galton (1822-1911), a cousin of Charles Darwin, got interested in mental abilities and mental testing. Darwin had shown that the physical characteristics of animals had been shaped by natural selection, as species adapted to their environments. Similarly, Galton thought that natural selection also operated on mental characteristics. In his study of Hereditary Genius (1869), he argued that intellectual distinction passed through families. Of course, he had no idea about genes, but he was convinced that individual differences in mental abilities were heritable, and that natural selection favored high levels of ability. Galton invented the study of anthropometrics, measuring people's mental as well as physical characteristics at a testing station he established at London's South Kensington Museum in 1884. Along with his student, Karl Pearson, he invented a statistic, the correlation coefficient, to measure the relations between traits (Pearson did the actual mathematical work, and the statistic is formally known as "Pearson's product-moment correlation coefficient", abbreviated r). Alas, Galton also founded he Eugenics Society, intended to give natural selection a hand by promoting the selective breeding of high-ability individuals, and discouraging the breeding of those with low levels of ability.
Across the English Channel, in France, there was a move to open public education to all qualified children. It was clear, however, that there were some children who would not profit from education, or who at least required special educational programs. In order to insure that the selection for schooling was made on a strictly objective basis, uncontaminated by class, religious, or ethnic prejudice, the minister of public instruction commissioned one of France's leading psychologists, Alfred Binet (91857-1911) to devise a test that could serve this purpose. Binet, collaborating with another French psychologist, Theodule Simon (1873-1961), produced their "Scale for Measuring Intelligence" in 1905.

The Evolution of IQ

The original Binet-Simon scale consisted of some 30 items which were intended to be "work-samples" of the kinds of activities that French elementary schoolchildren typically engaged in. A later version of the test, published in 1908, had almost 60 such items. Originally, Binet and Simon simply arranged the items of their test in increasing order of difficulty, based on pilot testing. Any child would pass some tests, and then start to fail. Children who started to fail later in the testing sequence then were deemed more intelligent than children who failed earlier.

That's okay, but where's the cutpoint? Where's the threshold that determines whether the child is appropriate for regular schooling or perhaps needs special education? So a couple of years later, Binet and Simon did something different. They grouped their items into clusters according to the age-level of children who routinely passed them --so very young children might be able to follow a moving object with their eyes, and they might be able to find and eat a square of chocolate wrapped in paper, but they might not be able to repeat a sentence 15 words in length or tell how two common objects are different. Somewhat older children might be able to tell how two common objects are similar or different, might be able to make rhymes, but not be able to use three nouns in a sentence, or define abstract terms. In this way, Binet and Simon developed norms for test performance for each age level, from age 3 to age 13, and then they compared the child's performance to children in various age groups. The items passed by each child determined his or her mental age. Children whose mental age was lower than their chronological age were deemed to be more or less mentally retarded. If the child passed tests that were also passed by a majority of 4- or 5-year-olds, but, but failed tests that were passed by a majority of 6- or 7-year-olds, the child would be given a mental age of 5. If the child passed tests that were also passed by a majority of 9-year-olds, but not by a majority of 10-year-olds, then the child would be given a mental age of 9. So if schooling started in France when children were 5 years old, a child with a mental age of 5 was deemed ready for school. A child with a mental age less than 5 was deemed not to be ready for school, and maybe in need of special educational services.

Somewhat later, William Stern, a German psychologist, invented a method for calculating what he called the child’s “intelligence quotient”. Stern divided the child's mental age by his or her chronological age and then multiplied that by 100. Thus, a child who was 5 years old and had a mental age of 5 had an IQ of 100. A child who was 5 years old, but had a mental age of 4, was given an IQ of 80. A 5-year-old with the mental age of 6 received an IQ of 120.

The Binet-Simon scale was quickly imported to America. Louis Terman, a psychologist at Stanford University, translated the scale, adapting it for American children as the Stanford-Binet Intelligence Scale (1916), which is subsequent revisions remains in use today. Robert Yerkes, at Yale, adapted the Stanford-Binet for group testing of recruits (including draftees) into the United States Army during World War I. His tests, known as the Army Alpha (for literate testees) and Army Beta (for illiterates), laid the foundation for the modern Armed Forces (now Services)Qualification Test, still in use. And David Wechsler, a psychologist working at the Bellevue Hospital in New York City, developed the Wechsler-Bellevue Intelligence Scale, intended for use with adults.

The modern form of the Wechsler-Bellevue scale is the Wechsler Adult Intelligence Scale (WAIS); later Wechsler developed a version for children known as the Wechsler Intelligence Scale for Children (WISC). Both remain in use, following periodic updating, and the WAIS is generally considered the "gold standard" for the measurement of intelligence. The WAIS consists of 11 subscales, 6 "verbal" and 5 nonverbal or "performance" scales. Although the scale produces separate scores for "verbal" and "performance" intelligence, these are combined to yield an aggregate measure of general intelligence.

The Americanization of intelligence testing produced new ways of quantifying intelligence, not just new tests. Following the work of William Stern, a German psychologist, Terman introduced the intelligence Quotient, or IQ, calculated by taking the ratio of the subject's mental age to chronological age, and multiplying by 100.

IQ = (MA/CA) x 100.

But this formula produced a real anomaly, because when the test was used to measure intelligence in adults the ratio inevitably was less than 1, indicating that all adults were of subnormal intelligence.

To correct this problem, Wechsler introduced the deviation IQ.He administered the test to representative samples of different age groups, 16-75, and calculated the mean and standard deviation for each group. Using a statistical procedure known as a z-transformation, he converted these distributions of raw scores to a mean of 100 and a standard deviation of 15. Then, the individual's IQ was calculated by comparing him to his own age group.

To illustrate, assume that a test has a mean score of 40, with a standard deviation of 12. By means of the z-transformation, we can convert these scores to standard scores where M = 100 and SD = 15.

Raw Test Score

Deviation Score

100

115

130

The result of such a procedure is a frequency distribution of IQ scores that closely follows the normal or "Gaussian"distribution -- commonly known as the "the bell-shaped curve". There are relatively few individuals with very high IQs, and very few individuals with very low IQs; most people fall somewhere in the middle (remember the rule of "68, 95, and 99").

Wechsler explained the apparent departure from normality at the low end with a two-factor theory of what used to be called mental retardation -- which we now call intellectual disability.. Some people have low IQs just by chance: somebody's got to be at the bottom of the distribution, just as somebody's got to be at the top. But Wechsler also thought that mental retardation could also result from "accidents" occurring prenatally, perinatally, or postnatally: these cases would add to the numbers of low-IQ individuals who would be expected to occur just by chance.

The shape of this distribution is exactly what Galton would have predicted: he thought that mental characteristics should be normally distributed in the population. But note that the bell-shaped distribution of IQ is to some extent an artifact of the way IQ was computed from the mean and standard deviation. An example from the sociologist Claude Fischer and his colleagues, based on the AFQT, illustrates the point. The shaded portion of this figure shows the distribution of raw scores obtained when the AFQT was administered to a sample of Americans in 1980 as part of the National Longitudinal Study of Youth. the scores are widely distributed, to be sure, but the distribution doesn't look very much like a bell. But when the scores are transformed into z scores (this time, with a mean of 50), the "bell curve" pops out. So the bell-shaped distribution of IQ is something of a statistical artifact.

Now, there's nothing wrong with this sort of procedure, it's done all the time to simplify statistical analysis (technically, it converts IQ from an ordinal scale to an interval scale). There are lots of other mental tests that employ the same sort of procedure:

The WAIS, the WISC, and now the Stanford-Binet itself all use deviation scores, with M = 100 and SD = 15.
The SAT, GRE, and GMAT exams produced by the Educational Testing Service to aid the admissions process to college and graduate school, are standardized with M = 500 and SD = 100.
The LSAT, also produced by the ETS, is standardized with M = 150 and SD = 10.
And, of course, there is the infamous "forced curve" employed in some educational institutions (it is especially popular in business and law schools), where the average grade is set to something like a C (or C+, or B-) by fiat, so that some students must fail, and only relatively few students can get As -- no matter how high their raw scores are.

Now, as I say, there's nothing wrong with this kind of maneuver, it's perfectly appropriate, so long as everybody is clear about what's going on some students in this course will be familiar with the forced curve sometimes used by college instructors so that the average course grade is set at something like a C or maybe C+ and then the other grades are arrayed around that so that most people will get some kind of C; some fewer people will get Bs; even fewer people will get As. In such a system, even if everybody did really well on a test, they wouldn't all get As. Some would have to get Cs because they were only at the mean of the distribution.

By the way we don't do that in this course. Here there is an absolute standard for an "A". If you achieve 90% of the available points, you are going to get some kind of "A", that’s a guarantee, and if everybody in the class gets some kind of "A" that's fine with us. Other professors take different positions and the forced curve is very popular in the natural sciences and mathematics. It's also very popular in law school and business school so watch out when you get there.

Again, there may be good reasons for these statistical decisions (though, frankly, it's hard to think of a cogent rationale for the "forced curve"). But not too much should be made of the shape of the resulting distribution of IQ itself --because it's a mathematical artifact.

The Properties of Psychometric Tests

Intelligence tests, like all measuring instruments must have certain psychometric properties.

Standardization -- rules for administering and scoring the test that guarantee that each individual responds to the same situation.
Norms -- some sense of the average test score in the population, as well as the variation observed around that mean (usually expressed as the standard deviation), to serve as a guide for interpreting individual test scores.
Reliability -- some degree of precision in measurement, expressed either by

Inter-rater reliability (agreement between two observers rating the same person) or
Test-retest reliability (agreement between measurements taken on two different occasions).

Validity -- some sense that the test accurately measures the trait it is supposed to measure, expressed as the ability of the test score to predict some external criterion of the trait. For example, intelligence tests might be validated against a criterion of educational outcome, such as grades completed or grade-point average.

In addition, one psychometric property is highly desirable, even if it is not strictly necessary:

Utility -- or some sense that the test provides an economic advantage over alternative measures of the same trait, expressed as the cost/benefits ratio (where cost refers to the expense of constructing, administering, and scoring the test, and benefits refers to the validity of the test in question). The most efficient tests have high validity but low cost of development and administration.

The Structure of Intelligence

Intelligence has been notoriously hard to define -- leading some theorists to fall back on a kind of operational definition -- that "intelligence is what intelligence tests measure. But other theorists have not been so complacent, and have offered detailed theories about what intelligence is.

Classic Theories

The use of a single IQ score as a measure of intelligence implies that intelligence itself is a single, unitary entity. In fact, Charles Spearman (1904), a British psychologist who worked with Galton and Pearson (he, too, devised a correlation coefficient, formally known as "Spearman's rank-order correlation coefficient", and abbreviated with the Greek letter , or "rho") noticed that the intercorrelations among various tests and subtests were all positive -- that is, high scores on one test were associated with high scores on the other tests. On this basis, he argued that intelligence was, indeed, a single, unitary, generalized intellectual ability, which he called g (for "general intelligence"). Why then weren't the intercorrelations absolutely perfect? Because there were also specific factors, designated s, that affected performance on individual tests. You can be pretty smart, but if you're clumsy with your hands, you're not doing to do well on the "Block Design" or "Object Assembly" subscales of the WAIS. For Spearman, performance on any particular test of intelligence was determined by the combination of general intelligence,g, plus some test-specific factor, s.

A contrary position was taken by E.L. Thurstone (1941), an American psychometrician. Thurstone employed a relatively new technique, "group" or "multiple" factor analysis (which had not been available to Spearman) to examine the pattern of intercorrelations more closely. His factor analysis revealed clusters of tests in which the correlations among the tests within a cluster were substantially higher than the correlations of tests across different clusters. These clusters of tests defined factors. Thurstone identified seven such clusters, which he called Primary Mental Abilities:

Number
Word Fluency
Verbal Meaning
Memory
Reasoning
Space
Perceptual Speed.

Thurstone did not deny the existence of general intelligence. After all, the various tests were all positively intercorrelated with each other, meaning that the entire set of tests was saturated with single "general factor". But for Thurstone,g was a relatively weak superordinate factor, and the real action was with the seven "group factors" representing the primary mental abilities.

As a sidelight, there is an interesting lesson in the scientific method here. Spearman believed that intelligence was a single, unitary ability -- "One Big Thing", as it were. He could see the positive correlations among all the different tests of mental abilities with his naked eye -- what, in the lectures on "Statistics and Methods in Psychology", I called the "traumatic interocular test". But he wanted proof that what he thought he saw was really true, and so he invented the first technique for factor analysis to show that a common factor ran through all his tests (and he also coined the term "factor" in this statistical context). But the method of "common" factor analysis which Spearman invented was expressly designed only to document this single common factor. It was not intended to pick up on group factors such as those later identified by Thurstone. For that you had to have a different method of factor analysis, known as "group" or "multiple" factor analysis. Ironically, this newer method was originally developed by a student of Spearman's, Nancy Carey, in her doctoral dissertation published in 1916. Thurstone improved on Carey's method. But the interesting lesson in science is this: Spearman believed that intelligence was a single unitary ability, and he invented a method to prove it; but his method didn't really enable him to test the alternative hypothesis, that there may be different kinds of intelligence. Scientists aren't immune to finding what they're looking for.

An even more radical position was adopted by J.P. Guilford (1967), another American psychologist. According to this Structure of Intellect theory, intelligence is composed of a very large number of discrete abilities, which can be represented by a cube with the following dimensions:

Content, or the kinds of information:

Featural, having to do with physical features perceived through the sensory apparatus;
Symbolic, information represented by symbols that have no intrinsic meaning, such as letters and numerals;
Semantic, such a words or sentences;
Behavioral, or individual actions.

Here, Guilford laid the foundation for ideas about social intelligence, of which we will have more to say anon.

Operations which can be performed on these contents:

Evaluation, or judging whether information is accurate or valid;
Convergent production, following rules to arrive at the single solution to a (well-formed) problem;
Divergent production, or the ability to arrive at multiple solutions to an (ill-formed) problem;

This was probably Guilford's most interesting idea -- that intelligence should not necessarily be measured in terms of the subject's ability to get the one correct answer; as such, it set the stage for modern research on creativity.

Memory, or the ability to encode and retrieve information;
Cognition, or the ability to comprehend available information.

Products resulting from various operations performed on various contents:

Units, or single items of new knowledge;
Classes, or categories of units sharing features in common;
Relations, linking units into opposites, associations, sequences, etc.;
Systems, or complex networks of relations;
Transformations, or changes in knowledge from one form to another;
Implications, predictions or inferences based on information.

With 4 kinds of contents, 5 different operations, and 6 specific products, Guilford's original theory predicted 120 different intellective factors (4 x 5 x 6 = 120). Before his death in 1987, Guilford's research had identified tests for 98 of these 120 factors postulated by the theory. However, the testing process has also revealed new classes of intellectual abilities not predicted by the original theory, so that more than 20 new cells were added, and the revised Structure of Intellect predicted a total of 180 different kinds of intelligence.

That's a lot of different abilities. To make things even more interesting, Guilford argued that each of these abilities was independent of, uncorrelated with, each of the others. Thus, Guilford denied that there was anything lie Spearman's g -- even as a weak Thurstonian superordinate factor.

On the other hand, Raymond B. Cattell, another American psychologist, essentially embraced Spearman's concept of g. Still, he thought that there were two types of general intelligence:

Fluid intelligence, abbreviated Gf, was defined as a general ability to perceive relationships. Cattell thought that Gf had an innate neurological basis in brain structure (the more neural connections, the better you can see relationships -- that sort of thing).
Crystallized intelligence, abbreviated Gc, was a product of the environment, especially educational experience.

Cattell's terms were based on a deliberate chemical analogy: fluids have no shape, while crystalline structures are formed from fluids or from materials dissolved in fluids. Gf is raw intellectual ability; Gc is what results when Gc is given shape by education and other experiences.

Cattell believed that traditional intelligence tests assessed Gc -- after all, things like arithmetic ability and vocabulary are a product of education and experience. Accordingly, he called for the development of "culture fair" tests which would assess intelligence in a manner that was not contaminated by education and other cultural differences. One such test is Raven's Progressive Matrices, in which the subject is shown a series of objects, and then must complete a new series according to the same pattern. Here, intelligence is explicitly defined as the ability to perceive relationships, the assessment is essentially nonverbal, and the stimulus materials are completely culture-free.

One early attempt to develop a "culture-fair" intelligence test was at Ellis Island, the main gateway, from 1892-1954, for European immigration to the United States (the other major center was on Angel Island, for Asian immigrants, was in San Francisco Bay off Tiburon; both are well worth a visit). On Ellis Island, potential immigrants were screened for health by the US Public Health Service (and also for financial resources and radical political views). Part of the health screening was a test of intelligence, for the purpose of "sorting out of those immigrants who may, because of their mental make-up, become a burden to the State of who may produce offspring that will require care in prisons, asylums, or other institutions". Those are the words of Howard A. Knox, a USPHS physician assigned to Ellis Island,who recognized that the standard IQ tests of his time, the Stanford-Binet and WAIS, tapped cultural and linguistic knowledge that new immigrants might not possess. Accordingly, he developed the Feature Profile Test, basically a jigsaw puzzle which subjects had to assemble to form a profile of a man with all the features -- eyes, ears, nose -- in their proper places. Knox figured that even uneducated people, unfamiliar with the English language and American culture, ought to be able to assemble such a puzzle in 5 minutes. In 1913-1914, about one million immigrants were tested; about 1% failed, and were returned to their home countries. For details, see Howard Andrew Knox: Pioneer of Intelligence Testing at Ellis Island, by John T.E. Richardson (2011).

Perhaps the apotheosis of "culture fair" intelligence tests is one that could be given to nonhuman animals as well as to nonverbal humans. Just such a universal intelligence test has been proposed by Jose Hernandez-Orallo, a Spanish psychologist, and David Dowe, a psychologist working in Australia, based on a complicated concept in information theory called the Kolmogorov complexity. The general idea is to use instrumental conditioning techniques to engage the subject (human or animal) in some kind of puzzle-solving activity -- playing a game like tic-tac-toe, for example, in order to obtain a treat. Then, the Kolmogorov complexity would be calculated for a wide variety of such puzzles. Sticking with the tic-tac-toe example, an animal that could play the game on a 3x3 matrix would be classified as more intelligent that one that could play only on a 1x3 matrix; an animal that could play on an 8x8 checkerboard would be classified as more intelligent than one that could only play on the standard 3x3 matrix, and so on.

Modern Approaches

Spearman, Thurstone, Guilford, Cattell, and other early investigators offered theories of the structure of intelligence based on the correlation coefficient and factor analysis. Other theorists, who arrived on the scene in the wake of the cognitive revolution in psychology, have offered alternative views of the structure of intelligence informed not just by factor analysis, but also by experimental evidence from the laboratory or neuropsychological research.

One important example is the theory of multiple intelligences proposed by Gardner (1983, 1993). Unlike Spearman (1927), and other advocates of general intelligence (e.g., Jensen, 1998), Gardner has proposed that intelligence is not a unitary cognitive ability, but that there are seven (and perhaps more) quite different kinds of intelligence, each hypothetically dissociable from the others, and each hypothetically associated with a different brain system.

Some of these proposed intelligences (linguistic,logical-mathematical, and spatial intelligence) are "cognitive" abilities somewhat reminiscent of Thurstone's primary mental abilities.
Others, such as musical and bodily-kinesthetic intelligence are derived from Gardner's longstanding interest in the psychology of art (not to mention athletics).
Two other forms of intelligence are explicitly personal and social in nature. Gardner defines intrapersonal intelligence as the person's ability to gain access to his or her own internal emotional life, and interpersonal intelligence as the individual's ability to notice and make distinctions among other individuals.

Although Gardner's (1983) multiple intelligences are individual-differences constructs, in which some people, or some groups, are assumed to have more of these abilities than others, Gardner does not rely on the traditional psychometric procedures -- scale construction, factor analysis, multitrait-multimethod matrices, external validity coefficients, etc. -- for documenting individual differences. Rather, his preferred method is a somewhat impressionistic analysis based on a convergence of signs provided by eight different lines of evidence.

Chief among these signs are isolation by brain damage, such that one form of intelligence can be selectively impaired, leaving other forms relatively unimpaired; and exceptional cases, individuals who possess extraordinary levels of ability in one domain, against a background of normal or even impaired abilities in other domains. Alternatively, a person may show extraordinarily low levels of ability in one domain, against a background of normal or exceptionally high levels of ability in others).
- So, for example, Gardner (1983) argues from neurological case studies that damage to the prefrontal lobes of the cerebral cortex can selectively impair personal and social intelligence, leaving other abilities intact. The classic case of Phineas Gage may serve as an example (Macmillan, 1986). On the other hand, Luria's (1972) case of Zazetsky, "the man with a shattered world", sustained damage in the occipital and parietal lobes which severely impaired most of his intellectual capacities, but left his personal and social abilities relatively intact.
- Gardner also notes that while both Down syndrome and Alzheimer's disease have severe cognitive consequences but little impact on the person's ability to get along with other people, Pick's disease spares at least some cognitive abilities while severely impairing the person's ability to interact with others.
In addition, Gardner postulates several other signs suggesting different types of intelligence. Among these are identifiable core operations, coupled with experimental tasks which permit analysis of these core operations; and, of course,psychometric tests which reveal individual differences in ability to perform them.

While Gardner's theory of multiple intelligences is largely informed by neuropsychological case material and the study of savants, Sternberg's Triarchic theory of Intelligence is based on experimental cognitive psychology, especially research on reasoning itself. As its name implies, the Triarchic Theory proposes that the concept of "intelligence" subsumes three quite different facets:

Analytical intelligence comes closest to how intelligence is defined in the psychometric tradition, in terms of academic problem-solving -- though even here the ideas are more influenced by Sternberg's own experimental research than by methods such as factor analysis.

The metacomponents of analytical intelligence monitor and control information processing.
The performance components include the basic operations involved in perception, working memory, comparison, knowledge retrieval, and the like.
The knowledge acquisition components guide the learning process.

Creative intelligence takes a page from Guilford's book, and involves such things as insight learning and novel responses to problems.

Novelty skills enable the individual to deal with new situations.
Automatization skills enable the individual to automatize information-processing, so that it consumes less cognitive capacity.

Practical intelligence moves Sternberg's theory furthest away from the traditional psychometric approach, shifting from academic problems to the sorts of problems encountered in the ordinary course of everyday living. Practical intelligence includes the skills that enable a person to adapt to the environment,change the environment, or - -if neither of these strategies works -- to select a new environment.

We usually think of intelligence as a cognitive ability having to do with the person's ability to perceive and remember and reason and solve problems. But as long ago as 1920, E.L. Thorndike -- the same person who studied cats escaping from puzzle boxes, and who also worked on the Army Alpha and Beta tests -- defined social intelligence as "the ability to understand and manage men and women, boys and girls -- to act wisely in human relations" (1920, p. 228). His hypothesis was that social intelligence was distinct from academic intelligence -- an idea that remains controversial to this day.

Interest in social intelligence was spurred by Guilford's "structure of Intellect" theory: the 30+ types of "behavioral" intelligence are all social in nature (for a historical review see Kihlstrom & Cantor, 2020).

Similarly, Peter Salovey and John Mayer have argued for a concept of emotional intelligence, distinct from cognitive intelligence. Emotional intelligence has to do with individual differences in abilities related to the emotional domain of living. Salovey and Mayer defined emotional intelligence as the ability to monitor one's own feelings and those of others; to discriminate among these feelings; and to use information about feelings to guide one's thoughts and action. Again, although the idea is attractive, it's still not entirely clear that emotional intelligence is anything more than a specific application of general analytical intelligence.

Salovey and Mayer (1990; Salovey & Grewal, 2005) have argued for individual differences in emotional intelligence: "the ability to monitor one's own and others' feelings, to discriminate among them, and to use this information to guide one's thinking and action" (p. 189). Emotional intelligence subsumes four component abilities:

The ability to perceive emotions as expressed in people's faces, voices, gestures, and the like, as well as the ability to perceive and identify one's own emotions. If you can't differentiate between fear and anger, for example, you're going to get in trouble!
The ability to use emotions in the service of thinking and problem solving. Positive moods generally facilitate cognition, so it's best (1) not to study when you're in a foul mood; and (2) wait until your mood brightens a little.
The ability to understand emotions, especially how emotions are related -- for example, how fear can turn into anger, and how momentary joys and sorrows can fade.
The ability to manage emotions, in terms of both self-regulation and the management of emotions in others. If you're in a foul mood, how do you get yourself out of it? If someone else is angry, how do you calm them down.

The concept of emotional intelligence was popularized by Daniel Goleman, a psychologist turned popular-science writer, in his best-selling book, Emotional Intelligence (1995). Salovey and Grewal (2005) summarize the state of scientific research on the topic.

Goleman has also popularized the notion of social intelligence in another best-selling book (2006).

Goleman himself has been greatly influenced by developments in the new field of social neuroscience (2006). As he imagines it, the social brain is not a discrete clump of tissue, like MacLean's (1970) 'reptilian brain', or even a proximate cluster of structures, like the limbic system.Rather, the social brain is an extensive network of neural modules, each dedicated to a particular aspect of social interaction, and grouped under two major headings:

Social awareness includes the ability to perceive other people's internal mental states, to understand their feelings and thoughts, and to comprehend the demands of complex social situations.

Primal empathy is the ability to pick up on, and share, the feelings of others.
Empathic accuracy is the ability to understand another person's thoughts, feelings, and desires.
Attunement is the ability to pay attention to a person -- to really listen to what s/he is saying.
Social cognition refers to one's knowledge of the social world, how it's structured and how it works.

Social facility, or relationship management, "builds on social awareness to allow smooth, effective interactions" (2006, p. 84).

Interaction synchrony is the ability to coordinate social behaviors with those of others, especially at the nonverbal level.
Self-presentation is the ability to convey our own thoughts, feelings, and desires to other people accurately and efficiently.
Influence is the ability to control the outcomes of social interactions.
Concern for others, simply, is the ability to care for other people, in thought and deed.

Thorndike, Guilford, and Goleman all have a psychometric conception of social intelligence in mind -- that social intelligence is a dimension of individual differences; and that these individual differences can be measured by standardized tests, the same way we measure any other mental ability. As with emotional intelligence, there has been considerable debate as to whether social intelligence is really a distinct form of intelligence -- or whether it is just general intelligence deployed in the cognitive domain.

An alternative conception of social intelligence has been proposed by --ahem -- Cantor and Kihlstrom (1987; see also Kihlstrom & Cantor, 2000, 2010). As opposed to the psychometric ability view, they have proposed a knowledge view of social intelligence. From their point of view, social behavior is cognitively mediated, and individual differences in social behavior reflect individual differences in the declarative and procedural social knowledge that the person brings to bear on any particular social interaction. The important thing, then is not how much the person knows about the social world, but rather what intelligence -- what knowledge -- about the social world the individual possesses.

Where Does Intelligence Come From?

If the most controversial question in intelligence is whether it is adequately measured by IQ tests, the second-most controversial question concerns the role of nature versus nurture -- specifically, whether individual differences in intelligence are inherited. This question goes back at least as far as Sir Francis Galton, a cousin of Charles Darwin, who argued strongly for a strong heritable component.

We'll discuss the nature-nurture question in more detail later, in the lectures on Psychological Development. The bottom line there will be that nature versus nurture is a false issue, and that development proceeds through the interaction of genetic and environmental factors. For now, though, let me just make a few points briefly.

Despite the controversies concerning IQ as a measurement of intelligence, almost all research on the origins of intelligence employs IQ tests: they're simply the best measures we have for this purpose. The best of these studies employs the culture-fair Raven Progressive Matrices test discussed earlier.
There is no question that there is a substantial genetic contribution to individual differences in IQ.

Monozygotic (MZ) twins, who are genetically identical, are more alike on IQ than dizygotic (DZ) twins, who share no more genes than any other pair of siblings.
Adopted children are more similar in IQ to their biological parents (and siblings) than to their adoptive parents (and siblings).

But there is also no question but that the environment also makes a substantial contribution to individual differences in IQ.

The similarities between MZ twins, and between adopted children and their biological parents, are not so strong as to preclude any environmental influence.
Heritability is higher for children who are socially and educationally advantaged, compared to children who are disadvantaged in terms of their family's socioeconomic and educational status (SES). Apparently, children born into poverty do not always get to achieve what might be called their "full genetic potential", as encoded in their genomes.
Adopted children from low-SES families, who are placed in higher-SES families, typically show substantial gains in IQ over those who are placed in lower-SES families.
Schooling itself increases IQ.

Children who are deprived of formal schooling for an extended period of time show lower IQs than those who enter at the usual time.
Enriched pre-kindergarten programs, such as Head Start, targeting low-SES children, produce substantial increases in IQ. It's true that these gains tend to be lost in later years of schooling, so that children who got Head Start end up at about the same place, IQ-wise, as those who did not. But this is probably because Head Start children often move from Head Start programs into relatively poor schools.

We know that, when it comes to cognitive skills, there is what Kimberly Noble, at Teachers College, Columbia University, has called a wealth effect -- test scores are correlated with socioeconomic status (Noble et al., Developmental Science, 2007). She and her colleagues have also found that parental education and family income were correlated with individual differences in children's brain size, as measured by cortical surface area and cortical thickness (Nature Neuroscience, 2015). True, these findings are only correlational, and can't prove the direction of causality, but Noble's studies have ruled out the obvious confounding variables. They are, at least, consistent with the hypothesis that sociocultural adversity can have an effect on brain maturation (see "Brain Trust" by Kimberly G. Noble, Scientific American, 03/2017).

So, the best way to think about the origins of intelligence is that there is probably some predisposition to low or high intelligence carried on the genes. The particular genes that contribute to IQ have not been identified. One such gene, known as IGF2R, looked promising, but even that gene accounted for only about 2% of variance in IQ. Anyway, the study that uncovered this apparent link failed to replicate.

In 2017, however, Danielle Posthuma, a Dutch geneticist, working with many colleagues, performed a meta-analysis of a number of genome-wide association studies which sought to identify specific genes associated with high (or low) intelligence (Sniekers et al., Nature Genetics, 2017). With a database of 13 studies involving more than 78,000 subjects, they identified 52 such genes -- interestingly, IGF2R was not on the list. However, taken together these genes accounted for less than 5% of population variance in IQ. Thus, (1) even more, as-yet-unidentified genes, are involved in the genetic contribution to intelligence; (2) the studies including in this meta-analysis involved only subjects of European descent, leaving open the possibility that studies of other populations might yield different results; (3) it's completely unclear how any of these genes affect intelligence; and (4), as the authors themselves are at pains to point out, the environmental contributions to intelligence are also significant and substantial.

In any event, whatever genetic predisposition exists is apparently not decisive. Environmental enrichment can enhance IQ, while environmental deprivation can reduce it. In particular, schooling itself enhances intelligence -- which is one reason you're here reading this.

Are We Getting Smarter?

Although there has been increasing interest in various "types" of intelligence, the fact of the matter is that most intelligence research, even now, employs traditional IQ tests. Perhaps one of the most surprising results of this research is that, despite pervasive individual differences in IQ scores, on the whole people are getting smarter. This is known as the Flynn Effect, after the person who discovered it: John Flynn, a professor of political science at the University of Otago, in New Zealand (Flynn, 1984, 1987, 1999; though actually, as Flynn is the first to acknowledge, the trend was actually first noticed by another psychologist, Reed Tuddenham).

Recall from the earlier discussion of the "bell curve" that, by convention, the distribution of IQ test scores is typically standardized to a mean of 100. When new revisions of the tests have been published, new norms are also collected, and these, too, are standardized to a mean of 100. So, by definition, the average IQ in the population remains constant at 100. But just as normalization obscures the true distribution of IQ scores in the population, so re-norming these tests obscures any changes in actual test performance that might have occurred over time. Think about inflation in economics: a $1 bill is a $1bill, but $1 in 2009 buys what $0.11 would have bought in 1948. Put another way, $1 in 1948 would have the buying power of about $8.96 in 2009. (It's not an accident that Flynn is not a psychologist, but rather a political economist.)

To see if there was any "inflation" -- or "deflation" in IQ scores, Flynn examined normative studies in which subjects completed both the old and the new form of a test such as the Stanford-Binet or the WAIS. Then he compared the performance of the new normative group against the old norms. Almost without exception, the result was that the new normative group scored higher, on the old test, than the original normative group had done -- even though the groups were selected according to the same criteria. For example, between 1932 and 1978, white American subjects gained almost 14 IQ points -- a rise of almost a whole standard deviation, at a fairly steady rate of approximately 0.3 IQ points per year. This population increase in IQ test scores was obscured by the practice of setting each new test to an artificial mean of 100.

The same point can be made by looking at changes in raw scores on a test such as Raven's Progressive Matrices, which are not normalized in the same manner as the Stanford-Binet and WAIS. In the Netherlands, the Dutch military routinely examines all 18-year-old men, and the examination includes a relatively difficult version of Raven's Progressive Matrices. The proportion of recruits who got more then 24 of the 40 items correct increased from 31.2% in 1952 to 82.2% in 1982 -- which translates into an IQ gain of more than 21 points!

If "intelligence is what intelligence tests measure", then the Flynn Effect shows that we are, indeed, getting smarter. Similar gains have been observed wherever relevant data has been collected: in Belgium, France, Norway, New Zealand, Canada, Germany, England, Australia, Japan, Switzerland, and Austria, as well as the United States and the Netherlands.

The source of the effect is also obviously environmental. There has been no Galtonian eugenics program in any of these societies. It's been suggested that improved nutrition is responsible for the increase, at least in part: after all, people have also gotten significantly taller over the same period. Generational increases in education and socioeconomic status have also been implicated. It's also possible that the increase is due simply --simply -- to modernization itself: recent generations have been raised in an environment that is more complex, more stimulating, more informationally rich, more intellectually demanding than was the case even a couple of generations ago.

So, the modern world, itself a product of human intelligence, appears to be raising the intelligence of the humans who made it. I wonder how long this can go on....

More on Intelligence

For more on what is known about intelligence -- its nature, measurement, determinants, and consequences -- see:

Neisser, U., Boodoo, G., Bouchard, T. J., Boykin, A. W., Brody, N., Ceci, S. J., et al. (1996). Intelligence: Knowns and unknowns.American Psychologist, 51, 77-101. A critical response to The Bell Curve, by Richard Herrnstein and Charles Murray (1994), which argued that IQ tests were accurate measures of intelligence; that IQ was a strong predictor of success in life; and that IQ was highly heritable, and little influenced by sociocultural factors (wrong, wrong, wrong and wrong).
Nisbett, R.E. (2009). Intelligence and How to Get It
Nisbett, R. E., Aronson, J. A., Blair, C., Dickens, W., Flynn, J. P., Halpern, D. F., et al. (2012). Intelligence: New findings and theoretical developments. American Psychologist, 67(2), 130-159. A 15-year followup to the Neisser report, confirming and expanding on its findings and conclusions.

For more on the Flynn Effect, see Are We Getting Smarter? Rising IQ in the 21st Century by James R. Flynn (2012); also Flynn's What is Intelligence? Beyond the Flynn Effect (2009)..

Wisdom

There's intelligence, and then there's wisdom, involving "ego integrity and maturity, judgment and interpersonal skills, and an exceptional understanding of life (Sternberg, 1990). Thus, wisdom has an affective component as well as cognitive "smarts". It's more like a feature of personality than a simple collection of abilities. Wise people know a lot -- but there's much more to it. As discussed later in the lectures on Psychological Development, Erik Erikson postulated that wisdom came to use toward the end of life -- if we're lucky.

Baltes and Staudlinger (2000) identified seven components of wisdom:

a truly superior level of knowledge, judgment, and advice;
addresses important and difficult questions and strategies about the conduct and meaning of life;
includes knowledge about the limits of knowledge and the uncertainties of the world;
constitutes knowledge with extraordinary scope, depth, measure, and balance;
involves a perfect synergy of mind and character... of knowledge and virtue;
knowledge is used for the good or well-being of oneself and others;
though difficult to define, it is easily recognized.

Although difficult to define, Baltes and others have developed instruments to measure wisdom.

Creativity

Psychology has made a lot of progress in studying reasoning, decision-making, and problem-solving, but it hasn't made much headway on one of the most interesting aspects of thinking, which is creativity. A major reason for this deficit is that it has proved difficult to get consensus on how creativity might be measured. Without such an operationalization, it's hard to assess who is creative and who isn't, what conditions give rise to creative thinking, and how creative ideas are generated.

A number of investigators have worked hard at this, especially J.P. Guilford (in his studies of the divergent thinking components of intelligence) and E.P. Torrance (who built on Guilford's work to developed the multidimensional Torrance Tests of Creative Thinking (CCIT; 1966).

One popular measure of creativity is Guilford's (1967) Alternative Uses Test -- as in,How many uses can you think of for a brick? The idea is that the more plausible,useful uses one can generate, the more creative one is.
Another is the Remote Associates Test (Mednick & Mednick, 1967). Subjects are given three words, such as stick, light, and birthday, and asked to generate a word that all three stimuli have in common. This is a fairly easy one; others are much more tricky, such as skunk, kings, and boiled. Subjects who respond correctly, especially those who do so quickly, are thought to be more creative than those who don't.

You get the idea, but you can also see how difficult it is to get a single measure that everyone can agree on. The CCIT tries to cope with this problem by providing a whole battery of tests, much like the Stanford-Binet or Wechsler IQ tests, which yield separate scores for verbal and figural creativity.

Two things we're pretty sure of: creativity is somehow different from intelligence; and creativity can be displayed in lots of different domains, not just the arts.

The Myth of the Mad Genius

One of the great images coming out of the Romantic Era is that of the mad genius. This is the idea that creativity and mental illness are somehow related. Think Beethoven. Think Van Gogh. Think Sylvia Plath. The Remote Associates Test was originally formulated as a test of schizophrenia as well as of creativity -- the idea being that both creative people and schizophrenics have "loose" associations -- it's just that creative people make better use of them. It's a very appealing idea -- unless, I suppose, you think you're a creative person. And for some time, some psychologists have promoted the idea that there is, in fact, an empirical link between creativity and mental illness -- either that mental illness creates the conditions for creativity or that creative people are at increased risk for mental illness.

In modern times, the link between creativity and mental illness was popularized by Nancy Andreasen, whose 1987 paper claimed that 80% of a sample of creative writers whom she interviewed reported some experience of mental illness, chiefly depression or bipolar disorder, compared to 30% of a control group of non-writers. She also reported a higher degree of mental illness in the first-degree relatives (parents and children) of the writers. The link between creativity and bipolar disorder was also uncovered by other researchers, such as Kay Redfield Jameson (e.g., Touched with Fire: Manic-Depressive Illness and the Artistic Temperament, 1993).

However, many of these studies have serious flaws. In particular, Andreasen studied a very small and unrepresentative sample of creative writers: 30 faculty members at the Iowa Writer's Workshop (Andreasen is on the faculty at Iowa). The Writer's Workshop is one of the most important training grounds in the world for creative writers, but it's only 30 people, and they're all writers. Moreover, when Andreasen conducted her interviews, she was not blind to her subject's status: knowing that an interviewee was a writer, she may have been biased toward interpreting their reports as evidence of mental illness (or the reports of her control subjects as evidence of mental health). Moreover, her sample of writers, perhaps sharing the Romantic view of the mad genius, may have exaggerated their own histories of mental illness -- or just been more likely to disclose it. Depression is, after all the "common cold" of psychiatry -- almost everybody has it at one time or another. To make things worse, Andreasen failed to keep records of her interviews!

Later attempts to confirm her findings with more objective methods have largely failed. Never mind that we don't really possess good psychometric measures of creativity!

There may be some link between creativity and mental illness, particularly affective disorder, but it appears to be a lot weaker than Andreasen suggested.

The scientific literature on creativity has been thoroughly reviewed by Jonah Lehrer, a popular-science writer who specializes in psychology, in Imagine: How Creativity Works (2012) -- a sort of companion to his earlier volume on How We Decide, a popular guide to research on judgment, decision-making, and problem-solving.

Lehrer begins with David Hume, an 18th-century British philosopher, who argued that creativity often involved recombination -- that is, putting two unrelated ideas together, or translating an idea from one domain to another. So, for example, Johannes Gutenberg based his idea of the printing press on the wine press, and the Wright brothers took what they learned in their bicycle shop to make the first airplane.
For that reason, Lehrer argues, the people who are most likely to produce creative ideas -- as it were, to "think outside the box" -- are those who are already working at the boundaries of their field with some other field. This allowed them to view the problems inside a field as outsiders, and also draw on relevant knowledge from other fields.
Lehrer also points out that social interaction plays an important role in creative thinking -- combining individuals facilitates the creative recombination of individuals' ideas (UCB's Prof. Charlan Nemeth has done a lot of research in this area).
For this reason, cities are wellsprings of creativity, because they force people to interact with each other. Firms that facilitate interaction produce more creative ideas than those that don't. And in the future, social networking via the Internet should help all of us be more creative, in "a virtual world that brings us together for real".

This page last revised 06/30/2025.

A. Tversky & D. Kahneman, "Judgment Under uncertainty: Heuristics and biases" (Science, 1974,185, 1124-1131)	D. Kahneman & A. Tversky, "Choices, Values, and Frames" (American Psychologist, 1984,39, 341-350).
Not to mention the "Cognitive Bias Video Song", composed by a high-school Advanced Placement psychology teacher for his students.

		Cancer
		No	Yes
Test Result	Positive	99	8	107
Test Result	Negative	891	2	893
		990	10	1000