Social Perception

For background on sensation and perception, see the General Psychology Lecture Supplements on Sensation and Perception.

Social cognition is the study of the acquisition, representation, and use of social knowledge -- in general terms, it is the study of social intelligence.

A comprehensive theory of social cognition must contain several elements (Hastie & Carlston, 1980; Kihlstrom & Hastie, 1987):

On the perceptual side:

a vocabulary to describe the (social) stimulus environment;
a description of the perceptual processes by which the person forms a mental representation of the distal social stimulus.

With respect to memory:

a characterization of the encoding operations by which stimulus information is processed (these may overlap with the perceptual processes);
a description of the mental representations stored in memory as a result of these encoding operations;
a characterization of the retrieval operations by which stored knowledge enters into ongoing cognitive processes; and

on the action side:

a description of the "covert" cognitive processes involved in reasoning, problem-solving, judgment, and inference;
a vocabulary to describe the person's response output of "overt" social behavior (one person's overt behavior, of course, is another person's social stimulus).

The first two of these facets have to do with perception and attention -- the processes by which knowledge is acquired.

The Origins of (Social) Knowledge

Or are they? Actually, a prior question is: Where does social knowledge come from?

When asked about knowledge in general, psychology and cognitive science offer two broad answers -- and a third, correct answer.

According to the nativist approach, knowledge is innate, given a priori. Descartes thought that knowledge came from God, but modern nativists argue that (some) knowledge is part of our evolutionary, genetic heritage.
According to the contrasting empiricist approach, our knowledge comes to us a posteriori from the senses -- either through our sensory experiences, or through our reflections on our sensory experiences. This was the view of knowledge promoted by the British empiricists, such as John Locke and David Hume.
A third position was espoused by Immanuel Kant, and has come to be known as the Kantian synthesis. Kant agreed that most knowledge is acquired through sensory experience, but he also argued that our sensory experiences are themselves structured by innate schemata (he called this kind of knowledge synthetic a priori) representing such concepts as causality.

To take an example from modern psychology: classical conditioning occurs when the organism learns to predict the occurrence of the US based on the prior occurrence of the CS. But you can't acquire that knowledge from experience unless you already have some knowledge of what "prior" means -- that is, of temporal relations in general.

Historically, psychology has come down on the side of empiricism. There are exceptions, of course: some aspects of our linguistic knowledge seem to be innate (Chomsky called his approach to language "Cartesian linguistics"). But in general, psychologists hold the view that we learn what we know, acquiring knowledge through experience. For that reason, historically, scientific psychology began with an analysis of elementary processes of sensation and perception.

And so it is with social cognition: it, too, begins with social perception:

How we derive our knowledge of ourselves and others from our experience of ourselves and others; and
How we form internal, mental representations of the social world.

We can begin by distinguishing between sensation and perception.

Sensation has to do with the detection of stimuli in the environment -- in the world outside the mind, including the world beneath the skin. To make a long story short, sensory processes:

transduce a pattern of
proximal stimulation,
radiating from a distal stimulus,
into a pattern of neural impulses
that is transmitted from the sensory receptors (or sensory surfaces) in the periphery of the body
over afferent nerves comprising the various sensory tracts
to the sensory projection areas of the brain.

For example, the rods and cones in the retina of the eye convert light waves emitted or reflected by an object into neural impulses that flow over the optic nerve to the occipital lobe.

Perception gives us knowledge of the sources of our sensations -- of the objects in the environment, and of the states of our bodies: What objects are in the world outside the mind, where they are located, what they are doing, and what we can do with them. Put another way, perception assigns meaning to sensory events.

In many respects, sensation is not an intelligent act.

Every organism has a sensory apparatus that serves as the basis for reflex action.
Sensory processes do not discriminate between a beam of light emanating from a candle and a beam of light emanating from an oncoming train.

By contrast, perception is the quintessential act of the intelligent mind. Perception goes beyond the mere pickup of sensory information, and involves the creation of a mental representation of the object or event that gives rise to sensory experience. In order to form these mental representations, the perceiver (in the lovely phrase of Jerome Bruner, a pioneering cognitive psychologist) "goes beyond the information given" by the stimulus, combining information extracted from the current stimulus with pre-existing knowledge stored in memory, employing processes of judgment and inference.

In cognitive psychology, there are basically two views of perception.

The classical (or constructivist) view of perception, associated with Hermann von Helmholtz in the 19th century and with Jerome Bruner and others (Julian Hochberg, Richard Gregory, and Irvin Rock) in the 20th century, argues that the proximal stimulus is inherently ambiguous, and does not all by itself provide enough information to permit the perceiver to form a mental representation of the distal stimulus. Accordingly, the perceiver must go, in Bruner's famous phrase, "beyond the information given" by the stimulus -- supplementing information extracted from the stimulus with knowledge, expectations, beliefs, and inferences retrieved from memory. For Helmholtz, some of these inferences were unconscious, but they were still inferences.
According to the ecological view of perception espoused by James J. Gibson (and his spouse, Eleanor Jack Gibson -- she of the "visual cliff" experiments), however, there is no cognitive contribution to perception. All the information needed for perception is provided by the stimulus itself -- as J.J. Gibson put it, the information is "in the light". Further, Gibson asserted that our perceptual system evolved in such a way as to extract the information needed to perceive the world as it really is. There is no need for learning, and no need to invoke any "higher" cognitive processes such as judgment, inference, or problem-solving. The ecological view gets its name because perception involves extracting information from the environment. Because perception is not mediated by any cognitive processes, the ecological view is sometimes called direct perception. Because the perceiver perceives the world accurately, the ecological view is sometimes called direct realism.

By any standard, the constructivist view dominates research and theory on perception. But, as we will see, the ecological view also finds its proponents.

Impression-Formation

The study of social perception begins with an analogy between social and nonsocial objects. The study of social perception assumes that any person is an object who has an existence independent of the mind of the perceiver. Accordingly, the perceiver's job is to extract information from the stimulus array to form an internal, mental representation of the external object of regard.

The term person perception was introduced by Bruner and Tagiuri (Handbook of Social Psychology, 1954) to reflect the status of persons as objects of knowledge. As with any other aspect of perception, they argued that a number of factors influence perceptual organization:

information in the stimulus array;
selective attention;
the categories of perception, represented in language; and
the internal state of the perceiver, including his or her mental set and the emotional and motivational context in which perception occurred.

Jerome Bruner and the "New Look" in Perception

Bruner was a pioneering cognitive psychologist and cognitive scientist. Among his notable accomplishments was the introduction of what he called a "New Look" in perception, which sought to redirect perception research from an analysis of stimulus features to an analysis of the perceiver's internal mental states. Although perception is obviously a field of cognitive psychology, to some extent Bruner's New Look was influenced by psychoanalysis, as when he argued that emotional and motivational processes interact with cognitive processes -- so that, in some sense, our feelings and desires affect what we see.

Our impressions of other people are typically represented linguistically, often as trait adjectives. Consider a survey by the Washington Post, which asked respondents to describe in three words the various candidates for the Democratic and Republican presidential nominations. The most frequent responses, arranged as "tag clouds" in which the font size represents the frequency with which the word was used, looked like these:

Here's a similar survey, conducted over Facebook by the Daily Beast, a journalism website (with, perhaps, a somewhat liberal bent), following the February 2012 Republican presidential debates. During the debate, CNN correspondent John King had asked each of the candidates to describe themselves in one word. The Daily Beast polled subscribers to its Facebook page with the same question, resulting in the following word clouds. For good measure, they also asked their Facebook subscribers to describe Barack Obama, who was unopposed for the Democratic nomination.

In 2015, in the run-up to the 2016 election, YouGov.com, a global online community that promotes citizen participation in government, conducted a similar survey in which visitors to the organization's website were asked to characterize some of the leading presidential candidates in one word (at the time, there was some indication that Mitt Romney would join the race). Separate word clouds were constructed from the responses of people who liked and disliked each candidate.

Often, the stimulus information for person perception also comes in verbal form, as a list of traits and other descriptors. This is certainly the case with the self-descriptions that appear in "personals" ads in newspapers and magazines. But it is also true when we describe other people. Consider this passage from the Autobiography of Mark Twain, in which the author describes the countess who owned the villa in Florence where he and his family stayed in 1904:

"excitable, malicious, malignant, vengeful, unforgiving, selfish, stingy, avaricious, coarse, vulgar, profane, obscene, a furious blusterer on the outside and at heart a coward."

Or, this description of Osama Bin Laden, which the former chief of the CIA's "Bin Laden Issues Station" has endorsed as a "reasonable biographical sketch" of the man.

When Fiske and Cox (1979) coded people's open-ended descriptions of other people, they identified six major categories:

physical attributes (e.g., ravishingly beautiful);
behavioral information (e.g., aggressive);
social relations (e.g., Tom's girlfriend);
characteristic situations (e.g., spends a lot of time in bars);
origins (e.g., of Polish extraction); and
functional properties (e.g., makes me sick).

Because verbal lists of traits are easy to compose and present to subjects, many studies of person perception begin with traits, and proceed from there. This is reasonable, because so much of our social knowledge is encoded and transmitted via language.

The Asch Impression-Formation Paradigm

Actually, the study of person perception began before 1954, with the work of Solomon Asch (1946). Like Lewin (about whom you've already heard), and Fritz Heider (about whom you'll hear a lot in the future), Asch was born in Europe and emigrated to the United States (in his case from Poland, as a boy, well before the Hitler era). And like them, he was heavily influenced by European Gestalt psychology. Much of Asch's early work was on aspects of nonsocial perception, but he brought the Gestalt perspective to bear on problems of social psychology in his classic textbook, Social Psychology (1952), which was the first social psychology text to be written with a unifying cognitive theme running throughout.

Asch (1946) set out the problem of social perception as follows:

[O]rdinarily our view of a person is highly unified. Experience confronts us with a host of actions in others, following each other in relatively unordered succession. In contrast to this unceasing movement and change in our observations we emerge with a product of considerable order and stability.

Although he possesses many tendencies, capacities, and interests, we form a view of one person, a view that embraces his entire being or as much of it as is accessible to us. We bring his many-sided, complex aspects into some definite relations....

How do we organize the various data of observation into a single, relatively unified impression?

How do our impressions change with time and further experiences with the person?

What effects in impressions do other psychological processes, such as needs, expectations, and established interpersonal relations, have?

In addressing these questions, Asch set out two competing theories:

The impression of a person is simply the sum total of his or her various characteristics, as represented by a list of trait terms; or
The impression of a person is a unified perception, derived from the relations of his or her various characteristics with each other.

Obviously, as a Gestalt psychologist, Asch had a pre-theoretical preference for the latter theory.

In order to study the process of person perception, Asch (1946) invented the impression-formation paradigm. He presented subjects with a trait ensemble, or a list of traits ostensibly describing a person (the target) -- varying the content of the ensemble, the order in which traits were listed, and other factors. The subjects were asked to study the trait ensemble, and then to report their impression of the target in free descriptions, adjective checklists, or rating scales.

Asch's first experiment compared the impressions engendered by two slightly different trait ensembles. Subjects were presented with one of two trait lists, which were identical except that target A was described as warm while target B was described as cold.

After studying the trait ensemble, the subjects reported their impressions in terms of a list of 18 traits, presented as bipolar pairs such as generous-ungenerous.

The two ensembles generated two quite different impressions, with A perceived in much more positive terms than B. There were significant differences between the two impressions on 10 of the 18 traits in subjects' response sets. A later experiment, varying only intelligent-unintelligent, yielded similar results.

But when the experiment was repeated (Experiment 3), with the words polite and blunt substituted for warm and cold, there were relatively few differences between the two impressions.

From these and related results, Asch concluded that traits like warm-cold and intelligent-unintelligent were central to impression formation, while traits like polite-blunt were not. In Asch's view, central traits are qualities that, when changed, affect the entire impression of the person. Other traits are more peripheral, in that they make little difference to the overall impression.

Although being described as warm rather than cold led the target to be described in highly positive terms, Asch distinguished the effect of central traits from the halo effect described by Thorndike, by which targets described with one positive trait tend to be ascribed other positive traits as well. Being described as warm rather than cold does not lead to an undifferentiated positive impression; the warm-cold effect is more differentiated than that.

As proof, Asch pointed out that warm-cold is not always central to an impression. In his Experiment 2, where warm and cold were embedded in a different trait ensemble, there were few differences between the resulting impressions. If anything, the person was perceived as somewhat dependent, rather than in the glowingly positive terms that emerged from Experiment 1.

Consistent with Gestalt views of perception, the effect of one piece of information (whether the person is warm or cold) depends on the entire field in which that information is embedded. To explain why traits are sometimes central and other times peripheral, Asch offered the change of meaning hypothesis, which holds that the total environmental surround changes the meaning of the individual elements that comprise it. Remember, for Gestalt psychologists, the distinction between figure and ground is blurry, because both figure and ground are integrated into a single unified perception. Perception of the figure affects perception of the background, and perception of the background affects perception of the figure.

In addition to studying the semantic relations among stimulus elements, Asch also studied their temporal relations. After all, he argued, impression-formation is extended over time: as we gradually accumulate knowledge about a person, our impression of that person may change. In his Experiment 6, Asch presented subjects with two identical trait ensembles, except that intelligent was the first trait listed for target A, and the last trait listed for target B. The two impressions differed markedly, revealing an order effect in impression formation.

In order to explain order effects, Asch held that the initial terms in the trait ensemble set up a "direction" that influences the interpretation of the later ones. The first term sets up a vague but directed impression, to which later characteristics are related, resulting in a stable view of the person -- just as our perception of a moving object remains stable, even though our perspective on it may change over time.

Taken together, Asch's studies illustrate principles of person perception that are familiar from the Gestalt view of perception in general. The whole percept is greater than the sum of its stimulus parts, because the elements interact with each other; just as the perception of the individual stimulus elements influences perception of the entire stimulus array, so the perception of the entire stimulus array influences the perception of the individual stimulus elements.

Asch's 1946 experiments set the agenda for the next 20 to 30 years of research on person perception and impression formation, which basically sought clarification on questions originally posed by Asch himself:

What is the nature of central traits, and how do they differ from peripheral ones?
How is trait information combined -- through a unified impression, as Asch thought, or as a simple list?
What are our beliefs about the relations among traits -- How is social knowledge organized so that it influences our perception of traits, and their effects on our impressions?

Interestingly, however, a recent large-scale study failed to replicate one of Asch's findings: the "primacy of warmth" effect, by which warm-cold is not only a central trait, but more important to impression-formation than the other big central trait, intelligent-unintelligent. Nauts et al. (2014) carefully repeated Asch's (1946) procedures (for his Studies I, II, and IV), in a sample of 1140 subjects run online via Mechanical Turk.

Despite the impression given by their title, Nauts et al. actually replicated the primary findings of Asch's study.

Content analysis of the subjects' open-ended descriptions of the targets (a quantitative procedure that was not available to Asch in 1946) showed that subjects used the terms warm, cold, and intelligent more often than any other trait term. This is evidence for the importance of these dimensions to impression-formation.

In addition, subjects who saw warm (cold) in the trait ensemble described the targets in more (less) positive terms, compared to those who saw polite (blunt).

So, how could Nauts et al. claim a failure to replicate Asch?

It turned out that warm-cold was not used more often than intelligent-unintelligent in subjects' impressions, contradicting the "primacy of warmth" hypothesis. In fact, intelligent-unintelligent appeared more often in the impressions than warm-cold.

When subjects were asked to rank how important the various items in the trait ensemble were to their overall impressions, they ranked intelligent-unintelligent higher than warm-cold.

So, just to be clear, Nauts et al. confirmed that warm-cold and intelligent-unintelligent are central to impressions of personality; but they failed to find, as Asch had claimed, that warm-cold was more important than intelligent-unintelligent.

What Makes a Trait Central?

Asch's distinction between a central and a peripheral trait was made on a purely empirical basis: He discovered that some traits, such as warm-cold and intelligent-unintelligent, exerted a disproportionate effect on impressions of personality, while others, such as polite-blunt, did not. But although he could predict the effects of central traits on impressions, he had no theory that would enable him to predict which traits would be central, and which peripheral.

So what makes a trait central as opposed to peripheral? Julius Wishner (1960) offered a plausible answer. He administered a 53-item adjective checklist, derived from the checklists that Asch had used, asking his subjects to describe their acquaintances (often, their teacher in introductory psychology). Using the power of high-speed computers that simply were not available to Asch in 1946 (and which, frankly, are dwarfed by the computational power of the simplest laptop or even palmtop computer today), Wishner calculated the correlations between each trait and every other trait in the list. Examining the matrix of trait intercorrelations, Wishner observed that traits such as warm-cold and intelligent-unintelligent, which Asch had identified as central, had significant correlations with many other traits (e.g., mean rs = .62 and .56, respectively); by contrast, peripheral traits such as polite-blunt had relatively few correlations (e.g., mean r = .43).

The upshot of Wishner's study is that central traits carry more information than peripheral traits, in that they have more implications for unobserved features of the person. By virtue of their high intercorrelations with other traits, knowing that a person is warm or intelligent tells us a great deal about the person, while knowing that a person is polite does not. In the same way, a change in one central trait, from warm to cold or from intelligent to unintelligent, implies changes in many other traits as well, while a change from polite to blunt does not.

Wishner's findings also explained why a trait like warm-cold was not always central: it depends on the precise list of traits on which subjects make their ratings. Any trait from the stimulus ensemble will function as a central trait, so long as it is highly correlated with many of the traits on the response list.
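Wishner's logic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ratings are invented (they are not Wishner's data), the function names pearson_r and centrality are hypothetical, and the mean absolute correlation with the other traits on the response list is used here as one simple index of how "central" a trait is.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of ratings."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Invented 7-point ratings of six acquaintances (NOT Wishner's data).
ratings = {
    "warm":     [6, 2, 5, 1, 7, 3],
    "generous": [6, 3, 5, 2, 6, 2],
    "sociable": [5, 2, 6, 1, 7, 3],
    "polite":   [4, 5, 3, 4, 5, 4],
}

def centrality(trait, response_traits, data):
    """Mean absolute correlation of `trait` with the other traits on the
    response list (an assumed index, chosen for simplicity)."""
    others = [t for t in response_traits if t != trait]
    total = sum(abs(pearson_r(data[trait], data[t])) for t in others)
    return total / len(others)

# Centrality of every trait, relative to this particular response list.
scores = {t: centrality(t, ratings, ratings) for t in ratings}
for trait, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{trait:10s} mean |r| = {score:.2f}")
```

In these made-up data, warm correlates strongly with the other response traits while polite does not, so warm would behave as a central trait and polite as a peripheral one. Changing the response list passed to centrality changes the scores, which is the point of Wishner's analysis.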

Wishner's solution to the problem of central traits made his paper a classic in the person-perception literature, but it is not completely satisfactory. For example, it might be nice if "centrality" were a property of the trait itself, and did not depend on the context provided by the response set (though, frankly, Asch, as a Gestalt psychologist, might not think this was so desirable!). Are there any traits that are inherently central?

Seymour Rosenberg (1968), making use of even more computational power than had been available to Wishner, factor-analyzed the intercorrelations among a large number of trait terms, yielding a hierarchical structure consisting of subordinate traits, primary traits, secondary and even tertiary traits. He discovered that Asch's central traits tended to load highly on two very broad superordinate factors of personality ratings representing two dimensions:

social good-bad (for traits like warm and cold); and
intellectual good-bad (for traits like intelligent and unintelligent).

So, in fact, warm-cold and intelligent-unintelligent do seem to be inherently central to impression formation -- unless the experimenter uses a strange set of response scales, which don't bear on either of these major dimensions of personality impressions.

Interestingly, these two "superfactors" are not entirely independent of each other: people who tend to be described in positive social terms also tend to be described in positive intellectual terms. There is a "super-duper" factor of evaluation which runs through the entire matrix of personality traits, and gives rise to Thorndike's halo effect.

Pulling all of this material together, we can conclude that central traits have two properties:

Central traits occupy a central position in clusters of related traits, allowing us to predict where people stand on these other traits.
In particular, central traits define the poles of the two superfactors representing positive vs. negative social and intellectual traits.

Talking to Strangers

Malcolm Gladwell, a journalist who interprets social-science research in the popular press (Blink was about automaticity; The Tipping Point was about minority influence and social contagion), tackled the problem of social cognition in his book, Talking to Strangers: What We Should Know About the People We Don't Know (2019). The general thesis of the book is that bad things happen "when a society does not know how to talk to strangers". His prime example is the "Peace in our time" meeting between Neville Chamberlain and Adolf Hitler, which led to the German annexation of Czechoslovakia and, thus, World War II. In this telling, Chamberlain erroneously concluded from Hitler's double handshake and other aspects of his nonverbal behavior that he could be relied on to keep his word. Gladwell concludes that "the people who were right about Hitler were those who knew the least about him personally", while "the people who were wrong about Hitler were the ones who had talked with him for hours". Well, maybe.

Gladwell also discusses other, more prosaic examples of egregious misunderstanding, which he explains with two basic principles, both drawn from social-science research.

Gladwell cites Tim Levine, a communications researcher, who has suggested that a common factor in social misunderstandings is that some people are "mismatched" -- for example, a dishonest person presenting him- or herself as honest (the Hitler example), or a neurotic person presenting as stable. Levine argues that we are usually able to determine whether someone is being truthful -- but this ability fails when that person is "mismatched" -- underscoring the possibility that people deliberately misrepresent themselves in everyday life (apologies to the sociologist Erving Goffman).
Gladwell also suggests that people suffer from an "inability to make sense of the stranger as an individual" -- for example, by taking note of the context in which their behavior occurs -- which sounds a little like what we will discuss later as the Fundamental Attribution Error.

Despite all this, especially the Hitler example, Gladwell cautions his readers not to distrust everyone. Rather, we should "accept the limits of our ability to decipher strangers". On the other hand, Anthony Gottlieb, reviewing the book in the New York Times Book Review, concludes that "the threads that connect Gladwell's somewhat rambling material have to do with misreading people -- mistaking their intentions, drawing erroneous conclusions from their demeanors and believing their false claims of innocence. Yet despite its title, the book is not really about strangers.... Lies, misunderstandings, and escalating confrontations have, after all, been known to occur even within marriages" ("Malcolm Gladwell's Advice When 'Talking to Strangers': Be Careful", 10/06/2019). Maybe we should accept the limits of our ability to decipher coworkers, neighbors, friends, and even intimates, as well.

Gladwell probably makes too much of two relatively small principles (he made similar mistakes in Blink and The Tipping Point), but his book only underscores the importance of understanding both how we perceive other people, and how accurate, or inaccurate, those perceptions can be.

Implicit Personality Theory

While a great deal of personality research has been devoted to determining the hierarchical structure of personality traits (e.g., the Big Five structure of neuroticism, extraversion, agreeableness, conscientiousness, and openness to experience), it seems likely that laypeople possess some intuitive knowledge of the structure of personality as well. In fact, Asch's concept of the central trait assumes that laypeople possess some intuitive knowledge about the relations among personality traits. If they did not, all traits would be created equal, none more central, or more peripheral, to impression formation than any other.

The term implicit personality theory (IPT) was coined by Bruner and Tagiuri (1954 -- they were busy that year) to refer to "the naive, implicit theories of personality that people work with when they form impressions of others". Bruner and Tagiuri understood that person perception entailed "going beyond the information given" by combining information extracted from the stimulus with information supplied by pre-existing knowledge. In other words, in the course of person perception the person must make use of knowledge that he or she possesses about the relations among various aspects of personality.

Implicit personality theory is "naive" in that it is not the product of formal research, but rather built up from the experience of everyday life.
Implicit personality theory is "implicit" in the sense that people are not particularly aware of the knowledge they use and the assumptions that they make when forming impressions of personality.

People's implicit theories of personality may be quite different from the formal theories of personality researchers. In fact, compared against formal theories resulting from methodologically rigorous research, they may even be wrong. But right or wrong, they are used in the course of person perception.

Defining Implicit Personality Theory

The domain of IPT was further explicated by L.J. Cronbach in his contribution to a 1954 book on person perception edited by Bruner and Tagiuri (it was a very good year for person perception). Cronbach discussed IPT in the context of personality ratings made by judges in traditional trait-oriented personality research, and suggested that in addition to information derived from the judge's observations of the target, the ratings will be influenced by the "Judge's description of the generalized Other" -- that is, by the judge's beliefs about what people are like in general. In Cronbach's view, IPT consists of several elements:

a list of the important dimensions of personality;
an estimate of the mean value on each dimension within the population;
an estimate of the variance on each dimension within the population;
an estimate of the covariances or correlations among the various dimensions.

Of course, these are also the elements of formal, scientific theories of personality structure.
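Cronbach's four elements map directly onto familiar descriptive statistics. The sketch below is only illustrative: the two dimensions and the numbers are invented stand-ins for one judge's beliefs about how six "typical" people would score on 7-point scales, and the dictionary name ipt is hypothetical.

```python
from math import sqrt

# Invented beliefs of one judge about the "generalized Other":
# imagined scores of six typical people on two 7-point dimensions.
judge_beliefs = {
    "friendly":    [5, 6, 4, 7, 5, 3],
    "intelligent": [4, 6, 4, 6, 5, 5],
}

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    """Population variance (divides by n, not n - 1)."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def covariance(xs, ys):
    """Population covariance between two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def correlation(xs, ys):
    return covariance(xs, ys) / sqrt(variance(xs) * variance(ys))

# The four elements of an implicit personality theory, in Cronbach's sense.
ipt = {
    "dimensions": list(judge_beliefs),
    "means": {d: mean(v) for d, v in judge_beliefs.items()},
    "variances": {d: variance(v) for d, v in judge_beliefs.items()},
    "correlation": correlation(judge_beliefs["friendly"],
                               judge_beliefs["intelligent"]),
}
print(ipt)
```

A judge whose implicit theory resembled this one would assume that people are, on average, somewhat above the scale midpoint on both dimensions, and that friendliness and intelligence tend to go together.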

Cronbach believed that IPT was widely shared within a culture, but he acknowledged that there might also be individual differences in IPT. For example, some people might assume that most people are friendly and well-meaning, and that becomes the "default option" when they make judgments about some specific person; but other people might assume that most people are hostile and aggressive. In addition, he suggested that there may be cultural differences in IPT. For example, within Western culture, IPT seems to be centered on clusters of traits, or stable individual differences in behavioral dispositions; but other cultures might have more "situationist" or "interactionist" views of personality.

Following Cronbach, an expanded concept of implicit personality theory might look like this:

A set of highly abstract, generalized assumptions concerning...

Human nature
The causes of behavior in general
The origins of individual differences
Fundamental personality traits

And, for cultures like ours, a further set of more specific assumptions concerning these traits, such as...

The number of basic traits
Their names
The average standing in the population on each trait
Population variance for each trait
The shape of the distributions of individual differences on each trait; and
The intercorrelations among the traits.

For example, people might think that Rosenberg's traits of social and intellectual "good-bad" are normally distributed, with population means right about the midpoint.

Osgood's "Pollyanna Principle" reflects the assumption that the distribution of positive traits in the population is skewed toward the positive end of the continuum.

Or, people might believe that there is a bimodal distribution of social and intellectual goodness, with most people tending toward "good" but a substantial minority tending toward "bad".

Thorndike's "halo effect" reflects the assumption that socially desirable traits, whether social or intellectual, are positively correlated with each other, as in the two-dimensional structure uncovered by Rosenberg.

Note for statistics mavens: the correlations between variables can be represented graphically by vectors, with the angles between vectors reflecting the correlation between them such that r is equal to the cosine of the angle at which the vectors meet (this is how factor analysis can be done geometrically). Thus, two variables that are uncorrelated with each other (r = 0.0) are represented by two vectors that meet at right angles (cos 90° = 0); two variables that are perfectly correlated (r = 1.0) are represented by two vectors that overlap completely (cos 0° = 1.0); and two variables that are highly but not perfectly correlated, say r = .71, are represented by vectors that meet at an angle of 45° (cos 45° = 0.707).
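The relation between correlation and angle can be checked numerically. The sketch below assumes that each variable is represented as a vector of mean-centered scores; the function names and the sample data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def angle_between(x, y):
    """Angle in degrees between the mean-centered versions of x and y."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cx = [a - mx for a in x]
    cy = [b - my for b in y]
    dot = sum(a * b for a, b in zip(cx, cy))
    norm_x = math.sqrt(sum(a * a for a in cx))
    norm_y = math.sqrt(sum(b * b for b in cy))
    cosine = max(-1.0, min(1.0, dot / (norm_x * norm_y)))  # guard rounding error
    return math.degrees(math.acos(cosine))


def r_from_angle(degrees):
    """Correlation implied by a given angle between two variable vectors."""
    return math.cos(math.radians(degrees))


# Hypothetical scores on two variables for four people.
x = [1, 2, 3, 4]
y = [2, 1, 4, 3]
r = pearson_r(x, y)
theta = angle_between(x, y)
```

For these made-up scores, the cosine of theta recovers r, and r_from_angle(45) evaluates to about 0.707.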

Two Models of IPT

Implicit theories of personality have been studied through the application of multivariate statistical methods, such as factor analysis, multidimensional scaling, and cluster analysis, to various types of data:

correlations among trait ratings collected through adjective checklists and scales;
analysis of the probability of co-occurrence of traits in free-response productions; and
judgments of the similarity between traits.

But because these techniques are very time-consuming, developments in implicit personality theory had to await the proper technology, particularly the availability of cheap, high-speed computational power. In the 1960s, as appropriate computational facilities became widely available, two competing models of implicit personality theory began to emerge.

The first of these was a semantic differential model of IPT based on Charles Osgood's tridimensional theory of meaning (e.g., The Measurement of Meaning by Osgood, Suci, & Tannenbaum, 1957). According to Osgood, the meaning of any word can be represented as a point in a multi-dimensional space defined by three vectors:

evaluation (e.g., good-bad, optimistic-pessimistic, complete-incomplete);
potency (e.g., strong-weak, hard-soft, severe-lenient); and
activity (e.g., active-passive, hot-cold, excitable-calm).

In this EPA scheme, closely related words are represented by points that lie very close to each other in this space. Osgood's method was to have subjects rate objects and words on a set of bipolar adjective dimensions. When these ratings were factor analyzed, the three dimensions of evaluation, potency, and activity came out regardless of the domain from which the objects and words were sampled -- people, animals, inanimate objects, even abstract concepts. If these three dimensions are the fundamental dimensions of meaning, they are likely candidates for implicit personality theory -- the cognitive framework for giving meaning to people and their behaviors -- as well.

The principal problem with the semantic differential model is that the best evidence for the three factors came from studies employing adjective checklists, where the adjectives on the list were deliberately chosen to represent evaluation, potency, and activity. Accordingly, it seems possible that evaluation, potency, and activity came out of analyses of adjective ratings because, however unintentionally, they were built into these ratings to begin with. Interestingly, Osgood's three dimensions were not clearly obtained from free-response data, which were not constrained by the experimenter's choices. Left to their own devices, subjects' cognitive structures look somewhat different from Osgood's scheme.

For example, Rosenberg and Sedlak (1972) asked subjects to provide free descriptions of 10 people each. These investigators then selected the 80 traits that occurred most frequently in these descriptions, and then submitted the 80x80 matrix of trait co-occurrences to a technique of multivariate analysis called multidimensional scaling (why only 80 traits? An 80x80 matrix, generating more than 3,000 unique correlation coefficients, exhausted the computational power available at the time). They found that Osgood's evaluation and potency factors were highly correlated (r = .97): people who were perceived as good were also perceived as strong. Osgood's activity factor was quite weak in the data, but was also positively correlated with the evaluation and potency factors (r = .57). Accordingly, Rosenberg & Sedlak concluded that the evaluation dimension dominated people's implicit theories of personality.
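To make the size of the problem concrete, the following sketch counts the unique trait pairs in an n x n matrix and tallies how often pairs of traits appear together in free descriptions. It uses raw co-occurrence counts as a simplifying assumption, which is not necessarily the measure Rosenberg and Sedlak used; the function names and the sample descriptions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def unique_pairs(n):
    """Number of distinct off-diagonal cells in a symmetric n x n matrix."""
    return n * (n - 1) // 2


def cooccurrence_counts(descriptions):
    """Count how many descriptions mention each unordered pair of traits.

    `descriptions` is a list of trait lists, one list per described person.
    """
    counts = Counter()
    for traits in descriptions:
        for a, b in combinations(sorted(set(traits)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical free descriptions of three people.
descriptions = [
    ["warm", "intelligent", "honest"],
    ["warm", "honest"],
    ["cold", "intelligent"],
]
counts = cooccurrence_counts(descriptions)
```

With 80 traits, unique_pairs(80) gives 3,160 cells, which is the "more than 3,000" figure mentioned above.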

Based on free-description data such as this, Rosenberg proposed an alternative evaluation model of IPT. He argued that evaluation was the only perceptual dimension common to all individuals, and that any additional dimensionality came from correlated content areas such as social and intellectual evaluation. The figure graphically represents the loadings on the two dimensions of a representative set of trait adjectives. Note that Asch's central traits, warm-cold and intelligent-unintelligent, lie fairly close to the axes that define the two-dimensional space.

Kim and Rosenberg (1980) offered a direct test of the two models. In the Rosenberg and Sedlak (1972) study, and other studies of implicit personality theory, individual subjects rated only a single person, and then the subjects' responses were aggregated, so that the resulting IPT structure reflected the average "Judge's description of the generalized Other". But averaging may obscure the structures that exist in individual Judges' minds. It is entirely possible that individual judges have something like the Osgood structure in their heads, but when their responses are aggregated, only evaluation remains. Accordingly, Kim and Rosenberg decided to compare the adequacy of the two models at the individual level (again, this is the kind of analysis that can only be done when computing resources are cheap). Because multivariate analysis requires multiple responses, they had subjects describe themselves and 35 other people that they knew well; and they collected both free descriptions and ratings on an adjective checklist. Multidimensional scaling of the individual subject data revealed that something resembling Osgood's three-dimensional EPA structure appeared in only 8 of the 20 subjects studied, and in 3 of these 8, the potency and activity dimensions were not independent of evaluation. More important, the evaluation dimension emerged from every individual subject's data set.

Kim and Rosenberg concluded that Osgood's EPA structure was an artifact of aggregation across subjects. All subjects use evaluation as a dimension for person perception; some use potency, some use activity, and some use both, and these dimensions are strong enough to create the appearance of a major dimension when data are aggregated across subjects. Potency and activity are positively correlated for some subjects, and negatively correlated for others; these tendencies balance out, and give the appearance that potency and activity are independent of each other and of evaluation. But this is an illusion produced by data aggregation. In their view, only the evaluation dimension is genuine; the potency and activity dimensions are largely artifacts of method.
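The balancing-out argument can be illustrated with a toy example. The sketch below is a deliberately extreme simplification, not Kim and Rosenberg's analysis: it invents two subjects, one whose potency and activity ratings are perfectly positively correlated and one whose ratings are perfectly negatively correlated, and then pools their ratings. All data and names are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical ratings of four targets by two subjects.
subject_a = {"potency": [1, 2, 3, 4], "activity": [1, 2, 3, 4]}  # positive relation
subject_b = {"potency": [1, 2, 3, 4], "activity": [4, 3, 2, 1]}  # negative relation

r_a = pearson_r(subject_a["potency"], subject_a["activity"])
r_b = pearson_r(subject_b["potency"], subject_b["activity"])

# Pooling the two subjects' ratings hides both individual patterns.
pooled_potency = subject_a["potency"] + subject_b["potency"]
pooled_activity = subject_a["activity"] + subject_b["activity"]
r_pooled = pearson_r(pooled_potency, pooled_activity)
```

Here each subject shows a perfect relation between the two dimensions, yet the pooled correlation is zero, so the dimensions look independent only in the aggregate.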

Fiske et al. (Trends in Cognitive Sciences, 2007) have drawn new attention to Rosenberg's work by claiming that warmth and competence are universals in social cognition, which exert a powerful influence on how we interact with other people. In particular, Fiske has argued that various combinations of warmth and competence characterize various out-group stereotypes. This is true in both "individualist" and "collectivist" (or "independent" and "interdependent") societies. Fiske has even argued that there is a specific module in the brain, located in the medial prefrontal cortex (MPFC), which constitutes a "social evaluation area". In any event, she and her colleagues have argued that assessments of warmth and competence are made automatically and unconsciously -- even though they may not necessarily be accurate.

The "Big Five" as an Implicit Theory of Personality

Implicit theories of personality are like formal scientific theories, except that they are "naive" and "implicit". Recently, scientific research on personality has focused on a five-factor model of personality structure originally proposed by Norman (1963). In his research, Norman examined subjects' ratings of other people on a representative set of trait adjectives. Factor analysis consistently revealed five reliable dimensions of personality:

Extraversion (talkative vs. silent, frank vs. secretive, adventurous vs. cautious, and sociable vs. reclusive);
Agreeableness (good natured vs. irritable, not jealous vs. jealous, gentle vs. headstrong, and cooperative vs. negativistic);
Conscientiousness (tidy vs. careless, responsible vs. undependable, scrupulous vs. unscrupulous, and persevering vs. quitting);
Emotional Stability (poised vs. nervous, calm vs. anxious, composed vs. excitable, and not hypochondriacal vs. hypochondriacal); and
Culture (artistically sensitive vs. insensitive, intellectual vs. unreflective, refined vs. crude, and imaginative vs. simple)

Norman's culture dimension was characterized as intellectance in earlier studies (e.g., Tupes & Christal, 1947); later, it came to be known as openness to experience.

Norman (1963) recovered this five-factor structure from factor analyses of questionnaires and rating scales, of self-ratings and other-ratings, regardless of the method of collecting data or factor analysis he employed. In his view, the five-factor structure was ubiquitous.

Norman's findings were soon replicated by others (actually, they had been obtained by earlier investigators as well) -- so reliably that they came to be known as The Big Five dimensions of personality.

Goldberg (1981) proposed that the Big Five comprised a universally applicable structure of personality. By universally applicable Goldberg meant that it could be used to assess individual differences in personality under any circumstances:

across cultures (e.g., adults in the United States and in China);
across generations (e.g., adult Americans who lived in the 19th century and those who live in the 21st); and
across developmental epochs (e.g., preschool children, adolescents, and adults living in the United States).

In line with the doctrine of traits, Norman (and many other advocates of the Big Five) assumed that these five traits had actual existence, just like physical traits, as behavioral dispositions.

Goldberg noted that the Big Five are so ubiquitous that they have been encoded in language, as familiar trait adjectives like extraverted and cultured. Of course, if ordinary "laypeople" (not just trained scientists) notice these dimensions enough to evolve words for them, the Big Five structure may exist in people's minds as well as their behavior. That is, the Big Five may well serve as the structural basis for people's implicit theories of personality, as well as a formal theory of personality structure.

Along these lines, I have often thought of the Big Five as The Big Five Blind Date Questions -- representing the kind of information that we want to know about someone that we're meeting for the first time, and will be spending some significant time with:

Is s/he outgoing?
Is s/he friendly?
Is s/he reliable?
Is s/he crazy? and
Is s/he smart? (or sophisticated, or even just interested in new things?)

If these are indeed the kinds of questions we ask about people, then it seems like the Big Five -- and not just a single dimension of evaluation -- resides in our heads as an implicit theory of personality. As it happens, we can fit The Big Five into a hierarchical structure of implicit personality theory.

Extraversion, Agreeableness, Emotional Stability, and Conscientiousness are all socially desirable traits.
Culturedness (or Intellectance or Openness to Experience) and Conscientiousness are both intellectually desirable traits.

And in fact, there is some empirical evidence that The Big Five -- whatever its status as a scientific theory of personality -- serves as an implicit theory of personality as well.

The evidence comes from a provocative study by Passini and Norman (1966), who asked subjects to use Norman's adjective rating scales to rate total strangers -- people they had never met before, and with whom they were not permitted to interact during the ratings session. The subjects were simply asked to rate others as they "imagined" them to be. Nevertheless, factor analysis yielded The Big Five, just as had earlier factor analyses of ratings of people the subjects had known well. Note that the Passini and Norman study violates the traditional assumption of personality assessment: that there is some degree of isomorphism between personality ratings and the targets' actual behavior. In this case, the judges had no knowledge of the targets' behavior. The Big Five structure that emerged from their ratings was not in their targets' behavior -- simply because they had no knowledge of their targets' behavior; but it certainly existed in the judges' heads, as a "description of the generalized Other".

Based on this evidence, it may be that the Big Five provides a somewhat more differentiated implicit theory of personality than the two-dimensional evaluation model promoted by Rosenberg and Sedlak. If so, we would have another answer to the question of what makes a central trait central. Just as R&S argued that central traits loaded highly on the two dimensions of evaluation, perhaps central traits load highly on one or the other of the Big Five dimensions of personality. Certainly that's true for warm-cold, which loads highly on extraversion, and intelligent-unintelligent, which loads highly on openness.

The Illusion of Coherence

By now there have been many studies similar to that of Passini and Norman, all with similar results: every factor structure derived from empirical observations has been replicated by judgments of conceptual similarity. Thus, we do seem to carry around in our heads an intuitive notion concerning the structure of personality -- the co-occurrences among certain behaviors, the covariances among certain traits, the notion that certain things go together, and other things contradict each other. This conceptual structure -- this implicit personality theory -- is thus cognitively available to influence people's experience, thought, and action in the social world.

The existence of implicit personality theory is interesting, but in some sense it is also troublesome, because it raises a difficult question that has long bedeviled theorists of perception in general -- the question of realism vs. idealism:

Does implicit personality theory accurately reflect the structure of the world outside the mind (the realist view)? Or
Does implicit personality theory represent an a priori conceptualization of the social world, imposed on rather than derived from sensory experience (the idealist view)?

Recall that a major assumption underlying traditional psychometric approaches to personality is coherence:

topographically different behaviors tend to co-occur reliably, as separate manifestations of a single subordinate trait disposition; and
semantically different traits tend to covary reliably, as separate facets of a single superordinate trait disposition.

Coherence yields a hierarchical structure of personality:

at the lowest level, specific episodes of behavior;
repetition of specific behaviors in different contexts yields habitual behaviors;
co-occurring habits yield primary traits; and
correlated primary traits yield superordinate traits.

The assumption of coherence is apparently confirmed by factor-analytic studies of personality, because the factors that emerge from the statistical analysis summarize the patterns of co-occurrences and correlations, and represent primary and superordinate personality traits. The fact that certain factor structures, such as The Big Five, appear to be extremely stable, gives rise to the notion that factor analysis (or similar multivariate methods) yields the structure of personality.

But this kind of evidence is problematic. In principle, factor analysis should be applied to objective observations. But for pragmatic reasons, this is generally impossible in the domain of personality research, simply because it is very difficult to perform the systematic observations of behavior that are required for this purpose. Because we have no direct measurements of personality traits, factor analysis is generally applied to rating data -- subjective impressions of behavior and traits; judgments that rely heavily on memory.

The problem is the reconstructive nature of memory retrieval. Memory for the past is contaminated by expectations and inferences. When factor analysis is applied to memory-based ratings, therefore, we cannot be sure what the factor matrix represents: the structure residing in the personalities of the targets, or the structure residing in the minds of the raters.

The fact is, we know from studies like Passini and Norman's that the structure of personality -- and, specifically, the Big Five structure that is so popular -- resides in the minds of raters. Therefore, it is possible that the structure of personality is to some degree illusory -- in a manner somewhat resembling the Moon illusion familiar to perception researchers. The moon looks larger on the horizon than at zenith, even though it isn't, because of "unconscious inferences" made by perceivers that take account of distance cues in estimating size. Perhaps personality raters make similar sorts of unconscious inferences in rating other people's personalities (or their own). Note that the existence of a moon illusion doesn't imply that there is no moon. It simply means that the moon isn't as big as it looks.

Similarly, in the realm of person perception, our expectations and beliefs can distort our person perceptions, and thus our person memories; in particular, our expectations and beliefs about the coherence of personality can magnify our perception of that coherence.

Where Does Implicit Personality Theory Come From?

Research has already established that the structure of personality exists in the mind of the observer. The important question is whether it also has an independent existence in the world outside the mind. As in the moon illusion, we usually take a modified realistic view of perception -- that our perceptions are fairly isomorphic with the world. Accordingly, we may assume that our beliefs about personality are to some extent isomorphic with the actual structure of personality. But are they really?

The controversy about the nature of implicit personality theory is reflected in two competing hypotheses:

The Accurate Reflection Hypothesis agrees in principle that memory-based ratings can be influenced by generalized expectations and inferences. However, it argues that these mental structures are themselves derived from empirical observation. Thus, the organization of mental structures preserves the organization of the sense-data from which they are derived. The implication is that implicit personality theory is empirically valid. It does not, in fact, bias memory. It follows that the factor structures derived from memory-based ratings are also valid, because the distortion effect of implicit personality theory is minimal.
By contrast, the Systematic Distortion Hypothesis argues that our expectations and beliefs are formed independently of sense-data, and that they are derived in large part from linguistic conventions. These preconceived notions bias memory, in particular because they foster a confusion of likeness with likelihood -- that is, a confusion between semantic similarity and co-occurrence or correlation. The implication is that the structure of memory-based ratings is not faithful to reality, because they are distorted by a priori beliefs.

Obviously the fact that the structure of memory-based ratings resembles those of conceptual similarity ratings cannot resolve this conflict, because both hypotheses predict that these two structures will be highly similar. But the two hypotheses do make different predictions about the structure of observer ratings of behavior. Observer ratings, made "on-line" as it were, have no opportunity to be distorted by mental structures. In observer ratings, behavior is recorded as it occurs, and traits are measured directly, with minimal recourse to inference.

Thus, we can test the accurate reflection hypothesis against the systematic distortion hypothesis by comparing the structures derived from three types of data:

strictly objective recordings of behavior;
memory-based ratings of behavior; and
judgments of the conceptual similarity of behaviors.

The two hypotheses make competing predictions:

The accurate reflection hypothesis predicts that the same structure will emerge, regardless of the data base;
The systematic distortion hypothesis predicts that the structure of observer ratings will differ from memory-based or conceptual-similarity ratings.

Unfortunately, for reasons alluded to earlier, observer ratings of behavior are extremely difficult to obtain -- especially of behaviors that are relevant to the Big Five. However, it is possible to conduct such an experiment on a smaller scale.

One such experiment, by Shweder and D'Andrade (1980), employed 11 categories of interpersonal behavior as target items: these were behaviors such as advising, informing, and suggesting.

For the observer ratings, judges watched a 30-minute videotape of 4 family members interacting, and counted instances of each of the target behaviors by each family member.
For the memory-based ratings, another set of judges watched the same videotape, but didn't count anything. Instead, immediately after the presentation they rated the frequency with which each target displayed each behavior.
For the conceptual similarity ratings, a third set of judges was not exposed to the videotape at all, but rather judged the similarity in meaning for all possible pairs of the 11 target items.

Shweder and D'Andrade then constructed correlation matrices representing the structural relations among all 11 behaviors. They then performed 7 different tests of the correspondence between the matrices.

For example, examining the correlations between parallel cells of the matrices, they observed the following pattern of correlations:

memory-based ratings vs. conceptual similarity ratings, r = .75.
memory-based ratings vs. observer behavior ratings, r = .22.
observer behavior ratings vs. conceptual similarity ratings, r = .00.
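One way to read "correlations between parallel cells" is to line up the off-diagonal cells of two matrices and correlate them. The sketch below follows that reading under stated assumptions: the matrices are small hypothetical 4 x 4 examples rather than the 11 x 11 matrices from the study, and the function names are invented for illustration.

```python
import math


def upper_cells(matrix):
    """Off-diagonal cells above the diagonal of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def matrix_correspondence(a, b):
    """Correlate the parallel off-diagonal cells of two same-sized matrices."""
    return pearson_r(upper_cells(a), upper_cells(b))


# Hypothetical matrices for four behaviors (values invented for illustration).
MEMORY = [
    [1.0, 0.6, 0.5, 0.1],
    [0.6, 1.0, 0.4, 0.2],
    [0.5, 0.4, 1.0, 0.0],
    [0.1, 0.2, 0.0, 1.0],
]
SIMILARITY = [
    [1.0, 0.7, 0.6, 0.2],
    [0.7, 1.0, 0.5, 0.1],
    [0.6, 0.5, 1.0, 0.1],
    [0.2, 0.1, 0.1, 1.0],
]

r_memory_similarity = matrix_correspondence(MEMORY, SIMILARITY)
```

With 11 behaviors, each matrix contributes 55 off-diagonal cells to such a comparison.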

Aggregating the results across the 7 different tests, they observed the following pattern of correlations:

memory-based ratings vs. conceptual similarity ratings, r = .75.
memory-based ratings vs. observer behavior ratings, r = .25.
observer behavior ratings vs. conceptual similarity ratings, r = .02.

In other words, there was a high degree of correspondence between memory-based ratings and conceptual similarity ratings, but very little correspondence between either of these and ratings of observed behavior. These results are consistent with the systematic distortion hypothesis, but inconsistent with the accurate reflection hypothesis.

Systematic Distortion or Accurate Reflection?

Although the Shweder & D'Andrade study seems quite compelling, it has come under criticism from advocates of the accurate reflection hypothesis. In particular, UCB's own Prof. Jack Block has been an ardent defender of the notion that memory-based ratings, and implicit theories of personality, are accurate reflections of external reality. See, in particular, an exchange between Shweder and D'Andrade and Block, Weiss, and Thorne that appeared in the Journal of Personality & Social Psychology for 1979.

To be honest, the systematic distortion hypothesis is somewhat paradoxical, because it seems to refute the realist assumption that there is a high degree of isomorphism between the structure of external reality and our internal mental representations of it. Where does implicit personality theory come from, if not from the world outside? According to the ecological perspective on semantics, "the meanings of words are in the world" (a quote from Ulric Neisser): our cognitive apparatus picks up the structure of the world, and so our mental representations are faithful to that structure. But they apparently aren't, at least in the case of person perception, where it is very clear that our cognitive structures depart radically from the real world that they attempt to represent.

So where else might implicit personality theory come from? How do our behaviors and traits become schematized, organized, and clustered into coherent knowledge structures?

D'Andrade and Shweder have suggested a number of possibilities:

Definitional Overlap in signifying criteria

If two items contain the same defining criterion, or necessary feature, we assume that they also "go together" in some sense.
Similarly, terms marking the opposite ends of a bipolar scale, such as masculine-feminine, are assumed to contradict each other.
And if two terms possess a subset-superset relation, we assume that the properties of one are included in the properties of the other.

Referential Overlap

If two different terms refer to the same actions, we assume that they "go together" -- even if this action is not a necessary feature of either one.

According to D'Andrade & Shweder, the distinction between referential and definitional overlap is one of cancellability:
If it is impossible to imagine a person having one quality but not the other, then there is definitional overlap.
But if it is possible to imagine such a person, then there is only referential overlap.

Connotative Relationships or derivatives

Both terms may be parts of the same whole;
One term may be a cause of the other; or
Both terms may occur in the same behavioral "scripts".

Tone or evaluative "halo" (Thorndike)

Both terms may engender similar emotional responses.

In addition to these ideas, it may be that implicit personality theory reflects ideal types -- that is, it represents our wishes about what goes with what, as represented by cultural heroes and villains.

Information Integration in Impression Formation

Regardless of the ontological status of implicit personality theory, Asch's initial question remains on point: How do we integrate information acquired in the course of person perception into a unitary impression of the person along some dimension? Asch (1946) considered two possibilities: either we simply sum up a list of a person's individual features to create a unitary impression, or the unitary impression is some kind of configural gestalt. Asch clearly preferred the gestalt view to the additive view, a preference that integrated social with nonsocial perception, but his impression-formation paradigm has permitted later investigators to consider simpler alternatives.

Chief among these investigators has been Norman Anderson (1974), who has promoted cognitive algebra as a framework for impression formation and for cognitive processing in general. According to Anderson, perceptual information is integrated according to simple algebraic rules, which take information (about, say, primary traits) and perform a linear (algebraic) combination that yields a summary of the trait information in terms of a superordinate dimension (say, a superordinate trait).

In particular, Anderson has considered two very simple algebraic models (where S = stimulus information and R = the impression response):

Adding: R = S₁ + S₂ + ... + Sₙ
Averaging: R = (S₁ + S₂ + ... + Sₙ) / n

Anderson's basic procedure follows the Asch paradigm:

Perceivers are presented with an ensemble of traits describing a target; Anderson then varies the size and mix of the ensemble, and the order of presentation of individual traits.
Perceivers are required to make a numerical rating of the target on some new characteristic. In Anderson's best-known research, this is a rating of likeability -- which seems appropriate, given Rosenberg's finding that evaluation is fundamental to social judgment.
The traits that go into Anderson's ensemble have previously been rated for likeability -- that is, how socially desirable they are, or how likable people are who possess them.
Using the adding and averaging formulas above, Anderson tries to determine the linear combination of traits (stimuli) that best predicts the final judgment (response).

Anderson's experiments include critical comparisons that afford a test of the adding and averaging models of impression formation. Assume that the trait ensemble includes a mix of traits:

highly desirable traits (H) are given a value of +2;
moderately positive traits (M) are given a value of +1;
evaluatively neutral traits are given a value of 0;
we can also give highly undesirable traits a value of -2, and moderately negative traits a value of -1.

Thus, the adding and averaging functions give the following values to various trait ensembles:

Trait Ensemble	Adding	Averaging
HH	2+2 = 4	(2+2)/2 = 2.0
MMHH	1+1+2+2 = 6	(1+1+2+2)/4 = 1.5
HHHH	2+2+2+2 = 8	(2+2+2+2)/4 = 2.0

Thus, the adding rule predicts that both the HHHH and the MMHH ensemble will be preferred to the HH ensemble; but the averaging rule predicts no difference between HH and HHHH, and that both will be preferred to MMHH.
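The predictions of the two rules can be reproduced with a few lines of code. This is a minimal sketch using the trait values given above (H = +2, M = +1); the function names and the string encoding of ensembles are hypothetical conveniences.

```python
# Scale values from the text: highly desirable (H) = +2, moderately positive (M) = +1.
VALUES = {"H": 2, "M": 1}


def adding(ensemble):
    """Adding rule: R = S1 + S2 + ... + Sn."""
    return sum(VALUES[trait] for trait in ensemble)


def averaging(ensemble):
    """Averaging rule: R = (S1 + S2 + ... + Sn) / n."""
    return adding(ensemble) / len(ensemble)


# Each ensemble is written as a string of trait codes, e.g. "MMHH".
predictions = {
    ensemble: (adding(ensemble), averaging(ensemble))
    for ensemble in ("HH", "MMHH", "HHHH")
}
```

Under adding, the ensembles rank HHHH > MMHH > HH; under averaging, HH and HHHH tie and both exceed MMHH.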

When Anderson (1965) actually performed the comparison, the empirical results were a little surprising:

HHHH was preferred to HH, favoring adding; but
HH was preferred to MMHH, favoring averaging.

So, this experiment seemed to offer no decisive test between the adding and averaging models.

Anderson resolved the conflict by adding three new assumptions:

The model should assign differential weights to the various pieces of stimulus information, depending on how important each trait is to the judgment the subject must make. In the case of an evaluative judgment, the weight in question is the correlation of each trait with overall likeability.
The model should also give disproportionate weight to the first stimulus, in line with Asch's finding of primacy effects in impression formation.
The model must take account of the subject's pre-existing bias (or implicit personality theory), or his evaluation of people in general as positive, neutral, or negative. This a priori bias (which we can also think of as the perceiver's first impression, before he received any stimulus information) must be factored into the adding or averaging equation.

Because of the new emphasis on stimulus weights, this revised form of cognitive algebra is known as the weighted adding or weighted averaging rule.

In a revised test, Anderson set aside the matter of stimulus weightings. Instead of asking subjects whether they were biased positively or negatively, he simply assumed that positive and negative biases would average themselves out, so that the average subject could be considered to be neutral at the outset (in fact, there is probably an average positive bias, but the essential point remains intact). Accordingly, a value of 0 was entered into the adding and averaging equations, along with the values of the stimulus information. Of course, adding 0 does nothing to sums; but it can have a marked effect on averages. Compare, for example, the following table to the table just above:

Trait Ensemble	Adding	"Weighted" Averaging
HH	0+2+2 = 4	(0+2+2)/3 = 1.33
MMHH	0+1+1+2+2 = 6	(0+1+1+2+2)/5 = 1.20
HHHH	0+2+2+2+2 = 8	(0+2+2+2+2)/5 = 1.60

Set against these revised predictions, Anderson's (1965) empirical results were less confusing:

HH was preferred to MMHH, which favors averaging and is also consistent with the weighted averaging rule.
HHHH was preferred to HH, which favors adding but is also consistent with the weighted averaging rule.

So, this experiment seemed to offer decisive evidence favoring the weighted averaging model of impression formation. In the weighted averaging model, the perceiver's final impression builds up slowly, and is heavily constrained by his or her initial bias and first impressions.
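Here is a minimal sketch of the weighted averaging rule, assuming (as in the revised test) a neutral initial impression of 0 and equal weights of 1 for every trait. The function name, parameter names, and default weights are illustrative assumptions, not Anderson's notation. Note that the initial impression counts as one more item in the denominator.

```python
TRAIT_VALUES = {"H": 2, "M": 1}  # scale values assumed in the text


def weighted_average(values, weights=None, initial=0.0, initial_weight=1.0):
    """Weighted average of the trait values plus the perceiver's initial impression.

    With no weights supplied, every trait gets weight 1 (an assumption made
    here to match the simplified test, which set stimulus weights aside).
    """
    if weights is None:
        weights = [1.0] * len(values)
    total = initial_weight * initial + sum(w * v for w, v in zip(weights, values))
    return total / (initial_weight + sum(weights))


def impression(ensemble):
    """Impression of a trait ensemble such as 'MMHH', starting from a neutral bias."""
    return weighted_average([TRAIT_VALUES[t] for t in ensemble])


for ensemble in ("HH", "MMHH", "HHHH"):
    print(ensemble, round(impression(ensemble), 2))
```

With the neutral starting point included, the predicted ordering is HHHH > HH > MMHH, which is the ordering Anderson observed. Giving extra weight to the first trait (primacy) or a nonzero initial bias only requires passing different `weights`, `initial`, or `initial_weight` values.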

Cognitive Algebra as Mathematical Modeling

Anderson's cognitive algebra is an attempt to represent a basic cognitive function as a mathematical formula. As such, cognitive algebra is intended to be a formal mathematical model of the impression-formation process. So, if you've always been wary of mathematical modeling (perhaps because you've thought it was too dry, or perhaps because of a little math phobia), but you've followed the arguments about cognitive algebra so far, then

Congratulations!

You've just successfully worked your way through a mathematical model of a psychological process.

Anderson's cognitive algebra, and especially the weighted-averaging rule, is an extremely powerful framework for studying social judgment. Cognitive algebra can be applied to any social judgment, so long as the stimulus attributes are quantifiable, and so long as the perceiver's judgment response can be expressed in numerical terms.

But cognitive algebra also has some problems:

It depends on a highly artificial experimental situation in which stimuli are presented in a particular format, and the subject's response is highly constrained as well (but what experimental situation isn't at least somewhat artificial?).
It is purely empirical. Anderson favors the weighted-averaging model as a general framework for cognitive algebra, but other judgments might follow other social rules, and if the experiment had come out favoring adding, he would have accepted those results happily. In other words, Anderson had no theoretical grounds for favoring one model over another.

Despite these problems, lots of work in social cognition has been done within the framework of cognitive algebra -- so much so that we could devote an entire course to it. But we won't.

The Social Relations Model

Another prominent model of person perception is the Social Relations Model (SRM) developed by David Kenny (1994; for earlier versions, see Kenny & LaVoie, 1984; Malloy & Kenny, 1986; Kenny, 1988), based on the early work of the existential psychiatrist (before he became an "anti-psychiatrist") R.D. Laing (Laing et al., 1966).

Link to an overview of the Social Relations Model, based on Kenny's 1994 book, Interpersonal Perception: A Social Relations Analysis. For critical reviews of the SRM, see the review of Kenny's book by Ickes (1996), as well as the book review essays published in Psychological Inquiry (Vol. 7, #3, 1996).

The SRM is focused on dyadic relations -- that is, relations between two people, say Andy and Betty, and in particular how these two people perceive each other. For purposes of illustration, let's suppose that Andy perceives Betty as high in interpersonal warmth. In Kenny's analysis, this perception -- or impression -- has a number of components, such that A's perception of B's warmth is given by the sum of four quite different perceptions (actually, five, depending on how you count):

The constant: How warm people in general think that people in general are.
The actor: How warm A thinks people in general are.
The partner: How warm other people think B is in particular.
The relationship: How warm A thinks B is in general.
The occasion: How warm A thinks B is in some particular situation. In most applications of the Social Relations Model, the occasion component is treated as the error term for purposes of statistical analysis, but Kenny understands clearly that the "occasion" component, reflecting the influence of the situation on ratings, is a substantive construct in its own right.

Because the relationship among the components is additive, you can think of the SRM as a version of Anderson's additive model for impression-formation. But because the addition includes a constant, which reflects A's biased view of people in general, it's actually a weighted additive model. But it's not necessarily a strictly additive model (which would go against Anderson's results, which favor averaging). The relationship component may well be achieved by an averaging process. We're not going to get into that detail: it will be enough just to explore the surface features of the SRM.

This is because the SRM is actually quite complicated, because Kenny has built into the model two features that are not found in other, simpler models of person perception:

It analyzes impression-formation at two different levels:

The level of the individual perceiver.
The level of the dyad constituting the perceiver and the target.

It takes multiple perspectives on the problem of person perception:

The perceiver's view of the target.
The perceiver's view of himself.
The other's view of self.
And, for that matter, the perceiver's view of the target's view of the perceiver!

In applying the SRM, Kenny prefers to employ a round-robin research design, in which each person in a group rates everyone else in the group, as well as themselves. The particular rating scales can be selected for the investigator's purposes, but we might imagine that the ratings are of likability (Anderson), warmth and competence (Rosenberg, Fiske), or the Big Five traits of extraversion, neuroticism, agreeableness, conscientiousness, and openness to experience. Of course, selection of the traits to be rated matters a great deal: results may differ greatly depending on whether subjects are forming impressions of a person's masculinity, sexual orientation, likeability, or extraversion.

Note that in the round-robin design, the perceiver is also a target, and the target also a perceiver -- just as in the General Social Interaction Cycle, the actor is also a target and the target also an actor. The SRM treats the individual as both subject and object, stimulus and response, simultaneously.

The mass of data from the round robin design is then decomposed into three components:

Perceiver: how the perceiver tends to view other people in general.
Target: how the target tends to be viewed by other people in general.
Relationship: how a particular perceiver views a particular target.
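The decomposition can be illustrated with a deliberately naive sketch that uses simple row and column means. This is not Kenny's estimator: his formulas correct for the missing self-ratings on the diagonal and separate the relationship component from error, neither of which is attempted here. The ratings are made-up warmth scores for four hypothetical people.

```python
# ratings[i][j] = perceiver i's rating of target j (hypothetical warmth scores).
# The diagonal holds self-ratings, which are left out of this decomposition.
ratings = [
    [None, 5, 3, 4],
    [6, None, 4, 5],
    [4, 4, None, 3],
    [5, 6, 3, None],
]


def decompose(ratings):
    """Naive split of round-robin ratings into perceiver, target, and relationship parts."""
    n = len(ratings)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    grand = sum(ratings[i][j] for i, j in pairs) / len(pairs)
    # Perceiver effect: how person i rates others, relative to the grand mean.
    perceiver = [
        sum(ratings[i][j] for j in range(n) if j != i) / (n - 1) - grand
        for i in range(n)
    ]
    # Target effect: how person j is rated by others, relative to the grand mean.
    target = [
        sum(ratings[i][j] for i in range(n) if i != j) / (n - 1) - grand
        for j in range(n)
    ]
    # Relationship: whatever is left over for this particular pair.
    relationship = {
        (i, j): ratings[i][j] - grand - perceiver[i] - target[j] for i, j in pairs
    }
    return grand, perceiver, target, relationship


grand, perceiver, target, relationship = decompose(ratings)
print("grand mean:", round(grand, 2))
print("perceiver effects:", [round(p, 2) for p in perceiver])
print("target effects:", [round(t, 2) for t in target])
```

By construction, each rating equals the grand mean plus the perceiver effect, the target effect, and the relationship residual for that pair, which mirrors the additive structure of the model.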

With the round-robin design in hand, Kenny can proceed to address a number of questions about interpersonal perception. Here are these questions, and short answers, based on some 45 studies reported in Kenny's 1994 monograph.

Assimilation: the extent to which a perceiver, A, rates all targets in the same way (note that Kenny's usage of the term "assimilation" differs somewhat from the usual usage in social psychology, which has to do with interpreting ambiguous behavior so that it matches a particular category). In Kenny's usage, "assimilation" refers to A's stereotype concerning people in general -- or, if you will, his view of human nature. It reflects a kind of bias that colors the perceiver's impressions of particular targets.

Assimilation declines with increasing acquaintance. When a perceiver rates total strangers, these ratings are highly colored by his view of people in general. But when a perceiver rates close acquaintances, he forms quite differentiated impressions of each of them.
Women tend to form more favorable impressions than men do.

Consensus: the extent to which perceivers all rate a particular target, B, in the same way. To the extent that B makes the same impression on everyone, we may conclude that the impression is not merely a construction of an individual perceiver, but rather reflects something about the target himself.

Levels of consensus are fairly modest, and do not increase with greater acquaintance. Kenny explains this paradox with two processes, which cancel each other out:

Agreement increases as perceivers become better acquainted with the target.
Agreement decreases because the shared stereotypes decrease over time.

Uniqueness: the extent to which A rates B differently from other targets, and differently from how others view B (note that these are actually two somewhat different questions).

Uniqueness is the dominant component in ratings, reflecting the idiosyncratic nature of impression-formation.

Reciprocity: the extent to which A and B perceive each other in the same way. Reciprocity comes in several forms.

Generalized: If A generally views others as friendly, is he viewed as friendly by others? The answer, generally, is "no".
Dyadic: If A views B as friendly, does B view A as friendly as well? Again, the answer is generally "no".

Assumed Reciprocity refers to A's assumption about B (or about other people in general) -- that, if he perceives her (or them) as friendly, she (or they) will also perceive him as friendly.

This generally doesn't work either, at least for personality traits like the Big Five -- although assumed reciprocity of liking, at both the generalized and dyadic levels, "is one of the strongest effects in interpersonal perception".

Target Accuracy: the extent to which a perceiver's impression is correlated with the target's actual status. I'll have more to say about this below, in the section on Accuracy of Person Perception. But let's just say for now that target accuracy is not easy to calculate, given that, when it comes to person perception, we lack an objective, independent criterion.
Meta-Accuracy is easier to contend with, because that has to do with A's impression of B's impression of A. If A thinks that B perceives him as friendly, is this in fact true? I'll also have more to say on this subject below.
Self-Other Agreement: the extent to which A rates himself the same way that B rates him.

Obviously, the level of self-other agreement is highest when the two people are closely acquainted, but it's also surprisingly high at zero acquaintance.

Assumed Similarity: the extent to which A perceives others the same way he perceives himself.

This is also known as the false consensus bias, which tells you all you need to know about assumed similarity: there's a lot of it, and most of it is illusory!
There is also good evidence for a self-enhancement effect, whereby people rate themselves higher, at least on socially desirable traits, than they do others.
There may be evidence for an opposite self-effacement effect in certain non-Western cultures.

Like Anderson's cognitive algebra, Kenny's Social Relations Model is more a method than a theory. The round-robin design, coupled with sophisticated statistical tools, can be used to partition person perception into its various components, and so to answer a wide variety of questions about a wide variety of topics in social cognition.

Person Perception as Perception

Research on impression-formation, from Asch (1946) to Anderson (1974) and beyond, has largely made use of trait terms as stimulus materials. This is certainly appropriate, because -- as Fiske & Cox (1979) demonstrated, as if we needed any proof -- we often describe ourselves and others in terms of traits. Working with traits injects substantial economies into impression-formation research, because they're easy to manage. Moreover, traits may fairly closely represent the way information about people is stored in social memory. But it's also clear that an exclusive focus on traits can give a distorted view of the process of impression formation, because traits are not really -- or, at least, not the only -- stimulus information for social perception.

We don't walk around the world with our traits listed on our foreheads, to be read off by those who wish to form impressions of us. Rather, the real stimulus information for person perception consists of our physical appearance, our overt behavior, and the situational context in which they appear. Accordingly, in addition to describing how we make use of trait information to form impressions of personality, a satisfactory account of person perception needs to answer a different sort of question -- to wit:

How do we get from the physical stimulus of the person -- his or her appearance and behavior -- to his or her mental state?

Or, put another way,

What features of the physical stimulus give rise to our impressions of a person's mental state?

Again, the same question about perception occurs in the social domain as in the nonsocial domain.

In the nonsocial domain, the problem of perception is to unpack physical stimulus information to make perceptual inferences about the physical state of the distal stimulus -- its form, location, activity, and affordances.
In the social domain, the problem of perception is to unpack physical stimulus information to make perceptual inferences about a person's mental state -- his or her thoughts, feelings, and desires.

In the nonsocial domain, the stimulus information for perception consists of patterns of physical energy (the proximal stimulus) radiating from the distal stimulus, and impinging on the perceiver's sensory surfaces. In the social domain, the stimulus information for perception consists of a person's surface appearance and overt behavior. These include, among others, the person's:

facial expressions;

bodily orientation, posture, and movement;

vocal cues;

interpersonal distance;

eye contact and touching;

physical appearance, dress, and cleanliness; and

the local behavioral environment in which we encounter the person (particularly those aspects of the situation that are under the person's control).

Many investigators interested in the perceptual processing of physical information have been heavily influenced by Gibson's "ecological" view of perception. Recall that, according to Gibson, all the information needed for perception (whether nonsocial or, by extension, social) is provided by the stimulus field (including the nominal stimulus and its background). And our perceptual apparatus has evolved in such a way as to enable us to perceive the world the way it really is, without any need for "higher" cognitive processes such as thinking, reasoning, or problem-solving; and, for that matter, without any need for "implicit" theories of personality. Among the leaders in this research are Ruben Baron at the University of Connecticut (Connecticut is a hotbed of Gibsonian perception research) and Leslie Zebrowitz (nee McArthur) at Brandeis University. Others know or care little about Gibson, but still focus their research on the information supplied by the physical stimulus, as opposed to language-based information provided by trait names.

Until fairly recently, there was little work on these more "physical" aspects of person perception -- not least because research with trait ensembles is so easy to do. As a result, we have very little knowledge of the physical stimuli in the natural social world -- their basic features, and the relations among them. However, some investigators have made promising starts. The major exceptions have to do with faces and voices -- social stimuli that are amenable to fairly simple physical descriptions.

Physiognomy

The use of physical features to make inferences about character and personality has its roots in physiognomy, a pseudoscience in which a person's character was judged according to stable features of the face -- much as the 19th-century phrenologists judged character from the bumps and depressions on the skull.

The word physiognomy comes from the Greek Physis (nature) and gnomon (judge), and began with the observation that some people looked like certain animals. It was only a short step, then, to infer that those individuals shared the personality traits presumed to be characteristic of those animals. More generally, physiognomy was based on the assumption that a person's external appearance revealed something about his internal personality characteristics.

References to physiognomy go back at least as far as Aristotle, who wrote in his Prior Analytics (2:27) that

It is possible to infer character from features, if it is granted that the body and the soul are changed together by the natural affections... passions and desires....

Aristotle (or perhaps one of his students) actually produced a treatise on the subject, the Physiognomonica.

Physiognomy fell into disrepute in the medieval period, and was revived by Giambattista della Porta (De humana physiognomia, 1586), Thomas Browne (Religio Medici, 1643), and Johann Kaspar Lavater (Physiognomische Fragmente zur Beförderung der Menschenkenntnis und Menschenliebe, 1775-1778).

And it's been revived again, much more recently. In a study of transactions on a peer-to-peer lending site (Prosper.com), Duarte (2009) showed that people could make valid judgments of trustworthiness based on a head-shot: the criterion was the applicant's actual credit rating and history.

Here are some physiognomic drawings by Charles LeBrun (1619-1690), a French artist who helped establish the "academic" style of painting popular in the 17th-19th centuries (from Charles LeBrun -- First Painter to King Louis XIV).

Facial Expressions of Emotion

A good example of social-perception research involving descriptions of physical stimuli is the work of Ekman (1975, 2003; Ekman & Friesen, 1975) and others on facial expressions of emotion. In his work, Ekman has been particularly concerned with determining the "sign vehicles" by which people communicate information about their emotional states to other people. The fact that such communication occurs necessarily entails that there is a receiver who is able to pick up on the communications of a sender -- and this information pickup is exactly what we mean by perception.

Facial Expressions in Art

Among the many formalisms taught in European painting academies in the 17th-19th centuries were standards for the depiction of emotion on the face. Among the most popular of these texts was the Méthode pour apprendre à dessiner les passions proposée dans une conférence sur l'expression générale et particulière (1698) by Charles LeBrun, a leader of French academic painting. Here are samples from LeBrun's book, showing how various emotions should be depicted (from Charles LeBrun -- First Painter to King Louis XIV).
Anger	Desire	Fear
Hardiness	Sadness	Scorn
Simple Love	Sorrow	Surprise

One of Ekman's most famous findings is that people can reliably "read" certain emotions from the expressions on people's faces. This is true even when the sender and receiver come from widely disparate cultures. Close analysis of these expressions shows that each of them is composed of a particular configuration of muscle activity. The dimensions of that activity include:

The type (or topography) of action (brow raise, nose wrinkle, lip corners down, etc.);
the intensity of action (the magnitude of the change in physical appearance resulting from an action); and
the timing of action (abrupt or gradual speed of onset, short or long duration, speed of offset, etc.).

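[Table: Ekman's photographs of posed facial expressions, each paired with its corresponding emotion and an analysis of the expression. The six emotions shown are happiness, sadness, fear, anger, surprise, and disgust.]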

Ekman's system for coding the facial musculature is known as the Facial Action Coding System (FACS). The system has more than 60 coding categories for various muscle action units (like the Inner Brow Raiser or the Lip Corner Puller) and other action descriptors (such as Tongue Out or Lip Wipe). Each basic emotion, and every variant on each basic emotion, can be described as a unique combination of these coding categories. And each coding category is associated with a specific pattern of muscle activity.

The Universality Thesis... and Its Discontents

Cross-cultural studies show that Ekman's basic emotions are highly recognizable across cultures. Nelson and Russell (2013) summarized several decades' worth of such studies, involving subjects from literate Western cultures (mostly, frankly, American college students), literate non-Western cultures (e.g., Japan, China, and South Asia), and non-literate non-Western cultures (e.g., indigenous tribal societies in Oceania, Africa, and South America). Subjects from all three types of culture recognized prototypical displays of the six basic emotions at levels significantly and substantially better than chance. This is consistent with the hypothesis that the basic emotions, and the apparatus for producing and reading their displays on the face, are not cultural artifacts but are, indeed, biologically basic.

Evidence like this is generally taken as support for the universality thesis that facial expressions of the basic emotions are universally recognized. They are a product of our evolutionary heritage, innate (not acquired through learning), and shared with at least some nonhuman species (especially primates). Recognition of these emotions is a product of "bottom-up" processing of stimulus information -- essentially a direct, automatic readout from the target's facial musculature. And, as the evidence shows, they are invariant across culture. The ability to read the basic emotions from the face does not depend on contact with Western culture, literacy, or stage of economic development. The universality thesis has its origins in the work of Darwin, and also in the writings of Silvan Tomkins, who was Ekman's mentor; but it is most closely associated these days with Ekman himself.

The universality thesis is widely accepted, but there are those who have raised objections to it, arguing that, at the very least, it has been overstated. They note, in the first place, that recognition of the basic emotions is not, in fact, constant across cultures. If you look at the results of the Nelson & Russell (2013) review, depicted above, you'll see clearly that, while recognition is significantly and substantially above chance levels, there are also substantial and significant cultural differences. Only happiness, apparently, is truly universally recognized. Recognition of surprise, and especially the more negative emotions, drops off substantially as we move to literate non-Western and then non-literate non-Western cultures.

It turns out that emotion-recognition isn't purely a bottom-up process, and that context can make a big difference.

Viewing a face against a contrary background -- say, a sad face against a background picture of an amusement park -- makes it more difficult to recognize the facial expression.
Photoshopping a face on a contradictory bodily posture -- say, a smiling face on the body of someone who is asking for forgiveness -- also makes it more difficult to recognize the facial expression.

There are also a host of methodological issues that appear to inflate levels of emotion-recognition in the classic experiments.

In the Ekman pictures presented earlier, which are used in many studies, the facial expressions are posed by actors who have been trained to display the critical cues. Pictures of "spontaneous" emotion are recognized at much lower levels than posed ones.
The subjects in the typical experiment are shown several expressions at a time; accuracy is reduced when subjects must deal with pictures shown only one at a time, precluding comparison.
The typical emotion-recognition experiment is structured as a within-subjects design, meaning that subjects are tested on all of the basic emotions. Again, accuracy is reduced in a between-subjects design, in which individual subjects deal with only a single emotion.
Most critically, most of these experiments employ a forced-choice format, in which subjects are asked to match a picture with the appropriate emotion. Recognition levels are substantially reduced in experiments employing a free-response format, in which subjects must generate their own labels for the pictures.

The bottom line is that while there is some evidence for some universality in emotion recognition, accuracy even when dealing with these six ostensibly "basic" emotions has been exaggerated to some extent. In fact, a comprehensive survey by Lisa Feldman Barrett et al. ("Emotional expressions reconsidered: Challenges to inferring emotion in human facial movements", Psychological Science in the Public Interest, July 2019) shows that, while people do tend to display these classic facial expressions, they do not do so with enough consistency across contexts, individuals, and cultures to make them reliable indicators of an individual's emotional state. Nor, for that matter, do perceivers reliably infer emotional states from facial expressions. In a commentary on their article, Dacher Keltner and Alan Cowen et al., who have worked closely with Ekman, agree that there is nothing like an isomorphism between emotional state and facial expression, and that information from the face should be supplemented with other nonverbal channels, such as posture, body movements, and speech prosody ("Mapping the Passions: Toward a High-Dimensional Taxonomy of Emotional Experience and Expression", Psychological Science in the Public Interest, July 2019).

The Smile

Just as the face may be the pre-eminent social stimulus, so the smile may be the pre-eminent social behavior.

Smiles express happiness and interpersonal warmth, and so encourage social interaction.

As Landis (1924) discovered, however, people also smile during a wide variety of activities, not all of which are pleasant. In fact, smiling may be the most commonly used strategy for hiding our true emotions, positive or negative.

The perception of a smile tends to bring about an imitative smile on the part of the perceiver.
Feedback from the facial musculature that creates the smile may sustain, or even enhance, the mood of the person who is doing the smiling -- in this case, two people.

So, for example, Ekman distinguishes between two kinds of smile:

the "Duchenne smile", expressing genuine, involuntary happiness, involves the zygomaticus major muscle around the mouth and the orbicularis oculi muscle around the eyes.
the "Pan American smile", a voluntary, polite smile so named after the cabin attendants on a famous, but now-defunct, airline, involves only zygomaticus major.
In a similar way, Ekman believes that the sorts of "non-happy" smiles observed by Landis (1924) were not genuine "Duchenne" smiles, and can be distinguished from the real thing by his FACS system. In fact, Ekman has identified as many as 17 distinct types of smile!
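The distinction between the two smiles can be written as a toy rule over FACS action units. In FACS, the Lip Corner Puller (zygomaticus major) is action unit 12, and the Cheek Raiser (orbicularis oculi) is action unit 6. The function below is an illustrative simplification, not Ekman's coding procedure: real FACS coding also scores the intensity and timing of each action, and this sketch makes no attempt to cover his other smile types.

```python
AU_CHEEK_RAISER = 6        # orbicularis oculi, around the eyes
AU_LIP_CORNER_PULLER = 12  # zygomaticus major, around the mouth


def classify_smile(active_aus):
    """Label a face from the set of FACS action units coded as active."""
    aus = set(active_aus)
    if AU_LIP_CORNER_PULLER not in aus:
        return "no smile"
    if AU_CHEEK_RAISER in aus:
        return "Duchenne"      # mouth and eyes both involved
    return "Pan American"      # mouth only


print(classify_smile({6, 12}))
print(classify_smile({12}))
```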

Infants as young as 10 months will smile differently to a stranger than they do to their mothers.

According to the Associated Press, customer-service employees at the Keihin Electric Express Railway Company in Japan can check their smiles against the Okao Vision face-recognition software system, to make sure that they are smiling properly at customers ("Japan Train Workers check Grins with Smile" by Jay Alabaster, Contra Costa Times 07/26/2009). It's not clear whether they're being checked against a Duchenne smile or a Pan American smile.

And as another example, anger involves a large number of muscles. So, as your grandmother told you, it really does take more muscles to frown than to smile. So smile and save your energy.

Ekman's analysis of facial emotion has been used to offer a solution to a famous question in art history: what is it about the smile of Mona Lisa, in Leonardo da Vinci's famous painting (c. 1503-1505)? It's not just Nat "King" Cole who has found this smile mysterious. Part of the mystery of the smile is the ambiguous way it's painted -- which, according to a conventional theory, reflects the "archaic smiles" in the ancient Greek and Roman paintings and sculpture that so inspired Leonardo and other artists of the Renaissance.

In 2005 Nicu Sebe, a computer-vision researcher at the University of Amsterdam, scanned the Mona Lisa with an emotion-recognition program he developed with colleagues at the Beckman Institute of the University of Illinois, based on Ekman's analysis of facial expressions of basic emotions. Using this program, he determined that the Mona Lisa's smile consisted of 83% happiness, 9% disgust, 6% fear, and 2% anger (New Scientist, 12/17/05). So that's part of the mystery. Or maybe it's just a smirk, as in this New Yorker cartoon by Emily Flake (08/30/2021).

On the other hand, Peter Schjeldahl, commenting on the sale (for almost half a billion dollars) of another Leonardo painting, Salvator Mundi, remarked on the "ambiguous mien" of Jesus, and went on to write: "Giving an ambiguous character an ambiguous mien doesn't seem a stop-the-presses innovation. The trick of it, by the way is the same as that of the "Mona Lisa": painting different expressions in the eyes and in the mouth. When you look at one, your peripheral sense of the other shifts, and vice versa. You try to reconcile the impressions, with frustration that seeks and finds relief in awe."

Since then, work on computer recognition of emotion, based largely on facial cues, has progressed apace, evolving into a new sub-discipline known as affective computing. Among the most highly developed of these systems is Affdex, a product of Affectiva, an offshoot of the MIT Media Lab. Based largely on Ekman's FACS system, Affdex scans the environment for a face, isolates it from its background, and identifies major regions such as mouth, nose, eyes, and eyebrows -- distinguishing between non-deformable points, such as the tip of the nose, which remain stationary, and deformable points, such as the corners of the lips, which change with different facial expressions. It computes various geometric relations between these points, and compares the current face to a very large number of other faces, previously analyzed, stored in memory. It then outputs a probabilistic judgment of whether the face is displaying such basic emotions as happiness, disgust, surprise, concentration, and confusion. It can distinguish between social smiles and genuine "Duchenne" smiles, and between real and feigned pain. And it does this in real time.
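To make the general idea concrete, here is a toy sketch of the three steps just described: measure geometric relations among facial points, compare them with stored faces, and report a probabilistic judgment. It is emphatically not Affdex's algorithm, which is proprietary and far more elaborate. The landmark names, the two features, the made-up "memory" of labeled faces, and the use of a k-nearest-neighbor vote are all assumptions made for illustration.

```python
import math
from collections import Counter


def features(landmarks):
    """Two geometric relations, scaled by the distance between the eyes."""
    scale = math.dist(landmarks["left_eye"], landmarks["right_eye"])
    mouth_width = math.dist(landmarks["lip_left"], landmarks["lip_right"]) / scale
    # Vertical distance from the nose tip (a stationary point) down to the
    # lip corners (deformable points); image y coordinates grow downward.
    corner_y = (landmarks["lip_left"][1] + landmarks["lip_right"][1]) / 2
    corner_drop = (corner_y - landmarks["nose_tip"][1]) / scale
    return (mouth_width, corner_drop)


def judge(face, memory, k=3):
    """Proportion of each label among the k stored faces most similar to this one."""
    f = features(face)
    nearest = sorted(memory, key=lambda item: math.dist(f, item[0]))[:k]
    counts = Counter(label for _, label in nearest)
    return {label: count / k for label, count in counts.items()}


# Hypothetical stored faces: (feature vector, emotion label).
memory = [
    ((0.90, 0.55), "happiness"),
    ((0.95, 0.50), "happiness"),
    ((1.00, 0.52), "happiness"),
    ((0.70, 0.70), "sadness"),
    ((0.72, 0.68), "sadness"),
    ((0.68, 0.72), "sadness"),
]

# Hypothetical landmark coordinates (x, y) in pixels for a new face.
face = {
    "left_eye": (30, 40),
    "right_eye": (70, 40),
    "nose_tip": (50, 60),
    "lip_left": (31, 81),
    "lip_right": (69, 81),
}

print(judge(face, memory))
```

Scaling by the distance between the eyes keeps the features comparable across faces of different sizes; a real system would use many more points and a trained classifier rather than a handful of stored examples.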

For an article on affective computing, including the story of how market forces turned Affdex from an emotional prosthetic for autistic people into a marketing tool, see "We Know How You Feel" by Raffi Khatchadourian, New Yorker, 01/19/2015.

FACS is intended for use by professional researchers and clinicians, including computer analysis of facial expressions. But the fact that people can reliably read emotions from other people's faces suggests that our perceptual systems are sensitive to changes in facial musculature. Ekman's FACS system is a formal description of the physical stimulus that gives rise to the perception of another's emotional states.

The stimulus faces used in much of Ekman's research come from actors posing various expressions according to Ekman's instructions, but we can read emotional states from people's faces in other circumstances as well.

Consider, for example, photographs taken in April 2000, when Elian Gonzalez, a Cuban boy who had lost his mother during an attempt to escape from Cuba, was being sheltered by some of his mother's relatives in Miami. Elian's father, who was estranged from his mother, and had remained in Cuba, demanded that he be returned to Cuba. The US Department of Justice, for its part, determined that, legally, custody of the boy should be given to his closest living relative -- his father, who was in Cuba. The Miami relatives refused to turn Elian over for repatriation, and in the final analysis an armed SWAT team from the Border Patrol forced its way into the relatives' house to retrieve the boy. Little did the officers know that a newspaper reporter and photographer were already inside the house. The resulting remarkable sequence of photographs shows the surprise of one officer when he discovered the photographer in the room.

The Embodied Smile

Like Ekman, Paula Niedenthal and her colleagues have cataloged a number of different types of smiles, but the differences they observe go far beyond patterns of facial activity. It turns out (to quote the headline in the New York Times over an article by Carl Zimmer, 01/25/2011), that there's "More To a Smile Than Lips and Teeth". Some smiles are expressions of pleasure, while others are displayed strategically, in order to initiate, maintain, or strengthen a social bond; some smiles comprise a greeting, others display embarrassment -- or serve as expressions of power. Niedenthal has been especially active in examining the process of smile recognition -- recognizing the differences among smiles of pleasure, embarrassment, bonding, or power. She proposes that smiles are "embodied" in perceivers through a process of mimicry, in which different types of smiles initiate different patterns of brain activity in the perceiver -- patterns that are similar to those in the brain of the person doing the smiling.

In one experiment, Niedenthal found, not surprisingly, that subjects could accurately distinguish between these different types of smiles. But when they held a pencil between their lips, essentially interfering with the facial musculature that would mimic the target's smile, accuracy fell off sharply. Similarly, the judgments of subjects in the "pencil" condition were more influenced by the contextual background than by the smiles themselves. Apparently, mimicry plays an important role in the recognition of smiles (and, probably, other facial expressions as well).

Niedenthal's work exemplifies a larger movement in cognitive psychology known as embodied cognition or grounded cognition. For most of its history, psychology has assumed that the brain is the sole physical basis of mental life. Embodied cognition assumes that other bodily processes -- in this case, the facial musculature -- are also important determinants of mental states. And so is the environment -- like the context in which a smiling face appears. Proponents of embodied cognition do not deny the critical role of the brain for the mind. They just argue that other factors, in the body outside the brain, and in the world outside the body, are also important.

Full disclosure: Prof. Niedenthal worked in my laboratory as an undergraduate. But even so, her work on the smile is the most thorough analysis yet. For an overview of her work on smiles, see P.M. Niedenthal et al. "The Simulations of Smiles (SIMS) Model: Embodied simulation and the Meaning of Facial Expression", Behavioral & Brain Sciences, 33(6), 2010.

Ekman and Darwin

Ekman's work on facial expressions of emotion is strongly informed by evolutionary theory. Charles Darwin, in his book on The Expression of the Emotions in Man and Animals (1872), noted that the facial expressions by which humans expressed such emotions as fear and anger strongly resembled those by which other animals, such as apes and dogs, expressed the same states. Ekman assumes, as Darwin suggested, that facial expressions of emotion are part of our phylogenetic endowment, or evolutionary heritage, a product of natural selection. Ekman edited the 3rd edition of Darwin's Expression (1998).

According to Ekman, Darwin made five major contributions to the study of emotional expressions (Transactions of the Royal Society, 2009):

He treated the emotions as discrete entities, not as points on a continuum.
He focused primarily on the face (and secondarily on vocalization, posture, and other features).
The facial expressions of emotion are universal (though emotional gestures might be culture-specific).
Emotions are not unique to humans, but are found in many other species, especially vertebrates.
The facial expressions of emotion stem from "serviceable habits" -- the raised upper lip characteristic of anger, for example, also exposes the teeth, which our evolutionary forebears used as weapons of attack and defense.

Based on comparative studies of emotional expression in different cultures, Ekman has suggested that there are at least six basic emotions, each associated with an evolved mode of facial expression:

joy
sadness
fear
anger
surprise
disgust.

There may also be other basic emotions, also "hard-wired" through natural selection:

contempt
anguish.

Ekman's evolutionary theory of facial emotion is interesting, but we do not have to accept it to construe his work on emotional expression as an aspect of person perception. After all, the fact that people can "read" emotions in others' faces is precisely what we're interested in: how we get from physical stimulus information -- the facial expression -- to the perception of the states (cognitive, affective, conative) of the person.

Emotion Perception Beyond the Face

The face is a major channel for communicating emotional states, and probably the most important, but it is not the only one. Tone of voice, gesture, posture, and gait are also available channels -- although they have not been given as much systematic attention as the face.

The importance of nonfacial expressions of emotion is underscored by cases of Moebius Syndrome, a congenital condition first described by Paul Julius Moebius in 1888. The condition entails a paralysis of the facial musculature (the illustration shows Kathleen Bogart, who has Moebius syndrome, and who studies the disorder, with her husband, Beau, from the New York Times, 04/06/2010). People with Moebius syndrome cannot express emotions on their faces, so they must find other means of emotional expression, including both verbal and nonverbal channels. They have no difficulty recognizing other people's facial expressions, however. And they still feel various emotions. This is important, because a major theory of emotion communication implies that we mimic other people's facial expressions, and feedback from our own facial expressions shapes both our perception of their emotional states, and our own emotional experience. That can't happen in cases of Moebius syndrome, of course, because the facial paralysis prevents the feedback. So either mimicry isn't important, or there are other mechanisms for emotion perception.

A similar problem is encountered in Bell's palsy, a neurological condition involving the (usually) temporary paralysis of the facial musculature caused by inflammation of the VII cranial nerve. Jonathan Kalb, a theater professor at Fordham University, has written about his own experience with Bell's palsy in "Give Me a Smile" (New Yorker, 01/12/2015). He has never completely recovered from the illness, with the result that his smile is "an incoherent tug-of-war between a grin on one side and a frown on the other: an expression of joy spliced to an expression of horror". Kalb reports that he has difficulty communicating positive affect to other people, and they have difficulty reading positive affect from his facial expressions. He also suggests that, because of disrupted feedback from the facial musculature, he has diminished experience of pleasant affect, and must engage other, compensatory strategies -- some drawn from tricks employed by Method actors.

The Determinants of Physical Attractiveness

Another prominent topic for person perception research has to do with the perception of facial beauty. We know from research on interpersonal attraction that physical attractiveness is the most powerful determinant of likeability (e.g., Berscheid & Walster, 1974). And we also know that likeability -- evaluation, in Anderson's terms -- influences a host of social judgments through the halo effect. But exactly what determines physical attractiveness remains a mystery. As Berscheid and Walster (1974), two social psychologists who are probably the world's foremost experts on interpersonal attraction, concluded, "There is no answer to the question of what constitutes beauty".

Why is this question important? One reason is Thorndike's halo effect. People tend to believe (regardless of whether it's actually true) that socially desirable features go together. Therefore, if someone is physically attractive, they'll also tend to think that they're socially attractive -- on the "warm" end of the social good-bad scale, and on the "intelligent" end of the intellectual good-bad scale. As the English Romantic poet John Keats wrote (in Ode on a Grecian Urn, 1819), "Beauty is truth, truth beauty".

Interestingly, there may also be a reverse halo effect. Vincent Yzerbyt, Nicolas Kervyn, and their colleagues have found that, when comparing people with each other, a person (or group) who receives high ratings on warmth may receive low ratings on competence, and vice-versa. Apparently, the traditional halo effect occurs when evaluating individuals separately, while the reverse halo effect occurs when comparing one individual with another.

There's no question about the bias toward facial attractiveness -- and not just in bars and bedrooms. A number of writers have commented on the pervasiveness of "lookism", a concept modeled on racism, having to do with discrimination against those who are less than a perfect "10" (to use the title of a 1979 movie on this theme starring Dudley Moore, Julie Andrews, and Bo Derek as the eponymous beauty). For more on lookism, see the following books, discussed by Rachel Shteir in "Taking Beauty's Measure" (Chronicle of Higher Education, 12/16/2011):

Hope in a Jar: The Making of America's Beauty Culture by Kathy Peiss (1998).
The Beauty Bias: The Injustice of Appearance in Life and Law by Deborah L. Rhode (2010).
Beauty Pays: Why Attractive People Are More Successful by Daniel S. Hamermesh (2011).
Erotic Capital: The Power of Attraction in the Boardroom and the Bedroom by Catherine Hakim (2011).
Pricing Beauty: The Making of a Fashion Model by Ashley Mears (2011).
And last, but by no means least, Lip Service: Smiles in Life, Death, Trust, Lies, Work, Memory, Sex, and Politics by Marianne LaFrance (2011), a social psychologist and director of the Women's Studies Program at Yale.

The Role of Averageness

Actually, maybe there is an answer to the question of what constitutes beauty. A large body of literature now suggests that attractiveness is strongly related to averageness -- in other words, that we find most attractive those faces (and, for that matter, bodies) that are close to the average for the population. As counterintuitive as that may seem, there are actually good reasons to think that average faces really are highly attractive.

According to evolutionary theory, natural selection has a normalizing function. Because (again according to evolutionary theory) extreme values on any feature mark genetic mutations, features that are at or near the average mark reproductive fitness. Therefore, mate selection (which, according to evolutionary theory, is all about reproductive fitness) will prefer those with average features.
According to prototype theory, category prototypes may be thought of as the average of the instances of a category. We respond to category prototypes as if they were familiar -- which they are, because they look like so many category instances; and we tend to prefer the familiar to the unusual.

Theory aside, a study by Langlois and Roggman (1990) does indicate that, as an empirical fact, average faces are more attractive. In this study, full-front face-and-neck photographs of people bearing a pleasant, neutral expression (with background and lighting controlled) were digitized. Each face was matched on the location of the eye pupils and the lip midline, and then composites were created through a computer averaging program. The results of the study were clear: composite faces were preferred to individual faces, and the more faces that went into the composite (from 2 to 32), the more the composite face was preferred. The more a face reflects the average of all faces, the more attractive it is.
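
The averaging step can be sketched in a few lines. This is only a toy illustration, not the program Langlois and Roggman used: it assumes the images are already aligned (the study matched them on pupils and lip midline), treats each "face" as a tiny grid of gray levels, and uses made-up pixel values.

```python
def average_faces(faces):
    """Pixel-wise mean of equally sized grayscale images (lists of rows)."""
    n = len(faces)
    rows, cols = len(faces[0]), len(faces[0][0])
    return [[sum(f[r][c] for f in faces) / n for c in range(cols)]
            for r in range(rows)]

def distance_from(face, reference):
    """Mean absolute pixel difference between two images."""
    diffs = [abs(a - b)
             for row_f, row_r in zip(face, reference)
             for a, b in zip(row_f, row_r)]
    return sum(diffs) / len(diffs)

# Four hypothetical 2x2 "faces" (gray levels are invented for illustration).
faces = [
    [[10, 20], [30, 40]],
    [[30, 30], [10, 50]],
    [[20, 40], [20, 20]],
    [[20, 10], [20, 50]],
]

population = average_faces(faces)       # composite of all four faces
composite2 = average_faces(faces[:2])   # composite of the first two only

d_individual = distance_from(faces[0], population)
d_composite = distance_from(composite2, population)
```

In this toy data the two-face composite lies closer to the population average than the individual face does, which is the sense in which a composite is more "average" than its components.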

Referring to the Berscheid and Walster (1974) quote above, Langlois and Roggman (1990) concluded that the question of facial beauty had been solved: "[A]ttractive faces... represent the central tendency or the averaged members of the category of faces".

Averageness or Symmetry?

But the question isn't entirely resolved, because evolutionary psychology has a somewhat different answer to the question of why we prefer average faces. According to evolutionary psychology, patterns of experience, thought, and action that were adaptive in our ancestral environment (the Environment of Evolutionary Adaptedness -- roughly the East African savanna during the Pleistocene era) have been preserved in current members of the human species through natural selection. In this view, mate selection prefers healthy, fecund mates: facial symmetry is a marker of health and fecundity, while fluctuating asymmetries on the face (and elsewhere on the body) are signs that the organism is unhealthy, and less desirable from the point of view of reproductive fitness. Averaging eliminates these fluctuating asymmetries, and produces symmetrical faces. So, according to evolutionary psychology, average faces are not attractive because prototypes seem familiar, but because average faces are more symmetrical.

There's certainly anecdotal evidence in favor of a connection between symmetry and attractiveness. Queen Nefertiti, wife and co-ruler of ancient Egypt with the Pharaoh Akhenaten (14th century BCE) was widely acclaimed as the most beautiful woman in the ancient world (this was before Helen of Troy): her name Nefertiti even means (in rough translation) "The perfectly beautiful woman has come". And Nefertiti is portrayed in images that survive from her time as having a perfectly symmetrical face. Of course, these are only images -- we don't know what she really looked like. She might just have had a good public-relations firm.

But we do know what the actress Elizabeth Taylor (who died in 2011 at age 79) looked like, and she really did have a perfectly symmetrical face -- and was universally acknowledged as fabulously beautiful.

On the other hand, 20th-century culture gives us lots of examples of very attractive women who have prominent fluctuating asymmetries on the face: consider, for example, the prominent "beauty marks" on the faces of Marilyn Monroe and Cindy Crawford. Beauty marks are called "beauty marks" precisely because they enhance the person's facial beauty, but as fluctuating asymmetries they're supposed to mark a lack of reproductive fitness, and thus make the person less attractive, not more.

So something's wrong with the evolutionary argument. In fact, the evolutionary story, like many of the "just-so" stories that abound in evolutionary psychology, sounds good, but doesn't stand up to close scrutiny.

In the first place, the connection between facial attractiveness and reproductive fitness appears to be pretty weak, perhaps nonexistent. Kalick, Zebrowitz, Langlois, and Johnson (1998) analyzed data from the Intergenerational Studies conducted by the Institute of Human Development at the University of California, Berkeley, which included data from a large group of individuals who were born in the Berkeley-Oakland area between 1920 and 1929. These subjects had been photographed as adolescents, and health assessments had been made on them during adolescence (ages 11-18), middle age (30-36), and old age (56-66). There was essentially zero correlation between facial attractiveness, rated from the adolescent photographs, and health at any stage of life. So, attractiveness does not seem to serve as a marker of health -- and thus of reproductive fitness. Men aren't attracted to women because they think they'll produce lots of healthy babies. Men are attracted to women because -- well, they're attractive.

In the second place, the relation between averageness and attractiveness does not appear to be mediated by symmetry. In another experiment, Rhodes, Sumich, and Byatt (1999) employed computer-averaged composites of facial photographs that varied in their averageness, as defined in the Langlois and Roggman (1990) study. Subjects then rated these photographs on symmetry, pleasantness, and attractiveness.

The raw (zero-order) correlation between attractiveness and averageness (r = .77) was much higher than the corresponding correlation between attractiveness and symmetry (r = .43).
Using a statistical technique called the partial correlation, Rhodes et al. recalculated the correlation between attractiveness and averageness, controlling for symmetry; and also the correlation between attractiveness and symmetry, controlling for averageness. Both correlations dropped a little, as partial correlations do. But the important point is that averageness continued to correlate with attractiveness, even with symmetry partialled out. Therefore, the correlation between averageness and attractiveness was not an artifact of the symmetry of average faces.
Similar findings were obtained for pleasantness of emotional expression.
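
The logic of the partial correlation can be made concrete with the standard first-order formula. The two zero-order correlations below are the ones reported above; the correlation between averageness and symmetry is not given in the text, so the value used here is an assumption for illustration only, and the resulting partial correlations are not Rhodes et al.'s actual figures.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

r_attr_avg = 0.77  # attractiveness with averageness (reported above)
r_attr_sym = 0.43  # attractiveness with symmetry (reported above)
r_avg_sym = 0.40   # ASSUMED averageness-symmetry correlation, not from the text

# Attractiveness with averageness, controlling for symmetry.
avg_given_sym = partial_corr(r_attr_avg, r_attr_sym, r_avg_sym)
# Attractiveness with symmetry, controlling for averageness.
sym_given_avg = partial_corr(r_attr_sym, r_attr_avg, r_avg_sym)

print(round(avg_given_sym, 2), round(sym_given_avg, 2))
```

Under this assumed value both correlations shrink, but the averageness correlation remains substantial, which is the pattern the argument turns on.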

Rhodes et al. concluded that averageness (the opposite of distinctiveness), symmetry, and pleasantness each make an independent contribution to physical attractiveness. We find pleasant, symmetrical, average faces attractive, but the attractiveness of average faces is not an artifact of their symmetry. (Nor, Rhodes et al. argued, is it an artifact of blending, which tends to remove distinctive characteristics even if they are symmetrical.)

Rhodes et al. (1999) assert that their experimental results "settle the dispute" between averageness and symmetry. But the question remains why average faces are attractive. Their best guess (and mine) is that average faces look like lots of other faces, and so they seem familiar; and we know from the mere exposure effect that we find the familiar more attractive than the unfamiliar.

Beyond Averageness and Symmetry

As the examples of Marilyn Monroe and Cindy Crawford suggest, there's probably more to facial beauty than averageness and symmetry. In addition to those "beauty marks", there's skin tone, body-mass index and waist-to-hip ratio -- and a genuine smile.

"Babyfacedness"

Another facial feature that has been studied by researchers of person perception is babyfacedness. Ethologists such as Konrad Lorenz have long noted that immature organisms, whether mammals or even birds and reptiles, share certain features in common:

enlarged eyes and lips;
soft, chubby cheeks;
fine eyebrows;
pug nose;
large cranium, relative to the face; and
non-sloping forehead.

Lorenz suggested that babyfacedness constitutes a "universal stimulus" that elicits care-taking behaviors, and inhibits aggression. Leslie Zebrowitz (nee McArthur) picked up on Lorenz's ideas, and has studied the consequences of babyfacedness for social perception and social interaction.

Using computer "morphing" programs, it is possible to take line drawings or photographs of faces and adjust their features to make them appear more or less baby-like. Subjects then rate the targets for various personality traits. Research by Zebrowitz and her colleagues generally finds that people perceive baby-faced individuals as warmer, weaker, more naive and trusting; they are also more likely to help baby-faced people, even when help isn't needed.

In one study, Friedman and Zebrowitz (1992) took schematic drawings of male and female human faces and manipulated their facial features to create or erase aspects of babyfacedness. As it happens, there is a sex difference here, with the typical female face possessing more "baby-faced" features than the typical male face. Therefore, by adding baby-faced features they made the typical male face appear more "babyish", and by removing them they made the typical female face appear more "mature". They then had male and female subjects view the sketches, and make ratings of their impressions of the targets' personalities.

Baby-faced males and females alike were rated lower on power, compared to their mature-faced counterparts. But because the typical male face has more mature features than the typical female face, the typical male was rated as more powerful than the typical female.

Baby-faced females (but not baby-faced males) were rated higher on warmth than their mature-faced counterparts. Again, because the typical male face has more mature features than the typical female face, the typical male was rated as less warm than the typical female.

Perhaps not surprisingly, because of the sex difference in babyfacedness, baby-faced males and females alike were rated lower on masculinity (and thus higher on femininity), compared to their mature-faced counterparts. Again, because the typical male face has more mature features than the typical female face, the typical male was rated as more masculine than the typical female.

Baby-faced females (but not baby-faced males) were rated more likely to be the "child caretaker" in the family than their mature-faced counterparts. Again, because the typical male face has more mature features than the typical female face, the typical male was rated as less likely to be a child caretaker than the typical female.

Baby-faced females (but not baby-faced males) were rated less likely to be the "financial provider" in the family than their mature-faced counterparts. Again, because the typical male face has more mature features than the typical female face, the typical male was rated as more likely to be a financial provider than the typical female.

These are social stereotypes, of course, but that's the point: social perceivers use the physical properties of the face to make inferences about the emotions and dispositions of the person.

The baby-faced stereotype is so commonly held that it has been employed in cartoon characters (such as Elmer Fudd and Tweetybird). And in political humor as well. After Vice President Cheney was involved in a quail-hunting accident, in which he peppered one of his companions with bird shot, he was depicted as Elmer Fudd -- a clear contrast between the befuddled lovableness of the cartoon character and Cheney's own reputation as a humorless right-wing ideologue.

Of course, "baby-facedness" is a stereotype, and stereotypes can be misleading, and sometimes downright wrong. For example, Umar Farouk Abdulmutallab, the "Underpants Bomber" of Christmas Day, 2009, was commonly described in press accounts as "baby-faced".

Beyond the Face in Person Perception

Most research on social perception has focused on the face, which is after all the most salient, perhaps the quintessential, social stimulus. However, other nonverbal cues play a part in person perception, including vocal (prosodic) cues, gestures, and other aspects of body language. Here, in a famous photograph from the New York Times (1957), the future president Lyndon B. Johnson (then Majority Leader of the United States Senate) discusses a point of legislation with a colleague.

Body Language

Edward T. Hall first drew attention to several aspects of body language in his popular book, The Hidden Dimension (1966).

Proxemics, or the study of social distance.
Posture
Calypsis, or the strategic covering and uncovering of body parts.
Gesture

Robert Rosenthal and his colleagues have developed a psychological test to assess individual differences in people's sensitivity to nonverbal cues -- including, but going beyond, facial cues (Rosenthal, Hall, DiMatteo, Rogers, & Archer, 1979). The Profile of Nonverbal Sensitivity (PONS) consists of 220 2-second audio/video clips portraying a 24-year-old woman (Judith Hall, now a Professor of Psychology at Northeastern University) acting out a set of 20 vignettes. The subjects' task is to guess which of two vignettes is being acted out.

The vignettes are classified into a 2x2 scheme crossing positive-negative with dominant-submissive, with 5 scenes in each category.

Positive/Dominant: e.g., admiring the weather, or talking about a wedding;
Positive/Submissive: e.g., asking for a favor, or helping a customer;
Negative/Dominant: e.g., threatening someone, or nagging a child; and
Negative/Submissive: e.g., asking forgiveness, or returning a faulty purchase.

Each vignette, in turn, presents information over one of several nonverbal channels of communication:

3 Visual Channels Alone: full figure, face alone, or body alone, all with no vocal cues.
2 Vocal Channels Alone: there are no visual cues at all; moreover, the content of the actor's speech has been rendered unintelligible by one of two methods: electronic filtering of certain frequencies or (literally) cutting up the audiotape and splicing it back together randomly. In either case, only the "tone of voice" remained.
6 Combinations of Each Visual and Each Vocal Channel.

The 11 communication channels, x 20 affective scenes, yielded the 220 items of the PONS test.
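
The arithmetic behind the 220 items can be checked by enumerating the design. The channel labels below are shorthand for the descriptions above, not the test's official names.

```python
from itertools import product

visual = ["full figure", "face alone", "body alone"]
vocal = ["filtered speech", "randomly spliced speech"]

# 3 visual-only + 2 vocal-only + 3 x 2 combined channels.
channels = (
    [(v, None) for v in visual]
    + [(None, a) for a in vocal]
    + list(product(visual, vocal))
)

# 2 x 2 affective scheme with 5 scenes in each cell.
valence = ["positive", "negative"]
dominance = ["dominant", "submissive"]
scenes = [(val, dom, i)
          for val, dom in product(valence, dominance)
          for i in range(1, 6)]

# Every scene is presented over every channel.
items = list(product(channels, scenes))
print(len(channels), len(scenes), len(items))  # 11 20 220
```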

Rosenthal and his colleagues were interested in using the test to measure individual differences in sensitivity -- i.e., in the ability to perceive information conveyed over various channels of nonverbal communication. In the context of this course, the PONS is a good illustration of the point that there are physical sources of social information beyond the face, including vocal and gestural cues.

Person Perception Beyond the Body

Person perception is shaped by the person's physical features, but it is also influenced by aspects of his or her dress: what one hides and reveals, and what one draws attention to, are also stimulus cues as to a person's internal psychological state. Even a person's office or bedroom can provide clues to his or her personality.

A Different Sort of "White Coat Syndrome"

An interesting illustration of this point occurred in 2000, in a dispute over the dress code at Duke University Medical Center (a similar dispute also arose at the Massachusetts General Hospital in Boston, one of Harvard's teaching hospitals). At Duke, physicians wore two different types of white coats: a knee-length duster was restricted to senior physicians, while a hip-length jacket was imposed on interns and residents. Thus, in the Duke (and MGH) environment, the type of white coat worn by a physician conveyed information about the person wearing it -- his or her level of training, and presumed levels of knowledge and expertise.

Interestingly, this long coat-short coat tradition has a long history. In England and America, through the 19th century, there was a professional distinction between physicians and surgeons. Physicians diagnosed and cured illness, while surgeons removed diseased body parts. Physicians, who enjoyed a higher social status than surgeons, wore long coats, while the lower-status surgeons wore short coats. When medicine and surgery were united in the 20th century, this long coat-short coat distinction was transferred to the person's degree of training.

Actually, the professional distinction between physicians and surgeons is still honored in some ways: Columbia University has a College of Physicians and Surgeons, while British medical students work toward two professional degrees, the MB, or Bachelor of Medicine, and the ChB, or Bachelor of Chirurgerie (surgery).

Anyway, Duke's medical residents complained about this policy. They said that they usually wore white trousers with the short white jacket, and the combination made them feel like ice-cream vendors. More important, perhaps, patients picked up on the fact that senior physicians wore long coats, and often questioned the profession, status, and competence of those who wore short jackets. Moreover, female residents were often confused with nurses, who also wore white coats; for the same reason, male residents were often confused with the janitorial staff. The short coat-long coat distinction even affected relations among the medical staff. Senior staff were more likely to speak to residents who wore long coats, and especially more likely to address them as equals.

In the event, Duke altered the dress code as it pertained to residents, but short coats were still imposed on interns. So, class distinctions prevailed after all!

The white coat, which has been worn by physicians since the scientific revolution in medicine of the late 19th century, is part of the identity of most physicians, and part of the concept of physician held in the mind of the public. If nothing else, it creates a link between medicine and science.

"The coat is part of what defines me, and I couldn't function without it", said Dr. Richard Cohen, a clinical professor of medicine at Weill Medical College of Cornell University and an attending physician at New York-Presbyterian Hospital. "When a patient shares intimacies with you and you examine them in a manner that no one else does, you'd better look like a physician -- not a guy who works at Starbucks".... A Postgraduate Medical Journal study in 2004 found that 56 percent of patients surveyed felt that physicians should wear them. About 94 percent of schools of medicine and osteopathy in the United States have "white coat ceremonies" whereby new students don the garment to signify their entry into the profession [note by JFK: Geez, they used to get a little black bag and a stethoscope]. ("The Lab Coat Is On the Hook In the Fight Against Germs" by Thomas Vinciguerra, New York Times, 07/26/2009).

But not for long. As Vinciguerra notes, it has become increasingly clear that the white coat, whether long or short, is a major carrier of bacteria and thus a major source of hospital-based infections, and a major contributor to morbidity and mortality -- not to mention increased healthcare costs. In 2007, the British national health system adopted a policy of "bare below the elbow" banning lab coats as well as long-sleeved shirts and blouses, neckties, long fingernails, and jewelry on the hands and wrists. Maybe physicians will have to settle for wearing their stethoscopes.

Lie Detection as a Problem of Person Perception

Much of Ekman's work on facial emotion, and much of the interest in nonverbal communication generally, has to do with the detection of deception -- or, put bluntly, with lie detection. How can we know when someone is deceiving us? Note that, in terms of person perception, the question of deception is this: how can we know that a person is deceiving us about his or her internal mental state -- about what he or she is thinking, feeling, or desiring? Ekman's work on behavioral (as opposed to physiological) lie-detection has been extremely influential. He has consulted with law-enforcement agencies at all levels of government, and has even "gone Hollywood" as a consultant to the TV show Lie to Me (Fox), about Dr. Cal Lightman (played by Tim Roth), a "human polygraph" who can read body-language "micro-expressions".

The problem, of course, is that people rarely tell us that they are lying -- what would be the point of that? (Even Epimenides, the Cretan philosopher of the 6th century BC, who asserted that "All Cretans are liars", could not have been lying, because if all Cretans really were liars, he -- a Cretan himself -- would have been telling the truth, thus disproving his own statement). Instead, we usually have to infer, from their nonverbal behavior, that their verbal communications are not accurate.

And let's be clear -- lying is a serious problem of social perception. DePaulo et al. (1996) conducted a study of everyday lying by means of a diary study in which subjects were asked to keep track of all of their social interactions for a week, including instances in which they lied. They found that lying is a common feature of social interaction. College students recorded lying about twice a day, on average, in 1/3 of all their social interactions. A community sample lied somewhat less often: about once a day, in about 1/5 of their interactions. DePaulo et al. hasten to point out that most of these lies were trivial, but they were untruths nonetheless.

Many lies are self-oriented:

People lie to enhance their own socially desirable traits, as when you say you got an A on an exam when in fact you got a B.
And people lie to escape punishment, as when a child says that the cat tipped over the goldfish bowl.

Other lies are other-oriented:

People lie to protect the feelings of other people, as when a husband tells his wife that her dress doesn't make her look fat.
And people lie to protect relationships, as when a woman denies to her boyfriend that she's been flirting with another man.

So lying is an important aspect of social interaction, and so our ability to detect lying is an important aspect of social perception.

Can "Only a Few" "Tell a Liar"?

It turns out that we are surprisingly bad at this. Our poor lie-detection abilities were dramatically illustrated in a study by Ekman and O'Sullivan (1991). For this study, they created 10 1-second video clips, showing the full head-on view of a target's face and body. Then the target described his or her positive emotions as s/he was viewing a video. Half of the targets were viewing a pleasant nature scene, in which case s/he was telling the truth about his/her emotional state. The other half of the targets were actually viewing a very gruesome scene -- in which case, s/he was not telling the truth. The subjects' task was to identify which of the targets were telling the truth, and which were lying. Ekman and O'Sullivan tested several different groups of subjects, ranging from college students and psychiatrists to law-enforcement officials.

Averaged across all the groups, the subjects were only about 57% correct -- barely above chance levels. Only agents of the United States Secret Service, a branch of the Treasury Department that has responsibility for protecting the President and other high officials, were particularly good at picking out liars: 53% had 70% or greater accuracy, compared to 50% "chance" level.

A later study by Ekman, O'Sullivan, and Frank (1999), focused on subjects who had special professional interests in lie-detection, yielded similar, if somewhat better, results. This time, the subjects achieved about 63% accuracy -- which is better than chance, but not all that great. The top scorers (those with 70% or greater accuracy) were federal "law enforcement" officers, most of whom were actually agents of the Central Intelligence Agency.

A cautionary note: The data in this study were collected while Ekman delivered a research talk on behavioral lie detection to various professional audiences. After presenting the film clips, Ekman revealed to his audience which targets had been lying, and which telling the truth. He then asked the members of the audience to raise their hands if they got 10, 9, 8, etc. correct, and tallied the results. Given natural tendencies for self-enhancement, it seems likely that the audience self-reports of accuracy were inflated somewhat. By how much, however, we cannot know for sure. Presumably, the same polling procedure, also resulting in possibly inflated scores, was employed in the 1999 follow-up.

So while most people are pretty bad at behavioral lie-detection, some people are better than others. Ekman argued that lie-detection is possible when perceivers pick up on the leakage of nonverbal cues. For example, people tend to display "Duchenne" smiles when telling the truth, but "Pan American" smiles when telling lies. Their vocalizations also tend to show an increase in fundamental pitch. Ekman and his colleagues were able to detect this leakage through special means, such as viewing the videos in slow motion and noticing micro-expressions of affect that are incongruent with the content of the target's message. However, these micro-expressions can also be picked up in real time, especially by people -- like Secret Service and CIA agents, perhaps -- who have had a lot of experience with distinguishing truths from lies. In the 1991 and 1999 experiments, the successful subjects were able to pick up on these instances of leakage.

However, the situation is a little more complicated than this, because even some of the "top scorers" didn't perform better than chance. This is a little counterintuitive, because you'd think that anything better than 50% would count as "greater than chance". But as Nickerson and Hammond (1993) pointed out, when the probability of a hit (p) equals the probability of a miss (q), even 8 hits out of 10 is not significantly greater than chance at the .05 level (actually, it just misses).
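The "just misses" claim can be checked with the binomial distribution. The following is a minimal sketch, assuming a one-tailed test and a subject who guesses on each of 10 clips with a 50% chance of being right; the function name binomial_tail is ours, not Nickerson and Hammond's.

```python
from math import comb


def binomial_tail(hits: int, trials: int, p_hit: float = 0.5) -> float:
    """Probability of at least `hits` correct out of `trials` by guessing alone."""
    return sum(
        comb(trials, k) * p_hit**k * (1 - p_hit) ** (trials - k)
        for k in range(hits, trials + 1)
    )


# One-tailed chance probabilities for 7, 8, and 9 or more correct out of 10.
for hits in (7, 8, 9):
    print(f"P(at least {hits}/10 by chance) = {binomial_tail(hits, 10):.4f}")
```

Under these assumptions, 8 or more correct has a chance probability of 56/1024, or about .055, which is just above the conventional .05 cutoff; 9 or more correct (11/1024, about .011) is the first score that clears it.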

Using a more stringent criterion of 8/10 hits, the proportion of high-scoring Secret Service agents in the Ekman & O'Sullivan (1991) study fell from 53% to 29%. That might be a better rate than the ordinary person on the street, but it's nothing to write home (or a paper!) about.
And in the Ekman et al. (1999) study, the percentage of high-scoring Federal officers would fall from 74% as well (Ekman et al. don't provide data that would permit the actual calculation).

Still, as Ekman & O'Sullivan (1993) pointed out in reply, the Secret Service agents did better than anyone else. The point of this is not to criticize Ekman's work, but rather to point out that the determination of "better than chance" levels of responding isn't quite as simple as it would seem, intuitively, to be.

The detection of deception can be construed as a problem for signal detection theory. In contrast to traditional analyses of accuracy, which focus on hits (and their obverse, misses), signal-detection theory focuses on hits and false alarms -- in this case, instances where a target is called a liar but is actually telling the truth. If you call everyone a liar, you'll correctly identify every actual liar, but you'll also misidentify all the truth-tellers. Good lie-detection will maximize hits while minimizing false alarms.

A bigger problem has to do with the measure of "accuracy" employed in these Ekman studies, which takes only correct responses into account. For example, in the studies described, a subject would "catch" 100% of the liars simply by calling everyone a liar. So it's important to take error into account. From this perspective, we can classify subjects' responses into four categories:

Correctly calling liars "liars" -- in this context, these would be true positives (TP).
And correctly calling truthtellers "not liars" -- these would be true negatives (TN).
Calling truthtellers "liars" would be an example of false positives (FP).
And calling liars "truthtellers" would count as false negatives (FN).

There are some ways of taking false positives and false negatives into account.

A measure called precision or positive predictive value takes account of false as well as true positives:

PPV = TP / (TP + FP).

A measure called sensitivity or the true positive rate takes account of false negatives as well as true positives:

S = TP / (TP + FN).
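To see why both measures matter, here is a small sketch that applies the two formulas to two hypothetical judges who each view 5 liars and 5 truth-tellers. The counts are invented for illustration and do not come from any of the studies discussed here.

```python
def precision(tp: int, fp: int) -> float:
    """Positive predictive value: of those called liars, how many really lied."""
    return tp / (tp + fp)


def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: of the actual liars, how many were caught."""
    return tp / (tp + fn)


# Hypothetical judges, each rating 5 liars and 5 truth-tellers.
judges = {
    "calls everyone a liar": {"tp": 5, "fp": 5, "fn": 0},
    "discriminating judge": {"tp": 4, "fp": 1, "fn": 1},
}

for name, c in judges.items():
    s = sensitivity(c["tp"], c["fn"])
    ppv = precision(c["tp"], c["fp"])
    print(f"{name}: sensitivity = {s:.2f}, precision = {ppv:.2f}")
```

The judge who calls everyone a liar reaches a sensitivity of 1.00 but a precision of only 0.50, while the discriminating judge scores 0.80 on both. Sensitivity alone rewards indiscriminate accusation; precision exposes it.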

It was just to address this problem that signal detection theory (SDT) was invented (Green & Swets, 1966; see also Tanner & Swets, 1954). In sensory psychophysics, the observer's problem is to discriminate between trials in which a signal is presented against a background of noise, and other trials in which only noise is presented, no signal. On any trial, an observer might actually detect the signal. Alternatively, he might miss the signal, because it's too faint. Or, the signal might be strong enough, but he might miss it because he's not expecting it. Or, he might miss it because the costs of making a mistake are relatively low. There are other possibilities. In any event, the point is that the observer's performance must take account of both the observer's sensory acuity and his biases, expectations, and motivations. SDT does this by separating performance into two parameters:

a "bias-free" measure of sensitivity, often labeled d' ("d-prime") or A' ("A-prime")
a measure of bias, often labeled as beta, C or B" ("B-double-prime").

For the purpose of this course, you don't need to know how to calculate either d' or beta (or any of the other SDT parameters). You just need to know the concepts.

Signal-detection experiments are set up so that on some trials (e.g., half), a signal is presented against a background of noise; on other trials, the signal is omitted, and only the noisy background is presented. On each trial, the observer responds with a "Yes", indicating that the signal was present, or a "No", indicating that it was absent. The 2x2 arrangement yields the proportion of trials representing "Hits", "Misses", "False Alarms", and "Correct Rejections". The translation of this framework into the lie-detection situation is obvious.

Unfortunately, all too many studies of lie-detection aren't amenable to analysis in terms of SDT, because all too many investigators fail to report false alarms as well as hits. This was the case with the Ekman & O'Sullivan (1991) study, but Ekman et al. (1999) did report separate values for accuracy in lie-detection and accuracy in truth-detection, which enables us to calculate the false-alarm rate (as 100% - accuracy in truth detection).

For the Federal Officers, accuracy in lie-detection was 80%, while accuracy in truth-detection was about 66% -- which means that truth-tellers were falsely called liars about 34% of the time. Applying the formulas of SDT yields a d' measure of sensitivity of about 1.26, and a C measure of bias of about -.21. All by themselves, these numbers aren't too meaningful. But if you construct tables representing the values of various combinations of hits and false alarms, you can get a sense of the subjects' performance. A d' = 1.26 puts the Federal Officers about in the middle, between randomness (i.e., no sensitivity, or d' = 0) and almost-perfect performance (99% hits and 1% false alarms, yielding d' = 4.65). Similarly, a C = -.21 indicates a slight "liberal" bias toward "Yes" -- that is, a bias toward calling targets liars.
For all the subjects in the 1999 study, the results were much the same, except that all of the subjects taken together, including the Federal Officers, showed less sensitivity (d' = .66), and only a very slight liberal bias (C = -.07).
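You don't need to calculate these parameters for this course, but for the curious, here is a minimal sketch of the standard equal-variance Gaussian formulas, d' = z(hits) - z(false alarms) and C = -[z(hits) + z(false alarms)] / 2, applied to the rounded Federal Officer rates quoted above. The function names are ours, and because the input rates are rounded the results may differ slightly from the published values.

```python
from statistics import NormalDist


def z(p: float) -> float:
    """Convert a proportion to a z-score (inverse of the standard normal CDF)."""
    return NormalDist().inv_cdf(p)


def d_prime(hit_rate: float, fa_rate: float) -> float:
    """Sensitivity: distance between the signal and noise distributions."""
    return z(hit_rate) - z(fa_rate)


def criterion_c(hit_rate: float, fa_rate: float) -> float:
    """Bias: negative values mean a liberal bias toward saying 'Yes' (liar)."""
    return -0.5 * (z(hit_rate) + z(fa_rate))


# Federal Officers: 80% of liars caught, 34% of truth-tellers called liars.
print(f"d' = {d_prime(0.80, 0.34):.2f}, C = {criterion_c(0.80, 0.34):.2f}")

# Reference points: near-perfect performance and pure guessing.
print(f"near-perfect d' = {d_prime(0.99, 0.01):.2f}")
print(f"guessing d' = {d_prime(0.50, 0.50):.2f}")
```

With these inputs the sketch gives a d' of roughly 1.25 and a C of roughly -0.21, in line with the values reported above, and it reproduces the d' = 4.65 benchmark for 99% hits and 1% false alarms.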

Moreover, there is another, more subtle problem with the Ekman/O'Sullivan studies, which is that the targets in these studies were individuals who were determined, by the FACS system, to be "leaking" cues, especially through their faces, that they were lying instead of telling the truth. The good lie-detectors were apparently able to pick up on these cues, so that they were able to perform better than chance. But how representative are these "leaky" liars? In developing their stimulus materials, Ekman and his associates drew on a sample of 31 individuals who were instructed to deliberately lie. Only 10 of these liars (32%) actually leaked cues that were picked up by the FACS system, and were used as targets in the 1991 and 1999 studies. This means that the remaining 21 liars (68% of the total), who were excluded from the experiment, didn't leak any (facial) cues that revealed them to be lying. So, the good lie-detectors in the Ekman studies were only "good" with respect to their ability to pick up certain facial cues to deception in those liars who were so poorly skilled in lying that they leaked them in the first place.

Apparently, if most people are bad at detecting lies, most of us are pretty good at lying. In fact, maybe that's why we're so bad at lie-detection: it's not so much that we're bad at detecting lies, but that we're so good at lying undetectably!

Or, put another way, lie-detection is a problem of signal-detection, and people are typically bad lie-detectors because so often there is no signal to detect!

Lie-Detection in Lab and Life

Ekman and O'Sullivan based their conclusions about people's lie-detection abilities on their own studies -- where, frankly, the experimental procedures are somewhat informal (the subjects are typically members of the audience to whom Ekman is giving a talk). More systematic laboratory research comes to much the same conclusion: People just aren't particularly good at it.

In their reviews of the experimental literature, Kraut (1980), Vrij (2000), and Bond and DePaulo (2006) all found that the average receiver was barely better than chance at detecting lying under natural conditions -- that is, when the senders included both leaky and non-leaky liars. Things looked a little better, though, when B&DeP looked at continuous ratings of honesty, rather than dichotomous judgments of lying. Under these circumstances, honesty ratings distinguished between liars and truth-tellers to a modest degree.

The overall accuracy rate was 54%, barely better than chance, broken down as follows:

Accuracy at detecting lies, 47%.
Accuracy at detecting truths, 61%.

Of course, these results can also be cast in terms of signal-detection theory. Applying the formulas, as we did in the study by Ekman et al. (1999):

A d' = 0.20 puts the average subject very close to randomness (i.e., no sensitivity, or d' = 0) and far from almost-perfect performance (99% hits and 1% false alarms, yielding d' = 4.65).
Similarly, a C = 0.18 indicates a slight "conservative" bias toward "No" -- that is, a bias toward calling targets truthful. (You can see hints of this same sort of bias in the Ekman et al. (1999) study, among those subjects who were not Federal Officers.)

Part of the problem with lie-detection may be that most of us may tend to assume that people are telling the truth. B&DeP discovered a small "truth bias" -- to judge that people are telling the truth, even when they're not. This effectively reduces our ability to make correct judgments of the matter.
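The truth bias can be read directly off the two accuracy figures reported above (47% for lies, 61% for truths). The sketch below assumes, purely for illustration, that receivers saw equal numbers of lies and truths; the function name and that 50/50 split are our assumptions, not figures taken from the review.

```python
def summarize(lie_accuracy: float, truth_accuracy: float, share_lies: float = 0.5):
    """Return (overall accuracy, proportion of 'truthful' judgments).

    Assumes `share_lies` of the messages are lies (0.5 is an illustrative guess).
    """
    share_truths = 1 - share_lies
    overall = share_lies * lie_accuracy + share_truths * truth_accuracy
    # A 'truthful' judgment is a miss on a lie or a correct call on a truth.
    said_truth = share_lies * (1 - lie_accuracy) + share_truths * truth_accuracy
    return overall, said_truth


overall, said_truth = summarize(lie_accuracy=0.47, truth_accuracy=0.61)
print(f"overall accuracy: {overall:.0%}")
print(f"messages judged truthful: {said_truth:.0%}")
```

Under the equal-split assumption, overall accuracy comes out at 54%, matching the figure above, while about 57% of all messages are judged truthful even though only half of them are. That excess of "truthful" judgments is the truth bias.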

B&DeP also examined a number of other variables that might affect lie-detection performance:

	There is a modality effect, such that lie-detection is more accurate when receivers have access to an audio channel, as well as -- or instead of -- a video channel. This may also help explain Ekman & O'Sullivan's results: because of their emphasis on facial cues, they typically present subjects with only a visual channel.

	Detection of deception does not depend on whether the sender is highly motivated to be believed. In fact, there is a paradox of motivation such that the receiver's "truth bias" is reduced when the sender is highly motivated to deceive. Still, the receivers are not very accurate at correctly detecting deception. Nor does it depend on whether the sender has been given an opportunity to prepare to deceive. Nor does the receiver's expertise matter much (remember, even Ekman and O'Sullivan's experts weren't all that good, and there aren't that many of them to begin with). However, detection of deception is better if the receiver has had some prior exposure to the sender -- that is, prior to the experimental test.

	A very provocative finding is that third parties, observing the interaction between the sender and the receiver, are better at detecting deception than the receivers are.

Another problem is that people do not have very accurate knowledge about valid cues to deception -- and many of their beliefs about valid cues turn out to be wrong. When Miron Zuckerman (1981, 1985) reviewed research on nonverbal cues to deception, he discovered that there were a number of valid cues on which we could base a judgment that a person was being deceptive. However, there were two important aspects of his findings:

First, none of these cues is pathognomonic of lying -- that is, always diagnostic of a lie. Some nonverbal cues are correlated with lying, but the correlations are far from perfect.
Second, people frequently believe that certain cues are valid that are not; or, they believe that certain cues are more valid than they are; and they ignore some cues that are, at least to some degree, actually valid.

	Some cues are components of the deceiver's verbal behavior.
	Other cues are paralinguistic -- that is, they have less to do with what the deceiver says than with how he or she says it.
	Still other cues are visual rather than vocal or auditory.
	And then there are some miscellaneous cues to deception.

Despite the availability of such cues, people are surprisingly poor at reading them -- partly because they're attending to cues that are, in fact, invalid!

Similar findings were obtained in a review by DePaulo et al. (2003), who examined more than 100 studies and more than 150 possible cues.

Facial expressions and body language were relatively poor cues to deception.

The best cues were general nervousness, and elevated vocal pitch.

More valid cues were found in the subjects' verbal behavior:

Liars' narratives contained fewer details.
They behaved in an ambivalent manner toward their interrogators.
Their narratives were generally viewed as relatively implausible.

Presumably, though, people could be taught to read these cues properly, in the same way that Ekman's FACS system presumably teaches people to read people's emotional expressions more accurately. If so, the detection of deception from verbal and nonverbal cues, like the reading of facial expressions of emotion, is a perceptual skill that can be acquired through perceptual learning, much as people can learn to adjust to viewing the world through distorting prisms.

This is, in fact, the premise of a program, initiated by the Transportation Security Administration, called SPOT -- for Screening of Passengers by Observational Techniques. Beginning in 2007, the TSA spent approximately $200 million per year training personnel to spot behavioral cues to deception in airline passengers' facial expressions and other aspects of body language. However, a November 2013 evaluation by the Government Accountability Office recommended that the SPOT program be terminated, on the grounds that it was not adequately supported by scientific evidence. In large part, the GAO based its conclusions on Bond and DePaulo's 2006 review. Ekman's response is that B&DeP relied too much on laboratory studies of "low-stakes" lies, which may not generalize to the real-world problems faced by TSA screeners. (For a journalistic account of the debate, see "The Liar's 'Tell'" by Christopher Shea, Chronicle of Higher Education, 10/17/2014.)

Lie Detection in Forensic Settings

Ekman's studies suggest that law-enforcement personnel -- or, at least, some of them -- tend to have acquired a particular perceptual skill of lie-detection. But Bond and DePaulo's studies, among others, suggest that most people don't have the knack. And other critiques suggest that even the skills of trained law-enforcement personnel may be exaggerated.

First, let's get one thing -- actually, two closely related things -- straight. Lie detectors don't work very well either.

The traditional polygraph, which records various indices of autonomic nervous system functioning, such as heart rate and blood pressure, is very poor except under very tightly controlled conditions -- conditions that don't usually obtain in actual field settings.

More recent innovations, such as the use of EEG and brain-imaging methods (including so-called "brain fingerprinting") don't work any better.

Lie-Detection and Berkeley

The traditional polygraph was developed at Berkeley, based on an earlier prototype developed by William Moulton Marston, a Harvard psychologist. August Vollmer, at that time the police chief for the city of Berkeley, collaborated with John Larson, a UCB physiology PhD who had joined the police force as a patrolman (!). Beginning in 1920, Larson tested a wide variety of suspects on his apparatus. When Vollmer left Berkeley to become the police chief in Los Angeles, in 1923, he took another polygraph enthusiast, Leonarde Keeler, with him (Larson, for his part, took the device to a new job in Chicago at the Institute for Juvenile Research). It was Keeler, in fact, who coined the term polygraph, and popularized the technique within law-enforcement circles.

Subsequent legal debates over the validity of polygraphic lie detection resulted in the Frye Rule concerning the admission of scientific evidence in court -- that "expert testimony deduced from a well-recognized scientific principle or discovery" requires that "the thing from which the deduction is made must be sufficiently established to have gained general acceptance in the particular field in which it belongs".

For the whole story, see The Lie Detectors: The History of an American Obsession (2010).

The only physiological technique that is really good at detecting lies is the Guilty Knowledge Test, devised by David T. Lykken, which assesses a suspect's "secret knowledge" of the details of a crime. So, for example, in the case of a stolen watch, the investigator might ask the suspect whether the stolen watch was a Bulova or a Rolex. All things being equal, an innocent person will not respond differentially to the two probes; but a guilty person will know the truth, and this knowledge will show up in his physiological response. Studies have shown that the GKT produces a hit rate of 80-90%, and a false alarm rate of less than 10%. The trick, of course, is that there has to be some aspect of the crime that only the perpetrator would know. For this reason, it is not always possible to use the GKT; but when it's possible, it's dynamite.

Traditionally, the GKT is performed with a traditional polygraph. More recently, EEG and fMRI have been touted for this purpose, but in this case, the neural signature is just another physiological response. People may be more inclined to believe a "neural signature", but it should be clear that the EEG or fMRI is nothing more than a hopped-up polygraph.

So how do law-enforcement personnel determine who is lying to them? To some extent, they rely on nonverbal cues, such as facial expressions and posture, a la Ekman. But new work focuses on what people actually say, rather than how they say it (see "Judging Honesty by Words, not Fidgets" by Benedict Carey, New York Times, 05/12/2009). Problems with false confessions have led police to focus their interrogations on gathering information about the crime and a suspect, instead of forcing a confession.

But even in the determination of honesty and dishonesty, there are linguistic as well as paralinguistic cues that can make a difference.

Liars tend to prepare a script ahead of time that is tight, but lacking in detail -- and then they stick to it.
Truth-tellers tend to recall lots of incidental, extraneous details, they may change their stories over time, and they may make mistakes. When recounting the event repeatedly, truth-tellers may add up to 20-30% more details over trials -- again, these are often extraneous details.

But even these kinds of clues are far from infallible. In one experiment, the rate of lie detection was only about 70%.

In the absence of disciplined perceptual learning, however, most of us are pretty bad at detecting deception -- we perceive people as lying who are telling the truth, and we perceive people as telling the truth who are in fact lying.

For more about lie-detection, see "Lie Detection: What Works?" by Tim Brennen and Svein Magnussen, Current Directions in Psychological Science, 2023. From the Abstract:

A reliable lie-detection method would be extremely useful in many situations but especially in forensic contexts. This review describes and evaluates the range of methods that have been studied. Humans are barely able to pick up lies on the basis of nonverbal cues; they do so more successfully with systematic methodologies that analyze verbal cues and with physiological and neuroscientific methods. However, the rates at which people are able to detect lies are still well below the legal standard of “beyond a reasonable doubt.” This means that the utmost caution must be exercised when such methods are employed. In investigations where independent evidence exists, there is emerging evidence that interviews based on a free account followed by the gradual introduction of the evidence by investigators can reveal inconsistencies in a guilty interviewee’s account. Automated machine-learning methods also hold some promise.

"Gaydar" as Person Perception

Person perception is the perception of a person's internal mental states of knowledge and belief, feeling and desire. In addition to making judgments of competence, neuroticism, extraversion, and the like, we also make judgments of other people's sexuality -- both sexual orientation in general, and -- if we're interested -- sexual interest in us. The problem of judging sexual orientation is known colloquially as gaydar -- the idea that people, especially gay people, can intuitively tell whether another person is gay or not.

Beginning with a set of studies by Rule and Ambady (2008; Rule et al., 2009), a number of studies have demonstrated that people can identify, at better than chance levels, a target's sexual orientation based on visual, auditory, and even olfactory (don't ask) cues, even when the stimulus is severely degraded (e.g., exposures of only 50 milliseconds in duration).

A representative study is one by Lyons et al. (2014a), in which women, self-identified as straight or lesbian, viewed head-shots of men and women who were self-identified as gay or straight on social media. The study was conducted over the internet, and the subjects were simply asked to classify each target as homosexual or heterosexual. Women were pretty good at this, averaging about 61% hits (i.e., classifying as gay people who really were gay, and straights as straight), and about 27% false alarms, for both male and female targets. Both values differ significantly from the chance level of 50%. Applying signal-detection theory yields substantial values for the d' measure of accuracy; it also revealed a bias toward classifying women as gay, especially by perceivers who themselves were lesbians.

Earlier research by Joshua Tabak and Vivian Zayas (2012) employed more degraded stimulus materials. They presented (mostly female) judges with very brief (50 msec) flashes of faces of male and female targets who were self-identified as heterosexual or homosexual, and found that subjects were accurate in judging the targets' sexual orientation about 60% of the time -- as compared to the 50% accuracy that would be expected just by chance. Women were more accurate than men. Judgments of women's faces were more accurate (64%) than those of men's faces (57%). Although researchers have not (yet) uncovered the specific cues that perceivers use in this task, Tabak and Zayas found that judgments were more accurate when the faces were presented right-side up, as opposed to upside-down. Presenting faces upside-down disrupts facial recognition -- what is known as the face inversion effect (Valentine, 1988; Farah et al., 1995). The face inversion effect, in turn, is commonly attributed to configural processing -- that is, people recognize faces not just by recognizing someone's nose, or eyes, or mouth as individual features, but rather by recognizing the length of the nose relative to the distance between the eyes -- you get the drift (Maurer et al., 2002). Anyway, the superiority of right-side-up presentation indicates that it was a configuration of cues, rather than individual features, that was the relevant cue. Research by Nicholas Rule suggests that the mouth may be an important cue. Perhaps, Tabak and Zayas speculate, perceivers judged "effeminate" male faces and "masculine" female faces (as indicated, for example, by the ratio of width to height) as more likely to belong to homosexuals. But they didn't actually test this.

Studies of "gaydar" were taken to a new level by a study reported by Yilun Wang and Michal Kosinski (JPSP, in press 2017) that garnered considerable media attention, drawing articles in The Economist, the New Yorker, and the New York Times. Wang and Kosinski employed 35,000 images of the faces of white men and women who had reported their sexual orientation on online dating sites (there weren't enough minority gays to permit analysis). When they presented these images to a group of human judges (recruited through Mechanical Turk), the humans' judgments of sexual orientation were correct approximately 61% of the time for male faces and about 54% of the time for female faces -- barely better than chance, and in line with the findings of Tabak & Zayas (2012). However, when W&K submitted the same faces to an off-the-shelf pattern-recognition program, the machine's judgments were much better: 81% correct for male faces and 71% correct for females. If the program was given five different faces for each target, overall accuracy increased to 91%. Apparently, two factors contribute to the increased accuracy of the machine: (1) by virtue of being a computer processing a huge database, it was able to process much more cue information than would be possible for a human perceiver; (2) it employed available cue information more reliably in making its judgments. At the same time, W&K make clear that the machine was, essentially, doing what the human judges were doing: assigning stereotypically "feminine" male faces and stereotypically "masculine" female faces to the "gay" category. Emphasis on stereotyping. The machine is not even, necessarily, a good model of human "gaydar", because it's likely that people rely on other aspects of appearance and behavior to make these judgments -- a man who has an inordinate interest in musical theater, perhaps, or a woman who's really into carpentry. Of course, these too are stereotypes. It's stereotypes all the way down. 
And not necessarily accurate stereotypes, either.

Yes, stereotypes can be accurate, in the sense that they can accurately capture what a group is like on average, even if they're not accurate with respect to all the individual group members. In fact, Lee Jussim (Behavioral & Brain Sciences, 2017) has argued that even racial and gender stereotypes are more accurate than usually believed. I think his evidence is actually pretty weak, but he's right in principle that stereotypes are not necessarily inaccurate representations of groups.

Moreover, even accuracy of 74-91% shouldn't be overestimated, because of the low base rate of homosexuals in the population. Consider this example taken from an article about the W&K study which discusses other controversies surrounding this study ("Why Stanford Researchers Tried to Create a 'Gaydar' Machine" by Heather Murphy, New York Times, 10/10/2017). Assume, for purposes of argument, that 5% of the population is gay. A facial-recognition algorithm that is 91% accurate would mistakenly classify 9% of straight people as gay, and 9% of gay people as straight. In a sample of 1000 individuals, that would mean that 4 or 5 of the 50 homosexuals ((1000 x .05) x .09) would be mistakenly classified as straight, while as many as 85 of the 950 heterosexuals ((1000 x .95) x .09) would be mistakenly classified as gay. Of the roughly 131 people the algorithm labels as gay, then, only about 45 actually would be. The problem is not so much with the algorithm as with the base rates: with a low-baserate event, like homosexuality, there are going to be a lot of mistaken classifications.
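The base-rate arithmetic can be spelled out in a few lines. This is a sketch of the hypothetical example only (1000 people, a 5% base rate, and a classifier that is 91% accurate for both groups); the function name is ours, and none of these figures are results from the W&K study itself.

```python
def classify_population(n: int, base_rate: float, accuracy: float) -> dict:
    """Expected classification counts when the same accuracy applies to both groups."""
    gay = n * base_rate
    straight = n - gay
    true_pos = gay * accuracy              # gay, classified as gay
    false_neg = gay * (1 - accuracy)       # gay, classified as straight
    false_pos = straight * (1 - accuracy)  # straight, classified as gay
    return {
        "false_neg": false_neg,
        "false_pos": false_pos,
        "labeled_gay": true_pos + false_pos,
        # Of everyone labeled gay, the share who actually are (precision).
        "ppv": true_pos / (true_pos + false_pos),
    }


result = classify_population(n=1000, base_rate=0.05, accuracy=0.91)
for key, value in result.items():
    print(f"{key}: {value:.2f}")
```

With these inputs the expected counts are 4.5 gay people misclassified as straight and 85.5 straight people misclassified as gay, so only about 35% of the people labeled gay actually are, despite the 91% accuracy.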

The W&K study was parodied in the New Yorker in "Modern Science", by Paul Rudnick, the American playwright and humorist (12/04/2017). Excerpts follow:

On several occasions, when a photo of an especially attractive subject was scanned, the hardware would disappear from the lab for many hours and then return with a sheen of perspiration and the categorization "YES....

The presence of a single arched eyebrow and a slight contraction of the lips cannot be used as evidence of male homosexuality, except when the subject is examining furniture from West Elm....

The algorithm was able to ascertain sexual preference with 98% accuracy when using only photos of the subjects' shoes....

Photos of male, female, and non-binary subjects currently attending progressive liberal-arts colleges refused to be categorized as "gay" or "straight," and made disgusted noises.

Facial Recognition and Artificial Intelligence

Ekman's work, and research like W&K's study of "gaydar", signaled a trend toward the use of artificial intelligence and machine learning to create algorithms for facial recognition. In an important Op-Ed article in the New York Times, Sahil Chinoy (a UCB graduate in physics and economics who worked at the Times before going on to graduate school in economics at Harvard) discusses some of the problems with the practice ("The Racist History Behind Facial Recognition", 07/14/2019). See also "Spying on Your Emotions" by John McQuaid, Scientific American, 12/2021.

One of these problems, at least from the point of view of social policy, is the "perpetual lineup" problem: if a photograph (from, say, closed-circuit TV) can be matched against millions of photographs from a database of driver's licenses, then, in a sense, we're always under surveillance. We are very quickly headed toward a surveillance society in which we're always being watched, identified, and tracked whenever we're outside the privacy of our own homes. And even in our homes, we're already part of a surveillance economy in which our every Google search and Facebook like will result in an advertisement appearing on our computer screens.

Another problem, from the point of view of psychological research and theory, is that the idea of identifying people's internal mental (especially emotional) states from their facial expressions, bodily postures, gestures, and the like may be simply wrongheaded. Ekman's work, and other work like his, has been severely criticized by Lisa Feldman Barrett and other researchers who point out that the correlations between facial expressions and emotion are far from perfect. Chinoy cites a report from the AI Now Institute, which argues that, based on the current state of both scientific knowledge and computer technology, widely available AI systems for identifying race, sexuality, emotions, and personality traits are "being applied in unethical and irresponsible ways".

In his article, Chinoy traces the current enthusiasm for facial recognition technology back to its roots in the 19th-century pseudosciences of phrenology and physiognomy. In phrenology, people's traits and states are identified by virtue of bumps and depressions in their skulls which ostensibly correspond to high or low levels of benevolence or conscientiousness. In physiognomy, traits are thought to correspond to people's physical appearance -- a person who looks like a fox, for example, was thought to be sly. We laugh at such notions, perhaps, and recognize that they're based on the crudest form of stereotyping. But these ideas have staying power. Sir Francis Galton, who almost single-handedly invented psychometrics in the late 19th century, superimposed pictures of convicts one on the other, hoping that the average would reveal "the essence of the criminal face". And Cesare Lombroso, a 19th-century proponent of physiognomy, argued that intellectual inferiority could be determined from face and body measurements. More recently (2016, to be exact), a group of Chinese researchers employed essentially the same method in an attempt to reveal the "average face" corresponding to criminality.

Prior Probabilities and Baserate Neglect

The ability of people to detect other people's sexual orientation, even with degraded exposure, is impressive. Still, as with Ekman's studies of lie-detection, it should not be exaggerated, because there is a subtle procedural feature that magnifies the subjects' accuracy levels. Not to pick on it, because it's a perfectly good study as far as it goes, but let's take the Lyons study as an example. Like most other signal-detection studies, the "signal" (i.e., a gay target) was "on" for half the trials -- that's just how these studies are done. And when half the targets were gay, the subjects were pretty good -- though far from perfect -- at "detecting" their sexuality. But the problem is that, in the real world outside the laboratory, half the targets aren't gay. A reasonable estimate of the proportion of gays in the population is closer to 5%, and that changes everything.

Bayes' Theorem

The reason it changes everything has to do with Bayes' Theorem, first proposed by Thomas Bayes, an English clergyman who also dabbled in statistics, in the 18th century (the origin myth is that he was trying to formulate a statistical proof of the existence of God). The problem in Bayes' Theorem is to determine the likelihood that some proposition (A) is true, given some observation or evidence (B). Bayes argued that in calculating this probability, you have to take account of the base-rates: (1) first, the probability that A is true, regardless of B; (2) and second, the probability that B is true, regardless of A.

Here's a restatement and expansion of Bayes' Theorem, so you can see how the calculations work out in what follows. For a nice introduction to Bayes' Theorem, see the fabulous book by Reid Hastie and Robyn Dawes, Rational Choice in an Uncertain World (2001; 2nd Ed., 2010).
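
The formula itself does not appear in the text at this point, so here is a standard statement of it, using A for the proposition and B for the evidence as defined above. The "not-A" notation for the alternative hypothesis is an assumption of this sketch.

```latex
% Bayes' Theorem
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}

% Expanded form: the base rate of the evidence B is written out
% in terms of the two ways B can occur (A true, or A false)
P(A \mid B) = \frac{P(B \mid A)\,P(A)}
                   {P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)}
```

In the expanded form, P(A) is the base rate of the hypothesis, P(B | A) is the hit rate, and P(B | not-A) is the false-alarm rate.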

Applying Bayes' Theorem to Gaydar

In the current context, the accuracy of gaydar can be reformulated as follows:

The hypothesis is that the target is gay.
The evidence is the face in the picture.
What is the likelihood that the target is gay, given the evidence at hand -- that is, the features of the face presented to the subjects -- and taking into account the base-rates of both gayness and those features?

In a friendly critique of the Lyons study, Ploderl (2014) applied Bayes' theorem, which takes account of base rates, to the calculation of detection accuracy. Given a base rate of 5%, a hit rate of 70% and a false-alarm rate of 20% (both figures are reasonably close to what Lyons found) would yield "gaydar" accuracy of only 15%. Even a more liberal base-rate estimate of 10% increases gaydar accuracy only to about 28%. That's still not bad: as Dr. Johnson once said about a dog who could walk on its hind legs, "It is not done well; but you are surprised to find it done at all". Still, accuracy of 15-28% is a lot lower than 70%.
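
As a worked check on the 5% case, here is the calculation written out, assuming the hit rate of .70 and the false-alarm rate of .20 quoted above. The labels "gay" and "judged gay" are just shorthand for the hypothesis and the evidence.

```latex
P(\text{gay} \mid \text{judged gay})
  = \frac{(.70)(.05)}{(.70)(.05) + (.20)(.95)}
  = \frac{.035}{.035 + .190}
  = \frac{.035}{.225}
  \approx .16
```

That is in line with the figure of roughly 15% cited above: most of the targets judged to be gay would in fact be straight.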

As Lyons et al. (2014b) pointed out in reply, Ploderl's analysis undercuts the ecological validity of their findings to some extent, but the point of the study was simply to demonstrate that gaydar can be accurate at all. This finding then will motivate future laboratory research intended to identify the valid (and invalid) cues to sexual orientation. For that purpose, the problem of baserate neglect (so named by Kahneman & Tversky, 1974) isn't really a problem. These studies did not identify the precise visual cues that the subjects employed to make their judgments, though other research has shown that judgments of sexual orientation are based largely on stereotypes: men who have "feminine" features, and women who have "masculine" features, are more likely to be classified as gay.

Applying Bayes' Theorem to Lie-Detection

A similar problem crops up in the Ekman studies.

As already noted, his targets were already an unrepresentative sample of liars who were known to "leak" facial cues to deception.

Only about 1/3 of his liars were "leakers", meaning that most liars don't leak.
And meaning that the facial cues leaked by these leaky liars lose validity when applied to the population of liars as a whole.

But again, in the Ekman study, half of the targets were liars and half were not, which -- as with the homosexuals in the Lyons study -- probably overestimates the proportion of liars in the population. Perhaps not, of course, depending on the population!

But consider the problem of an airport TSA screener, who is asking each passenger, essentially, "Are you going to blow up this plane?". The baserate is going to be low, certainly lower than 50%. If the baserate is 10% -- on September 11, 2001, 4 of the 37 passengers on Flight 93 were hijackers -- then the true level of accuracy is going to go down from the best accuracy rate of 70% or so obtained by Ekman et al. (1999) from their sample of sheriffs and federal officers.
Even more so when you consider that, in 2001, Newark Airport handled more than 30 million passengers, or about 2,500,000 passengers per month, or about 83,000 per day -- which means that the ratio of lying hijackers on September 11 was about 1:21,000!

Again, in the current context, the accuracy of lie detection can be reformulated as follows:

The hypothesis is that the target is lying.
The evidence is whatever nonverbal and verbal cues are available to the perceiver.
What is the likelihood that the target is lying, given the evidence at hand -- that is, the constellation of nonverbal and verbal cues -- and taking into account the base-rates of both lying and those cues?

So now let's return to the question of lie detection, and apply Bayes' Theorem to that situation. To begin with, here's an example from the Conceptual Tools website developed by Neil Cotter, a professor of electrical engineering at the University of Utah. In his discussion of Bayes' Theorem, he considers a polygraph lie detector that detects lies with about 90% accuracy. That is, the probability that the lie detector says "You lied" when you really did lie is .89, and the probability that the machine says "You told the truth" when you really did tell the truth is .90.

Legend	Probability
DL = Detector says "You Lied"	p(DL | L) = .89
DT = Detector says "You told the Truth"	p(DT | L) = .11
L = You actually Lied	p(DL | T) = .10
T = You actually told the Truth	p(DT | T) = .90

According to Bayes' Theorem, however, the situation is not that simple, because we have to take into account the baserate of lying -- which is probably not the 50% rate built into a standard signal-detection experiment. Let's assume, in fact, that lies are relatively rare, occurring only 5% of the time. So, to make a long story short, we have to multiply the raw probabilities by their respective base-rates.

Thus, p(L) = .05;

and p(T) = .95.

Applying Bayes' Theorem, p(L | DL) = ((.89)*(.05)) / (((.89)*(.05)) + ((.10)*(.95))) = .32. In other words, what looked like an accuracy rate of 9/10 is reduced to about 1/3.

Now what about human lie-detection?

Let's consider first the Bond & DePaulo (2006) study, which found an overall accuracy of 54%. But those studies employed targets who were 50% liars and 50% truthtellers. If we follow Cotter's example, above, and assume that the base rate of lying is 10% (not his 5%), the probability of correctly identifying a liar drops to p = .12 -- a substantial loss of predictive accuracy.

Now let's turn our attention to Ekman et al.'s (1999) study of lie detection. Recall that their best human lie-detectors, most of whom were CIA agents, were 80% accurate in detecting liars. They were 66% accurate in detecting truthful statements, which means that they called truth-tellers liars 34% of the time. But that was with 50% liars in the target pool. Suppose that the proportion of liars is smaller than that -- say, 10%. Applying Bayes' Theorem, the accuracy of these Federal Officers drops considerably, to p = .21.

If the baserate of lying is 5%, as in the Cotter example above, accuracy drops even further, to about p = .11.

Now let's imagine that Ekman's Federal officers were working as Transportation Security Administration (TSA) agents on the morning of September 11, 2001, when 4 hijackers boarded United Airlines Flight 93 at Newark International Airport. Let's assume that the hijackers lied about their intention to hijack the plane. There were 37 passengers on that flight, so the proportion of liars among the passengers is .11. Applying Bayes' Theorem, we can determine that the likelihood of correctly identifying even one of the hijackers falls from .80 to .23. But it's actually worse than that, because roughly 82,000 passengers passed through Newark Airport that day, only 4 of whom were lying hijackers. Again applying Bayes' Theorem, we determine that the likelihood of correctly identifying one of the hijackers falls to p = .0001 -- that's 1 chance in 10,000.
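
The calculations in this section all have the same form, so they can be reproduced with one small function. The sketch below is only an illustration: the function name is made up, and for the Bond & DePaulo case it assumes that 54% overall accuracy means a hit rate of .54 and a false-alarm rate of .46, which is an assumption of this example rather than something reported in that study.

```python
def posterior(hit_rate, false_alarm_rate, base_rate):
    """Bayes' Theorem: P(lying | judged to be lying)."""
    true_positives = hit_rate * base_rate
    false_positives = false_alarm_rate * (1 - base_rate)
    return true_positives / (true_positives + false_positives)


# Cotter's polygraph example: p(DL|L) = .89, p(DL|T) = .10, p(L) = .05
print(round(posterior(0.89, 0.10, 0.05), 2))

# Bond & DePaulo (2006): 54% accuracy, assumed symmetric, 10% base rate
print(round(posterior(0.54, 0.46, 0.10), 2))

# Ekman et al. (1999): hit rate .80, false-alarm rate .34
for base_rate in (0.10, 0.05, 0.11):
    print(base_rate, round(posterior(0.80, 0.34, base_rate), 2))

# 4 hijackers among roughly 82,000 passengers
print(round(posterior(0.80, 0.34, 4 / 82000), 4))
```

The hit rate and the false-alarm rate never change from one call to the next; only the base rate does, and that alone is enough to move the result from about 1 in 5 to about 1 in 10,000.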

All of this is not to criticize Bond and DePaulo, or Ekman. These are just illustrations of how difficult it is to form accurate perceptions of people with respect to low-probability features.

Figure and Ground in Person Perception

For a variety of reasons, most work on nonverbal aspects of person perception is based on the assumption that certain physical stimuli are intrinsically related to certain mental states. One example is Ekman's work on facial expressions of emotion; another is Zebrowitz's work on babyfacedness. But one of the important lessons of nonsocial perception is that stimulation is ambiguous, forcing perceivers to go "beyond the information given" by the stimulus to form mental representations of the external world.

One way of going "beyond the information given" is to draw on pre-existing knowledge, expectations, and beliefs.
Another way is to pay attention to the context in which the stimulus occurs.

Context also played a role in the controversy over Elian Gonzalez, discussed earlier. Members of the anti-Castro Cuban exile community played up the obvious fear on Elian's face, as he was taken from his uncle in Miami. But the Clinton Administration published a later photograph of an apparently happy Elian reunited with his father. William Safire, a conservative columnist for the New York Times (and former speechwriter for Richard Nixon), wrote of the episode (04/24/00):

"Which photo do you believe? Photograph #1... was shot by Alan Diaz of the Associated Press. Were other shots taken in the instants before and after? Was anything posed? Who was first to grab the child? What were the subjects saying. The news organization is objective; we'll believe its report.

"Photograph #2... was credited "courtesy of Juan Miguel Gonzalez," carefully posed for propaganda purposes. It was taken -- after nobody knows how much cajoling -- by Gregory Craig, President Clinton's personal lawyer, who was hired by [a] left-wing church group serving Fidel Castro's interests"

Stimulus and Context in "Academic" Art

Up through the 19th century, so-called "academic" artists -- that is, artists who had been schooled in one of the Academies of Fine Arts, as opposed to those who had been apprenticed to a master -- were taught formal rules for portraying various emotions through certain facial expressions, postures, and gestures. These artists followed these rules, so that viewers would understand what their paintings were attempting to convey. However, many artists discovered that they could use these same expressions, postures, and gestures to convey precisely the opposite emotion, depending on the context. They could depend on the fact that their viewers shared the same knowledge and expectations as the painter, and so would interpret the painting properly.

In an example made famous by the art historian Edgar Wind (1937, 1986), the 18th-century English portrait painter Joshua Reynolds (who became the first president of the British Royal Academy) noted that

"There is a figure of a Bacchante [also known as a Maenad] leaning backward, her head thrown quite behind her, which... is intended to express an enthusiastick frantick [both sic] kind of joy.... This figure Baccio Bandinelli, in a drawing... of the Descent from the Cross, has adopted... for one of the Marys, to express frantick agony of grief."


The Bacchante	The Cross	Comparison

Here is another example, also from Reynolds via Wind:


Another Bacchante	Another Cross	Another Comparison

The same posture of the body, the same gesture of the arms, the same expression on the face -- these convey frantic joy and enthusiasm when presented in the context of the drunken, licentious, orgiastic revels associated with Bacchus, the Roman god of wine (from whose name we get the word bacchanalia; he is also known as the Greek god Dionysus); but they convey frantic grief and agony when presented in the context of the crucifixion and death of Jesus, whom Christians believe to be the Word of God incarnate. The difference in interpretation is created by differences in the context in which the figure is presented.

Other art historians have noted similar instances of the contextual reversal of emotional meaning.

For example, E.H. Gombrich (who himself wrote many interesting books about the psychology of art) reports that Aby Warburg (1866-1929), a German historian of Renaissance art (the Warburg Institute in London is named after him), took a great interest in such context-based "inversions" of meaning.

	"The artist who uses [classical 'engrams', or symbolic poses]... can use them in a different context, 'invert' their original savage meaning, and yet benefit from their value as expressive formulae. In this way, Bertoldo di Giovanni had used the model of a pagan maenad to give expression to the passionate grief of the Magdalen under the cross,
	and Donatello... used a relief on the cover of a sarcophagus, in which Pentheus is torn to pieces by maenads, for a composition of his own in the Santo in Padua. Where pagan frenzy had represented a leg torn off by insensate women, the Christian artist 'inverted' the scheme to glorify the healing of a broken leg.
	Similarly Agostino di Duccio's relief representing the rescue of two children by San Bernardino adapts the forms but 'inverts' the meaning of a classical sarcophagus which represents the terrible tale of Medea's murder of her children."

Wind notes: "Perhaps the shrewdest advice Sir Joshua Reynolds gave his students was... a fundamental law of human expression:

'It is curious to observe, and it is certainly true, that the extremes of contrary passions are with very little variation expressed by the same action'".

This is possible because context changes the perception of the stimulus; or, put another way, the stimulus varies from context to context.

The moral of the story is that in order to properly perceive an object or event, including -- especially -- a social object or event, we must extract information from the stimulus in context, because the context is also part of this stimulus, and combine information extracted from the stimulus-in-context with pre-existing knowledge, expectations, and beliefs stored in memory.

Perception involves extracting information from the stimulus. But as F.C. Bartlett (1932) forcefully reminded us, "The psychologist, of all people, must not stand in awe of the stimulus". As Jerome Bruner has argued, perception entails "going beyond the information given" by the stimulus, combining information from the stimulus, and its environmental context, with other knowledge, beliefs, and expectations -- what might be called the cognitive context of perception.

Perceiving Behavior

The study of social perception is dominated by the problem of person perception, or impression formation, but we don't just perceive people: we also perceive their actions.

Wegner and Vallacher proposed their action identification theory to describe what goes on when we think about our own actions, and the actions of others that we observe. In particular, they were interested in how we focus on low-level details or high-level gist, and on how the meanings of events change as we get closer to them, or further away, in time.

The Accuracy of Person Perception

One implication of the self-fulfilling prophecy, and related effects, is that it doesn't really matter whether an actor's perceptions, expectations, and beliefs are correct -- all that matters is what they are, because those internal mental states determine our behavior. But obviously, accuracy is important. The purpose of perception is to enable us to know the world around us. This raises the question of the accuracy of social perception: do the mental representations of the objects and events we encounter in the social world accurately reflect their actual existence, structure, and states? The question of accuracy arises in nonsocial perception as well, as exemplified by research on visual and other illusions. The Gibsonian approach assumes that the perceptual apparatus evolved in such a way as to enable us to perceive the world the way it really is. But the constructivist approach admits that even our nonsocial percepts can be biased and distorted by knowledge, beliefs, and expectations. This must be even more the case in the social domain, given the vague, fragmentary, and ambiguous nature of the stimuli we encounter in the social world.

As Kenny and Albright (1987) note, interest in the accuracy of person perception has its roots in the intelligence-testing movement in the early 20th century. If psychologists could measure individual differences in intellectual skills, then they ought to be able to measure individual differences in social skills as well. Chief among these is empathy, which we can define as a person's ability to understand the attitudes, feelings, and experiences of another person. And, as a matter of sheer logic, empathy requires accuracy in social perception -- the ability to accurately read another person's mental states. Accuracy was also an issue in the analysis of clinical decision-making -- that is, whether psychiatrists and clinical psychologists were accurate in diagnosing mental illness, or predicting the outcome of treatment. Not to mention personnel selection, including college admissions: is an applicant the "right person" for a particular job or school?

Clinical vs. Statistical Prediction

A major feature of this early work was a debate over clinical vs. statistical prediction -- that is, whether clinicians' impressions of patients, derived from their subjective appraisal of the patients themselves, interview records, and psychological testing, were superior to "actuarial" predictions derived by techniques such as multiple regression from objective data. The answer, quite clearly, was no. This was the finding of early studies by Sarbin (1943) and Meehl (1954), and has been borne out by virtually every study since then (e.g., Mischel, 1968; Wiggins, 1973).

No matter what the assessment context, statistical methods of combining information have proved to be generally superior to impressionistic "clinical" methods.
The vanishingly few exceptions to this rule, which found that clinical prediction is superior to statistical prediction, were methodologically flawed; once the flaws were corrected, they showed, once again, that statistical prediction was superior.
In some studies, clinical and statistical prediction were tied; but because statistical prediction is more economical (in terms of time, effort, and dollars) than clinical prediction, statistical prediction wins on grounds of utility (efficiency, cost-benefit analysis).

Regression equations based on unit weighting predict as well as more complicated ones based on actual beta weights.
Predictions based on objective tests have more utility than those based on "projective" tests like the Rorschach or Thematic Apperception Test.
Predictions based on the paper record have more utility than those based on interviews -- which is only one reason to oppose the way we organize Visitors' Day for potential graduate students.

The only exception is the critical incident interview pioneered by David C. McClelland and his associates, which is tantamount to a performance-based work-sample. But hardly anyone ever does a genuine critical incident interview, so the point is moot!

An excellent example of the virtues of statistical prediction comes from a study of objective lie detection by Hartwig & Bond (2014). They acknowledged, based on reviews such as DePaulo et al. (2003) and Bond & DePaulo (2006), that subjective lie detection is not very good. That is, people's subjective impressions of whether people are lying are not very accurate -- barely better than chance. They then raised the question of the accuracy of objective lie detection. That is, given all the various cues to deception, could a statistical algorithm combine all the available data to produce predictions of deception that would surpass the subjective, "clinical" impressions of human judges? Of course it could. For this purpose, H&B calculated the correlation coefficients between deception and each of some 60 cues surveyed by DePaulo et al. (2003), and then entered these correlations into a multiple regression equation. Although the validity of the individual cues was relatively low, M |r| = .24, the multiple R = .52.

As a matter of statistics, such correlations are inflated by chance associations, so the proper method is to engage in double cross-validation. That is, you divide your sample in half, calculate R for each half separately, and apply each of the resulting regression equations to the other half of the sample -- that is, the sub-sample from which it was not derived. The resulting cross-validity coefficient R was .42. This is lower than .52, to be sure, but it's still pretty good. The cross-validated multiple-regression equation yielded a classification accuracy of 68%, a substantial improvement over the 54% seen in the subjective, impressionistic, "clinical" judgments. The multiple Rs were highly stable across various conditions, such as the liar's demographic background, the motivation to deceive, the deception medium (visual, oral, written), etc.
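
The logic of double cross-validation can be sketched in plain Python. This is a minimal illustration on synthetic data, not a reconstruction of Hartwig and Bond's analysis: the five "cues", their weights, the sample size, and all function names are assumptions made up for the example.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def fit_ols(rows, ys):
    """Least-squares weights (intercept first) from the normal equations."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(X, ys))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        pivot_value = A[col][col]
        A[col] = [v / pivot_value for v in A[col]]
        for r in range(k):
            if r != col:
                factor = A[r][col]
                A[r] = [v - factor * p for v, p in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


def predict(weights, rows):
    return [weights[0] + sum(w * x for w, x in zip(weights[1:], r)) for r in rows]


def double_cross_validate(rows, ys):
    """Fit on each half, then score on both the same half and the other half."""
    half = len(rows) // 2
    halves = [(rows[:half], ys[:half]), (rows[half:], ys[half:])]
    results = []
    for fit_idx, test_idx in ((0, 1), (1, 0)):
        w = fit_ols(*halves[fit_idx])
        r_fit = pearson(predict(w, halves[fit_idx][0]), halves[fit_idx][1])
        r_cross = pearson(predict(w, halves[test_idx][0]), halves[test_idx][1])
        results.append((r_fit, r_cross))
    return results


# Synthetic data (assumption): five weak cues, each only modestly valid
rng = random.Random(0)
rows, ys = [], []
for _ in range(200):
    cues = [rng.gauss(0, 1) for _ in range(5)]
    rows.append(cues)
    ys.append(0.25 * sum(cues) + rng.gauss(0, 1))

for r_fit, r_cross in double_cross_validate(rows, ys):
    print(f"R in derivation half = {r_fit:.2f}, cross-validated R = {r_cross:.2f}")
```

The R obtained in the half used to derive the weights capitalizes on chance in that half. The cross-validated R, obtained by applying those same weights to the other half, is the more honest estimate, and it can never exceed the R that the other half would yield from its own best-fitting weights.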

Hartwig and Bond (2014) concluded that "signals of deception are manifested in constellations rather than single cues". But the more basic point is that there are, after all, valid cues to deception. The problems with subjective, "clinical" judgments of deception are:

Judges may pay attention to invalid cues.
Judges may not pay attention to valid cues.
Judges may weight valid cues more, or less, strongly than they deserve.
Judges may combine valid cues in sub-optimal ways.

These problems do not arise when the data are combined in an objective, actuarial manner by means of statistical formulas such as multiple regression.

We'll return to the difference between subjective and objective cues for deception later.

Cronbach's Analysis of the Accuracy Problem

As if the advantage of statistical over clinical prediction weren't bad enough, in 1955 Cronbach began publishing a series of papers that called into question most of the research on accuracy that had been published up to that time. In order to understand Cronbach's critique, consider a simple impression-formation experiment in which a group of subjects must judge each of a set of targets on a set of traits (like the Big Five). In his analysis, Cronbach argued that both the judgments and the criteria against which they are validated consist of four components of accuracy.

Never mind, for a moment, what the objective criterion for a trait like "extraversion" is! Assume that the judges are basing their impressions on video clips of the targets, and that their impressions are being validated against the targets' responses to an objective personality questionnaire. Never mind, for a moment, the validity of the personality questionnaire!

Elevation: the tendency of a judge to rate all targets favorably or unfavorably.

For example, a judge might rate all targets favorably on all the Big Five dimensions.

Stereotype Accuracy: the tendency of a judge to rate all targets more or less favorably on a particular trait, compared to other judges.

For example, a judge might rate targets more favorably on Extraversion than other judges do.

Differential Elevation: the tendency of a judge to view some particular target more or less favorably than other judges do.

For example, a judge might rate a particular target more favorably on all the Big Five Dimensions.

Differential Accuracy: the judge's view of a particular target on a specific trait, after the other three components have been removed from the equation.

For example, a judge might rate a particular target more favorably on Extraversion than other judges do.

Cronbach's basic point was that differential accuracy is at the heart of accuracy, because it has to do with the uniqueness of the individual target. And it can't be evaluated until the other components of accuracy have been accounted for, and removed from consideration -- which, Cronbach argued, hardly anyone had done up to that point.
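
One common way to make the idea of "removing" components concrete is to treat a single judge's ratings as a target-by-trait table and split it into additive parts. The sketch below is a simplified illustration of that idea, not Cronbach's own formulas: the function name, the two-by-two data, and the mapping of row and column effects onto his component labels are assumptions of this example. The same split would be applied to the criterion scores, and the corresponding parts compared.

```python
from statistics import mean


def cronbach_components(ratings):
    """Split a target-by-trait table into four additive parts.

    ratings[t][k] is the rating of target t on trait k.
    """
    n_traits = len(ratings[0])
    # Overall level of the ratings (elevation)
    grand = mean(v for row in ratings for v in row)
    # How each target's average departs from the overall level (differential elevation)
    target_fx = [mean(row) - grand for row in ratings]
    # How each trait's average departs from the overall level (stereotype component)
    trait_fx = [mean(row[k] for row in ratings) - grand for k in range(n_traits)]
    # What is left for each target on each trait (differential component)
    residual = [[row[k] - grand - target_fx[t] - trait_fx[k] for k in range(n_traits)]
                for t, row in enumerate(ratings)]
    return grand, target_fx, trait_fx, residual


# Hypothetical ratings of two targets on two traits
judge = [[6, 2],
         [2, 2]]
grand, target_fx, trait_fx, residual = cronbach_components(judge)
print(grand, target_fx, trait_fx, residual)
```

Only the residual table reflects what the judge says about a particular target on a particular trait once overall level, target averages, and trait averages have been set aside, which is why the other three parts have to be removed before differential accuracy can be assessed.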

Cronbach's point was well taken, but his critique had the unintended effect of stopping research on accuracy dead in its tracks.

Experimental social psychology shifted its focus to attitudes and persuasion, as well as the study of situational influences on behavior, as exemplified by the Milgram experiment.
It did not help, probably, that the cognitive point of view, as exemplified by the Thomas Theorem, symbolic interactionism, and the like, implied that "accuracy" was beside the point. What mattered was the way the actor perceived the situation, not whether the perception was accurate.
To make things even worse, beginning in the late 1970s and early 1980s the "errors and biases" program in social cognition (initiated in response to the early work of Kahneman and Tversky, among others) simply assumed that social perception was inaccurate, riddled with errors (like the Fundamental Attribution Error) and biases (like the self-serving bias in causal attribution).

Still, interest in accuracy did not disappear entirely. It was maintained by researchers in judgment and decision-making, who continued to be interested in the accuracy -- or, at least, the adaptiveness -- of social judgment. I'll have more to say about this line of research and theory in what follows. It was also maintained by a new generation of personality researchers, who insisted -- against the claims of social psychologists -- that personality traits really did exist, they were not merely figments of the imagination, and could be judged on the basis of behavior by external observers and targets themselves. Given the assumption that personality traits really did exist after all, it made sense to consider how accurate judgments of personality were. Hence, a new marriage was consummated, between personality and social psychologists, around the topic of person perception -- of impressions of personality and their validity.

Accuracy in Kenny's Social Relations Model

Kenny's (1994) Social Relations Model follows Cronbach's analysis by decomposing person perception into its constituent components, but because his research designs differ from Cronbach's, his components differ as well. Take the perception of interpersonal warmth as an example. The SRM considers that A's perception of B's warmth is given by the sum of four (4) quite different perceptions:

The constant: How warm people in general think that people in general are. This is also known as elevation, and is independent of the target.
The actor: How warm A thinks people in general are. This represents the judge's response set, also independent of the target.
The partner: How warm other people think B is in particular.
The relationship: How warm A thinks B is in particular.

Thus, the accuracy of A's perception of B depends on the accuracy of each of these component perceptions.
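
The four-part sum described above can be written compactly. The symbols here are chosen for this sketch and are not necessarily Kenny's own notation.

```latex
% A's perception of B's warmth as the sum of four components
X_{AB} = m + a_{A} + p_{B} + r_{AB}

% m      : the constant (how warm people in general think people in general are)
% a_A    : the actor effect (how warm A thinks people in general are)
% p_B    : the partner effect (how warm other people think B is)
% r_{AB} : the relationship effect (how warm A thinks B is in particular)
```

Written this way, it is clear that two perceivers can give B the same rating for quite different reasons, for example a large actor effect in one case and a large relationship effect in the other.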

Kenny & Albright (1987) review the literature on the accuracy of person perception from the standpoint of Kenny's Social Relations Model, and Kenny (1994) collects evidence about accuracy.

For Kenny, a major problem in person perception is that we rarely have independent, objective evidence of how a target stands on the characteristic in question. There is no meter giving a direct readout of B's level of interpersonal warmth. All we have is the evidence from B's behavior -- and, to make things more difficult, all A has is evidence of B's behavior in the presence of A. If we judge the accuracy of A's impression of B by the agreement between A's judgment and the consensus of others about B, we must assume that B behaves in the presence of A the same way that B behaves in the presence of those other people. If B behaves differently with A than he does with others, then all bets are off. And any error on A's part is no fault of A himself.

Actually, there are three types of target accuracy:

Perceiver accuracy: the correlation between how A tends to see others in general, with how others generally behave in the presence of A.
Generalized accuracy: the correlation between how A is seen by others and how A generally behaves in the presence of others.
Dyadic accuracy: the correlation between how A uniquely views B and how B behaves in the presence of A.

In addition, there are at least two other aspects of person perception that are important. Continuing with the example of interpersonal warmth:

Self-perception: How warm A believes himself to be.

The general finding, in western cultures at least, is for self-enhancement: people tend to see themselves as better than others.

One exception is neuroticism, where people tend to see others as more emotionally stable than they are.

In eastern cultures, one may find more evidence for self-effacement: people tend to see others as better than themselves.

Meta-Accuracy: How warm A believes that B believes that he, A, is.

The general finding is a perceiver effect, in which people believe that others see them in the same way that they themselves do.

Funder's Realistic Accuracy Model

In social perception, the issue of accuracy has usually been framed in terms of traits: if we say that a person is extraverted, is he really extraverted? What is the correlation between a judge's ratings of a target's extraversion and the target's true level of extraversion? Of course, this begs the question of whether personality traits such as extraversion actually exist. I tend to think that they don't, but for the purposes of these lectures I'm going to assume that they do -- for the simple reason that, since the first studies by Solomon Asch, this assumption lies at the core of almost all research on person perception. So we're stuck with it for purposes of exposition. Still, it has to be understood that, if the question is whether we perceive other people's traits accurately, it would be nice if those traits actually existed to be perceived.

Among the most prominent models of accuracy in person perception has been the Realistic Accuracy Model proposed by David C. Funder, now at UC Riverside (1995, 2012).

Funder first considers three different ways of measuring the accuracy of person perception:

Self-Other Agreement: how well a judge's rating agrees with a target's assessment of him- or herself.
Other-Other Agreement: how well two (or more) judges' ratings of a target agree with each other.
Behavioral Prediction: how well a judgment of personality correlates with the target's actual behavior. Funder argues that this is the "gold standard" for accuracy, but because it is difficult to conduct, most researchers settle for Self-Other agreement.

Because targets may not know themselves particularly well -- or, more likely, may describe themselves in a self-enhancing manner -- Funder generally discounts Self-Other Agreement as a criterion of accuracy, and favors either Other-Other Agreement (also known as inter-judge accuracy) or Behavioral Prediction. Of these latter two, Behavioral Prediction is the ultimate test. If traits exist, and dispose people to behave in particular ways, and people can form valid impressions of personality, then these impressions ought to predict what targets actually do.
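Each of the three criteria above comes down to a correlation computed across targets. The sketch below shows the arithmetic; all of the ratings and behavior counts are invented for illustration, and the variable names are my own rather than Funder's.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical extraversion data for six targets (all numbers invented).
self_rating = [6, 2, 5, 3, 7, 4]   # each target's rating of him- or herself
judge_1     = [5, 3, 5, 2, 6, 4]   # first judge's rating of each target
judge_2     = [6, 2, 4, 3, 6, 5]   # second judge's rating of each target
behavior    = [9, 2, 7, 3, 8, 6]   # some count of observed sociable acts

self_other  = pearson(judge_1, self_rating)   # Self-Other Agreement
other_other = pearson(judge_1, judge_2)       # Other-Other Agreement
prediction  = pearson(judge_1, behavior)      # Behavioral Prediction
```

The same judge can look more or less accurate depending on which of the three coefficients is taken as the criterion, which is why the choice among them matters.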

But there is a problem here, which is that a person's behavior in any particular situation is going to be influenced by the details of the situation itself -- by which I mean, of course (because this is a course on social cognition!), the details of the person's mental representation of the situation. Accordingly, "prediction" takes on a particular meaning, which is that "prediction" holds over the long run, across situations and through time. An extraverted person may not prefer to be in the company of other people all the time, in every situation, but he will want to be in the company of other people most times, and in most opportunities, or, at least, more often than not.

Funder's model is not a model of personality judgment in general, but rather a model of accurate personality judgment. That is, he is concerned with understanding the "moderating" conditions under which personality judgments are accurate, as defined above -- that is, the conditions that must be met in order for a personality judgment to be accurate. RAM specifies four elements in accurate personality judgment, and Funder's research program has been devoted to understanding the various factors that affect each of them.

Relevance: This is a property of the target stimulus. In order for the perceiver to render an accurate judgment, the target must display relevant behavior. So, for example, a judgment of neuroticism can only be accurate if, in fact, the target engages in "neurotic" behaviors -- or fails to do so in circumstances that would ordinarily elicit such behavior.
Availability: This is a property of the relationship between the perceiver and the target. In order for the perceiver to render an accurate judgment, the target must display relevant behaviors in a context that is available to the judge. If the target displays neurotic behavior at home, but the perceiver only encounters the target at school, then the perceiver will not have access to relevant information.
Detection: This is a property of the perceiver. If the perceiver does not pay attention to the target, then he will be unable to pick up on relevant behaviors displayed by the target.
Utilization: Relevant information, available to and picked up by the perceiver, has to be interpreted properly. If the target displays neurotic behavior, but the perceiver does not categorize this behavior as neurotic (perhaps because he, himself, is neurotic!), then the perceiver will not be an accurate judge of the target's personality.

All of which seems fairly straightforward.

Again, Funder's own research has focused on the factors -- the moderating variables -- that affect accuracy in person perception -- specifically, what makes a "good" target, trait, information, or judge. For example:

"Good targets" tend to be fairly consistent in their behavior across situations, which makes valid observations more available to perceivers.
"Good traits" tend to be more "visible", in a psychological sense. The Big Five are candidates for good traits.
"Good information" is, first and foremost, high in quantity: more is better. In terms of quality, behavior actually displayed in unstructured settings seems to be more informative than behavior displayed over a telephone in a highly constrained setting.
"Good judges" tend to be stable (non-neurotic) and agreeable; they also tend to be female.

Brunswik's Lens Model

Note. Much of the following discussion is heavily influenced by a paper by Reid Hastie and Kenneth A. Rasinski, "The Concept of Accuracy in Social Judgment", which appeared in The Social Psychology of Knowledge, edited by D. Bar-Tal and A. Kruglanski (1988). See also the very useful discussion by Hastie and Robyn Dawes in Rational Choice in an Uncertain World (2001; 2nd ed., 2010).

The first problem for social perception is the same as in the nonsocial domain: what information (if you will, the proximal stimulus) is displayed by the target (the distal stimulus), and how does stimulus information combine with the perceiver's pre-existing fund of knowledge and schemata to yield a perception of the person's internal mental state -- regardless of whether that perception is accurate? This is the problem addressed by Brunswik's lens model of perception.

The classic approach to accuracy in perception comes to us from Egon Brunswik (1947), a Hungarian psychologist who established the first psychological laboratory in Turkey (at the University of Ankara), but who taught for most of his career at UC Berkeley and was a close associate of E.C. Tolman. Brunswik's theoretical point of view, known as probabilistic functionalism, is best expressed in his monograph on Perception and the Representative Design of Psychological Experiments (1947; 2nd edition, 1956; see also Hammond, 1966, 1998).

Brunswik based his analysis of perception on his lens model, which argues that the individual perceives the world through a "lens" of imperfect cues (the diagram representing the model also looks a little like a lens). Recall that the goal of perception is to form an internal mental representation (what Brunswik called the achievement) of the distal stimulus. In this sense, "accuracy" may be defined in terms of the match between the features present in the stimulus and those cues present in the percept. Brunswik's model was quickly generalized to the realm of judgment and decision-making, which is where the lens model has been most frequently applied. For purposes of the present discussion, we can consider the perception of a person, or our impression of a person, as tantamount to a judgment concerning his internal mental states (beliefs, feelings, desires) and personality -- whether he's happy or angry, neurotic or extraverted.

Here is the basic vocabulary of the lens model, as applied to judgment and decision-making.

The target's actual internal states and traits are called the distal stimulus, inferred state, or -- simply -- criterion.

In nonsocial perception, these states are physical features such as whether the object is near or distant, stable or in motion, rigid or flexible.
In social perception, these states are psychological features such as whether the person is extraverted or introverted, happy or sad, gay or straight, lying or truthful.

The percept or mental representation of the target's states and traits is called the response, subject's inference, or -- simply -- judgment.
The relationship between the judgment and criterion is called the achievement, correspondence relationship, or inference.

Accuracy, then, reflects the degree of correspondence between the criterion and the judgment.

The distal stimulus presents proximal cues which form the basis for the judgment.
These cues vary in terms of their ecological validity, or the strength of correlation between the distal stimulus and the proximal cues available to the perceiver.

The cues themselves have objective values.
And because the cues are correlated with each other, there are objective inter-correlations among them.

The cues also vary in their cue utilization, or the extent to which they play a role in the subject's judgment.

On the subject's side, the utilized cues have a subjective value in terms of the inference he is trying to make.

These subjective values may be invalid.

And, in addition, we have to consider the subject's beliefs about the relations among the cues -- the subjective inter-correlations.

It goes without saying that these beliefs may also be invalid.

A great deal of research on social perception is concerned with the problem of cue utilization -- that is, what information do people use when forming impressions of another person. So, for example:

People base judgments of masculinity and femininity on certain facial features, such as length and width, sharp vs. rounded corners, shape of the eyes, eyebrow thickness, eyelash length, sharpness of cheekbones, and the like.
And they judge sexual orientation by whether a face is sex-atypical -- i.e., a masculine face on a woman or a feminine face on a man.

We can determine cue utilization by regressing the perceiver's judgment on these same cues -- a procedure known as policy capturing. Basically, we look at the correlation between each proximal cue and the subject's judgment about the target. A high correlation means that the cue is weighted heavily when making the judgment; a low correlation means that the cue is not weighted as heavily. Because there are (usually) many such cues available to the subject, multiple regression takes into account the correlations among the cues, and yields a value for each cue (often known as a beta weight) that reflects its unique contribution to the judgment.
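As a rough illustration of policy capturing, the sketch below computes standardized beta weights for the simplest case of two cues, using the standard closed-form solution for two predictors rather than a general regression routine. The cue names and all of the values are invented.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def beta_weights_two_cues(cue_a, cue_b, judgment):
    """Standardized regression weights for a judgment regressed on two cues.

    Each cue-judgment correlation is adjusted for the correlation
    between the two cues, so redundant cues do not get double credit.
    """
    r_ja = pearson(judgment, cue_a)
    r_jb = pearson(judgment, cue_b)
    r_ab = pearson(cue_a, cue_b)
    denom = 1 - r_ab ** 2
    beta_a = (r_ja - r_jb * r_ab) / denom
    beta_b = (r_jb - r_ja * r_ab) / denom
    return beta_a, beta_b

# Hypothetical cue values and warmth judgments for eight targets (all invented).
smiling         = [1, 4, 2, 5, 3, 6, 2, 5]
eye_contact     = [2, 3, 1, 5, 4, 4, 3, 6]
warmth_judgment = [2, 4, 2, 6, 3, 6, 3, 6]

beta_smiling, beta_eye_contact = beta_weights_two_cues(
    smiling, eye_contact, warmth_judgment)
```

With more than two cues the same logic applies, but the weights would ordinarily be obtained from a multiple-regression routine instead of this two-predictor formula.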

Obviously, though, this says nothing about the accuracy of these judgments, or the validity of the cues they're based on. And, as you might expect, there are many slips between the cup and the lip.

The distal stimulus may not make ecologically valid cues available for perception. If the cues are ecologically invalid, judgments based on them will be inaccurate.

A good example of this in the nonsocial case is the Ames Room, which is specifically constructed to provide incorrect cues to the observer.
In the social case, recall that about 2/3 of Ekman's targets did not "leak" valid cues to deception.

Cue utilization validity may be discrepant from ecological validity.

The observer may not utilize ecologically valid cues available in the stimulus environment.

Even with ecologically valid cues, the subjective values may differ from the objective values.
And their subjective inter-correlations may differ from their objective intercorrelations.

Obviously, if any of these factors are present, and to the extent that these factors are present, perception will be inaccurate and judgment invalid. And to the extent that perception is inaccurate or judgment invalid, the resulting behavior will be inappropriate or maladaptive. But of course, in order to determine accuracy or validity, we must have independent knowledge of the criterion -- whether the target is happy or sad, introverted or extraverted, lying or telling the truth, gay or straight.
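The slips described above can be seen in a toy example. The sketch below computes ecological validity (cue with criterion), cue utilization (cue with judgment), and achievement (judgment with criterion) for two cues. All of the data are invented, and they are deliberately constructed so that the perceiver leans on the wrong cue.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data for eight targets (all values invented for illustration).
criterion = [1, 0, 1, 0, 1, 0, 1, 0]   # the target's actual state, coded 1 or 0
judgment  = [1, 1, 0, 0, 1, 1, 0, 0]   # the perceiver's call on the same targets
cues = {
    "cue_x": [5, 2, 4, 3, 5, 3, 4, 2],   # built to track the criterion
    "cue_y": [4, 5, 2, 1, 5, 4, 1, 2],   # built to track the judgment instead
}

# Ecological validity: correlation of each cue with the criterion.
validity = {name: pearson(values, criterion) for name, values in cues.items()}
# Cue utilization: correlation of each cue with the perceiver's judgment.
utilization = {name: pearson(values, judgment) for name, values in cues.items()}
# Achievement: correlation of the judgment with the criterion.
achievement = pearson(judgment, criterion)
```

In this constructed case the valid cue is largely ignored, the invalid cue is heavily used, and achievement is zero, even though the stimulus did offer a valid cue.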

So, given this general framework, how do we measure the accuracy of social perception (and judgment)? Hastie and Rasinski (1988) outline four general strategies:

Direct comparison of the subject's judgment with some independently assessed criterion. This works best with an objective criterion, such as gender or sexual orientation, but it's possible to do this with less well-defined criteria.
Comparing one subject with another, to assess their level of agreement. If the two judges disagree, one of them must be wrong.
Examining cue utilization: If a judgment is correlated with a cue, but the cue is not ecologically valid, then the judgment will be wrong.
And similarly: If a judgment is not correlated with an ecologically valid cue, then that judgment will also be wrong.

That all seems simple enough, but again, the devil is in the details, and Hastie and Rasinski also note several shortcomings in research on the accuracy of social perception and judgment.

When the criterion is complex or unfamiliar, the experimenter and the subject may not share the same understanding of the criterion.
It can happen that the experimenter chooses a degenerate task, where there are no ecologically valid cues available to the subject.
The subject's judgment may not be easy to summarize on the experimenter's rating scale.
The target property may not "exist", in some objective sense, but may be a "unicorn" that exists only in the mind of the experimenter.
The criterion may exist, but reliable and valid measurement may be practically impossible.
The criterion may have been established by a normative model (e.g., normative rationality) that is not appropriate to the judgment task.
The judgment task may incorrectly assume that one criterion necessarily excludes another, in which case the experimenter (who renders one judgment) and the subject (who renders the other one) may both be correct!

The Lens Model and Lie-Detection

Hartwig and Bond (2011) applied Brunswik's lens model to the problem of lie detection. Based largely on the review by DePaulo et al. (2003), they identified 158 potential cues to deception in "ordinary lies" -- that is, lies that people tell in the ordinary course of everyday living, without benefit of special training in deception, and without cues to deception filtered out (or, as in the case of the Ekman studies described earlier, filtered in). Most of the studies reviewed had been concerned with the correlations between these cues and perceptions (i.e., subjective judgments) of lying. However, correlations with actual (i.e., objective) lying were available for about 1/3 of these cues.

There were several interesting results. First, there were some ecologically valid cues to deception -- that is, cues which were significantly correlated with actual lying on the part of the targets. These correlations were relatively weak, however. More of these cues were correlated with subjective judgments of lying, and the correlations were generally stronger.

This figure plots the relationship between ecological validity (i.e., the correlation between a cue and actual lying) and cue utilization for the cues employed in this study. The correlation is strongly positive, about r=.59, but it is clear that subjects do not use some cues that are ecologically valid, and use other cues that have little or no ecological validity. Even when subjects utilize an ecologically valid cue, they tend to give the cue more weight than is warranted by its ecological validity. There was relatively little overlap between the ecologically valid cues and the cues that were utilized by subjects in making their judgments.

An analysis like this shows the power of Brunswik's lens model as a framework for analyzing accuracy and error in social perception. Earlier studies were consistent with the conclusion that lie-detection is so poor because people are such good liars. But now we know that's only part of the problem, because there are ecologically valid cues to lying. They're relatively weak cues, to be sure, but they're valid cues nonetheless. At least equally at fault is the fact that people are poor lie-detectors. We pay attention to ecologically invalid cues, and weight even valid cues more strongly than they deserve. Put another way, there is a discrepancy between the ecologically valid cues available in the stimulus and the pattern of cue utilization on the part of the perceiver. The result is that accuracy -- or, in Brunswik's terms, achievement or functional validity -- is relatively low.

The Lens Model and Gaydar

A similar analysis could be offered concerning gaydar. If you consider base rates (applying Bayes' Theorem), people are not all that great at judging who is straight and who is gay. From the few studies that have been done on this topic, the suggestion has been made that perceivers' judgments of sexual orientation are based on such features as feminine faces on men and masculine faces on women (after all, the subjects in these experiments were shown only headshots, and were given no other information). From the perspective of Brunswik's lens model, then, we can offer a number of possibilities for understanding both accuracy and error in gaydar.

There are no ecologically valid cues to sexual orientation.
There are such cues, but they're not available in headshots. Maybe they're more visible in bodily posture, or behavior, or something else besides the face.
There are such cues, even in headshots, but people utilize these cues way out of proportion to their ecological validity -- perhaps by virtue of the power of social stereotypes concerning feminine gays and masculine lesbians.
There are lots of other possibilities.

Testing these possibilities is for the future (hint, hint); but the Hartwig and Bond (2011) application of the lens model to lie-detection shows what might be done.
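The point about base rates made at the start of this section can be worked through with Bayes' Theorem. In the sketch below, the hit rate, false-alarm rate, and both base rates are assumed values chosen only to show the arithmetic; none of them comes from the studies discussed here.

```python
def posterior(base_rate, hit_rate, false_alarm_rate):
    """P(target belongs to the category | perceiver says so), by Bayes' Theorem.

    base_rate: proportion of targets who actually belong to the category.
    hit_rate: P(perceiver says "yes" | target belongs).
    false_alarm_rate: P(perceiver says "yes" | target does not belong).
    """
    true_pos = hit_rate * base_rate
    false_pos = false_alarm_rate * (1 - base_rate)
    return true_pos / (true_pos + false_pos)

# Assumed perceiver: says "gay" for 60% of gay targets and 40% of straight targets.
# Assumed experiment: half of the headshots are of gay targets.
in_experiment = posterior(base_rate=0.50, hit_rate=0.60, false_alarm_rate=0.40)
# Assumed everyday setting: 5% of the people encountered are gay.
in_everyday_life = posterior(base_rate=0.05, hit_rate=0.60, false_alarm_rate=0.40)
```

With these made-up numbers, a "gay" judgment is correct 60% of the time in the balanced experiment, but only about 7% of the time when the base rate is 5%, even though the perceiver's sensitivity to the cues has not changed at all.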

"Erosion of Meaning" in Brunswik's "Revolutionary Concepts"?

Brunswik's notion of ecological validity was imported into social psychology by Martin Orne (1962; see also his critique of the Milgram experiments, discussed in the lectures on The Cognitive Perspective on Social Interaction), who used it to refer to the degree to which findings from an experimental situation could be generalized to the real world outside the laboratory. Others went further than Orne, to assert that laboratory research generally lacked ecological validity, and that researchers should focus their studies on the real world instead of the laboratory. For an example of this argument, see the debate between Neisser (1976) and Banaji and Crowder (1989) over "ecological" studies of memory (see also Kihlstrom 1996).

Some Brunswikian scholars, however, thought that, in the process, Brunswik's ideas had been distorted. Chief among these was Kenneth R. Hammond, especially in his essay "Ecological Validity: Then and Now" (1998). Here, for the record, is a summary of Hammond's restatement of three of Brunswik's "revolutionary concepts".

Representative Design. Just as the subjects in a study should be representative of the population at large, so the conditions of an experiment must be representative of the world outside the laboratory to which the lab results are to be generalized. For the record, Orne hewed precisely to Brunswik's idea -- arguing, for example, that the conditions of the Milgram experiment, such as the episodic nature of the experiment, the implicit contract between subject and experimenter, and the demand characteristics, were not representative of obedience situations as they occur in the real world.

Ecological Validity. As noted earlier, Brunswik's concept of ecological validity refers solely to the correlation between a proximal cue and the distal stimulus. Hammond has a point, that Orne (1970, p. 259) erred when he referred to "Brunswik's concept of the ecological validity of research" (emphasis added) -- because Brunswik's concept concerned only the ecological validity of cues. But by now, usage has evolved to the point that "ecological validity" has two distinct meanings: (1) in the context of perception research, the correlation between cues and distal objects; (2) in the context of research methodology, the degree to which an experimental situation is representative of the real-world situation it is intended to model. For Orne, this was an empirical question -- and because he himself did laboratory research, and developed techniques like the real-simulator design for evaluating the ecological validity (in his sense) of the experimental situation, Orne himself believed that laboratory research could be ecologically valid. Those who cast doubt on the ecological validity of psychological research, simply because it takes place in the relatively sterile confines of the psychological laboratory, miss both Brunswik's and Orne's points.

There is, however, a way to reconcile Brunswik's and Orne's construals of "ecological validity". Orne believed that ecological validity was threatened when the cues -- following Lewin, he called them demand characteristics, but that's another story -- communicated to subjects that the experimental situation differed from what was represented to them by the experimenter -- as when Milgram mis-represented his experiment as a study of punishment and learning, but allowed his experiment to contain cues that clearly suggested that something else was going on. In such an instance, the cues in the experimental situation have no ecological validity with respect to the misrepresentation, but instead have considerable ecological validity with respect to the real experiment. When the cues in the experimental situation are perceived one way by the experimenter, and another way by the subject, the experimental situation is ecologically invalid with respect to the real-world situation that the experimenter wants to study.

For more about Orne, see my essay, "Demand Characteristics in the Laboratory and the Clinic: Conversations and Collaborations with Subjects and Patients" (2002).

Intra-Ecological Correlation. Hammond also cites this as one of Brunswik's "revolutionary concepts", but he doesn't discuss it further in the 1998 essay cited. What Brunswik is referring to is what I (following Hammond) called the "objective intercorrelations" among proximal cues. Highly intercorrelated cues are redundant, and if the correlation is high enough, knowledge of the value of one can be substituted for a missing value of the other.
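The idea that one cue can stand in for another can be illustrated with a simple least-squares line. This is only a sketch of the substitution idea, not a procedure described by Brunswik or Hammond, and the cue values are invented.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for predicting y from x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

# Invented values of two highly intercorrelated cues observed on five targets.
cue_a = [1, 2, 3, 4, 5]
cue_b = [2.1, 3.9, 6.2, 7.8, 10.0]

slope, intercept = fit_line(cue_a, cue_b)
# A sixth target shows cue_a = 6, but cue_b was not observed.
# Because the cues are nearly redundant, cue_a gives a usable estimate of cue_b.
estimated_cue_b = intercept + slope * 6
```

The weaker the intercorrelation, the less trustworthy such a substitution becomes, which is why redundancy among cues is a property of the environment worth measuring in its own right.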

The Information for Perception

Stimuli provide information about themselves, which is available to the perceiver through his or her sensory surfaces. This information can take the form of physical features (alone and in configuration with other features), as well as linguistic descriptions. Additional information is provided by the context, the background, which in the social case is probably a lot broader than the kinds of contextual factors considered by Gibson. And finally, information for perception is provided by the perceiver's own beliefs and expectations, by which the perceiver fills in the gaps in the information provided by the stimulus and its context. This knowledge is stored in, and retrieved from, memory -- the topic to which we turn next.

This page last modified 11/17/2023.