Representation

Here's the question: What do memories look like? We're talking about secondary, or "long-term" memory here, but still the answer turns out to depend on what kind of knowledge we're talking about. There have been quite different proposals depending on whether we're talking about declarative or procedural knowledge, or episodic or semantic memory. In addition, there are different proposals about how conceptual knowledge -- an aspect of semantic memory, to be sure -- is represented in the mind. In this supplement, we'll focus on episodic memory, with some side glances at semantic memory, and then turn to conceptual representations as a special case of semantic memory. But, in the end, there's just one memory, and a task left largely undone is to figure out how to represent conceptual knowledge in the same cognitive architecture as episodic and semantic knowledge.

What's a Mental Representation?

A representation is just that: it's something that represents, or stands for, or models, something else. An event can be represented as a list of features, or as a sentence, or as a picture, or a string of digits, or a bunch of beer cans connected with string.

Anything can represent something else, so long as the representational system satisfies certain requirements outlined by UCB's Stephen Palmer (1978):

There is a target domain.
There is a modeling domain.
Some feature(s) of the structure of the target domain is (are) relevant.
Some feature(s) of the structure of the modeling domain is (are) relevant.
There is a systematic correspondence between the relevant structure in the modeling domain and the relevant structure in the target domain.

So, to continue Palmer's example:

A target domain might contain three rectangles of varying sizes.
A modeling domain might contain three sets of lines.
The relevant structural feature of the target domain is the height of the rectangle, its width being irrelevant.
The relevant structural feature of the modeling domain is the number of lines, their length being irrelevant.
There is a systematic correspondence between the height of the rectangles in the target domain and the number of lines in the modeling domain.
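Palmer's example is concrete enough to sketch in a few lines. The version below is a minimal illustration, assuming one line per unit of height; the names `Rectangle`, `model_rectangle`, and `preserves_order` are hypothetical. It treats "systematic correspondence" as order preservation: whenever one rectangle is taller than another, its model must contain more lines, whatever the widths.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rectangle:
    width: float   # structure in the target domain that the model ignores
    height: float  # the relevant structure in the target domain


def model_rectangle(rect: Rectangle, unit: float = 1.0) -> list:
    """Represent a rectangle by a set of lines.

    Only the NUMBER of lines carries information; every line is drawn the
    same, so line length is irrelevant by construction.
    Assumption: one line per `unit` of height, rounded to the nearest integer.
    """
    return ["|"] * round(rect.height / unit)


def preserves_order(targets, models) -> bool:
    """Check the systematic correspondence: for every pair, the taller
    rectangle must be modeled by more lines, and vice versa."""
    for a, ma in zip(targets, models):
        for b, mb in zip(targets, models):
            if (a.height > b.height) != (len(ma) > len(mb)):
                return False
    return True


rects = [Rectangle(width=5, height=2), Rectangle(width=1, height=4), Rectangle(width=3, height=3)]
lines = [model_rectangle(r) for r in rects]
print([len(m) for m in lines])         # [2, 4, 3]
print(preserves_order(rects, lines))   # True
```

Swapping two of the line sets breaks the correspondence, and `preserves_order` then returns False, even though the modeling domain still contains three perfectly good sets of lines.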

Let's now see how this idea of representation works out in the psychology of learning and memory.

The View from Associationism

Behaviorists like Watson had a simple answer to the question: memories look like associations between stimuli and responses -- because that's what everything is. This emphasis on associations as the basic structure of memory has proved remarkably durable -- though, as we will see, not in the way the S-R theorists framed them.

But first a little history, mostly taken from Anderson & Bower's Human Associative Memory (1973).

Aristotle's Associationism. The idea that associations are central to memory has its origins in Aristotle's treatise De Memoria et Reminiscentia. Beginning with the proposition that ideas are derived from sensory experience (instead of being innate, as Plato had asserted), he further argued that ideas became associated with each other by virtue of a small number of principles such as similarity (and contrast), and especially contiguity. (Aristotle also offered subsidiary principles of association such as frequency, intensity, and good order). Memories were retrieved (Aristotle didn't use precisely this term) by virtue of the association of ideas, where one idea served as a probe to elicit an associated idea as a memory.

Aristotle further distinguished between two forms of memory:

Remembering, in which the probe occurs involuntarily, is available to animals as well as to Man.
Recollection, in which the probe is voluntarily selected, is available only to Man -- because only humans have free will, and the capacity to engage in voluntary behavior in the first place.

British Associationism. From the 17th through the 19th centuries, David Hartley and other philosophers (such as Hobbes, Locke, Berkeley, Hume, and both James Mill and his son John Stuart Mill) construed ideas (representing sensations and reflections on sensation) as the building-blocks of the mind, and associations as the "mind's glue". For the British associationists, contiguity was virtually the sole basis for association:

Contiguity in time, or successive association;
Contiguity in space, or synchronous association.

"Virtually", because they accepted similarity as a principle of association as well -- though they really emphasized contiguity.

For the British associationists, associations had only one property: strength, or the likelihood that one idea would elicit another.

British associationism was extremely influential on the early verbal-learning tradition. For example, Ebbinghaus (1885) employed the serial learning of nonsense syllables to study how associations were formed during the learning process, what kinds of associative links were stored in memory, and how associations led from one memory to another. Similarly, Mary Whiton Calkins (1898), working in William James' laboratory at Harvard, invented the paired-associate learning paradigm expressly to study the formation of associations. (Calkins completed a doctoral dissertation, but Harvard refused her a degree, and she in turn refused its offer of a doctoral degree from Radcliffe College. Nevertheless, she founded the psychological laboratory at Wellesley College and later became the first female president of the American Psychological Association.)

American Associationism. Following the lead of the British associationists, there arose an American tradition of associationism at the hands of J.B. Watson, E.L. Thorndike, and later behaviorists such as E.R. Guthrie, C.L. Hull, and especially B.F. Skinner. These were all learning theorists, and they considered the association to be a primitive concept for learning theory. The difference between American and British associationism, of course, was that the British were interested in the association of ideas, while the Americans, being behaviorists, abandoned ideas as mentalistic, in favor of observable stimuli and responses. Thus, for Watson and the others, the conditioned response was the basic unit of behavior, and complex behaviors were built from elementary conditioned responses -- sometimes linked by implicit mediating responses, implicit stimuli, and response-produced stimuli. Ebbinghaus' and Calkins' work fit fairly comfortably into this framework, leading to the S-R reinterpretation of verbal learning.

There were, of course, dissenters among the neo-behaviorists, particularly E.C. Tolman, who argued that stimulus-response associations were not sufficient to explain learning.

In the first place, even laboratory animals could associate multiple stimulus inputs to multiple response outputs.
But more important, learning appeared to be regulated by internal states, such as expectations, emotions, and motives that were neither stimuli nor responses -- at least, not as the radical behaviorists construed these terms.

The View from Cognitive Psychology

With the cognitive revolution in psychology came a return to mentalism, and revived interest in the association of ideas.

In fact, even before the cognitive revolution, a number of researchers in the verbal-learning tradition collected data on pre-existing patterns of word association (actually, this line of research was initiated by C.G. Jung, who in turn was influenced by Freud; but Jung's work -- let alone Freud's -- had no direct influence on the verbal-learning tradition). Here, for example, is a fragment of an associative network centered on the word lion. That is, if you ask subjects to respond with the first word that comes to mind after hearing some other word, the stimulus lion often leads to the responses of tiger, Africa, and den; den leads to the response lair.

But it soon became clear that verbal associations had some funny properties that had not been anticipated by the British and American associationists.

First, it turned out that associations are not necessarily symmetrical. For example, the stimulus tiger may strongly elicit the response tail, but the stimulus tail does not tend to elicit tiger as a response; a much stronger response is end. If you're a British or American associationist, that should strike you as strange. If tiger is associated with tail by virtue of contiguity (or similarity, or whatever), then why isn't tail associated with tiger?

Even earlier, Thorndike (1931) had uncovered the phenomenon of belongingness. In one of his experiments, he had subjects learn a list of names, in which some names were repeated, such as

Mary Jones Bill Smith Sam Peck Richard Jones Bill Smith.

When subjects were tested with the stimulus Bill-_____, the likelihood of the correct response Smith increased with repetition, as predicted. But when tested with the stimulus Jones-_____, there was no effect of repetition on the correct response Bill. It seemed that, despite being equally contiguous, and equally repeated, Bill and Smith belonged together in a way that Jones and Bill did not. Thorndike had no way to account for this, but it did suggest that something was wrong with the general principle that associations were formed by virtue of contiguity, and strengthened by means of repetition.
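Counting adjacent pairs in the list fragment given above shows why belongingness was a puzzle: a model built only on contiguity and repetition assigns Bill-Smith and Jones-Bill exactly the same strength. This is a minimal sketch; the one-unit-per-occurrence rule and the name `strength` are illustrative assumptions, not Thorndike's own formulation.

```python
from collections import Counter

# The list fragment from Thorndike's experiment, as given in the text.
study_list = "Mary Jones Bill Smith Sam Peck Richard Jones Bill Smith".split()

# Pure contiguity + repetition: every time two names occur next to each other,
# the forward association between them gains one unit of strength.
strength = Counter(zip(study_list, study_list[1:]))

print(strength[("Bill", "Smith")])   # 2
print(strength[("Jones", "Bill")])   # 2
```

Both pairs occur twice, so the model predicts equal repetition effects; the data showed an effect only for Bill-Smith.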

For the British and American associationists, all associations were created equal -- all qualitatively the same, if quantitatively differing in strength. But in 1979, the Mandlers (George and Jean, one of cognitive psychology's first husband-and-wife teams, working at UCSD) distinguished among different qualitative types of associative structures in memory.

Jean Mandler (1979) distinguished between two types of associations:

Schematic, or temporal, as in Ebbinghaus' work on serial learning of lists of nonsense syllables.
Taxonomic, or linguistic, as in the associations between categories and instances, supersets and subsets.

George Mandler (1979), for his part, offered a tripartite distinction:

Pro-ordinate associations, as in serial learning.
Co-ordinate associations, as in familiar word associations such as black-white. You can think of co-ordinate associations as horizontal, in that both terms lie at the same level of categorization.
Subordinate associations, as in category relationships such as fruit-apple. Subordinate associations are vertical, in that one term lies at a higher level of categorization than the other.

I'll just cite two pieces of evidence, both from my own laboratory, that suggest that these differences are real.

In one line of research, we looked at the organization of recall during partial posthypnotic amnesia. We asked subjects, while they were hypnotized, to memorize a list of words, following standard verbal-learning procedures. In one experiment, we used a serial learning paradigm that encouraged pro-ordinate, serial associations. In another experiment, we used a free-recall paradigm, with a categorized list, that encouraged subordinate, vertical associations. A third experiment encouraged subjective organization. In each experiment, the subjects then received a suggestion to forget the words. The most highly hypnotizable subjects showed a dense amnesia, temporarily forgetting most or all of the words, while the insusceptible subjects showed no amnesia at all. But some subjects, who were relatively highly hypnotizable, showed a partial response to the amnesia suggestion. These subjects recalled some words, but tended to do so in a disorganized fashion, and the disorganization appeared only in the serial-learning condition. Posthypnotic amnesia disrupted pro-ordinate, serial organization, but spared organization based on semantic relationships.

Another line of research made use of the associative memory illusion (AMI; sometimes known as the Deese-Roediger-McDermott or DRM effect), in which studying a list of associates to a stimulus word (such as sharp, prick, and haystack, which are all close associates of needle) leads subjects to falsely recognize the critical lure (in this case, needle) as having been on the list, when in fact it was not. It turns out that the AMI occurs when the study list consists of co-ordinate associates, such as needle-haystack, but not when it consists of subordinate associates, such as animal-tiger.

The fact that posthypnotic amnesia dissociates serial associations from horizontal associations, and the AMI dissociates horizontal associations from vertical associations, suggests that these kinds of associations really are qualitatively different.

It also turns out that associations are labeled in terms of the semantic roles of cue and response. Thus, eating is related to glutton as act to actor, while eating is related to steak as act to object. A theory of association has to deal with the fact that associations do not differ only quantitatively, simply in terms of strength, but also differ qualitatively with respect to the type of association that has been created between one idea and another.
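One way to picture labeled associations is as links that carry a type as well as a strength, so that retrieval can be restricted to one kind of relation. The sketch below is hypothetical throughout: the class name, the label vocabulary (Mandler-style link types mixed with the act-actor and act-object roles from the text), and the strength values are all illustrative assumptions.

```python
from collections import defaultdict


class LabeledAssociations:
    """Associative links that carry a qualitative label as well as a strength.

    The label vocabulary is illustrative: Mandler-style link types
    (co-ordinate, subordinate) mixed with semantic roles (actor, object).
    """

    def __init__(self):
        self.links = defaultdict(list)   # cue -> [(response, label, strength), ...]

    def associate(self, cue, response, label, strength):
        self.links[cue].append((response, label, strength))

    def retrieve(self, cue, label=None):
        """Responses to a cue, strongest first, optionally limited to one link type."""
        hits = [(r, s) for r, lab, s in self.links[cue] if label is None or lab == label]
        return [r for r, _ in sorted(hits, key=lambda h: h[1], reverse=True)]


memory = LabeledAssociations()
memory.associate("eating", "glutton", "actor", strength=0.6)   # hypothetical strengths
memory.associate("eating", "steak", "object", strength=0.8)
memory.associate("black", "white", "co-ordinate", strength=0.9)
memory.associate("fruit", "apple", "subordinate", strength=0.7)

print(memory.retrieve("eating"))                  # ['steak', 'glutton']
print(memory.retrieve("eating", label="actor"))   # ['glutton']
```

A strength-only model could answer the first query but not the second, which is the point of the distinction drawn in the text.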

Neo-Associationistic Theories of Memory Structure

Despite these problems, the basic idea of association has been critical to cognitive theories of memory. These theories generally construe memory as a sort of mental dictionary in which words stand for concepts, and associations represent the relations between them. In a generic network model of memory:

Concepts, labeled by words, are represented as nodes in a semantic network.
Features associated with concepts are also represented as nodes.
Nodes are connected by associative links.
Perception activates nodes corresponding to the stimulus or its features.
Activation of one node spreads to other, associatively related nodes.
New concepts are formed by creating new nodes that are linked to existing nodes representing the features of the new object being represented.
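The generic assumptions listed above can be sketched as a small graph. This is a minimal illustration, not any particular published model: the class `SemanticNetwork` is hypothetical, activation spreads only one link, and "activation" is simply a count of how many observed features reach each node.

```python
from collections import defaultdict


class SemanticNetwork:
    """A generic network: concept and feature nodes joined by associative links."""

    def __init__(self):
        self.links = defaultdict(set)   # node -> set of associated nodes

    def link(self, a, b):
        self.links[a].add(b)
        self.links[b].add(a)

    def add_concept(self, concept, features):
        # A new concept is a new node linked to (possibly pre-existing) feature nodes.
        for feature in features:
            self.link(concept, feature)

    def perceive(self, observed_features):
        """Activate the observed feature nodes and spread activation one link.
        Returns the receiving nodes, most activated first."""
        activation = defaultdict(int)
        for feature in observed_features:
            for node in self.links[feature]:
                activation[node] += 1
        return sorted(activation, key=activation.get, reverse=True)


net = SemanticNetwork()
net.add_concept("canary", ["has wings", "can sing", "is yellow"])
net.add_concept("ostrich", ["has wings", "has long legs"])
print(net.perceive(["has wings", "can sing"]))   # ['canary', 'ostrich']
```

Because "has wings" is a single shared node, adding the ostrich concept reuses it rather than duplicating it.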

Of course, there are lots of different ways to implement these general ideas.

An important early model proposed by Collins and Quillian (1969) assumes that concepts are stored in a hierarchical structure, with associated features stored according to a principle of cognitive economy -- meaning that each feature gets stored only once, at the particular level of the hierarchy to which it is relevant. Thus:

All animals have skin and can move around, so these features are stored at a superordinate level of the animal hierarchy.
Some animals have wings while other animals have fins, and these features are stored at a middle level, linked to concepts of bird and fish, respectively.
Some birds can sing, while others can't fly, and these features are stored at a subordinate level, linked to concepts of canary and ostrich, respectively.

The model correctly predicts performance in a sentence-verification task, in which subjects are asked to say whether some statements are true or false. Although subjects rarely make a mistake in this kind of task, their reaction times vary, depending on the distance between the concept and the feature.

It takes longer to verify that A canary can fly than that A canary can sing.
It takes longer to verify that A canary is an animal than that A canary is a bird.
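The hierarchy and its predictions can be sketched as a lookup that climbs IS-A links until it finds the feature or category; the number of links climbed stands in for verification time. This is a minimal sketch in the spirit of Collins and Quillian: the dictionary `HIERARCHY` and the functions `links_to_feature` and `links_to_category` are hypothetical, and no mapping from links to milliseconds is assumed.

```python
# Each concept: (superordinate concept, features stored at this level).
# Cognitive economy: a feature is stored once, at the highest level where it holds.
HIERARCHY = {
    "animal":  (None,     {"has skin", "can move around"}),
    "bird":    ("animal", {"has wings", "can fly"}),
    "fish":    ("animal", {"has fins"}),
    "canary":  ("bird",   {"can sing"}),
    "ostrich": ("bird",   {"can't fly"}),
}


def links_to_feature(concept, feature):
    """Number of IS-A links climbed before the feature is found (None if never)."""
    steps, node = 0, concept
    while node is not None:
        parent, features = HIERARCHY[node]
        if feature in features:
            return steps
        steps, node = steps + 1, parent
    return None


def links_to_category(concept, category):
    """Number of IS-A links between a concept and one of its superordinates."""
    steps, node = 0, concept
    while node is not None:
        if node == category:
            return steps
        steps, node = steps + 1, HIERARCHY[node][0]
    return None


print(links_to_feature("canary", "can sing"))    # 0
print(links_to_feature("canary", "can fly"))     # 1  -> predicted slower
print(links_to_category("canary", "bird"))       # 1
print(links_to_category("canary", "animal"))     # 2  -> predicted slower
```

The sketch does not model exceptions: `links_to_feature("ostrich", "can fly")` still returns 1, because nothing tells the search that the ostrich's "can't fly" overrides the feature inherited from bird.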

An alternative model, proposed by Smith, Shoben, and Rips (1974), abandoned the hierarchy and linked concepts together based simply on degree of similarity in features (as indicated, for example, by multidimensional scaling techniques). In this model, the associative "distance" between concepts is a function of the number of overlapping features. The model correctly predicts an inverted-U-shaped relationship between similarity and response latency, such that reaction times are faster when two nodes are either very close together or very far apart, compared to when two nodes are at an intermediate distance from each other in multidimensional space.
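The inverted U falls out of a two-stage decision rule, which is how Smith, Shoben, and Rips framed their feature-comparison model: a quick judgment based on overall feature overlap when overlap is clearly high or clearly low, and a slower second comparison otherwise. The sketch below is only illustrative: the feature lists, the use of the Jaccard index as the overlap measure, the two criteria, and the latency units are all assumptions.

```python
def similarity(a: set, b: set) -> float:
    """Feature overlap (Jaccard index): shared features / all features."""
    return len(a & b) / len(a | b)


HIGH, LOW = 0.7, 0.2   # hypothetical decision criteria
FAST, SLOW = 1, 2      # hypothetical latencies, in arbitrary units


def predicted_latency(a: set, b: set) -> int:
    s = similarity(a, b)
    if s >= HIGH or s <= LOW:
        return FAST   # very similar or very dissimilar: decide on overall overlap alone
    return SLOW       # intermediate overlap: a second, slower feature check is needed


bird = {"feathers", "wings", "lays eggs", "flies"}
robin = {"feathers", "wings", "lays eggs", "flies", "red breast"}
chicken = {"feathers", "wings", "lays eggs", "clucks", "farm animal"}
rock = {"hard", "grey"}

for name, item in [("rock", rock), ("chicken", chicken), ("robin", robin)]:
    print(name, round(similarity(item, bird), 2), predicted_latency(item, bird))
# rock 0.0 1
# chicken 0.5 2
# robin 0.8 1
```

Plotting latency against similarity for these three items gives low, high, low: the inverted U described in the text.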

Yet a third model, proposed by Collins and Loftus (1975) -- this is the same Collins as in the Collins & Quillian model -- also employs distance to represent similarity. The model correctly predicts priming effects in a lexical decision task, such that reading the word street (which is a word) makes it easier to judge that car is also a word (which it is), compared to apples (which also is a word). Similarly, red primes apples and fire engine, but not street or sunrises.
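Priming of this kind is usually explained by spreading activation that weakens with distance. The sketch below assumes a small hypothetical network fragment, activation equal to `decay ** (shortest path length)`, and a fixed threshold for counting a target as primed; all three are illustrative choices, not the parameters of Collins and Loftus. Shortest paths are computed with Dijkstra's algorithm so that links of unequal length would also work.

```python
import heapq

# Hypothetical fragment of a Collins & Loftus-style network.
# Each link has a length; shorter links mean more closely related concepts.
LINKS = {
    ("street", "car"): 1.0,
    ("car", "fire engine"): 1.0,
    ("fire engine", "red"): 1.0,
    ("fire engine", "fire"): 1.0,
    ("red", "fire"): 1.0,
    ("red", "apples"): 1.0,
    ("red", "sunsets"): 1.0,
    ("sunsets", "sunrises"): 1.0,
}


def neighbours(node):
    for (a, b), length in LINKS.items():
        if a == node:
            yield b, length
        elif b == node:
            yield a, length


def spread(source, decay=0.5):
    """Activation reaching every node from `source`: decay ** (shortest path length)."""
    dist = {source: 0.0}
    frontier = [(0.0, source)]
    while frontier:
        d, node = heapq.heappop(frontier)
        if d > dist[node]:
            continue   # stale queue entry
        for nb, length in neighbours(node):
            nd = d + length
            if nd < dist.get(nb, float("inf")):
                dist[nb] = nd
                heapq.heappush(frontier, (nd, nb))
    return {node: decay ** d for node, d in dist.items()}


def primes(prime, target, threshold=0.5):
    """Treat a target as primed if enough activation reaches it from the prime."""
    return spread(prime).get(target, 0.0) >= threshold


print(primes("street", "car"), primes("street", "apples"))     # True False
print(primes("red", "apples"), primes("red", "fire engine"))   # True True
print(primes("red", "street"), primes("red", "sunrises"))      # False False
```

Here sunrises is only two links from red, but its activation (0.25) falls below the assumed threshold, so whether it counts as primed depends entirely on the parameter choices.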

Each of these models has problems, but their success in predicting even subtle aspects of human performance suggests that they are pretty good first approximations of how the mental dictionary is arranged -- that is, how semantic knowledge is represented in memory.

And that's all well and good, except we're not so much interested in the mental dictionary. We're working in the verbal-learning tradition at this point, and what we're really interested in is how people represent lists of words that they've been asked to memorize.

Estes (1976) offered several simple associative models of memory, attempting to capture some aspect of verbal learning.

In chain association, memory represents the list items in the exact order in which they were presented in the study list. This kind of associative structure can account for experiments such as those of Ebbinghaus (1885).
In multiple association, each item is linked to a node representing the list itself. This kind of associative structure can account for performance in free recall situations, where subjects are not constrained in terms of how they are to reproduce list items.
In hierarchical association, each item is linked to a node representing its category membership, and these category labels are then linked to a node representing the list as a whole. This kind of associative structure can account for category clustering in free recall.
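The three structures can be sketched as three ways of wiring the same study list, each with its own characteristic recall. This is a minimal illustration under assumed names (`chain`, `multiple`, `hierarchical`, the `"LIST"` node, and the category labels are hypothetical), not Estes' formal models.

```python
def chain(items):
    """Chain association: each item is linked to the item that followed it."""
    return dict(zip(items, items[1:]))


def recall_chain(links, first):
    recalled = [first]
    while recalled[-1] in links:
        recalled.append(links[recalled[-1]])
    return recalled


def multiple(items, list_node="LIST"):
    """Multiple association: every item is linked directly to one list node.
    Nothing in this structure constrains output order."""
    return {list_node: set(items)}


def hierarchical(items, category_of, list_node="LIST"):
    """Hierarchical association: list node -> category nodes -> items."""
    categories = {}
    for item in items:
        categories.setdefault(category_of[item], []).append(item)
    return {list_node: categories}


def recall_hierarchical(tree, list_node="LIST"):
    # Recall works through one category at a time, producing category clustering.
    return [item for members in tree[list_node].values() for item in members]


study = ["apple", "chair", "pear", "table"]
category_of = {"apple": "fruit", "pear": "fruit", "chair": "furniture", "table": "furniture"}

print(recall_chain(chain(study), "apple"))                     # ['apple', 'chair', 'pear', 'table']
print(sorted(multiple(study)["LIST"]))                         # ['apple', 'chair', 'pear', 'table']
print(recall_hierarchical(hierarchical(study, category_of)))   # ['apple', 'pear', 'chair', 'table']
```

Only the hierarchical structure reorders the output by category, which is the clustering pattern seen in free recall of categorized lists.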

You get the idea.

This general idea has been implemented in a computer model of memory known as SAM (for Search of Associative Memory), proposed by Shiffrin and Raaijmakers (1992). A similar model, called REM (for Retrieving Effectively from Memory), has been proposed by Shiffrin & Steyvers (1997). In SAM:

Short-term memory is a temporary buffer that associates list items with each other and with a list marker.
Long-term memory is a permanent store consisting of concepts linked by many different types of associative links.
The longer an item is in STM, the more likely it is to be stored in LTM.
Memory retrieval is cue-dependent, in that retrieval traces links from the representation of the retrieval cue to associatively linked items. Context (or list) cues are particularly critical in this regard.

Thus, during learning subjects link nodes representing list items to a node representing the list. When asked to recall, they activate the list node, and follow associative pathways to list items.
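The encoding and retrieval assumptions above can be sketched with a fixed-size buffer and a ratio rule for sampling. This is a deliberately simplified, hypothetical version: the function names, gain parameters, and buffer size are assumptions, and the oldest item is always the one displaced (displacement rules in published SAM versions differ). The sketch shows the core idea that time spent in the buffer builds item-to-context and item-to-item strength.

```python
from collections import defaultdict, deque


def study(items, buffer_size=3, context_gain=1.0, item_gain=0.5):
    """Simplified SAM-style encoding.

    On every study step, each item currently in the short-term buffer gains
    strength to the list context and to every other item in the buffer.
    Assumptions: a fixed gain per step, and the oldest item is displaced
    when the buffer is full.
    """
    context = defaultdict(float)     # list context -> item strength
    inter_item = defaultdict(float)  # (item, item) -> strength
    buffer = deque(maxlen=buffer_size)  # a full deque drops its oldest element on append
    for item in items:
        buffer.append(item)
        for x in buffer:
            context[x] += context_gain
            for y in buffer:
                if x != y:
                    inter_item[(x, y)] += item_gain
    return context, inter_item


def sampling_probabilities(context):
    """Ratio rule: probability of sampling each item when the list context is the cue."""
    total = sum(context.values())
    return {item: s / total for item, s in context.items()}


context, inter_item = study(["A", "B", "C", "D", "E"])
print(dict(context))                          # {'A': 3.0, 'B': 3.0, 'C': 3.0, 'D': 2.0, 'E': 1.0}
print(sampling_probabilities(context)["A"])   # 0.25
```

In this sketch the last items end up with the weakest long-term strength because study stops while they are still in the buffer; a fuller model would let those items be reported directly from short-term memory at test.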

The Dual-Code Theory of Memory

All these models view memory as a mental dictionary, nodes representing words linked to each other, and to nodes representing list membership. But it turns out that memory consists of more than words.

In particular, Paivio (1971, 1986) proposed that concrete objects, like fish and canaries, can be represented as images as well as words. He cited lots of different pieces of evidence in support of this proposition.

The evident existence of cognitive maps, such as those studied by Tolman (in rats, yet!).
The fact that we (well, most of us, anyway) have mental images, especially visual images. One way to think about mental images is that they are percepts formed in the absence of stimuli (and you can think of a hallucination as a mental image that's gone out of control). If images aren't elicited by stimuli, they have to come from somewhere, and that somewhere is -- memory. Therefore, it seems reasonable to conclude that we have imagistic representations stored in memory, as well as verbal representations.
There is also the picture superiority effect, which means that pictures of objects are generally remembered better (more accurately, more easily) than words representing those same objects.
Experiments on mental rotation, such as those performed by Roger Shepard and his associates.
Experiments on mental scanning, such as those performed by Steven Kosslyn and his associates (including a particularly beautiful experiment on scanning 3-dimensional mental images by Pinker and Finke, both of whom worked in Kosslyn's laboratory).

Paivio's arguments were emphatically rejected by Pylyshyn (1973), sparking "The Great Mental Imagery Debate". Pylyshyn argued, on conceptual grounds, that there was only one representational format, which was conceptual and word-like. He argued that evidence favoring imagistic representations was contaminated by tacit knowledge, experimenter bias, and demand characteristics.

J.R. Anderson (1978, 1979) argued that the issue was ultimately undecidable because, for every dual-code model that could be proposed, one could generate a single-code model that would produce the same effects. Here's where it has to be said that parsimony cuts both ways. In some sense it is more parsimonious to have one code than two. But in another sense it is more parsimonious to have two codes than one, if the single-code model has to go through all sorts of contortions to match the dual-code model.

In the next salvo of the debate, Finke (1980, 1985) identified a number of functional equivalences between imagery and perception. He relied on comparisons between recalling, imaging, and perceiving objects and their properties, and found a surprising number of instances where the effects of imagining were identical or similar to those of perceiving, and different from simply recalling. He concluded that "[visual] imagery involves the activation of many of the same information-processing mechanisms that are activated during visual perception" (1980, p. 130).

For some people, neuropsychological evidence clinched the case for the equivalence of imagery and perception. Farah (1988) investigated cases of visual agnosia, in which brain-injured patients are no longer able to identify familiar objects (prosopagnosia is a special form of visual agnosia). The syndrome is famously the subject of a case study by Oliver Sacks, The Man Who Mistook His Wife for a Hat (which was subsequently rendered into an opera, no less). Farah found that visual agnosics also lack a capacity for mental imagery, supporting the idea that mental images rely on the same mechanisms as actual perception.

Incidentally, Farah's arguments are often cited as an example where neuroscientific evidence constrains psychological theory, by offering decisive evidence for one theory (the dual-code theory) and against another (the single-code theory). But (with all due respect to Farah, who is a brilliant cognitive neuroscientist) this isn't exactly true.

Most of Farah's neuropsychological evidence wasn't exactly neuroscientific in nature. That is, it had nothing to do with data pertaining to brain structure or function at the neural level of analysis. Sure, her patients were brain-damaged, but it was behavioral evidence, not neuroscientific evidence, that she amassed in support of the equivalence of imagery and perception.
Actually, though, Farah did cite some truly neuroscientific evidence, showing that the same brain structures were activated in imagery as in perception.
So too did Kosslyn (1994), in the aptly titled monograph, Image and Brain: The Resolution of the Imagery Debate (Kosslyn was Farah's advisor in graduate school).
But even so, Pylyshyn himself wasn't convinced. And everyone else was already convinced by arguments such as Finke's. So, neuroscientific evidence was neither necessary (since most people already believed in dual codes) nor sufficient (since Pylyshyn wasn't persuaded) to make the case.

In any event, and despite his declaration of undecidability, Anderson himself opted for the second type of parsimony described above, and proposed a distinction between two types of mental representation:

Perception-based representations (analog or imagistic representations), which in turn come in two forms:

Spatial images, which preserve the detailed physical appearance of objects, and the spatial configuration of objects and their features.
Temporal strings, which preserve details of the temporal relations among events.

Meaning-based representations (semantic or verbal or propositional representations), which preserve information about the semantic properties of objects and events, including the semantic relations between them.

And that's pretty much where things stand in cognitive psychology today. With very few exceptions (really, only one exception), theorists accept the proposition that we have both words and pictures in the head.

What do perception-based knowledge representations look like? They look like mental images.
What do mental images look like? They look like percepts.

HAM: Knowledge as Sentences

Still, by far, most work on mental representation has focused on the verbal side.

Tulving and Bower (1974) summarized the view in the early 1970s as follows: "A rather general and atheoretical conception of the memory trace of an event regards it as a collection of features or a bundle of information" (p. 269). This bundle included a number of different components:

Physical attributes such as:

the sensory modality of the stimulus,
its spatial location,
and its physical appearance.

Linguistic attributes such as:

its phonological description
and semantic role.

Semantic attributes such as:

conceptual meaning,
referential imagery (see: even in 1974 major theorists accepted the dual-code theory!),
emotional connotations,
and "stray associations".

At roughly the same time, Anderson and Bower (1973) introduced a new theory of mental representation in a book describing their research on a computer simulation model of memory known as HAM (for Human Associative Memory):

"[T]he purpose of long-term memory is to record facts about various things, events, and states of the world. We have chosen the subject-predicate construction as the principal structure for recording such facts in HAM" (p. 156).

In other words, events are represented in sentence-like structures. This is quite a different approach from that implied by Tulving and Bower, in which the sentence might just be represented by a cluster of linked nodes. But in a representation like this, you don't really know who did what to whom, where, or when -- much less why. For this purpose, sentence-like structures seem to be better.

In order to illustrate their approach, they focused much of their exposition on variants of a single sentence:

In the park the hippie touched the debutante.

Perhaps Anderson and Bower were inspired by Hair: The American Tribal Love-Rock Musical, which opened in 1967. But they were even more inspired by two developments in linguistics.

First was the work of Noam Chomsky (1957, 1965) on phrase-structure grammar, in which sentences are rewritten as noun phrases and verb phrases, and verb phrases are rewritten as verbs plus noun phrases -- generically, The noun phrase verbed the other noun phrase. Thus, in the sentence The man who hits the ball kisses the girls, the man who hits the ball is the subject noun phrase, and kisses the girls is the verb phrase (which includes an object noun phrase). This phrase-structure representation seemed the easiest way to represent knowledge in memory.
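A phrase-structure analysis is a tree whose branches follow rewrite rules. The sketch below encodes the example sentence as nested tuples and checks it against a toy grammar; the rule set, including the `RelClause` and `Pron` categories, is a simplified assumption for illustration, not Chomsky's own grammar.

```python
# A toy phrase-structure grammar (hypothetical, simplified): each phrase
# category may be rewritten as one of the listed sequences of categories.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["Det", "N", "RelClause"]],
    "VP": [["V", "NP"]],
    "RelClause": [["Pron", "VP"]],
}

# Tree for the example sentence, as nested tuples: (category, child, child, ...).
# Word-level categories (Det, N, V, Pron) have a single word as their child.
tree = ("S",
        ("NP", ("Det", "the"), ("N", "man"),
               ("RelClause", ("Pron", "who"),
                             ("VP", ("V", "hits"),
                                    ("NP", ("Det", "the"), ("N", "ball"))))),
        ("VP", ("V", "kisses"),
               ("NP", ("Det", "the"), ("N", "girls"))))


def words(node):
    """Read the words off a (sub)tree, left to right."""
    if isinstance(node, str):
        return [node]
    return [w for child in node[1:] for w in words(child)]


def well_formed(node):
    """Check that every phrase node expands according to RULES."""
    if isinstance(node, str):
        return True
    category, children = node[0], node[1:]
    if category in RULES and [c[0] for c in children] not in RULES[category]:
        return False
    return all(well_formed(c) for c in children)


subject_np, verb_phrase = tree[1], tree[2]
print(" ".join(words(subject_np)))    # the man who hits the ball
print(" ".join(words(verb_phrase)))   # kisses the girls
print(well_formed(tree))              # True
```

Reading the words off the first branch makes clear that the whole relative clause belongs to the subject noun phrase.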

One problem with Chomsky's system is that there's more to grammar than syntax (as UCB's George Lakoff would put it, you need generative semantics as well as generative syntax). The UCB linguist Charles Fillmore (1968, 1971) pointed out that nouns, especially, played different semantic roles in sentences -- they weren't just subjects and objects. For example, in the sentence Mary pinched John on the nose, Mary is the agent of the action, John is the experiencer, and nose is the location where she pinched John. Fillmore invented case grammar to represent these semantic roles, and his innovation was picked up by Anderson and Bower.
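The advantage of case roles shows up when the grammatical subject changes but the event does not. In the sketch below, the active sentence and its passive paraphrase (John was pinched on the nose by Mary) have different subjects yet map onto the same case frame. `CaseFrame`, `from_active`, and `from_passive` are hypothetical helpers that take already-identified constituents; no actual parsing is attempted, and the role names simply follow the text's example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CaseFrame:
    """Semantic roles for one event (role names follow the text's example)."""
    action: str
    agent: str
    experiencer: str
    location: Optional[str] = None


def from_active(subject, verb, obj, location=None):
    # Active voice: the grammatical subject is the agent.
    return CaseFrame(action=verb, agent=subject, experiencer=obj, location=location)


def from_passive(subject, verb, by_phrase, location=None):
    # Passive voice: the grammatical subject is the experiencer;
    # the by-phrase names the agent.
    return CaseFrame(action=verb, agent=by_phrase, experiencer=subject, location=location)


active = from_active("Mary", "pinch", "John", location="nose")
passive = from_passive("John", "pinch", "Mary", location="nose")
print(active == passive)   # True: different subjects, same semantic roles
print(active.agent)        # Mary
```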

Accordingly, the HAM representation of an event would look something like this, with a node linking a fact (that a hippie touched a debutante) with the context in which it is true (that the incident happened in a park sometime in the past).
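The contrast between a structured proposition and a mere cluster of nodes can be made concrete. The sketch below approximates HAM's split into a context (where and when) and a fact (subject plus predicate) using nested records; the class and field names are hypothetical, and this is not Anderson and Bower's actual implementation. It shows that an unstructured bag of the same concepts cannot tell the hippie touching the debutante from the reverse.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Context:
    location: str
    time: str


@dataclass(frozen=True)
class Predicate:
    relation: str
    obj: str


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: Predicate   # subject-predicate construction


@dataclass(frozen=True)
class Proposition:
    """Top node: a fact, plus the context in which it is true."""
    context: Context
    fact: Fact


def bag_of_nodes(p):
    """The same concepts as an unstructured cluster of linked nodes."""
    return {p.context.location, p.context.time,
            p.fact.subject, p.fact.predicate.relation, p.fact.predicate.obj}


event = Proposition(Context("park", "past"), Fact("hippie", Predicate("touch", "debutante")))
reverse = Proposition(Context("park", "past"), Fact("debutante", Predicate("touch", "hippie")))

print(bag_of_nodes(event) == bag_of_nodes(reverse))  # True: the cluster loses who did what to whom
print(event == reverse)                              # False: the structure keeps it
print(event.fact.subject, event.context.location)    # hippie park
```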

The Declarative-Procedural Distinction

That's a pretty good solution, and HAM does a pretty good job of emulating the actual performance of subjects who are remembering lists of words, or sentences about hippies and debutantes. But it quickly became clear that there is more in memory than sentences. As noted earlier, the knowledge stored in memory comes in two forms:

Declarative knowledge is factual in nature. It has truth value, in that the "facts" contained in it may be true or false. Declarative knowledge can be represented in propositions of the form

The subject verbed the object

-- as in

The hippie touched the debutante.

Procedural knowledge, by contrast, consists of directions for action -- in the form of either overt motor behavior or of covert mental operation. Procedural knowledge can be represented in productions of the form

If goal and condition then action
-- as in

If the goal is to drive a standard shift car and the car is in neutral then shift the car into first gear.

Classical and instrumental conditioning are special cases of procedural knowledge:

If Conditioned Stimulus then Unconditioned Stimulus (classical conditioning).

If Conditioned Response in the presence of the Conditioned Stimulus then Reinforcement (instrumental conditioning).

Individual propositions are, of course, embedded in a vast network of propositional knowledge -- more or less along the lines envisioned by Collins and Loftus (1975).

And individual productions, for their part, are embedded in a vast network of productions known as a production system, in which the output of one production provides input to another. In some sense, the action of one production creates the conditions for execution of the next one in the system.
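The chaining of productions can be sketched as a recognize-act cycle over a working memory. This is a minimal illustration, not Anderson's or Winograd's system: the `Production` record, the `run` loop, and the three clutch-and-gear productions are hypothetical elaborations of the stick-shift example. Each action changes working memory in a way that satisfies the condition of the next production.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Production:
    name: str
    goal: str
    condition: Callable[[dict], bool]   # tested against working memory
    action: Callable[[dict], None]      # changes working memory


def run(productions, memory, max_cycles=10):
    """Recognize-act cycle: on each cycle, fire the first production whose goal
    and condition match working memory; halt when nothing matches."""
    fired = []
    for _ in range(max_cycles):
        for p in productions:
            if memory.get("goal") == p.goal and p.condition(memory):
                p.action(memory)
                fired.append(p.name)
                break
        else:
            break   # no production matched
    return fired


# Hypothetical productions elaborating the stick-shift example.
productions = [
    Production("press clutch", "drive",
               lambda m: m["gear"] == "neutral" and not m["clutch_in"],
               lambda m: m.update(clutch_in=True)),
    Production("shift into first", "drive",
               lambda m: m["gear"] == "neutral" and m["clutch_in"],
               lambda m: m.update(gear="first")),
    Production("release clutch", "drive",
               lambda m: m["gear"] == "first" and m["clutch_in"],
               lambda m: m.update(clutch_in=False, goal="moving")),
]

memory = {"goal": "drive", "gear": "neutral", "clutch_in": False}
fired = run(productions, memory)
print(fired)   # ['press clutch', 'shift into first', 'release clutch']
```

Nothing in the code orders the three productions in time; the sequence emerges because each action creates the condition for the next one, which is the sense in which the productions form a system.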

The procedural-declarative distinction was introduced into artificial intelligence by Terry Winograd (1972, 1975), and imported into psychology by John Anderson (1976). But it also has deeper origins:

Bergson, in Matter and Memory (1911), distinguished between two forms of memory: habit (analogous to procedural knowledge) and recollection (analogous to declarative knowledge).
Ryle, in The Concept of Mind (1949), distinguished knowing how (procedural knowledge) from knowing that (declarative knowledge).

But a serious terminological confusion surrounds the procedural-declarative distinction, because some theorists, following Larry Squire, use the term declarative to refer to conscious recollection -- what Schacter and others call explicit memory (as opposed to implicit or unconscious memory).

Actually, this was not always the case. Originally, Squire and Cohen adopted Winograd's declarative-procedural distinction, based on findings that amnesic patients could not remember word-lists that they had studied (which have a declarative representation), but could acquire new skills, such as a stylus maze or mirror-image reading (which have a procedural representation).
But later, Squire and Knowlton (1995) substituted nondeclarative for procedural. In part, this was because amnesic patients also show priming effects from studied words, which are hard to construe as procedural in nature (though it can be done).
It was at this point that declarative became synonymous with explicit -- referring to memories that can be declared.

The problem is that this confuses the question of representational format -- whether the memory is represented in declarative or procedural format -- with the way that a memory is expressed -- either explicitly, in the form of conscious recollection, or implicitly, in the form of priming or some other unconscious effect.

Procedural knowledge is always implicit, in that we have no direct introspective access to it. Procedures can be known only by inference from performance.
Declarative knowledge is accessible to conscious awareness, at least in principle. But even in the absence of explicit expression, it can be expressed implicitly, or unconsciously.

Squire's work is (justly) so highly regarded that many researchers have adopted his terminology. But it's really not right. The declarative-procedural distinction, having to do with representational format, should be kept separate from the explicit-implicit distinction, having to do with the conscious or unconscious expression of memory.

The Episodic-Semantic Distinction

At roughly the same time, Endel Tulving (1972, 1983) introduced a further distinction between two forms of declarative (meaning factual) knowledge:

Episodic memory is, essentially, autobiographical memory, referring to events that have a unique location in space and time (two events can't occur at the same place and the same time).

As Tulving and Bower (1974) noted, episodic memory is modeled by traditional verbal-learning procedures, in which subjects study a list of familiar words, and then must remember which words were on the list -- that is to say, which words were studied at a particular place and a particular time.

Semantic memory refers to generic knowledge, of the sort that might be found in the "mental dictionaries" modeled by Collins and Quillian (1969), Smith, Shoben, and Rips (1974), and Collins and Loftus (1975).

These models often assume, for convenience, that semantic memory is pre-existing. But, clearly, new semantic memories can be acquired as well, either by adding nodes to the network or by forging novel combinations of nodes.

Knowing that Columbus discovered America in 1492 is a piece of semantic memory. It is true always and everywhere.
Remembering that you learned this fact in third grade is a piece of episodic memory. It is only true with respect to your experience of third grade.
Episodic and semantic memory can be dissociated in the phenomenon of source amnesia, in which people remember factual knowledge acquired through some learning experience, but forget the learning experience itself.

Which brings up the matter of self-reference. Tulving's analysis stresses the importance of spatio-temporal context in episodic memory -- that every event is specified by a unique location in space and time (two events cannot occur at precisely the same time and in precisely the same place). But it's also true that these events are somehow specific to the rememberer as well.

I learned about Columbus in the third grade (I think), but you may have learned about him in the second or fourth grade.
Or, you may have gone to school in Berkeley, where they teach you that Columbus did no such thing.
Or you may be a Native American, in which case you might think that your people discovered him.
Even if you learned about Columbus in the third grade, you learned it in a different third-grade class than I did.
And even if you learned about Columbus at precisely the same time, and in precisely the same classroom as I did, the fact remains that you learned it, just as I did.

Episodic memories are memories of what a specific individual has done, or experienced, at a particular time and in a particular place.

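Episodic and semantic memory can be dissociated in the case of source amnesia, but it is evident that both kinds of memories can be stored in the same declarative, propositional, representational format. Consider how an episodic memory such as The hippie touched the debutante in the park would be represented.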

First, there must be a fact node representing the fact that The hippie touched the debutante.
This must be connected to a context node representing the fact that this event occurred in the park (and in the past).
And the whole thing must be connected to a self node representing the involvement of the person remembering the event in the event being remembered. Following Fillmore's case grammar (and Roger Brown, who elaborated on Fillmore's distinction between agents and experiencers), there are four general roles that the rememberer can play:

As the agent or patient of some action, as when I remember that I (the agent) gave a present to my wife, or that Lucy gave a present to me (the patient).
As the stimulus or experiencer of some event, as when I remember that I (the stimulus) made my wife happy or that Lucy made me (the experiencer) happy.

Of course, not all self-knowledge is episodic in nature. Some of it is semantic, more or less context-free knowledge about myself having nothing to do with any specific action or experience, such as I am a neurotic extravert or I am of Swedish-Finnish extraction on my father's side.

The self, viewed as a knowledge structure, consists of whatever one knows about oneself, including episodic and semantic self-knowledge.

Can Animals Have Episodic Memory?

Animals can learn, for sure, and so they acquire knowledge stored in memory. But it's not clear that they can acquire episodic memories -- that they can remember particular events that happened to them at a particular time and a particular place. Their memories may be more generic, represented in procedural, or perhaps semantic form, but not necessarily as episodic memories of specific experiences. Although the Darwinian principle of evolutionary continuity should caution us not to make sharp distinctions between human and nonhuman mental capacities, some authorities have suggested that, in the absence of language permitting self-report, the question of episodic memory in animals is essentially undecidable (e.g., Tulving, 1983).

Still, there are experiments that seem to reveal something very much like episodic memory.

Western scrub jays appear to remember where they cached certain kinds of food, and how long it has been since they did so.
Similarly, hummingbirds appear to remember where particular flowers are located, and how long ago they've visited them.
Eichenbaum and Sauvage (2008) gave rats pairs of containers in which a smell (like oregano) was mixed into a particular digging material (like wood chips). The rats learned to dig in previously encountered containers for treats. Eichenbaum and Sauvage argue that this requires a specific memory of what was encountered where and when. The fact that hippocampal lesions abolished this memory strengthens the idea that these rats had something very much like a conscious episodic memory of what happened when, and where.

So, maybe animals do have episodic memory after all, even though they can't share their conscious recollections with us via language.

The ACT Model

Actually, Anderson and Bower were aware of Winograd's work -- they were all together at Stanford after all -- but they were not ready to incorporate the procedural-declarative distinction into their model. That task fell to Anderson, in his ACT (Adaptive Control of Thought) model of cognition, which he introduced in 1976 and has continued to develop over the subsequent 30-plus years. ACT is a complete cognitive theory, written in the form of a computer simulation, that includes learning and memory, but also includes language, reasoning, and problem-solving (Anderson is especially interested in simulating students' learning and use of algebra, which he has called "the Drosophila of cognitive theory" [2007]).

ACT is rather complex, and its complexities need not detain us here. There have also been a number of versions of ACT developed over the years by Anderson and his colleagues, and these evolutionary steps need not detain us either. The following is adapted from the succinct description of the generic ACT model by Medin, Ross, and Markman (2001).

Declarative knowledge is represented in memory by conceptual nodes linked in a network to form propositions like The flower is pretty and Bill thought that the flower was pretty. Like HAM, ACT recognizes a number of semantic roles, but for purposes of simplicity we will only consider three: Agents, Objects, and the Relations between them.

The links between nodes differ in strength.

ACT also recognizes the type-token distinction first proposed by Simon and Feigenbaum (1964), which is a distinction between a general concept and a specific instance of it. For example, a particular chair may be blue, but it is not true that all chairs are blue; blueness is a property of that particular chair only. ACT handles this by linking the marker X, which represents a particular chair, to a node representing chairs in general. This yields propositions like Some particular chair is blue, or Some particular small chair is blue. This permits ACT to represent facts about other chairs, which may be large or beige or whatever.
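To make the type-token idea concrete, here is a minimal sketch. It is a toy encoding assumed for illustration, not ACT's actual data structures: propositions are (subject, relation, object) triples, and the token markers X1 and X2 (hypothetical names) stand for two particular chairs, each linked to the general node chair.

```python
# Hypothetical toy encoding of ACT-style type-token links, not ACT's own code.
# Each proposition is a (subject, relation, object) triple.
propositions = set()

def add_token(token, concept):
    """Link a token marker (one particular instance) to its type node."""
    propositions.add((token, "is-a", concept))

def add_property(node, prop):
    """Attach a property to a node (token or type)."""
    propositions.add((node, "is", prop))

def properties_of(node):
    """Return the properties attached directly to `node`."""
    return {obj for (subj, rel, obj) in propositions if subj == node and rel == "is"}

add_token("X1", "chair")      # some particular chair...
add_property("X1", "blue")    # ...is blue
add_property("X1", "small")   # ...and small
add_token("X2", "chair")      # a different particular chair
add_property("X2", "beige")

print(sorted(properties_of("X1")))      # ['blue', 'small']
print(sorted(properties_of("chair")))   # []: no color is attached to chairs in general
```

Because blue hangs on the token X1 rather than on chair, adding a beige chair X2 creates no contradiction.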

ACT also includes a working memory, which should not be confused with the working memory of Baddeley and Hitch (1974). By working memory Anderson means only that subset of nodes that are activated at any given time. Activation makes a node accessible in memory, but the total amount of activation in a network is limited, which effectively limits the number of nodes that can be in working memory at any particular time (think of Miller's "magical number seven, plus or minus two").

Processing a sentence (which is Anderson's proxy for perception) activates nodes corresponding to the elements of the sentence. This activation spreads along links to associated nodes. But the total activation accruing to a conceptual node is divided among the links emanating from that node, such that the strongest links receive the most activation.
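Here is a minimal sketch of the division rule just described, assuming a toy network whose link strengths are invented for illustration (ACT's actual activation equations are more elaborate):

```python
# Invented link strengths; each node lists its outgoing links.
links = {
    "flower": {"pretty": 3.0, "garden": 1.0},
    "pretty": {"flower": 3.0},
    "garden": {"flower": 1.0},
}

def spread(source, activation):
    """Split the activation leaving `source` among its links in proportion
    to link strength; return what each neighboring node receives."""
    out = links[source]
    total_strength = sum(out.values())
    return {node: activation * s / total_strength for node, s in out.items()}

received = spread("flower", 1.0)
print(received)   # {'pretty': 0.75, 'garden': 0.25}
```

The amounts handed on always sum to the activation that left the source, so spreading never creates activation. That is one simple way to model a fixed activation budget, and hence a limited working memory.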

While this discussion focuses on the declarative side of ACT, there is also a procedural side, and these are related:

The goals and conditions of a production are represented as concepts in declarative memory (when activated, these also become part of working memory).
A common action of a production is to create or activate a new node in declarative memory.
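The relation between the two sides can be sketched as a hypothetical toy (this is not ACT-R syntax). The goal and the retrieved arithmetic fact live in declarative memory; the IF-THEN rule itself exists only as code that runs, which is one way of seeing why procedural knowledge can be known only through its effects.

```python
# Hypothetical toy, not ACT-R syntax. Declarative memory is a set of
# propositions; working memory is the subset that is currently active.
declarative = {
    ("addition-fact", 3, 4, 7),
    ("addition-fact", 2, 2, 4),
    ("goal", "add", 3, 4),
}
working_memory = {("goal", "add", 3, 4)}   # the goal node is currently active

def retrieve_sum(a, b):
    """Look up a stored addition fact for a + b, if there is one."""
    for fact in declarative:
        if fact[0] == "addition-fact" and fact[1:3] == (a, b):
            return fact[3]
    return None

def production_add(wm):
    """IF the active goal is to add a and b, AND a matching fact can be retrieved,
    THEN create an answer node in declarative memory and make it active."""
    for item in list(wm):                 # iterate over a copy; wm changes below
        if item[0] == "goal" and item[1] == "add":
            _, _, a, b = item
            total = retrieve_sum(a, b)
            if total is not None:
                answer = ("answer", a, b, total)
                declarative.add(answer)   # action: a new declarative node...
                wm.add(answer)            # ...which is now active (in working memory)
                wm.discard(item)          # the goal has been satisfied
                return True
    return False

fired = production_add(working_memory)
print(fired, working_memory)   # True {('answer', 3, 4, 7)}
```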

ACT, especially in its current incarnation, is an extremely powerful model of memory. For example, it predicts the fan effect -- the more you know about a particular concept, the longer it takes to retrieve any particular piece of knowledge about it. We'll discuss the fan effect later.
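A back-of-the-envelope sketch shows why the fan effect falls out of dividing activation among links. It assumes, purely for illustration, that a concept's links are equally strong and that retrieval time grows in inverse proportion to the activation reaching the relevant link; the constants are invented, not fitted ACT parameters.

```python
def retrieval_time(fan, source_activation=1.0, base=0.5, scale=0.2):
    """Predicted time (s) to retrieve one fact about a concept with `fan` links.

    The concept's activation is split evenly over its links, and retrieval is
    assumed to slow as the activation reaching any one link drops. `base` and
    `scale` are invented constants, not fitted ACT parameters.
    """
    per_link = source_activation / fan
    return base + scale / per_link

for fan in (1, 2, 3):
    print(fan, round(retrieval_time(fan), 2))
# 1 0.7
# 2 0.9
# 3 1.1
```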

A Connectionist Alternative

ACT is generally considered a symbolic or localist model of cognition, in which concepts are represented as symbols that stand for some piece of knowledge, and these symbols are localized at discrete nodes in the associative network (Anderson himself disagrees with this characterization, but we're not going to let this fact get in the way of our exposition, are we?). When a person acquires a new piece of knowledge, a new node is added to the network (as well as new links from that node to other, pre-existing nodes).

An alternative model is a connectionist or parallel distributed processing (PDP) model, in which the same set of nodes represents each piece of knowledge -- because the knowledge is not represented by the nodes at all, but rather by the connections between them (hence the name). Put another way, knowledge is distributed across the entire network -- hence that name, too! PDP models were introduced to cognitive theory by James (Jay) McClelland and David Rumelhart (1986a; Rumelhart & McClelland, 1986b; McClelland, Rumelhart, et al., 1995), who at the time were colleagues at UCSD (McClelland subsequently moved to Carnegie-Mellon University, where he was a colleague of John Anderson, which may account for Anderson's qualms about the characterization of his model as "symbolic" or "localist"; Rumelhart subsequently moved to Stanford; then McClelland himself moved to Stanford; it's a small world).

As with the ACT model, this discussion of PDP models draws heavily on the treatment by Medin et al. (2001).

In large part, connectionist or PDP models are motivated by considerations of neural plausibility.

We know, from principles such as Lashley's Law of Mass Action, that no single neuron, or even a cluster of neurons, is critical for any particular piece of knowledge.
We also know, from neuroanatomical studies, that cortical neurons are arranged into thin layers roughly analogous to the "layers" in a connectionist network, described below.
And we know, despite some substantial functional specialization, that most of the brain is active in all information processing.
And we know, from neurophysiological studies, that synaptic connections have a valence, existing in two forms, excitatory and inhibitory.
And we also know that synaptic connections vary in strength.

From these considerations, connectionist models begin with the assumption that the connections among neurons are strengthened or weakened during learning.

Connectionist models are "neurally inspired" because they take the brain as a metaphor.

The nodes in a connectionist network (often called units) are analogous to neurons.
These nodes are arranged into layers.

Two layers, one for inputs and the other for outputs, are obligatory.
Other, intermediate layers, often called hidden layers, are optional.

All nodes in the input layer are activated by every stimulus.

Each input unit carries a specific level of activation.
Each input concept is represented by a unique pattern of activation across the entire input layer.
The nodes in the input layer are all activated simultaneously, in parallel, not serially (hence the name).

All nodes in the output layer generate responses.

Each output unit carries a specific level of activation.
Each output concept is represented by a unique pattern of activation across the entire output layer.

During learning, the model adjusts the connection weights between each and every node in the input layer and each and every node in the output layer until the output layer produces the desired response.
Thus, similar patterns of inputs will produce similar patterns of outputs -- modeling the well-known fact that discriminations between similar stimuli are difficult to learn.
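The learning scheme above can be sketched with the delta (Widrow-Hoff) rule, a standard connectionist learning rule swapped in here for illustration. The stimulus patterns, learning rate, and stopping criterion are all invented. The sketch trains a two-layer network on two stimulus-response pairs, once when the stimuli share no features and once when they share two, and counts the epochs needed in each case.

```python
def train(pairs, n_in, n_out, lr=0.1, tol=0.01, max_epochs=10_000):
    """Delta-rule training of every input->output weight. Returns the number of
    epochs needed before the summed squared error falls below `tol`."""
    w = [[0.0] * n_in for _ in range(n_out)]   # w[j][i]: input unit i -> output unit j

    def out(j, x):
        return sum(w[j][i] * x[i] for i in range(n_in))

    for epoch in range(1, max_epochs + 1):
        for x, target in pairs:
            for j in range(n_out):
                err = target[j] - out(j, x)
                for i in range(n_in):
                    w[j][i] += lr * err * x[i]   # delta (Widrow-Hoff) rule
        sse = sum((t[j] - out(j, x)) ** 2 for x, t in pairs for j in range(n_out))
        if sse < tol:
            return epoch
    return max_epochs

# Two stimuli, each mapped to its own response pattern.
dissimilar = [([1, 1, 0, 0], [1, 0]), ([0, 0, 1, 1], [0, 1])]   # no shared features
similar    = [([1, 1, 1, 0], [1, 0]), ([1, 1, 0, 1], [0, 1])]   # two shared features

easy = train(dissimilar, 4, 2)
hard = train(similar, 4, 2)
print(easy, hard)   # the similar pair needs more epochs
```

With these settings the overlapping pair takes noticeably longer, because every weight change that sharpens the response to one stimulus also shifts the response to the other.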

This generic connectionist model has implications for memory.

All memories are encoded in the same set of nodes.

There are not separate nodes representing each individual memory.
Rather, each individual memory is represented by the pattern of activation across the entire set of nodes.
Put another way, concepts, events, and the like are represented in the connections between nodes, not at the nodes themselves.

A retrieval cue provides partial information about an event to the input layer.
The activity of the output layer completes the memory.
Each episodic memory is a specific instance of an entire class of events, and is represented by a particular adjustment to the weights on the connections between the input and output layers.
Semantic memory is construed as a generalization from episodic memory, such that similar patterns of weights will represent the various instances of the category.
Retroactive interference occurs because new learning changes the weights in the network, making it hard for the response layer to produce the correct response associated with an old stimulus.
Proactive interference occurs because old learning makes it hard to adjust the weights in the network, making it hard for the response layer to produce the correct response associated with the new stimulus.
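Cue-driven completion can be illustrated with a Hopfield-style autoassociative network, a standard technique swapped in here rather than the specific model described above. Two invented eight-unit patterns are stored with a Hebbian rule, and the first half of one of them serves as the retrieval cue.

```python
def store(patterns):
    """Hebbian storage: each weight sums the co-activity of two units
    across the stored patterns (no self-connections)."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w

def recall(w, cue, sweeps=5):
    """Update units one at a time toward the sign of their summed input;
    units with exactly zero input keep their current value."""
    state = list(cue)
    for _ in range(sweeps):
        for i in range(len(state)):
            h = sum(w[i][j] * state[j] for j in range(len(state)))
            if h != 0:
                state[i] = 1 if h > 0 else -1
    return state

memory_1 = [1, 1, 1, 1, -1, -1, -1, -1]
memory_2 = [1, -1, 1, -1, 1, -1, 1, -1]
w = store([memory_1, memory_2])

cue = [1, 1, 1, 1, 0, 0, 0, 0]     # first half of memory_1; 0 = no information
print(recall(w, cue) == memory_1)  # True
```

Units the cue says nothing about are coded as 0, and repeated updating fills them in from the connections they share with the known units.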

Connectionist models are extremely powerful learning machines, and for that reason, not to mention their "neural plausibility", they have been very attractive as models of memory -- indeed, vigorous rivals to symbolic or localist models.

But they have one big disadvantage: they are extremely prone to forgetting, especially forgetting via retroactive interference. In fact, this vulnerability is so bad that it has been characterized as catastrophic interference by McCloskey and Cohen (1989; see also Ratcliff, 1990) and French (1999). To see why this is so, consider the A-B/A-C retroactive interference paradigm.

When learning the first, A-B list, the network adjusts its weights so that it correctly produces the response B to the stimulus A. So far, so good.
But then when it has to learn the second, A-C list, the network will adjust its weights so that it correctly produces the response C to the stimulus A. Which means that it has to undo the connections that it generated while learning A-B!
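The A-B/A-C problem can be reproduced in a few lines. This is a minimal sketch, assuming a single-layer network trained with the delta rule and invented stimulus and response patterns:

```python
def train(w, x, target, lr=0.25, epochs=50):
    """Delta-rule training on a single stimulus-response pair (in place)."""
    for _ in range(epochs):
        for j in range(len(w)):
            out = sum(wi * xi for wi, xi in zip(w[j], x))
            err = target[j] - out
            w[j] = [wi + lr * err * xi for wi, xi in zip(w[j], x)]
    return w

def output(w, x):
    """Network response to stimulus x, rounded for display."""
    return [round(sum(wi * xi for wi, xi in zip(row, x)), 2) for row in w]

A = [1, 0, 1, 0]   # stimulus pattern (invented)
B = [1, 0]         # response on list 1
C = [0, 1]         # response on list 2
w = [[0.0] * 4 for _ in range(2)]

train(w, A, B)
print(output(w, A))   # [1.0, 0.0]: A-B learned
train(w, A, C)
print(output(w, A))   # [0.0, 1.0]: A-C learned, A-B gone
```

Because the stimulus A is literally the same input on both lists, the only way to produce C is to change the very weights that produced B. In larger networks with overlapping rather than identical stimuli the damage is partial, but the mechanism is the same.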

So, a generic connectionist model must forget A-B in order to learn A-C. But we know from studies using paradigms like modified (and modified modified) free recall, discussed in the lectures on Associationism and Interference Theory, that people who learn A-C can also remember A-B. So, the typical connectionist model doesn't provide a very good match to actual human performance -- which reduces its attractiveness considerably.

One solution to this problem is to abandon connectionism entirely. This has been done in several models, like Metcalfe's CHARM and Murdock's TODAM, which learn efficiently but do not show catastrophic interference.

Another solution is to abandon sequential learning, in which A-B is followed by A-C.

For example, the model can "rehearse" A-B while learning A-C.
Alternatively, the model can learn A-B and A-C concurrently.
But actual subjects remember A-B and A-C just fine without this gambit.

Another solution is to distinguish between two As, A1 and A2, and have the model learn A1-B and A2-C. But that seems like cheating, doesn't it? Maybe not.
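Here is a sketch of how the A1/A2 idea and the rehearsal idea work together, under stated assumptions: a single-layer network trained with the delta rule, with two hypothetical list-context units appended to the stimulus so that A1 and A2 share their A features but differ in context. Patterns and parameters are invented.

```python
def train(w, pairs, lr=0.1, epochs=200):
    """Delta-rule training; `pairs` are presented in order on every epoch."""
    for _ in range(epochs):
        for x, target in pairs:
            for j, row in enumerate(w):
                err = target[j] - sum(wi * xi for wi, xi in zip(row, x))
                w[j] = [wi + lr * err * xi for wi, xi in zip(row, x)]
    return w

def output(w, x):
    # "+ 0.0" turns any -0.0 produced by rounding into 0.0 for display
    return [round(sum(wi * xi for wi, xi in zip(row, x)), 2) + 0.0 for row in w]

def fresh():
    return [[0.0] * 6 for _ in range(2)]

# Same A features in the first four units, then a hypothetical list-1 or list-2 context unit.
A1 = [1, 0, 1, 0, 1, 0]
A2 = [1, 0, 1, 0, 0, 1]
B, C = [1, 0], [0, 1]

seq = fresh()
train(seq, [(A1, B)])      # list 1 first...
train(seq, [(A2, C)])      # ...then list 2
print(output(seq, A1))     # [0.56, 0.67]: A1-B partly overwritten, C intrudes

mixed = fresh()
train(mixed, [(A1, B), (A2, C)])              # interleaved ("rehearsed") training
print(output(mixed, A1), output(mixed, A2))   # [1.0, 0.0] [0.0, 1.0]
```

In this toy, context alone reduces but does not eliminate interference when the lists are learned one after the other, because A1 and A2 still overlap; interleaving the lists lets the network find weights that serve both. With a literally identical A, no set of weights could produce both B and C.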

So-called "dual-net" architectures actually include two different networks, one to learn A-B, and the other to learn A-C. This, too, sounds like cheating, until you consider that there might actually be two different memory systems in the brain.

One, centered on the hippocampus, for new learning.

Another, centered on the cortex, for consolidated "old" learning.

At a more conceptual level, and with all due respect to McClelland (with whom I went to graduate school) and Rumelhart (who was without a doubt one of the world's most distinguished cognitive scientists), the whole connectionist enterprise smacks of the S-R theory of learning (not for nothing was Thorndike's S-R theory of learning called "connectionism").

Stimuli are represented by the weights on the input layer.
Responses are represented by the weights on the output layer.
The network is "trained", by the computer analog of reinforcement, to give the right response to each stimulus.
To be sure, there's often stuff that goes on in the hidden layers between the input and the output layer, but that's still an awful lot of emphasis on stimulus and response.
More to the point, it seems that the whole point of a connectionist model is to extract an accurate reflection of the stimulus world. By contrast, a major motivation for the cognitive approach to memory is the understanding that the mind also imposes structure on its mental representations, which can lead to errors and illusions. Some structure is in the world, but other structure is in the mind.

It's hard to express, but I've got an aching feeling that connectionism ends up looking an awful lot like something that Skinner would find friendly. And that's a cause for alarm in the hearts and minds of cognitive psychologists.

An Interactive Activation Model

You can see some of the properties of connectionist networks in general by examining the interactive activation (IA) model of word recognition presented by McClelland and Rumelhart (1981), a forerunner of the version of connectionism they later called parallel distributed processing, or PDP (not to be confused with Larry Jacoby's Process Dissociation Procedure, also known by the "PDP" acronym).

In this paradigm, subjects are presented with words, and are asked to identify the presence of particular letters.

There are three levels in this network:

At the lowest level are nodes representing various graphemic features, such as horizontal and vertical lines, diagonals, and open and closed circles.
At the middle level are letters that possess various combinations of these features.

Thus, the stylized letter A consists of 2 long vertical lines and 2 short horizontal lines -- or, less stylized, two long diagonal lines and one horizontal line.
And the stylized letter S consists of 3 horizontal lines and 2 vertical lines -- or, if you prefer, two open circles.

At the highest level are words in which various letters appear.

PDP is a model of parallel processing, so the presentation of a particular letter will simultaneously activate nodes representing all three levels: graphemic feature, letter code, and lexical entry.

The presence in the stimulus of a short horizontal line will activate a graphemic node representing this feature.

Activation of these nodes will also excite higher-level nodes representing stylized letters that include this combination of features.
Activation will also inhibit nodes for letters which do not include this combination of features.

The presence of the stylized letter A in the stimulus will activate the corresponding letter code.

Activation of this node will also excite lower-level nodes representing its constituent graphemic features.

And inhibit lower-level nodes representing graphemic features that are not present in this letter.

Activation of this node will also excite higher-level nodes representing words which contain this letter.

And inhibit higher-level nodes representing words which do not contain this letter.

Presentation of the word ABLE will activate a lexical node representing this word.

Activation will also excite middle-level nodes representing its constituent letters.
And it will inhibit nodes representing other letters which do not appear in the word.

Note that the IA model and all other PDP models entail interactive activation.

Activation does not simply proceed in a "bottom-up" manner, from the graphemic level to the letter level to the word level. It doesn't proceed simply in a "top-down" fashion either.
Each level in the network interacts with adjacent levels in a reciprocal manner.

Eventually, all this activity will "settle" on a pattern of activation that represents the word ABLE.

Because all of this is going on in parallel, it all happens very quickly.

And because processing of words occurs simultaneously with, and reciprocally influences, processing of letters, subjects are faster to identify letters presented in the context of words, compared to letters presented in meaningless strings -- the word-letter phenomenon discussed in the lectures on Mathematical and Computational Modeling.
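The settling process, and the letter advantage it produces, can be sketched as a drastically simplified toy, assumed for illustration: the feature level is omitted (each letter node receives bottom-up evidence directly), the lexicon has only two entries, and the leaky-update rule and all constants are invented. The published model uses different activation equations and also includes competition within levels.

```python
import string

LEXICON = ["ABLE", "ACHE"]          # toy two-word lexicon (invented)
ALPHABET = string.ascii_uppercase
POSITIONS = range(4)

def clamp(x):
    """Keep activations between 0 and 1."""
    return max(0.0, min(1.0, x))

def run(stimulus, steps=30, rate=0.2, exc=0.3, inh=0.3, td_exc=0.3, td_inh=0.3):
    """Let the toy network settle on a four-letter stimulus; return the
    final letter and word activations."""
    letters = {(p, c): 0.0 for p in POSITIONS for c in ALPHABET}
    words = {w: 0.0 for w in LEXICON}
    bottom_up = {(p, c): 0.5 if stimulus[p] == c else 0.0
                 for p in POSITIONS for c in ALPHABET}
    for _ in range(steps):
        # Word level: excitation from the word's own letters, inhibition
        # from every other active letter in the same position.
        new_words = {}
        for w, a in words.items():
            net = 0.0
            for p in POSITIONS:
                total = sum(letters[(p, c)] for c in ALPHABET)
                match = letters[(p, w[p])]
                net += exc * match - inh * (total - match)
            new_words[w] = clamp(a + rate * (net - a))
        # Letter level: bottom-up evidence, plus top-down excitation from words
        # containing the letter in that position and inhibition from words that don't.
        new_letters = {}
        for (p, c), a in letters.items():
            net = bottom_up[(p, c)]
            for w, wa in words.items():
                net += td_exc * wa if w[p] == c else -td_inh * wa
            new_letters[(p, c)] = clamp(a + rate * (net - a))
        # Both levels update simultaneously from the previous state.
        words, letters = new_words, new_letters
    return letters, words

word_letters, word_nodes = run("ABLE")
nonword_letters, nonword_nodes = run("AZQJ")
# A is more active in ABLE than in AZQJ
print(round(word_letters[(0, "A")], 3), round(nonword_letters[(0, "A")], 3))
```

The letter A gets the same bottom-up input in both runs, but in ABLE the active word node feeds excitation back down to it, so it ends up more active than the A in the meaningless string AZQJ, where no word node ever becomes active.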

The Great Representational Debate

Much like the Great Mental Imagery Debate of the 1970s, the rise of connectionist modeling has stimulated a new opposition, between "symbolist" models like ACT and "connectionist" models like PDP. And like the Great Mental Imagery Debate, this new debate may prove to be undecidable.

Consider, for example, the following irony: along with their monograph, McClelland and Rumelhart distributed a disk containing sample PDP programs to demonstrate various applications of their approach (the "Sharks and Jets" model of categorization was especially popular). But these simulations were programmed to run on a standard desktop (or laptop) computer -- which is a serial, symbol-processing machine.
The fact is, as Feldman and Ballard (1982) made clear, "neural network" models, often identified with connectionism, can be implemented in both a localist and a distributed manner.
Moreover, as the interactive activation model of word recognition makes clear, PDP models can have both localist and distributed representations.

At the highest level, the word ABLE is represented by a single node.
At lower levels, the same word is represented by a pattern of activation over nodes representing its constituent letters and graphemes.

In addition, Lebiere and Anderson (1993) presented a "connectionist" implementation of ACT-R -- or at least of its procedural component: declarative knowledge was stored in a separate "associative" memory.

So, we've got a "connectionist" model which runs on a symbol-processing machine, and we've got a "symbolist" model that can be given a connectionist implementation. Sounds like a draw to me.

But, then again, maybe not. Lebiere and Anderson's (1993) title, "A Connectionist Implementation of the ACT-R Production System" (note the word Implementation), brings to mind the three-level analysis of vision promoted by Marr (1982; Marr & Poggio, 1976):

a computational level that operates on input representations to generate output representations;
an algorithmic level that specifies the processes to be performed at the computational level;
and an implementational level that embodies the algorithms in a physical system.

Note Marr's assumption that the computational and algorithmic levels could be understood without reference to the implementational level.

In these terms, ACT-R might be identified with the computational level of analysis, and is symbolic in nature. The connectionist implementation might be identified with the implementational level of analysis.

Interestingly, recent findings from cognitive neuroscience may help us to choose between symbolic and connectionist architectures. After all, the chief argument in favor of distributed models of representation is that they are more biologically plausible than localist models. But are they? Let's look at the evidence from neuroscience.

The View from Cognitive Neuroscience

The presentation so far has focused on representation as viewed by cognitive psychology, but the rivalry between localist and distributed models has also played itself out within cognitive neuroscience.

Consider the following true story from the annals of cognitive psychology. There once was a seminar at Stanford University attended by both William K. Estes, a pioneering cognitive psychologist, and Karl Pribram, a pioneering cognitive neuroscientist. A student had presented some puzzling new experimental results, and the exchange went something like this:

Bill: Suppose there are a series of little drawers in the brain.

Karl: I have never seen any drawers in there.

Bill: They're very small.

We have a pretty good idea what memories look like in the mind. They look like propositional networks, or maybe like networks of connections. But what do memories look like in the brain? The answer comes in two forms.

The Distributionist Solution

The easiest answer is that every memory is represented by a single neuron, or perhaps a small cluster of neurons, located in a particular part of the brain, and that person memories are no exception to this rule. Thus, the nodes in associative-network models of person memory, like those discussed here, have their neural counterparts in distinct neurons (or clusters of neurons).

Early research by Wilder Penfield (1954), a Canadian neurologist, suggested that this is indeed the case. In the process of diagnosing and treating cases of epilepsy, Penfield would stimulate various areas of the brain with a small electrical current delivered through a microelectrode implanted in the brain. This procedure does not hurt, because the brain itself contains no pain receptors, so patients can remain awake while it is performed. Accordingly, Penfield asked patients what they experienced when he stimulated them in various places. Sometimes they reported experiencing specific sensory memories, such as an image of a relative or the sound of someone speaking. This finding was controversial: Penfield had no way to check the accuracy of the memories, and it may be that what he stimulated were better described as "images" than as memories of specific events. In any event, the finding suggested that there were specific neural sites, perhaps a cluster of adjacent neurons, representing specific memories in the brain.

However, evidence contradicting Penfield's conclusions was provided by Karl Lashley (1950), a neuroscientist who conducted a "search for the engram", or biological memory trace, for his entire career. Lashley's method was to teach an animal a task, ablate some portion of cerebral cortex, and then observe the effects of the lesion on learned task performance. Thus, if performance was impaired when some portion of the brain was lesioned, Lashley could infer that the learning was represented at that brain site. After 30 years of research, Lashley reported that his efforts had been entirely unsuccessful. Brain lesions disrupted performance, of course. But the amount of disruption was proportional to the amount of the cortex destroyed, regardless of the particular location of the lesion.

Lashley's Law of Mass Action states that any specific memory is part of an extensive organization of other memories. Therefore, individual memories are represented by neurons that are distributed widely across the cortex. It is not possible to isolate particular memories in particular bundles of neurons, so it is not possible to destroy memories by specific lesions.

At about the same time, D.O. Hebb, a pioneering neuroscientist, argued that memories were represented by reverberating patterns of neural activity distributed widely over cerebral cortex. Hebb's suggestion was taken up by others, like Karl Pribram, who postulated that memory was represented by a hologram, in which information about the whole object was represented in each of its parts.

Localism Redux

Connectionist models are inspired, in part, by both Lashley's Law of Mass Action and Hebb's reverberating-network model of memory.

Still, Penfield's vision held some attraction for some neuroscientists, who continued to insist that individual memories were represented by the activity of single neurons, or at most small clusters of neurons, at specific locations in cortex.

Sherrington (1941) postulated pontifical cells that represent sensory scenes.
Konorski (1967) postulated gnostic neurons that represented unitary percepts.
Barlow (1969, 1972) argued on the basis of a principle of "economy of impulses" that the brain should achieve a complete representation of a sensory scene with the fewest number of active neurons possible.

Problems with Penfield's clinical studies aside, early advances in understanding the neural basis of perception lent support to the localist views of representation.

Barlow (1953) identified specific cells in the frog retina that responded to particular elementary patterns of visual stimulation: contrast between light and dark, moving edges, dimming of light, and convexity (where a dark object appears against a bright field).
Hubel and Wiesel (1959) won the Nobel Prize for similar studies that identified orientation-specific receptive fields in the visual cortex of the cat.

While these neural systems responded to the physical properties of the stimulus, their discovery fed speculation that the meaning of the stimulus, and other cognitive contents, might similarly be represented by a localized cluster of neurons.

Jerome Lettvin (1969) speculated that a mother cell, or rather mother cells, plural, might represent all that subjects knew about their mothers. It was Lettvin who called Barlow's convexity detectors "bug perceivers".
Barlow himself (1972) speculated about a grandmother cell.
Harris (1980) somewhat facetiously speculated that if we have cells that respond to yellow, and other cells that respond to Volkswagens, we might also have yellow Volkswagen cells.

Nobody, including Lettvin and Barlow themselves, took any of this all that seriously, and neuroscientific doctrine has emphasized distributed representations of the sort envisioned by Lashley and Hebb.

Until recently, that is.

A serendipitous finding, ingeniously pursued by a group of investigators at UCLA and Caltech, has suggested that there might be something to the idea of a "grandmother neuron" after all (Quiroga, et al., 2005).

These investigators worked with eight patients with intractable epilepsy. In order to localize the source of the patients' seizures, they implanted microelectrodes in various portions of the patients' medial temporal lobes (the hippocampus, amygdala, entorhinal cortex, and parahippocampal cortex). Each microelectrode consisted of 8 active leads and a reference lead. They then recorded responses from each lead to visual stimulation -- pictures of people, objects, animals, and landmarks selected on the basis of pre-experimental interviews with the patients.

In one patient, the investigators identified a single unit (i.e., a single lead of a single electrode, corresponding either to a single neuron or to a very small, dense cluster of neurons), located in the left posterior hippocampus, that responded to a picture of Jennifer Aniston, an actress who starred in a popular television series, Friends. (A response was defined very conservatively as an activity spike of magnitude greater than 5 standard deviations above baseline, consistently occurring within 1 second of stimulus presentation). That unit did not respond to any other stimuli tested. The investigators quickly located other pictures of Aniston, including pictures of her with Brad Pitt, to which she was once (and famously) married. The same unit responded to all the pictures of the actress -- except those in which she was pictured with Pitt!
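The response criterion can be written out as a small function. This is a sketch under an assumed operationalization (the median spike count in the 1-second window after onset, across repeated presentations, must exceed the baseline mean by more than 5 baseline standard deviations); the spike counts are invented, and the published analysis may differ in detail.

```python
from statistics import mean, median, stdev

def is_response(baseline_counts, stimulus_counts, k=5.0):
    """Classify a unit as responding to a stimulus.

    Assumed operationalization: the median spike count in the 1-s window after
    stimulus onset, across repeated presentations, must exceed the baseline
    mean by more than k baseline standard deviations.
    """
    threshold = mean(baseline_counts) + k * stdev(baseline_counts)
    return median(stimulus_counts) > threshold

baseline = [1, 0, 2, 1, 1, 0, 1, 2, 1, 1]   # invented pre-stimulus spike counts
preferred = [9, 11, 8, 10, 12, 9]           # invented counts, one per presentation
other = [1, 2, 0, 1, 3, 1]
print(is_response(baseline, preferred))     # True
print(is_response(baseline, other))         # False
```

A threshold this far above baseline trades sensitivity for specificity: weak or inconsistent responses are missed, but a unit that passes is very unlikely to be firing by chance.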

Similarly, a single unit in the right anterior hippocampus of another patient responded consistently and specifically to pictures of another actress, Halle Berry (who won an Academy Award for her starring role in Monster's Ball). Interestingly, this unit also responded to a line-drawing of Berry, to a picture of Berry dressed as Catwoman (for her starring role in the unfortunate film of the same name), and even to the spelling of her name, H-A-L-L-E--B-E-R-R-Y (unfortunately, the investigators didn't think of doing this when they were working with the "Jennifer Aniston" patient -- remember, they were flying by the seat of their pants, doing this research under the time constraints of a clinical assessment). The fact that the unit responded to Berry's name, as well as to her picture, and to pictures of Berry in her (in)famous role as Catwoman, suggests that the unit represents the abstract concept of "Halle Berry", not merely some configuration of physical stimuli.

As another example, yet a third patient revealed a multi-unit (i.e., two or more leads of a single electrode, evidently corresponding to a somewhat larger cluster of neurons) in the left anterior hippocampus that responded specifically, if not quite as distinctively, to pictures of the Sydney Opera House. This same unit also responded to the letter string SYDNEY OPERA HOUSE. It also responded to a picture of the Baha'i Temple -- but then again, in preliminary testing this patient had misidentified the Temple as the Opera House! So again, as with the Halle Berry neuron, the multi-unit is responding to the abstract concept of "Sydney Opera House", not to any particular configuration of physical features.

Across the 8 patients, Quiroga et al. tested 993 units, 343 single units and 650 multi-units, and found 132 units (14%) that responded to 1 or more test pictures. When they found a responsive unit, they then tested it with 3 to 8 variants of the test pictures. A total of 51 of these 132 units yielded evidence of an invariant representation of people, landmarks, animals, or food items. In each case, the invariant representation was abstract, in that the unit responded to different views of the object, to line drawings as well as photographs, and to names as well as pictures.

So maybe there is a "grandmother neuron" after all! This research -- which, remember, was performed in a clinical context and thus may have lacked some desirable controls -- identified sparse neural representations of particular people (landmarks, etc.), in which only a very small number of units is active during stimulus presentation.

Of course, this evidence for localization of content contradicts the distributionist assumptions that have guided cognitive neuroscience for 50 years. Further research is obviously required to straighten this out, but maybe there's no contradiction between distributionist and localist views after all. Consider Barlow's (1972) psychophysical linking principle:

Whenever two stimuli can be distinguished reliably... the physiological messages they cause in some single neuron would enable them to be distinguished with equal or greater reliability.

In other words, even in a distributed memory representation, there has to be some neuron that responds invariantly to various representations of the same concept. Neural representations of knowledge may be distributed widely over cortex, but these neural nets may come together in single units.

But wait a minute -- we're talking about the cerebral cortex, and the data from Quiroga et al. came from the hippocampus and neighboring structures of the medial temporal lobe, not from the neocortex. Note, however, that the hippocampus is crucial for memory: it was the destruction of his hippocampus that rendered H.M. amnesic. Nobody thinks that memories are stored in the hippocampus -- it's just too small for that purpose. But one prominent theory of the hippocampus is that it performs a kind of indexing function, linking together memories that are stored in the cortex. Accordingly, maybe Quiroga and his colleagues didn't exactly tap into their patient's whole knowledge representation of Halle Berry, but instead hit on the neural index card that locates all that information.

In any event, more recently Quian Quiroga and his colleagues (2008) have backed off their earlier, strong claims for having discovered something very much like a grandmother cell.

In the first place, they acknowledged the extremely low probability of stumbling on the cell that represents Jennifer Aniston, or Halle Berry, or whatever. After all, there are lots of cells in the brain (to put it mildly), and lots of knowledge encoded in these cells. It may be that there are other cells, located elsewhere, that code for the same knowledge as the cells they happened on more or less by accident.
Moreover, the coding of these cells may not be quite as specific as they had earlier supposed. That is, the Jennifer Aniston cell may fire to Jennifer Aniston, but it may also fire to Courteney Cox, her co-star on Friends. (Remember, Quian Quiroga had limited time with these patients, and they couldn't think of everything.)
In fact, computational models of neural activity suggest that each of these cells fires in response to about 100 objects (don't ask how they estimated this).

Still, they argued, the coding is more sparse than distributed.

So maybe symbolic/localist cognitive models have some life in them after all!

Just such an argument has been made by Bowers (2009), in a Psychological Review paper whose title gives the argument away: "On the Biological Plausibility of Grandmother Cells". At the very least, Bowers argues that localist models of cognition are compatible with neurophysiological findings.

Bowers begins with an instructive discussion of the differences between localist (symbolic, computational) and connectionist (PDP) models.

In a classic localist representation, each individual word, object, or concept has its own dedicated representation in the mind or the brain -- a representation often characterized as a node in an associative network, linked to other nodes representing related words, objects, or concepts; or, alternatively, as a neuron or a spatially contiguous cluster of neurons. In what follows, I will refer to these nodes, neurons, or clusters as units.

Each unit codes for one "thing" (which might be a concept or other equivalence class).
As a result, the activation of a single unit can be interpreted in terms of the thing which it represents.
Units are atomic.

They may represent a word or a concept, but not an entire proposition.
They may represent objects, but not entire scenes.

Propositions and scenes are coded when individual units enter into some sort of relationship with each other -- such as that represented by linguistic syntax.
Importantly, multiple units may represent the same thing (there's no rule against this, and multiple-trace models of memory seem to allow for it explicitly).

This redundancy in memory may be proportional to the importance of the thing represented.
And, of course, it is consistent with Lashley's Law of Mass Action -- which, at the very least, may be interpreted as meaning that the neural code for learning is highly redundant.

Because activation spreads from one unit to another, several units may be activated in the presence of a particular thing. This may give the appearance of a distributed code, but it is not.

In this case, each of the units represents a different thing.

For example, one unit may represent the grandmother, while another may represent the concept of old age, or Thanksgiving dinner, or Alzheimer's disease.
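The point that spreading activation yields several active units without yielding a distributed code can be sketched in a few lines. In this hypothetical localist network every node stands for exactly one thing, so each active node can be read off by name. The node names follow the example above, but the link structure, decay rate, and threshold are illustrative assumptions, not parameters from any published model.

```python
# Hypothetical localist network: each node stands for exactly one "thing".
LINKS = {
    "grandmother": ["old age", "Thanksgiving dinner"],
    "old age": ["Alzheimer's disease"],
    "Thanksgiving dinner": [],
    "Alzheimer's disease": [],
}

def spread(source, decay=0.5, threshold=0.2):
    """Spread activation outward from one node; return {node: activation}.
    Activation shrinks by `decay` at each link and stops below `threshold`."""
    activation = {source: 1.0}
    frontier = [source]
    while frontier:
        node = frontier.pop()
        for neighbor in LINKS[node]:
            a = activation[node] * decay
            if a >= threshold and a > activation.get(neighbor, 0.0):
                activation[neighbor] = a
                frontier.append(neighbor)
    return activation

print(spread("grandmother"))
```

Four units end up active, but each one still means something on its own ("old age", "Thanksgiving dinner", and so on); that is what distinguishes this from the distributed case.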

By contrast, in a PDP representation, individual words, objects, and concepts are coded as a pattern of activation across many processing units (nodes, neurons, or clusters of neurons).

There is no single unit which represents a single "thing".

As a result, the activation of any single unit cannot be interpreted in terms of the thing it represents -- precisely because no single unit represents anything.

Interpretations can be given only to particular patterns of activation.

In a distributed code, each unit codes for more than one thing.

For this reason, the identity of a stimulus cannot be inferred from the activation of any single unit in a network.

Units in a distributed network may code for "a set of entities in the world", but these entities "do not constitute a meaningful equivalence class".
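By way of contrast, here is a toy dense distributed code. Each thing is a pattern over the same six units, so asking what a single unit "means" returns several things, and only the whole pattern can be interpreted. The patterns are invented, and decoding by best overlap with stored patterns is an assumption chosen for simplicity, not the decoding rule of any particular PDP model.

```python
# Hypothetical dense distributed code: each "thing" is a pattern over 6 units.
PATTERNS = {
    "grandmother": (1, 0, 1, 1, 0, 1),
    "Halle Berry": (1, 1, 0, 1, 0, 0),
    "Sydney Opera House": (0, 1, 1, 1, 1, 0),
}

def things_using_unit(i):
    """Which things activate unit i? More than one means that unit i,
    taken by itself, cannot be interpreted."""
    return [name for name, p in PATTERNS.items() if p[i] == 1]

def decode(activity):
    """Interpret a whole activity pattern as the stored pattern it matches best."""
    def overlap(p):
        return sum(a == b for a, b in zip(activity, p))
    return max(PATTERNS, key=lambda name: overlap(PATTERNS[name]))

print(things_using_unit(3))          # unit 3 takes part in all three patterns
print(decode((1, 0, 1, 1, 0, 0)))    # a degraded "grandmother" pattern
```

Because interpretation depends on the whole pattern, the code degrades gracefully: flipping one unit still leaves "grandmother" as the best match.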

Bowers argues that the general preference for distributionist over localist coding schemes is based not just on the neural analogies discussed earlier, or a particular set of neurophysiological findings, but also on a misunderstanding of localist models -- not least because there is not just one possible localist model, but several.

The conventional localist model is what might be called "grandmother" coding, in which individual units code for particular things.

So, there might be a single unit which is activated whenever a subject perceives, remembers, thinks about, or images his grandmother.
As noted earlier, there may well be redundancy in the system, such that several different units code for the same object.
Grandmother coding of this sort was rejected by Quian Quiroga et al. (2008) themselves.

Quian Quiroga et al. (2008) preferred sparse coding, in which individual units code for several objects, but clusters of units code for a specific object.
And there is also coarse coding, where individual units code for a range of similar objects.

As Bowers notes (2009, p. 225), "The critical question is not whether a given neuron responds to more than one object, person, or word but rather whether the neuron codes for more than one thing. Localist coding is implemented if a stimulus is encoded by a single node (neuron) that passes some threshold of activity, with the activation of other nodes (neurons) contributing nothing to the interpretation of the stimulus."

For their part, distributed models also come in various forms.

In a conventional dense distributed representation, each unit codes for many different things, such that little or no information about the stimulus can be inferred from the activation of any particular unit.

This is the form taken by most PDP models of information processing.
Such units can be described as subsymbolic or subconceptual.

By analogy with coarse localist representations, in a coarse distributed representation each unit participates in the coding of a broad range of similar things -- broad enough that it is still not possible to identify the thing being represented from the activation of any particular unit.

In many such schemes, adjacent units code for similar things, much like the topographic map of visual cortex, the tonotopic map in auditory cortex, or the sensory and motor homunculi in somatosensory or motor cortex.
In "winner take all" schemes, the most active unit within a set of adjacent, co-active units represents the thing in the world. But that would make it very similar to a coarse localist code.

Accordingly, in coarse distributed representations there is no such "pooling", and so the entire ensemble of adjacent units is necessary to represent the thing.
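The difference between reading out a coarse code by winner-take-all and using the entire ensemble can be sketched with units that have broad, overlapping tuning curves along a single stimulus dimension. The preferred values, the Gaussian tuning, and the width are all hypothetical. The ensemble read-out shown here is an activity-weighted average of preferred values, a standard population-decoding technique chosen for illustration; it is not taken from Bowers.

```python
import math

PREFERRED = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]  # hypothetical preferred values
WIDTH = 15.0  # hypothetical tuning width: each unit responds to a broad range

def responses(stimulus):
    """Gaussian tuning: a unit fires more the closer the stimulus is to its preference."""
    return [math.exp(-((stimulus - p) ** 2) / (2 * WIDTH ** 2)) for p in PREFERRED]

def winner_take_all(rates):
    """Read out only the single most active unit's preferred value."""
    return PREFERRED[max(range(len(rates)), key=rates.__getitem__)]

def ensemble_estimate(rates):
    """Use the whole ensemble: activity-weighted average of preferred values."""
    return sum(p * r for p, r in zip(PREFERRED, rates)) / sum(rates)

rates = responses(37)
print(winner_take_all(rates), round(ensemble_estimate(rates), 1))
```

For a stimulus value of 37, the single winner can only report its own preferred value (40), while the ensemble recovers a value close to 37, which is why a coarse distributed code needs the whole set of adjacent units.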

And, finally, there are sparse distributed representations -- which involve a smaller batch of units.

Sparse distributed representations are often used to solve the problem of catastrophic interference: because one sparse code can represent the A-B association and a largely non-overlapping code can represent the A-C association, learning A-C does little to disturb A-B. They learn rapidly, too. But they don't show much by way of generalization -- which is as bad a problem, for a model of learning, as catastrophic interference.
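Why sparseness protects against interference comes down to overlap. If two memories are coded by randomly chosen sets of active units, the expected number of shared units is (active units)^2 / (total units). This sketch compares a dense and a sparse random code for the A-B and A-C memories; the network size, the activity levels, and the random assignment of codes are assumptions for illustration only.

```python
import random

def random_code(n_units, n_active, rng):
    """A code is represented as the set of units that are active."""
    return set(rng.sample(range(n_units), n_active))

def overlap(code1, code2):
    """Number of units shared by two codes."""
    return len(code1 & code2)

rng = random.Random(0)
N = 1000
dense_AB, dense_AC = random_code(N, 500, rng), random_code(N, 500, rng)
sparse_AB, sparse_AC = random_code(N, 20, rng), random_code(N, 20, rng)
# Expected overlap by chance: 500 * 500 / 1000 = 250 units for the dense codes,
# but only 20 * 20 / 1000 = 0.4 units for the sparse codes.
print("dense overlap:", overlap(dense_AB, dense_AC))
print("sparse overlap:", overlap(sparse_AB, sparse_AC))
```

The same property explains the weakness noted above: if similar items also receive nearly non-overlapping codes, what is learned about one transfers little to the other.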

The View from Cognitive Sociology

Memory, like any other aspect of mind and behavior, can be analyzed at the psychological level, as in models like HAM and ACT, and it can be analyzed at the neuroscientific level, as in discussions of the hippocampus and grandmother neurons. But memory can also be analyzed at a level "above" the individual mind and brain. So, for example, sociologists discuss collective memories shared by groups, organizations, institutions, and whole societies and cultures.

So how are memories represented at the sociocultural level of analysis?

Obviously, groups represent their memories in literary forms such as oral and written histories, as well as in video and audio records.
And just as obviously, they represent their memories in monuments and memorials -- which is why they're called memorials.
And more subtly, they represent their memories in social souvenirs, including historical preservations like Colonial Williamsburg, which "freeze time".
And even more subtly, they represent their memories in their calendars, which is why Americans have holidays like Memorial Day, to remember those who have sacrificed for their country, and why Christians celebrate Easter, to remember the resurrection of Jesus.

That's what memories look like at the sociocultural level of analysis. Understanding memory at this level is the province of cognitive sociology, a new field of sociology introduced by Eviatar Zerubavel (1997).

Schemata

Concepts, in turn, are a form of knowledge representation known as schemata. F.C. Bartlett (1932) introduced the concept of schema (pl. schemata, although schemas is acceptable too) as a central concept in his reconstructive theory of memory. According to Bartlett, remembering is not like taking a book off the shelf and reading it, as the traditional library metaphor would have it. Rather, remembering is more like writing the book anew, based on fragmentary notes. The process of remembering, of reconstructing a memory, is guided throughout by an organized framework of world-knowledge and attitudes, within which the memory is reconstructed. This organized framework is the schema.

Many people find schemata difficult to understand, but you begin to get the idea if you think of a more familiar derived term, schematic. A schematic diagram is a kind of logical diagram of a house or piece of equipment. It shows how the parts are associated with each other. But in the case of the house, it doesn't specify what the walls are made of, or what color they are painted. And in the case of a piece of electronic equipment, it doesn't show how the parts are actually configured inside the case. A schematic diagram represents the general idea of a thing -- and that is exactly what a schema is.

Head's Concept of Schema

Bartlett actually got the schema concept from Sir Henry Head (1861-1940), a British neurophysiologist famous for his studies of bodily posture and of aphasia. In his Studies in Neurology (1920), Head asserted that, in order to maintain correct posture, an organism must have some conception of its own body in space and time -- a homunculus-like "plastic model" which registers information about successive movements of various body parts (arms, legs, etc.), and updates the conception accordingly (see also Head and Holmes, 1911). The body schema is an internal representation of the body. It is not exactly a picture of what our bodies look like now, but rather a more generic concept of our bodies: that we have arms and legs and hands, what kinds of motions these body parts can make, where these body parts are likely to be found, and so on.

"Schemas are abstractions from specific instances that can be used to make inferences of the concepts they represent" (Anderson, Cognitive Psychology and Its Implications, 2000).

"A schema is a general knowledge structure used for understanding" (Medin, Ross, & Markman, Cognitive Psychology, 2001).

Bartlett's Concept of Schema

In his theory of memory, Bartlett defined a schema as "an active organization of past reactions, or of past experiences, which must always be supposed to be operating in any well-adapted organic response" (p. 201) -- not just in moving around the physical world, but in mental activities such as remembering as well.

Schemata are unconscious, and operate unconsciously to guide conscious perception, memory, and thought. Just as we are not consciously aware of our body schema, although we are aware of where our body parts are in space by virtue of such a schema, so we are not consciously aware of our cognitive schemata, although we are aware of the percepts, memories, and thoughts that are constructed and reconstructed within the framework provided by these schemata.
Schemata are generic, abstracted from specific instances to form a representation of what a class of instances is like in general. In this sense, schemata are like concepts. We can say that we have a "wedding schema", or a general idea of what weddings are like as social events; or we can say we have a "concept of a wedding".
All incoming information interacts with whatever schemata have been activated, and prevailing schemata actively organize perceptual input. Schemata do not simply accept incoming information, the way a jigsaw puzzle accepts only pieces of a particular shape. Rather, schemata have a way of shaping the pieces themselves. During perception, the perceiver actively tries to fit the new information into pre-existing schemata -- a process that Bartlett famously called "effort after meaning".
Schemata are generative, in the sense that they allow the person to deal with an infinite number of new schema-relevant instances. Bartlett drew an analogy to a skilled tennis player, who can hit the ball even when it appears in an unfamiliar location, or comes at an unfamiliar angle.
Schemata guide recall as well as perception. In what Brewer & Nakamura (1984) called Bartlett's "pure reconstructive" theory of memory, Bartlett seems to deny that new experiences leave specific traces of themselves in memory. Instead, he argued that new experiences modify pre-existing schemata, and memory retains only these modified schemata. Recall, for Bartlett, entails the person "turn[ing] round upon [his] own schemata" (a phrase that struck even Bartlett's friends as incomprehensible), inferring what happened from the relevant schema, rather than retrieving a record of that event from memory.
The "pure reconstructive" theory is what Bartlett held in his heart of hearts, but he had to acknowledge that people actually remembered specific details of specific events, not just generic concepts of events in general. Accordingly, Bartlett also offered a "partial reconstructive" theory of memory, which holds that a memory is a joint product of information contained in a memory trace and knowledge represented by a generic schema.

It is this latter "partial reconstructive" view that is Bartlett's legacy to memory theory. In the constructivist theory of perception, as it has been known at least since the time of Helmholtz, the perceiver combines information extracted from the stimulus with prior knowledge, expectations, and beliefs stored in memory to form a representation of some event that may or may not be precisely accurate. In much the same way, it appears that the rememberer combines information retained in a memory trace with knowledge stored as part of a generic schema relevant to the event being remembered. The result of this is that the individual will correctly remember those details that are schema-congruent, but also will falsely remember details that are congruent with the schema but not actually features of the event in question. In addition, the individual will also remember schema-incongruent features -- those which were unexpected based on the schema activated at the time of perception, and so drew additional attention, and dominated the perceiver's "effort after meaning".
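The three kinds of outcome just described can be illustrated with a deliberately crude sketch of partial reconstruction, using the "wedding schema" mentioned earlier. Here recall is simply the union of the details retained in the trace and the generic details the schema supplies. The schema contents, the event, and the retained trace are all invented, and letting every schema default into recall unconditionally is a caricature of Bartlett's view, not a model he proposed.

```python
# Hypothetical illustration of the "partial reconstructive" view of recall.
WEDDING_SCHEMA = {"bride", "groom", "vows", "rings", "cake", "music"}

def reconstruct(trace, schema):
    """Recall = details retained in the trace plus generic details from the schema."""
    return trace | schema

def score(recalled, event, schema):
    """Classify recalled details by accuracy and by fit to the schema."""
    return {
        "correct, schema-congruent": recalled & event & schema,
        "false, schema-congruent": (recalled & schema) - event,
        "correct, schema-incongruent": (recalled & event) - schema,
    }

event = {"bride", "groom", "vows", "rings", "dog as ring bearer"}  # no cake, no music
trace = {"bride", "dog as ring bearer"}  # the unexpected detail drew attention and survived
for kind, details in score(reconstruct(trace, WEDDING_SCHEMA), event, WEDDING_SCHEMA).items():
    print(kind, sorted(details))
```

The "cake" and "music" come out as false memories because the schema supplied them, while the unexpected ring-bearing dog is remembered because it was retained in the trace.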

Piaget on Schemata

The great Swiss developmental psychologist Jean Piaget (1896-1980) also employed the schema concept in his "genetic epistemology" theory of cognitive development. For Piaget, as for Bartlett, a schema is an internal representation of some general class of situations. Incoming stimulus information is assimilated to prevailing schemata, which in turn accommodate to information that doesn't quite fit. Thus, the child is born with innate sensory-motor schemata, which develop through pre-operational, concrete-operations, and formal-operations stages as a result of the dynamic interplay of assimilation and accommodation. It's easy to see the similarities between Bartlett's and Piaget's ideas about schemata, but neither of them references the other. As far as I can tell, Piaget first employed the schema concept in The Language and Thought of the Child (1926), so one would not expect Piaget to cite Bartlett. But Bartlett didn't cite Piaget, either. My best guess is that they derived the idea independently -- Bartlett from Henry Head, and Piaget from Immanuel Kant. Oldfield and Zangwill (1942-1943) do not cite Piaget in their discussion of Head and Bartlett, and deny any connection between Bartlett's views and Kant.

It was Kant, in fact, who first introduced the notion of a schema, referring to the a priori categories that Kant invoked in his synthesis of Cartesian rationalism and British empiricism. Think, for example, of the associationist principle of association by contiguity (never mind that it's wrong). You can't perceive things as close together in space and time unless you already have some notion of space and time. Such notions are schemata, in Kant's terms.

Incidentally, the Bartlett-Piaget coincidence repeated itself several decades later. In his pioneering textbook on Cognitive Psychology, published in 1967, Ulric (Dick) Neisser made considerable use of Bartlett's notion of the schema as the generic knowledge against which percepts are constructed and memories reconstructed. At exactly the same time, Aaron T. (Tim) Beck published a pioneering cognitive theory of depression (as opposed to the prevailing psychoanalytic one), based on the idea that depressed individuals suffer from depressogenic schemata -- basically, negative construals of self, the future, and the world. Neisser was at the time on the faculty at Cornell, but he wrote his book while on sabbatical at the University of Pennsylvania -- which was where Beck, on the faculty of Penn's psychiatry department, was writing his book. I know both individuals (being a Penn PhD), and so far as I can tell neither knew what the other was up to.

The Bartlett Revival

Partly owing to the influence of Neisser's book, and partly owing to the increasing interest on the part of memory researchers in memory for stories (as opposed to word-lists), the schema concept was revived in the 1970s -- first within cognitive psychology, and then within social psychology. For example, a number of experiments showed that comprehension of prose passages was better if subjects were first given information about the general theme of the passage; expert chess players remember chess positions better than novices; and story details that fit subjects' expectations and world-knowledge are remembered better than those that do not.

Taylor and Crocker (1981) discussed a number of functions of schemata:

Schemata lend structure to experience. The stimulus field is often vague and ambiguous, and schemata allow the perceiver to structure and organize its elements.
They enable the perceiver to fill in missing information. According to the cognitive analysis of perception, not all the information needed for perception is provided by the stimulus. Thus, in Bruner's phrase, the perceiver must "go beyond the information given" by the stimulus, and schemata provide the basis for filling in the gaps and making inferences about missing information.
Schemata determine what will be encoded -- and, by extension, what will be retrieved.
They also affect processing time. At least in principle, schema-congruent information should be processed relatively quickly.
Schemata provide the basis for problem-solving, by activating a schema covering a whole class of problems.
They also provide a basis for evaluating experience. Bartlett argued that schemata are often associated with positive or negative attitudes. Thus, matching an event to a particular schema will yield an initial evaluation of that event.
Schemata provide a basis for anticipating the future, by telling the perceiver what to expect from a general class of objects and events, so that he can then plan to take action with respect to them.
More generally, schemata provide the cognitive basis for understanding and comprehension, problem-solving, evaluation, and planning.

Brewer and Nakamura (1984) outlined five ways that schemata could specifically influence memory:

Schemata influence the amount of attention directed to particular details.
They act as frameworks for storing new information.
Generic information in schemata can combine with specific details of an event.
Schemata can guide memory retrieval.
They can guide the process by which the subject selects retained information for actual reporting.

Schemata in Artificial Intelligence

Both Bartlett's and Piaget's notions of schemata are relatively informal, and so was the concept of schema held by the cognitive and social psychologists just described. For them, the term simply refers to an organized body of more-or-less generic knowledge that guides perception, memory, thought, and action. But this is a lecture supplement on representation, so we need to ask:

What do schemata look like?

We got an answer when the schema concept was revived in cognitive science, and particularly in work on artificial intelligence, by theorists who rejected the "atomistic" implications of information-processing theory -- as in HAM or ACT, with individual pieces of knowledge represented as local nodes in an associative network. They had to figure out what schemata looked like, because they wanted to incorporate the concept in their computer-simulation models of memory and other aspects of cognition.

For example, Minsky (1975) explicitly rejected atomism and postulated the existence of "larger" "data structures" for representing knowledge known as frames. A frame has nodes that provide its basic structure, and slots that accept only certain kinds of information. If a slot is not filled by information to the contrary, it is filled in by "default" information. For example, a room has a floor, walls, windows, doors, and a ceiling, each represented by nodes. The floor may be wood or tile or carpeted, but it is unlikely to be made of water or grass. The ceiling may be level or vaulted, but if it is vaulted the vault is unlikely to point downward. There are usually four walls, and at least one window on every outside wall.
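The slot-and-default idea can be sketched as a small data structure. This is a minimal sketch, assuming that each node of the room frame has one slot with a default value and, optionally, a set of acceptable values; the node names follow the example above, but the particular defaults and the `instantiate` helper are hypothetical, not Minsky's own notation.

```python
from dataclasses import dataclass, field

@dataclass
class Slot:
    default: object                             # used when nothing contrary is observed
    allowed: set = field(default_factory=set)   # empty set means "no constraint"

# Hypothetical room frame: node name -> slot.
ROOM_FRAME = {
    "floor": Slot(default="wood", allowed={"wood", "tile", "carpet"}),
    "ceiling": Slot(default="level", allowed={"level", "vaulted"}),
    "walls": Slot(default=4),
    "windows": Slot(default="one per outside wall"),
}

def instantiate(frame, observed):
    """Fill each slot from observation if the value is acceptable, else use its default."""
    filled = {}
    for name, slot in frame.items():
        value = observed.get(name)
        if value is None:
            filled[name] = slot.default                 # default assignment
        elif slot.allowed and value not in slot.allowed:
            raise ValueError(f"{value!r} does not fit slot {name!r}")
        else:
            filled[name] = value
    return filled

print(instantiate(ROOM_FRAME, {"floor": "tile"}))
```

Observing only a tile floor still yields a complete room, with four walls and a level ceiling filled in by default, while an observation such as a floor made of water is rejected as not fitting the frame.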

At roughly the same time, Rumelhart and Ortony (1977) also invoked the schema concept to handle the problem of representing "higher-level abstractions" in story memory (which they used as a proxy for episodic memory in general).

Rumelhart (1981, 1984) began by offering some analogies between schemata and more familiar terms:

Schemata are like plays with characters that "can be played by different actors without changing the essential nature of the play".
Schemata are like theories about the world, which guide our interpretation of events and which are revised based on our actual experience.
Schemata are like procedures that actively evaluate the degree to which they account for available data.
Schemata are like parsers that impose structure on incoming data.

For Rumelhart (1984), schemata have several major characteristics:

Schemata are abstract representations of various kinds of objects, situations, or events. In this sense, schemata are a lot like concepts.
Schemata have variables which can be associated with, or bound to, different aspects of the environment. For example, a schema for room can stipulate that it has walls, a floor, a ceiling, windows, and a door, without stipulating exactly what these elements are made of, what they look like, or where they are in relation to each other.
These variables, in turn, are subject to variable constraints or default values. For example, there are usually four walls, arranged in a rectangle. There are usually windows on each exterior wall. And the door is usually on an interior wall.
Where Minsky talked about nodes, Rumelhart spoke of fixed slots whose contents were invariant, as opposed to variable slots which were filled by default values in the absence of information to the contrary.
Some of these variable constraints are conditional on the values of other variables. For example, while a bedroom might have only one door and a closet, a living room might have two doors -- one opening to the outside, another opening to the kitchen -- and no closet. Thus, schemata represent the relationships among variables, not just the existence of the features themselves.
Schemata can be embedded in each other, just as concepts can be embedded in hierarchies of supersets and subsets. So, a schema for an office building contains offices, cafeterias, and rest rooms, while the schema for office contains desks, chairs, shelves, telephone, and computer terminal.
Schemata represent knowledge at all levels of abstraction. We can have a broad schema for democratic governance, and a narrow schema for Vermont town meeting.
Schemata are knowledge -- they don't just hold knowledge; they are our knowledge about things in general.
Schemata are active, organizing, interpreting, and supplying information -- not just passive repositories of information.
Schemata are recognition devices which evaluate their own goodness of fit to the information they are processing.
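Three of these characteristics (fixed parts, defaults that are conditional on another variable, and self-evaluation of fit) can be sketched together. The room schema below follows the bedroom and living-room example above, but its structure, the scoring rule (the fraction of the schema's expectations that the observation confirms), and the helper names are assumptions for illustration; embedding and levels of abstraction are not shown.

```python
# Hypothetical room schema: fixed parts plus defaults conditional on room type.
ROOM_SCHEMA = {
    "fixed": {"walls", "floor", "ceiling"},  # always present
    "defaults": {                           # variable values, conditional on room type
        "bedroom": {"doors": 1, "closet": True},
        "living room": {"doors": 2, "closet": False},
    },
}

def expected(schema, room_type):
    """Bind the room-type variable, then look up the defaults it implies."""
    return dict(schema["defaults"][room_type])

def goodness_of_fit(schema, room_type, observed):
    """Fraction of the schema's expectations that the observation confirms."""
    parts_ok = [part in observed for part in schema["fixed"]]
    values_ok = [observed.get(k) == v for k, v in expected(schema, room_type).items()]
    checks = parts_ok + values_ok
    return sum(checks) / len(checks)

observed = {"walls": 4, "floor": "wood", "ceiling": "level", "doors": 1, "closet": True}
best = max(ROOM_SCHEMA["defaults"], key=lambda t: goodness_of_fit(ROOM_SCHEMA, t, observed))
print(best)
```

The schema acts as a recognition device in the sense that each binding of the room-type variable scores itself against the input, and the best-fitting binding ("bedroom" here) wins.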

For Rumelhart, schemata mediate the dynamic interplay between top-down and bottom-up information processing -- much like Piaget's interplay between assimilation and accommodation. Incoming stimulus information is processed with respect to active schemata, while schemata direct attention and interpretation. In schematic processing, information processing goes in both directions: top-down and bottom-up. This schematic processing is critical to every aspect of cognition: perception, discourse processing, learning, memory, and problem-solving.

For thorough discussions of Bartlett's schema theory, and its more modern adaptations, see:

Oldfield, R.C., & Zangwill, O.L. (1942a). Head's concept of the schema and its application in contemporary British psychology. Part I. Head's concept of the schema. British Journal of Psychology, 32, 267-286.
Oldfield, R.C., & Zangwill, O.L. (1942b). Head's concept of the schema and its application in contemporary British psychology. Part II. Critical analysis of Head's theory. British Journal of Psychology, 33, 58-64.
Oldfield, R.C., & Zangwill, O.L. (1943a). Head's concept of the schema and its application in contemporary British psychology. Part III. Bartlett's theory of memory. British Journal of Psychology, 33, 113-129.
Oldfield, R.C., & Zangwill, O.L. (1943b). Head's concept of the schema and its application in contemporary British psychology. Part IV. Wolters' theory of thinking. British Journal of Psychology, 33, 143-149.
Oldfield, R.C. (1954). Memory mechanisms and the theory of schemata. British Journal of Psychology, 45, 14-23.
Paul, I.H. (1967). The concept of schema in memory theory. Psychological Issues, 5(2-3), 218-258. Reprinted in R.R. Holt (Ed.) (1967), Motives and thought: Psychoanalytic essays in honor of David Rapaport (pp. 218-258). New York: International Universities Press.
Brewer, W.F., & Nakamura, G.V. (1984). The nature and function of schemas. In R.S. Wyer & T.K. Srull (Eds.), Handbook of social cognition (1st ed., vol. 1), pp. 119-160. Hillsdale, N.J.: Erlbaum.
Rumelhart, D.E. (1984). Schemata and the cognitive system. In R.S. Wyer & T.K. Srull (Eds.), Handbook of social cognition (1st ed., vol. 1), pp. 161-188. Hillsdale, N.J.: Erlbaum.

Scripts as Schemata

A special form of schema is known as a script. The notion of scripts has its origins in sociological role theory, and sociologists of sex often discuss sexual interactions as scripted in nature. For a long time, however, the script concept was relatively informal, based on a dramaturgical metaphor for social behavior in general.

Just what goes into scripts, and how they are structured, was discussed in detail by Schank & Abelson (1977), who went so far as to write script theory in the form of an operating computer program -- another exercise in artificial intelligence, this time applied to the domain of social cognition. Schank and Abelson based their scripts on conceptual dependency theory (Schank, 1975), which attempts to represent the meaning of sentences in terms of a relatively small set of primitive elements. Included in these primitive elements are primitive acts such as:

ATRANS, transfer of an abstract relationship, such as possession;
MTRANS, transfer of mental information between animals or within an animal;
PTRANS, transfer of the physical location of an object;
MOVE, movement of a body part of an animal by that animal;
INGEST, taking in of an object by an animal to the inside of that animal.

Schank & Abelson illustrate their approach with what they call the Restaurant Script:

The script comes in several different tracks, corresponding to the various types of restaurant, such as the coffee shop.
There are various props, such as tables, menu, food, check, and money.
There are various roles, such as customer, waiter, cook, cashier, and owner.
There are certain entry conditions, such as Customer is hungry and Customer has money.
And there are certain results, such as Customer has less money, Owner has more money, Customer is not hungry, and Customer is pleased (which, of course, is optional).

Scene	Begins with...	Ends with...
Scene 1, Entering the Restaurant	Customer PTRANS Customer into restaurant	Customer MOVE Customer to sitting position
Scene 2, Ordering	Customer MTRANS Signal to Waiter	Waiter PTRANS Food to Customer
Scene 3, Eating	Cook ATRANS Food to Waiter	Customer INGEST Food
Scene 4, Exiting	Waiter ATRANS Check to Customer	Customer PTRANS Customer out of restaurant.

Although script theory attempts to specify the major elements of a social interaction in terms of a relatively small list of conceptual primitives, Schank and Abelson also recognized that scripts are incomplete. For example, there are free behaviors that can take place within the confines of the script.

There are also anticipated variations of the script, such as

equifinal actions, or actions that have the same outcome;
variables, such as whether the customer orders chicken or beef;
paths, or choice points;
scene selection; as well as
the tracks described above.

And there are unanticipated variations as well, such as

interferences such as obstacles, errors, and corrective prescriptions for them; and
distractions from the script.

Scripts are, in some sense, prototypes of social situations, because they list the features of these situations and the social interactions that take place within them. But they go beyond prototypes to specify the relations (particularly the temporal, causal, and enabling relations) among these features. The customer orders food before the waiter brings it, and the customer can't leave until he pays the check, but he can't pay the check until the waiter brings it.

In any event, scripts enable us to categorize social situations: we can determine what situation we are in by matching its features to the prototypical features of various scripts we know. And, having categorized the situation in terms of some script, that script will then serve to guide our social interactions within that situation. By specifying the temporal, causal, and enabling relations among various actions, the script enables us to know how to respond to what occurs in that situation.
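The two uses of a script just described (categorizing a situation by feature matching, then using the script's ordering to anticipate what comes next) can be sketched in a few lines. This is only an illustration under stated assumptions: the feature sets are the props and roles listed above, the scene names follow the table above, the "doctor visit" script is a hypothetical contrast case invented for the example, and "best match" is assumed to mean the largest raw feature overlap.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Script:
    name: str
    features: set[str]   # props and roles characteristic of the situation
    scenes: list[str]    # scenes in their expected temporal order

    def next_scene(self, current: str) -> str | None:
        """Return the scene expected to follow `current`, or None at the end."""
        i = self.scenes.index(current)
        return self.scenes[i + 1] if i + 1 < len(self.scenes) else None


RESTAURANT = Script(
    "restaurant",
    {"tables", "menu", "food", "check", "money",
     "customer", "waiter", "cook", "cashier", "owner"},
    ["entering", "ordering", "eating", "exiting"],
)

# Hypothetical contrast script, invented only for this illustration.
DOCTOR_VISIT = Script(
    "doctor visit",
    {"waiting room", "receptionist", "nurse", "examination table", "prescription"},
    ["checking in", "waiting", "examination", "leaving"],
)


def categorize(observed: set[str], scripts: list[Script]) -> Script:
    """Pick the script whose features overlap most with the observed situation."""
    return max(scripts, key=lambda s: len(s.features & observed))


script = categorize({"menu", "waiter", "tables"}, [RESTAURANT, DOCTOR_VISIT])
print(script.name)                    # restaurant
print(script.next_scene("ordering"))  # eating
```

Storing the scenes as an ordered list captures only temporal order; the causal and enabling relations mentioned above would need a richer structure.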

Categories and Concepts

Our discussion of memory storage has focused on episodic memory -- that is, how specific episodes of experience, thought, and action are represented in the mind. But it is also clear that more than episodic memories are stored in the mind. There is also semantic knowledge of various sorts, as well as procedural knowledge. A special form of semantic knowledge concerns conceptual knowledge about the world. Technically, conceptual knowledge is part of semantic memory, and we have already discussed how certain classic models of semantic memory represent conceptual knowledge:

The model of Collins and Quillian (1969) represents knowledge in a multi-layer hierarchy, in which nodes representing various levels of categorization are linked to their constituent properties. Thus, animal breathes, bird has wings, fish has fins, canary can sing, and salmon is pink.
The model of Smith, Shoben, and Rips (1974) locates instances of categories in a multidimensional space, where birds such as goose, duck, and chicken lie relatively close together, and relatively far away from other birds such as robin, sparrow, and cardinal.
The ACT model of Anderson (1976) and his associates can also represent semantic knowledge in terms of propositions like Robin has a red breast, Robin is a bird, and Bird is an animal.

That's all well and good, but conceptual knowledge has been such an important part of theories of cognitive representation -- since, roughly, the time of Aristotle! -- that it deserves some special treatment.

So the question becomes -- what are concepts, and how are categories represented in the mind?

The terms concept and category are often used interchangeably, even though there is an important technical distinction between them:

A category may be defined as a group of objects, events, or ideas which share attributes or features in common. Categories partition the world into equivalence classes. Oak trees and elm trees belong in the category trees, while the Atlantic and the Pacific belong in the category oceans.

Some categories are natural, in that their members are part of the natural world.
Other categories are artificial, in that they have been contrived by experimenters who want to know more about how categorization works.

A concept is the mental representation of a category, usually abstracted from particular instances. Concepts serve important mental functions: they group related entities together into classes, and provide the basis for synonyms, antonyms, and implications. Concepts summarize our beliefs about how the world is divided up into equivalence classes, and about what entire classes of individual members have in common.

Generally, we think of our mental concepts as being derived from the actual categorical structure of the real world, but there are also points of divergence:

Categories may exist in the real world, without being mentally represented as concepts.
Concepts may impose a structure on the world that does not exist there.

Technically, categories exist in the real world, while concepts exist in the mind. However, this technical distinction is difficult to uphold, and psychologists commonly use the two terms interchangeably. In fact, objective categories may not exist in the real world, independently of the mind that conceives them (a question related to the philosophical debate between realism and idealism). Put another way, the question is whether the mind picks up on the categorical structure of the world, or whether the mind imposes this structure on the world.

Some categories may be defined through enumeration: an exhaustive list of all instances of a category. A good example is the English alphabet, A through Z; these letters have nothing in common except their status as letters in the English alphabet.

A variant on enumeration is to define a category by a rule which will generate all instances of the category (these instances all have in common that they conform to the rule). An example is the concept of integer in mathematics, which is defined as the numbers 0, 1, and any number which can be obtained by adding or subtracting 1 from these numbers one or more times.

The most common definitions of categories are by attributes: properties or features which are shared by all members of a category. Thus, birds are warm-blooded vertebrates with feathers and wings, while fish are cold-blooded vertebrates with scales and fins. There are three broad types of attributes relevant to category definition:

perceptual or stimulus features help define natural categories like birds and fish;
functional attributes, including the operations performed with or by objects, or the uses to which they can be put, are used to define categories of artifacts like tools (instruments which are worked by hand) or vehicles (means of transporting things);
relational features, which specify the relationship between an instance and something else, are used to define many social categories like aunt (the sister of a father or a mother) or stepson (the son of one's husband or wife by a former marriage).

Of course, some categories are defined by mixtures of perceptual, functional, and relational features.
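To make the three kinds of definition concrete, here is a small sketch of each as a membership test or generator. It is illustrative only: the feature names for "bird" are taken loosely from the attribute definition above, and representing features as plain strings is an assumption made for simplicity.

```python
import itertools
import string

# 1. Enumeration: the category is an exhaustive list of its members.
ENGLISH_LETTERS = frozenset(string.ascii_uppercase)  # 'A' through 'Z'


def is_letter(x: str) -> bool:
    return x in ENGLISH_LETTERS


# 2. Rule: the category is whatever a generating rule produces.
#    Integers: start at 0, then repeatedly add or subtract 1.
def integers():
    yield 0
    n = 1
    while True:
        yield n
        yield -n
        n += 1


# 3. Attributes: members share a set of features (illustrative feature names).
BIRD = frozenset({"warm-blooded", "vertebrate", "feathers", "wings"})


def is_bird(features: set) -> bool:
    return BIRD <= features


print(is_letter("Q"))                         # True
print(list(itertools.islice(integers(), 7)))  # [0, 1, -1, 2, -2, 3, -3]
print(is_bird({"warm-blooded", "vertebrate", "feathers", "wings", "sings"}))  # True
```

Note that the enumerated and rule-generated categories need no features at all; only the third kind depends on attributes.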

Still, most categories are defined by attributes, meaning that concepts are summary descriptions of an entire class of objects, events, and ideas. There are three principal ways in which such categories are organized: as proper sets, as fuzzy sets, and as sets of exemplars.

Having now defined the differences between the two terms, we are going to use them interchangeably again. The reason is that it's boring to write concept all the time; moreover, the noun category has a cognate verb form, categorization, while conceptual does not (unless you count conceptualization, which is a mouthful that doesn't mean quite the same thing as categorization).

Still, the semantic difference between concepts and categories raises two particularly interesting issues for social categorization:

To what extent does the categorical structure of the social world exist in the real world outside the mind, to be discovered by the social perceiver, and to what extent is this structure imposed on the world by the social perceiver?
To what extent are social categories "natural", and to what extent are they "artificial"?

The Classical View: Categories as Proper Sets

Perhaps the earliest philosophical discussion of conceptual structure was provided by Aristotle in his Categories. Aristotle set out the classical view of categories as proper sets -- a view which dominated thinking about concepts and categories well into the 20th century. Beginning in the 1950s, however, and especially the 1970s, philosophers, psychologists, and other cognitive scientists began to express considerable doubts about the classical view. In the time since, a number of different views of concepts and categories have emerged -- each attempting to solve the problems of the classical view, but each raising new problems of its own. Here's a short overview of the evolution of theories of conceptual structure.

According to the classical view, concepts are summary descriptions of the objects in some category. This summary description is abstracted from instances of a category, and applies equally well to all instances of a category.

In the classical view, categories are structured as proper sets, meaning that the objects in a category share a set of defining features which are singly necessary and jointly sufficient to demarcate the category.

By singly necessary we mean that every instance of a category possesses that feature;
by jointly sufficient we mean that every entity possessing the entire set of defining features is an instance of the concept.

Examples of classification by proper sets include:

square: a closed geometric figure with four equal sides and four equal angles
bachelor: an unmarried male human
US Senator: a Congressional representative at least 30 years of age, elected at large by the people of a state to a six-year term

According to the proper set view, categories can be arranged in a hierarchical system which represents the vertical relations between categories, and which yields the distinction between superordinate and subordinate categories.

Subordinate categories (subsets) are created by adding defining features: thus, a square is a subset of the geometric category quadrilateral (four-sided figure).
Superordinate categories (supersets) are created by eliminating defining features: thus, a quadrilateral is a superset which includes squares, rectangles, and parallelograms.

Such hierarchies of proper sets are characterized by perfect nesting, by which we mean that subsets possess all the defining features of supersets (and then some). Examples include:

geometrical figures
    superset: points, lines, planes, solids
        subsets of planes: triangles, quadrilaterals, etc.
            sub-subsets of quadrilaterals: parallelograms, rhomboids, etc.
                sub-sub-subsets of parallelograms: rectangles, squares, etc.

people
    superset: male, female
        subsets of males: youth, bachelor, husband, widower
        subsets of females: girl, maiden, wife, widow

government officials
    superset: executive, legislative, judicial
        subsets of legislative: senator, representative
        subsets of executive: president, cabinet secretary, administrator
        subsets of judicial: supreme court, court of appeals, district court, magistrate

Note, for example, the perfect nesting in the hierarchy of geometrical figures.

At the most superordinate level, categories are defined by a single feature, dimensionality.

Points are geometrical figures that have no dimensions;
Lines have only one (length);
Planes have two (length and width), and
Solids have three (length, width, and depth).

Within each of these categories, subsets are created by adding a new defining feature. For example, two-dimensional shapes may be divided into triangles, quadrilaterals, and the like.

Triangles may be divided into equilateral or isosceles, right or acute.
Quadrilaterals can be divided into parallelograms, rhombuses, trapezoids, and trapeziums; etc.

Such hierarchies show perfect nesting: all instances of subcategories also possess the defining features of the relevant superordinate category. All trapezoids have the features of quadrilaterals, and all quadrilaterals have the features of planes.

Proper sets are also characterized by an all-or-none arrangement which characterizes the horizontal relations between adjacent categories, or the distinction between a category and its contrast. Because defining features are singly necessary and jointly sufficient, proper sets are homogeneous in the sense that all members of a category are equally good instances of that category (because they all possess the same set of defining features). An entity either possesses a defining feature or it doesn't; thus, there are sharp boundaries between contrasting categories: an object is either in the category or it isn't. You're either a fish, or you're not a fish. There are no ambiguous cases of category membership.

According to the classical view, object categorization proceeds by a process of feature-matching. Through perception, the perceiver extracts information about the features of the object; these features are then compared to the defining features of some category. If there is a complete match between the features of the object and the defining features of the category, then the object is labeled as an instance of that category.
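The feature-matching process just described reduces to a subset test: an object belongs to the category exactly when it has every defining feature. The sketch below uses the definition of a square given earlier; encoding features as strings is an assumption made for illustration.

```python
# Classical view sketch: a category is a set of defining features that are
# singly necessary and jointly sufficient, so membership is all-or-none.
SQUARE = frozenset({"closed figure", "four sides", "equal sides", "equal angles"})


def is_member(object_features: set, defining_features: frozenset) -> bool:
    """Complete match required: the object must have every defining feature.
    Extra, non-defining features (such as colour) are simply ignored."""
    return defining_features <= object_features


red_square = {"closed figure", "four sides", "equal sides", "equal angles", "red"}
rectangle = {"closed figure", "four sides", "equal angles"}
print(is_member(red_square, SQUARE))  # True
print(is_member(rectangle, SQUARE))   # False
```

Because the test returns only True or False, every member is an equally good instance, which is exactly the homogeneity assumption that the criticisms below call into question.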

Problems with the Classical View

The proper set view of categorization is sometimes called the classical view because it is the one handed down in logic and philosophy from the time of the ancient Greeks. But there are some problems with it which suggest that however logical it may seem, it's not how the human mind categorizes objects. Smith & Medin (1981) distinguished between general criticisms of the classical view, which arise from simple reflection, and empirical criticisms, which emerge from experimental data on concept-formation.

General Criticisms. On reflection, for example, it appears that some concepts are disjunctive: they are defined by two or more different sets of defining features.

An example is the concept of a strike in the game of baseball: a strike is a pitched ball at which the batter swings but does not hit; or a strike is a pitched ball which crosses home plate in the strike zone, which baseball's rule book defines as "that area over home plate the upper limit of which is a horizontal line at the midpoint between the top of the shoulders and the top of the uniform pants, and the lower level is a line at the hollow beneath the kneecap" (assuming that the umpire calls it a strike); or a strike is a ball hit into the foul zone, unless the batter already has two strikes; or a strike is whatever the umpire calls a strike.
Another example is jazz music, which comes in two broad forms, blues and swing (or, depending on which critic you read, blues and standards), which are completely different from each other. Jazz can't be defined as improvisational music, because other forms of music, for example Bach's organ pieces and some rock music, also involve improvisation.

Disjunctive categories violate the principle of defining features, because there is no defining feature which must be possessed by all members of the category.
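A disjunctive category can be sketched as a list of alternative feature sets, any one of which is sufficient. The encoding of the strike rule below is a hypothetical simplification of the rule described above, not an official one; the point it illustrates is that the intersection of the alternatives is empty, so no feature is shared by every member.

```python
# Simplified, hypothetical encoding of the baseball "strike" rule as a
# disjunctive category: any ONE of several feature sets suffices.
STRIKE_ALTERNATIVES = [
    frozenset({"swung at", "missed"}),
    frozenset({"not swung at", "in strike zone"}),
    frozenset({"hit foul", "fewer than two strikes"}),
    frozenset({"called strike by umpire"}),
]


def is_strike(pitch: set) -> bool:
    return any(alternative <= pitch for alternative in STRIKE_ALTERNATIVES)


# Features common to every alternative: none, so there is no defining feature.
shared = frozenset.intersection(*STRIKE_ALTERNATIVES)
print(shared)                                          # frozenset()
print(is_strike({"swung at", "missed"}))               # True
print(is_strike({"hit foul", "two strikes already"}))  # False
```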

Another problem is that many entities have unclear category membership. According to the classical, proper-set view of categories, every object should belong to one category or another. But is a rug an article of furniture? Is a potato a vegetable? Is a platypus a mammal? Is a panda a bear? We use categories like "furniture" without being able to clearly determine whether every object is a member of the category.

Furthermore, some categories are associated with unclear definitions. That is, it is difficult to specify the defining features of many of the concepts we use in ordinary life. A favorite example (from the philosopher Wittgenstein) is the concept of "game". Games don't necessarily involve competition (solitaire is a game); there isn't necessarily a winner (ring-around-the-rosy), and they're not always played for amusement (professional football). Of course, it may be that the defining features exist, but haven't been discovered yet. But that doesn't prevent us from assigning entities to categories; thus, categorization doesn't seem to depend on defining features.

Empirical Criticisms. Yet another problem is imperfect nesting: it follows from the hierarchical arrangement of categories that members of subordinate categories should be judged as more similar to members of immediately superordinate categories than to more distant ones, for the simple reason that the two categories share more features in common. Thus, a sparrow should be judged more similar to a bird than to an animal. This principle is often violated: for example, chickens, which are birds, are judged to be more similar to animals than to birds. The result is a tangled hierarchy of related concepts.

The chicken-sparrow example reveals the last, and perhaps the biggest, problem with the classical view of categories as proper sets: some entities are better instances of their categories than others. This is the problem of typicality. A sparrow is a better instance of the category bird -- it is a more "birdy" bird -- than is a chicken (or a goose, or an ostrich, or a penguin). Within a culture, there is a high degree of agreement about typicality. The problem is that all the instances in question share the features which define the category bird, and thus must be equivalent from the classical view. But they are clearly not equivalent; variations in typicality among members of a category can be very large.

Variations in typicality can be observed even in the classic example of a proper set -- namely, geometrical figures. For example, subjects usually identify an equilateral triangle, with equal sides and equal angles, as more typical of the category triangle than isosceles, right, or acute triangles.

There are a large number of ways to observe typicality effects:

response latency in category verification: when classifying objects, subjects react faster to typical than to atypical members;
age of acquisition: children and nonnative speakers of a language learn typical members before atypical ones;
spew order: when generating instances, typical ones appear before atypical ones; and
reference points: typical members are more likely to serve as the basis for classification than atypical members. That is, you are much more likely to say that an object is a bird because it looks like a sparrow than that it is a bird because it looks like a chicken. A major-league baseball umpire is much more likely to call a strike if the pitch is below the batter's belt, even though many above-the-belt pitches fall well within the official strike zone (New York Times, 11/08/00).

Typicality appears to be determined by family resemblance. Category instances seem to be united by family resemblance rather than any set of defining features shared by all members of a category. Just as a child may have his mother's nose and his father's ears, so instance A may share one feature with instance B, and an entirely different feature with instance C, while B shares yet a third feature with C that it does not share with A. Empirically, typical members share lots of features with other category members, while atypical members do not. Thus, sparrows are small, and fly, and sing; chickens are big, and walk, and cluck.
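One simple way to quantify family resemblance is to credit each member with, for every feature it has, the number of other members that share that feature. The scoring rule and the feature lists below are assumptions made for illustration, not data from any study; they merely echo the sparrow/chicken contrast above.

```python
# Hypothetical toy feature lists, for illustration only.
BIRDS = {
    "sparrow":  {"small", "flies", "sings", "feathers", "builds nests"},
    "robin":    {"small", "flies", "sings", "feathers", "builds nests"},
    "cardinal": {"small", "flies", "sings", "feathers"},
    "chicken":  {"large", "walks", "clucks", "feathers"},
}


def family_resemblance(name: str, category: dict) -> int:
    """Sum, over each feature of `name`, of how many OTHER members share it."""
    return sum(
        sum(feature in feats for other, feats in category.items() if other != name)
        for feature in category[name]
    )


print({b: family_resemblance(b, BIRDS) for b in BIRDS})
# {'sparrow': 10, 'robin': 10, 'cardinal': 9, 'chicken': 3}
```

No feature except "feathers" is shared by all four birds, yet the score still separates typical members from the atypical chicken.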

Typicality is important because it is another violation of the homogeneity assumption of the classical view. It appears that categories have a special internal structure which renders instances nonequivalent, even though they all share the same singly necessary and jointly sufficient defining features. Typicality effects indicate that we use non-necessary features when assigning objects to categories. And, in fact, when people are asked to list the features of various categories, they usually list features that are not true for all category members.

The implication of these problems, taken together, is that the classical view of categories is incorrect. Categorization by proper sets may make sense from a logical point of view, but it doesn't capture how the mind actually works.

The Prototype View: Concepts as Fuzzy Sets

Recently, another view of categorization has gained status within psychology: this is known as the prototype or the probabilistic view.

The prototype view retains the idea, from the classical view, that concepts are summary descriptions of the instances of a category. Unlike the classical view, however, in the prototype view the summary description does not apply equally well to every member of the category, because there are no defining features of category membership.

According to the prototype view, categories are fuzzy sets, in that there is only a probabilistic relationship between any particular feature and category membership. No feature is singly necessary to define a category, and no set of features is jointly sufficient.

Some features are central, in that they are highly correlated with category membership (most birds fly, a few, like ostriches, don't; most non-birds don't fly, but a few, like bats, do). Central features are found in many instances of a category, but few instances of contrasting categories.
Other features are peripheral: there is a low correlation with category membership, and the feature occurs with approximately equal frequency in a category and its contrast (birds have two legs, but so do apes and humans).
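The central/peripheral distinction can be sketched by comparing how often a feature occurs in a category with how often it occurs in the contrast category. The difference of the two frequencies used here is an assumed, deliberately simple index (not a standard measure), and the tiny animal lists are illustrative, drawn from the examples in the text.

```python
# Illustrative members, drawn from the examples above.
BIRDS = {
    "sparrow": {"flies", "two legs"},
    "robin": {"flies", "two legs"},
    "cardinal": {"flies", "two legs"},
    "ostrich": {"two legs"},
}
NON_BIRDS = {
    "bat": {"flies", "two legs"},
    "human": {"two legs"},
    "ape": {"two legs"},
}


def frequency(feature: str, members: dict) -> float:
    """Proportion of members that have the feature."""
    return sum(feature in feats for feats in members.values()) / len(members)


def centrality(feature: str, category: dict, contrast: dict) -> float:
    """Positive when the feature is much commoner in the category than in its
    contrast (central); near zero when equally common in both (peripheral)."""
    return frequency(feature, category) - frequency(feature, contrast)


print(round(centrality("flies", BIRDS, NON_BIRDS), 2))     # 0.42
print(round(centrality("two legs", BIRDS, NON_BIRDS), 2))  # 0.0
```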

Fuzzy Sets and Fuzzy Logic

The notion of categories as fuzzy rather than proper sets, represented by prototypes rather than lists of defining features, is related to the concept of fuzzy logic developed by Lotfi Zadeh, a computer scientist at UC Berkeley. Whereas the traditional view of truth is that a statement (such as an item of declarative knowledge) is either true or false, Zadeh argued that statements can be partly true, possessing a "truth value" somewhere between 0 (false) and 1 (true).

Fuzzy logic can help resolve certain logical conundrums -- for example the paradox of Epimenides the Cretan (6th century BC), who famously asserted that "All Cretans are liars". If all Cretans are liars, and Epimenides himself is a Cretan, then his statement cannot be true. Put another way: if Epimenides is telling the truth, then he is a liar. As another example, consider the related Liar paradox: the simple statement that "This sentence is false". Zadeh has proposed that such paradoxes can be resolved by concluding that the statements in question are only partially true.

Fuzzy logic also applies to categorization. Under the classical view of categories as proper sets, a similar "all or none" rule applies: an object either possesses a defining feature of a category or it does not; and therefore it either is or is not an instance of the category. But under fuzzy logic, the statement "object X has feature Y" can be partially true; and if Y is one of the defining features of category Z, it also can be partially true that "Object X is an instance of category Z".
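Both points (the Liar sentence and graded category membership) can be illustrated with Zadeh's original connectives: NOT a = 1 - a, AND = min, OR = max. The feature names and truth values for the rug example are hypothetical, chosen only to show how a partially true feature yields partial membership.

```python
# Zadeh's original fuzzy connectives on truth values in [0, 1].
def f_not(a: float) -> float:
    return 1.0 - a


def f_and(*values: float) -> float:
    return min(values)


def f_or(*values: float) -> float:
    return max(values)


# The Liar sentence asserts its own falsity, so its truth value t must satisfy
# t == f_not(t). No classical value (0 or 1) works; in fuzzy logic t = 0.5 does.
liar = 0.5
print(liar == f_not(liar))  # True


def membership(feature_truth: dict, defining_features: list) -> float:
    """Degree to which "X is an instance of Z": the fuzzy AND of the degrees to
    which X has each defining feature of Z (missing features count as 0)."""
    return f_and(*(feature_truth.get(f, 0.0) for f in defining_features))


# Hypothetical degrees for "a rug is furniture".
rug = {"movable": 0.9, "found in a room": 1.0, "used to hold people or things": 0.6}
print(membership(rug, ["movable", "found in a room", "used to hold people or things"]))  # 0.6
```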

A result of the probabilistic relation between features and categories is that category instances can be quite heterogeneous. That is, members of the same category can vary widely in terms of the attributes they possess. All of these attributes are correlated with category membership, but none are singly necessary and no set is jointly sufficient.

Some instances of a category are more typical than others: these possess relatively more central features.

According to the prototype view, categories are not represented by a list of defining features, but rather by a category prototype, or focal instance, which has many features central to category membership (and thus a family resemblance to other category members) but few features central to membership in contrasting categories.

It also follows from the prototype view that there are no sharp boundaries between adjacent categories (hence the term fuzzy sets). In other words, the horizontal distinction between a category and its contrast may be very unclear. Thus, a tomato is a fruit but is usually considered a vegetable (it has only one perceptual attribute of fruits, having seeds; but many functional features of vegetables, such as the circumstances under which it is eaten). Dolphins and whales are mammals, but are usually (at least informally) considered to be fish: they have few features that are central to mammalhood (they give live birth and nurse their young), but lots of features that are central to fishiness.

Two Views of Prototypes

Actually, there are two different versions of the prototype view.

In the featural version of prototypes, which is what we've been discussing, features are either present or not, on an all-or-none basis, but none of them is singly necessary, nor is any set of them jointly sufficient, to define category membership. Therefore, the prototype is a summary representation of the category that possesses many features that are central to membership in the target category, but few features that are central to membership in alternative categories.
In the dimensional version of prototypes, features are continuous rather than discrete, meaning that they vary in degree. Therefore, the prototype is a summary representation of the category whose values on each dimensional feature reflect the average of all the instances of the category. If the members of a category were plotted in a multidimensional space, with the dimensions representing the various features associated with category membership, the category would be represented as a cluster of points, one per instance, with the prototype at its center.

The two versions of the prototype view have somewhat different implications for categorization.

In the featural version, categorization proceeds as in the proper-set view, except that there is no list of defining features. Rather, the perceiver compares the list of stimulus features to those of the category prototype. If there is sufficient overlap, then the object is classified as a member of the category in question. But what does "sufficient" mean? Perhaps there is some threshold that the object has to cross, in terms of the number of "prototypical" features it possesses. In the final analysis, though, categorization is simply a matter of judgment. If there is lots of overlap, then the object will closely resemble the category prototype, and the categorization judgment will be made with a high degree of confidence. But if there is little overlap, then the resemblance will be weaker, and so will confidence in the judgment.
In the dimensional version, categorization proceeds by computing the Euclidean (geometrical) distance, in the multidimensional space, between the object and the category prototype. If the two are sufficiently close, then the object is classified as a member of the category in question. Again, "sufficient" is a matter of judgment, associated with differing degrees of confidence.
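The two procedures can be sketched side by side. Everything concrete here is an assumption for illustration: the bird prototype's features, the (size, predacity) ratings, and the two cut-offs (an overlap of 0.3, a distance of 2.0) that stand in for "sufficient". The overlap and distance values returned alongside the decision play the role of graded confidence.

```python
import math

# Featural version. Prototype features and the 0.3 threshold are hypothetical.
BIRD_PROTOTYPE = frozenset({"feathers", "wings", "flies", "sings", "small", "builds nests"})


def featural_match(features: set, prototype: frozenset, threshold: float = 0.3):
    """Return (is_member, overlap), where overlap is the share of prototype
    features the object has; more overlap stands in for more confidence."""
    overlap = len(features & prototype) / len(prototype)
    return overlap >= threshold, overlap


# Dimensional version. Ratings and the 2.0 cut-off are hypothetical.
def dimensional_prototype(instances):
    """The prototype is the average of the instances on each dimension."""
    n = len(instances)
    return tuple(sum(values) / n for values in zip(*instances))


def dimensional_match(x, prototype, max_distance: float = 2.0):
    """Return (is_member, distance) using Euclidean distance to the prototype."""
    d = math.dist(x, prototype)
    return d <= max_distance, d


print(featural_match(set(BIRD_PROTOTYPE), BIRD_PROTOTYPE))       # (True, 1.0)
print(featural_match({"feathers", "wings", "walks", "clucks"}, BIRD_PROTOTYPE)[0])  # True

birds = [(2, 1), (3, 2), (4, 3)]       # hypothetical (size, predacity) ratings
proto = dimensional_prototype(birds)   # (3.0, 2.0)
print(dimensional_match((3.5, 2.5), proto)[0])  # True
print(dimensional_match((8, 8), proto)[0])      # False
```

The chicken-like object passes the featural test, but only with an overlap of one third, which is one way to picture a low-confidence categorization.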

Either way, categorization is no longer an "all-or-none" matter. Category membership can vary by degrees, depending on how closely the object resembles the prototype.

The prototype view solves most of the problems that confront the classical view, and (in my view, anyway) is probably the best theory of conceptual structure and categorization that we've got. But as research proceeded on various aspects of the prototype view, certain problems emerged, leading to the development of other views of concepts and categories.

In the prototype view, as in the classical view, related categories can be arranged in a hierarchy of subordinate and superordinate categories. Many accounts of the prototype view argue that there is a basic level of categorization, which is defined as the most inclusive level at which:

objects in a category have characteristic attributes in common;
objects have characteristic movements in common;
objects have a characteristic physical appearance; and
objects can be identified and categorized from their average appearance.

In the realm of animals, for example, dog and cat are at the basic level, while beagle and Siamese are at subordinate levels. In the domain of musical instruments, piano and saxophone are at the basic level, while grand piano and baritone saxophone are at subordinate levels. The basic level is in some important sense psychologically salient, and preferred for object categorization and other cognitive purposes.

The Exemplar View

Some theorists now favor a third view of concepts and categories, which abandons the definition of concepts as summary descriptions of category members. According to the exemplar view, concepts consist simply of lists of their members, with no defining or characteristic features to hold the entire set together. In other words, what holds the instances together is their common membership in the category. It's a little like defining a category by enumeration, but not exactly. The members do have some things in common, according to the exemplar view; but those things are not particularly important for categorization.

When we want to know whether an object is a member of a category, the classical view says that we compare the object to a list of defining features; the prototype view says that we compare it to the category prototype; the exemplar view says that we compare it to individual category members. Thus, in forming categories, we don't learn prototypes, but rather we learn salient examples.

Teasing apart the prototype and the exemplar views turns out to be fiendishly difficult. There are a couple of very clever experiments which appear to support the exemplar view. For example, it turns out that we will classify an object as a member of a category if it resembles another object that is already labeled as a category member, even if neither the object nor that instance particularly resembles the category prototype.
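The situation just described, where the object resembles a labeled member but not the prototype, is exactly where the two views make different predictions. The sketch below builds such a case in a hypothetical two-dimensional feature space. The coordinates are invented, the prototype is taken to be the category mean, and the exemplar rule is a simple nearest-neighbour rule (one exemplar rule among several; some exemplar models sum similarity over all stored members instead).

```python
import math

# Hypothetical 2-D feature space; each category is a list of stored instances.
CATEGORIES = {
    "A": [(0, 0), (0, 1), (1, 0), (1, 1), (6, 6)],  # (6, 6) is an atypical member
    "B": [(6, 0), (7, 0), (6, 1), (7, 1)],
}


def prototype(instances):
    """Summary representation: the mean value on each dimension."""
    n = len(instances)
    return tuple(sum(dim) / n for dim in zip(*instances))


def classify_by_prototype(x):
    """Assign x to the category whose prototype is nearest."""
    return min(CATEGORIES, key=lambda c: math.dist(x, prototype(CATEGORIES[c])))


def classify_by_exemplar(x):
    """Assign x to the category of the single nearest stored exemplar."""
    return min(CATEGORIES, key=lambda c: min(math.dist(x, e) for e in CATEGORIES[c]))


probe = (6, 5)  # close to the atypical member (6, 6), far from both prototypes
print(classify_by_prototype(probe))  # B
print(classify_by_exemplar(probe))   # A
```

Here category A's prototype sits at (1.6, 1.6), far from both the probe and the atypical member it resembles, so only the exemplar rule puts the probe in A.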

Nevertheless, some investigators are uneasy about the exemplar view because it seems uneconomical: it requires storing every category member rather than a single summary description. The compromise position, which has many adherents, is that we categorize in terms of both prototypes and exemplars. For example (and this is still a hypothesis to be tested), novices in a particular domain may categorize in terms of prototypes, while experts categorize in terms of exemplars.

Despite these differences, the exemplar view agrees with the prototype view that categorization proceeds by way of similarity judgments. They further agree that similarity is a matter of degree. They differ only in what the object must be similar to:

In the prototype view, the object must be similar to the category prototype.
In the exemplar view, the object must be similar to some category instance (or exemplar).

Following the work of Amos Tversky, Medin (1989) has outlined a modal model of similarity judgments:

similarity increases with the number of shared features;
similarity decreases with the number of distinctive features;
the features in question are, at least in principle, independent of each other;
features all exist at the same level of abstraction.
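The first two assumptions correspond to Tversky's contrast model, which writes the similarity of items A and B as S(A, B) = θ·f(A ∩ B) - α·f(A - B) - β·f(B - A). Here f measures a set of features, and θ, α, β are non-negative weights. The sketch below is a minimal version that takes f to be a simple feature count, which treats features as independent and equally weighted. The feature sets and the default weights are arbitrary choices for illustration.

```python
def contrast_similarity(a: set, b: set, theta=1.0, alpha=0.5, beta=0.5) -> float:
    """Contrast-model similarity, taking f(X) to be the number of features in X.

    Shared features raise similarity; features unique to either item lower it.
    """
    common = len(a & b)   # f(A & B): shared features
    a_only = len(a - b)   # f(A - B): features distinctive to a
    b_only = len(b - a)   # f(B - A): features distinctive to b
    return theta * common - alpha * a_only - beta * b_only


robin = {"wings", "feathers", "flies", "sings"}
duck = {"wings", "feathers", "flies", "swims"}
penguin = {"wings", "feathers", "swims", "large"}

print(contrast_similarity(robin, duck))     # 3 shared, 1 + 1 distinctive -> 2.0
print(contrast_similarity(robin, penguin))  # 2 shared, 2 + 2 distinctive -> 0.0

# With alpha != beta the measure is asymmetric: S(a, b) need not equal S(b, a).
wings = {"wings"}
print(contrast_similarity(wings, robin, alpha=0.75, beta=0.25))  # 1 - 0 - 0.75 -> 0.25
print(contrast_similarity(robin, wings, alpha=0.75, beta=0.25))  # 1 - 2.25 - 0 -> -1.25
```

Because each feature is simply counted, the sketch also embodies the last two assumptions in the list: features contribute independently, and all of them are treated as if they were at the same level of abstraction.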

In either case, similarity is sufficient to describe conceptual structure -- all the instances of a concept are similar, in that they either share some features with the category prototype or they share some features with a category exemplar.

The Theory-Based View

As noted, the prototype and exemplar views of categorization are both based on a principle of similarity. What members of a category have in common is that they share some features or attributes with at least some other member(s) of the same category. The implication is that similarity is an attribute of objects, which can be either measured (by counting overlapping features) or judged (by estimating the overlap). But ingenious researchers have uncovered some troubles with similarity as a basis for categorization -- and, for that matter, with similarity in general.

Context Effects. It has recently been recognized that some categories are defined by theories rather than by similarity. For example, in one experiment, when subjects were presented with pictures of a white cloud, a grey cloud, and a black cloud, they grouped the grey and black clouds together as similar; but when presented with pictures of white hair, grey hair, and black hair, in which the shades of hair were identical to the shades of cloud, subjects grouped the grey hair with the white hair. Because the shades were identical in the two cases, the grouping could not have been based on similarity of features. Rather, the categories seemed to be defined by a theory of the domain: grey and black clouds signify stormy weather, while white and grey hair signify old age.
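The logic of the cloud and hair experiment can be made explicit. Any rule that consults only the shades must give the same grouping in both domains, because its inputs are the same. Only a rule that also consults a domain theory can give different groupings. In this sketch, the lightness values and the cut-off points in each "theory" are hypothetical, as are the function names.

```python
# Lightness on a 0-100 scale; the values are hypothetical.
SHADES = {"white": 90, "grey": 45, "black": 10}

# Each "theory" maps a lightness value to what that shade signifies in the domain.
# The cut-off points are illustrative assumptions, not measured values.
THEORIES = {
    "clouds": lambda lightness: "storm" if lightness < 60 else "fair weather",
    "hair": lambda lightness: "old age" if lightness > 30 else "youth",
}


def partner_by_similarity(domain: str, target: str = "grey") -> str:
    """Group the target with the perceptually closest shade.

    The domain argument is deliberately ignored: only features are consulted.
    """
    others = [s for s in SHADES if s != target]
    return min(others, key=lambda s: abs(SHADES[s] - SHADES[target]))


def partner_by_theory(domain: str, target: str = "grey") -> str:
    """Group the target with a shade that the domain theory interprets the same way."""
    meaning = THEORIES[domain]
    return next(s for s in SHADES
                if s != target and meaning(SHADES[s]) == meaning(SHADES[target]))


for domain in ("clouds", "hair"):
    print(domain, partner_by_similarity(domain), partner_by_theory(domain))
# prints:
# clouds black black
# hair black white
```

With these particular values, the similarity rule puts grey with black in both domains. Other values could tip it toward white, but the rule could never split the two domains apart. The theory rule reproduces the pattern the subjects showed.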

Ad-Hoc Categories. What do children, money, insurance papers, photo albums, and pets have in common? Nothing, when viewed in terms of feature similarity. But they are all things that you would take out of your house in case of a fire. The objects listed together are similar to each other in this respect only; in other respects, they are quite different.

This is also true of the context effects on similarity judgment: grey and black are similar with respect to clouds and weather, while grey and white are similar with respect to hair and aging.

These observations tell us that similarity is not necessarily the operative factor in category definition. In some cases, at least, similarity is determined by a theory of the domain in question: there is something about weather that makes grey and black clouds similar, and there is something about aging that makes white and grey hair similar.
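One way to make "similar in this respect only" concrete is to let the goal or theory decide which features count before similarity is computed. This is a minimal sketch. The feature lists for the fire example are hypothetical, as is the particular choice of "relevant" features, and the overlap measure is again a simple proportion of shared features.

```python
# Hypothetical feature lists for illustration.
money = {"paper", "currency", "valuable", "portable"}
pets = {"animate", "furry", "valuable", "portable"}
sofa = {"furniture", "upholstered", "valuable"}

# Which features matter is supplied by the goal ("what to take out in a fire"),
# not by the objects themselves.
FIRE_RELEVANT = {"valuable", "portable"}


def similarity(a: set, b: set, respect=None) -> float:
    """Proportion of shared features, optionally restricted to a given respect."""
    if respect is not None:
        a, b = a & respect, b & respect
    union = a | b
    return len(a & b) / len(union) if union else 0.0


print(round(similarity(money, pets), 2))       # overall: 2 of 6 features shared -> 0.33
print(similarity(money, pets, FIRE_RELEVANT))  # in the relevant respect -> 1.0
print(similarity(pets, sofa, FIRE_RELEVANT))   # the sofa is not portable -> 0.5
```

The similarity computation itself is unchanged. What differs is the set of features it is allowed to see, and on the theory-based view that set comes from the theory of the domain.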

In the theory-based view of categorization (Medin, 1989), concepts are essentially theories of the categorical domain in question. Conceptual theories perform a number of different functions:

they provide a causal explanation for why the members of a category have the features they have -- or, put another way, for why the members of a category are in the category in the first place;
they explain the relations among features;
they render some features relevant (central), and others irrelevant (peripheral).

From this point of view, similarity-based classification, as described in the prototype and exemplar views, is simply a shortcut heuristic. The real principle of conceptual structure is the theory of the categorical domain in question.

Conceptual Coherence

One way or another, concepts and categories have coherence: there is something that links members together. In classification by similarity, that something is intrinsic to the entities themselves; in classification by theories, that something is imposed by the mind of the thinker.

But what to make of this proliferation of theories? From my point of view, the questions raised about similarity have a kind of forensic quality: they sometimes seem to amount to scholarly nit-picking. To be sure, similarity varies with context, and there are certainly some categories that are held together only by a theory, where similarity fails utterly. But for most purposes, the prototype view, perhaps corrected (or expanded) a little by the exemplar view, seems to work pretty well as an account of how concepts are structured, and how objects are categorized.

As it happens, most work on social categorization has been based on the prototype view. But there are areas where the exemplar view has been applied very fruitfully, and even a few areas where it makes sense to abandon similarity, and to invoke something like the theory-based view.

To summarize this history, concepts were first construed as summary descriptions of category members.

In the classical view of categories as proper sets, this summary consisted of a list of the features that were singly necessary and jointly sufficient to define the category.
In the prototype view of categories as fuzzy sets, this summary consisted of a prototype which possessed many features central to category membership, and few features central to membership in contrasting categories. In this view, categorization is a matter of judgment, and depends on the amount of similarity between the prototype and the object to be categorized.
The exemplar view abandons the notion that concepts are summary descriptions, and instead proposes that concepts are collections of instances that exemplify the category. But it does not abandon the notion that concepts are based on similarity of features. While in the prototype view category members are similar to the prototype, in the exemplar view category members are similar to other exemplars.

Between them, the prototype and exemplar views provide a pretty good account of concepts and categories. A popular compromise holds that concepts are represented as a combination of prototypes and exemplars, with novices relying on prototypes and experts relying on exemplars to categorize new objects, although this division of labor remains a hypothesis to be tested.

The theory view of categories abandons similarity as the basis for categorization. Instead, concepts are represented as "theories" which guide the grouping of instances into a category. According to the theory view, similarity is a heuristic that we use as an economical shortcut strategy for categorization; but the closer you look, according to this view, the more it becomes clear that conceptual coherence -- the "glue" that holds a concept together -- is really provided by a theory, not similarity.

Concepts and categories are just about the most interesting topic in all of psychology and cognitive science, and two very good books have been written on the subject. They are highly recommended:

Categories and Concepts by E.E. Smith and D.L. Medin (Harvard University Press, 1981).
The Big Book of Concepts by G.L. Murphy (MIT Press, 2002).

Here in Berkeley's Psychology Department, Prof. Eleanor Rosch, now retired, made fundamental contributions to the "prototype" view of conceptual structure. She also gave a wonderful course on the subject, enhanced by her interest in Buddhist psychology, which has a very different view of concepts and categories; the course is now offered by Prof. Tania Lombrozo. Prof. George Lakoff, in the Linguistics Department, also gives courses on concepts, with special attention to metaphor.

This page last revised 02/14/2014.