


Here's the question: What do memories look like?  We're talking about secondary, or "long-term" memory here, but still the answer turns out to depend on what kind of knowledge we're talking about.  There have been quite different proposals depending on whether we're talking about declarative or procedural knowledge, or episodic or semantic memory.  In addition, there are different proposals about how conceptual knowledge -- an aspect of semantic memory, to be sure -- is represented in the mind.  In this supplement, we'll focus on episodic memory, with some side glances at semantic memory, and then turn to conceptual representations as a special case of semantic memory.  But, in the end, there's just one memory, and a task left largely undone is to figure out how to represent conceptual knowledge in the same cognitive architecture as episodic and semantic knowledge. 

What's a Mental Representation?

A representation is just that: it's something that represents, or stands for, or models, something else.  An event can be represented as a list of features, or as a sentence, or as a picture, or a string of digits, or a bunch of beer cans connected with string.  

Anything can represent something else, so long as the representational system satisfies certain requirements outlined by UCB's Steven Palmer (1978):

So, to continue Palmer's example:
Let's now see how this idea of representation works out in the psychology of learning and memory.

The View from Associationism

Behaviorists like Watson had a simple answer to the question: memories look like associations between stimuli and responses -- because that's what everything is.  This emphasis on associations as the basic structure of memory has proved remarkably durable -- though, as we will see, not the way the S-R theorists framed them.



But first a little history, mostly taken from Anderson & Bower's Human Associative Memory (1973).

Aristotle's Associationism.  The idea that associations are central to memory has its origins in Aristotle's treatise De Memoria et Reminiscentia.  Beginning with the proposition that ideas are derived from sensory experience (instead of being innate, as Plato had asserted), he further argued that ideas became associated with each other by virtue of a small number of principles such as similarity (and contrast), and especially contiguity.  (Aristotle also offered subsidiary principles of association such as frequency, intensity, and good order).  Memories were retrieved (Aristotle didn't use precisely this term) by virtue of the association of ideas, where one idea served as a probe to elicit an associated idea as a memory. 

Aristotle further distinguished between two forms of memory:

British Associationism.  In the 18th century, David Hartley and other philosophers (such as Hobbes, Locke, Berkeley, Hume, and both James Mill and his son John Stuart Mill) construed ideas (representing sensations and reflections on sensation) as the building-blocks of the mind, and associations as the "mind's glue".  For the British associationists, contiguity was virtually the sole basis for association:
"Virtually", because they accepted similarity as a principle of association as well -- though they really emphasized contiguity.

For the British associationists, associations had only one property: strength, or the likelihood that one idea would elicit another. 

British associationism was extremely influential on the early verbal-learning tradition.  For example, Ebbinghaus (1885) employed the serial learning of nonsense syllables to study how associations were formed during the learning process, what kinds of associative links were stored in memory, and how associations led from one memory to another.  Similarly, Mary Whiton Calkins (1898), working in William James' laboratory at Harvard, invented the paired-associate learning paradigm expressly to study the formation of associations.  (Calkins completed a doctoral dissertation, but Harvard refused her a degree, and she in turn refused its offer of a doctoral degree from Radcliffe College.  Nevertheless, she founded the psychological laboratory at Wellesley College and later became the first female president of the American Psychological Association.)

American Associationism.  Following the lead of the British associationists, there arose an American tradition of associationism at the hands of J.B. Watson, E.L. Thorndike, and later behaviorists such as E.R. Guthrie, C.L. Hull, and especially B.F. Skinner.  These were all learning theorists, and they considered the association to be a primitive concept for learning theory.  The difference between American and British associationism, of course, was that the British were interested in the association of ideas, while the Americans, being behaviorists, abandoned ideas as mentalistic, in favor of observable stimuli and responses.  Thus, for Watson and the others, the conditioned response was the basic unit of behavior, and complex behaviors were built from elementary conditioned responses -- sometimes linked by implicit mediating responses, implicit stimuli, and response-produced stimuli.  Ebbinghaus' and Calkins' work fit fairly comfortably into this framework, leading to the S-R reinterpretation of verbal learning.  

There were, of course, dissenters among the neo-behaviorists, particularly E.C. Tolman, who argued that stimulus-response associations were not sufficient to explain learning.

The View from Cognitive Psychology

With the cognitive revolution in psychology came a return to mentalism, and revived interest in the association of ideas. 

In fact, even before the cognitive revolution, a number of researchers in the verbal-learning tradition collected data on pre-existing patterns of word association (actually, this line of research was initiated by C.G. Jung, who in turn was influenced by Freud; but Jung's work -- let alone Freud's -- had no direct influence on the verbal-learning tradition).  Here, for example, is a fragment of an associative network centered on the word lion.  Thus, if you ask subjects to respond with the first word that comes to mind after hearing some other word, the stimulus lion often leads to the responses of tiger, Africa, and den; den leads to the response lair.


But it soon became clear that verbal associations had some funny properties that had not been anticipated by the British and American associationists.  

First, it turned out that associations are not necessarily symmetrical.  For example, the stimulus tiger may strongly elicit the response tail, but the stimulus tail does not tend to elicit tiger as a response; a much stronger response is end.  If you're a British or American associationist, that should strike you as strange.  If tiger is associated with tail by virtue of contiguity (or similarity, or whatever), then why isn't tail associated with tiger?



Even earlier, Thorndike (1931) had uncovered the phenomenon of belongingness.  In one of his experiments, he had subjects learn a list of names, in which some names were repeated, such as

Mary Jones Bill Smith Sam Peck Richard Jones Bill Smith.

When subjects were tested with the stimulus Bill-_____, the likelihood of the correct response Smith increased with repetition, as predicted.  But when tested with the stimulus Jones-_____, there was no effect of repetition on the correct response Bill.  It seemed that, despite being equally contiguous, and equally repeated, Bill and Smith belonged together in a way that Jones and Bill did not.  Thorndike had no way to account for this, but it did suggest that something was wrong with the general principle that associations were formed by virtue of contiguity, and strengthened by means of repetition.  

For the British and American associationists, all associations were created equal -- all qualitatively the same, if quantitatively differing in strength.  But in 1979, the Mandlers -- George and Jean, one of cognitive psychology's first husband-and-wife teams, working at UCSD -- distinguished among different qualitative types of associative structures in memory.

Jean Mandler (1979) distinguished between two types of associations:

George Mandler (1979), for his part, offered a tripartite distinction:
I'll just cite two pieces of evidence, both from my own laboratory, that suggest that these differences are real.

In one line of research, we looked at the organization of recall during partial posthypnotic amnesia.  We asked subjects, while they were hypnotized, to memorize a list of words, following standard verbal-learning procedures.  In one experiment, we used a serial learning paradigm that encouraged pro-ordinate, serial associations.  In another experiment, we used a free-recall paradigm, with a categorized list, that encouraged subordinate, vertical associations.  A third experiment encouraged subjective organization.  Then they received a suggestion to forget the words.  The most highly hypnotizable subjects showed a dense amnesia, temporarily forgetting most or all of the words, while the insusceptible subjects showed no amnesia at all.  But some subjects, who are relatively highly hypnotizable, showed a partial response to the amnesia suggestion.  These subjects recalled some words, but tended to do so in a disorganized fashion -- but the disorganization only appeared in the serial-learning condition.  Posthypnotic amnesia disrupted pro-ordinate, serial, organization, but spared organization based on semantic relationships.

Another line of research made use of the associative memory illusion (AMI; sometimes known as the Deese-Roediger-McDermott or DRM effect), in which studying a list of associates to a stimulus word (such as sharp, prick, and haystack, which are all close associates of needle) leads subjects to falsely recognize the critical lure (in this case, needle) as having been in the list, when in fact it was not.  It turns out that the AMI occurs when the study list consists of co-ordinate associates, such as needle-haystack, but not when it consists of subordinate associates, such as animal-tiger.

The fact that posthypnotic amnesia dissociates serial associations from horizontal associations, and the AMI dissociates horizontal associations from vertical associations, suggests that these kinds of associations really are qualitatively different. 

It also turns out that associations are labeled in terms of the semantic roles of cue and response.  Thus, eating is related to glutton as act to actor, while eating is related to steak as act to object.  A theory of association has to deal with the fact that associations do not differ only quantitatively, simply in terms of strength, but also differ qualitatively with respect to the type of association that has been created between one idea and another.


Neo-Associationistic Theories of Memory Structure

Despite these problems, the basic idea of association has been critical to cognitive theories of memory.  These theories generally construe memory as a sort of mental dictionary in which words stand for concepts, and associations represent the relations between them.  In a generic network model of memory:

Of course, there are lots of different ways to implement these general ideas.

An important early model proposed by Collins and Quillian (1969) assumes that concepts are stored in a hierarchical structure, with associated features stored according to a principle of cognitive economy -- meaning that each feature gets stored only once, at the particular level of the hierarchy to which it is relevant.  Thus:

The model correctly predicts performance in a sentence-verification task, in which subjects are asked to say whether some statements are true or false.  Although subjects rarely make a mistake in this kind of task, their reaction times vary, depending on the distance between the concept and the feature.
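The cognitive-economy idea is easy to sketch in code.  The miniature hierarchy below, the placement of each property, and the use of link-counts as a proxy for reaction time are illustrative assumptions on my part, not data from Collins and Quillian:

```python
# A minimal sketch of a Collins-&-Quillian-style hierarchical network.
# The hierarchy and property placements are toy assumptions.

ISA = {"canary": "bird", "bird": "animal"}          # superordinate links
PROPS = {                                            # cognitive economy:
    "canary": {"is yellow", "can sing"},             # each property is stored
    "bird":   {"has wings", "can fly"},              # only once, at the highest
    "animal": {"has skin", "can move"},              # level where it applies
}

def verify(concept, prop):
    """Return the number of ISA links traversed to verify `prop`,
    or None if the statement is false in this toy network."""
    hops = 0
    node = concept
    while node is not None:
        if prop in PROPS.get(node, ()):
            return hops
        node = ISA.get(node)
        hops += 1
    return None

# Verification is faster (fewer hops) the closer the property is stored:
assert verify("canary", "can sing") == 0   # stored at the canary node
assert verify("canary", "can fly") == 1    # inherited from "bird"
assert verify("canary", "has skin") == 2   # inherited from "animal"
assert verify("canary", "can bark") is None
```

The point of the sketch is just that verification time grows with the number of ISA links that must be traversed: canary-can sing is immediate, while canary-has skin requires two hops up the hierarchy.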

An alternative model, proposed by Smith, Shoben, and Rips (1974) abandoned the hierarchy and linked concepts together based simply on degree of similarity in features (as indicated, for example, by multidimensional scaling techniques).  In this model, the associative "distance" between concepts is a function of the number of overlapping features.  The model correctly predicts an inverted-U-shaped relationship between similarity and response latency, such that reaction times are faster when two nodes are either very close together or very far apart, compared to when two nodes are at an intermediate distance from each other in multidimensional space.


Yet a third model, proposed by Collins and Loftus (1975) -- this is the same Collins as in the Collins & Quillian model -- also employs distance to represent similarity.  The model correctly predicts priming effects in a lexical decision task, such that reading the word street (which is a word) makes it easier to judge that car is also a word (which it is), compared to apples (which also is a word).  Similarly, red primes apples and fire engine, but not street or sunrises.
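Priming of this sort can be caricatured in a few lines.  The network, the link weights, and the latency formula below are invented for illustration; Collins and Loftus proposed the spreading-activation principle, not these numbers:

```python
# A toy spreading-activation sketch of semantic priming.
# The network and all numeric values are hypothetical.

LINKS = {
    "red":    {"fire engine": 0.8, "apples": 0.6, "roses": 0.6},
    "street": {"car": 0.7, "avenue": 0.8},
}

def activation(prime, target):
    """Activation reaching `target` one link away from `prime`."""
    return LINKS.get(prime, {}).get(target, 0.0)

def lexical_decision_ms(prime, target, base=600, gain=200):
    """Hypothetical latency: more spreading activation -> faster response."""
    return base - gain * activation(prime, target)

# "red" primes "apples" (related) but not "street" (unrelated):
assert lexical_decision_ms("red", "apples") < lexical_decision_ms("red", "street")
```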


Each of these models has problems, but their success in predicting even subtle aspects of human performance suggests that they are pretty good first approximations of how the mental dictionary is arranged -- that is, how semantic knowledge is represented in memory.

And that's all well and good, except we're not so much interested in the mental dictionary.  We're working in the verbal-learning tradition at this point, and what we're really interested in is how people represent lists of words that they've been asked to memorize.

Estes (1976) offered several simple associative models of memory, attempting to capture some aspect of verbal learning.

You get the idea.  

This general idea has been implemented in a computer model of memory known as SAM (for Search of Associative Memory), proposed by Shiffrin and Raaijmakers (1992).  A similar model, called REM (for Retrieving Effectively from Memory), has been proposed by Shiffrin & Steyvers (1997).  In SAM:

Thus, during learning subjects link nodes representing list items to a node representing the list.  When asked to recall, they activate the list node, and follow associative pathways to list items.
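That retrieval scheme can be sketched as follows.  The study rule and the strength values are toy assumptions of mine, not SAM's actual equations:

```python
# A minimal sketch of the SAM idea that list items are linked to a
# list-context node; recall cues with that node and follows the
# strongest associations first. (Toy strengths; not SAM's equations.)

from collections import defaultdict

memory = defaultdict(dict)   # memory[cue][item] = associative strength

def study(list_name, items):
    for position, item in enumerate(items):
        # toy assumption: earlier items acquire slightly stronger links
        memory[list_name][item] = 1.0 - 0.1 * position

def recall(list_name):
    """Activate the list node and retrieve items in order of link strength."""
    assoc = memory[list_name]
    return sorted(assoc, key=assoc.get, reverse=True)

study("LIST-1", ["needle", "thread", "haystack"])
assert recall("LIST-1") == ["needle", "thread", "haystack"]
```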

The Dual-Code Theory of Memory

All these models view memory as a mental dictionary, nodes representing words linked to each other, and to nodes representing list membership.  But it turns out that memory consists of more than words.

In particular, Paivio (1971, 1986) proposed that concrete objects, like fish and canaries, can be represented as images as well as words.  He cited lots of different pieces of evidence in support of this proposition.

Paivio's arguments were emphatically rejected by Pylyshyn (1973), sparking "The Great Mental Imagery Debate".  Pylyshyn argued, on conceptual grounds, that there was only one representational format, which was conceptual and word-like.  He argued that evidence favoring imagistic representations was contaminated by tacit knowledge, experimenter bias, and demand characteristics.

J.R. Anderson (1978, 1979) argued that the issue was ultimately undecidable because, for every dual-code model that could be proposed, one could generate a single-code model that would produce the same effects.  Here's where it has to be said that parsimony cuts both ways.  In some sense it is more parsimonious to have one code than two.  But in another sense it is more parsimonious to have two codes than one, if the single-code model has to go through all sorts of contortions to match the dual-code model.  

In the next salvo of the debate, Finke (1980, 1985) identified a number of functional equivalences between imagery and perception.  He relied on comparisons between recalling, imaging, and perceiving objects and their properties, and found a surprising number of instances where the effects of imagining were identical or similar to those of perceiving, and different from simply recalling.  He concluded that "[visual] imagery involves the activation of many of the same information-processing mechanisms that are activated during visual perception" (1980, p. 130).

For some people, neuropsychological evidence clinched the case for the equivalence of imagery and perception.  Farah (1988) investigated cases of visual agnosia, in which brain-injured patients are no longer able to identify familiar objects (prosopagnosia is a special form of visual agnosia).  The syndrome is famously the subject of a case study by Oliver Sacks, The Man Who Mistook His Wife for a Hat (which was subsequently rendered into an opera, no less).  Farah found that visual agnosics also lack a capacity for mental imagery, supporting the idea that mental images rely on the same mechanisms as actual perception.

Incidentally, Farah's arguments are often cited as an example where neuroscientific evidence constrains psychological theory, by offering decisive evidence for one theory (the dual-code theory) and against another (the single-code theory).  But (with all due respect to Farah, who is a brilliant cognitive neuroscientist) this isn't exactly true. 

In any event, and despite his declaration of undecidability, Anderson himself opted for the second type of parsimony described above, and proposed a distinction between two types of mental representation:
And that's pretty much where things stand in cognitive psychology today.  With very few exceptions (really, only one exception), theorists accept the proposition that we have both words and pictures in the head. 

HAM: Knowledge as Sentences

Still, by far, most work on mental representation has focused on the verbal side.

Tulving and Bower (1974) summarized the view in the early 1970s as follows: "A rather general and atheoretical conception of the memory trace of an event regards it as a collection of features or a bundle of information" (p. 269).  This bundle included a number of different components:

At roughly the same time, Anderson and Bower (1973) introduced a new theory of mental representation in a book describing their research on a computer simulation model of memory known as HAM (for Human Associative Memory):

"[T]he purpose of long-term memory is to record facts about various things, events, and states of the world.  We have chosen the subject-predicate construction as the principal structure for recording such facts in HAM" (p. 156).

In other words, events are represented in sentence-like structures.  This is quite a different approach from that implied by Tulving and Bower, in which the sentence might just be represented by a cluster of linked nodes.  But in a representation like this, you don't really know who did what to whom, where, or when -- much less why.  For this purpose, sentence-like structures seem to be better.



In order to illustrate their approach, they focused much of their exposition on variants of a single sentence:

In the park the hippie touched the debutante.

Perhaps Anderson and Bower were inspired by Hair: The American Tribal Love-Rock Musical, which opened in 1967.  But they were even more inspired by two developments in linguistics.





First was the work of Noam Chomsky (1957, 1965) on phrase-structure grammar, in which sentences are rewritten as noun phrases and verb phrases, and verb phrases are rewritten as verbs plus noun phrases -- generically, The noun phrase verbed the other noun phrase.  Thus, in the sentence the man who hits the ball kisses the girls, the man who hits the ball is the subject noun phrase, and kisses the girls is the verb phrase (which includes an object noun phrase).  This phrase-structure representation is the easiest way to represent knowledge in memory.
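The rewrite rules S -> NP VP and VP -> V NP define a tree, which can be sketched as nested tuples.  This is just a hand-built structure for Anderson and Bower's example sentence, not a parser:

```python
# A hand-built phrase-structure tree for "the hippie touched the
# debutante", following S -> NP VP and VP -> V NP. Structure only.

tree = ("S",
        ("NP", "the hippie"),
        ("VP",
         ("V", "touched"),
         ("NP", "the debutante")))

def leaves(node):
    """Flatten a (label, child, child, ...) tree back into its words."""
    if isinstance(node, str):
        return [node]
    label, *children = node
    return [w for child in children for w in leaves(child)]

assert " ".join(leaves(tree)) == "the hippie touched the debutante"
```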


One problem with Chomsky's system is that there's more to grammar than syntax (as UCB's George Lakoff would put it, you need generative semantics as well as generative syntax).  The UCB linguist Charles Fillmore (1968, 1971) pointed out that nouns, especially, played different semantic roles in sentences -- they weren't just subjects and objects.  For example, in the sentence Mary pinched John on the nose, Mary is the agent of the action, John is the experiencer, and nose is the location where she pinched John.  Fillmore invented case grammar to represent these semantic roles, and his innovation was picked up by Anderson and Bower.  


Accordingly, the HAM representation of an event would look something like this, with a node linking a fact (that a hippie touched a debutante) with the context in which it is true (that the incident happened in a park sometime in the past).  
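As a rough sketch of that structure (the field names here are my own invention; HAM's actual notation is a labeled graph, not a Python dict), the fact-plus-context representation might look like this:

```python
# A sketch of a HAM-style proposition: a fact node joined to a context
# node, each decomposed into case-grammar-like roles. Field names are
# illustrative, not Anderson & Bower's.

event = {
    "context": {"location": "park", "time": "past"},
    "fact": {
        "subject": "hippie",
        "predicate": {"relation": "touched", "object": "debutante"},
    },
}

def who_did_what_where(e):
    """Read the roles back out -- the point of a sentence-like structure."""
    f, c = e["fact"], e["context"]
    return (f["subject"], f["predicate"]["relation"],
            f["predicate"]["object"], c["location"])

assert who_did_what_where(event) == ("hippie", "touched", "debutante", "park")
```

Unlike a bare bundle of features, this structure keeps track of who did what to whom, and where.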


The Declarative-Procedural Distinction

That's a pretty good solution, and HAM does a pretty good job of emulating the actual performance of subjects who are remembering lists of words, or sentences about hippies and debutantes.  But it quickly became clear that there is more in memory than sentences.  As noted earlier, the knowledge stored in memory comes in two forms:

The subject verbed the object

-- as in

The hippie touched the debutante.

If goal and condition then action
-- as in

If the goal is to drive a standard shift car and the car is in neutral then shift the car into first gear.

Classical and instrumental conditioning are special cases of procedural knowledge:

If Conditioned Stimulus then Unconditioned Stimulus.

If Conditioned Response in the presence of the Conditioned Stimulus then Reinforcement.

Individual propositions are, of course, embedded in a vast network of propositional knowledge -- more or less along the lines envisioned by Collins and Loftus (1975).



And individual productions, for their part, are embedded in a vast network of productions known as a production system, in which the output of one production provides input to another.  In some sense, the action of one production creates the conditions for execution of the next one in the system.



The procedural-declarative distinction was introduced into artificial intelligence by Terry Winograd (1972, 1975), and imported into psychology by John Anderson (1976).  But it also has deeper origins:

But a serious terminological confusion surrounds the procedural-declarative distinction, because some theorists, following Larry Squire, use the term declarative to refer to conscious recollection -- what Schacter and others call explicit memory (as opposed to implicit or unconscious memory).
The problem is that this confuses the question of representational format -- whether the memory is represented in declarative or procedural format -- with the way that a memory is expressed -- either explicitly, in the form of conscious recollection, or implicitly, in the form of priming or some other unconscious effect. 
Squire's work is (justly) so highly regarded that many researchers have adopted his terminology.  But it's really not right.  The declarative-procedural distinction, having to do with representational format, should be kept separate from the explicit-implicit distinction, having to do with the conscious or unconscious expression of memory.

The Episodic-Semantic Distinction

At roughly the same time, Endel Tulving (1972, 1983) introduced a further distinction between two forms of declarative (meaning factual) knowledge:

Which brings up the matter of self-reference.  Tulving's analysis stresses the importance of spatio-temporal context in episodic memory -- that every event is specified by a unique location in space and time (two events cannot occur at precisely the same time and in precisely the same place).  But it's also true that these events are somehow specific to the rememberer as well. 
Episodic memories are memories of what a specific individual has done, or experienced, at a particular time and in a particular place.

Episodic and semantic memory can be dissociated in the case of source amnesia, but it is evident that both kinds of memories can be stored in the same declarative, propositional, representational format.

Of course, not all self-knowledge is episodic in nature.  Some of it is semantic, more or less context-free knowledge about myself having nothing to do with any specific action or experience, such as I am a neurotic extravert or I am of Swedish-Finnish extraction on my father's side.

The self, viewed as a knowledge structure, consists of whatever one knows about oneself, including episodic and semantic self-knowledge. 

Can Animals Have Episodic Memory?

Animals can learn, for sure, and so they acquire knowledge stored in memory.  But it's not clear that they can acquire episodic memories -- that they can remember particular events that happened to them at a particular time and a particular place.  Their memories may be more generic, represented in procedural, or perhaps semantic form, but not necessarily as episodic memories of specific experiences.  Although the Darwinian principle of evolutionary continuity should caution us not to make sharp distinctions between human and nonhuman mental capacities, some authorities have suggested that, in the absence of language permitting self-report, the question of episodic memory in animals is essentially undecidable (e.g., Tulving, 1983).

Still, there are experiments that seem to reveal something very much like episodic memory.

  • Western scrub jays appear to remember where they cached certain kinds of food, and how long it has been since they did so.
  • Similarly, hummingbirds appear to remember where particular flowers are located, and how long ago they've visited them.
  • Eichenbaum and Sauvage (2008) gave rats pairs of containers in which a smell (like oregano) was mixed into a particular digging material (like wood chips).  The rats learned to dig in previously encountered containers for treats.  Eichenbaum and Sauvage argue that this requires a specific memory of where and when.  The fact that hippocampal lesions abolished this memory strengthens the idea that these rats had something very much like a conscious episodic memory of what happened when, and where.  

So, maybe animals do have episodic memory after all, even though they can't share their conscious recollections with us via language.

The ACT Model

Actually, Anderson and Bower were aware of Winograd's work -- they were all together at Stanford after all -- but they were not ready to incorporate the procedural-declarative distinction into their model.  That task fell to Anderson, in his ACT (Adaptive Control of Thought) model of cognition, which he introduced in 1976 and has continued to develop over the subsequent 30-plus years.  ACT is a complete cognitive theory, written in the form of a computer simulation, that includes learning and memory, but also includes language, reasoning, and problem-solving (Anderson is especially interested in simulating students' learning and use of algebra, which he has called "the Drosophila of cognitive theory" (2007)).


ACT is rather complex, and its complexities need not detain us here.  There have also been a number of versions of ACT developed over the years by Anderson and his colleagues, and these evolutionary steps need not detain us either.  The following is adapted from the succinct description of the generic ACT model by Medin, Ross, and Markman (2001).

Declarative knowledge is represented in memory by conceptual nodes linked in a network to form propositions like The flower is pretty and Bill thought that the flower was pretty.  Like HAM, ACT recognizes a number of semantic roles, but for purposes of simplicity we will only consider three: Agents, Objects, and the Relations between them.



The links between nodes differ in strength.

ACT also recognizes the type-token distinction first proposed by Simon and Feigenbaum (1964), which is a distinction between a general concept and a specific instance of it.  For example, a particular chair may be blue, but it is not true that all chairs are blue; blue is the color of only a particular chair.  ACT handles this by linking the marker X, which represents a particular chair, to a node representing chairs in general.  Thus, Some particular chair is blue, or Some particular small chair is blue.  This permits ACT to represent facts about other chairs, which may be large or beige or whatever.  


ACT also includes a working memory, which should not be confused with the working memory of Baddeley and Hitch (1974).  By working memory Anderson only means that subset of nodes that are activated at any given time.  Activation makes a node accessible in memory, but the total amount of activation in a network is limited -- which effectively limits the number of nodes that can be in working memory at any particular time (think of Miller's "magical number seven, plus or minus two").

Processing a sentence (which is Anderson's proxy for perception) activates nodes corresponding to the elements of the sentence.  This activation spreads along links to associated nodes.  But the total activation accruing to a conceptual node is divided among the links emanating from that node, such that the strongest links receive the most activation.
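This division of activation can be sketched arithmetically.  The formula and constants below are illustrative assumptions of mine, but they capture why spreading a fixed amount of activation over more links means less activation reaching any one of them -- the basis of the fan effect:

```python
# A back-of-the-envelope sketch: a fixed pool of activation at a
# concept node is divided among its outgoing links, so the more facts
# "fan out" from a concept, the less activation reaches each one.
# All numbers are illustrative, not ACT's actual parameters.

def activation_per_fact(total_activation, fan):
    return total_activation / fan

def retrieval_ms(total_activation, fan, base=600, gain=300):
    """Hypothetical latency: less activation per fact -> slower retrieval."""
    return base + gain / activation_per_fact(total_activation, fan)

# Knowing 3 facts about a concept vs. 1 fact slows retrieval of each one:
assert retrieval_ms(1.0, fan=3) > retrieval_ms(1.0, fan=1)
```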



While this discussion focuses on the declarative side of ACT, there is also a procedural side, and these are related:

ACT, especially in its current incarnation, is an extremely powerful model of memory. For example, it predicts the fan effect -- the more you know about a particular concept, the longer it takes to retrieve any particular piece of knowledge about it.  We'll discuss the fan effect later.

A Connectionist Alternative

ACT is generally considered a symbolic or localist model of cognition, in which concepts are represented as symbols that stand for some piece of knowledge, and these symbols are localized at discrete nodes in the associative network (Anderson himself disagrees with this characterization, but we're not going to let this fact get in the way of our exposition, are we?).  When a person acquires a new piece of knowledge, a new node is added to the network (as well as new links from that node to other, pre-existing nodes).

An alternative model is a connectionist or parallel distributed processing (PDP) model, in which the same set of nodes represents each piece of knowledge -- because the knowledge is not represented by the nodes at all, but rather by the connections between them (hence the name).  Put another way, knowledge is distributed across the entire network -- hence that name, too!  PDP models were introduced to cognitive theory by James (Jay) McClelland and David Rumelhart (1986a; Rumelhart & McClelland, 1986b; McClelland, Rumelhart, et al., 1995), who at the time were colleagues at UCSD (McClelland subsequently moved to Carnegie-Mellon University, where he was a colleague of John Anderson, which may account for Anderson's qualms about the characterization of his model as "symbolic" or "localist"; Rumelhart subsequently moved to Stanford; then McClelland himself moved to Stanford; it's a small world).

As with the ACT model, this discussion of PDP models draws heavily on the treatment by Medin et al. (2001).

In large part, connectionist or PDP models are motivated by considerations of neural plausibility.  

From these considerations, connectionist models begin with the assumption that the connections among neurons are strengthened or weakened during learning.

Connectionist models are "neurally inspired" because they take the brain as a metaphor.

This generic connectionist model has implications for memory.
Connectionist models are extremely powerful learning machines, and for that reason, not to mention their "neural plausibility", they have been very attractive as models of memory -- indeed, vigorous rivals to symbolic or localist models.  

But they have one big disadvantage: they are extremely prone to forgetting, especially forgetting via retroactive interference.  In fact, this vulnerability is so bad that it has been characterized as catastrophic interference by McCloskey and Cohen (1989; see also Ratcliff, 1990) and French (1999).  To see why this is so, consider the A-B/A-C retroactive interference paradigm.  

So, a generic connectionist model must forget A-B in order to learn A-C.  But we know from studies using paradigms like modified (and modified modified) free recall, discussed in the lectures on Associationism and Interference Theory, that people who learn A-C can also remember A-B.  So, the typical connectionist model doesn't provide a very good match to actual human performance -- which reduces its attractiveness considerably. 
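The overwriting at the heart of this problem can be seen in a toy single-cue version of the A-B/A-C paradigm (my own illustration, not any published simulation): a one-layer linear associator trained with the delta rule learns to recall B from cue A, is then retrained to recall C from the same cue, and the A-B mapping is destroyed.

```python
import random

random.seed(0)
N = 16  # units per pattern

def vec():
    """A random +/-1 pattern standing in for a list item."""
    return [random.choice([-1.0, 1.0]) for _ in range(N)]

def recall(W, a):
    """Output pattern retrieved from cue a through weight matrix W."""
    return [sum(W[i][j] * a[j] for j in range(N)) for i in range(N)]

def err(x, t):
    """Squared error between retrieved pattern x and target t."""
    return sum((xi - ti) ** 2 for xi, ti in zip(x, t))

def train(W, pairs, epochs=50, lr=0.05):
    """Delta-rule learning: nudge weights toward the target on each trial."""
    for _ in range(epochs):
        for a, t in pairs:
            out = recall(W, a)
            for i in range(N):
                d = lr * (t[i] - out[i])
                for j in range(N):
                    W[i][j] += d * a[j]

A, B, C = vec(), vec(), vec()
W = [[0.0] * N for _ in range(N)]

train(W, [(A, B)])                  # learn the A-B list
eb_before = err(recall(W, A), B)    # near zero: cue A retrieves B

train(W, [(A, C)])                  # now learn A-C with the same cue
eb_after = err(recall(W, A), B)     # large: B can no longer be retrieved
ec = err(recall(W, A), C)           # near zero: A now retrieves C
```

Because every association is stored in the same shared weights, training the A-C mapping necessarily overwrites the A-B mapping, which is exactly the catastrophic-interference problem.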

At a more conceptual level, and with all due respect to McClelland (with whom I went to graduate school) and Rumelhart (who was without a doubt one of the world's most distinguished cognitive scientists), the whole connectionist enterprise smacks of the S-R theory of learning (not for nothing was Thorndike's S-R theory of learning called "connectionism"). 

It's hard to express, but I've got a nagging feeling that connectionism ends up looking an awful lot like something that Skinner would find friendly.  And that's a cause for alarm in the hearts and minds of cognitive psychologists.

An Interactive Activation Model

You can see some of the properties of connectionist networks in general by examining the interactive activation model of word recognition presented by McClelland and Rumelhart (1981), a forerunner of the framework they would later call parallel distributed processing, or PDP (not to be confused with Larry Jacoby's Process Dissociation Procedure, also known by the "PDP" acronym). 

The Great Representational Debate

Much like the Great Mental Imagery Debate of the 1980s, the rise of connectionist modeling has stimulated a new opposition, between "symbolist" models like ACT and "connectionist" models like PDP.  And like the Great Mental Imagery Debate, this new debate may prove to be undecidable.
So, we've got a "connectionist" model which runs on a symbol-processing machine, and we've got a "symbolist" model that can be given a connectionist implementation.  Sounds like a draw to me.

But, then again, maybe not.  Lebiere and Anderson's (1993) title, referring to "A Connectionist Implementation of the ACT-R Production System" (emphasis added), brings to mind the three-level analysis of vision promoted by Marr (1982; Marr & Poggio, 1976). 

In these terms, ACT-R might be identified with the computational level of analysis, and is symbolic in nature.  The connectionist implementation might be identified with the implementational level of analysis.

Interestingly, recent findings from cognitive neuroscience may help us to choose between symbolic and connectionist architectures.  After all, the chief argument in favor of distributed models of representation is that they are more biologically plausible than localist models.  But are they?  Let's look at the evidence from neuroscience.

The View from Cognitive Neuroscience

The presentation so far has focused on representation as viewed by cognitive psychology, but the rivalry between localist and distributed models has also played itself out within cognitive neuroscience.

Consider the following true story from the annals of cognitive psychology.  There once was a seminar at Stanford University attended by both William K. Estes, a pioneering cognitive psychologist, and Karl Pribram, a pioneering cognitive neuroscientist.  A student had presented some puzzling new experimental results, and the exchange went something like this:

Bill: Suppose there are a series of little drawers in the brain.

Karl: I have never seen any drawers in there.

Bill: They're very small.

We have a pretty good idea what memories look like in the mind.  They look like propositional networks, or maybe like networks of connections.  But what do memories look like in the brain?  The answer comes in two forms.  

The Distributionist Solution

The easiest answer is that every memory is represented by a single neuron, or perhaps a small cluster of neurons, located in a particular part of the brain.  Thus, the nodes in associative-network models of memory, like those discussed here, have their neural counterparts in distinct clusters of neurons.

Early research by Wilder Penfield (1954), a Canadian neurosurgeon, suggested that this is indeed the case.  In the process of diagnosing and treating cases of epilepsy, Penfield would stimulate various areas of the brain with a small electrical current delivered through a microelectrode implanted in the brain.  This procedure does not hurt, because the cortex itself contains no pain receptors, and patients remained awake while it was performed.  Accordingly, Penfield asked patients what they experienced when he stimulated them in various places.  Sometimes they reported experiencing specific sensory memories, such as an image of a relative or the sound of someone speaking.  This finding was controversial: Penfield had no way to check the accuracy of the memories, and it may be that what he stimulated were better described as "images" than as memories of specific events.  In any event, the finding suggested that there were specific neural sites, perhaps a cluster of adjacent neurons, representing specific memories in the brain.  

However, evidence contradicting Penfield's conclusions was provided by Karl Lashley (1950), a neuroscientist who conducted a "search for the engram", or biological memory trace, for his entire career.  Lashley's method was to teach an animal a task, ablate some portion of cerebral cortex, and then observe the effects of the lesion on learned task performance.  Thus, if performance was impaired when some portion of the brain was lesioned, Lashley could infer that the learning was represented at that brain site.  After 30 years of research, Lashley reported that his efforts had been entirely unsuccessful.  Brain lesions disrupted performance, of course.  But the amount of disruption was proportional to the amount of the cortex destroyed, regardless of the particular location of the lesion.

Lashley's Law of Mass Action states that the degree of impairment produced by a lesion is proportional to the amount of cortex destroyed, regardless of its location.  The implication is that individual memories are represented by neurons that are distributed widely across the cortex.  It is not possible to isolate particular memories in particular bundles of neurons, so it is not possible to destroy specific memories with specific lesions.  

At about the same time, D.O. Hebb, a pioneering neuroscientist, argued that memories were represented by reverberating patterns of neural activity distributed widely over cerebral cortex.  Hebb's suggestion was taken up by others, like Karl Pribram, who postulated that memory was represented by a hologram, in which information about the whole object was represented in each of its parts.  

Localism Redux

Connectionist models are inspired, in part, by both Lashley's Law of Mass Action and Hebb's reverberating-network model of memory.

Still, Penfield's vision held some attraction for some neuroscientists, who continued to insist that individual memories were represented by the activity of single neurons, or at most small clusters of neurons, at specific locations in cortex.  

Problems with Penfield's clinical studies aside, early advances in understanding the neural basis of perception lent support to localist views of representation.
While these neural systems responded to the physical properties of the stimulus, their discovery fed speculation that the meaning of the stimulus, and other cognitive contents, might similarly be represented by a localized cluster of neurons. 
Nobody, including Lettvin and Barlow themselves, took any of this all that seriously, and neuroscientific doctrine has emphasized distributed representations of the sort envisioned by Lashley and Hebb.

Until recently, that is.

A serendipitous finding, ingeniously pursued by a group of investigators at UCLA and Caltech, has suggested that there might be something to the idea of a "grandmother neuron" after all (Quiroga et al., 2005).

These investigators worked with eight patients with intractable epilepsy.  In order to localize the source of the patients' seizures, they implanted microelectrodes in various portions of the patients' medial temporal lobes (the hippocampus, amygdala, entorhinal cortex, and parahippocampal cortex).  Each microelectrode consisted of 8 active leads and a reference lead.  They then recorded responses from each lead to visual stimulation -- pictures of people, objects, animals, and landmarks selected on the basis of pre-experimental interviews with the patients.  


In one patient, the investigators identified a single unit (i.e., a single lead of a single electrode, corresponding either to a single neuron or to a very small, dense cluster of neurons), located in the left posterior hippocampus, that responded to a picture of Jennifer Aniston, an actress who starred in a popular television series, Friends.  (A response was defined very conservatively as an activity spike of magnitude greater than 5 standard deviations above baseline, consistently occurring within 1 second of stimulus presentation).  That unit did not respond to any other stimuli tested.  The investigators quickly located other pictures of Aniston, including pictures of her with Brad Pitt, to which she was once (and famously) married.  The same unit responded to all the pictures of the actress -- except those in which she was pictured with Pitt!


Similarly, a single unit in the right anterior hippocampus of another patient responded consistently and specifically to pictures of another actress, Halle Berry (who won an Academy Award for her starring role in Monster's Ball).  Interestingly, this unit also responded to a line-drawing of Berry, to a picture of Berry dressed as Catwoman (for her starring role in the unfortunate film of the same name), and even to the spelling of her name, H-A-L-L-E B-E-R-R-Y (unfortunately, the investigators didn't think of doing this when they were working with the "Jennifer Aniston" patient -- remember, they were flying by the seat of their pants, doing this research under the time constraints of a clinical assessment).  The fact that the unit responded to Berry's name, as well as to her picture, and to pictures of Berry in her (in)famous role as Catwoman, suggests that the unit represents the abstract concept of "Halle Berry", not merely some configuration of physical stimuli.

As another example, yet a third patient revealed a multi-unit (i.e., two or more leads of a single electrode, evidently corresponding to a somewhat larger cluster of neurons) in the left anterior hippocampus that responded specifically, if not quite as distinctively, to pictures of the Sydney Opera House.  This same unit also responded to the letter string SYDNEY OPERA HOUSE.  It also responded to a picture of the Baha'i Temple -- but then again, in preliminary testing this patient had misidentified the Temple as the Opera House!  So again, as with the Halle Berry neuron, the multi-unit is responding to the abstract concept of "the Sydney Opera House", not to any particular configuration of physical features.


Across the 8 patients, Quiroga et al. tested 993 units, 343 single units and 650 multi-units, and found 132 units (14%) that responded to 1 or more test pictures.  When they found a responsive unit, they then tested it with 3 to 8 variants of the test pictures.  A total of 51 of these 132 units yielded evidence of an invariant representation of people, landmarks, animals, or food items.  In each case, the invariant representation was abstract, in that the unit responded to different views of the object, to line drawings as well as photographs, and to names as well as pictures.

So maybe there is a "grandmother neuron" after all!  This research -- which, remember, was performed in a clinical context and thus may have lacked some desirable controls -- identified sparse neural representations of particular people (landmarks, etc.), in which only a very small number of units is active during stimulus presentation.  

Of course, this evidence for localization of content contradicts the distributionist assumptions that have guided cognitive neuroscience for 50 years.  Further research is obviously required to straighten this out, but maybe there's no contradiction between distributionist and localist views after all.  According to Barlow's (1972) psychophysical linking principle:

Whenever two stimuli can be distinguished reliably... the physiological messages they cause in some single neuron would enable them to be distinguished with equal or greater reliability.

In other words, even in a distributed memory representation, there has to be some neuron that responds invariantly to various representations of the same concept.  Neural representations of knowledge may be  distributed widely over cortex, but these neural nets may come together in single units.

But wait a minute -- we're talking about the cerebral cortex, and the data from Quiroga et al. came from the hippocampus and other medial-temporal-lobe structures.  Note, however, that the hippocampus is crucial for memory: it was the destruction of his hippocampus that rendered H.M. amnesic.  Nobody thinks that memories are stored in the hippocampus -- it's just too small for that purpose.  But one prominent theory of the hippocampus is that it performs a kind of indexing function, linking together memory traces that are themselves stored in the cortex.  Accordingly, maybe Quiroga et al. didn't exactly tap into their patients' whole knowledge representation of Halle Berry -- but instead hit on the neural index card that locates all that information.

In any event, more recently Quian Quiroga and his colleagues (2008) have backed off their earlier, strong claims to have discovered something very much like a grandmother cell. 

Still, they argued, the coding is more sparse than distributed.  

So maybe symbolic/localist cognitive models have some life in them after all!

Just such an argument has been made by Bowers (2009), in a Psychological Review paper whose title gives the argument away: "On the Biological Plausibility of Grandmother Cells".  At the very least, Bowers argues that localist models of cognition are compatible with neurophysiological findings.

Bowers begins with an instructive discussion of the differences between localist (symbolic, computational) and connectionist (PDP) models.

Bowers argues that the general preference for distributed over localist coding schemes is based not just on the neural analogies discussed earlier, or on a particular set of neurophysiological findings, but also on a misunderstanding of localist models -- not least because there is not just one possible localist model, but several.

As Bowers notes (2009, p. 225), "The critical question is not whether a given neuron responds to more than one object, person, or word but rather whether the neuron codes for more than one thing.  Localist coding is implemented if a stimulus is encoded by a single node (neuron) that passes some threshold of activity, with the activation of other nodes (neurons) contributing nothing to the interpretation of the stimulus." 

For their part, distributed models also come in various forms.
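Bowers's criterion can be made concrete with a toy example (patterns of my own devising, not from his paper): in a localist code each concept has a dedicated node whose suprathreshold activity identifies the concept by itself, whereas in a distributed code no single node does.

```python
# Four concepts coded two ways over four nodes.

concepts = ["dog", "cat", "car", "cup"]

localist = [          # one dedicated node per concept
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

distributed = [       # overlapping patterns; every node is shared
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 1],
]

def node_identifies_one_concept(code, node, threshold=0.5):
    """Bowers's criterion: does this one node, above threshold,
    pick out exactly one concept on its own?"""
    return sum(1 for pattern in code if pattern[node] > threshold) == 1

# Every localist node codes for exactly one concept...
assert all(node_identifies_one_concept(localist, n) for n in range(4))
# ...but no distributed node does; identity lives in the whole pattern.
assert not any(node_identifies_one_concept(distributed, n) for n in range(4))
```

On this criterion, the question posed by the Quiroga et al. findings is whether any real neuron behaves like a node in the first matrix rather than the second.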

The View from Cognitive Sociology

Memory, like any other aspect of mind and behavior, can be analyzed at the psychological level, as in models like HAM and ACT, and it can be analyzed at the neuroscientific level, as in discussions of the hippocampus and grandmother neurons.  But memory can also be analyzed at a level "above" the individual mind and brain.  So, for example, sociologists discuss collective memories shared by groups, organizations, institutions, and whole societies and cultures.


So how are memories represented at the sociocultural level of analysis?

That's what memories look like at the sociocultural level of analysis.  Understanding memory at this level is the province of cognitive sociology, a new field of sociology introduced by Eviatar Zerubavel (1997). 


Concepts, in turn, are a form of knowledge representation known as schemata.  F.C. Bartlett (1932) introduced the concept of schema (pl. schemata, although schemas is acceptable too) as a central concept in his reconstructive theory of memory.  According to Bartlett, remembering is not like taking a book off the shelf and reading it, as the traditional library metaphor would have it.  Rather, remembering is more like writing the book anew, based on fragmentary notes.  The process of remembering, of reconstructing a memory, is guided throughout by an organized framework of world-knowledge and attitudes, within which the memory is reconstructed.  This organized framework is the schema.

Many people find schemata difficult to understand, but you begin to get the idea if you think of a more familiar derived term, schematic.  A schematic diagram is a kind of logical diagram of a house or piece of equipment.  It shows how the parts are associated with each other.  But in the case of the house, it doesn't specify what the walls are made of, or what color they are painted.  And in the case of a piece of electronic equipment, it doesn't show how the parts are actually configured inside the case.  A schematic diagram represents the general idea of a thing -- and that is exactly what a schema is.

Head's Concept of Schema

Bartlett actually got the schema concept from Sir Henry Head (1861-1940), a British neurophysiologist famous for his studies of bodily posture and of aphasia.  In his Studies in Neurology (1920), Head asserted that, in order to maintain correct posture, an organism must have some conception of its own body in space and time -- a homunculus-like "plastic model" which registers information about successive movements of various body parts (arms, legs, etc.), and updates the conception accordingly (see also Head and Holmes, 1911).  The body schema is an internal representation of the body, but it is not exactly a picture of what our bodies look like at any given moment; rather, it is a more generic concept of our bodies: that we have arms and legs and hands, what kinds of motions these body parts can make, where they are likely to be found, and so on.  

"Schemas are abstractions from specific instances that can be used to make inferences of the concepts they represent" (Anderson, Cognitive Psychology and Its Implications, 2000).

"A schema is a general knowledge structure used for understanding" (Medin, Ross, & Markman, Cognitive Psychology 2001).

Bartlett's Concept of Schema

In his theory of memory, Bartlett defined a schema as "an active organization of past reactions, or of past experiences, which must always be supposed to be operating in any well-adapted organic response" (p. 201) -- not just in moving around the physical world, but in mental activities such as remembering as well.  

It is this latter "partial reconstructive" view that is Bartlett's legacy to memory theory.  In the constructivist theory of perception, as it has been known at least since the time of Helmholtz, the perceiver combines information extracted from the stimulus with prior knowledge, expectations, and beliefs stored in memory to form a representation of some event that may or may not be precisely accurate.  In much the same way, it appears that the rememberer combines information retained in a memory trace with knowledge stored as part of a generic schema relevant to the event being remembered.  The result is that the individual will correctly remember those details that are schema-congruent, but also will falsely remember details that are congruent with the schema but not actually features of the event in question.  In addition, the individual will also remember schema-incongruent features -- those which were unexpected based on the schema activated at the time of perception, and so drew additional attention, and dominated the perceiver's "effort after meaning".  

Piaget on Schemata

The great Swiss developmental psychologist Jean Piaget (1896-1980) also employed the schema concept in his "genetic epistemology" theory of cognitive development.  For Piaget, as for Bartlett, a schema is an internal representation of some general class of situations.  Incoming stimulus information is assimilated to prevailing schemata, which in turn accommodate to information that doesn't quite fit.  Thus, the child is born with innate sensory-motor schemata, which develop through pre-operational, concrete-operations, and formal-operations stages as a result of the dynamic interplay of assimilation and accommodation. It's easy to see the similarities between Bartlett's and Piaget's ideas about schemata, but neither of them references the other.  As far as I can tell, Piaget first employed the schema concept in The Language and Thought of the Child (1926), so one would not expect Piaget to cite Bartlett.  But Bartlett didn't cite Piaget, either.  My best guess is that they derived the idea independently -- Bartlett from Henry Head, and Piaget from Immanuel Kant.  Oldfield and Zangwill (1942-1943) do not cite Piaget in their discussion of Head and Bartlett, and deny any connection between Bartlett's views and Kant.  

It was Kant, in fact, who first introduced the notion of a schema, referring to the a priori categories that he invoked in his synthesis of Cartesian rationalism and British empiricism.  Think, for example, of the associationist principle of association by contiguity (never mind that it's wrong).  You can't perceive things as close together in space and time unless you already have some notion of space and time.  Such notions are schemata, in Kant's terms.  

Incidentally, the Bartlett-Piaget coincidence repeated itself several decades later.  In his pioneering textbook on Cognitive Psychology, published in 1967, Ulric (Dick) Neisser made considerable use of Bartlett's notion of the schema as the generic knowledge against which percepts are constructed and memories reconstructed.  At exactly the same time, Aaron T. (Tim) Beck published a pioneering cognitive theory of depression (as opposed to the prevailing psychoanalytic one), based on the idea that depressed individuals suffer from depressogenic schemata -- basically, negative construals of self, the future, and the world.  Neisser was at the time on the faculty at Cornell, but he wrote his book while on sabbatical at the University of Pennsylvania -- which was where Beck, on the faculty of Penn's psychiatry department, was writing his book.  I know both individuals (being a Penn PhD), and so far as I can tell neither knew what the other was up to.  

The Bartlett Revival

Partly owing to the influence of Neisser's book, and partly owing to the increasing interest on the part of memory researchers in memory for stories (as opposed to word-lists), the schema concept was revived in the 1970s -- first within cognitive psychology, and then within social psychology.  For example, a number of experiments showed that comprehension of prose passages is better if subjects are first given information about the general theme of the passage; expert chess players remember chess positions better than novices do; and story details that fit subjects' expectations and world-knowledge are remembered better than those that do not.

Taylor and Crocker (1981) discussed a number of functions of schemata:

Brewer and Nakamura (1984) outlined five ways that schemata could specifically influence memory:

  1. Schemata influence the amount of attention directed to particular details.
  2. They act as frameworks for storing new information.
  3. Generic information in schemata can combine with specific details of an event.
  4. Schemata can guide memory retrieval.
  5. They can guide the process by which the subject selects retained information for actual reporting.

Schemata in Artificial Intelligence

Both Bartlett's and Piaget's notions of schemata are relatively informal, and so was the concept of schema held by the cognitive and social psychologists just described.  For them, the term simply refers to an organized body of more-or-less generic knowledge that guides perception, memory, thought, and action.  But this is a lecture supplement on representation, so we need to ask:

What do schemata look like?

We got an answer when the schema concept was revived in cognitive science, and particularly in work on artificial intelligence, by theorists who rejected the "atomistic" implications of information-processing theory -- as in HAM or ACT, with individual pieces of knowledge represented as local nodes in an associative network.  They had to figure out what schemata looked like, because they wanted to incorporate the concept in their computer-simulation models of memory and other aspects of cognition.

For example, Minsky (1975) explicitly rejected atomism and postulated the existence of "larger" "data structures" for representing knowledge known as frames.  A frame has nodes that provide its basic structure, and slots that accept only certain kinds of information.  If a slot is not filled by information to the contrary, it is filled in by "default" information.  For example, a room has a floor, walls, windows, doors, and a ceiling, each represented by nodes.  The floor may be wood or tile or carpeted, but it is unlikely to be made of water or grass.  The ceiling may be level or vaulted, but if it is vaulted the vault is unlikely to point downward.  There are usually four walls, and at least one window on every outside wall.
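Minsky's frame idea -- structural nodes, constrained slots, and default fillers -- can be sketched as a small data structure (hypothetical names and values of my own, not Minsky's notation):

```python
# A frame for "room": each slot constrains what may fill it and
# supplies a default that applies unless observation overrides it.

ROOM_FRAME = {
    "floor":   {"allowed": {"wood", "tile", "carpet"}, "default": "wood"},
    "walls":   {"allowed": {"painted", "papered"},     "default": "painted"},
    "windows": {"allowed": set(range(0, 10)),          "default": 2},
}

def instantiate(frame, **observed):
    """Fill slots from observation; fall back to default fillers."""
    instance = {}
    for slot, spec in frame.items():
        value = observed.get(slot, spec["default"])
        if value not in spec["allowed"]:
            raise ValueError(f"{value!r} cannot fill slot {slot!r}")
        instance[slot] = value
    return instance

# We observed only the floor; the walls and windows are filled in
# by default -- which is how frames support inference from partial input.
kitchen = instantiate(ROOM_FRAME, floor="tile")
```

The `allowed` sets capture Minsky's point that a slot accepts only certain kinds of information (a floor may be wood or tile, but not water or grass), while the `default` fillers capture the "filled in by default" behavior.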

At roughly the same time, Rumelhart and Ortony (1977) also invoked the schema concept to handle the problem of representing "higher-level abstractions" in story memory (which they used as a proxy for episodic memory in general).  

Rumelhart (1981, 1984) began by offering some analogies between schemata and more familiar terms:

For Rumelhart (1984), schemata have several major characteristics:

For Rumelhart, schemata mediate the dynamic interplay between top-down and bottom-up information processing -- much like Piaget's interplay between assimilation and accommodation.  Incoming stimulus information is processed with respect to active schemata, while schemata direct attention and interpretation.  In schematic processing, information processing goes in both directions: top-down and bottom-up.  This schematic processing is critical to every aspect of cognition: perception, discourse processing, learning, memory, and problem-solving.

For thorough discussions of Bartlett's schema theory, and its more modern adaptations, see
  • Oldfield, R.C., & Zangwill, O.L.  (1942a).  Head's concept of the schema and its application in contemporary British psychology.  Part I. Head's concept of the schema.  British Journal of Psychology, 32, 267-286.
  • Oldfield, R.C., & Zangwill, O.L.  (1942b).  Head's concept of the schema and its application in contemporary British psychology.  Part II. Critical analysis of Head's theory.  British Journal of Psychology, 33, 58-64.
  • Oldfield, R.C., & Zangwill, O.L.  (1943a).  Head's concept of the schema and its application in contemporary British psychology.  Part III. Bartlett's theory of memory.  British Journal of Psychology, 33, 113-129.
  • Oldfield, R.C., & Zangwill, O.L.  (1943b).  Head's concept of the schema and its application in contemporary British psychology.  Part IV. Wolters' theory of thinking.  British Journal of Psychology, 33, 143-149.
  • Oldfield, R.C.  (1954).  Memory mechanisms and the theory of schemata.  British Journal of Psychology, 45, 14-23.
  • Paul, I.H. (1967).  The concept of schema in memory theory.  Psychological Issues, 5(2-3), 218-258.  Reprinted in R.R. Holt (Ed.) (1967), Motives and thought: Psychoanalytic essays in honor of David Rapaport (pp. 218-258).  New York: International Universities Press.
  • Brewer, W.F., & Nakamura, G.V.  (1984).  The nature and function of schemas.  In R.S. Wyer & T.K. Srull (Eds.), Handbook of social cognition (1st ed., vol. 1), pp.119-160.  Hillsdale, N.J.: Erlbaum.
  • Rumelhart, D.E.  (1984).  Schemata and the cognitive system.  In R.S. Wyer & T.K. Srull (Eds.), Handbook of social cognition (1st ed., vol. 1), pp.161-188.  Hillsdale, N.J.: Erlbaum.

Scripts as Schemata

A special form of schema is known as a script.  The notion of scripts has its origins in sociological role theory, and sociologists of sex often discuss sexual interactions as scripted in nature.  For a long time, however, the script concept was relatively informal, based on a dramaturgical metaphor for social behavior in general.  

Just what goes into scripts, and how they are structured, was discussed in detail by Schank & Abelson (1977), who went so far as to write script theory in the form of an operating computer program -- another exercise in artificial intelligence, this time applied to the domain of social cognition.  Schank and Abelson based their scripts on conceptual dependency theory (Schank, 1975), which attempts to represent the meaning of sentences in terms of a relatively small set of primitive elements.  Included in these primitive elements are primitive acts such as:

  • PTRANS -- the physical transfer of an object from one location to another;
  • ATRANS -- the transfer of possession, ownership, or control;
  • MTRANS -- the transfer of information, between actors or within an actor;
  • MOVE -- the movement of a body part;
  • INGEST -- the taking of food or some other substance into the body.

Schank & Abelson illustrate their approach with what they call the Restaurant Script:


  • Scene 1, Entering the Restaurant: begins with Customer PTRANS Customer into restaurant; ends with Customer MOVE Customer to sitting position.
  • Scene 2, Ordering: begins with Customer MTRANS Signal to Waiter; ends with Waiter PTRANS Food to Customer.
  • Scene 3, Eating: begins with Cook ATRANS Food to Waiter; ends with Customer INGEST Food.
  • Scene 4, Exiting: begins with Waiter ATRANS Check to Customer; ends with Customer PTRANS Customer out of restaurant.

Although script theory attempts to specify the major elements of a social interaction in terms of a relatively small list of conceptual primitives, Schank and Abelson also recognized that scripts are incomplete.  For example, there are free behaviors that can take place within the confines of the script.  

There are also anticipated variations of the script, such as

And there are unanticipated variations as well, such as

Scripts are, in some sense, prototypes of social situations, because they list the features of these situations and the social interactions that take place within them.  But they go beyond prototypes to specify the relations -- particularly the temporal, causal, and enabling relations -- among these features.  The customer orders food before the waiter brings it, and the customer can't leave until he pays the check, but he can't pay the check until the waiter brings it.

In any event, scripts enable us to categorize social situations: we can determine what situation we are in by matching its features to the prototypical features of various scripts we know.  And, having categorized the situation in terms of some script, that script will then serve to guide our social interactions within that situation.  By specifying the temporal, causal, and enabling relations among various actions, the script enables us to know how to respond to what occurs in that situation.
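The temporal, causal, and enabling relations that distinguish a script from a mere feature list can be sketched as follows (a hypothetical encoding of my own; Schank and Abelson's actual program was far richer): an action is valid only after the actions that enable it have occurred.

```python
# The restaurant script as a set of enabling relations: each action
# maps to the actions that must already have happened.

SCRIPT = {
    "enter":       set(),
    "order":       {"enter"},
    "eat":         {"order"},
    "bring_check": {"eat"},
    "pay":         {"bring_check"},
    "exit":        {"pay"},
}

def is_valid(sequence, script=SCRIPT):
    """Check that every action's enablers occur before it."""
    done = set()
    for act in sequence:
        if not script[act] <= done:   # an enabling relation is unmet
            return False
        done.add(act)
    return True

# The canonical sequence satisfies every enabling relation...
assert is_valid(["enter", "order", "eat", "bring_check", "pay", "exit"])
# ...but the customer can't pay before the waiter brings the check.
assert not is_valid(["enter", "pay"])
```

Matching an observed sequence of events against structures like this is one way to model how a script lets us categorize a situation and then anticipate what comes next in it.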


Categories and Concepts

Our discussion of memory storage has focused on episodic memory -- that is, how specific episodes of experience, thought, and action are represented in the mind.  But it is also clear that more than episodic memories are stored in the mind.  There is also semantic knowledge of various sorts, as well as procedural knowledge.  A special form of semantic knowledge concerns conceptual knowledge about the world.  Technically, conceptual knowledge is part of semantic memory, and we have already discussed how certain classic models of semantic memory represent conceptual knowledge:

That's all well and good, but conceptual knowledge has been such an important part of theories of cognitive representation -- since, roughly, the time of Aristotle! -- that it deserves some special treatment.

So the question becomes -- what are concepts, and how are categories represented in the mind?

The terms concept and category are often used interchangeably, even though there is an important  technical distinction between them:

Generally, we think of our mental concepts as being derived from the actual categorical structure of the real world, but there are also points of divergence:

Technically, categories exist in the real world, while concepts exist in the mind. However, this technical distinction is difficult to uphold, and psychologists commonly use the two terms interchangeably. In fact, objective categories may not exist in the real world, independently of the mind that conceives them (a question related to the philosophical debate between realism and idealism).  Put another way, the question is whether the mind picks up on the categorical structure of the world, or whether the mind imposes this structure on the world.  

Some categories may be defined through enumeration: an exhaustive list of all instances of a category. A good example is the English alphabet, A through Z; these letters have nothing in common except their status as letters of the English alphabet.

A variant on enumeration is to define a category by a rule which will generate all instances of the category (these instances all have in common that they conform to the rule). An example is the concept of integer in mathematics, which is defined as the numbers 0, 1, and any number which can be obtained by adding or subtracting 1 from these numbers one or more times.
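That rule-based definition can be sketched directly -- a small Python function (bounded, for practicality) that generates exactly the integers reachable from 0 by repeatedly adding or subtracting 1:

```python
def integers_up_to(n):
    """Generate the integers in [-n, n] by the rule: 0 is an integer,
    and adding or subtracting 1 from any integer yields an integer."""
    members = {0}
    frontier = {0}
    while frontier:
        # Apply the rule to everything found so far, keep only new results.
        new = {m + d for m in frontier for d in (1, -1)} - members
        new = {m for m in new if abs(m) <= n}  # bound the enumeration
        members |= new
        frontier = new
    return sorted(members)
```

Every member of the category conforms to the generating rule, and nothing else does -- which is what makes this a definition rather than a mere list.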

The most common definitions of categories are by attributes: properties or features which are shared by all members of a category. Thus, birds are warm-blooded vertebrates with feathers and wings, while fish are cold-blooded vertebrates with scales and fins. There are three broad types of attributes relevant to category definition:

Of course, some categories are defined by mixtures of perceptual, functional, and relational features.

Still, most categories are defined by attributes, meaning that concepts are summary descriptions of an entire class of objects, events, and ideas. There are three principal ways in which such categories are organized: as proper sets, as fuzzy sets, and as sets of exemplars.

Now having defined the differences between the two terms, we are going to use them interchangeably again.  The reason is that it's boring to write concept all the time; moreover, the noun category has a cognate verb form, categorization, while conceptual does not (unless you count conceptualization, which is a mouthful that doesn't mean quite the same thing as categorization).  

Still, the semantic difference between concepts and categories raises two particularly interesting issues for social categorization:

The Classical View: Categories as Proper Sets

Perhaps the earliest philosophical discussion of conceptual structure was provided by Aristotle in his Categories.  Aristotle set out the classical view of categories as proper sets -- a view which dominated thinking about concepts and categories well into the 20th century.  Beginning in the 1950s, however, and especially the 1970s, philosophers, psychologists, and other cognitive scientists began to express considerable doubts about the classical view.  In the time since, a number of different views of concepts and categories have emerged -- each attempting to solve the problems of the classical view, but each raising new problems of its own.  Here's a short overview of the evolution of theories of conceptual structure.

According to the classical view, concepts are summary descriptions of the objects in some category.  This summary description is abstracted from instances of a category, and applies equally well to all instances of a category.  

In the classical view, categories are structured as proper sets, meaning that the objects in a category share a set of defining features which are singly necessary and jointly sufficient to demarcate the category.

Examples of classification by proper sets include:
According to the proper set view, categories can be arranged in a hierarchical system which represents the vertical relations between categories, and yield the distinction between superordinate and subordinate categories.

Such hierarchies of proper sets are characterized by perfect nesting, by which we mean that subsets possess all the defining features of supersets (and then some). Examples include:

geometrical figures
    superset: points, lines, planes, solids
        subsets of planes: triangles, quadrilaterals, etc.
            sub-subsets of quadrilaterals: parallelograms, rhomboids, etc.
                sub-sub-subsets of parallelograms: rectangles, squares, etc.

    superset: male, female
        subsets of males: youth, bachelor, husband, widower

        subsets of females: girl, maiden, wife, widow

government officials
    superset: executive, legislative, judicial
        subsets of legislative: senator, representative

        subsets of executive: president, cabinet secretary, administrator

        subsets of judicial: supreme court, court of appeals, district court, magistrate

Note, for example, the perfect nesting in the hierarchy of geometrical figures: all instances of subcategories also possess the defining features of the relevant superordinate category.  All trapezoids have the features of quadrilaterals, and all quadrilaterals have the features of planes.
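Perfect nesting is just the subset relation on defining features. A minimal sketch (the feature lists here are invented for illustration, not a serious analysis of geometry):

```python
# Hypothetical defining features.  Perfect nesting means each subset's
# feature set contains every feature of its superset (and then some).
features = {
    "plane":         {"extended in two dimensions"},
    "quadrilateral": {"extended in two dimensions", "four sides"},
    "parallelogram": {"extended in two dimensions", "four sides",
                      "opposite sides parallel"},
    "square":        {"extended in two dimensions", "four sides",
                      "opposite sides parallel", "equal sides",
                      "right angles"},
}

def perfectly_nested(sub, sup):
    """True if the subordinate category inherits every defining
    feature of the superordinate category."""
    return features[sup] <= features[sub]
```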

Proper sets are also characterized by an all-or-none arrangement which characterizes the horizontal relations between adjacent categories, or the distinction between a category and its contrast. Because defining features are singly necessary and jointly sufficient, proper sets are homogeneous in the sense that all members of a category are equally good instances of that category (because they all possess the same set of defining features). An entity either possesses a defining feature or it doesn't; thus, there are sharp boundaries between contrasting categories: an object is either in the category or it isn't. You're either a fish, or you're not a fish.  There are no ambiguous cases of category membership.

According to the classical view, object categorization proceeds by a process of feature-matching.  Through perception, the perceiver extracts information about the features of the object; these features are then compared to the defining features of some category.  If there is a complete match between the features of the object and the defining features of the category, then the object is labeled as another instance of the category.
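Classical feature-matching can be sketched directly: an object belongs to a category if and only if it possesses every defining feature. The feature sets below are illustrative, not a serious taxonomy:

```python
# Defining features, per the classical view: singly necessary and
# jointly sufficient.  (Feature sets are invented for illustration.)
DEFINING = {
    "bird": {"vertebrate", "warm-blooded", "feathers", "wings"},
    "fish": {"vertebrate", "cold-blooded", "scales", "fins"},
}

def classify_classical(object_features):
    """All-or-none categorization: an object is an instance of a
    category if and only if it has every one of its defining features."""
    return [category for category, defining in DEFINING.items()
            if defining <= object_features]

# Extra, non-defining features (like "small") are simply irrelevant:
sparrow = {"vertebrate", "warm-blooded", "feathers", "wings", "small"}
```

Note that on this view there are no degrees of membership: the subset test either succeeds or fails, so every bird is exactly as much a bird as every other.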

Problems with the Classical View

The proper set view of categorization is sometimes called the classical view because it is the one handed down in logic and philosophy from the time of the ancient Greeks. But there are some problems with it which suggest that however logical it may seem, it's not how the human mind categorizes objects.  Smith & Medin (1981) distinguished between general criticisms of the classical view, which arise from simple reflection, and empirical criticisms, which emerge from experimental data on concept-formation.  

General Criticisms.  On reflection, for example, it appears that some concepts are disjunctive: they are defined by two or more different sets of defining features.

Disjunctive categories violate the principle of defining features, because there is no defining feature which must be possessed by all members of the category.

Another problem is that many entities have unclear category membership. According to the classical, proper-set view of categories, every object should belong to one category or another. But is a rug an article of furniture? Is a potato a vegetable? Is a platypus a mammal? Is a panda a bear? We use categories like "furniture" without being able to clearly determine whether every object is a member of the category.

Furthermore, some categories are associated with unclear definitions.  That is, it is difficult to specify the defining features of many of the concepts we use in ordinary life. A favorite example (from the philosopher Wittgenstein) is the concept of "game". Games don't necessarily involve competition (solitaire is a game); there isn't necessarily a winner (ring-around-the-rosy); and they're not always played for amusement (professional football). Of course, it may be that the defining features exist, but haven't been discovered yet. But that doesn't prevent us from assigning entities to categories; thus, categorization doesn't seem to depend on defining features.

Empirical Criticisms.  Yet another problem is imperfect nesting: it follows from the hierarchical arrangement of categories that members of subordinate categories should be judged as more similar to members of immediately superordinate categories than to more distant ones, for the simple reason that the two categories share more features in common. Thus, a sparrow should be judged more similar to a bird than to an animal. This principle is often violated: for example, chickens, which are birds, are judged to be more similar to animals than to birds.  The result is a tangled hierarchy of related concepts.


The chicken-sparrow example reveals the last, and perhaps the biggest, problem with the classical view of categories as proper sets: some entities are better instances of their categories than others. This is the problem of typicality. A sparrow is a better instance of the category bird -- it is a more "birdy" bird -- than is a chicken (or a goose, or an ostrich, or a penguin). Within a culture, there is a high degree of agreement about typicality. The problem is that all the instances in question share the features which define the category bird, and thus must be equivalent from the classical view. But they are clearly not equivalent; variations in typicality among members of a category can be very large.

Variations in typicality can be observed even in the classic example of a proper set -- namely, geometrical figures.  For example, subjects usually identify an equilateral triangle, with equal sides and equal angles, as more typical of the category triangle than isosceles, right, or acute triangles.

There are a large number of ways to observe typicality effects: 

Typicality appears to be determined by family resemblance.  Category instances seem to be united by family resemblance rather than by any set of defining features shared by all members of a category.  Just as a child may have his mother's nose and his father's ears, so instance A may share one feature with instance B, and an entirely different feature with instance C, while B shares yet a third feature with C that it does not share with A.  Empirically, typical members share lots of features with other category members, while atypical members do not. Thus, sparrows are small, and fly, and sing; chickens are big, and walk, and cluck.
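Family resemblance can be quantified in just this spirit: an instance is typical to the extent that its features are shared with many other category members. A minimal sketch, with invented features:

```python
# Invented feature lists for three members of the category "bird".
members = {
    "sparrow": {"small", "flies", "sings", "feathers"},
    "robin":   {"small", "flies", "sings", "feathers"},
    "chicken": {"big", "walks", "clucks", "feathers"},
}

def family_resemblance(name):
    """Sum of feature overlaps with every other category member:
    high scores mean typical instances, low scores atypical ones."""
    me = members[name]
    return sum(len(me & other)
               for other_name, other in members.items()
               if other_name != name)
```

On this measure the sparrow, whose features are widely shared, scores higher than the chicken, which shares little beyond its feathers -- mirroring the typicality judgments people actually make.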

Typicality is important because it is another violation of the homogeneity assumption of the classical view. It appears that categories have a special internal structure which renders instances nonequivalent, even though they all share the same singly necessary and jointly sufficient defining features. Typicality effects indicate that we use non-necessary features when assigning objects to categories. And, in fact, when people are asked to list the features of various categories, they usually list features that are not true for all category members.

The implication of these problems, taken together, is that the classical view of categories is incorrect. Categorization by proper sets may make sense from a logical point of view, but it doesn't capture how the mind actually works.

The Prototype View: Concepts as Fuzzy Sets

Another view of categorization has more recently gained status within psychology: it is known as the prototype or probabilistic view.

The prototype view retains the idea, from the classical view, that concepts are summary descriptions of the instances of a category.  Unlike the classical view, however, in the prototype view the summary description does not apply equally well to every member of the category, because there are no defining features of category membership.  

According to the prototype view, categories are fuzzy sets, in that there is only a probabilistic relationship between any particular feature and category membership. No feature is singly necessary to define a category, and no set of features is jointly sufficient. 


Fuzzy Sets and Fuzzy Logic

The notion of categories as fuzzy rather than proper sets, represented by prototypes rather than lists of defining features, is related to the concept of fuzzy logic developed by Lotfi Zadeh, a computer scientist at UC Berkeley.  Whereas the traditional view of truth is that a statement (such as an item of declarative knowledge) is either true or false, Zadeh argued that statements can be partly true, possessing a "truth value" somewhere between 0 (false) and 1 (true).

Fuzzy logic can help resolve certain logical conundrums -- for example the paradox of Epimenides the Cretan (6th century BC), who famously asserted that "All Cretans are liars".  If all Cretans are liars, and Epimenides himself is a Cretan, then his statement cannot be true.  Put another way: if Epimenides is telling the truth, then he is a liar.  As another example, consider the related Liar paradox: the simple statement that "This sentence is false".  Zadeh has proposed that such paradoxes can be resolved by concluding that the statements in question are only partially true.

Fuzzy logic also applies to categorization.  Under the classical view of categories as proper sets, a similar "all or none" rule applies: an object either possesses a defining feature of a category or it does not; and therefore it either is or is not an instance of the category.  But under fuzzy logic, the statement "object X has feature Y" can be partially true; and if Y is one of the defining features of category Z, it also can be partially true that "Object X is an instance of category Z".
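A minimal sketch of this idea, using Zadeh's min rule for conjunction (the particular truth values below are invented for illustration):

```python
def fuzzy_membership(feature_truths):
    """Degree of category membership: the conjunction of the partial
    truths 'object X has feature Y', combined by Zadeh's min rule."""
    return min(feature_truths.values())

# Partly-true feature statements yield partial category membership.
# The truth values here are illustrative assumptions, not data:
tomato_as_fruit = fuzzy_membership({"has seeds": 1.0, "sweet": 0.2})
```

Instead of the all-or-none verdict of the classical view, the tomato comes out as a fruit only to degree 0.2 -- partially true, just as Zadeh proposed.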

A result of the probabilistic relation between features and categories is that category instances can be quite heterogeneous. That is, members of the same category can vary widely in terms of the attributes they possess. All of these attributes are correlated with category membership, but none are singly necessary and no set is jointly sufficient.

Some instances of a category are more typical than others: these possess relatively more central features.

According to the prototype view, categories are not represented by a list of defining features, but rather by a category prototype, or focal instance, which has many features central to category membership (and thus a family resemblance to other category members) but few features central to membership in contrasting categories.

It also follows from the prototype view that there are no sharp boundaries between adjacent categories (hence the term fuzzy sets). In other words, the horizontal distinction between a category and its contrast may be very unclear. Thus, a tomato is a fruit but is usually considered a vegetable (it has only one perceptual attribute of fruits, having seeds, but many functional features of vegetables, such as the circumstances under which it is eaten). Dolphins and whales are mammals, but are usually (at least informally) considered to be fish: although they give live birth and nurse their young, they have few other features that are central to mammalhood, and lots of features that are central to fishiness.

Two Views of Prototypes

Actually, there are two different versions of the prototype view.

The two versions of the prototype view have somewhat different implications for categorization.
Either way, categorization is no longer an "all-or-none" matter.  Category membership can vary by degrees, depending on how closely the object resembles the prototype.
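Graded, prototype-based membership can be sketched as follows (the prototype and feature sets are invented for illustration): membership is the proportion of the prototype's features an object shares, rather than an all-or-none match.

```python
def prototype_similarity(obj, prototype):
    """Graded category membership: the proportion of the prototype's
    features that the object possesses (1.0 = a perfectly typical
    instance; values near 0 = a marginal one)."""
    return len(obj & prototype) / len(prototype)

# Illustrative prototype and instances -- not empirically derived:
BIRD_PROTOTYPE = {"feathers", "wings", "flies", "sings", "small"}
sparrow = {"feathers", "wings", "flies", "sings", "small"}
penguin = {"feathers", "wings", "swims", "big"}
```

The sparrow matches the prototype completely, while the penguin is only a partial match -- a member of the category, but an atypical one, with no sharp boundary telling us where birdhood ends.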

The prototype view solves most of the problems that confront the classical view, and (in my view, anyway) is probably the best theory of conceptual structure and categorization that we've got.  But as research proceeded on various aspects of the prototype view, certain problems emerged, leading to the development of other views of concepts and categories.

In the prototype view, as in the classical view, related categories can be arranged in a hierarchy of subordinate and superordinate categories.  Many accounts of the prototype view argue that there is a basic level of categorization, which is defined as the most inclusive level at which:

In the realm of animals, for example, dog and cat are at the basic level, while beagle and Siamese are at subordinate levels.  In the domain of musical instruments, piano and saxophone are at the basic level, while grand piano and baritone saxophone are at subordinate levels.  The basic level is in some important sense psychologically salient, and preferred for object categorization and other cognitive purposes.

The Exemplar View

Some theorists now favor a third view of concepts and categories, which abandons the definition of concepts as summary descriptions of category members. According to the exemplar view, concepts consist simply of lists of their members, with no defining or characteristic features to hold the entire set together. In other words, what holds the instances together is their common membership in the category. It's a little like defining a category by enumeration, but not exactly. The members do have some things in common, according to the exemplar view; but those things are not particularly important for categorization.

When we want to know whether an object is a member of a category, the classical view says that we compare the object to a list of defining features; the prototype view says that we compare it to the category prototype; the exemplar view says that we compare it to individual category members. Thus, in forming categories, we don't learn prototypes, but rather we learn salient examples.
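The exemplar case can be sketched as a nearest-neighbor comparison (the stored exemplars and their features are invented for illustration): an object takes the category of whichever remembered instance it most resembles, with no prototype anywhere in the system.

```python
# Stored exemplars: (feature set, category label).  Invented examples.
EXEMPLARS = {
    "sparrow": ({"feathers", "flies", "sings", "small"}, "bird"),
    "trout":   ({"scales", "swims", "fins"}, "fish"),
}

def classify_by_exemplar(obj, exemplars):
    """Assign the category of the single most similar stored exemplar,
    measuring similarity as simple feature overlap."""
    best = max(exemplars,
               key=lambda name: len(obj & exemplars[name][0]))
    return exemplars[best][1]

# A new object is compared to remembered instances, not a summary:
mystery = {"feathers", "flies", "big"}
```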

Teasing apart the prototype and the exemplar view turns out to be fiendishly difficult. There are a couple of very clever experiments which appear to support the exemplar view.  For example, it turns out that we will classify an object as a member of a category if it resembles another object that is already labeled as a category member, even if neither the object nor that exemplar particularly resembles the category prototype.

Nevertheless, some investigators are skeptical of the exemplar view because it seems uneconomical. The compromise position, which has many adherents, is that we categorize in terms of both prototypes and exemplars. For example -- and this is still a hypothesis to be tested -- novices in a particular domain may categorize in terms of prototypes, while experts categorize in terms of exemplars.

Despite these differences, the exemplar view agrees with the prototype view that categorization proceeds by way of similarity judgments.  And they further agree that similarity varies in degrees.  They just differ in what the object must be similar to:

Following the work of Amos Tversky, Medin (1989) has outlined a modal model of similarity judgments:
In either case, similarity is sufficient to describe conceptual structure -- all the instances of a concept are similar, in that they either share some features with the category prototype or they share some features with a category exemplar.
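Medin's modal model is presumably along the lines of Tversky's (1977) contrast model, in which similarity increases with the features two objects share and decreases with the features distinctive to each. A minimal sketch (the weights and feature sets are illustrative assumptions):

```python
def tversky_similarity(a, b, theta=1.0, alpha=0.5, beta=0.5):
    """Tversky's contrast model: similarity is a weighted count of
    common features, minus weighted counts of the features distinctive
    to each object.  The weights here are arbitrary illustrations."""
    return (theta * len(a & b)    # features of a and b in common
            - alpha * len(a - b)  # features of a not shared by b
            - beta * len(b - a))  # features of b not shared by a

# Invented feature sets for two instances:
robin = {"feathers", "flies", "sings"}
sparrow = {"feathers", "flies", "sings", "small"}
```

Because the weights on the two distinctive-feature terms can differ, the model also allows similarity judgments to be asymmetric -- one of Tversky's central observations.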

The Theory-Based View

As noted, the prototype and exemplar views of categorization are both based on a principle of similarity. What members of a category have in common is that they share some features or attributes with at least some other member(s) of the same category. The implication is that similarity is an attribute of objects, which can either be measured (by counting overlapping features) or judged (by estimating them).  But ingenious researchers have uncovered some troubles with similarity as a basis for categorization -- and, for that matter, with similarity in general.

Context Effects.  It has recently been recognized that some categories are defined by theories rather than by similarity. For example, in one experiment, when subjects were presented with pictures of a white cloud, a grey cloud, and a black cloud, they grouped the grey and black clouds together as similar; but when presented with pictures of white hair, grey hair, and black hair, in which the shades of hair were identical to the shades of cloud, subjects grouped the grey hair with the white hair. Because the shades were identical in the two cases, grouping could not have been based on similarity of features. Rather, the categories seemed to be defined by a theory of the domain: grey and black clouds signify stormy weather, while white and grey hair signify old age.

Ad-Hoc Categories.  What do children, money, insurance papers, photo albums, and pets have in common? Nothing, when viewed in terms of feature similarity. But they are all things that you would take out of your house in case of a fire. The objects listed together are similar to each other in this respect only; in other respects, they are quite different.  

This is also true of the context effects on similarity judgment: grey and black are similar with respect to clouds and weather, while grey and white are similar with respect to hair and aging.  

These observations tell us that similarity is not necessarily the operative factor in category definition. In some cases, at least, similarity is determined by a theory of the domain in question: there is something about weather that makes grey and black clouds similar, and there is something about aging that makes white and grey hair similar.

In the theory-based view of categorization (Medin, 1989), concepts are essentially theories of the categorical domain in question.  Conceptual theories perform a number of different functions:

From this point of view, similarity-based classification, as described in the prototype and exemplar views, is simply a short-cut heuristic used for purposes of classification.  The real principle of conceptual structure is the theory of the categorical domain in question.

Conceptual Coherence

One way or another, concepts and categories have coherence: there is something that links members together. In classification by similarity, that something is intrinsic to the entities themselves; in classification by theories, that something is imposed by the mind of the thinker.

But what to make of this proliferation of theories?  From my point of view, the questions raised about similarity have a kind of forensic quality -- they sometimes seem to amount to a kind of scholarly nit-picking.  To be sure, similarity varies with context; and there are certainly some categories which are only held together by a theory, and similarity fails utterly to hold a category together.  For most purposes, the prototype view, perhaps corrected (or expanded) a little by the exemplar view, seems to work pretty well as an account of how concepts are structured, and how objects are categorized.

As it happens, most work on social categorization has been based on the prototype view.  But there are areas where the exemplar view has been applied very fruitfully, and even a few areas where it makes sense to abandon similarity, and to invoke something like the  theory-based view.

To summarize this history, concepts were first construed as summary descriptions of category members. 

Concepts and categories are just about the most interesting topic in all of psychology and cognitive science, and two very good books have been written on the subject.  They are highly recommended:

  • Categories and Concepts by E.E. Smith and D.L. Medin (Harvard University Press, 1981).
  • The Big Book of Concepts by G.L. Murphy (MIT Press, 2002).
Here in Berkeley's Psychology Department, Prof. Eleanor Rosch, now retired, made fundamental contributions to the "prototype" view of conceptual structure.  She also gave a wonderful course on the subject, enhanced by her interest in Buddhist psychology, which has a very different view of concepts and categories; the course is now offered by Prof. Tania Lombrozo.  Prof. George Lakoff, in the Linguistics Department, also gives courses on concepts, with special attention to metaphor.

This page last revised 02/14/2014.