Once sensory information arrives at the sensory projection area in the brain, the next step in cognition is forming a mental representation of the stimulus. This mental representation is what we call a percept. Although there are many different sensory modalities, most of what we know about perception is confined to visual and auditory perception -- that's where the research has been focused.
The Ecological View of Perception
However, some theories of perception deny or downplay the role of top-down, cognitive processes in perception. According to the ecological view of perception proposed by James J. Gibson, who often collaborated with his wife, Eleanor Jack Gibson (they were known as Jimmy and Jacky):
- All the information needed for perception is provided by the stimulus environment, broadly defined to include all the stimuli impinging on the organism at the moment.
- The perceptual apparatus has evolved in such a manner as to extract this information automatically, without any recourse to memory or thought.
- Because the perceptual apparatus has evolved in such a way as to enable us to perceive the world as it really is, there is no need for learning to occur (learning may be important for other aspects of behavior, but it is not important for perception).
- And because all the information we need to see the world the way it really is is provided by the stimulus environment, there is no need for the organism to consult its fund of world-knowledge stored in memory, or to engage in such "higher" mental processes as judgment, reasoning, or problem-solving.
The ecological view is a radical view of perception, but in fact the principles of direct realism can account for much of what we see. The Gibsonian ecological view has been especially successful in accounting for three basic aspects of perception:
- whether an object is stable or in motion;
- whether an object is close by or far away;
- whether an object is rigid or flexible.
The ecological view of perception is so named because it assumes that the stimulus environment (the ecology) provides enough information to enable us to perceive the world accurately, and there is no need to draw more information from "inside the head". It is also called direct perception because, in theory, the formation of the percept is not mediated by learning, inference, or other "higher" cognitive processes. And it is called direct realism because, again in theory, the mechanisms of perception have evolved in such a way as to permit us to perceive the world as it really is.
But there's a subtle trick in direct realism: the definition of "stimulus" must be broadened to include not just the distal stimulus object itself (the object of regard), but also other stimuli in the surrounding stimulus field, especially the relationship between an object and its background. In addition to the stimulus and its context, important information for perception is also provided by the perceiver's body. All of these -- the distal stimulus, its environmental context, and the perceiver's own body -- are environmental sources of information for perception.
The Perception of Motion
How can we tell whether an object is in motion? Gibson argues that the perception of motion depends on the comparison between various sources of stimulus information.
Some information comes from the objects themselves.
The successive covering and uncovering of an object's background, or of one object by another, is clear information that the object is moving with respect to its background, or that one object is moving with respect to another. If you watch the red rectangle, you will see a green square successively cover and uncover it. This is information that the square is moving across the rectangle.
Other information relevant to motion is provided by the eyes and head.
Movement of the retinal image: As an object moves, the retinal image cast by that object moves across the retina. If you keep your eyes fixed on the cross, the image of the circle will move across the retina. This visual stimulation is information that the circle is moving.
Egomotion: But we do not always see motion when an image moves across the retina. If you fix your eyes on the cross, and then move your eyes or your head back and forth to the left and right, the retinal image of the cross also moves across the retina. The cross moves across the retina, but we do not see the cross move in the world. The visual system automatically uses kinesthetic information about egomotion, or self-produced movement, to tell us that it is our eyes that are moving, not the objects in the world. Moreover, we do not always see stability when an image stays fixed on the retina. If you fix your eyes on the cross, and then track the circle as it moves across the screen, your eyes and/or your head will move in order to keep the image of the circle in the central field of vision; otherwise the image moves to the periphery. The circle stays fixed on the retina, but we see it move in the world. Here, egomotion is what keeps the circle's image in a fixed position on the retina. Egomotion is another source of information for the perception of motion.
- The image-retina system takes information about the image of an object on the retina.
- The eye/head system takes information about movements of the eyes and head.
Motion is perceived in each of the following situations:
- When the image moves across the retina, while the eyes and head remain still.
- When the image remains stationary on the retina, but the eyes and head move.
- When the image of one object on the retina remains stationary, but the image of another object moves across the retina.
- When the image moves across the retina at a different rate than the eyes and head are moving.
The effective stimulus for motion, then, is the discrepancy between the information provided by the image-retina and eye/head systems.
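One way to make "discrepancy" concrete is a simple additive cancellation model, sketched below in Python. This is an illustrative assumption, not a formulation taken from Gibson: the registered eye/head velocity is added to the velocity of the image across the retina, and motion is seen only when the two signals fail to cancel. The function names, the sign convention (rightward positive, degrees per second), and the 0.5 deg/s threshold are all hypothetical.

```python
def world_velocity(retinal_velocity, eye_head_velocity):
    """Hypothetical cancellation model (an illustrative assumption).

    Velocities are in degrees per second, rightward positive.
    retinal_velocity: how fast the image sweeps across the retina.
    eye_head_velocity: how fast the eyes/head are registered as moving.
    Their sum estimates how fast the object moves in the world; when the
    two signals cancel, the object is seen as stationary."""
    return retinal_velocity + eye_head_velocity


def seen_as_moving(retinal_velocity, eye_head_velocity, threshold=0.5):
    # Motion is perceived only when the two signals fail to cancel.
    return abs(world_velocity(retinal_velocity, eye_head_velocity)) > threshold


cases = {
    "image sweeps across a still eye": (10.0, 0.0),
    "eye tracks an object, image stays put": (0.0, 10.0),
    "eye sweeps past a stationary object": (-10.0, 10.0),
    "image and eye move at different rates": (-4.0, 10.0),
    "eye pushed by a finger (no eye/head signal)": (-10.0, 0.0),
}

for label, (retina, eye_head) in cases.items():
    verdict = "motion" if seen_as_moving(retina, eye_head) else "no motion"
    print(f"{label}: {verdict}")
```

The last case models pressing on the eyeball: the retinal image shifts, but no eye/head signal registers the movement, so nothing cancels it and the object appears to jump.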
Sometimes the two sources of information are in conflict. Cover one eye with your hand, and then focus the other eye on the cross. Then gently push on your open eye with your finger. The cross seems to move. This is because the retinal image of the cross moves from one spot to another, but there is no kinesthetic information about egomotion to correct this apparent movement.
The Perception of Depth or Distance
Similar processes can be seen in the perception of distance or depth.
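Some cues to distance or depth are called binocular, because they depend on the fact that we have two eyes, and are not available to people who are blind in one eye.
Convergence: To fixate an object, the two eyes turn inward, toward each other, so that their lines of sight meet at the object and form an angle of convergence: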
- at short distances, the angle of convergence is large;
- at long distances, the angle of convergence is small.
Note that the convergence principle works only up to 30-40 feet. After that distance, the eyes are essentially parallel, and convergence is no longer available as a cue to distance.
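The geometry behind this limit can be sketched in a few lines of Python. This is a simplified model, assuming the object is straight ahead and the eyes are about 2.5 inches apart (the middle of the 2-3 inch range); the function name is illustrative.

```python
import math

# Assumed eye separation: 2.5 inches, expressed in feet.
INTEROCULAR_FT = 2.5 / 12


def convergence_angle_deg(distance_ft, separation_ft=INTEROCULAR_FT):
    """Angle between the two lines of sight when both eyes fixate a point
    straight ahead at the given distance (simple symmetric geometry)."""
    return math.degrees(2 * math.atan((separation_ft / 2) / distance_ft))


for d in (1, 5, 10, 20, 40, 100):
    print(f"{d:>4} ft: {convergence_angle_deg(d):6.2f} degrees")
```

Under these assumptions the angle is roughly 12 degrees at 1 foot but only about 0.3 degree at 40 feet, so beyond that range the remaining changes are too small to be of much use as a cue.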
Binocular (or Retinal) Disparity: Because the two eyes are separated by a space of 2-3 inches, they each provide somewhat different -- disparate -- images of an object. That is, each eye has a slightly different perspective on the object.
As a demonstration of binocular disparity, hold out your left index finger at full arm's length, and your right index finger at half arm's length. Close your right eye, and align your two fingers using only your left eye, so that both coincide with the cross. Then close your left eye and open your right eye. The movement of the cross shows that the left and right eyes have somewhat different views.
These 2-dimensional images are then fused by the brain to result in a three-dimensional percept, with information about depth as well as width and height. This perception of depth from binocular disparity is called stereopsis.
An excellent demonstration of stereopsis is provided, naturally, by stereograms (invented by Charles Wheatstone, a 19th-century physicist). These pairs of images differ slightly in lateral displacement so that, when one image is presented to each eye, the two images fuse into a vivid illusion of depth. Of particular interest are the random-dot stereograms invented by Bela Julesz, a 20th-century vision scientist working at Bell Laboratories, which use stereoscopic images composed of thousands of randomly placed dots (hence the name) to create images in depth.
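The recipe for a random-dot stereogram is simple enough to sketch in Python. This is an illustrative toy, not Julesz's own procedure: the image size, the size of the hidden square, the 2-column shift, and the dot density are all arbitrary choices. The right image copies the left one, except that a central square of dots is displaced sideways and the strip it uncovers is refilled with fresh random dots, so neither image alone contains any visible square.

```python
import random


def random_dot_stereogram(size=40, square=16, shift=2, density=0.5, seed=0):
    """Return (left, right) images as lists of rows of 0/1 dots.

    The right image equals the left one except that a central square of
    dots is displaced `shift` columns to the left; the strip it uncovers
    is refilled with new random dots (illustrative parameters)."""
    rng = random.Random(seed)
    left = [[1 if rng.random() < density else 0 for _ in range(size)]
            for _ in range(size)]
    right = [row[:] for row in left]
    top = (size - square) // 2
    for r in range(top, top + square):
        # Copy the square's dots into the right image, displaced sideways.
        for c in range(top, top + square):
            right[r][c - shift] = left[r][c]
        # Refill the strip the displaced square no longer covers.
        for c in range(top + square - shift, top + square):
            right[r][c] = 1 if rng.random() < density else 0
    return left, right


left, right = random_dot_stereogram()
for l_row, r_row in zip(left, right):
    print("".join("#" if d else "." for d in l_row), "  ",
          "".join("#" if d else "." for d in r_row))
```

Viewed in a stereoscope, with the left image going to the left eye and the right image to the right eye, the displaced square should stand out in depth from the background, even though only the disparity between the two images defines it.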
Stereopsis is the mechanism behind the production of 3-D movies and television, where scenes are filmed by cameras with two lenses, separated (like our eyes) by a few inches, so that each lens captures a slightly different view of the scene. By means of 3-D glasses, each of these views is presented to a different eye, and the visual system fuses the two 2-D images into a single 3-D image.
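Other cues to distance are monocular, in that they do not depend on the use of two eyes, and are available to organisms that are blind in one eye.
Accommodation: The lens of the eye changes shape to bring objects at different distances into focus on the retina: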
- The lens bulges to focus on nearby objects.
- The lens flattens to focus on distant objects.
This flattening and bulging of the lens is accomplished by the ciliary muscles, which provide kinesthetic feedback to the visual system.
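A rough quantitative sketch of how much the lens must change, assuming a normal eye that is focused at infinity when relaxed and the standard optics rule that the required extra focusing power, in diopters, is the reciprocal of the object's distance in meters. The function name is illustrative.

```python
def accommodation_demand(distance_m):
    """Extra focusing power, in diopters (1/meters), that an eye focused at
    infinity when relaxed must add to focus on an object distance_m away."""
    return 1.0 / distance_m


for d in (0.25, 0.5, 1.0, 3.0, 6.0, 12.0):
    print(f"{d:>5} m: {accommodation_demand(d):.2f} diopters")
```

Moving an object from 25 cm to 50 cm changes the demand by 2 diopters, while moving it from 6 m to 12 m changes it by less than a tenth of a diopter. On this simple model, nearly all of the bulging and flattening happens for nearby objects, which suggests why accommodation is most informative at close range.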
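Relative Size: The size of an object's retinal image depends on both the size of the object and its distance from the observer (the size-distance rule):
- If the distance from the observer to two objects is constant, the size of their retinal images will be a function of the size of the objects.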
- If the size of two objects is constant, the size of their retinal images will be a function of their distance from the observer.
Thus, if two images are of similar shape, but different relative size, there are two possibilities:
- both are at the same distance from the observer, but one is smaller than the other;
- both are of the same size, but one is closer than the other.
The solar "Eclipse Across America" of August 21, 2017 dramatically illustrates the size-distance rule. The distance from the Sun to the Earth is about 92.96 million miles. The distance from the Moon to the Earth is about 238,900 miles -- a ratio of roughly 400:1. As it happens, the diameter of the Sun is 864,000 miles, and the diameter of the Moon is 2,158 miles -- also a ratio of about 400:1. When the Moon passes between the Sun and Earth, its disc completely covers that of the Sun, causing a solar eclipse. But just as important, in the present context, the matching ratios mean that -- don't look at the Sun without adequate eye protection! -- the Sun and the Moon appear to be the same size. (Photo by Celia Talbot Tobin for The New York Times, 08/22/2017).
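The size-distance rule can be checked with the figures just given. The sketch below computes the visual angle each disc subtends at the eye, using the standard formula for an object seen face-on; the function name is illustrative, and the distances and diameters are the ones quoted above.

```python
import math


def visual_angle_deg(diameter, distance):
    """Angle (degrees) subtended at the eye by an object of the given
    diameter seen face-on at the given distance (same units for both)."""
    return math.degrees(2 * math.atan(diameter / (2 * distance)))


sun = visual_angle_deg(864_000, 92_960_000)   # miles
moon = visual_angle_deg(2_158, 238_900)       # miles
print(f"Sun: {sun:.3f} degrees, Moon: {moon:.3f} degrees")
```

Both come out at roughly half a degree. The two ratios are close but not identical (about 389:1 for the distances versus about 400:1 for the diameters), which is why the computed angles differ slightly.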
According to Gibson, the "choice" between these two possibilities is determined by other visual cues to distance provided by the objects and their backgrounds.
Linear Perspective: Since the Renaissance, many painters have achieved a sense of depth or distance in their two-dimensional canvases by making use of perspective lines that converge toward a vanishing point. Objects along these lines are foreshortened proportionately. This cue is also known as spatial
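The geometry that linear perspective approximates is central (pinhole) projection: a point's position on the picture plane shrinks in proportion to 1/depth. The sketch below is illustrative only; the rail spacing, eye height, and depths are made-up numbers. It projects two parallel rails running straight away from the viewer.

```python
def project(x, y, z, focal_length=1.0):
    """Pinhole (central) projection of a 3-D point onto a picture plane at
    distance focal_length in front of the eye; z is depth straight ahead."""
    return (focal_length * x / z, focal_length * y / z)


# Hypothetical parallel rails 1.5 units either side of the viewer,
# 1.7 units below eye level.
for depth in (2, 5, 10, 50, 1000):
    lx, ly = project(-1.5, -1.7, depth)
    rx, _ = project(1.5, -1.7, depth)
    print(f"depth {depth:>4}: rails at x = {lx:+.3f} and {rx:+.3f}, y = {ly:+.3f}")
```

As depth grows, both rails close in on the image point (0, 0), straight ahead at eye level, which plays the role of the vanishing point, even though the rails themselves never meet.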
Here's an early example -- perhaps the very first painting to use linear perspective to create the illusion of depth: "The Trinity" by Masaccio (1425), in the Basilica of Santa Maria Novella, Florence. The principles of linear perspective were not codified until Leon Battista Alberti wrote his treatise De Pictura ("On Painting") in 1435. Thereafter, the principles were quickly adopted by other painters of the Renaissance.
Here's a rough analysis of the use of perspective in Raphael's painting. The tiles on the floor create perspective lines leading to the central figure. So do the columns near the ceiling. Individuals arrayed along the perspective lines are perceived as lying at different distances from the viewer. Enhancing the illusion of depth is relative size: individuals "near" the viewer are painted larger than those "farther away", in an application of the size-distance rule.
Renaissance architects sometimes employed illusory tricks of linear perspective to make their buildings seem bigger than they really are. On the left, the Duomo (cathedral) in Orvieto, Italy. Although we expect the columns along the nave to be in parallel, in fact they converge slightly as they approach the altar, thus exaggerating the length of the nave. At the Basilica of Santa Maria Novella, in Florence, the architect pulled out all the stops: not only do the columns along the nave converge slightly, but the floor rises slightly and the ceiling lowers slightly, to exaggerate the sense of distance even more.
Magritte also played on the Renaissance idea that the picture frame is like a window, and that the painter should paint scenes as if they were being viewed through this opening. In "The Human Condition" (1933) and "The Fair Captive" (1931), Magritte places a painted canvas within the very scene that the canvas portrays.
Here's a photographic riff on Magritte: an empty picture frame at a viewpoint looking toward Cadaques, Spain (New York Times, 12/15/2013). The city was a frequent subject of the surrealist painter Salvador Dali, who lived nearby -- as in his "Cadaques" (1923).
In "The Promenades of Euclid" (1935), the conical tower on the castle on the left is identical in size and shape to the boulevard on the right. The difference is that the tower cuts off our view of the apartment building (an application of superposition), while the boulevard proceeds along linear perspective lines to a vanishing point.
A similar ploy was used by Makoto Aida, a modern Japanese artist, in "Path Between Rice Fields", where the part in the girl's hair is continuous with the path between the fields. Superposition makes us see the girl as close by, while the converging lines of linear perspective make us see the path receding into the distance (note that the path continues "forward", and is visible next to the girl's neck).
Here's a cover of the New Yorker by David Hockney that is a Magrittian riff on linear perspective. The receding palm trees give a clear impression of distance, but note that the width of the road doesn't change.
Elevation with respect to the Horizon: Distant objects appear closer to the horizon. To see that this is not just a matter of "up" versus "down", consider that the upper trees appear farther away than the lower ones, but the upper clouds appear closer than the lower ones.
Another surrealist, the Russian-born American Paul (Pawel) Tchelitchew (1898-1957) combined linear perspective and elevation to striking effect in his masterpiece "Phenomena" (1938), the "Final Sketch" of which is shown here. Here there are three different sets of perspective lines. (1) In the foreground, with all the human figures (including Gertrude Stein, who owned the "Final Sketch", and her life-partner Alice B. Toklas, as well as assorted "freaks, monsters, and mutants"), as if on a mesa. (2) In the upper left and right, a sort of city with street grid receding to the horizon. Note the blocky skyscraper on the right, which enhances the illusion of depth by means of superposition. That's Stein and Toklas sitting at the feet of the corresponding shrouded figure on the left. (3) In the upper center, a kind of mountain, also consisting of converging lines; but this time the lines converge above the horizon line, thus creating an illusion of height rather than depth.
Before the invention (discovery?) of linear perspective, artists sometimes used elevation as their principal cue to depth or distance. Consider "Paradise" (1445), a painting by the early-Renaissance Sienese artist Giovanni di Paolo (1398-1482), originally in Florence's Church of San Domenico, but now in the Uffizi Gallery. The painting depicts a number of angels and saints embracing each other in greeting, as if some of them had just arrived. Di Paolo packs a lot of people into a small space, and he uses elevation (and superposition) to convey the sense that some of these groupings are further away than the others. There's no use of linear perspective: note that the figures toward the bottom of the painting, "nearest" the viewer, are the same height as (if anything, a little smaller than) those toward the top, "farther away". Di Paolo knew about linear perspective, and he used it in some of his other paintings. But in this painting, he falls back on techniques more characteristic of Gothic art, to give his vision of Paradise an other-worldly feel (see "A Celebration Not of This Earth" by Benjamin Shull, Wall Street Journal, 02/29/2020).
It's a Bird! It's a Plane!
Elevation, as a distance cue, can lead to interesting visual illusions. On November 9, 2010, local TV stations, and then the national networks, reported on a mysterious contrail that had been observed in the sky in the Los Angeles area. Contrails are formed by condensation from the exhaust of jet or rocket engines, and the fact that this particular contrail seemed to arise out of the sea, headed toward land, gave rise to the speculation that a missile had been launched from a submarine off the coast, perhaps accidentally. The Pentagon denied that there had been any such launch -- but given the massive distrust of "guv-ment" that infected the American citizenry, especially around the time of the 2010 midterm elections, very few in the "missile" crowd were persuaded. Still, a plausible alternative theory was that this contrail was generated by a jet flying east from Hawaii or Asia. The fact is that whatever object was generating the contrail, it was moving much too slowly to be a missile. We may never know what the truth is, but it's easy to see how the illusion of a missile launch could be generated by the elevation principle. Note that the contrail appears to be rising from the sea, but it is also rising from the horizon, and we know that objects on the horizon appear to be distant from the observer. Instead of a missile being launched from the sea and gaining altitude, the contrail may very well be caused by an airplane flying out of the horizon, toward the observer, maintaining a constant altitude. The next time you're outside on a sunny day, and you have a relatively clear shot to the horizon, look for a jet contrail -- you'll see exactly the same thing.
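A small Python sketch shows why a level flight path can look like a climb. It assumes a hypothetical airliner holding a constant 10 km altitude while flying straight toward the observer, and uses flat-Earth geometry; Earth's curvature, ignored here, would push the distant positions even closer to the horizon.

```python
import math


def elevation_deg(altitude_km, ground_distance_km):
    """Angle above the horizontal at which an observer sees an aircraft
    (flat-Earth approximation; ignores curvature and refraction)."""
    return math.degrees(math.atan2(altitude_km, ground_distance_km))


# Hypothetical airliner at a constant 10 km altitude, approaching the observer.
for distance in (160, 80, 40, 20, 10):
    print(f"{distance:>3} km away: {elevation_deg(10, distance):5.2f} degrees above the horizon")
```

The aircraft never gains altitude, yet its elevation angle climbs steadily from just a few degrees above the horizon, which is exactly the pattern a rising missile would also produce.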
The "Mystery Missile" episode underscores a point that will be stressed later in these lectures -- that, contrary to Gibson, stimulation is inherently ambiguous, and any given pattern of proximal stimulation may be compatible with a number of different distal stimuli -- in this case, the same contrail could be generated by a submarine-launched missile, or a passenger plane carrying honeymooners home to their families. But we're not there yet!
Aerial Perspective (also called atmospheric perspective): Dust and water particles in the air absorb and scatter light, with the result that distant objects look both hazy and bluish. In the photo of Lake Atitlan, it is clear that the mountains are green, but the more distant ones are decidedly bluish. The effect is exaggerated in the Blue Ridge Mountains, which look blue not only because of aerial perspective, but because spruce, pine, and fir trees emit a sap which dissolves in the air and further enhances the effect.
You get the same effect in the Blue Mountains in Australia -- although in this case the bluing is caused by evaporated oil secreted by eucalyptus trees (photo by Joe Wigdahl, from "Darwin's Forgotten World" by Tony Perrottet, Smithsonian Magazine, 01/2015).
One of the earliest uses of aerial (atmospheric) perspective is found in the Penitence of Saint Jerome (1518), a triptych by the Northern Renaissance painter Joachim Patinir. Patinir was one of the first Renaissance painters to specialize in landscapes, as opposed to portraits or paintings on historical or religious themes, and he was admired by Albrecht Durer. In fact, the first use of the term "landscape painter" comes from a remark by Durer about Patinir. Anyway, note that the distant portions of the landscape are given a bluish tinge, increasing the illusion of distance. Patinir's color scheme for spatial recession, beginning with browns for the near distance, green for the middle distance, and blue for the far distance, became a kind of "formula" for depicting distance in 16th-century landscapes.
Gradients: In a variant on linear perspective, continuous changes in the relative size and compactness of objects also provide cues to distance. Distant points have smaller elements, and their elements are more compact.
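The same pinhole geometry that underlies linear perspective produces texture gradients. In the sketch below, which uses hypothetical numbers (eye height 1.6 units, rows of identical tiles each 1 unit deep, starting 1 unit ahead), a ground point at depth z projects at f * eye_height / z below the horizon, so each successive row of physically identical tiles occupies a smaller slice of the image.

```python
def tile_row_heights(eye_height=1.6, nearest=1.0, rows=8, focal_length=1.0):
    """Image heights of successive rows of identical floor tiles.

    Each row is one unit deep, starting `nearest` units ahead on flat
    ground. A ground point at depth z projects f * eye_height / z below
    the horizon, so a row from z to z + 1 spans
    f * eye_height / (z * (z + 1)) in the image."""
    edges = [focal_length * eye_height / (nearest + i) for i in range(rows + 1)]
    return [near - far for near, far in zip(edges, edges[1:])]


for i, h in enumerate(tile_row_heights()):
    print(f"row {i + 1}: {h:.4f}")
```

Because each row's image height falls off roughly as 1/z squared, the distant rows are not only smaller but also packed more tightly together, which is the "smaller and more compact" pattern described above.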
And again, texture gradients have been used by architects to make their buildings seem taller than they really are. On the left are some Georgian-style townhouses in Dublin, where as you move up from the ground floor the windows become progressively shorter. Yes, the household help lived on the topmost floor, but that wasn't the reason that the windows are small: the windows are smaller to make the houses seem taller. The effect is also clear in this side view of the Duomo (cathedral) in Arezzo, Italy, where the rows of columns get shorter and shorter as they get higher and higher. In this case, the effect is magnified further by the fact that the side street is very narrow, so you're looking virtually straight up.
Here's another example of texture gradients in architecture: Cinderella's Castle at Magic Kingdom Park, part of the Walt Disney World Resort in Orlando, Florida. It's the symbol of the park, like Sleeping Beauty's Castle in the original Disneyland. The designers wanted Cinderella's castle to be even taller, but their plans ran up against height restrictions imposed by a nearby airport. So, the castle was built with bricks that get progressively smaller and more compact as they go up, so that the castle looks taller than it actually is. (Thanks to Rhea Marie LaFleur for this one.)
Here's an example that combines texture gradients with linear perspective. It's the interior of a greenhouse at Backyard Farms, in Maine, as shot by Stacey Camp for the New York Times (03/31/2010). You can see the converging lines formed by the planters, and also by the tops of the vines. And while the tomatoes (and the panes of glass in the roof) nearest the viewer are clearly distinguishable, the ones furthest away are not.
Another example, from the "golden age" of Dutch painting: Meindert Hobbema's Avenue at Middelharnis (1689; National Gallery, London). It's a classic example of what art historians call "deep central-perspective". Notice that the trees have been trimmed all the way to the top. You can see this clearly in the closest trees, but the more distant ones all blend together, so that they don't look trimmed at all.
Shadowing: While the shadowed portions of objects are hidden from light, the illuminated portions of objects must be situated between light and shadow. Therefore, if we know the location of a light source, patterns of light and shadow in the visual field mark the relative distance of objects from the light, and therefore from the observer.
The human visual system evolved in an environment in which light from the sun or the moon illuminates objects from above. Therefore, the top row of circles looks like bumps in the surface, with their centers closer to the observer, while the bottom row looks like dents in the surface, with their centers relatively far from the observer. If we flip the picture 180 degrees, the top row now looks like bumps, and the bottom row now looks like dents.
Shading doesn't just contribute to the perception of depth: it also contributes to the perception of form. The circles or discs in the example above don't look like circles. They look like bumps or indentations. V.S. Ramachandran (Nature, 1988; Scientific American, 8/1988) and his colleagues have been studying the principles which govern the perception of shape from shading (for an overview see "Out of the Shadows" by C. Chunharas and V.S. Ramachandran, Scientific American Mind, July-August/2016). Some of these principles are:
- All things being equal, the visual system "prefers" convexity: surfaces like this are more likely to be perceived as spheres than as cavities.
- We generally assume that there is only a single source of light.
- And we also assume that this light source is shining from above.
- Perceivers look for consistency among various cues.
These principles make evolutionary sense. After all, all creatures on earth evolved in an environment where there was a single source of light, either the sun or the moon, which was usually overhead.
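Why the "light from above" assumption is needed at all can be shown with a small shading sketch. It assumes a matte (Lambertian) surface, a single distant light, and a made-up Gaussian bump profile; none of these details come from Ramachandran's studies. It renders a vertical slice through a bump lit from above and through a dent lit from below.

```python
import math


def shading_profile(sign, light_elevation_deg, samples=21):
    """Lambertian brightness along a vertical slice through a bump (sign=+1)
    or a dent (sign=-1) with profile z = sign * exp(-y**2).

    y runs from -2 (bottom) to +2 (top); z points toward the viewer.
    light_elevation_deg > 0 means light from above, < 0 from below."""
    theta = math.radians(light_elevation_deg)
    light_up, light_toward = math.sin(theta), math.cos(theta)
    profile = []
    for i in range(samples):
        y = -2.0 + 4.0 * i / (samples - 1)
        slope = sign * -2.0 * y * math.exp(-y * y)   # dz/dy of the surface
        normal_up, normal_toward = -slope, 1.0       # unnormalized surface normal
        length = math.hypot(normal_up, normal_toward)
        brightness = (normal_up * light_up + normal_toward * light_toward) / length
        profile.append(round(max(0.0, brightness), 9))
    return profile


bump_lit_from_above = shading_profile(+1, 45)
dent_lit_from_below = shading_profile(-1, -45)
print(bump_lit_from_above == dent_lit_from_below)   # same image, two different surfaces
```

The two images are identical, so shading alone cannot tell a bump from a dent. Only an extra assumption about where the light comes from (here, from above) resolves the ambiguity in favor of one reading.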
A particularly interesting application of the principles of depth perception to creating an illusion of depth is in "3-D" or "virtual" speed bumps sometimes encountered in residential areas to control traffic speed. Instead of the usual physical speed bumps, these are flat pieces of plastic, embedded in the street surface, which are painted to look like a fairly nasty object sticking out of the street. They are apparently quite effective -- at least until drivers catch on to the trick! [See "To Slow Speeders, Philadelphia Tries Make-Believe" by Sean D. Hamill, New York Times, 07/12/08.]
A similar tactic, deployed experimentally in Canada in 2010, employs an image of a child playing in the street (thanks to Alex Ren for spotting this).
And finally, a bit of a viral hoax. In December 2015, someone in Brazil actually painted, on the blank wall adjacent to a traffic underpass, a fake tunnel of the sort that Road Runner used to torment Wile E. Coyote in the famous series of Looney Tunes cartoon shorts -- complete with an image of Road Runner itself. Subsequently, someone posted the image on the left to the internet, claiming that someone had actually crashed into the illusory tunnel, after which the authorities painted it over. It's a good story, and it's a wonderful visual illusion, but the story is untrue. Yes, the fake tunnel was painted; but no, no driver ever crashed into it (in fact, the red Fiat automobile seen in the top image going through the underpass is a different model from the damaged one depicted in the lower left); and the image was painted over before any accident could happen. The whole exposé is detailed on the snopes.com website. But it's still a pretty convincing visual illusion, no? Here, for your viewing pleasure, is a link to a corresponding Road Runner cartoon posted to YouTube.
Speaking of motion, there are also motion cues to depth -- that is, dynamic cues to depth and distance that are produced by the observer's own movement through the environment.
Motion parallax refers to the differences in motion produced by objects at different distances, relative to the viewpoint of a moving observer. As the bicyclist moves from right to left, the world seems to move backwards, from left to right. That much is obvious, but it's actually more interesting than that. Assume that the bicyclist fixates on the tree, in the middle distance. Objects that are closer, like the cow, will appear to move in the direction opposite to the bicyclist's motion, while objects that are farther away, like the mountain, will appear to move more slowly, and may even appear to move in the same direction as the bicyclist. Thus, the speed and direction of apparent motion of objects created by a moving observer is a cue to the distance of those objects from the observer (astronomers use a similar principle to infer the distance of stars).
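The pattern of drift around the fixated tree follows from simple geometry. The sketch below is a simplified model, not taken from the source: it assumes small angles, objects roughly straight ahead, and an observer moving sideways while keeping the eyes on the fixation point. The speeds and distances are illustrative.

```python
def retinal_drift(observer_speed, fixation_distance, object_distance):
    """Approximate angular velocity (radians/s) of an object's image relative
    to the fixated point, for an observer moving sideways while keeping the
    eyes on the fixation point. Small-angle approximation; objects assumed
    roughly straight ahead. Positive = drifts the same way the observer
    moves; negative = drifts the opposite way."""
    return observer_speed * (1.0 / fixation_distance - 1.0 / object_distance)


# Illustrative numbers: cyclist at 5 m/s, fixating a tree 20 m away.
for name, distance in (("cow", 5.0), ("tree", 20.0), ("mountain", 2000.0)):
    print(f"{name:>8}: {retinal_drift(5.0, 20.0, distance):+.3f} rad/s")
```

The sign of the result flips at the fixation distance: nearer objects drift against the direction of travel, farther ones drift with it, and the fixated tree stays put. The size of the drift grows the farther an object lies from the fixation distance, in either direction.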
To simulate motion parallax, hold your two index fingers out in front of your nose, one at full arm's length, the other about halfway. Now close your right eye, much as you did in the demonstration of retinal disparity, and align your fingers with some object, such as this cross. Now keep your hands still, and slowly move your head to the right. When you do this, you'll see that both of your fingers appear to move in the opposite direction, with the closer finger moving leftward farther, and faster, than the more distant finger. Now repeat the action, but this time move your head to the left. This time, your fingers will appear to move to the right.
Optic flow also refers to the movement of images across the retina as the observer moves around the environment. If you're a pilot landing an airplane, objects appear to diverge outwards from a convergence point directly in front of you (this follows from the principles of linear perspective). Objects that are close by, like the near end of the runway, diverge very quickly, compared to distant objects, like the far end of the runway. If you're in the rear car of a train looking out the back window, objects appear to converge inwards toward the convergence point. And, again, nearby objects appear to go by quickly, while faraway objects don't appear to move much at all. So, in both cases, the relative velocity of images across the retina is a cue to the relative distance of the objects.
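For the simplest case, an observer moving straight ahead without turning, the flow field has a compact form. Projecting a point as x = f * X / Z and letting depth shrink at the forward speed T gives an image velocity of (x, y) * T / Z: every image point streams radially away from the point straight ahead, at a rate inversely proportional to its depth. The sketch below assumes that pure-translation, pinhole case; the names and numbers are illustrative.

```python
def flow_vector(x, y, depth, forward_speed):
    """Image velocity of a scene point at image position (x, y) and depth Z
    for a pinhole observer translating straight ahead with no rotation.
    From x = f*X/Z with dZ/dt = -forward_speed, dx/dt = x * T / Z
    (likewise for y). Negative forward_speed means moving backward."""
    scale = forward_speed / depth
    return (x * scale, y * scale)


# Illustrative: a landing approach at 50 units/s.
for label, depth in (("near end of runway", 10.0), ("far end of runway", 1000.0)):
    print(label, flow_vector(1.0, -0.5, depth, 50.0))
```

Reversing the sign of the speed, as when looking out the rear window of a train, turns the same vectors inward so that images converge, and in both cases nearby points move across the retina much faster than distant ones.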
To simulate optic flow, just walk down a long hallway, and watch what happens to the doors, windows, or lockers that line the corridor. Now repeat the process, walking backward (but be careful not to trip or bump into anything!).
Many (but by no means all)
of the cues for depth and distance are summarized in the
- Some cues are ocular in nature, in that distance information is provided by the muscles in the eyes.
- Other cues are optical in nature, in that distance information is provided by the light falling on the retina.
- Within each category, some cues are binocular in nature, requiring two eyes.
- Others are monocular, depending on only a single eye.
- Some optical cues are stereoscopic in nature, such as binocular disparity (which is obviously binocular!).
- Many of the monocular optical cues are pictorial in nature, in that they have been used by artists since the Renaissance to depict three dimensions in two-dimensional spaces such as a canvas (which is why you see so many paintings and buildings in this supplement).
- Other monocular optical cues, such as optic flow and motion parallax, depend on the object (or the observer) being in motion.
Organization of Depth Cues
In summary, some of the cues to depth are ocular in nature, reflecting information coming from the muscles in the eye:
- accommodation is a monocular cue, requiring only one eye;
- convergence is a binocular cue, requiring two eyes.
Another binocular cue is stereoscopic in nature, reflecting the fact that each eye receives a slightly different image of the object of perception:
- The most prominent stereoscopic cue is binocular (or retinal) disparity (there are other forms of disparity).
The remaining cues to depth or distance are optical in nature, reflecting the physics of vision and the geometries of distance. These cues are sometimes called pictorial cues, because they are the same sorts of cues that visual artists use to give the illusion of depth to a painting on a two-dimensional canvas:
- relative size, linear perspective, and the other pictorial cues are all monocular cues.
Some artists have used the pictorial cues in a form of painting known as trompe l'oeil (French for "fool the eye"), in which images are painted on a flat surface in such a way as to give an illusion of depth -- not just a representation of a scene, but the actual perceptual experience of seeing objects in three dimensions.
An example is The Goldfinch (1654; in the Mauritshuis, The Hague) by Carel Fabritius, an artist of the "Golden Age" of Dutch painting. The painting is intended to be hung high on a wall, with a light source to its left, so that the painted shadow falls to the right. Most trompe-l'oeil paintings were still lifes -- because, well, the illusion is spoiled if there's something that's supposed to move! But in this case Fabritius apparently figured that a pet bird would perch somewhere for a long time. The painting was intended to be hung unframed, so that its background would blend with the texture of the wall of a 17th-century Dutch house. (It's the inspiration for The Goldfinch (2013) by Donna Tartt, which won the Pulitzer Prize for fiction.)
Another Dutch "golden age" painter of trompe l'oeil was Samuel van Hoogstraten, whose Still Life on a Cupboard Door (1655, in the Academy of Fine Arts, Vienna) is shown here. Van Hoogstraten was famous for painting life-size optical illusions showing a succession of doorways, as in View of a Corridor (1662; National Trust, Dyrham Park, England). See how many pictorial cues to depth you can identify. I'll show you another van Hoogstraten painting later, when we discuss the Ames Room.
Here's another, done with
wood rather than paint: the "Gubbio Studiolo" of Federico da
Montefeltro, Duke of Urbino, created in the 1480s by the da
Maiano brothers, master woodworkers, for the duke's
residence in Gubbio, Italy, and now installed at the
Metropolitan Museum of Art in New York City. A studiolo
was a small, private study, fitted with shelves for books
and other objects, and intended to impress a visitor with
the owner's learning and culture. In this one, however,
there's a joke: the five surfaces you see are all perfectly
flat. The da Maiano brothers employed the then-new
techniques of visual perspective to fool the eye into seeing
real shelves with real books, partially open cupboard doors,
a chair with side table, and the like.
Beginning in the Renaissance, painters thought of the picture frame as a sort of "window" through which a scene was viewed -- a scene whose illusory realism was created by linear perspective and other pictorial cues (that's why they're called "pictorial"). But beginning in the late 19th century, with Impressionism and other forms of "modern" art, artists began to abandon these aspects of realism.
- The Impressionists themselves began painting pictures that resembled what the eye takes in "at a glance", rather than trying to faithfully represent what the scene actually is.
- The Cubists, like Picasso and Braque, painted pictures that simultaneously looked at a scene from multiple perspectives.
- And other Modernists painted pictures which abandoned
the window-like frame, breaking down the traditional
distinction between painting and sculpture. As
Gertrude Stein put it, writing about Picasso, "the framing
of life, the need that a picture exist in its frame,
remain in its frame was over", and "pictures commenced to
want to leave their frames".
For a comprehensive survey of the use of perceptual principles in the visual arts, see Art and Illusionists (2015), by Nicholas Wade, a distinguished Scottish perception researcher. In 16 chapters covering everything from linear perspective to ambiguous figures, and from trompe l'oeil to surrealism, Wade shows how the principles of perception enable artists to create illusory three-dimensional worlds on flat surfaces. It's a worthy follow-up to Art and Illusion: A Study in the Psychology of Pictorial Representation (1960), the classic treatise on the subject by E.H. Gombrich -- which, if you're really interested in this stuff, you should also read.
As research on motion and depth perception shows, in many cases all the information required for perception is supplied by the entire pattern of proximal stimulation available to the observer. Especially important is comparison between objects, and between objects and their backgrounds. Also relevant is information from the kinesthetic and vestibular senses, which permit a comparison between information processed by the distance senses and by the deep senses.
Perceiving Depth in Casablanca
From the Renaissance onward, visual artists have used the monocular optical cues to give the illusion of depth in their two-dimensional paintings. But the same cues have been employed in other circumstances as well.
Consider, for example, the last scene in the classic film Casablanca (1942, directed by Michael Curtiz), starring Humphrey Bogart (as American expatriate nightclub owner Rick Blaine) and Ingrid Bergman (as his former lover Ilsa Lund, now married to Victor Laszlo, a leader of the anti-Nazi resistance, played by Paul Henreid), which takes place at the eponymous city's municipal airport. Due to wartime restrictions on access to airfields and the availability of aircraft, this scene could not be shot on location. Instead, it was shot on a sound-stage with a plywood mock-up of the airplane, using fog to obscure the artificiality of the whole thing.
In order to foster the illusion of distance, the mock-up was smaller than scale, and the ground crew was played by dwarfs -- actors of unusually small stature who were otherwise well-proportioned (probably some of the same "little people" who played the Munchkins in The Wizard of Oz). The same trick had been employed in an earlier stage version of the film, titled Everybody Comes to Rick's.
See Round Up the Usual Suspects: The
Making of Casablanca -- Bogart, Bergman, and World War II
by Aljean Harmetz.
Summary of the Ecological View of Perception
The theory of direct perception considers perception to be an innate mechanistic process, analogous to the S-R theory of learning or the psychophysical analysis of stimulus detection.
- All the information needed for perception is provided by the stimulus.
- With the proviso that the "stimulus" is defined broadly to include the entire pattern of proximal stimulation available to the perceiver, including information from the perceiver's own body as well as the external environment.
- The stimulus provides information for perception; the perceptual systems have evolved to extract this information. These mechanisms are part of the organism's innate biological endowment.
- Thus, perception requires little or no learning on the part of the organism, and little or no involvement of "higher" mental processes involved in judgment, memory, or inference based on prior experience. Perception is not mediated by cognition, which is why the ecological view is sometimes called direct perception.
- The information in the environment is sufficient to enable us to perceive the world the way it really is, which is why the ecological view is sometimes called direct realism.
For example, in a classic experiment by Eleanor J. Gibson and Richard Walk, infants were observed to avoid a visual cliff on their first encounter with it. Noticing a visual cliff, and avoiding falling from it, requires perception of distance. Gibson and Walk argued that infants accomplish this immediately, without benefit of learning, and without benefit of judgment or inference. Their perceptual systems are built to extract information about depth and distance from the environment, and they do so automatically.
Is the Visual Cliff Really a Matter of Innate Depth Perception?
UCB's Prof. Joseph Campos has argued that Gibson and Walk erred in their conclusions from the "visual cliff" experiment. He noted that the infants were encouraged to crawl toward their mothers, who were situated on the other side of the cliff from the child. Campos argues that the infants avoided the cliff because they picked up on their mothers' facial and vocal expressions of anxiety, not because they innately perceived depth.
The Constructivist Approach to Perception
The ecological view of perception is a theory of veridical perception, and it specifies a set of perceptual mechanisms that allow us to perceive objects as they exist in the world by extracting the information they make available to us. However, mechanisms of this sort cannot be all that are involved, because sometimes we do not see the world as it really is. Moreover, as the pioneering American cognitive psychologist Jerome Bruner noted, sometimes the perceiver must go "beyond the information given" by the stimulus.
Bruner's view exemplifies a long-running tradition in perception, known as
the constructivist view because it holds that
perception isn't given by the stimulus, but rather is actively
constructed by the perceiver.
- In the 19th century, Hermann von Helmholtz argued that perception was mediated by unconscious inferences made by the perceiver.
- In the 20th century, Richard Gregory, a British psychologist, argued for the constructivist viewpoint in his book The Intelligent Eye.
- Julian Hochberg, an American psychologist, argued for the constructivist viewpoint in many articles collected in a volume entitled The Mind's Eye.
- Irvin Rock, another American psychologist (who spent the last years of his career at UC Berkeley), wrote a book on perception entitled Indirect Perception, directly countering the Gibsonian ecological viewpoint.
So sometimes the correct perception isn't conveyed by the stimulus: the stimulus information provided by the sensory apparatus may be insufficient or misleading. Under these circumstances, the information from stimulation must be supplemented with conceptual information and other world-knowledge retrieved from memory. In such cases perception is not direct; rather, it involves inference. Perception is intelligent, not mechanistic, in that it draws on knowledge of the world and requires active thinking and problem-solving on the part of the perceiver.
This point was underscored in the 1920s and 1930s by a group of perception theorists, including Kurt Koffka, Wolfgang Kohler, and Max Wertheimer, known as the Gestalt school of psychology. "Gestalt" is a German word that roughly translates as "whole configuration", and the Gestalt psychologists focused on the tendency of the mind to organize individual stimuli into groups or sets -- in other words, to fuse stimulus elements into a perceptual whole. Like the functionalists, the Gestalt theorists were opposed to the structuralists. From a Gestalt point of view, we cannot analyze perceptual experience into its elementary constituents, because the elements interact with each other in such a way that
"The whole is different than the sum of its parts"
(sometimes rendered as "the whole is greater than the sum of its parts").
According to the Law of Pragnanz, perception will be as good as stimulus conditions allow.
More recently, the American perception psychologist Julian Hochberg (1974, 1978) modified the Law of Pragnanz with the minimum principle:
We perceive the simplest or most homogeneous organization that will fit the pattern of sensory stimulation.
Perception must account for the stimulus, but perception involves more than unpacking the stimulus array. The Law of Pragnanz and the minimum principle are not in the stimulus -- they are in the mind of the perceiver.
In their research, Wertheimer and other Gestalt psychologists identified a number of "laws" of perception which came to be known as the classical Gestalt principles of perception.
According to the principle of good continuation, perception avoids abrupt shifts in direction. Thus, in the figure we tend to see a curve crossed by a straight line rather than four lines, two curved and two straight, that intersect at a point.
More recently, cognitive psychologists such as Irvin Rock (working first at Rutgers, then at Berkeley) and Steven Palmer (at Berkeley) have discovered a number of new principles supplementing the classical ones.
Many of the Gestalt principles come
together in the "Kanizsa figure" and similar illusions. Not
only do we see a triangle pointing downward instead of three
acute angles (an example of closure,
creating subjective contours which exist in perception
but not in the stimulus), but we also see another triangle,
pointing upward, created by the three "Pac-Men". The
triangle, of course, is not in the figure. It is created by
our visual system. There is nothing about the stimuli
themselves that requires these organizations. Many different
organizations of stimuli are possible, but according to the
Gestalt psychologists, the visual system, operating according
to Gestalt principles, creates (or prefers) one organization
over the others.
The Constellations and Gestalt
In some respects, the Gestalt principles are illustrated by the stellar constellations, groups of stars that seem to make up "pictures" in the sky. Every culture identifies some constellations in the night sky, though every culture has a somewhat different set, and sometimes the same patterns receive different names in different cultures. The most familiar of these, perhaps, are the Big Dipper, Little Dipper, Orion, and the constellations that make up the 12 signs of the zodiac. Many of these do, indeed, look like the objects after which they are named. And it is tempting to see in them such Gestalt principles as grouping by proximity, good continuation, and good form.
The earliest reference to the constellations
is in the Phaenomena, written by Aratus, a Greek
poet, about 270 BCE -- though the poem makes clear that the
idea of the constellations had already been around for a
long time. The likeliest sources are the ancient Sumerians and
Babylonians, as early as the 7th century BCE (both
Mesopotamia and Greece lie north of the equator, which helps
explain why these civilizations did not identify any
constellations in the southern hemisphere). Another ancient
Greek document, Ptolemy's Almagest, from 150 CE,
lists 48 constellations. In 1922 the International
Astronomical Union produced an official list of 88
constellations covering the entire sky, both northern and
southern hemispheres. Because these constellations are
intended to include every visible star, they really
don't look like the objects after which they're named.
Rather, they serve as convenient ways to identify a region
of the night sky.
In the Church
of San Lorenzo, in Florence, a fresco in the
cupola above the high altar depicts the night sky
over the northern hemisphere (the church itself was
designed by Filippo Brunelleschi, who was also
responsible for the great dome of Florence
Cathedral). The painting was supervised by
Paolo dal Pozzo Toscanelli, a Florentine astronomer.
The scientific import is revealed by the extreme
precision with which the celestial bodies are
positioned. The position of the planets with respect
to the constellations represents the sky over
Florence on July 4, 1442, which was the day that
the King of Naples entered Florence.
Another constellation fresco is found in the Pazzi Chapel in the Basilica of Santa Croce, also in Florence.
Far from being perceptual organizations of the stars, the constellations, archaeoastronomer Bradley E. Schaefer suggests, were projected onto the night sky as a convenient way of mapping the cosmos for astrological and other purposes (see "The Origin of the Greek Constellations", Scientific American, 11/2006).
Feature Detection and Pattern Recognition
We usually think of sensation as the most elementary of mental processes. But even at the level of detection, "higher" mental processes of memory and thought are involved, as we saw in the theory of signal detection. We get further evidence of the role of higher mental processes when we turn to perception proper. After the sensory processes have done their work detecting stimuli in the environment and transforming their energies into neural impulses, perceptual processes take over to build an internal, mental representation of the stimulus field. It is at this point that we move from sensation to perception -- not just "Is there a stimulus?" and "How intense is it?", but also "What is it?", "Where is it?", and "What is it doing?".
After a sensory impulse has reached the
cortical projection area (and perhaps even before that time,
in the sensory tract), the first stage in perception is
feature detection: analysis of the stimulus to extract
elementary features. This is followed by pattern
recognition, which allows the perceiver to identify some
combinations of elementary features as familiar and
meaningful, and others as novel or meaningless.
The idea of feature detection followed by pattern recognition is
closely associated with computer models of cognition that
began to be developed in the 1950s and 1960s. But work
on the neurophysiology of the visual system was also
influential. In a classic experiment by Jerome Lettvin
and his colleagues, various visual stimuli were presented to a
frog while the researchers recorded the activity of specific fibers in
the frog's optic nerve. They discovered that certain of
these fibers were responsive only when certain stimuli were
presented:
- For example, one set of fibers was responsive only to
the presentation of a sustained contrast -- an edge that
divided space into light and dark regions. Think of these
as shadow detectors.
- Other fibers became active only when the frog was
presented with a net convexity -- that is, a dark dot
presented against a light background, or a light dot
presented against a dark background. These fibers are now
commonly known as "bug detectors".
- There were other fibers that responded only when an edge
moved across the visual field, and still other fibers that
responded only when the illumination was reduced.
So it appeared that the frog's visual system is organized in such a way as to analyze its environment into elementary features -- edges, dots, moving edges, and changes in illumination. Extrapolating just a little bit, Lettvin jokingly suggested that in humans, we might have grandmother cells -- fibers in our visual system that responded only to the appearance of our grandmother.
At roughly the same time, David Hubel and Torsten Wiesel were doing similar experiments, recording the activity of single cells in the visual cortex (not the optic nerve) of cats. Again, the idea was to present particular stimuli in the cat's visual field and then record the activity of single neurons, or very small bundles of neurons, in response to them. And, like Lettvin, they found that there were certain cells in the visual cortex that became active when the animal was presented with particular stimuli:
- to points of light against a dark background (or points of darkness against a light background),
- to edges (boundaries between light and dark regions), and
- to bars of light or darkness.
Within each stimulus category, individual cells were further differentiated by more specific qualities:
- angle of orientation (horizontal, vertical, etc.), for example, or
- stability or movement, or
- direction of movement.
- Some cells responded to vertical bars but not horizontal ones;
- others to stationary points but not moving ones;
- others to points moving to the right but not points
moving to the left.
- And so on.
Detailed study revealed
three basic kinds of feature-detecting cells.
- Simple cells respond to a particular stimulus appearing in a circumscribed area of the field (for example, a point of light in the upper-left quadrant). Thus, simple cells report location as well as feature.
- Complex cells respond to a particular stimulus (e.g., a point of light) appearing anywhere in the field; thus, they report only the presence of a feature, not its location.
- Hypercomplex cells respond to combinations of simple features, such as those forming corners, curves, and angles; they also respond to size. (More recent evidence suggests that "hypercomplex" cells are really a special class of simple cells.)
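To make the simple/complex distinction concrete, here is a toy sketch in Python. It is purely illustrative and not a model of real neurons: the "retina" is assumed to be a small grid of 0s and 1s (1 = a point of light), and `simple_cell`, `complex_cell`, and `blank` are hypothetical names. The simple cell fires only when the point falls inside its own patch of the field, so its firing carries location; the complex cell fires wherever the point falls, so it reports only that the feature is present.

```python
from typing import List

Grid = List[List[int]]  # toy "retina": 1 = point of light, 0 = dark


def blank(n: int = 4) -> Grid:
    """An n-by-n field with no light anywhere."""
    return [[0] * n for _ in range(n)]


def simple_cell(grid: Grid, rows: range, cols: range) -> bool:
    """Fires only if a point of light falls inside its receptive field (rows x cols)."""
    return any(grid[r][c] == 1 for r in rows for c in cols)


def complex_cell(grid: Grid) -> bool:
    """Fires if a point of light appears anywhere; where it appeared is discarded."""
    return any(value == 1 for row in grid for value in row)


upper_left = blank()
upper_left[0][1] = 1   # point of light in the upper-left quadrant

lower_right = blank()
lower_right[3][2] = 1  # point of light in the lower-right quadrant

# A simple cell tuned to the upper-left quadrant (rows 0-1, columns 0-1).
print(simple_cell(upper_left, range(0, 2), range(0, 2)))   # True
print(simple_cell(lower_right, range(0, 2), range(0, 2)))  # False

# The complex cell responds to both: "a point is present", but not where.
print(complex_cell(upper_left), complex_cell(lower_right))  # True True
```

Notice that nothing reading only the output of `complex_cell` can tell which quadrant the light was in; in this toy, that information survives only in the simple cells.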
Feature detectors have also been found in the auditory system.
For example, there are individual cells in the auditory nerve
that are maximally responsive to particular frequencies.
Similarly, individual cells have been found in the auditory
cortex of the monkey that respond to particular auditory
features:
- pure vs. complex tones,
- clicks, and noise;
- increases and decreases in pitch; and
- the onset vs. offset of sound.
For their pioneering work on the physiology of the visual system,
Hubel and Wiesel shared the 1981 Nobel Prize in Physiology or
Medicine with Roger Sperry (who was honored for separate work on
the functional specialization of the cerebral hemispheres).
The next step in
perception is pattern recognition. Pattern recognition
processes take as their input the output from the feature
detectors. Thus, while the feature detectors analyze stimulus
input (the proximal stimulus) into a list of its constituent
features and the spatial relations among them, the pattern
recognizers synthesize a mental representation of the distal stimulus.
- Feature detectors are innate: they are part of the genetic endowment of the organism, a product of the evolution of the species.
- By contrast, pattern recognition processes are acquired: they are shaped by the organism's sensory environment, as the organism learns to recognize stimulus patterns which have meaning.
Again, pattern recognition has been studied most extensively in the visual system. A good example is the orthography of written language. Remember that while spoken language is a product of biological evolution, written language is a cultural product. We have brains prewired for spoken language, but not for written language, which is why learning a written language can be so hard while learning a spoken language is so easy.
Anyway, in principle all of the letters
in a written language can be decomposed into a small set of
features. In English, for example, all the letters are
composed of some combination of just 7 elementary features: 3
types of lines (vertical, horizontal, and oblique); two kinds
of angles (right and acute); and two kinds of curves
(continuous and discontinuous).
For example, in English orthography the uppercase letter "A" is composed of one horizontal line, two oblique lines, and 3 acute angles.
By contrast, the letter "B" is composed of 1 vertical line, 3 horizontal lines, 4 right angles, and 2 discontinuous curves.
The letter "O" is composed of a single continuous curve.
And the letter "R" is composed of 1 vertical line, 1 oblique line, 2 horizontal lines, and one discontinuous curve.
We learn these patterns as we learn to read English. And we are not really conscious of these orthographic rules; nevertheless, they underlie our ability to read.
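The feature analysis of "A", "B", "O", and "R" can be written down as a small lookup table. The Python sketch below is only a toy: the feature names are my own labels for the seven features listed above, and `recognize` assumes that the feature detectors deliver an exact, noise-free count of each feature, which real readers certainly never get. Still, it shows the division of labor: the detectors produce a list of features, and pattern recognition matches that list against patterns stored in memory.

```python
from collections import Counter
from typing import Optional

# The seven elementary features of English letters (labels are illustrative).
FEATURES = {"vertical", "horizontal", "oblique",
            "right_angle", "acute_angle",
            "continuous_curve", "discontinuous_curve"}

# Stored patterns: feature counts for the four letters described in the text.
LETTERS = {
    "A": Counter(horizontal=1, oblique=2, acute_angle=3),
    "B": Counter(vertical=1, horizontal=3, right_angle=4, discontinuous_curve=2),
    "O": Counter(continuous_curve=1),
    "R": Counter(vertical=1, oblique=1, horizontal=2, discontinuous_curve=1),
}
assert all(set(pattern) <= FEATURES for pattern in LETTERS.values())


def recognize(detected: Counter) -> Optional[str]:
    """Return the stored letter whose feature counts exactly match the detected ones."""
    for letter, pattern in LETTERS.items():
        if pattern == detected:
            return letter
    return None  # an unfamiliar (meaningless) combination of features


def shared_features(a: str, b: str) -> int:
    """How many feature tokens two letters have in common (a crude similarity score)."""
    return sum((LETTERS[a] & LETTERS[b]).values())


print(recognize(Counter(continuous_curve=1)))        # O
print(recognize(Counter(vertical=1, horizontal=1)))  # None
print(shared_features("B", "R"))                     # 4
print(shared_features("A", "O"))                     # 0
```

On this toy account, "B" and "R" overlap on more features than "A" and "O" do, which is the kind of structure a feature-based theory would use to predict which letters get confused.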
Other languages have different, unique orthographies.
For example, German has a letter, "ESZETT" (ß), which stands for a double-s, "ss". The letter looks like an English "B", but has a little "tail" created by the fact that the lower discontinuous curve isn't connected to the vertical line by a horizontal line.
To look at the orthographies of other languages, check the "Alphabet Table" which comes with most good college dictionaries (usually found under "A" for "alphabet").
In Greek, for example:
- the letter "GAMMA" is composed of one horizontal and one vertical line;
- "PI" of one horizontal and two vertical lines;
- The letter "THETA" is composed of one continuous curve and one horizontal line;
- "PHI" is composed of one continuous curve and one vertical line;
- "PSI" (as in psy-chology) is composed of a discontinuous curve and a vertical line;
- And "OMEGA" is composed of a discontinuous curve and 2 horizontal lines.
Notice that the Greek letter "RHO" looks like the English letter "P", but has an entirely different pronunciation.
In Russian:
- the letter "ZHE" is composed of two discontinuous curves (or perhaps two pairs of lines meeting at oblique angles), one horizontal line, and one vertical line;
- "TSE" is composed of 2 vertical lines, 1 horizontal line, and 1 oblique angle;
- "SHE" of 3 vertical lines and 1 horizontal line;
- "SHCHA" of 3 vertical lines, 1 horizontal line, and 1 oblique angle;
- "EE" is composed of 2 horizontal lines and one discontinuous curve;
- And "YA" of 1 vertical line, 1 acute angle, and 1 discontinuous curve.
Notice that the Russian letter "YA" looks like the English letter R, only backwards; but it's a vowel, not a consonant. Similarly, the Russian letter "ER" looks like both the Greek letter "RHO" and the English letter P.
Chinese and Japanese employ ideographs instead of strings of letters to stand for words.
The letters of Greek, Russian, Hebrew, and Arabic are simply meaningless to someone who doesn't know the language -- "it's all Greek" to them. The orthographic rules must be mastered laboriously in order to read or write in the language; but once we become fluent readers and writers, they become unconscious and we can read or write them automatically.
The process of pattern recognition in reading continues beyond the stage of letter recognition. Letters combine according to spelling-pattern codes (e.g., in English, the letter Q is always followed by the letter U), then into words, and then into word-group codes (e.g., words like a, an, and the are followed by a noun, perhaps after one or more modifiers). But in reading words, skilled readers don't just piece individual letters together -- rather, they recognize words as wholes. When we learn to recognize words in some language, we are engaging in pattern recognition at a somewhat higher, more automatic, level.
Analogous processes have been observed in the auditory case. A powerful example of auditory pattern recognition is found in the phonemes of spoken language. Phonemes are the smallest units of speech: all speech sounds are composed of a finite set of phonemes, which in turn are produced by certain articulatory features.
English phonology is composed of some 40 phonemes, which in turn represent various combinations of just 16 articulatory features. The speech-perception apparatus extracts these features and recognizes various combinations of these features as familiar.
For examples, turn to your college dictionary, and find the section on
pronunciation. There you will find a list of about 40 speech sounds,
vowels, diphthongs, and consonants (for example, the "e" in "silent" has the same
sound as the "o" in "connect"; the "c" in "race" has the same
sound as the "s" in "loose"; etc.). As English acquires new
words from other languages, its number of phonemes increases
progressively. But there are about 40 in the basic set of
English phonemes, and the others are phonemes from foreign
languages which have been imported into English.
Each phoneme, in turn, represents a particular combination of articulatory features, or positions of the tongue, mouth, teeth, etc. when pronouncing them.
The English vowels are classified according to:
- the part of the tongue used in pronouncing the phoneme,
- and the height of the tongue in the mouth
- moderately high
- moderately low
With respect to the English consonants:
- there are 5 types of articulation:
- laterals, and
- which are combined with 8 positions of articulation:
- velar, and
Don't bother trying to
memorize these terms. But pay attention to what goes on in
your mouth when you pronounce the following consonant-vowel combinations:
- plosives like "PA" and "BA" are bilabial, involving both lips, while
- plosives like "TA" and "DA" are alveolar, where the tip of the tongue contacts the ridge of the gum;
- the nasal consonant "MA" is bilabial, while
- the nasal consonant "NA" is alveolar.
- the fricative "WA" is bilabial, while
- the fricatives "FA" and "VA" are labiodental, where the upper teeth contact the lower lip.
In some sense, when we perceive the
difference between consonants like the "M" in "MA" and the "N"
in "NA", what we are perceiving are the differences between
the articulatory movements produced by the speaker's vocal tract.
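The idea that consonants are bundles of articulatory features can be sketched the same way as the letters. In this toy Python table, each consonant is described only by manner and place of articulation, using the classifications given above for "PA", "TA", "MA", "NA", and "FA". Other features are deliberately left out (voicing, for instance, is what separates "PA" from "BA"), so this is not a complete phonological description. The helper `distinguishing_features` is a hypothetical name.

```python
from typing import Dict, Set

# Manner and place of articulation for a few consonants classified in the text.
CONSONANTS: Dict[str, Dict[str, str]] = {
    "P": {"manner": "plosive", "place": "bilabial"},
    "T": {"manner": "plosive", "place": "alveolar"},
    "M": {"manner": "nasal", "place": "bilabial"},
    "N": {"manner": "nasal", "place": "alveolar"},
    "F": {"manner": "fricative", "place": "labiodental"},
}


def distinguishing_features(a: str, b: str) -> Set[str]:
    """Names of the articulatory features on which two consonants differ."""
    fa, fb = CONSONANTS[a], CONSONANTS[b]
    return {name for name in fa if fa[name] != fb[name]}


print(distinguishing_features("M", "N"))  # {'place'}: same manner, different place
print(distinguishing_features("M", "P"))  # {'manner'}: same place, different manner
print(distinguishing_features("N", "P"))  # both place and manner differ
```

In these terms, hearing the difference between "MA" and "NA" amounts to detecting a single feature (place of articulation), while the manner feature is shared.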
Note to Linguistics Students
- Hawaiian has just 14 phonemes, which is one reason that Hawaiian words are so long.
- German has a phoneme, the ch sound in Ach!, that does not appear in English.
- Russian has another phoneme, transliterated as shch, that likewise does not appear in English.
- Chinese spoken forms such as Mandarin and Cantonese use the rise and fall of pitch (tone) to distinguish words that are otherwise pronounced alike.
There are many other examples. Just as readers learn to recognize certain letters as meaningful, so speakers learn to recognize certain sounds. It's all pattern-recognition.
Just as written words are composed of multiple letters, so spoken words are typically composed of multiple phonemes. Just as we recognize patterns of letters as meaningful words in written language, so we recognize patterns of sounds as meaningful words in spoken language. Again, this is pattern recognition at a higher level.
Feature detection and pattern recognition exemplify what is known as bottom-up processing in perception, also known as data-driven or perceptually driven processing, which takes a low-level representation (like a letter) as input and generates a higher-level representation (like a word) as output. In a theory of visual perception offered by David Marr, there are four such levels of processing: information extracted from the retinal image is used to generate a representation of the visible surface of a scene, which is then used to identify the object, which is then used to categorize the object.
However, from the constructivist point of view bottom-up processing isn't all that's involved in perception. Consider the word-letter phenomenon uncovered by Johnston and McClelland (1974). These investigators were intrigued by another phenomenon, known as the word superiority effect, in which subjects find it easier to distinguish between words such as COIN and JOIN than they do between letters such as C and J. This is, of course, counterintuitive, because in order to recognize a word like COIN you've already got to recognize the letter C. Johnston & McClelland asked their subjects to detect the presence of a letter (e.g., C or J) in strings of four letters. Some of these four-letter strings were actual words (like COIN), whereas others were random strings (e.g., CPRD). Half the subjects were instructed to "try to see the whole word", while the other half were told to "fixate" on a particular letter position -- in fact, the precise position where the target letter was going to appear. The results were very striking. When the array consisted of actual words, subjects performed better when they were instructed to see the whole word than when they were informed in advance exactly where the target letter was going to appear. The reverse effect was obtained when subjects had to deal with random letter strings. Put simply, it was easier for subjects to see particular letters when they were presented in the context of words. Somehow, the word influenced the perception of its constituent letters.
The implication is that, in addition to "bottom-up" processing, there is also top-down processing (also known as conceptually driven, hypothesis-driven, or expectation-driven processing). "Top-down" processes take input from a higher-level representation, such as a word, and generate a lower-level representation, such as a letter.
These spatial metaphors
illustrate the constructivist principle that the final percept
is the product of two different sources of information, or the
interplay between two kinds of processes:
- "bottom-up" processes, involving sensory information coming from the periphery;
- input from the distal stimulus in the current environment, extracted from the proximal stimulus by feature-detector mechanisms;
- "top-down" processes, involving conceptual information coming from central structures.
- knowledge derived from previous experiences, and retrieved from memory, by which we recognize patterns of features as meaningful.
For this reason, Ulric Neisser has characterized perception as the point in the mind
where cognition and reality meet.
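Here is a minimal Python sketch of how top-down knowledge can bias letter identification. It is not Johnston and McClelland's procedure or a model of their results (in the COIN/JOIN example, both alternatives form words, so lexical guessing alone can't explain the effect); it only illustrates the general logic of combining the two sources of information. Bottom-up evidence is represented as an assumed match score for each candidate letter, and top-down knowledge as a tiny, hypothetical lexicon: any candidate that completes a known word has its score multiplied by `word_bonus`, an arbitrary parameter.

```python
from typing import Dict

# Hypothetical mini-lexicon standing in for word knowledge stored in memory.
LEXICON = {"COIN", "JOIN", "CORD", "LION"}


def identify_letter(context: str, position: int,
                    bottom_up: Dict[str, float],
                    word_bonus: float = 4.0) -> str:
    """Choose the letter at `position` in `context`.

    bottom_up maps each candidate letter to how well it fits the (degraded)
    visual features. Top-down: if inserting a candidate yields a word in
    LEXICON, its score is multiplied by word_bonus.
    """
    scores = {}
    for letter, evidence in bottom_up.items():
        candidate = context[:position] + letter + context[position + 1:]
        expectation = word_bonus if candidate in LEXICON else 1.0
        scores[letter] = evidence * expectation
    return max(scores, key=lambda letter: scores[letter])


# The first letter is smudged; its features fit "G" slightly better than "C".
smudged = {"C": 0.45, "G": 0.55}
print(identify_letter("?OIN", 0, smudged))                  # C: word context wins
print(identify_letter("?OIN", 0, smudged, word_bonus=1.0))  # G: bottom-up only
print(identify_letter("?PRD", 0, smudged))                  # G: no word to help
```

With a random string like "?PRD", the lexicon has nothing to contribute and the bottom-up evidence alone decides, which is the sense in which the final percept reflects the interplay of both kinds of processing.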
The Perceptual Constancies
The contribution of the perceiver is also revealed by the perceptual constancies.
In size constancy, the perceived size of an object does not change as its distance from the observer changes. In some ways, this is surprising, because the perceived size of an object is a function of the size of its retinal image, and retinal size varies with the distance between the observer and the object of regard. Therefore, as an object moves closer its retinal image gets larger, and as it moves away, its retinal image gets smaller. However, under natural viewing conditions moving objects do not appear to change in size.
In shape constancy, the perceived shape of an object is invariant over changes in the shape of its retinal image. The shape of a retinal image often changes when an object undergoes a spatial transformation: when a door opens, its retinal image changes from rectangular to trapezoidal to (almost) linear. But again, under natural viewing conditions, perceived shape remains invariant over spatial transformations. We see the door opening and closing, but we do not see it change shape.
In the perceptual constancies, the pattern of proximal stimulation changes, but the perception of the distal stimulus remains constant. Therefore, perception is not entirely driven by the stimulus.
In many cases, perceptual constancy reflects an automatic correction of the stimulus input. When we survey the environment, we don't just perceive the object of regard; we perceive it against its background, and these background stimuli can provide distance cues. The perceptual system then takes distance cues into account to make inferences about size, speed, and shape, given the perceived distance from the observer to the object.
In some sense, then, the perceptual constancies are not completely inconsistent with the ecological view: Gibson always insisted that it was the entire pattern of stimulation, including figure and ground, that provided the information needed for perception. Viewed against the background of trees and other features of the landscape, the lion is clearly coming closer, not changing in size.
But even so, the visual system is using this information to make what Helmholtz called unconscious inferences about the scene. The perceptual system is performing certain calculations -- applying the size-distance rule, for example. We are not aware of performing these calculations, and if asked we could not specify what they are; but the perceptual constancies indicate that we are making them nonetheless. These calculations, and the procedures for performing them that are stored in memory, are part of the cognitive contribution to perception. They indicate that not all the information for perception is available in the stimulus array: some of it has to be calculated by the observer.
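The size-distance rule can be made concrete with a little trigonometry. The sketch below is a simplified geometric model, not a claim about how the visual system actually computes: it assumes that the retinal image can be summarized by the visual angle an object subtends, and that the observer recovers size by combining that angle with perceived distance. The function names and the 1.2-meter lion are hypothetical, chosen only for illustration.

```python
import math


def visual_angle_deg(size_m: float, distance_m: float) -> float:
    """Visual angle (degrees) subtended at the eye by an object of a given size and distance."""
    return math.degrees(2 * math.atan(size_m / (2 * distance_m)))


def inferred_size(angle_deg: float, perceived_distance_m: float) -> float:
    """Size-distance rule, modeled as the inverse of visual_angle_deg:
    combine the retinal (angular) size with the distance the observer perceives."""
    return 2 * perceived_distance_m * math.tan(math.radians(angle_deg) / 2)


# Hypothetical example: a 1.2 m lion approaching from 40 m to 10 m.
LION_HEIGHT_M = 1.2
for d in (40.0, 20.0, 10.0):
    angle = visual_angle_deg(LION_HEIGHT_M, d)
    # Background distance cues are assumed to be read correctly, so perceived distance == d.
    size = inferred_size(angle, d)
    print(f"distance {d:4.0f} m: retinal angle {angle:5.2f} deg, inferred size {size:.2f} m")
```

As the hypothetical lion approaches, its retinal angle roughly quadruples, but because the perceived distance shrinks in step, the inferred size stays at 1.20 m: size constancy, in miniature.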
Reversible (Bistable, Ambiguous) Figures
The same point is made, in the opposite way, by the reversible (or ambiguous, or bistable) figures, which can be perceived in two or more quite different ways.
In the Rubin vase, the observer sees a white goblet or vase against a black background. Look at it for a while, and see what else you see: a pair of profiles in silhouette, against a white background.
A real-life variant on Rubin's figure is a porcelain vase commissioned for the Silver Jubilee of Queen Elizabeth II in 1977, marking the 25th anniversary of her accession to the throne. The vase has been cut in such a way as to display the profiles of the Queen on the right and Prince Philip, her consort, on the left.
In the Necker cube, discovered in 1832 (by Louis Necker, a Swiss crystallographer, who initially saw this effect in some crystals he was examining under a microscope), one face (A or B) initially appears closest to the observer; then, after a while, the figure "flips" so that the other face (B or A) now appears closest.
The same effect was very popular in ancient Greece and Rome. On the right, a mosaic panel found in a house in Antioch (now Antakya, in Turkey), dating from the 2nd century BCE (from Gombrich, Art and Illusion, 1960). On the left, a floor from the "House of the Faun" in Pompeii, dating from before 79 CE, when Pompeii was destroyed by the eruption of Mount Vesuvius.
Here it is in the Art Deco-inspired floor of an apartment building in New York City (photo from the New York Times, 10/11/2013). Imagine walking on that floor all the time! On the right is another example: the hallway of William James's house in Cambridge, Massachusetts, circa 1981. I can't vouch that the floor was there when James lived in the house, but the residents at the time, William and Kay Estes -- both famous psychologists -- tried hard to preserve the original appearance of the house (photo from the Observer published by the Association for Psychological Science, 02/2015).
The same "tumbling blocks" or "baby blocks" effect was also used by a Native American artist when painting the walls of the Mission San Xavier del Bac, on the Tohono O'odham Indian Reservation near Tucson, Arizona. The walls can be seen on the sides of the interior image, on the right (photo by William Steen, New York Times 12/29/2013). According to a senior docent at the Mission, the decoration dates from the building of the church in the late 18th century, and the Antioch blocks design originally continued onto the wooden floor. Disconcerting as it might be to walk on such a floor (as noted immediately above), he believes that the intent was to induce a feeling of disorientation in the worshipers, increasing their openness to the church's message.
This painting of The Last Supper, on the wall of the Mission's nave (posted anonymously, so far as I can tell, on Flickr), gives some idea of what the Mission's original floor might have looked like. On the right: another view down the nave, by Marty Straub and posted to the scenicusa.net website.
And it appears in this Midwestern Amish pieced quilt (c. 1940, quilt-maker unknown; private collection, photograph from the America Hurrah Archive, reprinted in the "Quilts 2004" calendar, Ziga Design).
The San Francisco artist Kristin Farr has made good use of the "tumbling blocks" effect:
Here, a vinyl print of a painting, used as a mural for a wall on Market Street in San Francisco, near 7th Street (it's temporary, as of August 2015).
Here, on a mural for the Urban Outfitters store in Honolulu.
In the Boring figure, originally drawn by a cartoonist for the humor magazine Puck (1915) and brought to the attention of psychologists by E.G. Boring (1930), you see a young woman, looking demurely away from you. Look at it for a while, and see what else you see: an old woman, looking down and scowling. The original caption for this cartoon was "His Wife and His Mother-in-Law", and so this is sometimes known as the "Wife/Mother-in-Law Figure". As described by Boring (1930), the drawing "shows in one figure the left profile of a young woman, three-quarters from behind. The other figure is an old woman, three-quarters from the front. The ear of the 'wife' is the left eye of the 'mother-in-law'; the left eyelash of the former is the right eyelash of the latter; the jaw of the former is the nose of the latter; the neck-ribbon of the former, the mouth of the latter".
A cartoon in the New Yorker by Paul Noth (07/03/2017) cleverly combined Jastrow's duck-rabbit and Boring's wife/mother-in-law. Earlier cartoons, in the issues of December 14, 2009, and January 4, 2010, also used the duck-rabbit. For more on the duck-rabbit figure, see my articles, "Joseph Jastrow and His Duck -- or Is It a Rabbit?" and "Provenance of the Chef-Dog Reversible Figure", both posted to my website.
Similarly, consider this ulu, or cutting tool, used by the indigenous Yupik people and discovered at the Nunalleq archeological site in Alaska. The handle looks like a whale, with its head on the right, and also like a seal, with its head on the left. (Image from "Racing the Thaw" by A.R. Williams, National Geographic, 04/2017.)
Reversible figures are frequently employed as artistic devices, for example by the Dutch artist M.C. Escher, in many pictures. In "Sky and Water I", we see fish against a background of birds, and vice-versa. In "Circle Limit IV", the observer can see either white angels against a black background or black devil-like bats against a white background. For a recent documentary, see "M.C. Escher: Journey to Infinity" (2021), reviewed by Ben Kenigsberg ("Is It Art?", New York Times).
In the 1970s, antinuclear activists used a version of "Earth and Sky" in a poster, representing atomic bombs and doves of peace. And in response to the Gulf Oil Spill disaster, Bob Staake did this cover for the June 5, 2010 issue of The New Yorker: "After Escher: Gulf Sky and Water".
Salvador Dali used reversibility to great effect in this painting: "Slave Market with Disappearing Bust of Voltaire" (1940). The portrait of Voltaire is based on a bust of the French philosopher by Houdon (1778).
In the "Mask of Love" illusion, created by Gianni Sarcone and his colleagues, a woman's face, surrounded by a Venetian-style mask, can also be perceived as a man and a woman kissing. This image won the 2011 "Visual Illusion of the Year" contest sponsored by Scientific American -- even though bistable figures aren't, technically, illusions.
The pioneering Pop artist Jasper Johns produced whole series of works on the themes of flags, targets, maps, and numbers. Reversible figures are a frequent feature of his work -- especially the Rubin vase. In "Cup2Picasso" (1973), the silhouette is of Pablo Picasso; in "Untitled" (2000), I'm pretty sure Johns has co-opted and distorted the souvenir vase from Queen Elizabeth's Silver Jubilee, described above.
In all the reversible figures, the same stimulus can be perceived in at least two quite different ways, "depending on how you look at it". In the perceptual constancies, perception remains constant despite transformations in the stimulus; in the reversible figures, perception varies even though the pattern of proximal stimulation remains constant. Either way, the observer is going beyond the information given in the stimulus. Perception is not driven exclusively by stimulation.
The Perceptual Illusions
According to the ecological view, the perceptual systems have evolved in such a way that we directly perceive the world as it really is. But in the perceptual illusions, we perceive things that aren't there.
In the Muller-Lyer illusion, created by Franz Muller-Lyer, a German psychiatrist (1889), the line with the "feathers" looks longer than the line with the "arrowheads", even though the two horizontal lines are precisely the same length.
In the Ponzo illusion,
created by Mario Ponzo, an Italian psychologist (1913), the
converging lines created by the "railroad tracks" make it seem
that the upper horizontal line is longer than the lower one.
The principle of the Ponzo illusion involves the same
unconscious inferences as in the Muller-Lyer illusion.
Another example is the boomerang illusion, popularized by Joseph
Jastrow (1892; he of the "duck-rabbit" figure described
above), based on earlier versions by Muller-Lyer (1889) and
Wilhelm Wundt. The lower "boomerang" appears larger,
although the two figures are identical in size. The
boomerang illusion is the basis for a popular magic trick,
described by Dr. Peter Prevos in The
Jastrow Illusion and Magic: A Treatise on the Boomerang
The Muller-Lyer and Ponzo illusions illustrate the operation of Helmholtz's "unconscious inferences" in perception -- this time, to create a false percept:
- In the Muller-Lyer illusion, the feathers of the upper figure act like converging lines of perspective, creating the impression that the upper figure is farther away from the viewer than the lower figure.
- However, because the two lines are actually equidistant from the viewer, they cast images of precisely the same length on the retina.
- Nevertheless, the visual system compensates for the depth cues.
- Thus the inference:
- If the upper line is farther from me than the lower line; and
- If the image of the upper line is the same length as that of the lower line;
- Then, by virtue of the size-distance rule, the upper line must be longer than the lower line.
- Something similar goes on in the Ponzo illusion, where the "railroad tracks" act like converging lines of perspective, creating the impression that the upper line is farther away from the viewer than the lower line.
- However, because the two lines are actually equidistant from the viewer, they cast images of precisely the same length on the retina.
- Nevertheless, the visual system compensates for the depth cues.
- Again, the inference:
- If the upper line is farther from me than the lower line; and
- If the image of the upper line is the same length as that of the lower line;
- Then, by virtue of the size-distance rule, the upper line must be longer than the lower line.
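The inference in both illusions can be sketched with the same size-distance rule, fed a misleading input. In this simplified model (an illustration, not a measurement), both lines subtend the same visual angle, but the perceived distance of the upper line is inflated by a hypothetical 15 percent; the angle and the distances are made-up values.

```python
import math


def inferred_length(angle_deg: float, perceived_distance: float) -> float:
    """Size-distance rule: the length implied by a visual angle at a perceived distance."""
    return 2 * perceived_distance * math.tan(math.radians(angle_deg) / 2)


# Both lines are physically equal and equidistant, so they subtend the same visual angle.
RETINAL_ANGLE_DEG = 2.0  # hypothetical value

# Hypothetical perceived distances (arbitrary units): the converging "feathers" or
# "railroad tracks" make the upper line seem about 15% farther away than the lower one.
perceived_distance_lower = 1.00
perceived_distance_upper = 1.15

lower = inferred_length(RETINAL_ANGLE_DEG, perceived_distance_lower)
upper = inferred_length(RETINAL_ANGLE_DEG, perceived_distance_upper)
print(f"upper/lower perceived length ratio: {upper / lower:.2f}")
```

Because the retinal angle is held fixed, the inferred length scales in direct proportion to perceived distance (the sketch prints a ratio of 1.15), so any depth cue that pushes the upper line "back" makes it look longer. The moon illusion has the same structure.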
Not all illusions capitalize on misleading depth cues. Sometimes the illusion is created by the influence of the surrounding context.
In the Poggendorff illusion, the top and bottom lines appear to be displaced, even though they are actually connected. The illusion was introduced by J.C. Poggendorff, a physicist (1860), based on observations by J.C. Zollner, an astronomer who noticed the effect in a fabric pattern.
The Poggendorff illusion plays a role in the British flag, known as the Union Jack because it combines the English Cross of St. George with the Scottish Cross of St. Andrew and the Irish Cross of St. Patrick. When you look at the Union Jack, you think you see the diagonal red bars meeting at the center of the flag. They actually don't meet, but we see that they do because the Poggendorff illusion effectively compensates for the physical displacement. The effect is so strong that the British have to be deliberately instructed how to draw their own flag: most people draw it the way they see it, rather than the way it really is.
In the Ebbinghaus illusion, the two circles are the same diameter, but the one surrounded by small circles seems larger than the one surrounded by large circles. The figure was invented by Hermann Ebbinghaus (1897) but popularized by the pioneering American structuralist E.B. Titchener (1901), so it is sometimes known as the Titchener illusion.
The horizontal-vertical illusion is used to good effect in the Gateway Arch, designed by the architect Eero Saarinen (1947) and located in a park on the banks of the Mississippi River in St. Louis, Missouri (also known as the Gateway to the West). The Gateway Arch is based on the catenary arch -- the shape assumed by a suspended rope or chain. Its width at the base is precisely equal to its height, but it doesn't look that way: the arch appears taller than it is wide.
Visual illusions also play a role in many instances of "Op" and "Pop" Art. A very striking example is "Seven Sequences of the Movement of the Translational Motion of Red and Blue Segments" (1959), by the Argentine artist Julio Le Parc. Two bars come together in the middle panel to form the classic illusion, then retire to their corners, as it were. (For more on Le Parc, see "At 90, This Artist is Still Opening Doors of Perception" by Holland Cotter, reviewing an exhibition of Le Parc's work, "Julio Le Parc 1959" at the Met Breuer museum in New York City, New York Times, 01/25/2019.)
In the Helmholtz illusion, a square composed of horizontal stripes looks taller, and thinner, than one composed of vertical stripes. I don't see it myself, but many people do (and Helmholtz did!), never mind that it seems to contradict the horizontal-vertical illusion.
When Helmholtz reported his discovery, in 1867, he made an offhand comment that women who wear horizontal stripes would look taller than those wearing vertical stripes. Of course, that seems to run counter to the advice commonly given to both women and men, that horizontal stripes make people look fatter. Nevertheless, Thompson and Mikellidou (2011) put Helmholtz's idea to the test, with both 2- and 3-dimensional models, and found that, in fact, horizontal stripes made the figure look taller. Who knew?
Other illusions are created by a misapplication of the principles of constancy, which also create an illusory sense of depth or distance.
The role of constancy, and the size-distance relation, is seen to good effect in the Moon illusion, as represented in this (altered) photograph of the moon rising over the Berkeley hills, with Alcatraz Island in the foreground. The illusion is that the moon looks larger when viewed at the horizon than when viewed at its zenith, and it is created by the misapplication of distance cues. By virtue of elevation, objects near the horizon appear farther away than objects that are far from the horizon. Therefore, the moon on the horizon looks farther away than the moon at zenith. But the retinal size of the two moons remains constant. Therefore, the perceptual system "concludes" that the moon at the horizon must be larger. This is an unconscious inference, in Helmholtz's terms, but it is an incorrect one.
The Moon Illusion
The moon illusion is one of the most frequently encountered visual illusions in nature. Most other visual illusions are artificially constructed in some way, which has led direct realists like Gibson to discount their significance.
Most of the photographs on this page representing the moon illusion were taken on or about the Winter Solstice, 1999, at the time of a "celestial confluence" in which the Moon was at perigee, the closest it comes to the Earth in its monthly cycle (and, on this occasion, the closest it came to the Earth all year), at the same time as the Earth was approaching perihelion, its closest point to the Sun (this actually occurred on 01/03/00). This particular confluence, of lunar perigee and approaching solar perihelion at the time of the Winter Solstice, occurs only once every 133 years, and thus aroused wide interest -- hence the many photographs. The confluence effectively increased both the size and the brightness of the moon somewhat, but these changes were invisible to the naked eye. The major practical consequence of the confluence was extremely high tides, raising the possibility of severe flooding.
Photographic Representations of the Moon Illusion
Moon over Wuhan, China, at the time of the Mid-Autumn Moon Festival. Photos by Zuma Press, West County Times, 10/01/06.
"Moon Rising: 12:29 a.m.", photography by Mark Jaremko, which appeared in the San Francisco Chronicle, 10/11/2009.
Here are two images of the "Harvest Moon" (i.e., the full moon nearest to the autumnal equinox), from "Wayne's ECO Time" blog.
Here's moonrise over the East Bay hills, viewed from Richmond's Marina Bay (from the Marina Bay Neighborhood Council website).
Here are a number of images taken on the occasion of the lunar eclipse that occurred on December 21, 2010 -- the first lunar eclipse to occur on the Winter Solstice in 372 years. In each case, the presence of depth cues makes the moon look larger than it would otherwise (images from the National Geographic website).
Here's an art photograph by Jean-Louis Monfraix, "Red Harvest Moon Rising Over Washington". Taken in 2001, it graced the cover of American Psychologist for July-August 2011 (Monfraix is married to Cynthia Belar, a prominent psychologist).
There's a sun illusion, too, produced by the same principles, but you hardly ever get to see it, because you're not supposed to look at the sun except with special protection. Here, the annular eclipse of the sun, May 20, 2012, photographed near Odessa, Texas (photo by Albert Cesare for the Odessa American/Associated Press).
The mechanisms of
illusions are revealed dramatically in the Ames Room,
an example of which is sometimes on exhibit at the
Exploratorium in San Francisco (many of the Exploratorium's
exhibits are about visual perception and illusions). The
observer looks with one eye into the Ames Room, through a hole
in the wall. The two people appear to differ in height, but in
fact they are identical twins -- as close to identical in
height as two people are likely to get.
- The observer is actually looking from the side of the room, not the center, so he or she is not equidistant from the two side walls.
- The rear wall is angled sharply away from the observer, so the two girls, standing in its corners, are not equidistant from the observer.
- The ceiling is angled sharply away from the observer as well, so the ratio of each girl's height to the height of the back wall is not the same in both corners.
- The windows are subtly changed in shape, trapezoidal rather than rectangular, to reinforce the appearance of a standard room.
Thus, the room affords no
regular distance cues to the observer. Still, the perceiver,
based on prior experience with rooms, assumes that there are
equal distances and right angles. The observer thus infers
size directly from the retinal image. But because the
observer's assumptions are wrong, he or she makes incorrect
inferences about size. The Ames Room works because perception
is determined by the perceiver's knowledge and beliefs, not
just the physical stimulus.
In the Ames Room, visual cues were adjusted in such a way as to make objects that are the same size appear different. A similar illusion can occur when there are no distance cues at all. Here's the back story, courtesy of Charles Wheelan, a journalist and author of Naked Statistics: Stripping the Dread from the Data (2013; cited in the lectures on Statistics) -- and also of Naked Economics: Undressing the Dismal Science (2002) and Naked Money: A Revealing Look at What It Is and Why It Matters (2016). In 2016, Wheelan took his entire family, his wife and three teenagers, on a 9-month-long 'round-the-world trip, recounted in We Came, We Saw, We Left: A Family Gap Year (2021; reviewed by Amity Gaige in "Meet a Family Who Spent 9 Months Traveling the Globe, Pre-Plague", New York Times Book Review, 02/07/2021, from which this image is taken). At one point they found themselves in a patch of trackless, featureless desert, and one of the Wheelan kids took this photo of the rest of the family. In this instance, the effect results from the absence of distance cues: no linear perspective (no tire tracks), no superposition (the clouds on the horizon don't help), and, especially, no texture gradients (the desert floor is sandy, not rocky).
Apropos of nothing in particular: Commenting on his decision to inveigle his family into such a trip, Wheelan writes: "Experiences, rather than things, are what make us happy in the long run", because they become an "ingrained part of our identity".
Ames Room as Peepshow
The Ames Room was
foreshadowed by a type of art popular in the Dutch "golden
age" of painting known as a peepshow -- a
term that has somewhat different connotations now,
especially in The Netherlands, than it did then. Here's A
Peepshow with Views of the Interior of a Dutch House
(1655, in the National Gallery, London) by Samuel van
Hoogstraten (whom you read about before, in the context of
Dutch trompe l'oeil painting). The painting is
made on five inside surfaces of a box, which is left open on
one side, and then mounted on a stand. The open side is
covered with translucent paper, to allow light into the box,
which has two holes through which the viewer can peep inside.
Using the same techniques as Ames did some 300 years later,
van Hoogstraten has constructed a realistic illusion of a
room -- two rooms, actually, in all their depth.
Not all illusions are
produced by unconscious inferences. One interesting case
is the cafe wall illusion (Gregory, 1973) -- so named
because it was first noticed on the wall of a cafe in Bristol,
England (here's a picture of Prof. Richard Gregory beside that
very cafe wall). The lines are parallel, and horizontal
-- but they don't look it, partly due to the irradiation of
light from the black to the white bricks (the illusion is
diminished if the bricks are colored other than black and white).
"The Dress". Another
interesting illusion was "discovered" in February, 2015, when
a woman showed her son, and his fiancee, a photograph of the
dress she proposed to wear to their wedding. The couple
could not agree on the color of the dress: she saw it as white
and gold, while he saw it as blue and black. When the
image was posted to a social-networking website, it turned out
that there was wide disagreement about the color of the dress
-- revealing a new phenomenon of visual perception, previously
entirely unknown (Lafer-Sousa, Hermann, & Conway, Current
Biology, 2015), and one that is still not fully explained. (For the record, the dress is actually blue and black.) One possible explanation involves lighting: the dress will appear to change colors, depending on how it is illuminated. But that can't be the entire explanation,
because many disagreements occurred between observers (like
the wedding couple) who viewed the dress under identical
viewing conditions. So another explanation involves
unconscious inferences: If observers assume that the dress is
illuminated by the blue sky, their visual system will
"subtract out" the blue in the dress, giving it the appearance
of white and gold; if observers assume that the dress is
illuminated by "yellow" sunlight, their visual systems will
"subtract" the gold, leaving the dress to appear blue and
black. Other hypotheses differ somewhat, but the bottom
line is that what we perceive isn't determined by the
stimulus. It's also determined by the context (in this
case, the lighting) in which the stimulus appears -- a
conclusion that is friendly to Gibson's ecological view.
But it's also determined by the expectations that we bring to
the act of perception -- a conclusion that supports the
constructivist point of view. For example, one study
found that about 50% of subjects who assumed that the
dress was photographed in artificial light perceived it as
white and gold; and about 80% of subjects who assumed
that it was photographed in shadow perceived it as white and gold. For more on the Dress Illusion, see "Unraveling
'the Dress'" by Stephen L. Macknik and Susana Martinez-Conde,
Scientific American Mind, 07-08/2015; their article,
"Colors Out of Space" (Scientific American Mind,
05-06/2011), provides additional technical information.
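The "subtracting out" of the assumed illuminant described above is often modeled with a von Kries-style correction, in which each color channel is scaled by the assumed color of the light. The sketch below swaps in that standard simplification; it is not the model used in the studies cited above, and the RGB values for the dress and for the two assumed illuminants are invented for illustration.

```python
from typing import Tuple

RGB = Tuple[float, float, float]


def discount_illuminant(pixel: RGB, assumed_light: RGB) -> RGB:
    """Von Kries-style correction (a simplification): divide each channel by the
    assumed illuminant and clip to 1.0, as an estimate of the surface's own color."""
    r, g, b = (min(1.0, p / light) for p, light in zip(pixel, assumed_light))
    return (r, g, b)


# Invented pixel values, loosely resembling the two kinds of stripes in the photo.
body = (0.55, 0.60, 0.75)  # pale, bluish stripes
lace = (0.35, 0.30, 0.20)  # darker, brownish stripes

# Two hypothetical assumptions about the lighting.
bluish_shadow = (0.60, 0.65, 0.90)  # dress assumed to be in shade, lit by blue skylight
warm_light = (1.00, 0.90, 0.70)     # dress assumed to be lit by yellowish light

for name, light in (("bluish shadow", bluish_shadow), ("warm light", warm_light)):
    body_est = tuple(round(c, 2) for c in discount_illuminant(body, light))
    lace_est = tuple(round(c, 2) for c in discount_illuminant(lace, light))
    print(f"{name:13}: body {body_est}, lace {lace_est}")
```

Under the assumed bluish shadow, the pale stripes come out nearly neutral (white) and the darker stripes come out with red > green > blue (gold); under the assumed warm light, the pale stripes come out strongly blue and the darker stripes come out dark and nearly gray (black). Same pixels, different assumptions, different percepts.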
The precise mechanisms of many visual illusions are more complicated than presented here, and some details remain controversial. What the illusions make clear, however, is that perception is not just the product of information provided by the proximal stimulus, and extracted by innate, "mindless" perceptual mechanisms. The perceiver's mental representation of the world is also shaped by "higher" mental processes involving knowledge, memory, expectations, judgment, and inference.
For more information on illusions, see:
- The Great Book of Optical Illusions by Al Seckel (or any of Seckel's earlier books, from which the Great Book is derived). The books have lots of illusions, in color, plus brief explanations, where available, of how they work.
- Mind Sights by Roger N. Shepard (1990). Shepard is a distinguished vision scientist who is also a talented artist (and musician). In this book, he employs principles of visual perception to construct a large number of "Original visual illusions, ambiguities, and other anomalies" of visual perception.
- 187 Illusions: How They Twist Your Brain, published by Scientific American Mind, which publishes material on illusions in almost every issue. Every year, the magazine publishes an article, usually titled something like "10 Top Illusions" (e.g., 05-06/2011).
- And Champions of Illusion (2017) by Susana Martinez-Conde and Stephen Macknik, who founded the "Best Illusion of the Year" contest at Scientific American Mind.
- See also Seckel's website, Illusionworks, at http://www.illusionworks.com/ or http://www.psychologie.tu-dresden.de/i1/kaw/diverses%20Material/www.illusionworks.com/.
Most of our knowledge of sensation and perception comes from studies of the visual domain, and that is true for illusions as well. However, a number of perceptual illusions have been identified in audition -- many by Diana Deutsch, a professor at UC San Diego:
- In one phenomenon, which Deutsch calls the octave illusion, the subject is presented over headphones with two tones an octave apart, one in each ear (e.g., middle C and 3rd-space C on the treble clef). When the headphones are reversed, the high and low tones are now presented to the opposite ears, but the subject hears them in the same ears as before.
- In another, the tritone paradox, the listener hears a chord consisting of Cs in several octaves (six of them), followed by the corresponding F#s, which lie exactly a tritone (half an octave) away. Some listeners hear the pair as ascending in pitch, others as descending.
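Why the tritone pair is ambiguous can be seen from the frequencies alone. The sketch below assumes equal-temperament tuning with A4 = 440 Hz and, for concreteness, six octave-spaced components running from C2 to C7 (and F#2 to F#7); actual Shepard-tone stimuli also shape the component amplitudes with a bell-shaped spectral envelope, which is omitted here.

```python
import math

A4_HZ = 440.0  # assumed equal-tempered tuning reference


def note_freq(midi_note: int) -> float:
    """Equal-tempered frequency for a MIDI note number (A4 = 69, middle C = 60)."""
    return A4_HZ * 2 ** ((midi_note - 69) / 12)


def octaves_between(f_low: float, f_high: float) -> float:
    """Distance between two frequencies, in octaves (log base 2 of their ratio)."""
    return math.log2(f_high / f_low)


# Six octave-spaced components each (assumed range: C2..C7 and F#2..F#7).
c_tone = [note_freq(36 + 12 * k) for k in range(6)]
fsharp_tone = [note_freq(42 + 12 * k) for k in range(6)]

for c, fs in zip(c_tone, fsharp_tone):
    print(f"C {c:7.1f} Hz -> F# {fs:7.1f} Hz  (+{octaves_between(c, fs):.2f} octave)")
```

Every F# component lies exactly half an octave above one C component and exactly half an octave below the next, so nothing in the frequency structure favors "up" over "down": which way the pair seems to move is left to the listener.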
Many of the illusions are very compelling, and deserve to be heard, which is why Deutsch has now produced two CD recordings of them with Philomel Records: Musical Illusions and Paradoxes (1995), and Phantom Words and Other Curiosities (2003).
There has now emerged a relatively large
literature on music perception.
Cultural Influences on Perception
The contributions of the perceiver to perception are also revealed by cultural influences on perception: people from different cultures may see very different things in the same stimulus.
For example, this geological structure in northwestern New Mexico, a volcanic vent millions of years old on the Navajo Indian Reservation, is commonly known as "Shiprock". That's the name the European settlers gave it in the 1870s, and that's the name of the nearest town, the largest in the Navajo Nation. Shiprock got its name because the settlers thought it looked like a clipper ship. But of course the Navajo had lived in the area long before the settlers came, and they didn't know anything about clipper ships. They named the mountain Tse'Bit'Ai, or "Rock with Wings". The same geological formation gets two different names, because it's perceived differently by people of two different cultures. (Shiprock is a sacred site for the Navajos.)
The constellations in the night sky offer another example of cultural differences in perception. The constellations, which were developed by ancient cultures to aid navigation (and, sometimes, astrology as well), are usually considered to reflect organization by Gestalt principles such as good continuation and closure. But even though the Gestalt principles are universal, different cultures organized the same pattern in different ways. So, what Americans call the Big Dipper (an important aid to finding Polaris, the North Star) is seen differently in other countries (this example is from the companion book to Carl Sagan's PBS television series, Cosmos):
- In ancient Greece, and among some Native American tribes, it is The Great Bear.
- In England, it is The Plough.
- In China, The Celestial Bureaucrat.
- The graphic to the right is a print of a relief in the Wu Liang tomb shrines, c. 147 CE, depicting the Big Dipper, or Ursa Major, as the Celestial Bureaucrat, printed in Science and Civilisation in China (1954ff) by Joseph Needham (reprinted in "The Passions of Joseph Needham", by Jonathan D. Spence, New York Review of Books, 08/14/2008).
- In medieval Europe, Charles' Wagon.
- In ancient Egypt, a complex depiction of a monster combining a bull, a man, a hippopotamus, and a crocodile.
Something similar can be seen in the
moon. You've heard of "The Man in the Moon", the image
of a face created by the seas and highlands on the moon's
surface. But other cultures perceive these same features
differently (Images taken from National Geographic).
- The "Man in the Moon" is common throughout Europe, and in the US -- but the "American" Man in the Moon is quite different from the European one!
- In East Asia and Mesoamerica, people see a "Moon rabbit".
- In India, a pair of hand-prints.
- In Hawaii, people see a tree.
- In New Zealand, the Maori see a woman on the moon, but in a location quite different from either the European or the American "men".
In each case, the perceiver brings cultural knowledge to bear in making sense of the same stimulus pattern, thus "seeing" different things.
Another example: In this figure most people see some sort of whale. What else can you see? You can also see a kangaroo. I stumbled on this figure when I was teaching at the University of Arizona, and so this figure is known as the "Arizona Whale-Kangaroo". Given the alternate perception of this figure, we simply couldn't resist comparing North American and Australian college students. Sometimes the figure was presented with the kangaroo in its "canonical" orientation, with its feet and tail on the ground; in other conditions, the figure was rotated so that the kangaroo was presented in a non-canonical orientation -- and, of course, the whale percept was rotated as well. Everyone saw the whale, regardless of whether they were American or Australian. That makes sense, because both Australians and Americans have had lots of experience seeing whales, and lots of experience seeing whales in different orientations (like swimming or breaching), as in the logo of the Pacific Life Insurance Company. However, Australians were more likely to see the kangaroo, and they were more likely to see the kangaroo at odd, non-canonical orientations (like with the "tail" pointing straight up or down). Australians, more familiar with kangaroos than North Americans (Skippy is, after all, their national symbol), are more likely to see the kangaroo in this ambiguous figure.
Perception as Problem-Solving
Sometimes, perceptual inference is unconscious, as Helmholtz noted. At other times, perceptual problem-solving requires active, conscious effort on the part of the perceiver. This fact is illustrated by the Gestalt figures: degraded line drawings and photographs of objects that are difficult to perceive, and identify, right away.
Here are some items from the Street Gestalt Completion Test, a psychological test intended to measure how good people are at achieving "closure" with Gestalt figures (there's a similar test devised by Mooney).
The point of these figures is that it's work to figure out what they represent. The information provided by the stimulus is incomplete, vague, fragmentary, and ambiguous. You focus on a feature, and you ask, "What might that be?" and "If that's what that is, then what must this be?" You're trying to make sense of the entire pattern of stimulation, and in doing so you bring all your cognitive resources -- knowledge, expectations, beliefs -- to bear on the problem of perceiving.
Canals on Mars: A Case of Perceptual Construction?
But in a somewhat later map, from 1883, the Martian canals are fully in evidence. The features are much more regular, more geometrical -- feeding the speculation that Schiaparelli's canali were artificial structures. By 1893, Schiaparelli held to the full-formed belief that the canali were artificially created by intelligent beings to move water from the Martian poles to desert areas.
Very quickly, other astronomers began to see Schiaparelli's "canals" too -- for example, Percival Lowell, an American astronomer observing Mars from his private observatory at Flagstaff, Arizona, in 1894.
Now, Percival Lowell (1855-1916) was no fool; nor were the professionals he worked with. He was independently wealthy and well connected: his brother, Abbott Lawrence Lowell, was president of Harvard University (Harvard's Lowell House is named after him), and it was his family (after whom Lowell, Massachusetts is named) that is referred to in the famous verse by John Collins Bossidy (1910):
And this is good old Boston,
The home of the bean and the cod,
Where the Lowells talk only to Cabots
And the Cabots talk only to God.
Lowell established the Flagstaff observatory in 1894, on what came to be known as "Mars Hill", specifically to observe Mars during a time when it was relatively close to Earth (I told you he was wealthy!).
- Lowell also directed a search for "Planet X", whose existence he had predicted on the basis of eccentricities observed in the orbits of Uranus and Neptune. And, in fact, in 1930, more than a dozen years after Lowell died, one of his associates, Clyde Tombaugh, actually discovered Pluto.
- Another researcher at the Lowell Observatory, V. M. Slipher, was the first (1912-1920) to observe the "red shift" in galaxies that gave rise to the theory of the expanding universe.
- A. E. Douglass, who located the site for the observatory and served as Lowell's assistant in its early years before moving to the Steward Observatory at the University of Arizona, discovered the relationship of climate to tree growth and invented the technology of dendrochronology, which uses tree rings to determine the age of trees.
The observatory still exists as a functioning enterprise, supported by grants and private philanthropy, and mostly devoted to planetary astronomy. You can visit it on your way to or from the Grand Canyon.
What Schiaparelli had originally termed canali, meaning "channels", which might well be natural formations on the surface, now became Lowell's "canals", implying deliberate construction by intelligent beings. Lowell also noted areas of green at the intersections of the canals, suggesting farm fields. It was just a small step to the idea that the canals were dug in desperation by water-starved beings to transport water from the Martian poles to agricultural areas near the equator -- the theme of Lowell's best-selling book, Mars.
Mars doesn't look anything like any of these drawings, as
convincingly demonstrated by images derived from photographs
taken by the Mariner spacecraft. There is nothing on the
surface of Mars even remotely resembling canals.
Actually, there is, or at least there was, water on the Martian surface, but we didn't learn this until 2002, when NASA's Mars Odyssey spacecraft sent back data from the south pole of Mars indicating that the soil there contained hydrogen, evidence of ice. Odyssey photographed some "channels", too, but they don't remotely resemble what Schiaparelli and Lowell thought they saw (photographs from the NASA website, www.nasa.gov).
What happened? Even in the
19th century, telescopes were not very good. They had low
magnification, and they produced poor images. The largest
surface features on Mars were at about the limit of the
resolving power of the aided eye, so stimulus information was
very vague and fragmentary. To make this point clearer,
consider the dimensions of some canals here on Earth:
- The Panama Canal, linking the Atlantic and Pacific oceans across Central America (note to trivia mavens: the Pacific opening actually lies east of the Atlantic opening), is about 50 miles long, with channels (in Lakes Miraflores and Gatun) up to 1,000 feet wide and locks about 110 feet wide.
- The Suez Canal, linking the Mediterranean and Red seas and separating Africa from Asia, is 101 miles long and at least 179 feet wide.
- The Erie Canal, linking the Great Lakes to the Hudson River in upstate New York, is about 363 miles long and about 70 feet wide.
- Closer to home, the California Aqueduct, running from the Sacramento River Delta to Southern California, is about 273 miles long and about 40 feet wide.
Lowell thought he was seeing structures of this magnitude. At the same time, he missed completely the biggest features of the Martian surface, such as Olympus Mons, a volcanic cone 370 miles wide and 14 miles high, and Valles Marineris, the "Grand Canyon of Mars", 2,000 miles long, 120 miles wide, and up to 6 miles deep. If Lowell couldn't see these features through his telescope, he could not possibly have seen canals.
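The argument here is really a matter of arithmetic: how large an angle does a feature on Mars subtend as seen from Earth? The short Python sketch below uses the small-angle approximation (angle in radians = width / distance). It assumes an Earth-Mars distance of about 35 million miles, roughly the separation at a very close opposition, and a practical limit of about 1 arcsecond imposed by the Earth's atmosphere; both figures are assumptions for illustration, not values taken from the text. The widths are the ones given above.

```python
import math

# Assumption: Earth-Mars distance at a very close opposition, in miles.
EARTH_MARS_MILES = 35_000_000
# Assumption: practical resolution limit set by atmospheric "seeing", in arcseconds.
SEEING_LIMIT_ARCSEC = 1.0
MILES_PER_FOOT = 1 / 5280


def angular_size_arcsec(width_miles, distance_miles=EARTH_MARS_MILES):
    """Angle subtended by a feature, using the small-angle approximation."""
    radians = width_miles / distance_miles
    return math.degrees(radians) * 3600  # degrees -> arcseconds


# Widths as given in the text above.
features = {
    "Olympus Mons (370 miles wide)": 370,
    "Valles Marineris (120 miles wide)": 120,
    "Suez Canal (179 feet wide)": 179 * MILES_PER_FOOT,
    "Erie Canal (70 feet wide)": 70 * MILES_PER_FOOT,
    "California Aqueduct (40 feet wide)": 40 * MILES_PER_FOOT,
}

for name, width in features.items():
    size = angular_size_arcsec(width)
    verdict = "at or above" if size >= SEEING_LIMIT_ARCSEC else "below"
    print(f"{name}: {size:.2e} arcsec, {verdict} the assumed limit")
```

Whatever distance one assumes, the ratios stay the same: a 179-foot canal subtends roughly 3,500 times less angle than the 120-mile-wide Valles Marineris, so a telescope that could not resolve the one certainly could not resolve the other.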
In addition, it is not easy to distinguish features of the Martian surface from features (such as dust storms) in the Martian atmosphere. Moreover, because of rapid changes in atmospheric conditions on Earth, even on Mars Hill near Flagstaff, Arizona, where Lowell set up his own observatory (and where Clyde Tombaugh discovered the ninth planet, Pluto, in 1930), telescopic views of Mars were often less than optimal.
So even under the best of circumstances, 19th-century astronomers really only got very brief glimpses of the surface -- glimpses that left much to the imagination. The observers' percepts were biased by Gestalt principles of "good form" to smooth out irregularities and connect gaps, and the vagueness of the stimuli meant that even careful, scientifically trained observers saw what they wanted, or expected, to see. Schiaparelli and Lowell (especially Lowell) "connected the dots", creating continuous lines from discontinuous surface markings -- in much the same way that ancient sky-watchers saw patterns of stars making up the constellations.
What about Lowell's green farm fields? They, too, were an illusion, but of a different sort. Mars is called the "red planet", and indeed its surface is an orange-red, due to a large amount of iron oxide in the soil (the red planet Mars, and the blue star Rigel, in the constellation Orion, are almost the only celestial objects whose colors are visible to the naked eye). Through a telescope, Mars looks very orange-red indeed, with spots of gray-brown reflecting the presence of other minerals -- and that's where the green fields come from. Remember negative afterimages and the opponent-process theory of color vision? When a neutral area is surrounded by a colored field, the operation of the opponent processes gives the neutral area an apparent color opposite to that of the field. And the opposite of orange-red is a kind of bluish green -- which is what Lowell interpreted as agricultural areas.
The story of the Martian "canals" is discussed at length in The Planets and Perception: Telescopic Views and Interpretations, 1609-1909 by W. Sheehan (1988), and Mars: The Lure of the Red Planet by W. Sheehan & S.J. O'Meara (2001).
Personality "Projection" as Constructive Perception
The fact that ambiguous stimuli leave much to the imagination, and require a substantial contribution from the perceiver, forms the basis for certain "projective" personality tests. These tests are not, in fact, particularly useful for personality assessment, although they remain very popular among clinicians. But their shortcomings as assessment instruments do not prevent us from using them to illustrate constructive aspects of perception.
The Rorschach inkblot technique, introduced by the Swiss psychologist
Hermann Rorschach in 1921, employs a set of 10 inkblots that
are symmetrical (because Rorschach folded the paper on which
he spilled his ink), but otherwise have no structure.
Therefore, the inkblots have no inherent meaning or
significance. Yet people often "see things" in them, analogous
to what we do when we see constellations in the night sky or
familiar shapes in clouds.
- In the left-hand figure, one common percept is two bears
fighting, with blood. Another, focusing on the white
space, is a rocket ship taking off, or maybe a jet plane
viewed from above (or below).
- The right-hand figure is sometimes seen as an underwater scene, with crabs, and seahorses, and the like. A patient I tested once saw two English policemen ("bobbies") being chased by monsters, running toward the Eiffel Tower. Another, focusing on the white space, saw a Japanese woman, dressed in a kimono, her hands folded in meditation.
I was trained to administer the
Rorschach in graduate school, and did so often when I was on
my clinical internship. One of my teachers, Julius Wishner,
was trained by Samuel Beck, who in turn was trained by
Rorschach himself, so I guess that makes me a third-generation
descendant. Wishner once described the Rorschach as
"psychology's most interesting test". Certainly it's more
interesting than an IQ test, and over the years a number of
researchers have attempted to construct useful systems for
scoring the Rorschach for purposes of personality assessment
and clinical diagnosis (which, by the way, was never
Rorschach's intention). By far the most popular of these
systems is the "Comprehensive System" for the Rorschach
promoted by the late John Exner. But despite its popularity
over the years, and the efforts of Exner and others to improve
its psychometric properties, the available research indicates
that, alas, it's not a very good test of personality. It
doesn't tell us anything that we couldn't find out through
alternative means that are more valid.
I offer the Rorschach here only as an
example of the constructive point of view on perception. In
fact, Rorschach was inspired by the Gestalt psychologists, who
emphasized how perception gravitated toward "good form". He
proposed that a person's "perceptual style" could be inferred
from what he or she saw in the blots. Only later did
psychologists begin interpreting test results in terms of the
perceiver's personality. The best evidence, however, is that
the Rorschach is not particularly useful for assessing
personality. Perhaps psychologists would have been better off if
they had stuck to Rorschach's original idea!
- For a summary of this literature, written by clinical researchers who have been critical of the Rorschach, see Wood, J. M., Nezworski, M. T., Garb, H. N., & Lilienfeld, S. O. (Spring, 2006). The controversy over Exner's comprehensive system for the Rorschach: The critics speak. The Independent Practitioner (available on the web).
- James Wood and his colleagues posted a further critique in August 2009 to the listserv maintained by the Society for a Science of Clinical Psychology. Wood's critique is persuasive, to me, but the Rorschach still has its defenders.
- For example, in 2011 Prof. Gregory Meyer and his colleagues introduced a new scoring system for the Rorschach that, they claim, has improved validity over previous systems (including Exner's highly popular "Comprehensive System"). See also a paper from Gregory Meyer's research group proposing a new, improved approach to the Rorschach.
- For a nice history of the "Rorschach test", see The Inkblots: Hermann Rorschach, His Iconic Test, and the Power of Seeing (2017) by Damion Searls.
Still, the idea that we can "see things" in ambiguous stimuli is familiar to anyone who has ever seen animals in cloud formations. Here's an example of projection from the history of art -- or, at least, a possible example. Henry Adams, an art historian, has claimed that Jackson Pollock embedded his name in his famous abstract expressionist painting, Mural (1943). According to Adams' hypothesis, Pollock began his painting by scrawling his name across the canvas, and then applied paint in such a way as to obscure it -- thus beginning with something figurative and representational, his name, and ending with something abstract -- his "signature" Abstract Expressionist painting, so central to Pollock's reputation that its creation was the centerpiece of the biopic, Pollock (2000). (For more detail, see "Decoding Jackson Pollock" by Henry Adams, Smithsonian, 11/2009.)
The Thematic Apperception Test (TAT), introduced by the American
psychologist Henry A. Murray in 1938, employs a set of
photographs and drawings that depict various scenes. The
subject is then instructed to make up a story about what is
going on in the picture, and what the characters are doing and
thinking. Despite the fact that each of the cards is a
"picture" of something, however, still they have enough
ambiguity that they are open to a wide variety of interpretations.
I used to administer the TAT, too, and when I was on the faculty at Harvard I taught a course with David McClelland, who probably did more than anyone else (except Henry Murray himself) to popularize the TAT. But while McClelland and his colleagues generated a great deal of interesting research using a variant on the TAT (which they sometimes called the "Picture-Story Exercise"), the original "clinical" version of the TAT never was given a standardized scoring system, which is absolutely essential for a valid personality test; nor did anyone collect norms from a representative sample of the population -- essential for interpreting test results. McClelland and his colleagues did create a standardized scoring system for some applications of the TAT method in their laboratory studies of the achievement motive and other aspects of personality, but a similar effort was never devoted to the clinical TAT.
Still, here are a couple of stories, written for the original clinical version of the TAT. What might these stories tell us about those who concocted them?
In the left-hand picture:
- This is the young Yehudi Menuhin, child prodigy of the violin, getting ready to play a recital at Carnegie Hall. He knows he's one of the world's greatest violin players, he's eagerly looking forward to strutting his stuff before the audience, and right now he's contemplating the piece he's about to play.
- This boy is being forced to take violin lessons by his parents, who think that the ability to play a musical instrument makes you smarter, and give him an advantage in the college admissions process. But he isn't interested in college, and he's definitely not interested in the violin. He isn't good at it, and he doesn't want to be good at it, and he hates classical music. He'd really rather be outside, playing a pickup game of soccer with his friends.
In the right-hand picture:
- This is Karl Wallenda, the patriarch of the Flying Wallendas, the famous troupe of circus acrobats. He's giving a command performance before the crowned heads of Europe. He's going to do his signature trick, the seven-person chair pyramid, with no net underneath. And he knows he's going to nail it.
- This is the Prisoner of Zenda. He's been confined in his cell for 20 years, and for all that time he's been fabricating a rope from pieces of cloth that he's managed to take from his bedding. This is the night: he's got an opportunity to escape, and he's started to work his way down the wall. But he's been spotted by the guards: they're waiting for him below, and they're waiting for him above. He's trapped.
Murray argued that what the person perceived in the pictures was indicative of his or her personality -- his or her motives, attitudes, interests, and concerns. This assumption is controversial, and it may very well not be correct. (In fact, Murray's original TAT is not a particularly good method for personality assessment, because no standardized scoring procedures or interpretive norms have ever been developed.) But, as with the Rorschach, the point remains that in any ambiguous stimulus, there will be differences in what is perceived -- inter-individual, depending on individual differences in expectations, beliefs, and the like; and intra-individual, depending on moment-to-moment changes in the individual's mental state. This is because the final percept is not determined solely by the features of the stimulus. It is also determined by the schema that the perceiver brings to the act of perception.
The Perceptual Cycle
In the final analysis,
perception is not just the product of information provided by
the stimulus environment, and extracted by evolved perceptual
mechanisms. Perception is problem-solving activity, in which
the perceiver has to make sense of information available from
two sources:
- Information from the proximal stimulus, including the entire sensory field, analyzed by "bottom-up" processing.
- Information derived from memory, including expectations, beliefs, and world knowledge, contributing to "top-down" processing.
The perceiver does extract information from the stimulus input, but the perceiver also employs inferential rules to make a judgment about the object -- the "best guess" about what the object is, where it is, and what it is doing. These guesses are usually very accurate: after all, we usually see the world as it really is. But this is not necessarily so. Conflicting information, incorrect assumptions, and using the wrong rules may lead the perceiver to make the wrong inference.
The observer's task is to perceive the stimulus in the environment, but the observer never enters into any perceptual encounter "cold". Instead, he or she carries into the situation a pre-existing mental representation of the world. Neisser calls this representation a schema. The schema includes a generalized representation of knowledge about objects, events, and the relations between them, as well as specific expectations about what will be encountered.
The distal stimulus provides information which is picked up by the sensory systems when the proximal stimulus is transduced by the sensory receptors into neural impulses transmitted to the central nervous system. This pattern of proximal stimulation is decoded by perceptual processes such as feature detection and pattern recognition.
If the stimulus information fits readily into the active schema, the object is immediately categorized, and is not processed further in the absence of active attention.
If there is a mismatch between the stimulus and the schema, recognition of the discrepancy initiates further cognitive activity. The perceiver may pay fuller attention to the object, providing a closer examination of available features. Or the perceiver may manipulate the object to reveal new features. Or the perceiver may engage in perceptual inference -- making judgments based on what is already known from information provided by the stimulus and knowledge retrieved from memory.
These two phases in the interaction between stimulus and schema may be described in terms borrowed from the Swiss developmental psychologist Jean Piaget:
- Assimilation: transforming the percept until it fits the schema.
- Accommodation: transforming the schema until it can incorporate the percept.
In perception, inferential
procedures lead to a perceptual hypothesis,
which is then tested, much like a scientist would test a
hypothesis, by obtaining further information.
- This cycle of perceptual hypothesis-testing is continued until a satisfactory percept is formed -- a percept that accounts, as well as possible, for stimulus information. Usually, at this point the object has been identified and categorized as similar to other objects the person has encountered in the past.
- The cycle begins anew when the perceiver encounters new, surprising input -- a new mismatch between what we perceive and what we expect.
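The cycle just described can be sketched as a loop. The Python below is a toy illustration only, not Neisser's own formalism: the schema is assumed to be a dictionary mapping each familiar category to a set of expected features, the stimulus arrives as a series of hypothetical "glimpses" (sets of features picked up on successive cycles of exploration), and the 0.75 fit threshold is arbitrary. A good enough fit is treated as assimilation (leftover features are simply ignored), a persistent mismatch leads to accommodation (a new category is added to the schema), and in between the perceiver switches to whichever stored hypothesis best accounts for the evidence gathered so far.

```python
def fit(observed, expected):
    """Proportion of the observed features that a category accounts for."""
    return len(observed & expected) / len(observed) if observed else 0.0


def perceptual_cycle(schema, expected_category, glimpses, threshold=0.75):
    """Toy perceptual cycle; returns (percept, possibly revised schema)."""
    observed = set()
    hypothesis = expected_category  # the perceiver never starts "cold"
    for glimpse in glimpses:
        observed |= glimpse  # exploration picks up more stimulus information
        if fit(observed, schema[hypothesis]) >= threshold:
            # Assimilation: categorize; discrepant features are ignored.
            return hypothesis, schema
        # Mismatch: adopt the stored category that best explains the evidence,
        # then test it against the next glimpse.
        hypothesis = max(schema, key=lambda c: fit(observed, schema[c]))
    if fit(observed, schema[hypothesis]) >= threshold:
        return hypothesis, schema
    # Accommodation: no stored category fits, so the schema itself is revised.
    new_category = f"new category {len(schema) + 1}"  # hypothetical label
    revised = {**schema, new_category: set(observed)}
    return new_category, revised


# Hypothetical feature sets, loosely echoing the whale-kangaroo figure.
schema = {
    "whale": {"tail fin", "smooth body", "water spout"},
    "kangaroo": {"tail", "long feet", "pouch", "ears"},
}
percept, _ = perceptual_cycle(schema, "whale", [{"tail", "ears"}, {"long feet"}])
print(percept)
```

Here the perceiver expects a whale, meets a mismatch, switches to the kangaroo hypothesis, and confirms it with the next glimpse. Returning a new dictionary on accommodation, rather than mutating the old one, simply keeps the before-and-after schemas easy to compare.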
When the stimulus is rich in information, and well structured, perception doesn't require much thought. It proceeds in a relatively automatic fashion.
But when the stimulus is vague,
fragmentary, and not well organized -- when stimulus
information can support many possible percepts -- perception
requires correspondingly more mental activity. The perceiver
must actively search for new information, fill in missing
pieces through inference, and put the pieces together -- much
like we would put together a jigsaw puzzle.
A variation on Neisser's perceptual cycle (and, for that
matter, on Helmholtz's original constructivist theory) has been
offered by Anil Seth ("Our Inner Universes", Scientific
American, 08/2019), a cognitive neuroscientist at the
University of Sussex who co-directs the Sackler Center for
Consciousness Science there (the Sackler center is likely the
inspiration for the research center depicted in David Lodge's
academic satire Thinks..., and discussed in my
lecture, in my "Consciousness"
course, on "Consciousness
in the Arts and Humanities"). Seth begins by
noting that different people see reality differently -- as
illustrated, for example, by perceptual anomalies such as the
Dress Illusion discussed earlier in
these lectures. He writes: "The story usually told about
illusions is that they exploit quirks in the circuitry of
perception, so that what we perceive deviates from what is
there. Implicit in this story, however, is the
assumption that a properly functioning perceptual system will
render to our consciousness things precisely as they
are. The deeper truth is that perception is never a
direct window onto an objective reality. All our
perceptions are active constructions, brain-based best guesses
at the nature of a world that is forever obscured behind
a sensory veil.... Visual illusions are fleeting
glimpses into this deeper truth.... The reality we
experience -- the way things seem -- is not a direct
reflection of what is actually out there. It is a clever
construction by the brain. And if my brain is different
from your brain, my reality may be different from yours, too."
Seth contrasts his version of
constructivism with what he calls the "classical model" of
"bottom-up" processing, in which perceptual contents are
conveyed by signals that flow from the sensory surfaces
toward the brain. The role of "top-down" processing is
merely to add context or detail to what is perceived.
The heavy lifting is all done by the sensory signals
themselves. Seth favors an alternative view, which he calls predictive
coding or predictive processing. In this
theory, "the brain" engages in problem-solving activity,
attempting to determine what is going on in the external world
(or, for that matter, in the internal world of the
body). It does this by making "best guesses" about the
causes of sensory inputs, and then updating these hypotheses
by comparing its predictions with actual sensory signals, and
adjusting its predictions to minimize sensory-prediction
errors. In this case, most of the work of perception is
performed by the central prediction machine of the brain, and
sensory signals serve only to calibrate the process.
Seth writes, "Rather than being a passive registration of an
external objective reality, perception emerges as a process of
active construction -- a controlled hallucination, as it has
come to be known".
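Predictive coding is often illustrated with a deliberately simple model, and a sketch of that kind may make Seth's description more concrete. The Python below is not Seth's model; it assumes a single hidden cause (say, the size of an object), a generative model in which the predicted sensory signal simply equals the current estimate, and Gaussian noise with a given precision (inverse variance) for both the sensory signal and the prior expectation. The estimate starts at the prior "best guess" and is repeatedly nudged to reduce the precision-weighted prediction errors.

```python
def infer_cause(sensory_signal, prior_mean, sensory_precision, prior_precision,
                learning_rate=0.05, steps=500):
    """Toy predictive-coding estimate of one hidden cause (linear-Gaussian model)."""
    estimate = prior_mean  # begin with the top-down "best guess"
    for _ in range(steps):
        sensory_error = sensory_signal - estimate  # prediction vs. actual input
        prior_error = estimate - prior_mean        # departure from expectation
        # Gradient step that reduces the precision-weighted sum of squared errors.
        estimate += learning_rate * (sensory_precision * sensory_error
                                     - prior_precision * prior_error)
    return estimate


# A clear stimulus (high sensory precision) versus a vague one (low precision),
# with the same input and the same expectation.
clear = infer_cause(sensory_signal=10.0, prior_mean=0.0,
                    sensory_precision=10.0, prior_precision=1.0)
vague = infer_cause(sensory_signal=10.0, prior_mean=0.0,
                    sensory_precision=0.1, prior_precision=1.0)
print(f"clear stimulus: {clear:.2f}, vague stimulus: {vague:.2f}")
```

The loop settles on a precision-weighted average of input and expectation, (sensory precision × input + prior precision × expectation) / (sum of the precisions): about 9.09 for the clear stimulus and about 0.91 for the vague one. The weighting is the point of interest: when the stimulus is rich and reliable, the estimate tracks the input; when it is vague and fragmentary, expectation dominates.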
Two comments here.
- What Seth calls the "classical model" is hardly the
classical model. A good argument can be made that
Helmholtz's constructivism is the classical view, as it
dominated the study of perception from the mid-19th century
up until the time that Gibson's ecological optics came along
to shake things up. It's Gibson's revisionist view
that emphasizes bottom-up processing, with his assertion
that all the information needed for perception is provided
by the stimulus (broadly defined).
- Seth is a cognitive neuroscientist, and so he argues that
the brain is a "prediction machine", and it's the
brain that compares predictions with sensory signals,
updates predictions, etc. But talk of "the brain"
doing things, and "every brain is different, so we all
perceive a different reality" (I'm paraphrasing here) really
adds nothing to the argument. All psychologists agree
that the brain is the biological substrate of mental life,
and that "mind is what the brain does". But unless
someone identifies the particular module or system or
circuit in the brain that actually does the comparison, or
the updating, or whatever, all this neuroscientific talk is
just window-dressing. To be fair, Seth does cite a
brain-imaging study that implicates the
superior temporal sulcus in this process. But even so, we could talk
about "the perceptual system", or "perception", or "the
mind" doing these things, and nothing would be lost.
In this way, perceptual activity represents a sort of compromise. The perceiver can't perceive just anything. Perception is constrained by the features of the stimulus. But, within limits, there are lots of possibilities for perceptual construction -- a situation called constructive alternativism by the American psychologist George Kelly. You can't just see anything: What you see is constrained by stimulus input. But when stimulus information is vague and fragmentary, perception is largely determined by expectations and beliefs. To some extent, you can choose what you see.
The Bottom Line in Perception
Sometimes, the information "in the light" is all we need to perceive the world accurately. However, stimulation is often insufficient or ambiguous, so that the perceiver must engage in what the British psychologist Frederick C. Bartlett called "effort after meaning" -- his version of Bruner's later phrase, going "beyond the information given" by the stimulus. Perception draws on knowledge, expectations, and beliefs; it relies on inferences, whether conscious or unconscious; and it involves problem-solving and hypothesis-testing activity, as the perceiver figures out what objects, where, doing what, could possibly be giving rise to the available pattern of proximal stimulation.
In the final analysis, perception is
not like looking at a picture. It is like painting a
picture anew each time, based on fragmentary materials.
This page last modified 09/24/2021.