
Often, artworks are representational pictures, surfaces that we experience as showing other surfaces. They give us twofold experiences—two things simultaneously in one space: firstly, surfaces standing before us, and, secondly, represented surfaces (Wollheim 2003). To understand the double experience, we need to understand perception of surfaces, both the real ones and the represented ones. Here, we argue linear perspective, characterized by foreshortening, allows us to experience real surfaces (in touch as well as in vision), and representational pictures use perspective to depict surfaces with great fidelity. Surfaces and perspective are the key to an argument for realism. We show that there is plenty of information around us in the natural world for surfaces and the cornucopia allows us to experience our earthly environment accurately. Far afield, the same goes even for the Moon and Mars—anywhere we are not immersed in fog! The experience we get from representational pictures is based on this abundant information for surfaces. However, we confess, our principled defense of realism is highly circumscribed. We need to get around two issues. We will acknowledge here that, in practice, our experience of highly-foreshortened real surfaces has niggling errors. Further, we will admit here that the crosstalk behind the twofold experiences given by representational artwork is the source of illusions.

1 Real Surfaces

We begin our introduction to the experience of surface perception with a definition of a surface, a list of the shapes of surfaces, and their possible and impossible combinations.

Physically, a real surface is a continuous, polarized plane. About continuous surfaces, Gauss (1825/1827) wrote: “A curved surface is said to possess continuous curvature at one of its points A, if the directions of all the straight lines drawn from A to points of the surface at an infinitely small distance from A are deflected infinitely little from one and the same plane passing through A” (point 3, p. 6). Basically, a surface is two volumes meeting. The change from one volume to the other occurs at the surface. For vision, the volumes are usually filled by a solid or liquid and air. The surface is the boundary of the solid or liquid, which reflects light to the observer’s vantage point. The surface is polarized, that is, different on its two sides, and usually only the solid or liquid reflects light, not the boundary of the air. The exception is a mirage, in which air layers reflect light. Another kind of surface is that of a cloud, defined by the boundary between air with many drops of water (a vapor) reflecting light, and air with few drops. In a sense, vision is rather superficial, since when we look around, almost always what we see is just the surfaces of opaque things of the world and little more. Skin. Clothes. Bedding. Rugs. Curtains. Furniture coverings. The floor. In the open air, the ground. Brick fronts. Bark. Plumage. Fur. Stone. Roadways. Mountainsides. Evidently, to explain how we see the world veridically (Runeson 1988; Pizlo 2008), accounts of accurate perception need a theory of surfaces (Pomerantz 2013).

Color, brightness, and texture appear to cover broad continuous expanses of surfaces. In nature, the expanses are rarely uniform. In marbling, reflectance varies continuously but within a distinct range. Gibson (1979) called this stochastic variation—continuously changing values within limits. Marbling’s streaks, skeins, and knots are like a rope’s strands, since few if any extend over the full expanse of the surface. Overlaps of the strands support stable perception of the surface’s continuity.

A cautionary note should be sounded about the sky. It has many of the color, brightness, and texture properties of a surface, without actually being one. The sky is air thinning. It appears to be behind anything else we see above the horizon, but otherwise its depth is indefinite. There is apparent space between us and the sky’s color, by day, and its blackness, at night (Sachs 2010).

Immersed in mist—perhaps atop Elsinore in a Hamlet movie—there might be no surface whatsoever that we see or feel. In swimming underwater, we can feel suspended in a place with no surfaces to see other than our body’s. At night, we see stars as dots and we cannot tell that they have flaming surfaces—but here on Earth, flames have visible surfaces.

Light gives vision lots of information about surfaces. Falling on a surface, a cast shadow tells us about bumps and hollows, especially if it moves across them. Sweeping across the ground, the direction of its boundary’s curves elevates with every bump and declines with every hollow, changing the shapes of curves the shadow projects to the eye. Attached shadows are just as useful, curving on the far side of hills from the apparent direction of illumination, and the near side of hollows. What looked like bumps can switch to hollows if the apparent direction of illumination reverses.

Highlights are bright optic images, appearing as if behind the surfaces bearing them. They can make the surfaces look like mirrors, transparent, or matte and grainy. Highlights are evidence for a bright surface, reflected by another surface. Like transparency, a highlight allows us to see more than one surface in a given direction. Like shadows, as highlights move they reveal the surface’s bumps and hollows (Norman et al. 2004). A bump’s highlight moves with us if we move to one side, and a bowl’s highlight moves in the opposite direction. Shadows stretched over a surface and highlights tracking over a surface are particularly good at showing that a surface is continuous, rather than a net. A spider’s web covered in dew sparkles along its threads and the brightness stays on the strands as we move—revealing there is nothing between them.

Information useful for perception of the shape of smooth surfaces follows from the simple fact that at any point on a surface, there are always two curves. Unlike an edge, which has one well-defined curve, a surface slants away from any point on it with at least two values of slant. An edge is a limit to a surface. It end-stops the surface. The edge can be represented by a single equation, but a surface needs at least two. Choose any direction and it will have a specific tangent—and the orthogonal direction will, too.

From any point on a surface, there is a curve in each direction. A ball has two convex curves at any point; the inside of a sphere, two concave curves (Alberti 1436). At the top of a steep slope, a wide ski-run may fall away steeply—a convex curve—but in the orthogonal direction the terrain may be a flat, mogul-free, broad, smooth hill all the way down to the lodge. Conversely, a runner at the bottom of a path may see it as a concave curve, gently slanted directly ahead and yet flat, wide, and with good footing in the opposite orientation. At the top of a saddle or mountain pass, the downhill slope is a convex curve, and the hills on either side are concave. Convex and convex, concave and concave, and convex and concave are the possible curves of surfaces at a point, wrote Gauss (1825/1827, p. 14). A channel or rut is flat in one direction and concave in the other, and a ridge is flat in one direction and convex in the other (Wagemans et al. 2013). In these terms, a plain is flat and flat.

Flat, convex, and concave are parts of a single mathematical function. So we can imagine that what vision considers a smooth surface fits a formula for a curve in all possible directions from a point. The possible curves have different rates of change of slant: linear, quadratic, or exponential functions, for example. Two directions that provide the most different rates of change suffice for describing many smooth surfaces. The surface between the two different rates can be taken as gradually changing from one to the other.

The slant of a surface can be specified optically by the surface’s texture. The texture on the surface projects optic texture to the observer’s vantage point; the units of texture foreshorten their optic projections the more they slant away from the vantage point. To the extent that vision experiences the optic texture gradient due to foreshortening as due to surface slant, it detects surface shape. In practice, failure to grasp the exact relationship between foreshortening and surface shape results in underestimation of a stretch of ground far off on a ground plain. We can check our impression by walking over to it. Underestimation also results in pictured scenes with highly foreshortened surfaces looking shallower than they should. We can correct this error by walking into a real scene matching the pictured one, of course!

Bumpy hills and potatoes do not offer monotonic, gradual changes in surface relief. To see each bump requires detection of its distinctive optic texture gradient. For gnarly surfaces such as clumpy roots, vision may take individual clumps as texture elements forming an ensemble (Cant and Xu 2012) in addition to being a continuous surface. To detect the overall shape of the ensemble, vision may take high spots on the clumps, where the gradients of optic change at the vantage point fall to zero, corresponding to one side of a hill giving way to the other, and fit them with two orthogonal curves. Like a root ball, the result would perhaps be generally convex and at times even equally curved in orthogonal directions like a sphere.

Taken as a group, separate elements can trigger perceived curves. Grouping the elements of a surface in a texture detection task occurs at different scales, tiny and large (Diggiss and Kingdom 2013). For example, a circle is seen if 8 dots on a flat surface are evenly spaced around a common center. At a tiny scale, the individual dots are seen and, at a large scale, their grouping is seen. Curiously, eight dots can form an octagon, but they generally group as a circle, enjoying what Gestalt theory called good continuation (Wertheimer 1922). Perhaps the key fact is that perception can fit a function, octagon or circle, but favors one.

Besides tiny dots, the texture on a surface is often mottled or spotty, much like ink-blots or amoeba. To see the shape of the mottled surface, vision can fit shape functions to centers of blots and clumps. The distribution of the centers can be detected even if the eccentricities of the spots make for an “anisotropic” texture (Knill 2003), meaning the spots protrude in a biased way, i.e., generally longer in some direction, as if smeared horizontally or diagonally. However, Knill (2003) finds some anisotropy affects the apparent slant of surfaces. Hence, we can make flat surfaces look other than they are. The result is a picture, a surface deliberately and artificially modified to present an optic pattern reserved by nature for another.

Using centers of spots and ignoring biases or smears, vision can fit shape formulae to tiny and large anisotropic elements, including ensembles of ants, leaves, bushes, or hills, since shape functions are independent of the scale of a target. The shape taken by a flock of birds or a school of fish can be seen in this fashion. Each element can be taken as a dot in an ensemble, and the dots can trigger a function in the visual brain that makes us see the convex shape of the flock or school.

It is worth stressing that the dots of a group can be discrete while a continuous shape is implied, because the dots trigger a shape function. A function such as $y = ax^2 + bx + c$ is continuous. This is useful in nature, since an extended object such as a log is often partly hidden by branches, and the curve fitting to samples of the log allows occluded parts to be implied. An equation for a curve is continuous even if the samples that triggered it in perception are not. Likewise, the shape function triggered by the curved front of an object implies how the curvature continues onto the back.
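The idea that discrete samples can imply a continuous curve can be illustrated with a minimal sketch. It assumes, purely for illustration, that the visible stretches of a log's edge lie on a quadratic and that a least-squares fit stands in for whatever fitting vision performs; the sample values and the function names `fit_quadratic` and `curve` are hypothetical and not taken from the studies cited.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def fit_quadratic(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c to the samples; returns (a, b, c)."""
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    t0 = sum(ys)
    t1 = sum(x * y for x, y in zip(xs, ys))
    t2 = sum(x * x * y for x, y in zip(xs, ys))
    # Normal equations M @ (a, b, c) = rhs, solved with Cramer's rule.
    m = [[s4, s3, s2], [s3, s2, s1], [s2, s1, n]]
    rhs = [t2, t1, t0]
    d = det3(m)
    coeffs = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        coeffs.append(det3(mc) / d)
    return tuple(coeffs)


def curve(coeffs, x):
    """Evaluate the fitted quadratic at x, including at unsampled positions."""
    a, b, c = coeffs
    return a * x ** 2 + b * x + c


# Hypothetical visible samples of a log's edge; x = 3..5 is hidden by a branch.
visible_x = [0.0, 1.0, 2.0, 6.0, 7.0, 8.0]
visible_y = [0.5 * x ** 2 - x + 2.0 for x in visible_x]
coeffs = fit_quadratic(visible_x, visible_y)
print(curve(coeffs, 4.0))  # value implied for the occluded stretch
```

The fitted function is defined at every x, so the hidden stretch receives a value even though no sample was taken there.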

A circle made of dots is seen as having invisible joins between the visible dots because curve fitting provides continuous perceptual Gestalts, in Koffka’s (1935) terms. Wertheimer (1922) described Gestalts as having unity and simplicity, but they might best be described as results of curve fitting. Gestalt theory noted that a set of dots could be grouped by vision as a line. To understand representational art, it should be noted that the dotted line could also be taken as depicting the continuous edge of a surface. The equation for a line, triggered by dots, allows the perceiver to have an experience of the edge of a continuous surface. Also, regions on either side of the dotted line can be experienced as surfaces limited by the edge. Both regions depict surfaces if the edge is seen as a convex or concave corner. Only one region depicts a surface if the line is an occluding edge of, say, a flat knife blade against an empty background, or the occluding boundary at the rounded brow of a hill set against the sky.

2 Information and Surfaces

If only two tangents and curves are needed for vision to get information about large, smooth, continuous surfaces, this is not a difficult task in principle. Consider a vantage point above a ground plain stretching to the horizon, pictured in linear perspective (Fig. 1). Square tiles stretch from underfoot to the horizon. Above the horizon, the sky is depicted as empty; apart from being further than the horizon, there is no information for its depth. Below the horizon, there are many features providing information for distance in linear perspective. For a given task, skill in perception largely comes down to picking out the key information in reliefs such as the one depicted in Fig. 1 (Ooi et al. 2006).

Fig. 1

Target circles on square tiles on the ground. The L joining the centers of three targets has an azimuth line joining two targets horizontally and a near-vertical line governed by the elevations of two targets

Information is present if A only occurs when B occurs. A specifies B. For an optic pattern A such as the one generated by Fig. 1 to be informative about a terrain B, it must occur within a set of constraints. For vision, the constraints are given mostly by the ecological environment in which sight evolved. The constraints make the problem of induction (Goodman 1968; Vickers 2012) irrelevant because, within the constraints, key light patterns can only occur when a particular distal source is present: a giraffe optic pattern only arising when a real giraffe is present. A fingerprint, say, or a DNA sample has this kind of specificity. The giraffe, the fingerprint, and the DNA sample are distinctive. Each picks out an individual. Besides being distinctive, for an optic pattern to be useful in practice, it must lie within the bounds of visual sensitivity. Underfoot, tiles on a ground are highly distinguishable, but ones farther away are highly foreshortened, with the result that their differences are hard to make out.

The accuracy of perception of the ground depends on what features are available and selected (Ooi and He 2006, 2007). Miss or omit the key information and observers can, of course, only guess and infer using past experience (Berkeley 1709/1732) and biases (Wu et al. 2007). For example, to avoid guesswork, observers should zero in on information for distance in Fig. 1 present in the elevation of the tiles, that is, their proximity to the horizon. The further tiles are higher in elevation and closer to the horizon.

The visual angles subtended by far-off tiles in Fig. 1 are tiny. For a 2 m tall adult observer, standing and looking at a piazza like that depicted in Fig. 1, after 40 m, any 1 m square tile on the ground subtends less than .1° and it becomes difficult to tell the differences between angles subtended by the tile that starts at 41 m, the one at 42 m, etc. In shorter distances, the angular differences are much larger and more useful to perception. The 1 m tile starting at 5 m subtends 3.4°, and the one starting at 6 m subtends 2.5°. The difference of .9° is plainly visible—the moon subtends .5°.
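The angles quoted above can be checked with a short calculation. The sketch below assumes a flat ground, a vantage point 2 m above it (taking the observer's height as the eye height), and elevation measured from straight down; the function name `elevation_extent_deg` is introduced here only for illustration.

```python
import math


def elevation_extent_deg(near, eye_height=2.0, tile=1.0):
    """Angle in degrees subtended at the eye by the depth of a ground tile
    running from `near` to `near + tile` metres from underfoot."""
    far = near + tile
    near_elev = math.atan(near / eye_height)  # elevation of the near edge
    far_elev = math.atan(far / eye_height)    # elevation of the far edge
    return math.degrees(far_elev - near_elev)


for start in (5, 6, 40, 41):
    print(start, "m:", round(elevation_extent_deg(start), 2), "deg")
```

Under these assumptions the tiles starting at 5 m and 6 m come out close to the 3.4° and 2.5° given in the text, and the tiles beyond 40 m fall below 0.1°.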

Consider the angles 3.4° and 2.5° subtended by sides of tiles running into depth to be “elevation-extent” angles. They diminish with distance along the ground. The orthogonal dimension to elevation provides “azimuth angles,” also diminishing with distance. Elevation is measured from vertically below the observer to a point on a target, such as a corner of a square tile depicted in Fig. 1 (as in Juricevic et al. 2009) or the center of a circular target in Fig. 1 (as in Wnuczko and Kennedy 2014). Elevation is sometimes called “altitude.” On this measure, zero is “straight down” and the horizon lies at an elevation of 90°. More generally, zero is the direction from the vantage point to the foot of the normal on a surface of interest. Alberti (1436) called this direction “centric.” The surface of interest can be the ground, a wall, or a surface at a slant to the ground. Each surface has its own horizon. The everyday ground and our familiar horizon are only a special case. For targets on any surface, the further from the foot of the normal, the higher the target’s elevation. It will approach the surface’s horizon if it moves further away from the foot. This has implications for vision. Raising the apparent horizon results in smaller elevations and smaller apparent distances to the objects (Rand et al. 2011). Lowering the apparent horizon has the opposite effect. If the near ends of parallel lines on walls are at eye-height and the lines are tilted downwards, observers underestimate their true eye-height. If elevation is decreased by viewing through base-up prisms, observers underestimate distance (Ooi et al. 2001) (Fig. 2).

Fig. 2

In the elevation dimension, the angle subtended by the side of an object is governed by the extent of the object in the z dimension. In the azimuth, angles are subtended by the width of an object, its extent in the x dimension. The optical slant of a surface is the angle between the line from the observer’s vantage point and the surface normal. An optical slant of 90° means 0° angular subtense. At an optical slant of 0°, sides and widths of squares subtend equal angles

Azimuths are orthogonal to elevations. Let the near side of a square tile on the ground run left-to-right, that is, let it be in the frontoparallel plane. At the observer’s vantage point, the azimuth angle subtended by the near side is the angle between the directions to the side’s left and right corners. As distance along the ground to the tile increases, the azimuth angle compresses and its elevation rises. The projection of a square tile on the ground onto a vertical picture surface becomes a trapezium with converging sides (Fig. 3). A flat circular target lying on the ground and near to the horizon in Fig. 1 is highly foreshortened and shows as a highly eccentric ellipse in the figure. The target is at an extreme optical slant, a slant defined by the angle between the normal of the surface at a point on the target and a line joining the point to the observer’s vantage point. As optical slant increases, projections to the observer’s vantage point become compressed and the circles on the ground depicted in Fig. 1 project ellipses—the more distant the circle, the more extreme the aspect ratios. The aspect ratios given by their minor axes divided by their major axes shrink towards zero.

Fig. 3

A square projecting a trapezium, top and bottom sides parallel, symmetrical about the vertical, with two converging sides
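The trapezium and the flattening ellipses described above can be reproduced with a minimal sketch of linear perspective. It assumes a vertical picture plane at distance `f` from the eye, an eye height `H` of 2 m, and a small circular target, for which the projected aspect ratio is approximately the cosine of the optical slant. The function names and parameter values are illustrative assumptions, not part of the original studies.

```python
import math


def project(X, Z, H=2.0, f=1.0):
    """Project a ground point (X lateral, Z depth) onto a vertical picture plane.

    The eye is at height H above the ground and distance f from the plane.
    Returns picture coordinates (x, y) with the horizon at y = 0.
    """
    return (f * X / Z, -f * H / Z)


def tile_outline(D, W=1.0, H=2.0, f=1.0):
    """Projected corners of a square tile centred ahead, near edge at depth D.

    Order: near-left, near-right, far-right, far-left.
    """
    half = W / 2.0
    corners = [(-half, D), (half, D), (half, D + W), (-half, D + W)]
    return [project(X, Z, H, f) for X, Z in corners]


def circle_aspect_ratio(D, H=2.0):
    """Approximate minor/major axis ratio of the ellipse projected by a small
    circle on the ground at distance D: the cosine of its optical slant."""
    return H / math.hypot(H, D)


print(tile_outline(5.0))
for D in (1, 5, 20, 40):
    print(D, "m:", round(circle_aspect_ratio(D), 3))
```

The near and far edges of the projected tile stay horizontal while the far edge is shorter, which gives the converging sides of the trapezium; the aspect ratio of the circle's projection shrinks towards zero as distance grows.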

The vertical axes of the ellipses in Fig. 1, the extents in elevation, would shrink especially quickly up the page if the figure is redrawn to show targets placed on a convex curved surface—a hill. They would decrease particularly slowly for targets on a concave surface—a bowl. Convex and concave surfaces can also be revealed by target azimuth angles changing more quickly or slowly than is true for a flat plain.

The optic projections from an object’s surrounds on the ground help show its location, distance, size, and shape. If a target lying on Fig. 1’s piazza projects an ellipse in the proximal optic array with a specific aspect ratio and elevation, it is a circle. Any ellipse can be projected by any other ellipse (Pizlo 2008), but information for it resting on a ground plane, its elevation, and its aspect ratio, taken together, specify the target’s true shape.

3 The Surface in the History of Perception Science

A surface was clearly and thoughtfully offered as an essential concept in perception theory by Alhazen (1039) in his Book of Optics. Alhazen noted that if we were looking through a peephole at a scene in which a pole poked up above a wall it would be very difficult to tell the distance to the pole. But, he wrote, if the wall was removed, and now a ground plain stretched towards the pole, it would be obvious how far the pole stood back from us. The pole’s base would be at a determinate spot on the ground. The amount of ground towards the pole’s base is a measure of its distance. Convert “amount of ground” to angle of elevation with respect to a horizon and Alhazen would be modern.

Surface perception was first mentioned in experimental psychology in the 1930s (Koffka 1935; Metzger 1936). Metzger found that we see a space-filling fog if we are inside a dimly-illuminated sphere, a Ganzfeld (half of a ping pong ball over an eye is an easy way to produce a Ganzfeld, Hochberg (1964) pointed out). Metzger’s Ganzfeld was a large sphere, into which the observer could put his head and shoulders. Metzger observed that if the concave Ganzfeld surrounding the eye is the right distance away and is lit with enough intensity, the apparent fog lifts and the observer can see the microstructure of the surface. In essence, a surface at a determinate distance appears. The hollow clear space between the surface and the observer’s vantage point becomes evident (Sachs 2010). One interpretation of this demonstration is that vision of almost anything precise requires a visible texture. Most natural surfaces are textured, so visible texture is a major basis for perception of shapes of surfaces. Texture is more important than shadows and highlights if lighting on the surface is fairly even. In seeing combinations of surfaces and surface shapes, it is a major partner with edges of surfaces (Pizlo 2008; Diggiss and Kingdom 2013). Being elementary in perception of the world, texture should have dedicated resources in the visual brain. Besides marbling, textures are often ensembles of objects that stick up from surfaces like grass or trees, or that rest on surfaces like leaves, cows in a field, or masses of downhill skiers careering down a slope. Cant and Xu (2012) find that anterior and medial aspects of the ventral visual stream are involved in processing large ensembles of multiple objects lying on a surface (e.g. cherries on a plate). In fMRI studies, Cant et al. (2009) and Cant and Goodale (2007) found texture inputs engage specific regions of occipital-temporal cortex different, say, from those engaged by expanses of color.

Following Metzger, the next important step in theory of surfaces in perception was taken by Gibson (1950, 1979) in his ground theory of perception. Wu et al. (2007) wrote that “studies have shown that the ground surface substantially influences object localization in the intermediate distance range (2–25 m), supporting the ground theory of space perception advocated by J. J. Gibson” (p. 654). Rand et al. (2011) wrote, “Gibson suggested that […] judged distance is consistent with the assumption that the target is on the ground plane, [which] has been shown to play a large role in both relative and absolute distance perception” (p. 426). Gibson discussed Fig. 4, lines converging up the page and then bending, decreasing their rate of convergence. In the bottom half of the figure, converging lines depict a ground surface, while the upper half, where the lines converge up the page at a slower rate, suggests the ground has turned into a hill. The lower lines depict the ground as if planks were laid on it. Wu et al. (2007) drew converging lines on a ground surface and observers overestimated the distance to objects on the ground, presumably taking the rapid convergence to suggest parallels going into the far distance. In the Renaissance, this architectural trick was used to increase the apparent size of corridors. Rand et al. (2011) put targets on stands, raising them in elevation. If the stands were visible, observers were accurate about the target’s distance. On invisible stands, observers overestimated the distance, presumably only taking the target elevation into account.

Fig. 4

Ground meets hill slope

Gibson (1950, p. 83) wrote about the “Law of the Visual Angle.” According to this law, the azimuth angle “is the reciprocal of the distance (D)” to a stretch of ground at a distance. However, in addition, the elevation-extent angle of the stretch of ground “is proportional to $1/D^2$.” To study the law for azimuths and its ally elevation, let us introduce an observer whose vantage point is at height H above the ground, their feet at distance D from the near edge of a square tile. How does distance D to a tile affect the foreshortening evident in Fig. 3? Consider azimuth projections, and then elevation. The edges of the tile have width W. The azimuth angle A subtended by W depends on D and H. The middle of W is directly in front of the observer and the normal from the observer hits the middle of W. Hence $\tan(A/2) = 0.5W / \sqrt{H^2 + D^2}$, and for the small angles of interest $\tan A \approx W / \sqrt{H^2 + D^2}$, or simply $\tan A \approx 1 / \sqrt{H^2 + D^2}$ for a tile of unit width. Because H is constant, the denominator approaches D as D grows large relative to H, so we can further simplify the expression to $\tan A \approx 1/D$. Hence, in the distance A gets smaller as an inverse function of D, that is, in proportion to 1/D.

The angle subtended by the elevation dimension of the tile, “E,” is a difference between two elevation angles. The first is the elevation of the near edge of the tile, “$A_n$.” The second is the elevation of the far edge of the tile, “$A_f$.” For a distance “D” of the near edge of the tile and a distance “D + W” for the far edge of the square tile of width “W,” $\tan A_n = D/H$ and $\tan A_f = (D + W)/H$. Then $\tan E = \tan(A_f - A_n)$, which can be simplified using the difference formula for tangent to $\tan E = WH / (H^2 + D^2 + DW)$. Because width and height are constant, we can further simplify the expression to $\tan E \approx 1/D^2$, up to the constant factor WH, if D is much larger than H and W. Hence, the angle subtended by the elevation dimension is an inverse function of distance squared.
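The two rates of diminution can be checked numerically. The sketch below is an illustration under stated assumptions: an eye height H of 1.6 m and a 1 m tile are arbitrary choices, and the function names are hypothetical. It computes the exact azimuth and elevation-extent angles and compares them at two far distances: doubling D should roughly halve the azimuth angle and roughly quarter the elevation extent.

```python
import math


def azimuth_angle(D, H=1.6, W=1.0):
    """Exact angle (radians) subtended by the near edge of a tile of width W
    centred directly ahead, with the near edge at ground distance D."""
    return 2.0 * math.atan(0.5 * W / math.hypot(H, D))


def elevation_extent(D, H=1.6, W=1.0):
    """Exact angle (radians) between the elevations of the far and near edges."""
    return math.atan((D + W) / H) - math.atan(D / H)


def elevation_extent_closed_form(D, H=1.6, W=1.0):
    """tan E from the tangent difference formula: W*H / (H^2 + D^2 + D*W)."""
    return W * H / (H ** 2 + D ** 2 + D * W)


# Doubling the distance from 50 m to 100 m.
print(azimuth_angle(50) / azimuth_angle(100))        # close to 2 (1/D law)
print(elevation_extent(50) / elevation_extent(100))  # close to 4 (1/D^2 law)
```

At nearer distances, where D is not large relative to H and W, the ratios depart from 2 and 4, which is why the simplified laws hold only in the distance.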

Figure 4 has two rates of diminution of azimuth with elevation. If, instead, the diminution rate increases monotonically and steadily with elevation, Fig. 5 appears—a hill.

Fig. 5

A hill

If the azimuth diminution rate decreases with elevation, the result is Fig. 6—a concave surface.

Fig. 6

A concave surface

Sudden changes in density of the lines in the figure with elevation are sudden diminutions of azimuth angles. These indicate a drop-off (Fig. 7).

Fig. 7

A drop-off, as if at an edge of a stage with a floor beyond the edge

Diminution of azimuth and elevation compresses the quadrilaterals projected by squares on the ground in Fig. 8a, b, and c. The result is that diagonals in the squares project as obliques closer and closer to horizontal in Fig. 8a as elevation increases. The set of obliques in Fig. 8b are shown explicitly converging to a point on the horizon, showing that they are depicting parallels in the world. A line showing the receding side of a square tile converges towards the central vanishing point and its angle of convergence on the picture surface is labeled in the figure.

Fig. 8

(a) The short obliques are at different angles to the horizontal. However, they converge and come to a single point on the horizon line. Hence, they are parallels on the ground. (b) On the picture surface, the obliques are shown converging to a point on the horizon line, explicit information that they depict parallels on the depicted terrain. (c) On the picture surface, the erstwhile obliques are depicted by horizontals. The sides of the quadrilaterals converge to points on the horizon line, revealing that they are parallel on the depicted terrain
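The convergence of the obliques to a single point on the horizon can be verified with a small projection sketch. It assumes a vertical picture plane at distance `f` from the eye and an eye height `H`; both values, the tile positions, and the function names are illustrative assumptions. Each tile's diagonal is projected, and the line through its two projected ends is extended to the horizon.

```python
import math


def project(X, Z, H=1.6, f=1.0):
    """Project a ground point (X lateral, Z depth) onto a vertical picture
    plane at distance f from an eye at height H. The horizon is y = 0."""
    return (f * X / Z, -f * H / Z)


def horizon_crossing(p, q):
    """x-coordinate where the picture line through p and q meets y = 0."""
    (x1, y1), (x2, y2) = p, q
    t = y1 / (y1 - y2)
    return x1 + t * (x2 - x1)


def picture_angle_deg(p, q):
    """Angle of the projected segment p-q to the horizontal, in degrees."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))


def tile_diagonal(X0, Z0, W=1.0):
    """Projected ends of the diagonal of a square tile with near-left corner
    at (X0, Z0) and side W, running to the far-right corner."""
    return project(X0, Z0), project(X0 + W, Z0 + W)


for X0, Z0 in [(-2.0, 3.0), (0.0, 6.0), (1.0, 12.0)]:
    p, q = tile_diagonal(X0, Z0)
    print(round(picture_angle_deg(p, q), 1), round(horizon_crossing(p, q), 6))
```

The printed angles differ from tile to tile, yet every diagonal meets the horizon at the same x, which is the sense in which convergence is explicit information that the obliques are parallel on the ground.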

4 Perception and Elevation and Azimuth

Perspective is not just a convention. Nor is it purely visual. Elevation and azimuth are to do with the direction of targets. Direction matters to vision, but it also matters to touch. Besides looking out for targets, we reach out to targets to pick them up. Hence, linear perspective is as relevant to touch as it is to vision (Loomis and Philbeck 2008). It allows blind people to draw pictures showing objects in depth (Kennedy 2008).

To test the claim that both vision and touch are sensitive to perspective and the diminution of azimuth and elevation with distance, we arranged a path of targets, 0.5 m to 6.5 m from underfoot, made observers familiar with the targets, blindfolded them and asked them to point to the targets (Wnuczko and Kennedy 2014). We measured azimuth and elevation of their pointing arm. The targets were circles, in two parallel rows, each successive pair 1 m apart. All the participants were adults. One group viewed the circles before being blindfolded. Another group were blindfolded and then walked between the targets, touching them with a meter-long stick while walking past. Then they returned to one end of the path and pointed, still blindfolded. A third group were blind from early in life. They too explored the targets with the meter rod. Another rod was attached to the participant’s arm before pointing, and its position in space was measured as observers pointed to the targets. For all three groups, as distance to the circles increased, pointing azimuths shrank and elevations increased. Pointing azimuth shrank more linearly and pointing elevation more quadratically. Of interest, there were no significant differences between the sighted blindfolded-after-viewing, the sighted blindfolded-throughout-the-procedure, and the blind. Indeed, the blind increased their pointing elevation from the nearest targets to the further ones by an amount (about 38°) in-between those of the blindfolded-during-touching (about 35°) and the blindfolded-after-viewing (about 40°). Changes in azimuth and elevation specify a surface, and vision and touch work along similar lines (Loomis and Philbeck 2008).

5 Perception’s Biases and Far Surfaces

Vision is a biological device and can only reflect mathematical certainty rather approximately. The result is minor biases in the use of the perfect azimuth and elevation geometry defining optic information. The biases affect nearby ground very little, but they grow with distance, as shown by the literature on visual impressions of depth and size (Bian and Andersen 2011). Let us examine the consequences.

The central dashed line on a motorway is good visual information for the road’s flatness. The stripes, and the distances between them, are uniform. But a driver may be forgiven for having an impression that the very distant stripes on the road, highly foreshortened, look a lot smaller than those nearby. The bicycles painted at the sides of highways are highly elongated. But a cyclist viewing them from far off sees them compressed and squat, it is likely, due to their foreshortening. The arrows on superhighways pointing like > to exits, viewed a kilometer ahead, seem to have sharp arrowheads, thin points, like highly acute angles, perhaps 5°. Come close and the arrowhead, it becomes obvious, is blunt, its edges forming a very obtuse angle, perhaps 170°.

Figure 9 shows a piazza with distant tiles darkened. Often viewers experience the higher-elevation and apparently-further square tiles as eccentric or brick-shaped (long axis horizontal). A 1 m square may look to be 1 m by .3 m. But further, asked to judge the proportions of the higher-elevation Fig. 9 quadrilaterals, the shapes on the page, which may be about 1 by .1 on the page, observers report them fatter than they truly are. Observers may say they are 1 by .2, as if biased towards the 1 by .3 forms they appear to depict. The distant tiles look slimmer than true, and the forms depicting them look fatter. First, let us consider the false eccentricity of the distant piazza tiles—then, secondly, the forms on the picture surface.

Fig. 9 Piazza with far tiles shown foreshortened

6 Theories of Biases

It may be that the quadratic rate of change of elevation extents is underestimated, compared to the linear change of azimuth (Bian and Andersen 2011). The rate of change at extreme optical slant is underestimated. If so, far distances on the ground (high elevations in Fig. 10) are underestimated. Squares appear as rectangles, long axis in the azimuth. The result is that distant angles appear in error. In Fig. 8a, the diagonals of the distant squares, those at higher elevations, should not seem parallel to the nearby ones at low elevations.

Fig. 10 Observer standing on a z-line with an oblique

Figure 10 shows a person viewing an oblique line, with its near end touching a line running directly away from underfoot, a z-line. Wnuczko et al. (2013) asked viewers to judge the angle formed by the oblique and the z-line. The oblique was set at different angles to the z-line, and its contact with the z-line was at different distances from the observer. Wnuczko, the chief investigator, also varied the distance from the z-line to the observer’s vantage point—the “eye-height.” If distance is underestimated, the z-line’s length is underestimated. The region of surface bearing both the z-line and the oblique is compressed. That compression pushes the oblique towards the azimuth, and the angle between the oblique and the z-line should be overestimated. Imagine the oblique is the diagonal of a 1 m square. The side of the square running in depth is compressed. The oblique of 45° might be seen as 60°. An oblique at 80° to the z-line might be seen as 85°. At a further distance, which suffers more apparent compression, the 45° oblique might look like 80°, and the 80° like 89°.
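The compression argument can be made concrete with a minimal sketch. It assumes, purely for illustration, that apparent depth along the z-line is the true depth multiplied by a single factor between 0 and 1, while extents in the azimuth are unaffected. This uniform-compression model and the function names are assumptions made here, not the model tested by Wnuczko et al. (2013). Under that assumption, an oblique at angle θ to the z-line appears at the angle whose tangent is tan θ divided by the compression factor.

```python
import math


def apparent_angle_deg(true_angle_deg, depth_compression):
    """Apparent angle between oblique and z-line when depth is scaled by depth_compression.

    Assumes uniform compression along the z-line only (0 < depth_compression <= 1).
    """
    theta = math.radians(true_angle_deg)
    # The lateral component is unchanged; the depth component shrinks.
    lateral = math.sin(theta)
    depth = math.cos(theta) * depth_compression
    return math.degrees(math.atan2(lateral, depth))


def compression_for(true_angle_deg, seen_angle_deg):
    """Depth compression factor that would turn true_angle_deg into seen_angle_deg."""
    return math.tan(math.radians(true_angle_deg)) / math.tan(math.radians(seen_angle_deg))


# Compression implied if a 45 deg oblique is seen as 60 deg, applied to an 80 deg oblique.
k_near = compression_for(45, 60)
print(round(k_near, 2), round(apparent_angle_deg(80, k_near), 1))

# Stronger compression at a further distance: 45 deg seen as 80 deg.
k_far = compression_for(45, 80)
print(round(k_far, 2), round(apparent_angle_deg(80, k_far), 1))
```

With these inputs the 80° oblique comes out at roughly 84° under the milder compression and roughly 88° under the stronger one, close to the 85° and 89° offered as examples in the text. The sketch also shows why large angles have less room to grow than a 45° diagonal.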

The results of Wnuczko’s experiments were indeed that judged angle was increasingly overestimated as optical slant increased. This was true for ground and wall surfaces, and low and high eye-heights.

Would the same error occur if more information was added? Experiments could be run with several obliques present, all parallel, all at different distances. To detect that all the obliques form the same angle with the z-line, perception can use the information that all the obliques converge to a single spot on the horizon. The bias evident if only one oblique is present can be skirted. That is, distance and angle information—spatial information—comes in many forms. The major task of a theorist and investigator is to find it and to establish what observers can readily use.

7 Bias on a Picture Surface

Pictures were invented in the ice age, roughly 40,000 years ago. Cave artists discovered that pictures can use two surfaces: the real one bearing the daubs put there by the artists, and the depicted one, say the flanks of a mammoth. The presence of two surfaces in one direction from the viewer posed an unusual task, so new that it was not among the pressures Charles Darwin would have found acting on Homo sapiens during evolution. One could look at the mammoth, or at the daubs. Since the task was new, and not part of our evolutionary history, it may come as no surprise that it is riddled with biases. Notably, perception of the 2D marks on the picture surface is mixed up with what the marks depict in 3D (Koenderink and van Doorn 2003), in the sense that perception of the 2D daubs is biased towards the properties of the 3D pictured surfaces, creating illusions, as Fig. 11 illustrates.

Fig. 11 A line drawing of a cylinder with a circular top and seven ellipses. One matches the line form at the top of the line drawing of the cylinder.

In Fig. 11, on the left, a 2D ellipse depicts a circular surface tilted in depth at 68° to the picture surface. The figure creates an illusion. Using line drawings like Fig. 11, Hammad et al. (2008) depicted circular tops of cylinders with tilts from 5° to 85°. Observers judged the 2D ellipses in the cylinder pictures to be less eccentric than similar ellipses viewed on their own, with no extra lines indicating cylinders, like the ellipses on the right of Fig. 11. That is, perception of the 2D form on the picture surface was biased towards the form of the depicted surface tilted in 3D. Presumably, the illusory bias resulted from cross-talk between two kinds of information presented simultaneously: information for 2D flat features on the picture surface and information for 3D surfaces in a space behind the picture plane (Sedgwick and Nicholls 1993; Koenderink and van Doorn 2003).

If the 2D form made of lines is seen as being on the picture surface and the 3D form is a surface appearing behind it, they appear at different distances. If two shapes subtend the same angle at the observer, and vision uses linear perspective, the further one should be seen as larger. Axes of the 2D form seen as depicting chords of a circle tilted back in 3D from the picture plane should seem larger than true. The illusory effect on the minor axis should be large, since it is highly foreshortened. The effect on the major axis of the 2D ellipse on the picture surface should be small, for two reasons. It is hardly foreshortened, and the contribution from the extra distance of the pictorial depth is minimal, since the pictorial depth is likely slight. The depicted circle could have its near edge at the picture surface, and the depicted circle, stretching back from the picture surface, is only about as large as the major axis.
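The size-distance reasoning above can be illustrated with a short sketch. It assumes the standard relation between visual angle, distance and frontal extent, namely extent = 2 × distance × tan(angle / 2). The viewing distances and the visual angle used are hypothetical values chosen only to show the direction of the effect.

```python
import math


def linear_size(visual_angle_deg, distance):
    """Extent that subtends visual_angle_deg when viewed frontally at the given distance."""
    return 2 * distance * math.tan(math.radians(visual_angle_deg) / 2)


# Hypothetical viewing situation: picture surface 0.50 m away,
# depicted surface taken to lie slightly behind it at 0.55 m.
angle_deg = 2.0
on_surface = linear_size(angle_deg, 0.50)
behind_surface = linear_size(angle_deg, 0.55)
print(behind_surface / on_surface)  # ratio of the two distances
```

For a fixed visual angle the extent grows in proportion to distance, so a slight pictorial depth yields only a slight enlargement. This is the sense in which the contribution of pictorial depth to the major axis is expected to be small.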

Figure 11 shows a cylinder and seven ellipses. The ellipse on the far right physically matches the one depicting the top of the cylinder, but observers experience it as too small (Mastandrea et al. 2014), its minor axis not tall enough and its major axis almost but not quite wide enough.

In a study on this illusion (Mastandrea et al. 2014), observers adjusted the size of an ellipse to make it match the one depicting the cylinder’s top. The adjusted ellipses were taller than the true size by about 40 % on the minor axis and wider by 5 % on the major axis. The conclusion to be drawn is that observers cannot tell the true size of features on picture surfaces. It seems that ellipses, like other 2D features depicting shapes tilted in depth, are seen biased towards the shapes they depict. Linear perspective creates biases in viewing shapes on picture surfaces.
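The size of this bias can be expressed as a change in aspect ratio. The sketch below is an illustration under stated assumptions. It takes the 68° tilt of Fig. 11 as the example, treats the depiction as an orthographic projection (so a circle tilted by t from the picture surface projects to an ellipse with minor-to-major ratio cos t), and applies the approximate 40 % and 5 % adjustments reported above. Whether the study used this particular tilt is not stated here, and the function names are hypothetical.

```python
import math


def projected_aspect_ratio(tilt_deg):
    """Minor/major axis ratio of a circle tilted tilt_deg from the picture surface
    (orthographic projection assumed)."""
    return math.cos(math.radians(tilt_deg))


def implied_tilt_deg(aspect_ratio):
    """Tilt whose orthographic projection would give this minor/major ratio."""
    return math.degrees(math.acos(aspect_ratio))


true_ratio = projected_aspect_ratio(68)
# Observers' settings: minor axis about 40 % too tall, major axis about 5 % too wide.
matched_ratio = true_ratio * 1.40 / 1.05

print(round(true_ratio, 2), round(matched_ratio, 2), round(implied_tilt_deg(matched_ratio)))
```

Under these assumptions the drawn ellipse has a ratio of about 0.37 and the matched ellipse about 0.50, the ratio that a circle tilted by roughly 60° would project. The setting is shifted toward the circular form being depicted.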

Untroubled, a Realist can accept that pictures involve many, many illusions. Pictures are artificial and break free from the natural-world constraints within which optical information is to be defined. Also, 2D features of picture surfaces only fool us if we restrict ourselves to viewing along the normal to the picture surface. Turn the picture close to 90° and view it at a glancing angle, as if it were an anamorphic picture. Any 2D dimension of the ellipse can be seen in this fashion perfectly accurately, untroubled by the crosstalk from 3D information. Like walking into a scene, turning an object is a natural way for observers to inspect objects. The result is accurate impressions, a Realist concludes with satisfaction, despite illusions present in relatively immobile and needlessly restricted viewing.

8 Conclusion

In this short introduction to the experience of artworks as representational pictures, which are those that allow the perception of represented surfaces, the key point has been that smooth surfaces are polarized planes that reveal themselves by means of orthogonal optical variations, organized by linear perspective. For extended flat surfaces, angles subtended by elevation extents diminish at quadratic rates with increasing distance and corresponding increases in optical slant, and azimuths diminish at linear rates. Observers use these rates imperfectly, with distance being underestimated under high optical slants and severe foreshortening. Angles subtended by far-off stretches of ground are compressed and hard to distinguish, and angles inscribed in the stretches of ground appear too large, more so with distance. The arrival of pictorial art generated another error. Shapes on picture surfaces are seen as if biased towards the surfaces they depict. Despite these errors, there is a case to be made that Realism holds for our experiences in the normal world and the experiences we get from representational artworks.