This paper is concerned with two questions (and a related paradox) about pictorial caricature:

  Q1. What are the boundaries of the caricaturable?

  Q2. How does depiction by caricature work?

Caricature does not seek to flatter its subject. The exaggeration enacted by this genre of images tends to elicit contrasting emotions: the blatant exposure of one’s physical attributes can amuse the general public as much as embarrass the depicted subject. Not coincidentally, the latter is usually referred to as the victim (Berger, 1952). But what is the subject of caricature? It is natural to think that people, and especially well-known individuals, hold this position. The downtown streets of every major city offer countless examples of caricatures of famous actors and iconic personalities, and the front pages of many newspapers display lampoons of political figures. As such, there is no doubt that these subjects are the perfect victims. But is caricature limited to the depiction of people? While it is easy to point to examples that show that its scope extends beyond the depiction of human beings, a more demanding task—and the first aim of the present paper—is to set the limits of the caricaturable. As far as I know, Q1 has never been specifically and systematically addressed in the literature.

Caricature is also a complex mode of depiction that intentionally modifies some distinctive properties of its subject. But how does this intentional modification work? How do we single out the distinctive properties of the depicted subject? And what does their distinctiveness depend on? Responding adequately to these questions, which Q2 synthesizes, requires a positive account of depiction by caricature.

Relatedly, depiction by caricature has been claimed to present a paradox (Ross, 1974; Rhodes, 1996; Caldarola & Plebani, 2016). For how can a picture misrepresent its subject, while prompting an accurate recognition? This paradox, though, seems to depend on two premises: (P1) accurate visual recognition requires the actual appearance of what is recognized, and (P2) caricature misrepresents its subject. But are P1 and P2 acceptable premises?

This article is structured as follows. Section 1 maps the territory of caricature. First, I offer a taxonomy of different cases of caricature. Then, I argue that while the scope of caricature does extend beyond human beings, not everything can be caricatured, and I sketch three general constraints on what is caricaturable. Section 2 accounts for the unquestionable preference caricature shows for people and, in particular, for faces. I then consider the flexibility of our recognitional capacities (especially towards faces) to dismiss P1. Section 3 analyses the intentional modification operated by caricature to determine which properties it applies to and how. Section 4 proposes that these properties have a relational nature in that they are instantiated by a given subject but their sense hinges on implicit norms. In light of this, I go on to describe the deep structure of depiction by caricature and use this account to justify the constraints previously placed on caricature. Section 5 clears up the paradox of caricature by arguing, pace P2, that caricature per se does not amount to misrepresentation.

1 Mapping the territory of caricature

The reaction we have in front of the caricature of, say, the current President of the United States is usually something like “It’s really him!”, or “He really looks like this!”. What strikes the viewer are the properties that a caricature can capture notwithstanding their dramatic modification. On a primary phenomenological level, seeing a caricature involves two core features: we recognize a certain subject while being aware that its outward appearance has been intentionally modified. Of course, not every modification yields successful recognition, and not every modification that yields recognition results in a caricature. Still, an intentional modification of the subject’s appearance seems a necessary condition for caricature, since a picture that did not modify in any relevant way the appearance of its subject would not be considered a caricature.Footnote 1 The challenge, then, lies in defining the nature of this intentional modification. Later on (Sect. 3), I propose that it is a principled modification defined by the set of properties it applies to and a precise directionality—I call this mode of depiction pictorial exaggeration. But for the time being, we can rely on an intuitive, and theoretically more neutral, notion of caricature that only implies an intentional modification of some relevant features of a given subject.Footnote 2

Now, the fact that an intentional modification is at the core of caricature raises an interesting problem: is any subject compatible with the modification enacted by caricature? Are there inherent limits to the caricaturable? If, on the one hand, it could be argued that every shape can in principle be modified as caricature typically does, on the other, one cannot but notice that caricatures of animals appear less frequently than those of people and that the thought of a caricature of a plant or a stone may perplex some. Of course, intuitions might partially diverge here. Thus, instead of deciding ex ante the extension of the caricaturable, a better strategy to tackle Q1 is to first sketch a taxonomy of different cases of caricature, and then examine the reasons behind borderline cases.

  1. Caricatures of particular individuals, such as movie stars, politicians, or our neighbor, are not perplexing at all. Portrait-caricature might indeed be said to constitute the most emblematic case of the genre (see Sherry, 1987). The downtown streets of every major city offer countless examples of pictures that fall into this category.

  2. Caricatures of groups of particular individuals, although less common, come under the same unproblematic category of portrait-caricature. Examples may include caricatures of royal families or rock bands—Gerald Scarfe drew fine caricatures of The Rolling Stones.

  3. Individual types and classes are also possible subjects of caricature. The history of the genre is replete with images that mock identifiable human types rather than specific people: the greedy, the dandy, the unscrupulous businessman, etc. Honoré Daumier’s prolific work includes caricatures of different classes, such as lawyers, bourgeois, teachers and students, and nowadays we commonly find caricatures of blue-collar workers on the front pages of newspapers.

  4. Caricatures of fictional characters constitute a more controversial but interesting case. Considering that fictional entities, such as those of myths and novels, lack sensible appearance, and that the modification operated by caricature needs a model of reference, caricature may obtain on two conditions: that a depiction of a fictional character be available (e.g., based on descriptions in a novel), and that such representation be taken as the iconographic standard for that character (Spinicci, 2009; for some discussion, see Maes, 2015). Daumier’s series Histoire ancienne includes depictions of characters from Greek mythology.

  5. The case of historical figures of whom we have at least a description of their physical appearance can be treated similarly to the former. In the event that paintings or sculptures are available, these can easily be taken as an iconographic standard for drawing caricatures.

There is a further case that is critical for this analysis. The history of the genre abounds with images, sometimes labeled as caricatures, others as cartoons, where the real protagonist appears to be the overall dramatic situation rather than the characters depicted therein (Sherry, 1987). Many visual satires by James Gillray are describable as complex, dynamic scenes that contain caricatural elements. Take, for instance, The Plumb-pudding in danger, one of the most famous political cartoons of all time. Napoleon and the British Prime Minister William Pitt the Younger sit across a dining table, both carving themselves a portion of the world in the shape of a steaming plum pudding. Interestingly for our purposes, only some of the elements are actually caricatured in this picture. Pictorial exaggeration is explicitly applied to the physiognomies (and expressions) of Napoleon and Pitt, while it is not applied to the chairs where they are seated or to the plates on the table. The latter elements are just drawn.Footnote 3 True, Gillray might simply not have intended to caricature every element in this image. But the crucial question is whether he could have done so. In other terms, can every entity in general be caricatured? Here is my view: (a) not everything can be caricatured, and (b) at least three general constraints on the caricaturable can be sketched, but (c) the concept of caricature extends well beyond human beings.

The first point can be made by considering some objects that seem either impossible to caricature or difficult to conceive as caricaturable. Geometrical figures, plane or solid, such as ellipses and decagons lean towards impossibility. Whatever transformation we apply to such figures, they will not appear as caricatures. If we flatten an ellipse a touch more to exaggerate its shape, we will not obtain a caricature, just as we will not recognize a caricature of a decagon in a dodecagon. These modifications merely lead to another (similar) figure that is not perceived as a caricature of the original figure. Caricatures of everyday objects such as boxes, plates, stones, or eggs are difficult to conceive, let alone to find in any collection of caricatures. And what about the sea and the sky? Even though they are easy to depict, as children’s drawings illustrate, they do not appear well-suited for caricature. Therefore, the concept of caricature cannot be indiscriminately applied to any object.

If so, what are the constraints on caricature? Here I outline three such constraints and, where useful, highlight their consistency with some of the cases of the above taxonomy. (A stronger case for these constraints will have to wait for the account of caricature put forward in Sect. 4.) The first and most general constraint concerns visibility: only what can be seen or has at least a visible manifestation can be caricatured. Phenomenologically, pictorial caricature involves a modification of the outward appearance of its subject, and thus it seems logical that the original model be visible.Footnote 4 Yet it is common belief that caricature can capture the character of its subject—hence something that lies beyond outward appearance. I consider that this intuition can be retained, but with some caution. Personality traits are not directly perceivable (as material entities are) but are nonetheless suggested by visible features—through one’s facial expressions, or posture, for instance. In Gillray’s cartoon, Napoleon’s facial features and expression are visibly exaggerated. However, the pictorial modification enacted by caricature does not go further than this; the transition from Napoleon’s goggle-eyed expression to his ambitious personality transcends the domain of the visible. Thus, his personality and appetite for power may not be directly subject to the pictorial modification operated by caricature, but they can be indirectly caricatured—and convincingly so—through the intentional modification of appropriate visual features, as well as by the symbolic cutting of the plum pudding. As will become clear later (Sect. 4), caricatural reference to non-visual features still requires the presence of some caricatured visual features. The constraint on visibility is coherent with the case of fictional characters; without an iconographic standard, which makes the appearance of a fictional character visible, caricature cannot take place.

The second constraint relates to the knowledge of the appearance of the depicted subject—be it a particular F or some, but no particular, F of a certain type. If the outward features of F are to appear as pictorially modified, one needs to be familiar with the unmodified appearance of F in the first place. This constraint can be formulated as follows: knowledge of the appearance of F is necessary to interpret a caricatural depiction of F.Footnote 5 In most cases, such knowledge is either already available to the viewer or easy to acquire. But there are cases in which knowledge of the appearance of F cannot be acquired, hence F-caricatures are not directly attainable, even though F is in principle a visible entity. This is consistent with historical figures of whom all traces about their appearance are lost to us, such as Homer, and also with extinct animal species. Nonetheless, caricatures could equally be attainable provided that a representation of the subject—compatible with our knowledge of that subject—be taken as the iconographic standard for that character.

Finally, there is a third constraint: the objects that present stable features (across their type) and whose features allow us to distinguish between objects of the same type are best suited for caricature. This twofold condition needs some unpacking. First, a high degree of variability is the enemy of caricature. If an object comes by default with highly variable features (as stones do), or if it has a protean appearance (as vegetation does), then such an object is likely to resist caricature. Second, caricature nevertheless requires some degree of variability (of its subject’s features) that allows for distinction. For instance, chicken eggs do have stable features, but since the only feature that varies is size, they are scarcely distinguishable from one another; something similar can be said for the geometric figures mentioned earlier. A fuller explanation of this constraint will be set out in Sect. 4, where the reasons for the distinctiveness of the subject’s features are discussed. For now, we can consolidate it by noting that faces—the common denominator in the above taxonomy—do comply with these requirements. Faces indeed present stable features (two eyes, one mouth, etc.), and the extent to which facial features vary is the extent to which faces are distinguishable.

Even if the above constraints set some limits on the scope of caricature, we should refrain from narrowing down the concept excessively. Caricaturing entities other than humans is indeed a genuine possibility.Footnote 6 Caricatures of animals are not hard to come by (see Hopkins, 1998, pp. 98–99; Hultgren, 1993). Dogs, for instance, can be subject to caricature: there are pictures of German shepherds that convincingly exaggerate their big upright ears and long, square-cut snout, and others of greyhounds where their slim, curved silhouette is emphasized. We may also think that many comic book characters and animated cartoons caricature the animals that figure in them—as in the case of the characters from the Looney Tunes. There is no shortage of caricatured subjects in the realm of inanimate objects either. Especially during the eighteenth century, fashion was a focal target: poke bonnets, crinolines, tight corsets, high collars, and all manner of impractical hat styles were caricatured (Lucie-Smith, 1981; McPhee & Orenstein, 2011). Further examples may come from architecture: I have come across pictures that make the Tower of Pisa immediately recognizable through the exaggeration of its iconic lean and tapering structure. In consequence, there seems to be no non-stipulative reason to exclude these cases, and potentially all the cases that comply with the constraints outlined herein, from the territory of caricature.

At this point, the limits of the caricaturable should be clearer. Yet more needs to be said on the preferences of the genre for certain subjects. As noted at the outset, humans are indisputably the canonical victims of caricature. Why, then, of all the caricaturable entities, do people, and especially their faces, occupy this prominent position?

2 Faces

That caricature necessarily entails a modification of the appearance of its subject is but part of the story, and perhaps the less succulent part too. As noted, the intentional aspect of the modification enacted by caricature is also a key feature of the genre. Here, “intentionality” has two possible meanings. The first concerns the fact that caricature is about something, that it refers to a certain subject, and this compels us to clarify how such reference is possible despite the obvious modification of its appearance (Sect. 4). The second concerns the intentions of the caricature’s maker. Not all pictorial surfaces mirror an intentional activity. Nature sometimes creates meaningful visual patterns that work like images, but they just so happen to be “images made by chance” (Janson, 1961). Caricatures, by contrast, are immediately recognizable as artifacts, and they typically convey specific communicative intentions (e.g., poking fun at one’s fiery temper, or lampooning a politician’s agenda). In this respect, humor, broadly construed, is arguably the most important feature of caricature, to the extent that anyone browsing through a crossword collection has a good chance of reading “a comical portrait” as a clue for caricature. In addition, as noted earlier, the ability to reveal one’s character traits is also part of the common-sense notion of caricature. However, it is worth noting that neither of these aspects is strictly necessary (Perkins, 1975), for a caricature that is neither comical nor character-revealing is indeed conceivable.Footnote 7 The significance of these features concerns instead what caricature can be used for and what it is normally used for. Thus, humor and character-revelation can be better understood as desiderata; they deal with the intentional component of the modification enacted by caricature.

On these premises, it is easier to see why caricature privileges certain subjects. Although everyday objects and animals are compatible with caricature, they could scarcely meet its desiderata. If someone were to claim that they had drawn a pungent satire of a dog, or captured on paper the personality of a tower, such claims would likely be welcomed with skepticism. By contrast, the same claims would match our expectations if the subjects were members of the political class. In this regard, caricature is akin to gossip, which is typically “about one person or a group of people, but not about other things like plants, dogs, or houses” (Robinson, 2016, p. 198). If we do not know who the author of a scandalous fact is, or if that fact is too remote from our personal lives, interest tends to wane. While depiction in general lends itself to a wide variety of uses and subjects, depiction by caricature is especially apt for mocking people and their foibles.

At this point, then, we need to understand why caricature, besides having a clear preference for (mocking) people, further shows a preference for depicting their faces. That this is the case, and that not all features of a person’s appearance have the same weight in caricature, is easy to notice. Colors are often omitted—many caricatures are in black and white. The proportions between the size of the head, the trunk, and the lower body are frequently overturned, with the head being disproportionately large. Many caricatures are just half-bust, and others just from the shoulders up. Now, my claim is that the importance of faces in caricature, far from being an arbitrary feature, directly hinges on visual recognition. As noted in Sect. 1, caricature needs to trigger the visual recognition of its subject—faces, in this respect, prove to be the optimal trigger. Several psychological studies on face perception and identity recognition support this point. However, I intend to proceed on a descriptive basis, showing first the essential bond between faces and ordinary visual recognition of individuals, and then how this bond becomes even tighter in pictorial recognition for specific phenomenological reasons.

In real-life scenarios, we have different means of attaining visual recognition of individuals. We may recognize a person by looking at their clothing, gait, posture, and, most obviously, face—under normal circumstances, visual recognition can rely on any combination of these aspects. However, under certain conditions of visibility, some aspects may become more relevant than others. For instance, if we see someone from afar and cannot rely on accurate face perception, gait may be a decisive factor for visual recognition (Cutting & Kozlowski, 1977; Stevenage et al., 1999). People indeed have distinctive ways of carrying themselves. However, recognition at a distance has a lower degree of reliability, as many court cases can teach us. When the viewer can choose what to see and how to see it, priority is normally given to face perception. It is by looking at a person’s face that we can be more confident of recognizing them and distinguishing them from others. The same operation would indeed prove tricky if we were to focus, say, on the arms or the back. Consider this common experience: we are lining up to buy tickets for a show and we seem to see from the back a friend a little further in the line. As we approach our friend, they turn and we see that, in fact, we have mistaken another person for them. Typically, if we are unsure whether a person who looks like F really is F, the decisive factor in determining so is F’s physiognomy. Put differently, looking at a person’s face, when compared to other visible features, seems to have the final say in ordinary visual recognition.

Similar considerations can be made for depicted people. Those pictures in which we recognize F by their facial features are privileged in terms of recognition (Maes, 2015; Giovannelli, 2020; for scientific evidence, see Burton et al., 1999; O’Toole et al., 2010), and this privilege directly depends on the specific phenomenological structure of picture perception. Let us consider these two, strictly related claims in order.

That the depiction of a person’s facial features provides optimal access to their visual identity is made evident by those kinds of pictures that are expressly designed for visual recognition: wanted posters and mug shots. Picturing the criminals is a matter of picturing their faces (Finn, 2009, p. 2). One could object that each individual nevertheless has their own physical peculiarities that can be equally apt for recognition. For, after all, we may well identify F by, say, F’s big hands (supposing that F is familiar to us). However, if someone gave me a picture of F’s hands and said “Here is a picture of F”, I would probably raise an eyebrow. I would be willing to acknowledge that it is indeed a photograph of F’s hands, but that would not be enough to exclaim that I recognize F in the picture (Freeland, 2010, p. 231). In contrast, we would have no problem saying that we see F in a close-up photo or a caricature of F’s face: the depiction of F’s face allows us to recognize F in full, even though it offers only a partial view of F. If so, not all physical features are equally apt for pictorial recognition.

The last point invites some further observations on the phenomenological structure of the pictorial. There is a sense in which all that a picture says is final compared to face-to-face seeing. If we look at a picture that shows from behind a person who looks like F, there is no way to determine if the person depicted therein really is F. Whatever we see in a picture does not disclose further sides as we move around the pictorial surface (Nanay, 2010; Hopkins, 2012; for a recent review, see Ferretti, 2023). When approaching one of the edges of a picture, the lateral side of its frame can be brought into view, but our perspective on the pictorial scene remains constant throughout.Footnote 8 Therefore, whether recognition can succeed depends on what of a particular subject is actually visible, as it were, at a glance within the pictorial space. The sensorimotor exploration that is available in real life increases our possibilities of recognition, but pictorial recognition cannot count on this process. Moreover, other relevant information such as gait is simply not available in static pictures.

The fact that the depiction of a person’s face is the optimal source for visual recognition accounts for the most recurrent—and otherwise hard to explain—device in caricature drawing: the depiction of an oversized head on a comparatively tiny body (see McPhee & Orenstein, 2011, pp. 192‒195). The purpose of this caricatural device is seldom to preserve the proportions of the individual thus depicted.Footnote 9 Drawing a supersized head serves instead as a spotlight instructing the viewer what to focus on in the picture to properly play the game that caricatures invite us to play: recognizing a certain subject while entertaining visual awareness of the wondrous modification of its features.

The following subsection starts over from recognition, but this time to provide a measure of its flexibility. The upshot is the dismissal of the first assumption underlying the puzzle of caricature, that is, that the visual recognition of a certain subject requires the actual appearance of that subject.

2.1 Face-like patterns and flexible recognition

The relevance of faces in visual recognition of individuals is mirrored by the great extent to which our recognitional capacities are disposed to identify visual patterns as faces. Indeed, we manage to recognize subjects despite significant changes in their appearance. The flexibility of visual recognition is clearly at work in the phenomenon of pareidolia. As we let our gaze run over objects, textures, and surfaces in our surroundings, we sometimes happen to catch sight of familiar shapes. For instance, looking at the facade of a house, one sometimes cannot but see the general pattern of a face: the windows are the eyes, and the door stands for the mouth. This is a bizarre and unexpected visual synthesis, which nevertheless recurs in many circumstances. Leonardo da Vinci (1956) was well aware of this phenomenon, and in his writings he advises seeking artistic inspiration by looking “into the stains of walls, or the ashes of a fire, or clouds, or mud, or like things” (p. 51). Visual pareidolia does not occur only with respect to faces. Clouds, for instance, instead of appearing as mere whitish formations, can present us with shapes of various animals. There are few doubts, however, that visual pareidolia occurs especially with faces, whose general structure can indeed be detected from the scarcest amount of information (see Omer et al., 2019).Footnote 10 Indeed, not only can a configuration composed of two points and a line determine face pareidolia, but, depending on the tilt of the line, that face will also show a particular expression, such as indifference, or joy. The first emoticons composed solely of punctuation marks clearly show the scarcity of information required to see a face with a particular expression.

The flexibility of recognition is equally tangible in those cases where the appearance of an object is distorted rather than impoverished. Looking at the image of a friend in a deforming mirror or on the rippling surface of a pond, recognition can still be quite accurate. Pictorial representation in general—except perhaps life-sized illusionistic images—always introduces perceptible differences from the depicted subject that call on the flexibility of our recognitional capacities; just think of cubist portraiture. And, of course, our recognitional skills are tolerant of altered appearances in real-life scenarios too. When someone undergoes a significant weight loss due to a strict diet, or their appearance is altered due to an allergic reaction, recognition still works despite the alteration of their figure. Visual recognition also survives ageing, to a certain extent (Lopes, 1996, pp. 138–139).

All these cases show the extent to which recognition is flexible. Thus, they can be used to reject P1—the assumption that the visual recognition of a certain subject requires the actual appearance of that subject. Once we acknowledge that the flexibility of recognition is the rule rather than the exception, caricature recognition becomes less puzzling. However, it is important to stress that caricature is different from pareidolia in that it is not an imprecise, random pattern, but a refined one, specifically devised for depiction. And it is also different from the kind of images obtained through deforming mirrors and other distorting surfaces. These latter act on any subject in the same way, mechanically, and thus inevitably dispel, at least in part, the likeness and misrepresent relevant features. On the other hand, the intentional modification enacted by caricature does not: pictorial exaggeration manages to preserve the likeness and does not per se amount to misrepresentation. How this is possible will be the focus of the following sections.

3 The workings of pictorial exaggeration

Caricature clearly departs from linear perspective. Pictorial exaggeration does not aim to establish a point-to-point correspondence between the pictorial surface and the depicted subject (Hagen, 1974). But so too do many other modes of depiction—cubist painting, to name one. What, then, does pictorial exaggeration consist in? And how can a caricature succeed at depicting its subject?Footnote 11

An initial naïve answer, which makes exaggeration a magnification in the strict sense, must be set aside. Although what is visually most striking may be the enlargement (or the shrinking) typically staged by caricature, this is not the criterion that pictorial exaggeration embodies. If that were the case, we would be forced to count the giant inhabitants of Brobdingnag and the tiny men of Lilliput from Gulliver’s Travels as caricatures (Rosenkranz, 2015, p. 234). Likewise, when Alice outgrows the White Rabbit’s house, she merely appears much bigger than she normally is, but she does not become a caricature of herself; indeed, we do not perceive her as a caricature in the book illustration. Borrowing a mathematical concept, we can thus say that caricature is not a homothety, that is, a transformation that homogeneously expands (or contracts) all the parts of a figure. Nor is caricature a non-homogeneous expansion (or contraction) of a figure—namely, a radial distortion. Otherwise, caricatures could simply be obtained from the distorted reflections displayed by spoons, or by applying image-altering effects that, for instance, dent or bulge a photo portrait. However, the distorted images reflected by spoons and the like turn out to be insightful for the present analysis. True, radial distortions do not give us caricatures and they often dispel the overall likeness, but when we look at a friend’s face edited with, say, a bulging filter, we may find that the result comes close to caricature. Besides the fact that faces thus distorted may be funny to look at, radial distortions may also be able to highlight a distinctive feature of a certain physiognomy, making it evident, for instance, that a face has a convex conformation rather than concave. In that case, these images do hint at how a caricature of a certain subject should be drawn.
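
The contrast with homothety can be made concrete with a small sketch. The landmark coordinates and function names below are hypothetical illustrations, not drawn from the sources cited here. The only point is that uniform scaling leaves every proportion of a figure intact, which is why a giant version of a face is not a caricature of it.

```python
from itertools import combinations
from math import dist, isclose


def homothety(points, center, k):
    """Scale every point about `center` by the same factor k."""
    cx, cy = center
    return [(cx + k * (x - cx), cy + k * (y - cy)) for x, y in points]


def distance_ratios(points):
    """Each pairwise distance divided by the first pairwise distance."""
    d = [dist(p, q) for p, q in combinations(points, 2)]
    return [v / d[0] for v in d]


# Hypothetical 2D landmarks (left eye, right eye, nose tip, chin).
face = [(-1.0, 1.0), (1.0, 1.0), (0.0, 0.0), (0.0, -2.0)]

# A "Brobdingnag" version of the same face: three times larger everywhere.
giant = homothety(face, center=(0.0, 0.0), k=3.0)

# Every internal proportion is unchanged, so no feature is singled out.
same_shape = all(
    isclose(a, b) for a, b in zip(distance_ratios(face), distance_ratios(giant))
)
print(same_shape)
```

Because the ratios between all landmark distances are preserved, nothing in the enlarged figure stands out more than it did in the original; the transformation changes size, not physiognomy.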

The lesson to draw from these considerations is that pictorial exaggeration is not applied to all the features of the subject but only to some. This is consistent with the common intuition that caricature aims at distinctive features. The notion of distinctive features, though, immediately raises further questions: how do we single out such features? What does their distinctiveness depend on? Faced with such problems, Ross (1974) argues that the very notion of distinctive features proves incoherent. Supposedly, distinctive features are those features that allow us to tell one person from another. However, single distinctive features taken in isolation cannot identify anyone, for any number of people can have, say, similar brown close-set eyes. In order for a feature to be distinctive it needs to be “viewed as part of a face, seen in the context of surrounding features” (Ross, 1974, p. 285). But if this is true, then the notion of distinctive features “reduces to the empty fact that what is distinctive about a person’s face is his face” (Ross, 1974, p. 285). This argument, however, is far from conclusive. For one thing, it conflates identification and distinctiveness; while it may be true that a single feature does not identify anyone, this by no means implies that a single feature cannot be distinctive. For another, even if identification requires more than a single feature, there is no need to conclude that what is distinctive about a face is the entire face itself. What is true is that the notion of distinctive features needs refining to address the above issues and be used to account for caricature.

Looking at a face is never a neutral experience. We always form an impression that we can in principle translate into precise terms. For instance, a person’s lips may appear quite full, their nose may be described as aquiline, their eyes as having a downward slant, and so forth. In our everyday life, we do not stop to analyze the physiognomies of the people we encounter, and our perceptions do not necessarily translate into accurate descriptions. Nevertheless, it is possible, at least in principle, to express what stands out about a face. A skilled caricaturist succeeds precisely in this: translating their scrupulous perception of a physiognomy into a depictive grammar. Indeed, handbooks of caricature drawing abound in figurative vocabularies that collect and classify different types of noses, profile shapes, or even distance-ratios between facial features (see Grose, 1791). A good caricature is a picture that is able to grasp the tendencies that configure one’s physiognomy and to develop them so as to make them manifest to everyone: full lips are made fuller, an aquiline nose gains an even stronger curve, downward-slanted eyes are depicted with a greater slant, and so on. Therefore, caricature centers on those features that have a visible tendency (henceforth VT) and exaggerates that tendency (Perkins, 1975; Rhodes, 1996, p. 16).

Does VT specify the notion of distinctive features so as to address the above concerns? First, VT implies that caricature operates a selection of the features to exaggerate and specifies what features are sought-after by this selection—precisely those that present a visible tendency. True, one single VT taken in isolation will not identify anyone, but caricatures flaunt many features. And not all features of one’s face stand out (i.e., are VTs), which means that not all features need to be exaggerated in a caricature. Therefore, pace Ross, VT does not lead to the conclusion that what is distinctive of someone’s face are all the features of that face, and hence the face itself.Footnote 12 Second, VT specifies the direction that pictorial exaggeration must take to achieve a caricatural depiction. If a person’s eyes appear big and downward-slanted, then pictorial exaggeration will move along such VTs making their eyes bigger and with a greater downward-slant. The subject’s VTs, then, indicate the kind of modifications compatible with the process of caricature. When this aspect is ignored, the recognizability of the depicted subject risks being hindered. This occurs especially when a physiognomic modification conflicts with the original trait. For instance, a long face is compatible with further elongation (within certain limits), but it would be denatured if it were rounded instead: length and roundness are incompatible attributes in caricature drawing. Evidence shows that inverting the direction of the subject’s VTs—a process called “anticaricature”—lessens the likeness and recognizability (Rhodes, 1996, Chap. 6).

The characteristics of the notion of VT analyzed herein are justified on a phenomenological basis—they show up in our experience. Yet appeal to our immediate experience is not enough to understand how a feature comes to appear with a certain tendency, or, to put it differently, how a feature becomes distinctive. To account for this, we must look deeper into the genetic constitution of a VT.

4 The deep structure of depiction by caricature

A tendency is such only against a general direction, a norm from which it deviates, thus standing out from what falls within the norm (Arnheim, 1983). So, what is the norm to which VTs refer? As already noted, looking at a face is never a neutral experience since we always form some (usually tacit) impression about a person’s appearance. If one’s face looks particularly long, the norm against which it appears long cannot be that very face. It would not make any sense to claim that a face is long in itself. In other words, the phenomenal relief enjoyed by those features that a caricature selectively exaggerates cannot find its reason in the particular face that hosts them, nor in any other particular face. True, we do happen to compare particular faces, sometimes searching for differences and similarities. Contrasting the photos of two siblings, for instance, we may find that F’s cheekbones look very high compared to G’s. However, G’s physiognomic features cannot be the reason that makes F’s features look a certain way, for F’s features look as they do before any specific comparison has been made by the viewer; after all, one may just not know G’s appearance and see that F’s cheekbones are indeed high. What lends a phenomenal relief to the features of a certain subject is rather the whole immemorial sequence of subjects we are normally exposed to and acquainted with in a given environment. The members of such sequence constitute the scale of variations of the physiognomic traits, and those traits that approximate most to the middle of the scale configure an implicit norm of reference. Full lips can appear as such only against a standard of lips that are neither thin nor full, but average. Given this, VTs are relational properties in that their distinctiveness refers back to the viewer’s acquaintance with an implicit norm.

Caricature therefore presents a deep structure of depiction based on two components:

  1. R1,

    an explicit referent, and

  2. R2,

    an implicit norm of reference.

R1 is the depicted subject, that is, the subject we immediately recognize looking at a caricature; pictorial exaggeration applies to the VTs of R1 following the direction they indicate. The VTs of R1, in turn, draw their phenomenal relief from R2, a scale of the variations of the physiognomic traits. We are usually receptive to the distinctive features of faces, but the norm that grounds their distinctiveness remains implicit; it does not show up in our experience.
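The relation between R1 and R2 can be illustrated with a toy numerical sketch. It is not a model proposed in this paper: the feature names, the measurements, the per-feature `spread`, the `threshold`, and the `gain` are all hypothetical choices made for illustration. The sketch treats a VT as a deviation of R1 from R2 that is large enough to stand out, and exaggerates only those deviations, along their own direction.

```python
def visible_tendencies(subject, norm, spread, threshold=1.0):
    """Return the standardized deviations from the norm that stand out.

    A positive value means the feature exceeds the norm, a negative value
    means it falls short of it. Features close to the norm are left out.
    """
    tendencies = {}
    for name, value in subject.items():
        z = (value - norm[name]) / spread[name]
        if abs(z) >= threshold:
            tendencies[name] = z
    return tendencies


def caricature(subject, norm, spread, gain=1.5, threshold=1.0):
    """Exaggerate only the features that have a visible tendency.

    gain > 1 pushes each VT further away from the norm (caricature);
    a negative gain inverts its direction (anticaricature in the text's sense).
    """
    result = dict(subject)  # features without a VT are copied unchanged
    for name in visible_tendencies(subject, norm, spread, threshold):
        deviation = subject[name] - norm[name]
        result[name] = norm[name] + gain * deviation
    return result


# R2: a hypothetical implicit norm and the typical spread around it.
r2_norm = {"lip_fullness": 10.0, "nose_curvature": 5.0, "eye_spacing": 30.0}
r2_spread = {"lip_fullness": 2.0, "nose_curvature": 1.0, "eye_spacing": 3.0}

# R1: a hypothetical subject with full lips and close-set eyes.
r1_subject = {"lip_fullness": 13.0, "nose_curvature": 5.2, "eye_spacing": 26.0}

vts = visible_tendencies(r1_subject, r2_norm, r2_spread)
exaggerated = caricature(r1_subject, r2_norm, r2_spread)
print(sorted(vts))
print(exaggerated)
```

In this sketch the nose is left untouched because it lies close to the norm, while the lips become fuller and the eyes closer-set. Without `r2_norm` the function has nothing to measure deviations against, which mirrors the claim that R2 grounds the distinctiveness of the features of R1.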

In light of the deep structure of depiction exploited by caricature, the genetic constitution of the subject’s VTs through R2 turns out to be a condition of possibility of caricature. R2 originates from the viewer’s experiences. It works as a dynamic background without firm boundaries, which does not solidify in a definite representation. Nonetheless, illustrations that exemplify R2 can be artificially created (Langlois & Roggman, 1990). Such images are usually constructed by averaging several faces of people of the same gender and of a similar age. The result can be described as a neutral face, whose features do not stand out; in other words, such faces lack VTs.Footnote 13 While an average face results from the particular faces that have been morphed together at a given time, R2 depends, for each perceiver, on the people they have been exposed to in their environment.Footnote 14 Therefore, it seems a fair expectation that people exposed to the same environment have corresponding norms of reference, whereas a person exposed to a different environment will have a norm of reference that bears some differences (within certain limits).
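The dependence of R2 on the viewer's environment can be made concrete with a small sketch. It assumes, purely for illustration, that a norm can be approximated by averaging a single measured feature over the faces a viewer has encountered; the function name, the feature, and the numbers are hypothetical.

```python
from statistics import mean


def norm_from_exposure(faces):
    """Average each feature over the faces a viewer has been exposed to."""
    names = faces[0].keys()
    return {name: mean(face[name] for face in faces) for name in names}


# Two hypothetical environments with different typical face lengths.
environment_a = [{"face_length": 18.0}, {"face_length": 19.0}, {"face_length": 20.0}]
environment_b = [{"face_length": 21.0}, {"face_length": 22.0}, {"face_length": 23.0}]

norm_a = norm_from_exposure(environment_a)
norm_b = norm_from_exposure(environment_b)

# The same subject deviates in opposite directions from the two norms.
subject_f = {"face_length": 20.5}
deviation_a = subject_f["face_length"] - norm_a["face_length"]
deviation_b = subject_f["face_length"] - norm_b["face_length"]
print(deviation_a, deviation_b)
```

On these toy numbers, F's face looks long to a viewer from the first environment and short to a viewer from the second, although the face itself has not changed. The numbers are chosen to make the contrast vivid; the text only expects norms from different environments to differ within certain limits.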

Suppose there exists an alien population with perceptual and recognitional skills like ours and that someone willing to test the account presented above sent them a caricature of a terrestrial being, F. Could the aliens see the picture as a caricature? They would have no problem recognizing the object as a picture since they have, by hypothesis, perceptual capacities like ours (and we are typically able to see pictures). However, since they are not acquainted with the human form, they would not be able to see that representation as a pictorial exaggeration of a human. Suppose further they receive a photograph of F and that they are asked to decide which of the two pictures exaggerates F’s features. Again, they do have the means to notice a certain likeness between the two pictorial subjects and to understand that some figural modification could reshape one into the other, and yet they could not say on which side the exaggeration stands. Revealing to them which picture is the caricature, and how caricatures are generally produced, would not help them to see it as a caricature. And finding a full inventory of photographs of F floating in space could only help them refine their notion of F’s (non-exaggerated) features. Having a single instance of a terrestrial may be enough to recognize another terrestrial if they were to see one, but it is not sufficient to see which features of F stand out (qua particular terrestrial). The only thing the alien population still lacks is a norm of reference against which F’s features can stand out, that is, obtain phenomenal relief. A quick trip to Earth would make for the acquisition of the relevant norms; at that point, F’s mouth will look, say, quite large, as far as mouths go, F’s eyes quite close-set, and so forth. From now on, the aliens would also be able to create caricatures on their own.

The deep structure of depiction by caricature allows us to account for the constraints introduced in Sect. 1. The constraint on visibility is straightforwardly accommodated: the subject’s VTs (the features to which pictorial exaggeration applies) are relational properties with a visual nature. Moreover, the acquisition of R2, the norm needed for a feature to become a VT, depends on visual experience.

Recall the second constraint: knowledge of the appearance of F is necessary to interpret a caricatural depiction of F (whether it be a particular F or some, but no particular, F of a certain type). The deep structure of caricature allows us to refine and justify this constraint. Not only does the viewer need to know the (non-exaggerated) features of F, but they must also have acquired the norm that makes the features of F distinctive (R1 and R2, respectively). As the alien argument illustrates, knowledge of the appearance of F alone is not sufficient to see a picture of F as a caricature: without acquiring the norm that makes the features of F distinctive, the pictorial exaggeration of F will not be perceived qua exaggeration, and therefore the picture will not appear as a caricature.Footnote 15

The third constraint concerns stability and distinguishability. Lack of stability in features prevents the forming of norms, and without norms the features of an object cannot become VTs. Rocks, as noted, do not present stable features, except perhaps being more convex than not. In consequence, the features of rocks do not normally possess any particular phenomenal relief; they do not stand out. (If I judge the rock that I hold in my hands to be small, such judgment is usually made in comparison with my hands, or my body in general, not in relation to a norm specific to rocks.) Faces, by contrast, have stable features—two eyes, one mouth, etc.—that vary within precise limits, thus allowing the constitution of norms. Stability, however, must be coupled with distinguishability. For a minimum degree of variability between the features of the objects of a certain type is precisely what enables a feature to deviate from the norm and hence become a VT. Chicken eggs, for example, do comply with the requirement of stability since they all have the same elliptical shape slightly squished at one end. However, this is the only relevant feature of their aspect, and as a result, eggs are scarcely distinguishable one from another; each egg is phenomenologically equal to the average, and so there cannot be appreciable deviations from the norm.
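The interplay of stability and distinguishability can be sketched numerically. This is a rough illustration under stated assumptions, not a test proposed in this paper: stability is approximated by the features shared by every sample of a type, distinguishability by a minimum relative spread of each shared feature, and the `min_spread` value and all measurements are hypothetical.

```python
from statistics import pstdev


def shared_features(samples):
    """Stability: the features that every sample of the type exhibits."""
    common = set(samples[0])
    for sample in samples[1:]:
        common &= set(sample)
    return common


def features_open_to_vts(samples, min_spread=0.05):
    """Shared features whose relative spread allows noticeable deviations."""
    usable = set()
    for name in shared_features(samples):
        values = [sample[name] for sample in samples]
        average = sum(values) / len(values)
        # Distinguishability: the feature must vary enough around its average.
        if average != 0 and pstdev(values) / abs(average) >= min_spread:
            usable.add(name)
    return usable


faces = [
    {"eye_spacing": 30.0, "mouth_width": 50.0},
    {"eye_spacing": 26.0, "mouth_width": 56.0},
    {"eye_spacing": 34.0, "mouth_width": 44.0},
]
eggs = [{"elongation": 1.30}, {"elongation": 1.31}, {"elongation": 1.29}]
rocks = [{"flatness": 0.2}, {"jaggedness": 0.9}, {"holes": 3.0}]

print(features_open_to_vts(faces))
print(features_open_to_vts(eggs))
print(features_open_to_vts(rocks))
```

On these toy data, faces pass both checks, eggs share a feature that barely varies, and rocks share no feature at all, so only faces offer features that can deviate from a norm.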

5 Deflating the paradox of caricature: against misrepresentation

Misrepresenting a given subject is usually understood as attributing to it properties it does not actually possess (Lopes, 1996, p. 4; Hopkins, 1998, p. 96; Abell, 2009; Voltolini, 2021). Now, recall P2: caricature misrepresents the appearance of its subject. If we accept P2 and couple it with the fact that caricature usually prompts an accurate recognition, then depiction by caricature presents a paradox. But should we accept P2?

The first thing to note is that the very possibility of falsely representing implies the possibility of accurately representing (or maybe the other way around). And indeed, there are modes of depiction that are usually regarded as accurate, such as photography and projective drawing. Consider, however, a black and white photograph and a drawing of the Tower of Pisa: the former is silent about colors, and the latter may omit some properties, such as the exact number of columns or floors. Yet neither of these modes of depiction is usually regarded as misrepresentation. Considering that caricature too happens to omit colors and details, this suggests that if caricature is a type of misrepresentation, it must be due to the modification it applies to its subject. And so, does pictorial exaggeration amount to misrepresentation? The more intuitive answer is that it surely does (P2). But I will defend instead the less intuitive answer: pictorial exaggeration per se does not amount to misrepresentation. Let us consider these options in order.

A photograph of a skyscraper displayed on a mobile phone necessarily reduces its size, but we do not think that the skyscraper is misrepresented. Indeed, we see that its proportions are faithfully preserved. Photography does precisely that: it respects the proportions of its subjects. And caricature does exactly the opposite: it alters, through pictorial exaggeration, the proportions of its subjects. So, if we keep to the definition of misrepresentation, then caricature does misrepresent its subject because it attributes incorrect proportions. This is hard to deny. And yet this is not a very satisfactory answer. For after all, it does not do justice to our typical reactions to successful caricatures, such as “It’s really her!” or “He really looks like this!”. The viewer feels that a successful caricature grasps the visual essence of its victim, sometimes even more than a photograph does (Sartre, 2004, p. 17). Perhaps attributing incorrect proportions is not important enough to be the discriminating factor between accurately representing and misrepresenting, and perhaps there is a sense in which the proportions of a caricatured subject are not so incorrect. Just as we condone the reduction of size in the case of photography (and the omission of color in the case of black and white photography), could we not condone the exaggeration of proportions enacted by caricature?

In the previous section, I argued that caricature preserves the likeness by selecting the VTs of its subject and exaggerating them along the direction they themselves indicate. Instead of depicting features that would invert or contravene the direction of the VTs of the subject, pictorial exaggeration affirms them and offers a persuasive visual synthesis of the subject’s appearance. In this way, the exaggerated features can remain phenomenologically consistent with the original ones. Caricature does indeed modify proportions, but in doing so it visually discloses what we knew only implicitly by virtue of our acquaintance with the relevant norms: “That person does indeed have a long nose”, which is to say, a nose that is long relative to the norm of reference (R2). By playing with the relationship between the subject’s distinctive features and the norms that make such features distinctive, caricature is able to show even deeper truths than other modes of depiction. In this light, caricature turns out to be quite informative and faithful to the visual properties that its subject has, even without being informative and faithful in the same way that projective modes of depiction are.Footnote 16

5.1 Some objections and replies

A possible objection, one that would reinstate caricature as misrepresentation, might come from those caricatures that represent particular individuals combined with animals or objects (Caldarola & Plebani, 2016; see also Hopkins, 1998, Chap. 5). There are at least two cases in which this can be done. Consider first the well-known caricatures of King Louis Philippe as a pear. These pictures exploit a noticeable similarity: the shape of the pear, with a round and wide base and a progressively narrowing structure, recalls the shape of Louis Philippe’s head, and especially his round and sagging jawline. The features of the former, then, are compatible with those of the latter. The figure of the pear suggests the appropriate direction for the exaggeration of the VTs of the king’s head.Footnote 17 Therefore, in this case, the workings of caricature do not amount to false description.

Consider now the case—probably less common—of caricatures that combine subjects whose features do not share relevant similarities. Pigs and politicians, for instance, are often mingled together to convey certain ideas, and this is usually done regardless of their visual appearance. Images like these seem then to imply that caricature does employ misrepresentation, given that their subjects are indeed represented with features that they do not in fact enjoy. Perhaps, though, some further consideration can help disentangle this intricate type of representation. These pictures usually show composite physiognomies in which some features are exaggerated in a way that is consistent with the VTs of a given subject, F, while other features are exaggerated in a way consistent with those of another subject, G. Presumably, then, the former are responsible for the representation of F, and the latter for the representation of G. If so, the way pictorial reference is carried out for each subject does not entail a false description of the features of F included in the image, nor of the features of G included in the image. It is important to stress that the depiction of F and the depiction of G are gained separately, despite the fact that they are combined in the image. So, if pictorial exaggeration does not per se amount to misrepresentation, then the potential conclusion that the depicted politician is a pig is not directly due to the way depiction by caricature works. (Arguably, in this case, misrepresentation lies on the side of the combination of features belonging to different subjects.) What is true is that caricature—perhaps more than other forms of depiction—is frequently exploited in the service of misrepresentation (see Mag Uidhir, 2013). Yet again, this is not enough to equate caricature and misrepresentation.

A further point supports this view. The very fact that some caricatures combine subjects whose features do not share VTs has a significant consequence: the recognizability of their subject is usually lesser than the recognizability of subjects individually depicted. In line with this point, Perkins (1975) observes that, when a caricature contra-indicates one or more of the key features of an individual, its recognizability is compromised; Goldman and Hagen (1978) offer an experimental confirmation of Perkins’ study. Thus, if caricature is used in the service of misrepresentation, this is done at the expense of recognition. In other words, the more a caricature shuns misrepresentation, the more it succeeds (recognitionally). This inversely proportional relationship between misrepresentation and recognition further shows that P2—the assumption that caricature misrepresents its subject—is unwarranted, and thus that the paradox of caricature rests on unstable premises.

In light of these considerations, we can draw a more precise line between caricature and the distortive images previously encountered. Funhouse mirrors, for instance, produce gross distortions that tend to dispel the overall likeness of the subject they reflect. Unlike caricature, this kind of distortion is operated mechanically: it does not take into consideration what is distinctive about its subject and the direction indicated by its VTs. Recognition is, at least partially, compromised. Pictorial misrepresentation occurs.

A final objection could jeopardize the scope of the account of caricature proposed in this paper. I defended the claim that exaggeration of the outward appearance of a subject marks caricature as a specific mode of depiction. Yet this is not uncontroversial. Caldarola and Plebani (2016) distinguish between hyperbolic caricature, which does fit my account, and metaphor-like caricature, whose workings would be independent of the pictorial exaggeration of the outward appearance of its subject. To support their view on metaphor-like caricature, they analyze a picture by Steve Bell depicting George W. Bush as a monkey and argue that we can understand this picture as a caricature insofar as we are interested in focusing on certain of Bush’s intellectual or behavioral properties, which are typically attributed to monkeys (p. 412). On their account, metaphor-like caricature misrepresents the visual appearance of its subject but conveys correct information about some of its non-visual features.

I disagree: metaphor-like caricature too makes substantial use of the exaggeration of the visual appearance of its subjects. The monkey-like depiction of Bush—to stick to the authors’ example—can be analyzed along the lines of those caricatures that combine different subjects drawing on the noticeable similarities of their visual features. Indeed, the author of the caricature himself states that Bush’s appearance presents exquisite chimp-like features, such as close-set eyes, the distinct pout formed by his mouth, and his posture (Bell, as cited in Benson, 2021, p. 133).Footnote 18 Paralleling the case of the pear-like depiction of Louis Philippe, certain visual features of our forebears that show similarities to Bush’s appearance are intentionally used to exaggerate the VTs of the latter.

Now, I am far from denying that this caricature lampoons certain intellectual (non-visual) properties of the US president, and that properties of this kind are often the main target of political cartoons. However, I have shown that this type of caricature also employs pictorial exaggeration, in the way described herein. This view has an explanatory advantage. It reconciles metaphor-like caricature with hyperbolic caricature: both are based on the same depictive mechanisms, and the former is a more complex elaboration of the latter. Indeed, the appreciation of a caricature may be open to different levels of understanding. Those who are not familiar with Bush’s intellectual properties can still appreciate the caricature of his outward appearance, and those who have a broader knowledge of his character can surely appreciate the further meanings of the image.