Introduction
It would be exaggerating to say that our relationship is hostile; I live, I let myself live, so that Borges can weave his literature and that literature justifies me. ... I don't know which of us is writing this page.
--Jorge Luis Borges
   At his faint chuckle she turned and faced her once-beloved uncle. Unceremoniously she ripped the papers from the pocket of his Hawaiian shirt as he nervously backed away toward the hotel room door, and with unmitigated disgust at both his blubber and his duplicity she hissed, "Twenty-two point eight percent of all bankruptcies filed between July 1995 and June 1997 were attributed to bad legal advice, up nine point two percent over the last biennial period."
   "I did the best I could," the 273-pound man answered faintly. He was desperate to avoid further rousing his enraged niece, who despite her lithe figure, 113 pounds, and angelic face was capable of inflicting severe damage. Once safely in the hallway, however, he took heart and offered, "A meta-analysis of several studies suggests that fewer than forty percent of legal malpractice cases are due to malicious intent, the balance to simple incompetence." At this she lunged at him, tearing into his thick neck with strong, sharp fingers and ripping the shirt from his bloodied back.
   As this vignette shows, stories we tell in everyday life often coexist uncomfortably with statistics of supposed relevance, even when the two do not ostensibly contradict one another. Our stories are filled with people who do things out of desire, fear--and possibly an unnatural love of rigatoni. Each particular circumstance and situation looms large in every description. In statistics, however, there are rarely agents; only demographics, general laws, processes. Particularities and details are usually dismissed as unimportant.
   The disjunction between narratives and numbers ranges from the commonplace--mistaking a correlation for a causal connection--to the abstruse. One recent, unusual instance of the collision between our desire for comprehensible stories and a simultaneous attraction to impersonal statistics is provided by the Bible codes phenomenon. The craze began when Eliyahu Rips and two other Israeli mathematicians published a paper in a statistical journal that seemed to suggest that the Torah--the first five books of the Bible--contained many so-called equidistant letter sequences, or ELS, that pointed to significant relations among people, events, and dates.
   An ELS is a sequence of letters (Hebrew in this case) each separated from the previous one by a fixed interval of other letters. The words of the text are run together and the spaces between them ignored. Thus the English word generalization contains an ELS for "Nazi"--geNerAliZatIon--in which the fixed interval is only of length 3. (Commonly, ELS intervals are much longer--say the 23rd, 46th, 69th, 92nd letter, and so on, after some initial letter.) The paper found that ELSs of (some variants of) the names of famous rabbis who lived centuries after biblical times and ELSs of their birthdays were often close together in the Torah text, and that the probability of this was minuscule.
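For readers who like to see such things done mechanically, the search for an ELS can be sketched in a few lines of Python. This is only an illustration, not the procedure of the paper mentioned above: the function name find_els, its max_skip parameter, and the convention of counting letter positions from zero are all assumptions made for the example.

```python
def find_els(text, word, max_skip=50):
    """Return (start, skip) pairs at which `word` occurs as an
    equidistant letter sequence in `text` (assumes `word` is non-empty)."""
    # Run the words together: keep letters only, ignoring spaces and punctuation.
    letters = [c for c in text.lower() if c.isalpha()]
    word = word.lower()
    hits = []
    for skip in range(1, max_skip + 1):
        # Distance from the first to the last letter of the sequence.
        span = skip * (len(word) - 1)
        for start in range(len(letters) - span):
            if all(letters[start + i * skip] == ch for i, ch in enumerate(word)):
                hits.append((start, skip))
    return hits


print(find_els("generalization", "nazi"))
```

Run on the word generalization, the sketch reports a single hit: a start at the third letter (position 2 when counting from zero) and an interval of 3. Note that an interval of 1 simply finds ordinary occurrences of the word.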
   The publication of this paper was viewed by the journal's editors as a sort of mathematical puzzle: what, among the many things that might account for this low probability, actually does? This, however, was not how the paper was received. Various groups pounced on this "evidence" as they had on previous Christian and Islamic numerological findings, and pronounced it proof of divine inspiration for the Torah. Michael Drosnin's international best-seller The Bible Code went even further and claimed to find in the Torah a prophecy of Itzhak Rabin's assassination and other contemporary events. Not surprisingly, also present is the perennial Kennedy connection, an ELS for Kennedy not being far from one for Dallas. Although I will discuss the resolution of this and the simple mathematics underlying Bible codes later in the book, the point here is that our hunger for stories, agents, and motives is so strong that contextless sequences of letters are seen by many as teeming with significance.
   The abovementioned snippets illustrate only two bad ways in which stories and statistics are bridged. This book is about more intelligent ways of spanning and exploring this gap. The nontechnical question of how we fit both stories and statistics into our lives is also discussed; how we answer it helps define who we are.
   Nearly everyone has seen urbanocentric posters of New York (or some other city) with the region's attractions in the foreground and the rest of the world vanishing to a point in the distant background. Our psychological worlds are similarly egocentric: other people form the background for our lives--and most annoyingly, we form theirs. How can all these parochial posters and self-conceptions be reconciled with accurate maps, external complexities, and the disembodied view from nowhere?
   Again the question is, to what extent can the logical and psychological gap between stories and statistics--and the related gaps between subjective viewpoint and impersonal probability, informal discourse and logic, and meaning and information--be closed, or at least clarified? There exists a similar uneasy complementarity between literature and science. Literary discussions of individual perspectives, possible scenarios, useful archetypes, and singular oddities are awkwardly paired with scientific talk of objectivity, definitive outcomes, universal truths, and general cases. Along slightly different lines, terms such as lady luck and miracle are uncomfortably arrayed against chance and coincidence.
   Can we successfully straddle the chasm between people who see the world exclusively in terms of good guys and bad guys and those who see it in terms of chance and number, between "literary" and "scientific" culture, and, at two extremes, between conspiracy theorists and "nowhere men"? In an increasingly webified world, can purely personal perspectives and attitudes that do not undervalue scientific objectivity gain respectable status? If so, how do we square the fact that almost everyone feels personally aggrieved yet almost no one deems himself an aggriever?
   How, for instance, do we draw a coherent picture that encompasses both human meanings and fragmented bits of information? In what ways do stories (say, of the woman and her corpulent uncle) and statistics (say, of legal malpractice) prove cogent and binding? Are not statistical notions refinements and distillations of ideas suggested by repetitive stories and events? What are the narrative implications of the mathematical notions of complexity and "order for free"? What do interpretations of literature have in common with applications of statistics? What does literary criticism have to do with cryptography?
   In the following connected essays I hope to cast an oblique but penetrating light on these questions. Some of the oddities and problems associated with the obvious gulf between stories and statistics (such as mistaking anecdotes for statistical evidence or, conversely, taking averages to be descriptive of individual cases) are the result of applying a logic appropriate in one domain to another, quite disparate, one. Unlike the logic of mathematics and the physical sciences, the truths of informal, everyday logic depend critically on context and on the individual, nonsubstitutable aspects of any situation. Projecting specific conditions of a game or activity, say, or particular religious beliefs onto the physical universe--or, conversely, deriving strategies for such a game or activity or for an individual's religious beliefs from the laws of physics--is only one example of this confusion of domains. So too, in a slightly different way, is the ascribing of significance to biblical codes.
   The relationship between personal and objective is often subtle. How we choose to define problems or issues affects their resolution; this occurs, for example, when personally chosen lottery numbers turn out to be more likely winners than machine-picked numbers (even though any set of numbers is as likely to be chosen by the state as any other). More generally, the way in which we are inextricably linked by our common knowledge and implicit understandings points to some interesting extensions of standard mathematical practice.
   In between vignettes and parables I will discuss alternative logics, ideas from probability and statistics, codes and information theory, the philosophy of science, and a smidgen of literary theory and use them to limn the intricate connections between two fundamental ways of relating to our world--narratives and numbers. Bridging this gap has been, in one way or another, a concern in all my previous books. It is, I think, a concern of 63.21 percent of us.
Chapter One
Between stories and statistics
"Now a bismuth isotope is going to come out!" I said hastily, watching the newborn elements crackle forth from the crucible of a "supernova" star. "Let's bet!"
--Italo Calvino
Stories and statistics? Whatever might this juxtaposition be getting at? Literature by the numbers? Features in the sporting news? Biographies of Harris, Field, Gallup, and Yankelovich? If pressed, most people would probably say something dismissive, such as stories and statistics go together like a horse and paper clip, or to preserve the alliteration, like stamps and stogies; yet this book takes the relationship seriously.
   One of its presuppositions is that storytelling and informal discourse have given birth over time to the complementary modes of thinking employed in statistics, logic, and mathematics generally. Although the latter skills are perhaps more difficult to come by and may even run counter to our intuitions, we can say that first, we tell stories, and then--in the blink of an eon--we cite statistics.
   There are a number of vaguely similar "obstetric" relationships: particular versus general, subjective versus universal, intuition versus proof, drama versus the timeless, first person versus third person, special versus standard. The first element in each pairing, while it may be held in lower regard, gives rise to or provides the ground for the second. Thus, a feeling of subjectivity is a necessary preliminary for an appreciation of universality, and dramatic immersion in the moment gradually leads to an awareness of the timeless.
   Thinking of these oppositions in a naturalistic way suggests that the chasm between them is more a matter of tradition, degree, and terminology than something untraversably deep. I believe this to be so; and because the gap between stories and statistics is a synecdoche for the better-known gap between what C. P. Snow has deemed two cultures--the literary and the scientific--some of my points may have broader resonance than initial appearances would indicate. (I will sometimes use statistics in a quite extended sense.) Since synecdoche is a literary term for a figure of speech in which the part is substituted for the whole, or sometimes the other way around, its use is somewhat analogous to substituting a sample for the whole population. With this bit of pedantry we have already landed our first piece of kite string across the chasm.
Primitive Glimmerings
Notions of probability and statistics did not suddenly appear in the full dress regalia we encounter in mathematics classes. There were plebeian glimmerings of the concepts of average and variability in stories dating from antiquity. Bones and rocks were already in use as dice. References to likelihood appear in early literature. For some at least, the importance of chance in everyday life was clearly understood. It is not hard to imagine thoughts of probability flitting through our ancestors' minds. (If I'm lucky, I'll get back before they finish eating the beast; it doesn't seem likely that they would leave his cattle untouched, but steal his collection of acorns; he usually exaggerates his kill.)
   Millennia later ideas of chance and probability were formalized as Pascal and Fermat refined them to solve certain gambling problems in the seventeenth century. Laplace and Gauss further developed their applications to scientific concerns in the next century and a half; and Quetelet and Durkheim used them in the nineteenth century to help understand regularities in social phenomena. (The chances of getting at least one 6 in four rolls of a single die are greater than the chances of getting at least one 12 in twenty-four rolls of a pair of dice; the probability of a particle decaying in the next minute is .927; exit polls show that 4 out of 5 voters in favor of gun control legislation cast their ballots for Gore.)
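The claim about the dice can be checked directly. The sketch below assumes fair dice and independent rolls and uses the complement rule, that the probability of at least one success equals 1 minus the probability of no successes; the helper name at_least_one is hypothetical.

```python
from fractions import Fraction


def at_least_one(p_success, trials):
    """Probability of at least one success in `trials` independent attempts."""
    return 1 - (1 - p_success) ** trials


# At least one 6 in four rolls of a single die.
one_six = at_least_one(Fraction(1, 6), 4)
# At least one 12 (double six) in twenty-four rolls of a pair of dice.
double_six = at_least_one(Fraction(1, 36), 24)

print(float(one_six), float(double_six))
```

Under these assumptions the first probability is exactly 671/1296, or about 0.518, while the second is about 0.491, so the first bet is slightly better than even and the second slightly worse.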
   After this bullet train through the history of statistics, let me slow down to note some of the many colloquial ancestors of the most salient ideas in probability and statistics. Consider first the notions of central tendency: average, median, mode, et cetera. They most certainly grew out of such workaday words as usual, customary, typical, same, middling, most, standard, stereotypical, expected, non-descript, normal, ordinary, medium, conventional, commonplace, so-so, and so on. It is hard to imagine prehistoric humans--even those lacking the vocabulary above--not possessing some rudimentary idea of the typical. Any situation or entities such as storms, animals, or rocks that occurred again and again would, it seems, lead naturally to the notion of a typical or average recurrence.
   Or examine the precursors of the notions of statistical variation: standard deviation, variance, and the like. These are words such as unusual, peculiar, strange, singular, original, extreme, special, unlike, unique, deviant, dissimilar, disparate, different, bizarre, too much, and so on. The slang term far-out to indicate unconventionality is particularly interesting, because an observation that is far out on the "tail" of a graph of a statistical distribution is rare and unusual, and bespeaks a high degree of variability in the quantity in question. Over time, any recurrent situation or entity would suggest the notion of an unusual exception. If some events are common, others are rare.
   Probability itself is present in such words as chance, likelihood, fate, odds, gods, fortune, luck, happenstance, random, and many others. Note that mere acceptance of the idea of alternative possibilities and open-endedness essential to storytelling almost entails some notion of probability; some scenarios will be judged more likely than others. The need to single out aspects of recurring situations and entities leads to the key statistical concept of sampling as well, reflected in words such as instance, case, example, cross-section, observation, specimen, and swatch. Likewise, the natural mental process of yoking together like things suggests the important idea of correlation, which has the following correlates (so to speak): association, connection, relation, linkage, conjunction, conformity, dependence, proportionate, and the ever too ready cause.
   As R. P. Cuzzort and James Vrettos demonstrated in The Elementary Forms of Statistical Reason, even less familiar statistical ideas such as control, standardization, hypothesis testing, so-called Bayesian analysis (how we revise our probability estimates in light of new evidence), and categorization correspond to commonsense phrases and ideas that are an integral part of human cognition and storytelling. Like Molière's character who is shocked to find he has been speaking prose his whole life, many people are surprised when told that much of what they characterize as common sense is statistics, or more generally, mathematics. It is also telling that the word account refers not only to numbers but to narratives as well.
   Admit it or not, we are all statisticians, as when we make grand inferences about a person from that tiny sample of behavior known as a first impression. The difference between mathematical statistics and the everyday variety often is simply the degree of formalization and objective rigor. Standard deviation is computed according to specific rules and definitions, as are correlation coefficients, the rank-sum statistic, chi-square values, and averages (what these are is not here important, although I maintain that they can be communicated via stories and common situations); their everyday cousins are not so formalized.
   There may be constraints on the mundane use of these terms as well. The comedian Steven Wright tells a story about going into a clothing store and telling the clerk he's looking for a shirt that is "extra medium." I've expropriated this remark a number of times (usually at ice-cream parlors) and have found that it usually elicits temporary confusion--evidence that people appreciate that the formal properties of an average make the phrase extra redundant. (Or perhaps the response is to my being extra annoying.) Likewise, people recognize the humor of Garrison Keillor's Wobegon effect, according to which almost everybody is above average, or that of a recent headline in a West Virginia newspaper that read "Area Jobless Rate Up, But Still at Record Low." Empty commentary such as "Surveys show that among some voters there is support for the initiative," which I heard recently on a local radio station, provides another example: except for initiatives that are unanimously hated, this is always the case.
   The great French mathematician Laplace wrote, "The theory of probabilities is at bottom nothing but common sense reduced to calculus." Voltaire, his much older contemporary, added, "Common sense is not so common."
Stories as Context for Statistics
Unfortunately, people generally ignore the connections between the formal notions of statistics and the informal understandings and stories from which they grow. They consider numbers as coming from a different realm than narratives and not as distillations, complements, or summaries of them. People often cite statistics in bald form, without the supporting story and context needed to give them meaning.
   Part of context is internal and attitudinal. As will be discussed in a later chapter, people don't fully realize that how we characterize people and events, how we view their circumstances and context, and how we embed them into stories often determines to a large extent what we think of them. For example, if we describe a person, Waldo, as coming from country X, 45 percent of whose citizens have a certain characteristic, then it seems reasonable to assume (if we know nothing else about him) there is a 45 percent probability that Waldo shares this characteristic. But if we describe Waldo as belonging to a certain ethnic group, 80 percent of whose members in the region comprising countries X, Y, and Z have the characteristic in question, then we will most likely conclude the chances are 80 percent that Waldo shares this characteristic. And if we describe Waldo as belonging to a nation-X-wide organization, only 15 percent of whose members have this characteristic, then we are likely to state that his chances of having the characteristic are only 15 percent. Which (combination of) descriptions we employ is to an extent up to us, so the pleasingly precise statistics we confidently cite are as revealing of us as they are of Waldo (who, just for the record, does not have the characteristic).
   More commonly, the problem is not with our attitudes but with our knowledge. We are simply unaware of the external context of most statistics we read or hear about. The contextual questions we should ask when reading news stories, for example, are the very ones statisticians ask when presented with a survey of some sort. We want answers to questions such as how many, how likely, and what percentage, of course. But we also want to know whether the numbers on homelessness or child abuse, say, come from police blotter reports (in which case they are likely to be low) or whether they come from scientifically controlled studies (in which case they are likely to be somewhat higher) or whether they come from the press releases of groups with an ideological axe to grind (in which case they are liable to be extremely high--or extremely low, depending on the ideology).
   Without an ambient story, background knowledge, and some indication of the provenance of the statistics, it is impossible to evaluate their validity. Common sense and informal logic are as essential to this task as an understanding of the formal statistical notions; both are preconditions for numeracy. Although many stories need no numbers, some accounts without supporting statistics run the risk of being dismissed as anecdotal. Conversely, while some figures are almost self-explanatory, statistics without any context always run the risk of being arid, irrelevant, even meaningless.
   Consider two recent items in the news, the Consumer Price Index and the birth order effect among siblings. Understanding the CPI's considerable impact on the economy requires that one have an appreciation not only of rates and exponential growth but also of economic theory, taxation codes, partisan politics, and psychology. Many economists have suggested that the CPI, which tracks the price of a relatively fixed basket of consumer goods, gives too high an estimate for inflation and will cost the government hundreds of billions of dollars over the next decade in increased costs of programs and reduced tax revenues. Surprisingly, the argument does not involve mathematics or even economics so much as it does psychology. Many believe the overestimate results from the fact that the CPI ignores improvements in the quality of goods (televisions and cars, for example), the introduction of new goods (the notebook computer on which I'm writing), and the substitution of one good in the basket for another not in the basket (chicken for beef when the latter's price rises). The CPI's alleged overestimation is a story in which mathematics plays an important role, but one in which the issues of tax law, social practice, and personal psychology provide the essential context.
   A similar point can be made about the birth order effect, the topic of a book by Frank Sulloway in which he maintains that, despite sharing 50 percent of their DNA, siblings differ systematically from each other owing to their order of birth. Sulloway ascribes this difference to family dynamics: firstborns establish a niche in the family, and to protect it they remain more attentive to their parents' desires and therefore tend to be conservative and supportive of the status quo. Laterborns must find more creative ways to compete with their older siblings for parental favors, and thus tend to be more innovative. Both the topic and the book are huge, and statistics plays a key role in Sulloway's argument, but as in the case of the Consumer Price Index, the ambient story and its assumptions are necessary and open to criticism (even if the formal mathematics is unexceptionable).
   Why, for example, are only children considered firstborns? They're also the "babies" of their families. Is functional birth order (due to adoption, sibling death, desertion, etc.) a reasonable substitute for biological birth order? How does one decide whether a scientist or political figure (those whom Sulloway studied) should be classified as a conservative or a liberal? What effects might be a consequence of limiting the study to only those historical figures famous enough to have been written about?
   Without going into such complex issues, I do want to stress that denial of the mutual dependence of stories and statistics--and the pedagogy that results from such denial--is one reason for the disesteem in which statistics, and mathematics and science generally, are widely held. Their practitioners are simultaneously hailed as awe-inspiring geniuses and summarily dismissed as ivory-tower eccentrics. (Most of the time they are neither, sometimes one or the other, rarely both.) Describing the world may be thought of as an Olympic contest between simplifiers--scientists in general, statisticians in particular--and complicators--humanists in general, storytellers in particular. It is a contest both should win.
Sketch for a Mathematical Short Story
Stories not only provide context for statistical statements but can illustrate and vivify them as well:
   A bookish, somewhat nerdy man is telling his kids the Leo Rosten story about the famous rabbi who was asked by an admiring student how it was that the rabbi always had a perfect parable for any subject. The rabbi replied with a parable about a recruiter in the Tsar's army who was riding through a small town and noticed dozens of chalked circular targets on the side of a barn, each with a bullet hole through the bullseye. The recruiter was impressed and asked a neighbor who this perfect shooter might be. The neighbor responded, "Oh that's Shepsel, the shoemaker's son. He's a little peculiar." The enthusiastic recruiter was undeterred until the neighbor added, "You see, first Shepsel shoots and then he draws the chalk circles around the bullet hole." The rabbi grinned. "That's the way it is with me. I don't look for a parable to fit the subject. I introduce only subjects for which I have parables."
   A stricken look crosses the man's face as he closes the book, hurries his kids off to bed, distractedly bids his wife good night and retreats to his study, where he starts scribbling, making calls, and performing calculations. The idea for a lucrative con game grows clearer in his mind. The next day he does some research, stops by the post office, and for the next two evenings sends letters to thousands of known sports bettors "predicting" the outcome of a certain sporting event. To half of these people he predicts that the home team will win, to the other half that it will lose. His con depends on the simple fact that whatever happens in the sporting event, he is right for half the bettors.
   His wife wonders at their huge postage bills and secret telephone calls and nags him about their worsening financial and marital situation. The following week he sends out more letters and makes another prediction, but this time to only half of the people for whom he has been right; the other half he ignores. To half of this smaller group he predicts a win in another sporting event, to the other half a loss. Again, for half of this group his prediction will be correct, and thus for one-fourth of the original group he will be correct two times in a row. To half of this one-fourth he predicts a win the week after that, to the other half a loss; again he ignores those to whom he has made an incorrect prediction. Once again he is correct, for the third straight time--that is, for one-eighth of the original population. He continues in this way to extend his string of "successful predictions" to a smaller and smaller group of bettors. With great anticipation he then sends a letter to those who are left in which he points out his impressive string of successes and requests a substantial payment to keep these valuable and seemingly oracular "predictions" coming.
   He receives many payments and makes a further prediction. Again he is correct for half of the remaining bettors and drops the half for whom he is incorrect. He asks the former for even more money for another prediction, receives it, and continues. Finally, with only a few bettors left, one of them, a rough underworld type, tracks the man down, kidnaps him, and demands a prediction on whose strength he plans to bet a lot of money. The kidnapper threatens the man's family, and not understanding how he could be the recipient of so many consecutive correct predictions, refuses to believe that this is a con game. The man makes some interesting philosophical points in an effort to convince the kidnapper that he is not divine. The nerdy scam artist and the muscled extortionist are a study in contrasts: they speak different languages and have different frames of reference but seem to have similar attitudes toward women and money. Under extreme duress the con man makes a prediction that happens to be correct, and the kidnapper, more convinced than ever that he is in control of a money tree, now wants to bet all his assets and those of his associates on the next prediction.
   The denouement involves the man's mistress, on account of whom he originally felt the need for the extra cash supplied by his scam. She is instrumental in his escape from his captor before he makes an incorrect prediction that would result in his family being murdered. Using an ingenious code between them they manage to give pause to the kidnapper and scare him from ever bothering them again. In the last scene he is working the same scam, this time with a stock market index since he wants a higher-class clientele. He is married to his mistress but has another on the side, who is beginning to make even bigger money demands. He sits at his desk, doodling little bullseyes and targets on an envelope.
* * *
   The branching possibilities idea in this sketch comes naturally to a probabilist or statistician, since so-called tree diagrams (which date from Dutch mathematician Christian Huygens in the late seventeenth century) are useful for determining the probability of sequences of events. But tree diagrams are also helpful when thinking of the choices facing characters in stories or when considering plot turns that are more externally driven. Each path along the branches of the tree of possibilities (it may help to visualize the tree growing to the right with time rather than up) corresponds to a sequence of choices by the characters or to other turns of the plot, while sub-branches and twigs correspond to various digressions and diversions. And so the forward branching, sideward digression, and occasional backtracking at various levels and scales can be taken as a model for how we generally tell stories.
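The arithmetic of the con man's tree is easily sketched. The code below is only an illustration under stated assumptions: the starting figure of 32,000 letters is hypothetical (the story says only "thousands"), as are the function names, and each mailing is assumed to be split exactly in half between "win" and "lose."

```python
def con_rounds(initial_recipients, rounds):
    """Number of bettors who have so far received only correct
    predictions, listed after each successive round."""
    remaining = initial_recipients
    history = []
    for _ in range(rounds):
        # Whatever the outcome, the prediction was right for one half
        # of the group; the other half is dropped from the mailing list.
        remaining //= 2
        history.append(remaining)
    return history


def letters_needed(survivors, rounds):
    """First-mailing size needed to leave `survivors` bettors who have
    seen `rounds` consecutive correct predictions."""
    return survivors * 2 ** rounds


print(con_rounds(32_000, 6))
print(letters_needed(500, 6))
```

With these made-up numbers, six rounds leave 500 bettors, each of whom has followed a single path through the tree along which every prediction came true. The chance of a given recipient being on that path is 1 in 64, but with enough recipients at the start someone is certain to be on it.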
   This forking of reality suggests the increasingly popular idea of computer-generated fiction in which linear progression through a story would not be necessary. One wouldn't read such works so much as wander around in them. Without a conventional story line there would be indefinitely many narrative excursions, not all of them unified by the consciousness of one protagonist. After reading a passage, one would proceed forward linearly, backtrack to a previous passage, or move sideward by focusing (clicking) on any significant word or phrase in the passage and being directed to an elaboration of it. The virtue of this arboreal proliferation of digressions presumably would be the evanescent, open-ended, lifelike feel it would provide the reader-browser.
   Ideally, one would read only those developments, asides, and vignettes he or she finds intriguing. It would be interesting if the imagined text/software had a quiz at the end, the answers to which would be dependent on which portions the reader had selected. Even in such a mammoth text as this, one would not be able to develop every conceivable fork the tale might take. Artistry is required to overcome the combinatorial explosion of possibilities and seamlessly bind and weave material to create the illusion of free choices and unbounded bifurcation. At crucial junctures, for example, there might be few, if any, alternatives. The effect, like that of a stream of water through a bottleneck, would suggest the protagonist's single-mindedness at such moments.
   Done right (and I have not yet seen an example that comes close to being satisfying), the almost sentient matrix of diversion, digression, and horizontal movement within such a work would animate the characters and foster a greater sense of identification with them. Details, large and small, on matters both critical and trivial, would tumble forth from such a multidimensional chronicle and help truly animate its milieu and time. Mathematicians often speculate about what Archimedes, Gauss, Poincaré, or other mathematical virtuosos of the past might have accomplished with the checking and searching capacities of a computer. I wonder what Sterne, Joyce, Borges, or others whose works are reminiscent of what I am envisioning might have done with electronic help. For all of its dense sprawl, one might come away from such a text with as vivid and precise a grasp of virtual individuals and their circumstances as it is possible to have.
Of course, such a work might be dismissed as a mere technical curiosity. A more likely obstacle to its imminent creation is the dearth of writers capable of literary nuance and psychological subtlety and also of the architectonic vision and software skills necessary to articulate such a complex branching "story."
The Different Scopes of Stories and Statistics
Zillions of stories, from the Iliad and Odyssey to art films and television soap operas, and zillions of surveys, polls, and studies demonstrate the many contrasts between stories and statistics. (Zillions is a useful word even for a mathematician; it certainly beats "an indeterminately large number." The words umpteen and oodles are also useful.) One major difference is that in storytelling the focus is almost always on individuals rather than analyses, arguments, and averages. Such a focus is a necessary corrective to overweening abstraction and keeps the statistics in human perspective.
Even if they are true, to take an extreme example, there is something inhuman and vaguely pornographic about statistics that maintain that since half the people in the United States are men and half are women, the average American adult has one ovary and one testicle. Or that the average resident of Dade County, Florida, is born Hispanic and dies Jewish. Pornography, though, with its loosely bound sequences of storyless sexual couplings (or triplings) often has the feel of a statistical survey.
But a focus on individuals can be deceptive and manipulative and can distort discussions of public policy issues, especially those involving health and safety. A poignant television story of a victim of a rare reaction to a vaccine can render invisible the vast good brought about by this same vaccine. There are countless examples of such media-induced bathos.
Some writers try to enjoy the virtues of both individual accounts and statistical surveys by improperly conflating them. The result doesn't so much bridge the gulf between them as fall into it. A typical example is the convention of conjuring up some "representative" person--a fictional Jeremy, Linda, or Kevin (but never a Waldo or Gertrude)--to endorse or exemplify whatever statistical conclusion a newspaper or magazine article has reached. (Janet Cooke of the Washington Post had a Pulitzer Prize taken away from her for taking this practice to extremes.)
A number of other critical aspects of the gap between statistics citing and storytelling derive from the fact that, as the proverbial writing teacher's maxim enjoins, a story shows, rather than tells. Stories may employ dialogue and other devices and do not limit themselves to declarative pronouncements; they develop the context and relevant relationships instead of merely positing raw data; they are open-ended and metaphorical, whereas statistics and mathematics generally are determinate and literal; and stories unfold in time instead of being presented as timeless.
Stories presuppose a particular point of view (or possibly several) rather than offering an agentless impersonal overview as do statistics. Consider, for example, the notion of a probabilistic distribution of the weights of females in a certain population. Via a formula or graph (such as the well-known normal curve, or belly-shaped curve, as one of my students aptly named it), it gives one a god's-eye view of the fraction of women within any given interval of weights. From the distribution one can read off the heaviest weight, the lightest, the most common, the least common, and much else. All the information is there in one snapshot, but devoid of the draconian diets, ice cream, cookies, gorging, and fasting of any particular woman.
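As a rough illustration of reading a fraction off such a distribution, here is a minimal Python sketch. It assumes the weights follow a normal curve, and the mean of 140 pounds and standard deviation of 25 pounds are invented for the example rather than taken from any real population.

```python
import math

def normal_cdf(x, mean, sd):
    """Probability that a normally distributed quantity is at most x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def fraction_between(low, high, mean, sd):
    """Fraction of the population whose value lies between low and high."""
    return normal_cdf(high, mean, sd) - normal_cdf(low, mean, sd)

# Hypothetical population parameters, chosen only for illustration.
MEAN_LB = 140.0
SD_LB = 25.0

# Fraction of women within one standard deviation of the mean weight.
within_one_sd = fraction_between(MEAN_LB - SD_LB, MEAN_LB + SD_LB, MEAN_LB, SD_LB)
print(f"{within_one_sd:.3f}")  # about 0.683 for any normal curve
```

The single number that comes out is exactly the kind of agentless overview described above: it says how many, and nothing about who.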
For better and for worse, individual stories are more elemental than statistics and hence more emotionally evocative. Phrases such as "betrayed his wife," "hair blowing in the breeze," and "reeking underarm rot" never appear in scientific studies. Instead we get phrases like "72.6 percent thought," "the correlation between," and "margins of error." Even in a domain as saturated with statistics as baseball, the romance of Babe Ruth's story makes his former records of 60 home runs in a season and 714 in a career somehow grander than the newer records set by Roger Maris and Henry Aaron, respectively (even for one such as I, an erstwhile fan of the erstwhile Milwaukee Braves).
Yet there are amalgams of statistics and stories that do, to an extent, bridge the two. In this fuzzy middle ground we find Rashomon-like stories that portray many disparate views of the same set of events. Here too are ensemble stories (like many television series) that interweave accounts of each member of a group of related people, and also San Luis Rey stories that loosely tie together the doings of many unrelated people. The more people or viewpoints considered, however, the flatter and more featureless they must be, and the forward progression of time gradually slows into the cross-sectional nowness of most statistical snapshots and surveys (although there are subdisciplines of statistics--stochastic processes and time series--where the concern is with the evolution of variable quantities through time).
A computer analogy is helpful. If we think of conventional stories as being told from one point of view (just as a serial processor performs one calculation at a time), then statistics may be thought of as providing a view from nowhere in particular (many parallel processors performing simultaneous calculations). Between them are amalgams, which may be thought of as a varying number of variously connected viewpoints (processors). Combining the virtues of these two very different ways of apprehending the world--through stories and statistics--can be considered a literary analogue to a common problem in computer design and architecture.
Too Many Characteristics, Not Enough People
The right balance between depth of characterization and the number of characters, however, isn't always clear. In stories, as in everyday life, we interact with relatively few people personally, but they are real three-dimensional folk (actually, in a mathematical sense they are N-dimensional folk for large values of N). They possess or are associated with an indeterminately large number of possible traits, circumstances, relationships, informal rules, and agreements. We certainly do not know everything about those closest to us (or even about ourselves), yet we are aware implicitly of so many details and richness of context that writing it all down would make us all bad novelists. Even to those we don't know well we can attach a dozen adjectives, a few adverbs, and a couple of anecdotes. Contrast this abundance of personal particulars with most scientific studies where, while there may be a very large number of people (or other data), the people surveyed are flat, having only one or two dimensions--who they will vote for, whether they smoke, or what brand of soft drink or laxative they prefer.
Stories and statistics offer us the complementary choices of knowing a lot about a few people or knowing a little about many people. The first option leads to the common observation that novels illuminate great truths of the human condition. Novels are multivalent and bursting with ironies, details, and metaphors, while social science and demographic statistics can seem simpleminded and repellingly earnest by comparison. We can easily delude ourselves, however, into thinking that more of a general nature is being revealed to us by a memoir, personal reminiscence, novel, or short story than is truly the case. Biased and small samples are always major problems, of course, but my caveat arises from something more specific: the technical, uneuphonic statistical notion of an adjusted multiple correlation coefficient.
If the number of traits considered is large compared to the number of people being surveyed, there will appear to be more of a relationship among the traits than actually obtains. Imagine a study that examined only two people and two characteristics, say intelligence and shyness. Imagine further a graph with degrees of intelligence on one axis and degrees of shyness on the other, and two points on it corresponding to the two people. If the shyer of the two were more intelligent, there would be a perfect correlation between the two traits and a straight line connecting the two points on the graph. More shy, more intelligent. But if the shyer of the two were less intelligent, there still would be a perfect correlation between the two traits and a straight line pointing in the opposite direction connecting the two points. More shy, less intelligent.
You can find perfect correlations that mean nothing for any three people and three characteristics, and in general for any N people and N characteristics. The number of characteristics need not equal the number of people. Whenever the number of characteristics is a significant fraction of the number of people, the so-called multiple correlation among the characteristics will suggest spurious associations.
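The two-person case and the downward adjustment can be sketched in a few lines of Python. This is only an illustration: the shyness and intelligence scores are invented, and the adjustment shown is the standard textbook formula for an adjusted squared multiple correlation, which is assumed here to be the notion the text has in mind.

```python
import math

def pearson_r(xs, ys):
    """Ordinary correlation coefficient between two lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def adjusted_r_squared(r_squared, n_people, n_predictors):
    """Shrink a squared multiple correlation to allow for few people and many traits."""
    if n_people - n_predictors - 1 <= 0:
        raise ValueError("too few people for this many characteristics")
    return 1.0 - (1.0 - r_squared) * (n_people - 1) / (n_people - n_predictors - 1)

# Two hypothetical people: any two distinct points lie on a straight line.
shyness = [3, 8]
print(pearson_r(shyness, [95, 110]))   # shyer person more intelligent: +1
print(pearson_r(shyness, [110, 95]))   # shyer person less intelligent: -1

# The same raw value of 0.5 means little for 10 people and 4 traits,
# but survives almost intact for 1000 people and 4 traits.
print(adjusted_r_squared(0.5, 10, 4))
print(adjusted_r_squared(0.5, 1000, 4))
```

With ten people and four characteristics the adjusted figure falls to roughly 0.1; with a thousand people it stays close to 0.5, which is the sense in which a small group and a long list of traits flatter the apparent relationship.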
To tell us anything useful, multiple correlation analysis must be based on a relatively large number of people and a much smaller number of characteristics. Yet the insights that commonly come from stories and everyday life are precisely the opposite. We each know in a full-bodied way relatively few people, and for these people the number of characteristics, relationships, characteristics of relationships, relationships of characteristics, and so on that we are aware of is indeterminately large. Thus we tend to overestimate our general knowledge of others and are convinced of all sorts of associations (more complicated variants of "more shy, less intelligent") that are simply bogus. By failing to adjust downward our multiple correlation coefficients, so to speak, we convince ourselves that we know all manner of stuff that just isn't so.
Just as stories are sometimes a corrective to the excessive abstraction of statistics, statistics are sometimes a corrective to the misleading richness of stories.
Stereotypes, Whimsy, and Statistical Conservatism
The alternative in everyday life to probabilistic calculations and explanations is the amorphous "discipline" of common sense and rough appraisal. Rather than presenting rigorous proof or careful calculation for fixed propositions, common sense involves thinking in terms of scenarios and situations, empathizing and identifying with people, responding to conversations and weighing observations, and then finally coming to a tentative, sometimes fickle judgment. The knowledge that results is qualitative, imprecise, and context-bound. Common sense often is couched in the language of probability, but attaching a particular number, a precise probability, to a possible outcome is frequently (81.93 percent of the time) an exercise in fatuity. Yet the specter of unwarranted precision seldom deters those who want to give their hunches an air of scientific respectability.
Rather than invoking precise probabilities, in our everyday approach to life we find it more natural to deal with rules of thumb and approximate categories; in other words, with stereotypes. Although many assume that stereotypes are always evil vestiges of a benighted mind-set, more often they are essential to effective communication and have themselves been unfairly stereotyped (assuming a concept can be treated unfairly). Many stereotypes permit the economy of expression necessary for rapid communication and effective functioning. Chair is a stereotype, but one never hears complaints from bar stools, recliners, beanbags, art deco pieces, high back dining room varieties, precious antiques, chaise longues, or kitchen instances of the notion. Stereotypes, of course, admit of all sorts of exceptions that upon further examination in individual cases are easily apparent, but this does not mean they should or even can be universally proscribed. Complexity, subtlety, and precision cost time and money, and these expenditures often are unnecessary and sometimes even obscuring.
Recognition of common stereotypes and knowledge of recurring stereotypical situations such as restaurant behavior, retail purchasing, hygiene practices, audience deportment, and so on are essential for navigating through everyday life. Approaches to artificial intelligence, in particular that of computer scientist Roger Schank and others, have reinforced the observation that we chart our course and communicate with others by invoking common types, scenarios, and scripts as shorthand. Like statistical notions, stereotypes do violence to particular cases and individuals but pay their way by summarizing general information, the many exceptions to which would be too time-consuming to note.
Of course, I don't dispute that stereotyping people can stimulate unthinking, cruel, and self-fulfilling prejudice, and I'm most certainly agin that. And yet when we meet or even catch a glimpse of someone, there is a tendency to construct (all right, I tend to construct) an instant biography of that person, and in the process, to make all sorts of immediate appraisals. I still find it hard not to draw far-reaching (and often mistaken) judgments, for example, about a person who uses the phrase "between you and I."
But forgoing speculation altogether seems too harsh a solution to the problem of stereotypes. Taking the liberty of reporting a seeming case of prescience on my part, I remember reading the Unabomber's manifesto several years ago on a mountaintop in Maine and guessing from its tone, content, and structure that its author was a mathematician. Later, at the time of his arrest, I wrote an Op-Ed piece for the New York Times to that effect. It aroused so much ire among some mathematicians, who thought it besmirched their reputations, that the Wall Street Journal devoted a long article to the resulting fracas. In the essay I opined that Theodore Kaczynski's Ph.D. in mathematics was perhaps not quite as anomalous as it seemed (despite the fact that mathematicians are for the most part humorous sorts, not asocial loners, and that the only time most of us use the phrase "blow up" is when we consider division by zero).
Even in such rare cases it is difficult to suppress hasty judgments and stereotyping; perhaps it is unwise even to try. Nevertheless, if we attempt to keep judgments tentative and back them up to the extent possible, no great harm is done. Unfortunately, I frequently run into people who make no such effort. They claim with an air of dismissive certainty that someone is a racist, or a secret admirer, or is extraordinarily wealthy, or gay, or something else. Usually these assertions are based on some ineffable complex of traits that simply must be recognized. Unless he or she is well known, the person thought to possess the trait is rarely interviewed or investigated (legitimately) to determine if he or she truly has it; some of the hunches are occasionally discovered to be correct by other means, and this is taken to certify them all.
* * *
While stereotypes may be a bridge between statistics and stories, like bridges they are sometimes old, rickety, and unreliable. Statistical conclusions, unlike stereotypes, must undergo stringent tests. This point is usually dismissed as statistical nit-picking; after all, "everybody knows" whatever is being asserted. I have a version of this bias myself: people who make frequent claims about what everybody knows are fools. But everybody knows this.
Statistical decision-making is a drab, conservative process unlike the spirited snap judgments that characterize personal appraisals. The so-called null hypothesis in statistics is the assumption that the phenomenon, relationship, or hypothesis under observation is not significant but merely the result of chance. To reject the null hypothesis it is conventional to require that the probability of the phenomenon occurring merely by chance be less than 5 percent. (This is the source of the story about the statistician who witnessed the decapitation of twenty-five cows, noted that one survived the ordeal, and dismissed the phenomenon as not significant.) In my peregrinations through this world I've observed that few people regularly make decisions like this in their personal lives; it would be too boring even if such precision were possible. (Having offered a partial defense of stereotypes, I should mention that a common related stereotype of statisticians is that they are people who chose their profession because they couldn't stand the excitement of accounting.)
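To make the 5 percent convention concrete, here is a small Python sketch. The setup is an assumption introduced for illustration, not an example from the text: a coin is flipped 20 times, the null hypothesis is that it is fair, and we ask how likely it is that chance alone would produce at least the observed number of heads (a one-sided test).

```python
from math import comb

ALPHA = 0.05  # the conventional 5 percent threshold

def upper_tail_probability(n_trials, n_successes, p_null=0.5):
    """Chance of at least n_successes in n_trials if the null hypothesis holds."""
    return sum(
        comb(n_trials, k) * p_null ** k * (1.0 - p_null) ** (n_trials - k)
        for k in range(n_successes, n_trials + 1)
    )

def reject_null(n_trials, n_successes, p_null=0.5):
    """Reject only when chance alone would produce the result under 5% of the time."""
    return upper_tail_probability(n_trials, n_successes, p_null) < ALPHA

# Hypothetical data: 15 heads versus 14 heads in 20 flips of a supposedly fair coin.
print(upper_tail_probability(20, 15), reject_null(20, 15))  # about 0.021, True
print(upper_tail_probability(20, 14), reject_null(20, 14))  # about 0.058, False
```

A single extra head moves the verdict from "merely chance" to "significant," which gives some feel for how mechanical, and how unlike a snap personal judgment, the procedure is.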
The idea of boredom suggests yet another difference between stories and statistics. In listening to stories we are inclined to suspend disbelief so as to be entertained, whereas in evaluating statistics we are inclined to suspend belief so as not to be beguiled. In statistics we are said to commit a Type I error when we reject a truth and a Type II error when we accept a falsehood. Of course, there is no way to always avoid both types of error, and we have different error thresholds in different endeavors. Nevertheless, the type of error people feel more comfortable making gives some indication of their intellectual personality type. People who like to be entertained and beguiled and hate the prospect of making a Type I error may be more likely to prefer stories to statistics. Those who do not like being entertained or beguiled and hate the prospect of making a Type II error may be more likely to prefer statistics to stories. In any case, this speculation is a short story with no statistics to back it up, so make of it what you wish.
Although wrong much of the time, we have more confidence in our own gut decisions than we do in public ones. We all (not just right-wing Republicans) distrust decisions that are made far from us. We insist on exacting statistical protocols in public decision making, yet oftentimes accept the sloppiest reasoning from those closest to us. In small groups there is trust and little perceived need for statistics. As Theodore Porter has shown in his Trust in Numbers, quantitative methods and controls often arise owing to the political weakness of expert communities and a suspicion of their findings by the larger community. Those anticipating distrust are most likely to undergird their conclusions with substantial statistics, or at least adorn them with fake statistical finery.
The sheer impersonality of statistics is attractive to those who dislike the messiness, intimacy, and (melo)drama of particular stories, situations, and people. Stories, it would seem, appeal more to stereotypical women, statistics to stereotypical men (according to conventional wisdom; I have no statistics on whether this is true of real men and women). The conservatism and impersonality of statistical practices is one source of their trustworthiness, while the whimsy and diversity of personal stories is one source of their appeal.
* * *
Since probability and statistics are formalizations of our pretheoretical intuitions, they usually accord reasonably well with our gut feelings. Still, these disciplines have developed a life of their own independent of our attitudes and beliefs, and in many situations statistics tells us our gut feelings have been led astray. People best respond in small groups where they can use their folk wisdom about others' intents and purposes and where their psychology gives them insight into others' stereotypical behaviors and actions. It is in this realm too where our narrative intuitions are surest, where a few telling details are often sufficient to sketch in a whole world. I conclude with an example from Leonard Michaels's collection of very, very short stories, I Would Have Saved Them If I Could. "The Hand" is a fifty-nine word incompressible psychological nugget almost mathematical in its spareness:
I smacked my little boy. My anger was powerful. Like justice. Then I discovered no feeling in the hand. I said, "Listen, I want to explain the complexities to you." I spoke with seriousness and care, particularly of fathers. He asked, when I finished, if I wanted him to forgive me. I said yes. He said no. Like trumps.
Copyright © 1998 John Allen Paulos. All rights reserved.