Tuesday, January 25, 2005

A Numerical Diversion

When is a number not a statistic? Plenty of the time.

Daniel Okrent, the Public Editor of the New York Times (and probably more famous to baseball fans as the creator of rotisserie baseball and the author of Nine Innings), writes an excellent column about how valueless many of the numbers you see in the media are. Although he saves his sharpest jabs for the news divisions, his argument has direct implications for the quality of the sports writing we read.

He writes that the numbers used in print often aren’t important to the story, or are included without any sort of context or frame of reference.

Number fumbling arises, I believe, not from mendacity but from laziness, carelessness or lack of comprehension. I'll put myself in the latter category (as some readers no doubt will as well, after they've read through my representation of the numbers that follow). Most of the journalists I know who enter the profession comfortable with numbers write about sports, where debate about the meaning of statistics is a daily competition, or economics, a field in which interpretation of numbers will no more likely produce inarguable results than will finger painting.

So it is left to the rest of us who write for the paper to stumble through numbers, scatter them on the page and hope that readers understand. Does it matter if many of these figures are meaningless symbols serving the interests of the parties that issue them? Take a variety of reports on some recent lawsuits: A man is suing the city for $20 million arising from charges, eventually dismissed, brought against him for kidnapping and sexual abuse [story]. The mother of the football player Derrick Thomas, who died in 2000, is suing General Motors for $75 million [story]. Villagers on an Indonesian island are suing Newmont Mining Corporation for $543 million [story]. Not one of these numbers is grounded in anything more substantial than the imagination of a plaintiff's lawyer, but each is given the authority of print.

No different, really, was Wednesday's assertion that Bernard J. Ebbers, if convicted of all charges in the MCI-WorldCom accounting scandal, "could be sentenced to as much as 85 years," a formulation that bears no relationship to any conceivable outcome yet serves the prosecutor's public case very nicely [story].

He goes on to cite the confusion of real and nominal dollars, which the film industry notoriously exploits, as one example where newspapers are more than willing to regurgitate the studios’ latest press releases instead of applying critical thinking to what the numbers mean--or, in this case, don’t mean.
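For anyone who wants to see the real-versus-nominal arithmetic spelled out, here is a rough sketch. Every figure in it is made up for illustration (the grosses, the price index levels, and the helper name `to_real_dollars` are my assumptions, not anything from Okrent's column); the only point is that a "record" in nominal dollars can vanish once the older number is restated in today's dollars.

```python
def to_real_dollars(nominal, index_then, index_now):
    """Restate a past dollar amount in today's dollars using a price index ratio."""
    return nominal * (index_now / index_then)

# Hypothetical, made-up figures for illustration only.
old_film_gross = 100_000_000         # nominal gross of an older film
new_film_gross = 150_000_000         # nominal gross of a recent film
index_old, index_new = 50.0, 100.0   # made-up price index levels (prices doubled)

old_in_todays_dollars = to_real_dollars(old_film_gross, index_old, index_new)

# In nominal terms the new film "beats" the old one...
print(new_film_gross > old_film_gross)          # True
# ...but not once the old gross is adjusted for the change in prices.
print(new_film_gross > old_in_todays_dollars)   # False
```

With real data you would swap in an actual price index for the two years being compared; the comparison logic stays the same.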

In a supplementary posting, he uses batting average as an example of another misused number.

All three of these imprecise and generally unhelpful numbers [batting average, the unemployment rate and the Dow Jones Average] has, through overuse and under-explanation, become part of the language, but none means what it purports to mean. Even writers who know better will at times resort to them when they’re too hurried, too lazy or too weary to search for alternatives or to pause for elaboration.

A few weeks ago, David Leonhardt – who moonlights from his usual gig as an economics reporter to write a biweekly column on sports statistics – sought to establish that outfielder Carlos Beltran hadn’t had such a great season last year, and clinched his case by noting that Beltran’s batting average of .267 was “good for 118th place in baseball.”

No journalist I know understands numbers as well as David does, no one has taught me more about them, and few, I'm happy to say, are as good-natured as he is. That’s why I don’t think he’ll mind if I point out that in July, just six months before he used it to evaluate Beltran, Leonhardt described batting average as “a flawed measure of performance.”

Such is the persistent power of a bad number – it can bring the best of us to our knees.

What is the cause, though? Is it just laziness? In some cases, I’m sure it is. But is it also ignorance? I think that’s a pretty huge factor too, especially in sports writing. There seem to be many writers who are quick to write off anything more advanced than batting average and RBI, and many who wear their ignorance as a badge of honor. (Probably, in part, because of the overbearing attitude and pomposity of many in the stathead community.)

Some writers seem to get it better than others. It’s not enough to know that Vinny Castilla hit 35 home runs last year. It’s not enough to know that Cristian Guzman fielded .983 last year. You need to know context (Coors Field and a range factor right around league average, respectively) to give those stats any sort of meaning.

By not providing context, writers are doing a disservice to their readers. Without a critical (which does not have to mean negative) look at the numbers and stats, they’re doing nothing more than publishing a public relations piece.

Everyone knows that words mean things, although the same word can mean greatly different things depending on the words surrounding it and the way in which it’s used. Numbers are no different. And what Okrent is saying is that writers need to be just as careful with how they shape their numbers within a story as they are with how they shape their words.

That’s just as true if you’re covering the Nats as it is if you’re covering an Appropriations bill.


  • Great post, Chris.

    It's probably some part laziness, some part ignorance, and some part arrogance (both by statheads and sports journalists). But I also think it is time. Ideas need time to take root and grow, for the base of knowledge to expand, etc. There are many Bill Conlins out there, old farts who'll call you a "stats geek" (and use it without hesitation or courtesy) if you cite a number with which they're not familiar. But there are also the Joe Posnanskis of the sports media out there, too, guys who grew up on the Abstracts and enjoy pondering the possibilities (and limitations) of numbers. And in a few years, there will be the guys who grew up with access to Prospectus and Primer and Rob Neyer and Capitol Punishment. And in a generation or less, the Bill Conlins will be outnumbered; they'll adapt or they'll die (well, probably literally . . .).

    This doesn't really address your point, though, because just because you have the 1983 Abstract sitting next to your toilet (hypothetically, of course!) doesn't mean you use numbers in a responsible manner. I'll freely admit that I don't, not consistently. It's part laziness, part slow home internet connection (i.e., insufficient time taken for research), part lack of knowledge of all of the resources at my disposal, and part whatever else.

    But you made me think about it.

    By Blogger Basil, at 1/26/2005 1:41 PM  

  • You're correct about the time aspect. I think much of this has to do with the internet and the spread of communication. Before the internet, we only knew the theories directly in front of our faces, which for most of us was Skip Caray or Joe Morgan. I had never heard of Bill James before I got on the internet--and actually, not until I started playing his classic baseball simulation. And, thanks to eBay, I've read everything he's written.

    We're all prone to misusing numbers. Just as we're all prone to misusing the language. But, from what I've seen on your blog, you've got the right idea. (Not to imply that I'm the final arbiter!)

    By Blogger Chris Needham, at 1/26/2005 2:54 PM  
