INTRODUCTION

This blog is my effort to help improve the understanding of numbers, especially as used by the press and in research reported by the press. I hope journalists will find it useful in improving the quality and validity of what they write. The topics are chosen from items I encounter with depressing frequency, in which a failure to understand what they are saying or reporting leads journalists to write material that may mislead the public and result in ill-advised policy decisions. Please understand that my comments do not reflect my opinions of the subject matter; I protest misleading information even when it supports my opinions.

Monday, September 17, 2012

Big Numbers - Deaths of Children


A brief article by Donald G. McNeil Jr. (“Young Children’s Deaths Said to Drop Again in 2011”, NY Times, 12 September 2012) reports that fewer than* 7 million children died from disease and birth complications in 2011—a new low. The problem with this type of statement is that it offers no perspective for the “7 million”, so the reader does not understand the import of the number. It would make no difference to most of us if the story had said 70 million. It is quite simple to discover that the world population of children aged 0-4 was close to 624 million in 2011, so the 7 million deaths represent about 1.1 percent. Now we understand something of what the number means.

*I note meaningless phrasings.  One hopes that by "fewer than" the author means "approximately".
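For readers who want to check this sort of perspective-taking themselves, a few lines of Python reproduce the arithmetic (both inputs are the rough 2011 estimates cited above):

    # Put a big number in perspective: deaths as a share of the population
    # at risk. Both figures are the rough 2011 estimates cited above.
    deaths = 7_000_000           # deaths of children aged 0-4 in 2011
    population = 624_000_000     # world population aged 0-4 in 2011

    share = deaths / population
    print(f"{share:.1%}")        # prints 1.1%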

Friday, September 7, 2012

CORRELATION vs CAUSALITY


This is a very widely recognized problem in science: mistaking a simple relationship for a causal relationship. Here’s a simple example: as a general matter, tall adults tend to eat more food than short adults. Does that mean that eating more will cause one to become taller? We understand that this is silly, that height is essentially fixed in adults, and that it is tallness that results in eating more, because tall people tend to be larger and so need more food. But all too often we see such a relationship and assume it means that one thing causes the other, without considering other possibilities. This post will contain a growing list—ultimately a long list—of examples.
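A toy simulation makes the point vivid. In this sketch (invented numbers, for illustration only), a common factor, overall body size, drives both height and food intake; the two end up strongly correlated even though feeding someone more would not make them taller:

    import random
    import statistics  # statistics.correlation requires Python 3.10+

    random.seed(1)
    size = [random.gauss(0, 1) for _ in range(10_000)]            # latent body size
    height = [170 + 8 * s + random.gauss(0, 3) for s in size]     # cm
    food = [2500 + 300 * s + random.gauss(0, 150) for s in size]  # kcal/day

    # Strong correlation (about 0.84), produced entirely by the
    # common factor, not by food intake acting on height.
    print(statistics.correlation(height, food))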
See my blog post, “Happiness and Having Children,” for one example.
Here are others:

It is known that adults who were breast-fed as babies report somewhat greater happiness than those who were not. I suspect that most of us immediately read that statement as suggesting that breast feeding improves a child’s chances of happiness in adulthood. But it is surely plausible that the causality runs the other way: happy babies are more likely to be breast-fed, and to become happy adults.

The convenient data set problem.



In an April 21, 2012 article, “Sexual Strategies,” the Economist describes research by Robert Dunbar that purports to show that the sex of people’s best friends varies by age. This research illustrates two problems that journalists, in particular, ought to be sensitive to. I suspect it is a common sort of study, arising from the discovery of an available dataset that the researchers adopt not because it is appropriate but because it is the best they could find or afford, or simply because it seems to support the theory they want to advance.

The first problem is the impossibility of knowing what population the research findings apply to. The population studied by Dr. Dunbar is the customers of an unnamed European mobile-phone operator, a sample that is almost certainly not representative of the population as a whole: the operator may be more popular with the young, with particular nationalities, or with the gay population, for example. The mere fact that the sample is large is not sufficient to make the results meaningful; it must also be representative of some definable population. The problem is perhaps most obvious if we suppose Dr. Dunbar were to repeat the study in five years and find differences in the results: we would have no way to know whether those differences represented changes in the behavior of the subject population or changes in the composition of that population (e.g., the mobile operator becoming more popular among the elderly). We simply cannot know to what population Dr. Dunbar's results apply.
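A toy simulation shows why sheer size does not rescue an unrepresentative sample. In this sketch (entirely invented numbers, nothing to do with Dr. Dunbar's actual data), a behavior differs between young and old, and the operator's customers skew young; a million-person sample then yields a precise estimate of the wrong quantity:

    import random

    random.seed(1)

    # Invented premise: 30% of the young and 70% of the old show some
    # behavior, and the true population is half young, half old, so the
    # true population rate is 0.5*0.3 + 0.5*0.7 = 0.50.
    def behaves(young):
        return random.random() < (0.3 if young else 0.7)

    # The operator's customers, however, are 90% young.
    n = 1_000_000
    hits = sum(behaves(random.random() < 0.9) for _ in range(n))
    print(hits / n)   # about 0.34 -- precise, but far from the true 0.50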

The second problem is that the research claims to provide information about people's best friends, but "best friend" is defined as the person with whom the subject has the most frequent cell-phone contact. That is clearly not how one would ordinarily define "best friend," and it is surely debatable whether “most frequent phone contact" is a good way of identifying a subscriber's best friend. Maybe most frequent phone contact is more typically with a child, a child-care worker, or a business colleague. Maybe “most minutes of phone contact” would be a better definition, or quite possibly people tend to speak with their best friends mostly in person and relatively infrequently by phone. The point is that the researchers have studied "most frequent phone contact" and rather arbitrarily labeled that "best friend." Possibly they did not have information on the length of the phone contacts, so they had no other choice if they were to force this arbitrary data set to fit the subject they wanted to address.

It would be a huge improvement in reporting if journalists could recognize and point out these sorts of questions.

Thursday, September 6, 2012

Sloppy language.

I recently heard a statement like this on the radio [not an exact quote]: “According to a CNN poll, most Americans now narrowly favor the health care law, by a 52% to 48% margin.”  This statement brought to my mind the idea that most people were neither strongly opposed to nor strongly supportive of the law, but instead supported it in a "narrow," modest, not especially committed manner.
But then I realized that that is wrong; the survey indicated that the percentage who support the law narrowly exceeds the percentage who oppose it. All could be adamant in their views (so that no person “narrowly” favors the law). The careless phrasing inadvertently made the statement misleading in a very effective way. It says that people's differences of opinion about the law are modest, which is probably very far from the truth. Indeed, opinions seem to be rather strong, so the near-even balance of "pro" and "con" opinions might be said to be an extreme level of disagreement, more so than an 80% / 20% split of strong opinions.
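One way to make that last point concrete (my own illustration, not anything in the CNN report) is to measure disagreement as the chance that two randomly chosen people hold opposite views. For a p versus (1-p) split that chance is 2p(1-p), which is largest when opinion is evenly divided:

    # Chance that two randomly chosen people disagree, given a p vs (1-p) split.
    def disagreement(p):
        return 2 * p * (1 - p)

    print(disagreement(0.52))  # 0.4992 -- nearly the maximum possible (0.50)
    print(disagreement(0.80))  # 0.3200 -- much less disagreement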

But What Are We Talking About?

It's surprising how often one sees discussions of topics that, when you think about it, have no clear meaning. We are reading about or discussing something without actually knowing what it is that we're talking about.
The Pew Research Center actually does an ongoing poll on one of these:
  • A new Pew Research Center survey of 2,048 adults finds that about two-thirds of the public (66%) believes there are “very strong” or “strong” conflicts between the rich and the poor.
Suppose you were asked to what extent you thought there were conflicts between the rich and the poor. I don't know how you or others might respond, but I would ask, "What do you mean by 'conflicts' between the rich and poor?" Fistfights? Arguments? General dislike of each other? Well, you'd think that Pew might offer a definition, but they don't.

So what, exactly, did they ask their respondents? In its summaries, the Pew Center says that the question they asked is simply "In America, how much conflict is there between poor people and rich people: very strong conflicts, strong conflicts, not very strong conflicts, there are not [sic] conflicts?” But that's not really true. If you dig far enough down into their report, you find that the full question, which was read to the respondent by the Pew interviewer, was this: "In all countries, there are differences or conflicts between different social groups. In your opinion, in AMERICA, how much conflict is there between poor people and rich people: ..."

So the respondents are first told that there are differences or conflicts between different social groups. How does this affect the likelihood that a respondent will answer "there are not conflicts," or ask "what do you mean by 'conflicts'?" Note that even respondents who "understand" the question will likely have varying ideas of its meaning.

So we have a survey question of no obvious meaning, asked in a very biasing manner.
And then we have the NY Times dutifully reporting the results without any discussion of what they are supposed to mean ("Survey Finds Rising Perception of Class Tension" by Sabrina Tavernise, January 11, 2012).

I don't know about you, but I have enough meaningless stuff in my brain already.

Immigrants Are Crucial to Innovation, Study Says [NOT]

This one is from the NY Times (Andrew Martin, "Immigrants Are Crucial to Innovation, Study Says," June 25, 2012). The problem is that the study doesn't really say that; it comes from an obviously biased source and may be calculated to mislead us into thinking that's what it says. The source is the Partnership for a New American Economy, which backs increasing allowable immigration of skilled foreign workers.
Here's what it actually says: "76% of patents awarded to the top 10 patent-producing US universities in 2011 had at least one foreign-born inventor.” Whether that means that "immigrants are crucial to innovation" depends on how many inventors a typical patent has: if patents typically have only one or two inventors, then 76% of them including a foreign-born inventor means immigrants are clearly playing a major role in these inventions. But if patents typically have, say, 20 inventors, then the fact that 76% of them include at least one immigrant inventor doesn't mean much. Consider an extreme analogy: "Study reports that 99% of basketball audiences include at least one member of the mafia." Would that lead to concern that NBA attendance was heavily dependent on the mafia?
I'm not sure what is typical for the number of inventors per patent (I've seen 20), but the study authors almost certainly had the relevant figures, and they could have told us the average percentage of immigrants for those inventions.  But they didn't give us that unambiguous figure.  I wonder why.  The author of the Times article should also have wondered why, and not simply parroted the study's meaningless-but-misleading statement.
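A little arithmetic shows just how sensitive an "at least one" statistic is to group size. If each inventor on a patent is foreign-born with probability p, independently (a simplifying assumption of mine, not the study's), then a patent with n inventors includes at least one foreign-born inventor with probability 1 - (1-p)^n. Solving for the p that would reproduce the 76% headline:

    # Solve 1 - (1 - p)**n = 0.76 for p: the share of foreign-born
    # inventors needed for 76% of n-inventor patents to include at
    # least one. Assumes inventors are independent draws.
    for n in (1, 2, 5, 20):
        p = 1 - 0.24 ** (1 / n)
        print(f"{n:2d} inventors per patent -> p = {p:.0%}")
    #  1 inventor  -> 76%
    #  2 inventors -> 51%
    #  5 inventors -> 25%
    # 20 inventors ->  7%

Under that assumption, if patents really do average 20 inventors, fewer than one inventor in ten need be foreign-born to generate the 76% figure.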
The article was promptly picked up by other news outlets, which predictably took the bait and claimed that immigrants were "responsible" for 76% of patents. So the Partnership for a New American Economy got its wish: to get the press to misread and misrepresent a truthful but meaningless statement.


The Demon "Percentage" - #1

Percentages, although often very useful, are also easily misunderstood.  This is a recurring theme.

I recall reading an article that compared the accuracy of voice-recognition programs. It compared one that was 97% accurate to another that was 99% accurate, and concluded that the two weren't very different in accuracy. While it's true that 97 and 99 are in general not very different numbers, they are very different as accuracy percentages: one means 3% errors and the other means 1% errors! The program with a 1% error rate makes only one-third as many errors as the one with a 3% error rate, which is far more accurate.
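The difference is easiest to see in absolute terms, say errors per 10,000 words:

    # Accuracies near 100% hide large differences; compare error counts.
    words = 10_000
    for accuracy in (0.97, 0.99):
        errors = round(words * (1 - accuracy))
        print(f"{accuracy:.0%} accurate -> {errors} errors per {words:,} words")
    # 97% -> 300 errors; 99% -> 100 errors: three times as many.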
Remember to think about percentages along with their complements, especially when close to the 0% and 100% extremes.