For a Statistically Savvy 2013
My print column offers tips, shared by statistics professionals and by readers responding to my blog post, on how to make 2013 a more numerically savvy year. Not all the great suggestions could fit in the column, so here are some more, starting with those from readers:
Michael Dean, a senior marketing analyst in Minneapolis, wanted more clarity in weather news and forecasts. “How accurate are the five-, seven-, and 10-day forecasts?” Dean asked. “Can’t someone collect data on the predicted temperature various days in advance, and then see what temperature it ends up being? What is the range of error by the number of days out for the forecast? At what times of the year, or in what regions of the country, does this range vary the most? I am thinking I should ignore anything longer than a five-day forecast, but those may be off a lot, too.” (Some of Dean’s questions are answered by the website Forecast Advisor.)
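A minimal sketch of the bookkeeping Dean has in mind, assuming a hypothetical log of forecasts (forecast_log.csv) with columns date, lead_days, predicted_f and observed_f:

```python
import pandas as pd

# Hypothetical log: one row per forecast, recording how many days ahead it
# was issued, the predicted temperature and the temperature actually observed.
forecasts = pd.read_csv("forecast_log.csv")  # columns: date, lead_days, predicted_f, observed_f
forecasts["abs_error"] = (forecasts["predicted_f"] - forecasts["observed_f"]).abs()

# Typical error by forecast horizon: is the 10-day forecast worth reading?
print(forecasts.groupby("lead_days")["abs_error"].agg(["mean", "std", "count"]))

# Seasonal breakdown, for Dean's "what times of the year" question.
forecasts["month"] = pd.to_datetime(forecasts["date"]).dt.month
print(forecasts.groupby(["month", "lead_days"])["abs_error"].mean().unstack())
```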
Harvey Bale, a retired economist in Washington, D.C., wants monthly labor-data news reports to include information about the chronically unemployed and the labor-force participation rate. “The narrow unemployment rate highlighted each month is relatively unimportant,” Bale said. “It masks the serious harm being suffered” by discouraged workers and involuntary part-time workers.
Dave Fitzpatrick, who works in marketing analytics in New York, wants to see more context around other numbers: percentage changes from the year before, for instance, instead of just presenting raw statistics. “Too often we see aggregate statistics such as simple percentages cited without any context as to their direction and composition,” Fitzpatrick said. “A much more insightful way of communicating statistical results is to cite the percentage change or, better yet, predictive modeling results that can tell us the impact of one variable on another.”
Jeremy Schneider, another marketing-analytics professional, doesn’t want to see averages falsely smoothed out to create arresting statistics. “My pet peeve is when ads or articles cite murder rates or death rates by saying ‘That’s one murder every 10 minutes,’ or, ‘Someone is dying from starvation every five seconds,’ ” Schneider said. “That certainly might be the average murder or death rate, but it’s not like every 10 minutes on the dot someone is dropping dead.” These stats may be used for a good cause, Schneider said, “but the impact is marred in my eyes by making that claim.”
Kelly Jackson, who teaches at Camden County College in Blackwood, N.J., would like to see better practices in charting, where the Y-axis should always start at 0 when possible. “One of the problems my students have is interpreting data and graphs that don’t use ‘0’ as the starting point,” Jackson said. “Imagine a graph that starts vertically at 500 and shows bars of height 550 and 600.”
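Jackson’s 500/550/600 example is easy to reproduce; a short matplotlib sketch with those hypothetical values shows how a truncated axis makes a roughly 9% difference look like a doubling:

```python
import matplotlib.pyplot as plt

values = [550, 600]   # B is about 9% larger than A
labels = ["A", "B"]

fig, (ax_cut, ax_zero) = plt.subplots(1, 2, figsize=(8, 3))

# Truncated axis: B's bar appears roughly twice as tall as A's.
ax_cut.bar(labels, values)
ax_cut.set_ylim(500, 620)
ax_cut.set_title("Y-axis starts at 500")

# Zero baseline: the bars reflect the actual ratio of the values.
ax_zero.bar(labels, values)
ax_zero.set_ylim(0, 620)
ax_zero.set_title("Y-axis starts at 0")

plt.tight_layout()
plt.show()
```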
Richard Hoffbeck, a research data analyst in Minneapolis, would like to see more mention of study design in reports about medical research. “I think a small population case-control study done 20 years ago should be weighted differently than a large-scale experimental trial,” Hoffbeck said.
Judea Pearl, director of the Cognitive Systems Laboratory at the University of California, Los Angeles, cites as a statistical pet peeve “the century-old confusion between correlation and causation,” a point he elaborated on in a recent interview with American Statistician News. (Pearl is the father of Daniel Pearl, the Wall Street Journal reporter who was murdered in Pakistan in 2002.)
Brad Carlin, professor and head of biostatistics at the University of Minnesota, mentioned a lesson from the success of Nate Silver, election forecaster for the New York Times: “Never believe in just one poll; always take some sort of average of all the polls you respect.” Other forecasters who also aggregated polls had success in this election cycle.
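Carlin’s advice doesn’t require a forecasting model; even the simplest version, a sample-size-weighted average of some hypothetical polls, shows the idea:

```python
# Hypothetical polls of the same race: (candidate's share in %, sample size).
polls = [(48.0, 900), (51.0, 1200), (49.5, 600), (50.5, 1500)]

simple_avg = sum(share for share, _ in polls) / len(polls)
weighted_avg = sum(share * n for share, n in polls) / sum(n for _, n in polls)

print(f"simple average:        {simple_avg:.1f}%")
print(f"size-weighted average: {weighted_avg:.1f}%")
```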
New York University mathematics professor Sylvain Cappell offered a few tips. Among them: “There’s a widespread disinclination to recognize how often choosing between alternative courses involves making a judicious balance between quantities, and thus making specific numerical formulations to be able to compute the advantageous tradeoff point,” Cappell said. “Recognizing that there are tradeoffs involves qualitative thinking but after that there’s just no short-cut to computing to see where the tradeoff point actually lies.”
Cappell added that sometimes simple computations, not complex ones, can suffice to aid in decision-making. “It’s amazing, even in our complex modern world, how many assertions fail simple ‘back of the envelope’ reasonable estimates with elementary computations,” he said.
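A back-of-the-envelope check in Cappell’s spirit, applied to a claim like Schneider’s “one murder every 10 minutes” (the numbers here are purely hypothetical):

```python
# If something is said to happen "every 10 minutes," multiply out the implied
# annual total and compare it with an independently reported figure.
minutes_per_year = 60 * 24 * 365          # 525,600
implied_annual_total = minutes_per_year / 10
print(f"implied events per year: {implied_annual_total:,.0f}")
# If the implied total is off from the known annual total by an order of
# magnitude, the "every X minutes" framing fails the smell test.
```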
- 3:04 am December 29, 2012
- Jonathan Seder wrote :
Two more peeves:
Relative risk needs to be framed with absolute risk – if a drug cuts a mortality rate “in half,” the improvement is not very interesting to a general audience if the absolute rate falls from 2 in fifty million to 1 in fifty million.
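Seder’s numbers, made explicit (a hypothetical drug and rates):

```python
baseline_risk = 2 / 50_000_000   # 2 deaths per 50 million without the drug
treated_risk = 1 / 50_000_000    # 1 death per 50 million with it

relative_reduction = 1 - treated_risk / baseline_risk    # "cuts mortality in half"
absolute_reduction = baseline_risk - treated_risk        # 1 in 50 million
number_needed_to_treat = 1 / absolute_reduction

print(f"relative risk reduction: {relative_reduction:.0%}")
print(f"absolute risk reduction: {absolute_reduction:.2e}")
print(f"patients treated per death averted: {number_needed_to_treat:,.0f}")
```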
Statistical significance should be distinguished from ordinary significance. One painkiller might be better than another with a high level of significance, but that statistically significant improvement might be minuscule, undetectable by consumers.
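And the second point in miniature: with a large enough sample, a difference far too small for any patient to notice can still be highly “significant” (hypothetical numbers, normal approximation):

```python
import math

diff, sd, n = 0.05, 2.0, 50_000          # 0.05 points on a 10-point pain scale, per-group n
se = sd * math.sqrt(2 / n)               # standard error of the difference in means
z = diff / se
p = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))   # two-sided p-value
print(f"z = {z:.2f}, p = {p:.1e}")       # statistically significant, clinically meaningless
```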
- 10:43 am December 29, 2012
- dqk wrote :
Time and time again, journalists misuse the word “percent”. They write, for example, “The market fell 11 percent” when the market fell 11 percentage points.
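The distinction matters because the two readings can differ enormously; a hypothetical rate makes it concrete:

```python
old_rate, new_rate = 8.0, 7.0   # e.g., a rate falling from 8% to 7%

point_change = new_rate - old_rate                        # -1.0 percentage point
percent_change = (new_rate - old_rate) / old_rate * 100   # -12.5 percent
print(f"{point_change:+.1f} percentage points, {percent_change:+.1f} percent")
```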
- 2:21 pm December 29, 2012
- Prof Luis Pericchi wrote :
An interesting and insightful article!
The tide is turning: from “damned lies and statistics” to “statistics, the only way to decipher reality, to disentangle its tricks, to separate signal from the ocean of noise.”
Congratulations on the new tide in favor of statistical thinking.
- 11:52 am December 30, 2012
- SW16 wrote :
We often read nonsense such as “A earns ten times less than B”, when what is usually meant is “B earns ten times as much as A”. If the first were true, A would be paying his/her employer for the privilege of going to work.
Can anyone give me a real-world example of where “X times less than” is likely to be correct? If not, and if journalists can’t be numerate, just ban the phrase “X times less than.”
- 1:56 pm December 30, 2012
- Philip B. Stark wrote :
Thank you for bringing together a spectrum of perspectives on numeracy.
Here are some basic concepts I see butchered all too frequently:
1) The sample is not the population.
2) The margin of error is supposed to measure how far the sample-based result is likely to be from the results for the whole population, due to the luck of the draw in selecting the sample. The reported margin of error typically doesn’t take into account a variety of other sources of error, such as nonresponse and other biases. Such “non-sampling” errors can be much larger than the margin of error. (A small worked example appears after this list.)
3) “Random” is not the same as “haphazard” or “arbitrary.” It is a term of art. Generally, you have to work quite hard to make things random–it doesn’t happen “accidentally.” In most situations where people talk about probabilities, the probabilities are fictions: there really isn’t anything random.
4) There is no such thing as “a statistically significant sampling” or “statistically significant sample size.” (I see this in legal documents frequently.)
5) Don’t confuse “the chance of observing what was actually seen, assuming the hypothesis is true” with “the chance the hypothesis is true, given the observations.” (This is a common misinterpretation of p-values.) A related garble is saying “there’s only an X% chance that this result could be due to chance” in place of “on the assumption that a specific chance mechanism generated the data, the chance of observing those data would be small.”
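(The worked example promised in point 2, using the textbook formula for a simple random sample; the poll numbers are hypothetical, and the calculation captures only sampling error, nothing else.)

```python
import math

p_hat, n = 0.52, 1_000                     # hypothetical poll: 52% support, 1,000 respondents
margin = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)   # 95% margin of error, sampling error only
print(f"{p_hat:.0%} +/- {margin:.1%}")     # about 52% +/- 3.1%
# Nonresponse, question wording and other biases are not in this number
# and can easily dwarf it.
```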
Here are some rules of thumb I’ve compiled for graduate students studying applied statistics:
* Consider the underlying science. The interesting scientific questions are not always questions statistics can answer.
* Think about where the data come from and how they happened to become your sample.
* Think before you calculate. Will the answer mean anything? What?
* The data, the formula, and the algorithm all can be right, and the answer still can be wrong: Assumptions matter.
* Enumerate the assumptions. Check those you can; flag those you can’t. Which are plausible? Which are plainly false? How much might it matter?
* A statistician’s most powerful tool is randomness—real, not supposed.
* Errors never have a normal distribution. The consequence of pretending that they do depends on the situation, the science, and the goal.
* Worry about systematic error. Constantly.
* There’s always a bug, even after you find the last bug.
* Association is not necessarily causation, even if it’s Really Strong association.
* Significance is not importance. Insignificance is not unimportance.
* Life is full of Type III errors.
* Order of operations: Get it right. Then get it published.
* The most important work is usually not the hardest nor the most interesting technically, but it often requires the most patience: a technical tour-de-force is usually worth less than persistence and shoe leather.
- 10:14 am December 31, 2012
- Steve D wrote :
No more reporting of rates of change of rates of change. When someone’s tax rate changes from 4% to 5%, that should not be reported as a 25% increase. Some anti-environmentalists like to say that some fishery stocks have doubled recently. They don’t tell you they went from 1% to 2% of what they were 50 years ago.
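The fishery example, with hypothetical numbers, shows how a true “doubling” can coexist with near-total collapse:

```python
level_50_years_ago = 1.0   # percent of the historical stock
level_today = 2.0          # percent of the historical stock

print(f'change: {level_today / level_50_years_ago:.0f}x ("doubled")')
print(f'still gone: {100 - level_today:.0f}% of the historical stock')
```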
More raw data. Don’t tell me someone pays only 10% of his income in taxes. Tell me how many dollars.
When you tell me it will cost billions to address climate change, show me the economic forecasting model you used and its accuracy rate.
- 3:02 pm January 2, 2013
- Michael Dey wrote :
Dear Mr. Bialik,
I enjoyed your important article of 12/29-30/2012, entitled “Statistical Habits to Add, or Subtract, in 2013.” Mr. Rodriguez, President of the American Statistical Association, makes an excellent point on using experimentation to establish cause and effect. Beyond the randomized controlled trial (RCT) noted (also known as an A-B split in other fields), there are actually more powerful variations on the same theme that use the same sample size as a test of a single change.
The power of evaluating 20-40 changes to the status quo in a live business environment is enormous, especially when applied to the complex issues faced by healthcare and education. Large, orthogonal statistical designs start with unlocking the creative energy of organizations and end with putting sacred cows and folklore to the test. The end results are almost always surprising and leap to solutions that are re-proven in subsequent implementation. The advantages across industry remain largely latent because small designs (such as A-B splits), with smaller returns, are more popular.
Performance tends to increase during a study as a result of standardization, without which applications in healthcare and education would be more problematic. Statistical design, in fact, stops any “roulette” (haphazard, uncontrolled change) that might accidentally occur.
It’s often thought that problems can be solved by data analysis. However, many of the great inventions used no data. Statistical design structures innovation, then proves or disproves it while providing data. To this extent it is simply the scientific method (induction-deduction), whereas much data analysis is heavy on deduction. For example, “root cause” analysis, even where successful, still leaves unanswered what the solution(s) are. Statistical design leaps to those solutions, among which about half of expert ideas will be found to work.
It is generally accepted that the first RCT appeared in the literature in 1948 (though there was prior work). That was an important step in the mainstream use of the scientific method, and it remains important in medical research as well as more generally. With the ubiquity of computing power, more sophisticated study designs become even more attractive (while remaining easy for users to act on once designed well).
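Dey doesn’t spell out a particular design, but a small illustration of the orthogonal, multi-factor idea (not his actual method) can be built from a Hadamard matrix: eight runs can screen seven two-level changes at once, the same number of runs an A-B split of a single change might use.

```python
import numpy as np
from scipy.linalg import hadamard

# Columns 1-7 of an 8x8 Hadamard matrix give orthogonal +/-1 settings for
# up to seven two-level factors (a Plackett-Burman-style screening design).
H = hadamard(8)
design = H[:, 1:]                 # drop the all-ones column
print(design)                     # 8 runs x 7 factors

# Orthogonality check: every pair of factor columns is uncorrelated.
print(np.array_equal(design.T @ design, 8 * np.eye(7, dtype=int)))
```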
President & CEO
This Gild algorithm does not sound very impressive. It includes 300 variables, but some of those are the skills a person lists for him/herself on LinkedIn and the ranking of the person’s school in US News & WR, which is a crap ranking. Also, there is no discussion of how they determined that these 300 variables are drivers…what are the outcomes they are measuring against? An employer’s rating of an employee over time, and then do a regression to what variables a highly-rated individual touts? The approach is nice in theory, but the article doesn’t give me confidence in the accuracy of the output.