Grappling with Significance: P-value as a Basis for Belief and Action
Last week, the American Statistical Association published a policy statement in The American Statistician on the nature and shortcomings of the procedure known as “statistical significance.”
Here’s the press release.
The negative consequences of inappropriate use of statistical significance pushed the professional society to action.
There are many references and discussions available online. Here’s a particularly accessible summary in Nature.
The policy statement’s supplementary material consists of 21 statements from the ASA’s expert panel.
I started reading the supplementary material with the contributions of John Ioannidis and Don Berry, who state their personal views with characteristic clarity and insight.
Dr. Berry, in his note “P-values are not what they are cracked up to be,” crisply describes the contingent nature of all calculations of statistical significance: these calculations are conditional on the precise formulation of the data set and a statistical model.
However, how the data were produced (and edited in “cleaning”) and how the model was chosen, the practical stuff of scientific investigation, not only affect the calculated p-value in ways that are often difficult to assess but are almost always far more critical to scientific inference.
“As a practical matter, when I worry that I don’t know enough about the extra-numerical aspects of the ‘data’ or about the possibility of incorporating this information into a quantitative measure of evidence, then I resort to a ‘black-box warning’: ‘Our study is exploratory and we make no claims for generalizability. Statistical calculations such as p-values and confidence intervals are descriptive only and have no inferential content.’”
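Berry’s point about conditionality is easy to demonstrate. A minimal sketch, with entirely made-up numbers: the same comparison run twice, once keeping and once “cleaning” a single aberrant reading. The t statistic is computed directly and the two-sided p-value uses a normal approximation, so the exact values are illustrative only.

```python
# Sketch (not from Berry's note): one data-cleaning choice flips "significance".
# All numbers are invented for illustration.
from math import erfc, sqrt

def welch_t(a, b):
    """Welch two-sample t statistic (unequal variances)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    return (ma - mb) / sqrt(va / len(a) + vb / len(b))

def approx_p(t):
    """Two-sided p-value via a normal approximation to the t distribution."""
    return erfc(abs(t) / sqrt(2))

control = [5.1, 4.9, 5.3, 5.2, 4.8, 5.0, 5.4, 5.1]
treated = [5.6, 5.8, 5.5, 5.9, 5.7, 5.6, 6.0, 5.8]
outlier = 9.0  # one aberrant reading: keep it, or edit it out in "cleaning"?

p_kept = approx_p(welch_t(control + [outlier], treated))  # not significant
p_dropped = approx_p(welch_t(control, treated))           # highly significant
```

Nothing in the p-value itself records which choice was made; that information lives in the extra-numerical record Berry and (below) Deming insist on preserving.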
As W.E. Deming pointed out in “On probability as a basis for action,” published 41 years ago, also in The American Statistician:
“Presentation of results, to be optimally useful, and to be good science, must conform to Shewhart's rule: viz., preserve, for the uses intended, all the evidence in the original data.”
“The data of an experiment consist of much more than a mean and its standard deviation. In fact, not even the original observations constitute all the data. The user of the results, in order to understand them, may require also a description or reference to the method of investigation, the date, place, the duration of the test, a record of the faults discovered by the statistical controls, the amount of nonresponse, and in some cases, even the name of the observer.” (p. 147, article available here.)