When, in the Book of Revelation, the Lamb of God opens the
first four of the seven seals on the scroll in God’s right hand, four figures emerge
on white, red, black and pale horses. The riders represent conquest, war,
famine, and death. They are best known by their collective name: the Four
Horsemen of the Apocalypse.
In our technological and data-driven world, virtually
everyone needs statistics in one form or another. However, most people are not
initiated into the subtleties and dangers of applying statistics to real data, and the
little bit of learning from an undergraduate class is in no way adequate. This is
why it is extremely important to have a statistician in your enterprise, or at least to be
able to fall back on a statistical consultant. But I digress…
In this series of four posts I want to introduce a new
concept: the Four Horsemen of Statistics. These are four situations in which great
danger lies ahead for the uninitiated. To keep the audience captivated I will
not disclose the full list right now, but start directly with the first Horseman.
The complications of multiple testing probably ruin the
credibility of more publications than any other statistical concept. This is
most beautifully illustrated by John P.A. Ioannidis’s essay from 2005 with
the intriguing title «Why most published research findings are false».
In examining the causes of errors in research findings, Ioannidis considers
multiple testing a major factor.
Though not fully intuitive, multiple testing can be explained
in a few words. Whenever you perform a statistical test, you allow for a certain
amount of error. Performing additional tests dramatically accumulates this
«allowed» error. Unfortunately, this «allowed» error is necessary for the logic
of testing; without it there would never be a decision. It goes by many
names, such as significance level or type I error, and is often denoted by
the Greek letter α.
To give an
indication of the magnitude of the problem: assume that you choose a
significance level of α=5%. For a single test, the chance of a
false positive, i.e., of finding something when there is nothing, is then 5%. Performing
a single additional test at the same level of significance increases the
probability of at least one false positive to 9.75%. When we perform 13 tests,
the chance of having at least one wrong test result is already an overwhelming, almost 50%.
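For the curious, here is a minimal sketch (in Python, and assuming the tests are independent) of where these numbers come from: the chance of at least one false positive among n tests is 1 - (1 - α)^n.

```python
# Sketch: family-wise error rate for n independent tests at significance level alpha.
# The chance of at least one false positive is 1 - (1 - alpha)^n.

def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

if __name__ == "__main__":
    for n in (1, 2, 13):
        print(f"{n:>2} tests at alpha=5%: "
              f"{familywise_error_rate(0.05, n):.1%} chance of a false positive")
    # Prints roughly 5.0%, 9.8%, 48.7% -- the 13-test case is close to a coin flip.
```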
Fortunately, there
is a solution available for dealing with this issue.
It is quite unspectacularly called multiple testing correction. The idea is to adjust the level of significance α, so that the effect of the multiple tests on the probability of making false positive decisions is eliminated. In our above example this would mean that for the two tests performed we do not use 5% as a significance level, but instead divide it by the number of tests, i.e., two, yielding a new α of 2.5%. When we now compute the chance of observing a false positive, we get a mere 4.9% as intended.
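The divide-α-by-the-number-of-tests adjustment described above is the classic Bonferroni correction. A minimal sketch, again assuming independent tests, of what it does to the overall false positive probability:

```python
# Sketch: Bonferroni-style correction -- divide alpha by the number of tests,
# then check what the family-wise error rate becomes.

def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test significance level after dividing alpha by the number of tests."""
    return alpha / n_tests

def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

if __name__ == "__main__":
    corrected = bonferroni_alpha(0.05, 2)   # 2.5% per test
    print(f"corrected alpha: {corrected:.2%}")
    print(f"family-wise error rate: {familywise_error_rate(corrected, 2):.1%}")
    # Roughly 4.9%, i.e. back below the intended 5%.
```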
The uncomfortable
fact remains, however, that after performing a couple of thousand tests, the corrected level of
significance becomes infinitesimal, leaving researchers desperately
trying to find something to publish with nothing significant at all.
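To put a rough number on «infinitesimal»: with, say, 2,000 tests, dividing α=5% by the number of tests leaves a per-test significance level of 0.05/2000 = 0.0025%, a threshold that hardly any real effect will clear.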
Soon I will present
the second Horseman of Statistics, right here…