| from Beaton et al., 2015, Mol Met, dx.doi.org/10.1016/j.molmet.2015.08.003 |
OK, to be honest, that one is not from last weekend - but there were many like it. If you don't believe me, take a stopwatch and check how long it takes you to find one of these useless bar plots on Google Scholar - usually less than a minute in the life sciences.
What is bad about these plots, you ask? Well, put simply, they couldn't be more misleading. There are several issues with this nonsensical way of representing different samples of measurements, for instance the amount of 14C-glucose per well.
1. Spread/variation of data versus precision of estimation
The general goal of your average PhD student at a scientific conference, retreat, or whatever these events are called nowadays, is to show that a group of measurements she has taken on a control is smaller (or larger) than a group of measurements she has taken on a sample that was in some way perturbed from the control condition - usually called the treatment.
The difference between the two groups of measurements is then assessed using a statistical test, for instance a t-test if your data is really nice, or a Wilcoxon rank-sum test if your data is kind of naughty. However, it is - at least from a marketing perspective - useful to visualize your results as well.
Now, there are two things you might want to show when illustrating a group of observations:
- The spread/variability of the group.
- How well you estimated some summary of the group, e.g., the mean value.
Anyhow, adding either a confidence interval or a standard error to a mean value only tells you how precisely that mean was estimated. It has no descriptive power for the distribution of the data - its variability, or spread!
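To make the difference between spread and precision concrete, here is a minimal Python sketch. It is not taken from any of the papers mentioned here; the sample sizes, the mean of 10 and the standard deviation of 2 are made-up values, and `spread_and_precision` is a hypothetical helper name. It draws a small and a large sample from the same distribution and prints the standard deviation (spread) next to the standard error of the mean (precision).

```python
import math
import random
import statistics


def spread_and_precision(values):
    """Return (mean, standard deviation, standard error of the mean)."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)       # sample SD: describes the spread of the data
    sem = sd / math.sqrt(len(values))   # SEM: describes how precisely the mean is estimated
    return mean, sd, sem


# Assumed toy data: both samples come from the same normal distribution.
rng = random.Random(1)  # fixed seed, only for reproducibility
small = [rng.gauss(10.0, 2.0) for _ in range(20)]
large = [rng.gauss(10.0, 2.0) for _ in range(2000)]

for label, sample in (("n=20", small), ("n=2000", large)):
    mean, sd, sem = spread_and_precision(sample)
    print(f"{label:7s} mean={mean:.2f}  SD={sd:.2f}  SEM={sem:.2f}")
```

With more observations the standard error shrinks, while the standard deviation keeps hovering around the true spread of the data. An error bar based on the standard error therefore mostly tells you how many measurements were taken, not how variable they were.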
2. Bar plots cannot show you differences
Let's look at the following example:
On the left we have a selection of six groups, each with 20 observations. Clearly, these groups are not the same when we look at the scatter plot. However, when using a bar plot, it seems that everything is the same in these groups. Even the standard error bars indicate no difference. Probably, we messed up the experiment or something.
If, instead, we use the much more useful box plot, we immediately identify different groups. Even more forensic is the use of violin plots, which show the mirrored probability density of the data and as such allow for the identification of bi- or multimodal distributions in the data.
Try it yourself on https://stekhoven.shinyapps.io/barplotNonsense
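If you would rather convince yourself offline, the following is a rough Python sketch of the same effect. It is not the code behind the app linked above; the three group shapes, the target mean of 10 and the target standard deviation of 2 are assumptions chosen for illustration, and `standardize` is a hypothetical helper. Each group is rescaled to exactly the same mean and standard deviation, so a bar plot with standard error bars would draw them identically, yet the quartiles a box plot uses differ.

```python
import math
import random
import statistics


def standardize(values, target_mean=10.0, target_sd=2.0):
    """Shift and scale values so they have exactly the given mean and sample SD."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [target_mean + target_sd * (v - mean) / sd for v in values]


rng = random.Random(42)  # fixed seed, only for reproducibility
n = 20                   # observations per group, as in the example above

# Assumed toy groups with different shapes but identical mean and SD.
groups = {
    "normal": standardize([rng.gauss(0.0, 1.0) for _ in range(n)]),
    "bimodal": standardize(
        [rng.gauss(-3.0, 0.3) for _ in range(n // 2)]
        + [rng.gauss(3.0, 0.3) for _ in range(n // 2)]
    ),
    "skewed": standardize([rng.expovariate(1.0) for _ in range(n)]),
}

for name, values in groups.items():
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    # What a bar plot with error bars shows: mean and SEM only.
    # What a box plot shows: the quartiles (plus the extremes).
    q1, median, q3 = statistics.quantiles(values, n=4)
    print(
        f"{name:8s} mean={mean:.2f} SEM={sem:.2f} | "
        f"Q1={q1:.2f} median={median:.2f} Q3={q3:.2f} "
        f"min={min(values):.2f} max={max(values):.2f}"
    )
```

The mean and SEM columns come out the same for all three groups, while the quartile columns do not. Feeding the same lists into a plotting library of your choice as box or violin plots should make the difference visible at a glance.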
3. There might be a bright future
I actually have to be honest with you once more: the first chart I found that set out to show the distribution of multiple groups of measurements was this one:
| from Sonay et al., 2015, Genome Res, doi/10.1101/gr.190868.115 |