Beyond Bonferroni revisited: concerns over inflated false positive research findings in the fields of conservation genetics, biology, and medicine

White, Tonya; van der Ende, Jan; Nichols, Thomas E.

doi:10.1007/s10592-019-01178-0

Beyond Bonferroni revisited: concerns over inflated false positive research findings in the fields of conservation genetics, biology, and medicine

Short Communication
Open access
Published: 11 April 2019

Volume 20, pages 927–937, (2019)
Cite this article

Download PDF

You have full access to this open access article

Conservation Genetics Aims and scope Submit manuscript

Beyond Bonferroni revisited: concerns over inflated false positive research findings in the fields of conservation genetics, biology, and medicine

Download PDF

Tonya White^1,2,
Jan van der Ende¹ &
Thomas E. Nichols^3,4,5

8604 Accesses
50 Citations
31 Altmetric
Explore all metrics

Abstract

In 2006, Narum published a paper in Conservation Genetics emphasizing that Bonferroni correction for multiple testing can be highly conservative with poor statistical power (high Type II error). He pointed out that other approaches for multiple testing correction can control the false discovery rate (FDR) with a better balance of Type I and Type II errors and suggested that the approach of Benjamini and Yekutieli (BY) 2001 provides the most biologically relevant correction for evaluating the significance of population differentiation in conservation genetics. However, there are crucial differences between the original Benjamini and Yekutieli procedure and that described by Narum. After carefully reviewing both papers, we found an error due to the incorrect implementation of the BY procedure in Narum (Conserv Genet 7:783–787, 2006) such that the approach does not adequately control FDR. Since the incorrect BY approach has been increasingly used, not only in conservation genetics, but also in medicine and biology, it is important that the error is made known to the scientific community. In addition, we provide an overview of FDR approaches for multiple testing correction and encourage authors first and foremost to provide effect sizes for their results; and second, to be transparent in their descriptions of multiple testing correction. Finally, the impact of this error on conservation genetics and other fields will be study-dependent, as it is related to the number of true to false positives for each study.

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Introduction

In 2006, Narum published a paper in Conservation Genetics pointing out the conservative nature of the Bonferroni approach to correct for multiple testing when considering a set of statistical inferences and the potential for higher Type II errors (Narum 2006). He suggested that alternative approaches, such as the use of false discovery rate (FDR) to correct for multiple testing can be very effective and can provide a better balance between Type I and Type II errors (Type I error is a false positive, incorrectly rejecting a true null hypothesis; whereas Type II error is a false negative, a failure to reject a false null hypothesis). Further, Narum (2006) argued that tests to correct for multiple testing should be chosen on a case-by-case basis depending on the priority of potential Type I and Type II errors. Finally, he proposed the FDR approach of (Benjamini and Yekutieli 2001) as an alternative approach and potentially more biologically relevant for conservation genetics.

His paper, “Beyond Bonferroni: Less conservative analyses for Conservation Genetics,” has been cited over 600 times to date. The article has not only been cited in the field of conservation genetics, but also has been increasingly cited in the fields of biology and medicine. These studies apply the equation described by Narum (2006) attributed to the Benjamini and Yekutieli (2001) procedure for multiple testing correction (BY-FDR). However, a careful review of the published BY method and what Narum describes as the BY method shows crucial differences. Close examination of the two works shows that not all steps were included in calculating the BY-FDR procedure in Narum (2006), and thus this implementation of BY is incorrect and cannot be guaranteed to control the FDR. Thus, we believe that this error has created confusion about the BY procedure and the misimplementation is being propagated along an increasing number of studies.

Within this context, we have three goals of this paper: The first is to provide an overview of the Bonferroni method, the original (Benjamini and Hochberg 1995) FDR (BH-FDR), and the Benjamini and Yekutieli (2001) method (BY-FDR); the second goal is to describe the incorrect implementation of the BY-FDR approach described by Narum, which we will henceforth label as the BY-mis (short for BY-Misimplementation) approach; and the third is to assess the potential impact of this error using 30 of the most recent publications that cite the Narum (2006) paper. However, with the large number of papers that have applied this approach, the specific impact within the fields of conservation genetics, biology, and medicine will need to be evaluated by experts within each of the domains or sub-domains of research in these fields. We will demonstrate that using the BY-mis approach for multiple testing correction results in higher rates of false positives, especially when a large number of multiple tests are performed. However, as pointed out by Narum (2006), false negatives can also be a concern and specific situations may require approaches that limit Type II errors. Typically larger sample sizes are needed to confirm true negatives. In situations where sample sizes are low, as is often the case in conservation genetics (e.g., low number of sampled individuals and/or populations, low number of loci in non-model species) decisions based on false negatives could lead to less productive conservation management strategies (Narum 2006). Thus, we also provide simulations to demonstrate the rates of false negatives using different approaches for multiple testing correction in two specific scenarios.

Theory

We first review the different multiple testing approaches discussed by Narum (2006) using his notation as closely as possible. We start with a collection of k tests, each with a corresponding p value, $p_{i}$, i = 1,…,k. A multiple testing procedure identifies a subset of the k tests as significant while controlling some measure of false positive risk that takes into account the number of tests performed. The Bonferroni method controls the family-wise error (FWE), the chance of one or more false positives, by using a fixed threshold of:

$$\alpha_{\text{Bonf}} = \frac{ 1}{k}\alpha_{\text{FWE}}$$

where α_FWE is the desired FWE level: All tests with $p_{i}$ ≤ α_Bonf can be declared significant while controlling the FWE.

Benjamini and Hochberg (1995) introduced the false discovery rate (FDR) for multiple testing correction. In describing the FDR it is useful to first define the false discovery proportion (FDP): FDP is the ratio of the number of false positive tests to total number of significant tests, defined as 0 if no tests are significant. The FDR is the expected value of FDP; put another way, FDR is the expected proportion of false positives among positives. To find FDR-significant tests, denote the ordered p-values $p_{(1)}$ ≤ $p_{(2)}$ ≤ ··· ≤ $p_{(k)}$. Then for a desired α_FDR, let the index i^* be found as

$$i^{*} = \hbox{max} \left\{ {i:p_{\left( i \right)} \le \frac{i}{k}\alpha_{\text{FDR}} } \right\},$$

and the tests with $p_{i}$ ≤ $p_{{(i^{*} )}}$ can be declared significant while controlling FDR at α_FDR.

The assumptions of this BH-FDR procedure (BH-FDR) are independence among the test statistics (Benjamini and Hochberg 1995). However, Benjamini and Yekutieli (2001) found that weaker assumptions could be used, allowing a general form of positive dependence among the test statistics. They proposed another method for controlling FDR that makes no assumptions about the dependence among the tests, as long as a more stringent criterion was used (Theorem 1.3, BY), with the index $i_{BY}^{*}$ computed:

$$i_{\text{BY}}^{*} = \hbox{max} \left\{ {i:p_{\left( i \right)} \le \frac{i}{k}\frac{1}{{\mathop \sum \nolimits_{{i^{'} = 1}}^{k} \frac{1}{{ i^{'} }}}}\alpha_{\text{FDR}} } \right\}.$$

With this approach, the tests with $p_{i}$ ≤ $p_{{(i_{\text{BY}}^{*} )}}$ are marked significant and FDR is controlled at α_FDR under any form of dependency. Note that $\mathop \sum \nolimits_{{i^{'} = 1 }}^{k} \frac{1}{{i^{'} }} \approx \log \left( k \right) + \gamma$, where $\gamma \approx 0.57721$ is Euler–Mascheroni constant. This is the method we refer to by BY-FDR.

We can now make a quick comparison of three methods on the basis of the smallest p-value $p_{(1)}$: Bonferroni has the fixed threshold α_FWE/k, while BH-FDR will compare $p_{(1)}$ to α_FDR/k and BY-FDR will compare $p_{(1)}$ to approximately α_FDR/(k log (k)). Of course, BH-FDR and BY-FDR are adaptive and thus the comparison for each p-value within a test set has successively more lenient thresholds. However, as BH-FDR and BY-FDR use the same inequality except for the ≈ 1/log (k) term, BY-FDR can only be more stringent than BH-FDR.Now, in Narum (2006), the author incorrectly states that the BY-FDR threshold is fixed and equal to:

$$\begin{array}{*{20}c} {\frac{1}{{\mathop \sum \nolimits_{i = 1}^{k} \frac{1}{i}}}\alpha_{\text{FDR}} } \\ \end{array}.$$

This is a fundamental error, as a key feature of FDR methods is that they are adaptive. The error arose from neglecting that this expression was just one component of the BY procedure [to be substituted for q in BY Eq. (1) on pp. 1167 (Benjamini and Yekutieli 2001)]. This incorrect application of the BY approach (BY-mis) results in a fixed threshold for a specific k.

Since a fixed threshold specifies the average or per comparison error rate (PCE), we have taken several approaches to assess the impact of this error. Assuming the complete null, i.e. no signal for any test, k × PCE is the expected number of false positives. For the threshold at the 0.05 level, for k = 105, BY-mis has k × PCE ≈ 1, while for k = 1590, k × PCE ≈ 10. This demonstrates that the BY-mis approach can be assured to produce an increasing number of false positives for an increasing k. In contrast, for Bonferroni k × PCE is exactly α_FWE, i.e. always less than 1, and every valid FWE or FDR level α procedure is guaranteed to produce no false positives with probability 1−α (again, in this complete null setting). While the BY-mis approach does asymptote to zero as k approaches infinity, it approaches zero extremely slowly. For example, with 10 million tests performed, the BY-mis p-value threshold is 0.003, in contrast to the Bonferroni threshold of 0.000000005.

To evaluate the rate of significant p-values found with the Bonferroni, BH, BY, BY-mis, and uncorrected approaches we conducted a simulation using the Python programming language version 2.7.13 (Zope Corporation and a cast of thousands;www.python.org); the code used for all simulations is available in the supplement.. We performed simulations using k values ranging from 1 to 100 tests. For each k, we created 50,000 random realizations where null p-values were computed from test statistics generated as a standard normal distribution. Thus, for k = 1 we had a total of 50,000 independent p-values and in this case the four approaches were identical. For k > 1 we generated k independent p-values and applied each of the four methods. A nominal α_FWE = α_FDR = 0.05 was used for all methods. In this null setting, any “discovery” is a false discovery and so the measured FDR and FWE are the same. We computed the proportion of realizations where any p-values were found significant, representing a FWE error and a FDP of 1. Figure 1a shows the FDR and FWE as a function of the number of tests, showing that Bonferroni and BH-FDR both control false positives as expected (as an aside, while Bonferroni is often regarded as conservative, in this setting of small k and independent tests, it is essentially exact). The FDR/FWE of BY-FDR becomes increasing conservative while the BY-mis has inflated false positives with a near linear increase with increasing k.

In addition, we performed simulations using python to measure both false negative rates for the Bonferroni, BH, BY, and the BY-mis approaches for multiple testing correction. These simulations were creating 50,000 realizations of sets of k tests, 1 to 100, but in this simulation we included a mix of null and non-null tests. We performed two classes of simulations, one with 1 non-null test and one with 25 non-null tests. For example, with k = 50 and the situation of 1 non-null test, there were 49 random p-values computed from a standard Normal distribution test statistic, and 1 p-value that was generated with from a non-null Normal with mean set to give a test with 80% power at the uncorrected level α = 0.05. The same situation with k = 50 for the case with 25 non-null tests, where 25 p-values were generated from null test statistics and 25 non-null p-values were generated to have 80% power to reject the null. This can be seen in Fig. 1b, c where the probability of a false negative for uncorrected comparisons remains at 0.2. These simulations show that the BY-FDR, has the highest probability of a Type II error with one simulated non-null result, whereas the BH-FDR and Bonferroni are very similar.

To illustrate these simulations with an example, say that a study was conducted in which 50 tests were performed (k = 50) with half of the tests actually being significant. Thus, there are 25 tests in which there is a possibility of false positive, and 25 tests in which there is a possibility of a false negative. Since Fig. 1c models the case of 25 out of k = 25 to 100 significant tests, the probability of a false negative for k = 50 is approximately 0.4 for the BH-FDR, 0.43 for the BY-mis, 0.68 for the BY-FDR, and 0.72 for the Bonferroni approach. The probability of a false positive for 25 non-significant tests can be determined from Fig. 1a. With k = 25, the FDR and FWE rate would be at approximately 5% and below for the Bonferroni, BH-FDR, and BY-FDR, but the false discovery rate would be approximately 30% for the BY-mis (Fig. 1a). Figure 1b, c shows that for all methods used to correct for multiple testing, the risk of Type II error increases with the number of tests k. However, there is a dramatic difference between the performance of BY-FDR and the BY-mis. Note the advantage of the BH-FDR approach in minimizing both false positive and false negative errors, while still controlling FDR.

We also consider the specific set of 15 p-values used in Narum (2006) to tabulate the p-value thresholds for the Bonferroni, BH, BY, and the BY-mis approaches. Table 1 shows the thresholds used for each of the 15-exemplar p-values, with significant tests marked in bold. It can be seen that the BY-FDR and the BY-mis are not the same. Narum (2006) reported four significant tests as compared to the correct BY-FDR’s having two significant tests.

Table 1 A set of p-values from 15 significance testing taken from the Narum 2006 paper (column labeled ‘p-value examples’) and comparison with four approaches to multiple testing (critical p-values for significance)

Full size table

The example in Table 1 also demonstrates one of the challenges in finding a balance between Type I and Type II errors and the choice for multiple testing correction. The probability that 12 of 15 independent tests would show an uncorrected p-value less than 0.05 is very low. Thus, Bonferroni, having only two significant tests, is likely overly conservative and would result in a higher type II error rate. The BH-FDR approach, however, shows that 10 of the 15 tests are identified as significant, which in this situation may be more plausible, although it would be helpful to know the covariance structure between the different variables, as statistical dependence between variables is not uncommon. Figure 1c demonstrates type II error rates for the simulation of 25 true positives (80% chance of being less than p < 0.05) and the notable differences between the Bonferroni and BH-FDR for k = 1–100 independent tests.

Finally, we used Scopus to identify the 30 most recent publications (search date: February, 9, 2019) that cite Narum (2006) to sample the impact of this error on the literature (Table 2). Of these 30 articles, nine articles (30%) were specifically related to conservation genetics; ten articles were in the fields of biology, mostly involving genetic analyses (33.3%); nine articles (30%) were in the field of medicine, most commonly in psychiatry; and the two additional articles were in the fields of statistics and anthropology. In 20 of these articles (67%) we could confidently determine that BY-mis was used (2006), while it was unclear in six articles (20%), and one article cited Narum (2006), but did not use the BY-mis approach. None of the papers described using a standard statistical software package to calculate the BY-FDR. Eight of the twenty articles that applied the BY-mis approach also cited the Benjamini and Yekutieli (2001) article. Of the 28 relevant articles [excluding Hauser et al. (2018) and Stepien et al. (2018) as these papers cited but did not apply the BY-mis approach], only eight articles (29%) provide enough information to calculate the alternate multiple testing corrections for the data provided for the specific study. Four of these eight articles show an reduction in the number of significant tests when BY-mis is replaced with BY-FDR, whereas the other four have tests that either are negative (one article) or are so strongly significant that all the tests also pass Bonferroni correction (three articles). Also noteworthy, eight of the twenty articles that applied the BY-mis approach (40%) applied independent levels of multiple testing, rather than applying multiple testing to all tests in the article.

Table 2 List of the 30 most recent articles identified via Scopus (9 February 2019) who cited the Narum (2006) article

Full size table

Discussion

In 1995, Benjamini and Hochberg proposed the FDR metric and a method to control FDR. Benjamini and Yekutieli in 2001 proposed a method to control FDR with weaker assumptions, but more stringent correction than the BH approach. Narum’s (2006) paper provided an overview and examples of the BY-FDR procedure, however, did not include all steps of the BY algorithm (shown above). A careful reading of Benjamini and Yekutieli (2001) reveals that the equation for multiple testing from Narum (2006) (from Theorem 1.3 on pp. 1169 of BY) should be entered as the α in the B-H equation (Eq. (1) on pp. 1167 in BY), producing an adaptive threshold. Further, based on a series of p-values taken from the Narum (2006) paper (Table 1), different results are obtained comparing the Narum (2006) description of the BY approach and the BY-FDR described by Benjamini and Yekutieli (2001).

Direct calculation shows that BY-mis has expected number of false positives that increases nearly linearly with number of tests k, and that this increasing false positive rate differs dramatically from the BY-FDR approach (Fig. 1a). We believe that a large percentage of the over 600 publications are liable to have this inflated rate of false positives in their results, notably since results arising from Type I errors are much easier to publish than those from Type II errors. We found that at least 40% of a sample of the 30 most recent papers that cite Narum (2006) article also cite Benjamini and Yekutieli (2001) and that they have applied the BY approach, but actually apply the BY-mis-FDR approach (Table 2).

We do agree with Narum that the Bonferroni approach can be highly conservative in some situations of multiple testing correction, especially with dependent data. However, there has also been a growing concern that many studies fail to replicate (Ioannidis 2005; Open Science Collaboration 2015; Nichols et al. 2017; Gelman 2018). In the past, analyses were performed without adequately controlling for the numbers of tests performed (Carp 2012) which resulted in numerous Type I errors but also likely fewer Type II errors. We also agree with Narum that the individual studies should determine the balance between Type I and Type II errors, as there are some situations in many fields where researchers want to limit Type II errors. Examples include situations in conservation genetics where a failure to show a positive effect could direct conservation management strategies that are counter to the survival of a species (Narum 2006). Species in which there is concern over extinction often have smaller populations and lower rates of reproduction (Lynch and Lande 1998) and decisions based on false negatives in some populations could lead to less productive conservation management strategies. Examples in medicine with concerns over false negatives include the presurgical use of functional magnetic resonance imaging to identify eloquent cortex (Durnez et al. 2013). In such cases, a false negative could result in the removal of eloquent cortical regions and thus stringent correction for multiple testing would not be indicated. Thus, in conservation genetics, biology, medicine, and other fields, individual studies may shift the choice of limiting either Type I or Type II errors and providing the rationale for the choice of (or lack of) multiple testing correction should always be provided.

Our attempt to extract vital information to assess the multiple testing correction within each of the 30 most recent articles that cite the Narum (2006) paper highlights the need in the literature for greater transparency regarding the use of multiple testing correction. Over two-thirds of these papers did not provide enough information to replicate the authors approach for multiple testing correction nor to compare the different methods. Further, a minority of these papers presented effect sizes or confidence intervals for their findings, and omission of these data been shown to be a problem in many fields of science (Chavalarias et al. 2016). None of the authors described using statistical software packages, (i.e. R or SAS) to calculate the BY-FDR, which, if performed correctly, would have resulted in an accurate calculation of multiple testing correction. It is likely that the BY-mis approach, which provides a single critical p-value and is trivial to calculate, was easier than the use of statistical software. There is currently discussions regarding moving away from the use of the p ≤ 0.05 approach (American Statistical Association 2016), we would recommend that if p-values are presented, they should always be the full, unadjusted p-value and should be accompanied by effect sizes or confidence intervals. Effect sizes or confidence intervals provide greater details regarding hypothesis testing compared to p-values (Smith 2018) and will enhance replication, as studies evaluating small effects in the wake of considerable noise are likely false positives (Gelman 2018), considering a system rewarded by positive findings.

In summary, so long as p-values remain one of the top methods of choice to report statistical results, we agree with the Narum (2006) that researchers should carefully consider the different tests for multiple testing correction and should make a priori decisions based on Type I and Type II errors within their specific study. Further, we provide an overview of FWE and FDR correction approaches and several simulations to show both type I and type II errors. We point out an error in Narum’s (2006) paper describing the BY approach and show that the BY-mis does not adequately control for FDR when used for multiple testing correction. Finally, we recommend that authors be transparent in reporting the number of tests, the number of clusters of tests, and method used when performing multiple testing correction. Authors should also present effect sizes or confidence intervals is also key.

References

American Statistical Association (2016) American Statistical Association releases statement on statistical significance and p -values: provides principles to improve the conduct and interpretation of quantitative science. ASA News. http://amstat.tandfonline.com/doi/abs/10.1080/00031305.2016.1154108#.Vt2XIOaE2MN. Accessed 7 Mar 2019
Austin JD, Greene DU, Honeycutt RL, McCleery RA (2018) Genetic evidence indicates ecological divergence rather than geographic barriers structure florida fox squirrels. J Mammal. https://doi.org/10.1093/jmammal/gyy128
Article Google Scholar
Barendse MEA, Simmons JG, Byrne ML et al (2018) Associations between adrenarcheal hormones, amygdala functional connectivity and anxiety symptoms in children. Psychoneuroendocrinology 97:156–163. https://doi.org/10.1016/j.psyneuen.2018.07.020
Article CAS PubMed Google Scholar
Bartholomeusz CF, Ganella EP, Whittle S et al (2018) An fMRI study of theory of mind in individuals with first episode psychosis. Psychiatry Res Neuroimaging 281:1–11. https://doi.org/10.1016/j.pscychresns.2018.08.011
Article PubMed Google Scholar
Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B 57:289–300
Google Scholar
Benjamini Y, Yekutieli D (2001) The control of the false discovery rate in multiple testing under dependency. Ann Stat 29:1165–1188. https://doi.org/10.2307/2674075
Article Google Scholar
Buchanan B, Hamilton MJ, Hartley JC, Kuhn SL (2019) Investigating the scale of prehistoric social networks using culture, language, and point types in western North America. Archaeol Anthropol Sci 11:199–207. https://doi.org/10.1007/s12520-017-0537-y
Article Google Scholar
Carp J (2012) The secret lives of experiments: methods reporting in the fMRI literature. Neuroimage 63:289–300. https://doi.org/10.1016/j.neuroimage.2012.07.004
Article PubMed Google Scholar
Casey CS, Orozco-terWengel P, Yaya K et al (2018) Comparing genetic diversity and demographic history in co-distributed wild South American camelids. Heredity (Edinb) 121:387–400. https://doi.org/10.1038/s41437-018-0120-z
Article CAS Google Scholar
Chavalarias D, Wallach JD, Li AHT, Ioannidis JPA (2016) Evolution of reporting P values in the biomedical literature, 1990-2015. JAMA 315:1141. https://doi.org/10.1001/jama.2016.1952
Article CAS PubMed Google Scholar
Davis AR, Becerro M, Turon X (2018) Living on the edge: early life history phases as determinants of distribution in Pyura praeputialis (Heller, 1878), a rocky shore ecosystem engineer. Mar Environ Res 142:40–47. https://doi.org/10.1016/j.marenvres.2018.09.019
Article CAS PubMed Google Scholar
Deane C, Vijayakumar N, Allen NB et al (2018) Parentingxbrain development interactions as predictors of adolescent depressive symptoms and well-being: differential susceptibility or diathesis-stress? Dev Psychopathol. https://doi.org/10.1017/s0954579418001475
Article Google Scholar
DiBattista JD, Wakefield CB, Moore GI et al (2018) Genomic and life-history discontinuity reveals a precinctive lineage for a deep-water grouper with gene flow from tropical to temperate waters on the west coast of Australia. Ecol Genet Genomics 9:23–33. https://doi.org/10.1016/j.egg.2018.09.001
Article Google Scholar
Durnez J, Moerkerke B, Bartsch A, Nichols TE (2013) Alternative-based thresholding with application to presurgical fMRI. Cogn Affect Behav Neurosci 13:703–713. https://doi.org/10.3758/s13415-013-0185-3
Article PubMed Google Scholar
Gelman A (2018) The failure of null hypothesis significance testing when studying incremental changes, and what to do about it. Personal Soc Psychol Bull 44:16–23. https://doi.org/10.1177/0146167217729162
Article Google Scholar
Gibson-Smith D, Bot M, Brouwer IA et al (2018) Diet quality in persons with and without depressive and anxiety disorders. J Psychiatr Res 106:1–7. https://doi.org/10.1016/j.jpsychires.2018.09.006
Article PubMed Google Scholar
Hasselman DJ, Bentzen P, Narum SR, Quinn TP (2018) Formation of population genetic structure following the introduction and establishment of non-native American shad (Alosa sapidissima) along the Pacific Coast of North America. Biol Invasions 20:3123–3143. https://doi.org/10.1007/s10530-018-1763-7
Article Google Scholar
Hauser S, Wakeland K, Leberg P (2018) Inconsistent use of multiple comparison corrections in studies of population genetic structure: Are some type I errors more tolerable than others? Mol Ecol Resour. 19:144–148
Article CAS PubMed Google Scholar
Hoffmann C, Van Rheenen TE, Mancuso SG et al (2018) Exploring the moderating effects of dopaminergic polymorphisms and childhood adversity on brain morphology in schizophrenia-spectrum disorders. Psychiatr Res Neuroimaging 281:61–68. https://doi.org/10.1016/j.pscychresns.2018.09.002
Article Google Scholar
Huang W, Li M, Yu K et al (2018) Genetic diversity and large-scale connectivity of the scleractinian coral Porites lutea in the South China Sea. Coral Reefs 37:1259–1271. https://doi.org/10.1007/s00338-018-1724-8
Article Google Scholar
Ioannidis JPA (2005) Why most published research findings are false. PLoS Med 2:0696–0701. https://doi.org/10.1371/journal.pmed.0020124
Article Google Scholar
Lynch M, Lande R (1998) The critical effective size for a genetically secure population. Anim Conserv 1:70–72
Article Google Scholar
Mahdavi S, Jenkins DJA, Borchers CH, El-Sohemy A (2018) Genetic variation in 9p21 and the plasma proteome. J Proteome Res 17:2649–2656. https://doi.org/10.1021/acs.jproteome.8b00117
Article CAS PubMed Google Scholar
McDowell JR, Mamoozadeh NR, Brightman HL, Graves JE (2018) Use of rapidly evolving molecular markers to distinguish species and clarify range uncertainties in the spearfishes (Istiophoridae, Tetrapturus). Bull Mar Sci 94:1355–1378. https://doi.org/10.5343/bms.2017.1130
Article Google Scholar
Mutton TY, Fuller SJ, Tucker D, Baker AM (2018) Discovered and disappearing? Conservation genetics of a recently named Australian carnivorous marsupial. Ecol Evol 8:9413–9425. https://doi.org/10.1002/ece3.4376
Article PubMed PubMed Central Google Scholar
Narum SR (2006) Beyond Bonferroni: less conservative analyses for conservation genetics. Conserv Genet 7:783–787
Article CAS Google Scholar
Nichols T, Das S, Evans AC et al (2017) Best practices in data analysis and sharing in neuroimaging using MRI best practices in data analysis and sharing in neuroimaging using MRI. Nat Neurosci 20:299–303. https://doi.org/10.1038/nn.4500
Article CAS PubMed PubMed Central Google Scholar
Open Science Collaboration (2015) Estimating the reproducibility of psychological science. Science 349:aac4716–aac4716. https://doi.org/10.1126/science.aac4716
Paans NPG, Gibson-Smith D, Bot M et al (2019) Depression and eating styles are independently associated with dietary intake. Appetite 134:103–110. https://doi.org/10.1016/j.appet.2018.12.030
Article PubMed Google Scholar
Pérez-Portela R, Bumford A, Coffman B et al (2018) Genetic homogeneity of the invasive lionfish across the Northwestern Atlantic and the Gulf of Mexico based on single nucleotide polymorphisms. Sci Rep 8:5062. https://doi.org/10.1038/s41598-018-23339-w
Article CAS PubMed PubMed Central Google Scholar
Pérez-Portela R, Wangensteen OS, Garcia-Cisneros A et al (2019) Spatio-temporal patterns of genetic variation in Arbacia lixula, a thermophilous sea urchin in expansion in the mediterranean. Heredity (Edinb) 122:244–259. https://doi.org/10.1038/s41437-018-0098-6
Article Google Scholar
Petereit C, Bekkevold D, Nickel S et al (2018) Population genetic structure after 125 years of stocking in sea trout (Salmo trutta L.). Conserv Genet 19:1123–1136. https://doi.org/10.1007/s10592-018-1083-6
Article CAS Google Scholar
Pregler KC, Kanno Y, Rankin D et al (2018) Characterizing genetic integrity of rear-edge trout populations in the southern Appalachians. Conserv Genet 19:1487–1503. https://doi.org/10.1007/s10592-018-1116-1
Article Google Scholar
Riesgo A, Taboada S, Pérez-Portela R et al (2019) Genetic diversity, connectivity and gene flow along the distribution of the emblematic Atlanto-Mediterranean sponge Petrosia ficiformis (Haplosclerida, Demospongiae). BMC Evol Biol 19:24. https://doi.org/10.1186/s12862-018-1343-6
Article PubMed PubMed Central Google Scholar
Sandoval Laurrabaquio-A N, Islas-Villanueva V, Adams DH et al (2019) Genetic evidence for regional philopatry of the Bull shark (Carcharhinus leucas), to nursery areas in estuaries of the Gulf of Mexico and western North Atlantic ocean. Fish Res 209:67–74. https://doi.org/10.1016/j.fishres.2018.09.013
Article Google Scholar
Smith RJ (2018) The continuing misuse of null hypothesis significance testing in biological anthropology. Am J Phys Anthropol 166:236–245. https://doi.org/10.1002/ajpa.23399
Article PubMed Google Scholar
Stepien CA, Snyder MR, Knight CT (2018) Genetic divergence of nearby walleye spawning groups in central lake erie: implications for management. North Am J Fish Manag 38:783–793. https://doi.org/10.1002/nafm.10176
Article Google Scholar
Sucec J, Herzog M, Van Diest I et al (2018) The impairing effect of dyspnea on response inhibition. Int J Psychophysiol 133:41–49. https://doi.org/10.1016/j.ijpsycho.2018.08.012
Article PubMed Google Scholar
Sucec J, Herzog M, Van Diest I et al (2019) The impact of dyspnea and threat of dyspnea on error processing. Psychophysiology 56:e13278. https://doi.org/10.1111/psyp.13278
Article PubMed Google Scholar
Van Wyk AM, Kotzé A, Grobler JP et al (2018) Isolation and characterization of species-specific microsatellite markers for blue and black wildebeest (Connochaetes taurinus and C. gnou). J Genet 97:101–109. https://doi.org/10.1007/s12041-018-1000-2
Article CAS Google Scholar
Wieman AC, Berendzen PB (2018) Spatial genetic variation and habitat association of Rhinichthys cataractae, the longnose dace, in the driftless area of the upper mississippi River basin. Conserv Genet 19:1367–1378. https://doi.org/10.1007/s10592-018-1106-3
Article Google Scholar
Xue D-X, Graves J, Carranza A et al (2018) Successful worldwide invasion of the veined rapa whelk, Rapana venosa, despite a dramatic genetic bottleneck. Biol Invasions 20:3297–3314. https://doi.org/10.1007/s10530-018-1774-4
Article Google Scholar
Xue D-X, Yang Q-L, Li Y-L et al (2019) Comprehensive assessment of population genetic structure of the overexploited Japanese grenadier anchovy (Coilia nasus): implications for fisheries management and conservation. Fish Res 213:113–120. https://doi.org/10.1016/j.fishres.2019.01.012
Article Google Scholar

Download references

Funding

Funding was provided by ZonMw (Grant No: 91211021).

Author information

Authors and Affiliations

Department of Child and Adolescent Psychiatry, Erasmus University Medical Center, Erasmus MC-Sophia/Kamer KP-2869, Postbus 2060, 3000 CB, Rotterdam, The Netherlands
Tonya White & Jan van der Ende
Department of Radiology, Erasmus University Medical Center, Rotterdam, The Netherlands
Tonya White
Oxford Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, Nuffield Department of Population Health, University of Oxford, Oxford, OX3 7LF, UK
Thomas E. Nichols
Wellcome Centre for Integrative Neuroimaging, FMRIB, Nuffield Department of Clinical Neurosciences, University of Oxford, Oxford, OX3 9DU, UK
Thomas E. Nichols
Department of Statistics, University of Warwick, Coventry, CV4 7AL, UK
Thomas E. Nichols

Authors

Tonya White
View author publications
You can also search for this author in PubMed Google Scholar
Jan van der Ende
View author publications
You can also search for this author in PubMed Google Scholar
Thomas E. Nichols
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Tonya White.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (DOCX 65 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Reprints and permissions

About this article

Cite this article

White, T., van der Ende, J. & Nichols, T.E. Beyond Bonferroni revisited: concerns over inflated false positive research findings in the fields of conservation genetics, biology, and medicine. Conserv Genet 20, 927–937 (2019). https://doi.org/10.1007/s10592-019-01178-0

Download citation

Received: 05 September 2018
Accepted: 25 March 2019
Published: 11 April 2019
Issue Date: 15 August 2019
DOI: https://doi.org/10.1007/s10592-019-01178-0

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Beyond Bonferroni revisited: concerns over inflated false positive research findings in the fields of conservation genetics, biology, and medicine

Abstract

Introduction

Theory

Discussion

References

Funding

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Electronic supplementary material

Supplementary material 1 (DOCX 65 kb)

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation