
Publication bias impacts on effect size, statistical power, and magnitude (Type M) and sign (Type S) errors in ecology and evolutionary biology


Collaborative efforts to directly replicate empirical studies in the medical and social sciences have revealed alarmingly low rates of replicability, a phenomenon dubbed the ‘replication crisis’. Poor replicability has spurred cultural changes targeted at improving reliability in these disciplines. Given the absence of equivalent replication projects in ecology and evolutionary biology, two inter-related indicators offer the opportunity to retrospectively assess replicability: publication bias and statistical power. This registered report assesses the prevalence and severity of small-study effects (i.e., smaller studies reporting larger effect sizes) and decline effects (i.e., effect sizes decreasing over time) across ecology and evolutionary biology, using 87 meta-analyses comprising 4,250 primary studies and 17,638 effect sizes. Further, we estimate how publication bias might distort the estimation of effect sizes, statistical power, and errors in magnitude (Type M, or exaggeration ratio) and sign (Type S). We show strong evidence for the pervasiveness of both small-study and decline effects in ecology and evolution. Publication bias was widespread, causing meta-analytic means to be over-estimated by (at least) 0.12 standard deviations. This prevalence of publication bias distorted confidence in meta-analytic results, with 66% of initially statistically significant meta-analytic means becoming non-significant after correcting for publication bias. Ecological and evolutionary studies consistently had low statistical power (15%), with a 4-fold exaggeration of effects on average (Type M error rate = 4.4). Notably, publication bias reduced power from 23% to 15% and increased Type M error rates from 2.7 to 4.4 because it creates a non-random sample of effect size evidence. The sign error rate of effect sizes (Type S error) increased from 5% to 8% because of publication bias.
Our research provides clear evidence that many published ecological and evolutionary findings are inflated. Our results highlight the importance of designing high-power empirical studies (e.g., via collaborative team science), promoting and encouraging replication studies, testing and correcting for publication bias in meta-analyses, and adopting open and transparent research practices, such as (pre)registration, data- and code-sharing, and transparent reporting.


Replicable prior findings are the foundation of cumulative scientific research. However, large-scale collaborative attempts to repeat studies have demonstrated that prior findings often fail to replicate in the medical and social sciences [1,2,3]. This raises concerns about the reliability of previously published studies (often referred to as the ‘replication crisis’ [4]). A similar issue of low replicability is likely to occur in ecology and evolutionary biology [5] (see also [6]). Yet, systematic assessments of replicability in this field are exceedingly rare [6, 7], perhaps because of the absence of strong incentives for conducting replication studies [7, 8] and for logistical reasons (e.g. the difficulty of studying rare species or remote ecosystems [9, 10]).

There are, however, two inter-related indicators that can be used to retrospectively gauge replicability in ecology and evolutionary biology: publication bias and statistical power. Publication bias and low statistical power increase the occurrence of unreliable effect size estimates that cannot be replicated. Publication bias commonly occurs when studies with statistically significant results are published more frequently than those with statistically non-significant findings (also referred to as ‘file-drawer problem’ [11]) or are published more quickly [12, 13]. More rapid publication of statistically significant results can also lead to a decline in reported effects over time (‘decline effect’ [12, 13]). When statistically significant effects are preferentially published, smaller studies will tend to report larger effect sizes (known as ‘small-study effects’ [14]). Statistical power, by definition, is the likelihood of identifying a ‘true effect’ when it is present. It is often used as a proxy of ‘replicability probability’ (but see [15]), as studies with high statistical power are more likely to yield findings that can be replicated by other researchers compared to studies with low statistical power [16,17,18].

Several meta-research studies in ecology and evolutionary biology have investigated the prevalence of publication bias and low statistical power. Jennions and Møller [12] reported a statistically significant decline effect in a 2002 survey of 44 ecology and evolutionary biology meta-analyses. Using 52 meta-analyses published before 2000, Barto and Rillig [19] reached a similar conclusion. In a cumulative meta-analysis, Crystal-Ornelas and Lockwood [20] also identified a statistically significant decline in the magnitude of the effect of invasive species on species richness, using 240 papers published between 1999 and 2016. In their work, this decline effect was present consistently regardless of taxonomic group, invasion time, or journal quality. Twenty years ago, statistical power in 10 ecology, evolution, and behaviour journals was estimated at 13–16% for small effects and 40–47% for medium effects (where small effects are r = 0.1 and medium effects are r = 0.3, sensu Cohen [21]) [22]. Even lower statistical power was estimated for the journal Animal Behaviour in 1996, 2003, and 2009 (7–8% and 23–26% to detect Cohen’s small and medium effect sizes, respectively [23]).

Despite earlier efforts in ecology and evolutionary biology [24], the field still lacks a systematic overview of the extent to which different forms of publication bias would distort the estimation of true effects. Further, no studies have evaluated how such distorted effect sizes prevent us from correctly estimating statistical power. The statistical power of a given study depends on sample size and the estimate of corresponding ‘true’ effect size (e.g. a larger effect size leads to a higher power; see Fig. 1A). Therefore, to avoid overestimating the statistical power of a given study, an unbiased proxy of the ‘true’ effect size should be used. Contrastingly, previous attempts in ecology and evolution often used Cohen’s benchmarks to quantify statistical power for a given study [22, 23]. Yet, these benchmarks were derived from Cohen’s qualitative intuitions for studies in the social sciences rather than a quantitative synthesis of the representative literature [25]. Cohen’s benchmarks are arbitrary, and not necessarily applicable to ecological and evolutionary studies. As with exemplar studies in other fields [16], ‘true’ effects can be estimated via meta-analytic approaches and preferably corrected for potential publication bias [26, 27]. Using publication bias-corrected effect size estimates as ‘true’ effects would, more accurately, quantify statistical power as well as the two related, yet underappreciated, statistical errors: Type M and S errors (Fig. 1B and C; [28]). Type M error, also known as exaggeration ratio (magnitude error), represents the ratio between an estimated effect and a ‘true’ effect, whereas Type S error represents the probability of attaining statistical significance in the direction opposite to the true effect [29]. No study has yet quantified these two quantities systematically across the field of ecology and evolutionary biology.
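These quantities can be computed directly from an assumed ‘true’ effect and its standard error. The sketch below is a minimal illustration in the spirit of the design-analysis calculations of Gelman and Carlin [29], not an analysis from this study; the true effect (0.2), standard error (0.25), and simulation settings are arbitrary illustrative values.

```python
# Illustrative "design analysis": given an assumed true effect and standard
# error, compute statistical power, the Type S (wrong-sign) error, and the
# Type M error (exaggeration ratio), the last by Monte Carlo simulation.
from statistics import NormalDist
import random

def retrodesign(true_effect, se, alpha=0.05, n_sims=200_000, seed=42):
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value (~1.96)
    lam = true_effect / se
    # Power: probability that the estimate is statistically significant.
    power = 1 - NormalDist().cdf(z_crit - lam) + NormalDist().cdf(-z_crit - lam)
    # Type S: probability that a significant estimate has the wrong sign.
    type_s = NormalDist().cdf(-z_crit - lam) / power
    # Type M: expected |significant estimate| divided by the true effect.
    rng = random.Random(seed)
    sig = []
    for _ in range(n_sims):
        est = rng.gauss(true_effect, se)
        if abs(est) > z_crit * se:
            sig.append(abs(est))
    type_m = (sum(sig) / len(sig)) / true_effect
    return power, type_s, type_m

# Arbitrary illustrative case: a true effect of 0.2 estimated with SE = 0.25
# is badly underpowered, and significant estimates exaggerate it roughly 3-fold.
power, type_s, type_m = retrodesign(true_effect=0.2, se=0.25)
```

This is the same logic as Fig. 1: as the assumed true effect shrinks relative to the standard error, power falls while the Type M and S errors grow.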

Fig. 1.

Statistical power and Type S and M errors as a function of the ‘true’ effect size (the alpha level is fixed at 0.05). Generic effect sizes (e.g. SMD, lnRR, Zr) are simulated from 0 to 1 with a fixed standard error (0.25). The panels (A–C) show that studies investigating larger true effects have higher power (A) and lower rates of Type M (B) and Type S (C) errors. If a study suffers from publication bias, its effect size is likely to be exaggerated; consequently, the statistical power calculated from that effect size would be overestimated, and the corresponding Type M and S errors underestimated

Here, we capitalise on the rapid growth of ecological and evolutionary meta-analyses to systematically assess the extent to which patterns consistent with publication biases are common across the fields of ecology and evolutionary biology, and, if attributed to actual publication bias, their impacts on estimates of effect size, statistical power, and Type M and S errors [30]. First, we test for the presence and severity of two indices of publication bias (i.e. small-study effects and decline effects) at two levels: (i) the within-meta-analysis level using a newly proposed multilevel meta-regression method and (ii) the between-meta-analysis level using second-order meta-analyses (i.e. meta-meta-analyses). Second, we correct for these publication biases and quantify the degree of decline in bias-corrected effect-size magnitude. Finally, we use uncorrected and bias-corrected mean effect sizes as proxies of the ‘true’ effect to assess statistical power, Type M and S errors in ecology and evolutionary biology both at the primary study (effect-size) and the synthesis (meta-analysis) level.


Before submission of Stage 1 of this registered report, we had finished the collection (‘Data collection’ section), retrieval, and cleaning (‘Data retrieval and cleaning’ section) of data from a pre-existing dataset [31]. After the Stage 1 registered report was accepted, we commenced the statistical analyses (‘Statistical analysis’ section).


Data retrieval and cleaning

By checking the main text, supplementary materials, and/or online data repositories (e.g. Dryad, GitHub, Open Science Framework) of the 102 meta-analytic papers, and emailing corresponding authors where necessary, we were able to include 80 papers that reported the information essential for our statistical analyses. These 80 papers contained 107 independent meta-analyses. Among these 107, 36 meta-analyses used the standardised mean difference (SMD), which includes well-known estimators such as Hedges’ g and Cohen’s d [32]; 20 of these provided raw data (i.e. descriptive statistics: mean, standard error or deviation, and sample size), whereas the remaining 16 provided only effect sizes and variances. Twenty meta-analyses used the log response ratio (lnRR [33]; also known as the ratio of means, ROM): 10 with raw data and 10 without. Thirty-one used the correlation coefficient or its Fisher’s transformation, Zr (because the sampling variance of Zr is determined by the sample size, all Zr cases effectively included raw data). All correlation coefficients were converted to Zr to better approximate normal errors [34]. The remaining 20 meta-analyses used other effect size metrics, such as heritability (h2 [35]), regression slopes (e.g. reaction norms or selection gradients [36, 37]), 2-by-2 binary data (e.g. log odds and risk ratios [38]), raw mean differences [39], and non-standard metrics (proportions [40]).

We decided to include only meta-analytic cases using SMD, lnRR, and Zr in our datasets because, in addition to being the most commonly used effect sizes in ecology and evolutionary biology [41, 42], they share the statistical properties necessary to fit a formal meta-analytic model: (i) they are ‘unit-less’, which allows comparisons of studies originally using different units, (ii) they are (asymptotically) normally distributed, and (iii) they have readily computable (unbiased) sampling variances [34]. To keep our datasets independent, we only used effect sizes in their original forms, although data augmentation (e.g. converting Zr to SMD) could have increased the statistical power of the subsequent analyses by increasing the sample size per dataset (here, the number of effect sizes). Our final three datasets therefore consisted of (1) 36 meta-analytic cases of SMD, (2) 20 cases of lnRR, and (3) 31 cases of Zr (Fig. 2). For each primary study included in the final dataset, we retrieved four key variables: (i) the reported effect sizes (SMD, lnRR, or Zr), (ii) the standard error (or sampling variance) of each effect size (to test for small-study effects), (iii) sample sizes per condition where possible (i.e. experimental group versus control group for SMD and lnRR), used to create a predictor for testing and correcting for small-study effects (the ‘effective sample size’; see the ‘Detecting publication bias’ section for details), and (iv) publication year (to test for a decline effect).

Fig. 2.

The workflow showing the data compilation, the statistical modelling processes, and our aims. Using the datasets containing 87 independent meta-analyses (36 SMD, 20 lnRR, and 31 Zr cases), we used a two-step modelling procedure to assess (i) the estimated prevalence and severity of publication bias in ecology and evolutionary biology and (ii) how such publication bias affects estimates of effect size, statistical power, and Type M and S errors. In the first step (the within-meta-analysis level), multilevel meta-analytic models were used to estimate the overall mean (used for the power and error calculations) and to test and adjust for publication bias in each meta-analytic case. In the second step (the between-meta-analysis level), the estimates from the first step were statistically aggregated using either mixed-effects models or random-effects meta-analytic models (i.e. second-order meta-analysis). β0 is the meta-analytic overall mean (i.e. β0[overall] in Equation 1), which is the uncorrected effect size estimate if publication bias exists but is not corrected for. β1 and β2 are the indicators of small-study effects and decline effects (equivalent to β1[small − study] and β2[time − lag] in Equation 2). η0[u] is the standardised β0 (i.e. η0[overall]). η0[c] is the standardised bias-corrected meta-analytic overall mean (i.e. η0[bias − corrected] in Equation 6). η1[small − effect] and η2[time − lag] are the standardised model coefficients corresponding to β1 and β2 (i.e. η1[small − effect] and η2[time − lag] in Equation 6)

Data collection

We used a recent meta-analytic database that had been collected to evaluate the reporting quality of systematic reviews and meta-analyses published in ecology and evolutionary biology [31]. The inclusion and screening criteria identified meta-analyses that were broadly representative of meta-analyses published in ecology and evolutionary biology journals from 2010–2019. In brief, the database creators compiled a list of ‘Ecology’ and/or ‘Evolutionary Biology’ journals via the categories of the ISI InCites Journal Citation Reports®. Within the included journals, they searched Scopus using the string ‘meta-analy*’ OR ‘metaanaly*’ OR ‘meta-regression’. They restricted the search to articles published from January 2010 to 25 March 2019. Search results were then filtered to the 31 journals most frequently publishing meta-analyses. By taking a random sample of studies within each journal, a total of 297 papers were returned. After screening (search records, and inclusion and screening criteria are available at [31]), the database included a representative sample of 102 ecological or evolutionary meta-analyses.

Statistical analysis

Multilevel meta-analytic modelling

We used multilevel meta-analytic approaches to (i) estimate the meta-analytic overall mean (i.e. uncorrected effect size estimates), (ii) detect potential publication bias (i.e. test small-study and decline effects), and (iii) correct for publication bias for each meta-analysis included in our datasets (Fig. 2).

Estimating uncorrected effect sizes

To obtain uncorrected effect sizes for each meta-analysis (i.e. at the within-meta-analysis level), we fitted intercept-only multilevel meta-analytic models with SMD, lnRR, or Zr as the response variable (Equation 1) [42]. Equation 1 accounts for dependent data by modelling both between-study variance (heterogeneity) and within-study variance (residual):

$$ES_{ji}={\beta}_{0\left[\textrm{overall}\right]}+{s}_j+{o}_{ji}+{m}_{ji},$$
where ESji is the extracted effect size (SMD, lnRR, or Zr); β0[overall] is the intercept, representing the overall effect (i.e. the meta-analytic estimate of effect size); sj is the study-specific (between-study) effect of study j; oji is the observation-level (within-study) effect for effect size i (used to account for residual heterogeneity); and mji is the measurement (sampling) error effect for effect size i. Between- and within-study effects are assumed to be normally distributed with mean 0 and variance σ2 (i.e. \(\mathcal{N}\left(0,{\sigma}^2\right)\)). In Equation 1, the effect size (ESji) and its sampling variance can be calculated from the meta-analytic data. Using the restricted maximum likelihood (REML) method, we obtained (approximately) unbiased estimates of the variance parameters σ2 for the between- and within-study effects (sj and oji) [43]. With the REML estimates of σ2, we obtained the maximum likelihood estimate of the model coefficient (i.e. β0[overall]), which represents the (uncorrected) overall meta-analytic mean for SMD, lnRR, or Zr. The model fitting was implemented via the rma.mv() function from the metafor R package (version 3.4-0) [44].
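For intuition about what such a model estimates, the sketch below pools effect sizes under a simple random-effects model in plain Python. It is a deliberate simplification, not the model described above: it uses the closed-form DerSimonian–Laird moment estimator of the between-study variance as a stand-in for the REML-based multilevel fit in metafor, ignores the within-study (multilevel) structure, and runs on made-up numbers.

```python
# Minimal random-effects meta-analysis: estimate the overall mean and the
# between-study variance (tau^2) from effect sizes and sampling variances.
import math

def random_effects_meta(effects, variances):
    k = len(effects)
    w = [1 / v for v in variances]                                # fixed-effect weights
    mu_fe = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - mu_fe) ** 2 for wi, y in zip(w, effects))   # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                            # between-study variance
    w_re = [1 / (v + tau2) for v in variances]                    # random-effects weights
    mu_re = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))                              # SE of the overall mean
    return mu_re, se_re, tau2

# Made-up effect sizes and sampling variances, purely for illustration.
mu, se, tau2 = random_effects_meta(
    effects=[0.41, 0.15, 0.62, 0.05, 0.30],
    variances=[0.04, 0.02, 0.09, 0.01, 0.03],
)
```

The overall mean is a precision-weighted average, with tau² inflating each study's variance so that heterogeneous studies are down-weighted less aggressively than under a fixed-effect model.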

Detecting publication bias

To test for patterns consistent with publication bias within each meta-analysis, we used a multi-moderator multilevel meta-regression model (an extended Egger’s regression; cf. [45]). This approach addresses two common issues in ecological and evolutionary datasets: (i) it uses a multilevel model to control for data dependency [46], and (ii) it uses a regression with multiple moderators to account for between-study heterogeneity [47]. We used the two moderators to test for the presence of small-study effects and decline effects, respectively (Equation 2):

$$ES_{ji}={\beta}_{0\left[\textrm{bias}-\textrm{corrected}\right]}+{\beta}_{1\left[\textrm{small}-\textrm{study}\right]}\ {error}_i+{\beta}_{2\left[\textrm{time}-\textrm{lag}\right]}\left({year}_i-{year}_{latest}\right)+{s}_j+{o}_{ji}+{m}_{ji},$$
where β0[bias − corrected] is the intercept, representing bias-corrected overall effect/meta-analytic estimate of effect size (see more details below); errori is the uncertainty index of effect size (i.e. sampling error of effect size, sei), and β1[small − study] is the corresponding slope and an indicator of small-study effects; yeari is the publication year, yearlatest is the latest year of published papers, and β2[time − lag] is the corresponding slope and an indicator of decline effect (i.e. time-lag bias).

When assuming there is no small-study effect (i.e. errori = 0) and no decline effect (i.e. yeari − yearlatest = 0), the intercept β0[bias-corrected] in Equation 2 becomes a conditional estimate that can be interpreted as the bias-corrected overall effect (i.e. the estimate of the ‘true’ effect, which is distinct from the unconditional estimate β0[overall] in Equation 1). We centred the ‘year’ variable by subtracting the latest publication year (yearlatest) from each year (yeari), so that the latest year corresponds to the intercept β0[bias − corrected]. This made the estimate of the true effect (β0[bias − corrected] in Equation 2) conditional on yeari = yearlatest, so that β0 was least affected by a decline effect if one existed. Further, we used a sampling-error equivalent, \(\sqrt{1/\tilde{n}_i}=\sqrt{\left({n}_e+{n}_c\right)/{n}_e{n}_c}\), to replace sei when fitting SMD and lnRR where possible (\(\tilde{n}_i\) is referred to as the effective sample size; ne is the sample size of the experimental group and nc that of the control group [45]). This corrects for the ‘artefactual’ correlation between ESji and errori, because the point estimates of SMD and lnRR are inherently correlated with their sampling variances (see Table 3 in [34], and Equation 10 in [48]).

A small-study effect is statistically detected if Equation 2 yields a statistically significant β1[small − study] (i.e. p-value < 0.05). Similarly, a decline effect (i.e. time-lag bias) is indicated by a statistically significant β2[time − lag]. Depending on the specific phenomenon tested, β1[small − study] and β2[time − lag] might be expected to be positive or negative when publication bias exists. For example, for an effect that is expected to be positive, a small-study effect would be expressed as a positive β1[small − study] (i.e. small, statistically non-significant effects and small, statistically significant negative effects are under-represented), and a decline effect as a negative β2[time − lag] (i.e. the overall effect size declines over time). In such a case, a slope (β1[small − study] or β2[time − lag]) in the opposing direction (an unexpected sign) indicates no detectable publication bias and therefore requires no correction. The magnitude of the slope represents the severity of the small-study or decline effect. Therefore, using Equation 2, we were able to detect the existence of publication bias and quantify its severity for each meta-analysis and each effect-size statistic.
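A stripped-down version of this idea can be shown numerically. The sketch below is a single-level, unweighted caricature of the multilevel, multi-moderator model above (one moderator, ordinary least squares, simulated data); it is meant only to illustrate why a positive slope of effect size on standard error signals a small-study effect, not to reproduce the analysis in the text.

```python
# Toy Egger-type test for small-study effects: regress effect sizes on their
# standard errors. A slope clearly above zero suggests that smaller (noisier)
# studies report systematically larger effects.
import random

def simple_slope(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

rng = random.Random(7)
ses = [rng.uniform(0.05, 0.5) for _ in range(300)]       # per-study standard errors
# Simulated small-study effect: reported effects grow with the standard error
# (true slope 1.5), plus sampling noise proportional to each study's SE.
effects = [0.1 + 1.5 * se + rng.gauss(0, se) for se in ses]
slope = simple_slope(ses, effects)                       # recovers a slope near 1.5
```

In the paper's actual models this regression additionally carries the centred publication-year moderator, random effects for study and observation, and the effective-sample-size predictor in place of the raw standard error.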

Correcting overall estimates for publication bias

To avoid a downwardly biased estimate of the bias-corrected overall effect, we fitted Equation 3 whenever Equation 2 detected a statistically significant β0[bias − corrected]:

$$ES_{ji}={\beta}_{0\left[\textrm{bias}-\textrm{corrected}\right]}+{\beta}_{1\left[\textrm{small}-\textrm{study}\right]}\ {v}_i+{\beta}_{2\left[\textrm{time}-\textrm{lag}\right]}\left({year}_i-{year}_{latest}\right)+{s}_j+{o}_{ji}+{m}_{ji},$$
In contrast to Equation 2, Equation 3 uses a quadratic uncertainty term (i.e. the sampling variance vi or \(1/\tilde{n}_i\)) to alleviate the downward bias of the effect size estimate (for explanations, see [45, 49]). In theory, this procedure provides an easy-to-implement correction for publication bias for each meta-analysis (i.e. the conditional estimate of the intercept in Equation 3). In practice, however, there were two different types of β0[bias − corrected] estimates to consider, because high heterogeneity [47] can cause the signs of the slopes (β1[small − study] and β2[time − lag]) to be opposite to those expected under publication bias [45]. We would consequently misestimate β0[bias − corrected] if slopes with unexpected signs were retained in Equations 2 and 3.

Depending on the signs of the slopes (β1[small − study] and β2[time − lag]), there were two types of estimated β0[bias − corrected]. We used a decision tree (Fig. 3) to obtain each type of estimate for each meta-analytic case: if a slope had an unexpected sign, we removed the corresponding term from the full model to form a reduced model that better estimates β0. The reduced models retain only the small-study term (reduced model 1, Equation 4) or only the decline term (reduced model 2, Equation 5):

$$ES_{ji}={\beta}_{0\left[\textrm{bias}-\textrm{corrected}\right]}+{\beta}_{1\left[\textrm{small}-\textrm{study}\right]}\ {v}_i+{s}_j+{o}_{ji}+{m}_{ji},$$

$$ES_{ji}={\beta}_{0\left[\textrm{bias}-\textrm{corrected}\right]}+{\beta}_{2\left[\textrm{time}-\textrm{lag}\right]}\left({year}_i-{year}_{latest}\right)+{s}_j+{o}_{ji}+{m}_{ji},$$
Fig. 3.

The decision tree used to obtain the estimate of the ‘unbiased’ effect (i.e. the conditional β0). First, a two-step procedure estimates β0, β1, and β2 from the full model (Equation 2 or 3). Then, depending on whether the signs of the slopes (β1 and β2) are opposite to those expected under publication bias (which can be caused by a high amount of unaccounted heterogeneity), there are two types of estimates of β0. The first type includes all β0 regardless of the signs of β1 and β2; the second type comprises four scenarios. Scenario 1 = retain β0 from the full model when both β1 and β2 have expected signs; Scenario 2 = employ reduced model 1 (Equation 4) to re-estimate β0 when β1 has an expected sign but β2 has an unexpected sign; Scenario 3 = employ reduced model 2 (Equation 5) to re-estimate β0 when β1 has an unexpected sign but β2 has an expected sign; Scenario 4 = use β0 from the null model (Equation 1) when both β1 and β2 have unexpected signs (i.e. neither a small-study effect nor a decline effect). The symbols (β0, β1, and β2) are as in Fig. 2

Specifically, the first type of estimate of β0[bias − corrected] was obtained by fitting Equation 2 or 3 (termed the full models); it included all cases of β0[bias − corrected] without consideration of the signs of β1 and β2 (i.e. the conditional β0[bias − corrected] estimated from the full model; see Fig. 3). The second type of estimate of β0[bias − corrected] was obtained under four scenarios: (i) β0[bias − corrected] estimated under expected signs of both β1[small − study] and β2[time − lag] (i.e. the conditional β0[bias − corrected] estimated from the direction-controlled full model; see Fig. 3), indicating the co-occurrence of a small-study effect and a decline effect; (ii) β0[bias − corrected] estimated under an expected sign of β1[small − study] and an unexpected sign of β2[time − lag], signalling a small-study effect but no decline effect (i.e. the conditional β0[bias − corrected] estimated from reduced model 1; see Equation 4 and Fig. 3); (iii) β0[bias − corrected] estimated under an unexpected sign of β1 and an expected sign of β2, indicating a decline effect but no small-study effect (i.e. the conditional β0[bias − corrected] estimated from reduced model 2; see Equation 5 and Fig. 3); and (iv) β0[bias − corrected] estimated under unexpected signs of both β1[small − study] and β2[time − lag], suggesting little concern about either a small-study effect or a decline effect.
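The four scenarios amount to a small lookup, sketched below following the in-text mapping (reduced model 1 retains only the small-study term, reduced model 2 only the decline term). The function and its string labels are illustrative names for this sketch, not metafor functions.

```python
# Encodes the decision tree's four scenarios: which model supplies the
# bias-corrected intercept, given whether each slope has the sign expected
# under publication bias.
def choose_model(beta1_sign_expected: bool, beta2_sign_expected: bool) -> str:
    if beta1_sign_expected and beta2_sign_expected:
        return "full model"        # small-study and decline effects both plausible
    if beta1_sign_expected:        # small-study effect only: drop the decline term
        return "reduced model 1"
    if beta2_sign_expected:        # decline effect only: drop the small-study term
        return "reduced model 2"
    return "null model"            # neither bias detected: intercept-only model

# Enumerate all four scenarios.
choices = {(b1, b2): choose_model(b1, b2)
           for b1 in (True, False) for b2 in (True, False)}
```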

Second-order meta-analysis

In this section, we statistically aggregated the above-mentioned regression coefficients (i.e. β0[bias − corrected], β1[small − study] and β2[time − lag]) to (i) reveal the patterns of potential publication bias across the fields of ecology and evolutionary biology, and (ii) quantify the extent to which publication bias might cause a reduction in effect-size magnitude across meta-analyses (Fig. 2).

Estimating the overall extent and severity of publication bias

To allow aggregation of β1[small − study] (the indicator of small-study effects) and β2[time − lag] (the indicator of decline effects) across different effect size metrics (SMD, lnRR, and Zr), we standardised the coefficients to eliminate scale-dependency [50]. This was achieved, prior to modelling, by z-scaling (i.e. mean-centring and dividing by the standard deviation) errori and yeari − yearlatest, and standardising the response variable ESji by dividing by its standard deviation without mean-centring, as given by Equation 6:

$$c\left({ES}_{ji}\right)={\eta}_{0\left[\textrm{bias}-\textrm{corrected}\right]}+{\eta}_{1\left[\textrm{small}-\textrm{effect}\right]}\ z\left({error}_i\right)+{\eta}_{2\left[\textrm{time}-\textrm{lag}\right]}\ z\left({year}_i-{year}_{latest}\right)+{s}_j+{o}_{ji}+{m}_{ji},$$

where c(ESji) denotes ESji divided by its standard deviation.
Equation 6 indicates that a change of one standard deviation in errori or yeari − yearlatest changes ESji by η1[small − effect] or η2[time − lag] standard deviations, respectively. Further, to interpret the intercept as a bias-corrected overall effect, it was set conditional on errori = 0 (i.e. no small-study effect) and yeari − yearlatest = 0 (i.e. no decline effect). As such, we replaced z(errori) with z(errori) − z(error0) and z(yeari − yearlatest) with z(yeari) − z(yearlatest), as shown in Equation 7:

$$c\left({ES}_{ji}\right)={\eta}_{0\left[\textrm{bias}-\textrm{corrected}\right]}+{\eta}_{1\left[\textrm{small}-\textrm{effect}\right]}\left(z\left({error}_i\right)-z\left({error}_0\right)\right)+{\eta}_{2\left[\textrm{time}-\textrm{lag}\right]}\left(z\left({year}_i\right)-z\left({year}_{latest}\right)\right)+{s}_j+{o}_{ji}+{m}_{ji},$$
where z(error0) denotes the z-score when errori = 0, which is equal to \(\frac{0-\textrm{mean}\left[{error}_i\right]}{\textrm{SD}\left[{error}_i\right]}\); z(yearlatest) is the z-score when yeari is the latest year. Likewise, to obtain the best estimate of standardised bias-corrected effects, we introduced Equation 8 where a quadratic error term was used:

$$c\left({ES}_{ji}\right)={\eta}_{0\left[\textrm{bias}-\textrm{corrected}\right]}+{\eta}_{1\left[\textrm{small}-\textrm{effect}\right]}\ {\left(z\left({error}_i\right)-z\left({error}_0\right)\right)}^2+{\eta}_{2\left[\textrm{time}-\textrm{lag}\right]}\ \left(z\left({year}_i\right)-z\left({year}_{latest}\right)\right)+{s}_j+{o}_{ji}+{m}_{ji},$$

Fitting Equation 8 created two datasets: (1) the full dataset, containing η0[bias − corrected], η1[small − effect], and η2[time − lag] regardless of their signs (standardised slopes of the first type of estimate), and (2) the reduced dataset, containing η0[bias − corrected], η1[small − effect], and η2[time − lag] with expected directions (standardised slopes of the second type of estimate: scenarios 1–4, Fig. 3). We then conducted a series of second-order meta-analyses to statistically aggregate these standardised regression coefficients across meta-analyses [51, 52]. We fitted these second-order meta-analyses as random-effects meta-analytic models, weighting each coefficient by the inverse square of its standard error [44]. For both the full and reduced datasets, we obtained a weighted average of the regression coefficient η1[small − effect] (or η2[time − lag]) to indicate the occurrence of small-study effects (or decline effects) across the fields of ecology and evolutionary biology. To compare the severity of publication bias between different types of effect size, we further incorporated effect-size type as a moderator (i.e. a fixed factor with three levels: SMD, lnRR, and Zr) in these random-effects models.
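The point of this standardisation can be verified numerically: z-scaling the predictor and SD-scaling the response rescales a regression slope by SD(x)/SD(y), which is exactly what makes slopes comparable across effect-size metrics. Below is a minimal single-predictor check on made-up numbers (the data and variable names are illustrative only).

```python
# Check the standardisation logic: after z-scaling the predictor and dividing
# the response by its standard deviation, the new slope equals the raw slope
# multiplied by SD(x)/SD(y) -- i.e. it is scale-free.
from statistics import mean, pstdev

def slope(xs, ys):
    mx, my = mean(xs), mean(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

xs = [0.1, 0.2, 0.35, 0.5, 0.8]          # e.g. sampling errors (made-up values)
ys = [0.30, 0.42, 0.55, 0.81, 1.10]      # e.g. effect sizes (made-up values)

b_raw = slope(xs, ys)
xs_z = [(x - mean(xs)) / pstdev(xs) for x in xs]   # z-scaled predictor
ys_s = [y / pstdev(ys) for y in ys]                # SD-scaled response (no centring)
b_std = slope(xs_z, ys_s)
# b_std equals b_raw * pstdev(xs) / pstdev(ys), up to floating-point error.
```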

Quantifying the reduction in effect-size magnitude after controlling for publication bias

Likewise, to quantify the differences between uncorrected effect sizes and their bias-corrected estimates for the different effect-size metrics, we required standardised estimates of these effect sizes to draw comparisons. The term η0[bias − corrected] in the full dataset provided a standardised bias-corrected effect size (i.e. an intercept estimated using the full model, where all cases of η1[small − effect] and η2[time − lag] were included regardless of their directions). Similarly, η0[bias − corrected] in the reduced dataset provided standardised bias-corrected effect sizes obtained using the expected directions of η1[small − effect] and η2[time − lag]. In contrast, the standardised uncorrected effect sizes were obtained by standardising ESji (dividing by its standard deviation) before fitting Equation 1 (that is, the standardised intercept of the null model, η0[overall]). We then used the absolute mean difference to quantify the reduction in effect-size magnitude following correction for publication bias; its point estimate and sampling variance were written as Equations 9 and 10:

$$D=\mid {\upgamma}_{\textrm{uncorrected}-\textrm{effect}}^s-{\upgamma}_{\textrm{corrected}-\textrm{effect}}^s\mid,$$

$$\textrm{Var}(D)={\textrm{SE}}_{\upgamma_{\textrm{uncorrected}-\textrm{effect}}^s}^2+{\textrm{SE}}_{\upgamma_{\textrm{corrected}-\textrm{effect}}^s}^2-2r\,{\textrm{SE}}_{\upgamma_{\textrm{uncorrected}-\textrm{effect}}^s}\,{\textrm{SE}}_{\upgamma_{\textrm{corrected}-\textrm{effect}}^s},$$

where \({\upgamma}_{\textrm{uncorrected}-\textrm{effect}}^s\) and \({\upgamma}_{\textrm{corrected}-\textrm{effect}}^s\) are the values of the standardised uncorrected effect size (standardised η0[overall] in the null model) and its bias-corrected version (standardised η0[bias − corrected] in the full or reduced models), respectively; \({\textrm{SE}}_{\upgamma_{\textrm{uncorrected}-\textrm{effect}}^s}\) and \({\textrm{SE}}_{\upgamma_{\textrm{corrected}-\textrm{effect}}^s}\) are the associated standard errors; and r is the correlation between the two estimates, which is assumed to be 1 because the two estimates should be strongly correlated.

Given that D is an absolute value, it follows a ‘folded’ normal distribution: taking the absolute value folds the probability density on the left side (x < 0) onto the right [53, 54]. The corresponding folded mean and variance can be derived from this ‘folded’ normal distribution, as in Equations 11 and 12:

$${D}_f=\sqrt{\frac{2}{\pi}\textrm{Var}(D)}{e}^{-{D}^2/2\textrm{Var}(D)}+D\left(1-2\Phi \left(\frac{-D}{\sqrt{\textrm{Var}(D)}}\right)\right),$$
$$\textrm{Var}\left({D}_f\right)={D}^2+\textrm{Var}\left(\textrm{D}\right)-{\left(\sqrt{\frac{2}{\pi}\textrm{Var}\left(\textrm{D}\right)}{e}^{-{D}^2/2\textrm{Var}\left(\textrm{D}\right)}+D\left(1-2\Phi \left(\frac{-D}{\sqrt{\textrm{Var}\left(\textrm{D}\right)}}\right)\right)\right)}^2,$$

where Φ is the standard normal cumulative distribution function (see more details in [53, 55]). Equations 9 to 12 enabled us to calculate Df and Var(Df) for both the full and reduced datasets. We used a random-effects meta-analytic model (the rma.uni() function [44]) to synthesise these Df values across meta-analyses, with Var(Df) as the sampling variance. We also incorporated effect-size type as a moderator to compare the differences in effect size reduction between SMD, lnRR, and Zr.
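Equations 11 and 12 translate directly into code. The sketch below implements them and checks the D = 0 boundary case, where the folded distribution reduces to a half-normal with known mean \(\sqrt{2\textrm{Var}(D)/\pi}\) and variance \(\textrm{Var}(D)(1-2/\pi)\); the function name is ours, for illustration.

```python
# Direct implementation of the folded-normal mean and variance
# (Equations 11 and 12): given D and Var(D), return D_f and Var(D_f).
import math
from statistics import NormalDist

def folded_normal(d, var_d):
    sd = math.sqrt(var_d)
    cdf_term = NormalDist().cdf(-d / sd)              # Phi(-D / sqrt(Var(D)))
    d_f = (math.sqrt(2 * var_d / math.pi) * math.exp(-d ** 2 / (2 * var_d))
           + d * (1 - 2 * cdf_term))                  # Equation 11
    var_d_f = d ** 2 + var_d - d_f ** 2               # Equation 12
    return d_f, var_d_f

# Boundary check: with D = 0 the folded distribution is half-normal, so
# D_f = sqrt(2*Var(D)/pi) and Var(D_f) = Var(D)*(1 - 2/pi).
d_f0, var_f0 = folded_normal(0.0, 1.0)
```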

Estimating statistical power, and type M and S errors

We assessed the statistical power and Type M and S errors of the primary studies, with the ‘true’ effects approximated by the uncorrected and bias-corrected meta-analytic effect size estimates [27, 56]. Although meta-analyses can increase power over primary studies [57], they might still be underpowered to detect the true effect (i.e. p-value > 0.05 despite the existence of a true effect). Therefore, we also calculated the statistical power and Type M and S errors for each meta-analysis. To obtain the average statistical power and Type M and S errors at the primary-study level, we used a linear mixed-effects model to aggregate the estimates of power and Type M and S errors from primary studies. We fitted these models with the lmer() function in the lme4 R package (version 1.1-26) [58], incorporating the identity of the primary study as a random factor to account for between-study variation. Similarly, we used a weighted regression to aggregate power and Type M and S errors at the meta-analysis level, with the number of effect sizes (k) within each meta-analysis as weights. We implemented the weighted regression via the base R function lm() (R version 4.0.3).
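The per-study power and Type M/S calculations follow the ‘retrodesign’ logic of Gelman and Carlin [27]: given an assumed true effect and a study's standard error, all three quantities have closed forms under a normal approximation. A minimal Python sketch (an illustrative re-implementation, not the R code used in the analyses; it assumes a positive true effect):

```python
from statistics import NormalDist

def retrodesign(true_effect, se, alpha=0.05):
    """Power, Type S error, and Type M error (exaggeration ratio) for a
    study with standard error `se`, assuming estimates are distributed
    N(true_effect, se^2) and true_effect > 0."""
    N = NormalDist()
    z_crit = N.inv_cdf(1 - alpha / 2)       # 1.96 for alpha = 0.05
    lam = true_effect / se                  # signal-to-noise ratio
    p_hi = 1 - N.cdf(z_crit - lam)          # P(significant, correct sign)
    p_lo = N.cdf(-z_crit - lam)             # P(significant, wrong sign)
    power = p_hi + p_lo
    type_s = p_lo / power                   # P(wrong sign | significant)
    # E[|z| | significant], from truncated-normal moments
    e_abs = (lam * (p_hi - p_lo)
             + N.pdf(z_crit - lam) + N.pdf(z_crit + lam)) / power
    type_m = e_abs / lam                    # exaggeration ratio
    return power, type_s, type_m
```

For a well-powered study the exaggeration ratio approaches 1; for a noisy study (e.g. a true effect of 2 with a standard error of 8.1, Gelman and Carlin's worked example) power falls to roughly 6% with a roughly nine-fold exaggeration, mirroring the pattern reported in the Results.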

Deviations and additions

Stage 2 of this registered report has three deviations from the Stage 1 protocol. First, in the ‘Correcting for overall estimates for publication bias’ section, we initially planned to obtain the best estimate of the bias-corrected overall effect (i.e. model intercept β0[bias − corrected]) via a two-step procedure: when a zero effect existed (i.e. a statistically non-significant β0[bias − corrected]), the uncertainty index (i.e. sampling error errori or \(\sqrt{1/\tilde{n}_i}\)) would be used as the predictor (Equation 2), whereas when a non-zero effect existed (i.e. a statistically significant β0[bias − corrected]), a quadratic term of the uncertainty index (i.e. sampling variance vi or \(1/\tilde{n}_i\)) would be used (Equation 3) [59, 60]. We decided to use only Equation 3, because there is no need to estimate β0[bias − corrected] via Equation 2 when no statistically significant effect exists.

Second, in the ‘Estimating the overall extent and severity of publication bias’ section, we changed from z-scaling the response variable ESji (i.e. mean-centring and dividing by the standard deviation) prior to model fitting to standardising ESji by dividing by the standard deviation without mean-centring. This is because centring the response variable would make estimating the model intercept (β0[bias − corrected]) unfeasible [50]. The same change was made in the ‘Quantifying the reduction in effect-size magnitude after controlling for publication biases’ section.

Third, we added a post hoc analysis in which we removed the meta-analyses with statistically non-significant mean effects and then recalculated the average statistical power and Type M and S error rates. We added this analysis because the underlying true effect sizes in some meta-analyses were likely so trivially small (and biologically insignificant) that the corresponding power calculation was meaningless. Including those effects when estimating average power across meta-analyses in ecology and evolution would yield a downwardly biased average power estimate. The relevant results are reported in the Supplementary Material (Table S4).
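To illustrate the Equation 3 correction retained in the first deviation: the bias-corrected overall effect is the intercept of a regression of effect sizes on their sampling variances. The sketch below (Python, weights 1/vi, with hypothetical numbers) is a deliberately single-level simplification of the multilevel model actually fitted in R; it recovers the underlying mean when observed effects are inflated in proportion to their sampling variance:

```python
def peese_intercept(effects, variances):
    """Weighted least-squares regression of effect size on sampling
    variance v_i (weights 1/v_i); the intercept estimates the
    bias-corrected overall effect (beta_0[bias-corrected], Equation 3).
    Simplified sketch: ignores the random effects of the full model."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    swx = sum(wi * v for wi, v in zip(w, variances))
    swy = sum(wi * y for wi, y in zip(w, effects))
    swxx = sum(wi * v * v for wi, v in zip(w, variances))
    swxy = sum(wi * v * y for wi, v, y in zip(w, variances, effects))
    slope = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    intercept = (swy - slope * swx) / sw
    return intercept, slope

# Noise-free illustration: effects inflated in proportion to sampling
# variance (a small-study effect) around a true mean of 0.2.
vs = [0.01, 0.05, 0.1, 0.2, 0.4]
ys = [0.2 + 1.5 * v for v in vs]
b0, b1 = peese_intercept(ys, vs)  # b0 -> 0.2 (bias-corrected), b1 -> 1.5
```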


The pattern of small-study effects in ecology and evolutionary biology

Within-meta-analysis level

Of the 87 ecological and evolutionary meta-analyses tested, 15 (17%) showed evidence for small-study effects (i.e. a statistically significant β1[small − study]; see Fig. 4A), where smaller studies reported larger effect sizes. Importantly, β1[small − study] estimates from 54 (62%) meta-analyses were in the expected direction (Fig. 4A), indicating that these meta-analyses exhibited a (statistically non-significant) tendency towards a small-study effect (note that the likelihood of a meta-analysis showing this tendency is 50% if there is no real effect).
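The 50% null expectation can be made concrete with an exact sign (binomial) test. This Python sketch (an illustrative check, not part of the registered analyses) computes the one-sided probability of observing at least 54 of 87 slopes in the expected direction by chance:

```python
from math import comb

def sign_test_p(k, n):
    """Exact one-sided binomial p-value: P(X >= k) for X ~ Binomial(n, 0.5),
    i.e. slope directions are coin flips under the no-bias null."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

p_one = sign_test_p(54, 87)  # < 0.05: more positive slopes than chance predicts
```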

Fig. 4.
figure 4

The percentage of ecology and evolutionary meta-analyses showing evidence of publication bias. A A small-study effect (i.e. small non-statistically significant effects and small statistically significant effects of opposite direction to the overall effect are underrepresented). B A decline effect (the magnitude of effect sizes changes over time). See more details in the legend of Fig. 3. All figures were drawn using the geom_bar() function in ggplot2 R package (version 3.3.5) [61]

Between-meta-analysis level

When conducting a second-order meta-analysis by aggregating the β1[small − study] obtained from the 87 meta-analyses, there was a statistically significant pooled β1[small − study] (grand mean β1[small − study] = 0.084, 95% confidence intervals (CI) = 0.034 to 0.135, p-value = 0.001, N = 87; Fig. 5A). This provides statistical evidence for the existence of small-study effects across the meta-analyses. Furthermore, the heterogeneity among the β1[small − study] estimates obtained from the 87 meta-analyses was low (\({\sigma}_{among- meta- analysis}^2\) = 0.0050; \({I}_{among- meta- analysis}^2\) = 10%), indicating that these results are highly generalizable. Three per cent of this heterogeneity could be explained by the types of effect sizes (SMD, lnRR, Zr) being meta-analyzed (\({R}_{marginal}^2\) = 0.031). The non-random pattern of the small-study effect was mainly driven by SMD (grand mean β1[small − study] = 0.091, 95% CI = 0.018 to 0.165, p-value = 0.015, N = 36) and Zr (grand mean β1[small − study] = 0.119, 95% CI = 0.026 to 0.212, p-value = 0.013, N = 20), but not lnRR (grand mean β1[small − study] = 0.029, 95% CI = −0.072 to 0.130, p-value = 0.571, N = 31).
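The second-order (meta-meta-analytic) pooling can be sketched as an inverse-variance random-effects model. The Python fragment below is an illustrative stand-in for the metafor-based model, using the DerSimonian–Laird heterogeneity estimator rather than the REML estimator typically used; it pools per-meta-analysis slope estimates such as β1[small − study]:

```python
def pool_random_effects(estimates, ses):
    """Pool per-meta-analysis estimates with inverse-variance weights;
    between-meta-analysis variance tau^2 via DerSimonian-Laird."""
    w = [1.0 / se ** 2 for se in ses]
    fixed = sum(wi * y for wi, y in zip(w, estimates)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, estimates))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]     # random-effects weights
    pooled = sum(wi * y for wi, y in zip(w_re, estimates)) / sum(w_re)
    se_pooled = (1.0 / sum(w_re)) ** 0.5
    return pooled, se_pooled, tau2
```

For example, pooling three homogeneous slopes (0.1, 0.2, 0.3, each with SE 0.1) returns a grand mean of 0.2 with tau² ≈ 0, analogous to the low between-meta-analysis heterogeneity reported above.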

Fig. 5.
figure 5

Orchard plots showing the distribution of the indicator of small-study effect (model slope β1[small − study]) for each meta-analysis and meta-analytic aggregation of β1[small − study] (pooled β1[small − study]). (A) Pooled β1[small − study] across different meta-analyses and different types of effect size, indicating the pattern of small-study effects. (B) Pooled β1[small − study] for each type of effect size. Solid circles = β1[small − study] estimates obtained from each meta-analysis; the size of each solid circle is proportional to its inverse standard error (i.e. precision). Open circles = pooled β1[small − study]. Thick error bars = 95% confidence intervals (CI). Thin error bars = prediction intervals (PIs). See more details in the legend of Fig. 2. All panels were made using orchard_plot() function in orchaRd R package (version 2.0) [62]

The pattern of decline effects in ecology and evolutionary biology

Within-meta-analysis level

Out of the 87 ecological and evolutionary meta-analyses reviewed, 13 (15%) revealed evidence of a decline effect, where the effect sizes significantly decreased over time (Fig. 4B). Additionally, 54 (62%) of the meta-analyses showed a statistically non-significant decline in effect size over time.

Between-meta-analysis level

There was a statistically significant pooled β2[time − lag] (grand mean β2[time − lag] = −0.006, 95% CI = −0.009 to −0.002, p-value < 0.001; Fig. 6A) across the 87 meta-analyses, providing statistical evidence for the existence of decline effects. The estimates of β2[time − lag] were homogeneous across these meta-analyses, indicating high generalizability of the results, with low relative heterogeneity (\({\sigma}_{among- meta- analysis}^2\) = 0.0001; \({I}_{among- meta- analysis}^2\) < 1%). Five per cent of that heterogeneity could be explained by the types of effect sizes (\({R}_{marginal}^2\) = 0.05); SMD and Zr exhibited a statistically significant pattern of decline effect (SMD: pooled β2[time − lag] = −0.005, 95% CI = −0.010 to −0.001, p-value = 0.013, N = 36; Zr: pooled β2[time − lag] = −0.008, 95% CI = −0.015 to −0.001, p-value = 0.023, N = 31; Fig. 6B), but lnRR did not (pooled β2[time − lag] = −0.004, 95% CI = −0.010 to 0.003, p-value = 0.289, N = 20).

Fig. 6.
figure 6

Orchard plots showing the distribution of the indicator of decline effects (model slope β2[time − lag]) for each meta-analysis and meta-analytic aggregation of β2[time − lag] (pooled β2[time − lag]). A Pooled β2[time − lag] across different meta-analyses and different types of effect size, indicating the systematic pattern of decline effect. B Pooled β2[time − lag] for each type of effect size. See more details in the legend of Figs. 2 and 3. All panels were made using orchard_plot() function in orchaRd R package (version 2.0) [62]

The inflation of effect size estimates and distortion of meta-analytic evidence by publication bias

Among the 87 meta-analyses examined, the estimated absolute mean difference between the original (uncorrected) effect size (β0[overall]) and its bias-corrected version (β0[bias − corrected]) was statistically significant (pooled D = 0.225, 95% CI = 0.180 to 0.269, p-value < 0.001; Fig. S1A). Overestimations of 0.189, 0.195 and 0.333 standard deviation units were found for SMD, lnRR, and Zr, respectively (Fig. S1B). After back-transformation to the original scale, publication bias led to an exaggeration of the estimates of SMD, lnRR, and Zr by an average of 0.217, 0.116 and 0.128 (Fig. 7), respectively. Additionally, after correcting for publication bias, 33 out of 50 initially statistically significant meta-analytic means became non-significant.

Fig. 7.
figure 7

The magnitude of each meta-analysis’s estimated effect size declines after correcting for publication bias. Nine of 20 meta-analyses of lnRR, 17 of 36 meta-analyses of SMD and 14 of 31 meta-analyses of Zr had corrected directions of slope after adjusting for publication bias. The remaining 11 in lnRR, 19 in SMD, and 17 in Zr had the wrong direction of slope, presumably because of a high degree of heterogeneity that could not be controlled for. Original = uncorrected meta-analytic effect size estimate (i.e. β0[overall] in Equation 1). Bias-corrected = meta-analytic effect size estimate corrected for the presence of two forms of publication bias, small-study and decline effects (i.e. β0[bias − corrected] in Equation 3)

Statistical power and type S and M error rates

Sampling level (primary studies)

Overall, primary studies or single experiments (i.e. at the sampling level) had a low statistical power of only 23% to detect the ‘true’ effect, as indicated by the original (uncorrected) meta-analytic effect size estimate, β0[overall]. This held across the different types of effect sizes, with power of 19%, 24% and 28% at the sampling level for SMD, lnRR, and Zr, respectively (see Fig. 8 and Table S1). When bias correction was applied, the overall power to detect the ‘true’ effect (β0[bias − corrected]) decreased further to 15% (12%, 16%, and 18% at the sampling level for SMD, lnRR, and Zr, respectively; see Fig. 8A and Table S1).

Fig. 8.
figure 8

Ecological and evolutionary studies’ median statistical power to detect ‘true’ effects that were approximated by meta-analytic mean effect size estimates (labels: Meta-analysis, Sampling) and their bias-corrected versions (labels: cMeta-analysis, cSampling). On the y-axis, effect size metrics with different subscripts represent different individual meta-analyses (see Fig. 2). Sampling = statistical power at the sampling level (primary studies). cSampling = statistical power at the sampling level after correcting for publication bias. Meta-analysis = statistical power at the meta-analysis level. cMeta-analysis = statistical power at the meta-analysis level after correcting for publication bias. See more details in the legend of Fig. 3. All figures were drawn via the geom_tile() function in ggplot2 R package (version 3.3.5) [61]

The primary studies infrequently showed incorrect estimation of the signs of the true effect sizes (overall Type S error = 5%; Fig. 9 and Table S2). For example, primary studies (i.e. at the sampling level) using lnRR and SMD had only 5% and 6% probabilities, respectively, of having a direction opposite to the meta-analytic mean estimated as β0[overall]. When correcting for publication bias, the Type S error increased from 5% to 8%.

By contrast, the primary studies tended to exaggerate the magnitude of the meta-analytic mean estimated as β0[overall], owing to the limitations of finite sample sizes (overall Type M error = 2.7; Fig. 10 and Table S3). For example, the magnitudes of lnRR, SMD and Zr were overestimated by an average of 2.5, 3.5 and 2 times, respectively. When correcting for publication bias (β0[bias − corrected]), the Type M errors increased to 4 on average (3.5 for lnRR, 6 for SMD and 3.4 for Zr).

Fig. 9.
figure 9

Ecological and evolutionary studies’ median Type S error rates (sign error) in detecting ‘true’ effects that were approximated by meta-analytic mean effect size estimates (labels: Meta-analysis, Sampling) and their bias-corrected versions (labels: cMeta-analysis, cSampling). On the y-axis, effect size metrics with different subscripts represent different individual meta-analyses (see Fig. 2). Sampling = Type S error rate at the sampling level (primary studies). See more details in the legends of Figs. 3 and 8. All figures were drawn via the geom_tile() function in ggplot2 R package (version 3.3.5) [61]

Fig. 10.
figure 10

Ecological and evolutionary studies’ median Type M error rates (magnitude error) in detecting ‘true’ effects that were approximated by meta-analytic mean effect size estimates (labels: Meta-analysis, Sampling) and their bias-corrected versions (labels: cMeta-analysis, cSampling). On the y-axis, effect size metrics with different subscripts represent different individual meta-analyses (see Fig. 2). Grey cells indicate Type M errors greater than 10. See more details in the legends of Figs. 3 and 8. All figures were drawn via the geom_tile() function in ggplot2 R package (version 3.3.5) [61]

Meta-analysis level

On average, at the level of individual meta-analyses, lnRR and Zr had statistical power at or above the nominal 80% level for detecting the true effects estimated as β0[overall]. Specifically, the power was 81% for both lnRR and Zr (Fig. 8 and Table S1). In contrast, the estimated power of SMD was only 41%, which falls short of the nominal 80% level. When detecting true effects indicated by β0[bias − corrected], the statistical power of each meta-analysis decreased further, with lnRR, SMD, and Zr decreasing to 63%, 25% and 51%, respectively.

Ecological and evolutionary meta-analyses had a relatively low probability of reporting an opposite sign to the true direction of both β0[overall] and β0[bias − corrected] (Type S = 5%–8%; Fig. 9 and Table S2). The meta-analyses were also able to considerably reduce the overestimation of the true effect size for lnRR (Type M = 1.1 for β0[overall] and 1.3 for β0[bias − corrected]; Fig. 10 and Table S3), SMD (Type M = 1.9 for β0[overall] and 2.5 for β0[bias − corrected]) and Zr (Type M = 1.1 for β0[overall] and 1.6 for β0[bias − corrected]).


We have conducted the first comprehensive investigation of the prevalence and severity of two common forms of publication bias (small-study and decline effects) in the fields of ecology and evolutionary biology using modern analytic techniques. Overall, we found strong support for small-study and decline effects (time-lag bias), with little heterogeneity across studies. The prevalence of such publication bias resulted in meta-analytic mean effect size estimates being overestimated by at least 0.12 standard deviations and substantially distorted the ecological and evolutionary evidence. The statistical power of ecological and evolutionary studies and experiments was consistently low, at 15%. Ecological and evolutionary studies also showed a 4-fold overestimation of effects (Type M error = 4.4) and a low but non-trivial rate of misidentifying the sign of effects (Type S error = 8%; an error in direction that leads to the opposite conclusion). To place these findings in the perspective of the replication crisis [5, 6], we conclude that prior published findings in ecology and evolutionary biology, at least for the dataset used in this study (87 meta-analyses, 4250 primary studies, 17,638 effect sizes), are likely to have low replicability.

The persistent and non-negligible publication bias in ecological and evolutionary meta-analyses

Small-study and decline effects are general phenomena

We have found that 17% of ecological and evolutionary meta-analyses show evidence for small-study effects (i.e. smaller studies reporting larger effect sizes). Medical researchers found a similar percentage of meta-analyses showing small-study effects (7–18%) in a survey of 6873 meta-analyses (the large sample is because medical research has a bigger pool of meta-analyses to draw from and because that study extracted a much narrower scope of data from each meta-analysis than did our study [7, 63]). Similarly, 13–25% of psychological meta-analyses presented evidence for small-study effects [64, 65]. These values may seem relatively small, but this is in part because, for a given meta-analysis, bias detection methods often lack sufficient statistical power to identify a small-study effect [45, 63, 66]. Indeed, simulations have shown that the power to detect a moderate small-study effect in a medical meta-analysis with 10 studies was as low as 21% [14].

Given the limited power to detect a small-study effect [14], it seems reasonable to focus on the sign and magnitude of the relationship between effect size and sampling error rather than on p-values (i.e. null-hypothesis significance testing). By doing so, we found that more than 60% of meta-analyses had a positive, statistically non-significant relationship between the effect size and its sampling error, indicating that small studies (i.e. with large sampling error or low precision) tend to report larger effects (note that the likelihood of a meta-analysis showing this tendency is 50% under the null hypothesis). We confirmed these results by employing a more powerful approach, a second-order meta-analysis or meta-meta-analysis, which showed a statistically significant positive estimate of the relationship between effect size and sampling error. This result is in line with recent investigations revealing a negative mean association between effect size and sample size in psychology and psychiatry meta-analyses [51, 67]. Moreover, our analysis also showed a small amount of heterogeneity among these 87 slopes. This positive and homogeneous effect implies that small-study effects are commonplace in ecology and evolutionary biology. Similar conclusions were reached in investigations of economic and psychological meta-analyses: small-study effects are widespread phenomena [68,69,70].

We conclude that decline effects are also widespread in the field. More than 50% of ecological and evolutionary meta-analyses showed a negative relationship between effect sizes and their year of publication, indicating that effect sizes decrease over time. As mentioned above, the principal reason for failing to detect a decline effect in a single meta-analysis lies in the low statistical power of the available detection methods [13, 45, 71]. The observed power to detect a decline effect in the current set of 87 meta-analyses was low (median = 13%), similar to that observed in another, much larger survey of 464 ecological meta-analyses (median = 17%; [71, 72]). Importantly, our second-order meta-analysis found a statistically significant and homogeneous effect (Fig. 6A), corroborating that decline effects are common both in sub-fields previously explored (status signalling [73], plant and insect biodiversity [20, 74] and ocean acidification [75]) and more generally in ecology and evolutionary biology [12, 71]. Evidence from other disciplines also reveals the pervasiveness of decline effects (medical and social sciences [51, 76, 77]).

The distorted meta-analytic estimate of effect sizes and evidence by publication bias

By combining the observed bias from both small-study and decline effects, we found evidence that the magnitudes of effect sizes might have been overestimated by 0.217, 0.116 and 0.128 (in their original units) for SMD, lnRR and Zr, respectively. A recent investigation of 433 psychological meta-analyses also showed a statistically significant, albeit small, decrease in meta-analytic estimates after correcting for publication bias [78]. A comparison of meta-analyses published without pre-registration versus registered reports (which are less prone to publication bias) has also shown that unregistered meta-analyses substantially overestimated effect sizes, although bias-correction methods like the one used in this study can correct for the differences in results between meta-analyses and registered reports [79]. In our dataset, correcting for publication bias led to 33 of 50 initially statistically significant meta-analytic estimates of the mean effect becoming non-significant, suggesting unmerited confidence in the outcomes of 66% of published ecological and evolutionary meta-analyses (when using a frequentist approach with a p-value threshold of 0.05). Recent psychological investigations revealed a similar percentage (60%) of erroneous conclusions in meta-analytic evidence because of publication bias [80].

Low statistical power and high type M error in ecological and evolutionary studies

Ecological and evolutionary studies lack power and are prone to type M error

Primary studies in ecology and evolutionary biology included in our sample of meta-analyses had, on average, only 15% power to detect the bias-corrected effect size identified in the meta-analysis, which is consistent with earlier findings in the sub-fields of global change biology [56, 81] and animal behaviour [10, 23]. When excluding meta-analyses whose mean effects were not statistically significant, the corresponding average power of primary studies was still very low (17%; Table S4). As a result, only studies with largely exaggerated effect sizes (4-fold) reached statistical significance. In contrast, the Type S error was small but not trivial (8%); note that an error in direction can lead to a completely opposite conclusion. A lack of statistical power seems to be a general phenomenon in scientific research: low power has been identified in many disciplines (medical sciences = 20% [82], neuroscience = 21% [16], psychological sciences = 36% [27], economics = 18% [83]). Given this, meta-analysis with appropriate bias correction is an important way to generate more reliable estimates of effect sizes [30]. Statistically speaking, meta-analysis is an effective way to approximate population-level estimates by combining sampling-level estimates, despite its shortcomings, some of which were shown above. Science is a process of evidence accumulation in which primary studies are the basis for producing higher-order, higher-quality evidence (e.g. via systematic review and meta-analysis).

Publication bias aggravates the low power and high Type M error

Correcting for publication bias is expected to reveal lower power and higher Type M error rates because publication bias creates a non-random sample of the effect size evidence used in meta-analyses. We show that correcting for publication bias resulted in a decrease in statistical power from 23% to 15%, an increase in the Type S error rate from 5% to 8%, and an increase in Type M error rates from 2.7 to 4.4. Psychological and economic research also confirms that meta-analyses without bias adjustment overestimate statistical power [27, 28]. The exaggeration of power and effect size is even more severe in ecological and evolutionary studies if no bias correction is made [5], providing further support for recent concerns about the likelihood of low replicability (‘the replication crisis’) in ecology and evolutionary biology [6, 10].


There are four limitations to the present registered report. First, when calculating the statistical power to detect true effects in ecological and evolutionary studies, we used the meta-analytic mean effect size (and its bias-corrected version) as the true effect for each primary study within the same meta-analysis. This means we assumed that the multiple primary studies included in the same meta-analysis share a common true effect. However, the high heterogeneity in ecological and evolutionary meta-analyses indicates that each primary study may have a specific true effect size that depends on the research context (e.g. population, species, methodology, lab effects [47]). Therefore, using such context-dependent effects as proxies of the true effect is probably more reasonable [81]. Second, in the post hoc analysis, we used the statistical significance (p-value < 0.05) of the meta-analytic mean effect size as the threshold to decide whether the true effect in a meta-analysis was so tiny that it could be considered biologically negligible and excluded from the calculation of average power. We acknowledge that this categorisation is arbitrary because statistical significance does not represent biological significance [4]; in some fields, very small effects still have biological importance. Third, the meta-analytic effect size estimates after correcting for publication bias may still be overestimated or underestimated because the incomplete reporting of important moderators in meta-analyses prevented us from accurately correcting for publication bias using our regression-based method [42, 46]. Fourth, in testing for publication bias at both the within- and between-meta-analysis levels, we used statistical significance at the 0.05 level as the criterion for determining whether publication bias was present.
We acknowledge that this process, commonly referred to as a "significance filter", is prone to exaggeration and might result in a so-called "winner's curse" [84,85,86]. To partially mitigate this issue, we reported the percentage of both statistically significant and non-significant results in Figs. 4, 5 and 6. Furthermore, to avoid drawing conclusions based solely on statistically significant results, we conducted downstream analyses to assess the extent to which publication bias distorted the estimates of effect size (Fig. 7) and the calculation of power and Type M/S error rates (Figs. 8, 9 and 10).


How to properly test for publication bias and correct for its impacts?

Given the strong and widespread evidence of publication bias found in this study (and others), publication bias tests should be a standard part of meta-analyses. A recent survey showed that publication bias tests have become more widespread in ecology and evolution in recent years [45]; however, inappropriate bias detection methods still dominate the literature [45]. Generally, regression-based methods are more powerful than alternatives such as correlation-based methods [14, 63]. The regression-based method in the multilevel model framework used in the current study can further handle the non-independence and high heterogeneity that are common in the field, bringing down the rate of false positives [45,46,47]. Importantly, the method used here provides an intuitive quantification of the severity of publication bias. For example, the pooled β1[small − study] of Zr was larger than that of SMD (0.119 vs. 0.091), suggesting that publication bias is more severe in Zr than in SMD. Regression-based methods to correct for publication bias have been shown to produce effect size estimates similar to those of registered reports [79]. We strongly recommend that meta-analysts employ the regression-based method used in the current paper to routinely test for the presence of publication bias, correct for its impact, and report the corrected effect sizes, allowing stakeholders to better judge how robust the reported effects are.

How to increase power and mitigate overestimation of effect for primary studies and meta-analyses?

For primary studies, a fundamental solution for increasing statistical power and mitigating effect size overestimation is to increase sample sizes by building big-team science [87] or global-scale collaborative scientific networks such as the Nutrient Network [88], the US Long-Term Ecological Research Network [89], and the Zostera Experimental Network [90]. Our results confirm that lnRR is a more powerful effect size metric than SMD [81]. The power of meta-analyses using lnRR was almost twice that of SMD (lnRR vs. SMD: 81% vs. 41%). Moreover, lnRR was less prone to exaggeration (Type M error, lnRR vs. SMD: 1 vs. 2). Practically, we recommend using lnRR as the main effect size when conducting meta-analyses if the biological question concerns mean differences (but see [91]), while conducting sensitivity analyses using SMD (see [81, 92] for comparisons of the pros and cons of lnRR and SMD).


We indirectly examined the extent of the replication crisis in ecology and evolutionary biology using two inter-related indicators: publication bias and statistical power. Our results indicate that two expected outcomes of publication bias, small-study effects and decline effects, are persistent and non-negligible in the field. Primary studies in ecology and evolutionary biology are often underpowered and prone to overestimate the magnitude of the effect (i.e. Type M error). Pervasive publication bias leads to exaggerated effect sizes, inflated meta-analytic evidence and overestimated statistical power, and to underestimated Type M error rates, undermining the reliability of previous findings. Although no single indicator can capture the true extent or all relevant evidence of the replication crisis [93], we have provided clear evidence that, as in many other disciplines [1, 2, 4], previously published findings in ecology and evolutionary biology are likely to have low replicability. The likely replication crisis in these fields highlights the importance of (i) designing high-power primary studies by building up big-team science [7, 87] where possible, (ii) adopting appropriate publication bias detection and correction methods for meta-analyses [45], (iii) embracing publication-bias-robust publication forms (e.g. Registered Reports — like the current article) for both empirical studies and meta-analyses alike. More generally, researchers need to adhere more closely to open and transparent research practices [94], such as (pre-)registration [95], data and code sharing [96, 97], and transparent reporting [5], to achieve credible, reliable and reproducible ecology and evolutionary biology.

Availability of data and materials

The relevant data and code that reproduce the results of this registered report are available at the GitHub repository ( and Zenodo (Yefeng0920/EcoEvo_PB: Registered Report - publication bias in Eco & Evo. DOI:


  1. Collaboration OS. Estimating the reproducibility of psychological science. Science. 2015;349:aac4716.

  2. Camerer CF, Dreber A, Forsell E, Ho T-H, Huber J, Johannesson M, et al. Evaluating replicability of laboratory experiments in economics. Science. 2016;351(6280):1433–6.

    Article  CAS  PubMed  Google Scholar 

  3. Ebersole CR, Mathur MB, Baranski E, Bart-Plange D-J, Buttrick NR, Chartier CR, et al. Many labs 5: testing pre-data-collection peer review as an intervention to increase replicability. Adv Methods Pract Psychol Sci. 2020;3(3):309–31.

    Article  Google Scholar 

  4. Baker M. Reproducibility crisis. Nature. 2016;533(26):353–66.

    Google Scholar 

  5. Kelly CD. Rate and success of study replication in ecology and evolution. PeerJ. 2019;7:e7654.

    Article  PubMed  PubMed Central  Google Scholar 

  6. Parker TH, Forstmeier W, Koricheva J, Fidler F, Hadfield JD, Chee YE, et al. Transparency in ecology and evolution: real problems, real solutions. Trends Ecol Evol. 2016;31(9):711–9.

  7. O’Dea RE, Parker TH, Chee YE, Culina A, Drobniak SM, Duncan DH, et al. Towards open, reliable, and transparent ecology and evolutionary biology. BMC Biol. 2021;19(1):1–5.

  8. Fraser H, Barnett A, Parker TH, Fidler F. The role of replication studies in ecology. Ecol Evol. 2020;10(12):5197–207.

  9. Nakagawa S, Parker TH. Replicating research in ecology and evolution: feasibility, incentives, and the cost-benefit conundrum. BMC Biol. 2015;13(1):1–6.

  10. Feng X, Park DS, Walker C, Peterson AT, Merow C, Papeş M. A checklist for maximizing reproducibility of ecological niche models. Nat Ecol Evol. 2019;3(10):1382–95.

  11. Rosenthal R. The file drawer problem and tolerance for null results. Psychol Bull. 1979;86(3):638.

  12. Jennions MD, Møller AP. Relationships fade with time: a meta-analysis of temporal trends in publication in ecology and evolution. Proc R Soc Lond Ser B Biol Sci. 2002;269(1486):43–8.

  13. Koricheva J, Kulinskaya E. Temporal instability of evidence base: a threat to policy making? Trends Ecol Evol. 2019;34(10):895–902.

  14. Sterne JA, Gavaghan D, Egger M. Publication and related bias in meta-analysis: power of statistical tests and prevalence in the literature. J Clin Epidemiol. 2000;53(11):1119–29.

  15. McShane BB, Böckenholt U, Hansen KT. Average power: a cautionary note. Adv Methods Pract Psychol Sci. 2020;3(2):185–99.

  16. Button KS, Ioannidis JP, Mokrysz C, Nosek BA, Flint J, Robinson ES, et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci. 2013;14(5):365–76.

  17. Szucs D, Ioannidis JP. Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature. PLoS Biol. 2017;15(3):e2000797.

  18. Fraley RC, Chong JY, Baacke KA, Greco AJ, Guan H, Vazire S. Journal N-pact factors from 2011 to 2019: evaluating the quality of social/personality journals with respect to sample size and statistical power. Adv Meth Pract Psychol Sci. 2022;5(4):1–17.

  19. Barto EK, Rillig MC. Dissemination biases in ecology: effect sizes matter more than quality. Oikos. 2012;121(2):228–35.

  20. Crystal‐Ornelas R, Lockwood JL. Cumulative meta‐analysis identifies declining but negative impacts of invasive species on richness after 20 yr. Ecology. 2020;101(8):e03082.

  21. Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale: Lawrence Erlbaum; 1988.

  22. Jennions MD, Møller AP. A survey of the statistical power of research in behavioral ecology and animal behavior. Behav Ecol. 2003;14(3):438–45.

  23. Smith DR, Hardy IC, Gammell MP. Power rangers: no improvement in the statistical power of analyses published in animal behaviour. Anim Behav. 2011;81(1):347–52.

  24. Jennions MD, Møller AP. Publication bias in ecology and evolution: an empirical assessment using the ‘trim and fill’ method. Biol Rev. 2002;77(2):211–22.

  25. Correll J, Mellinger C, McClelland GH, Judd CM. Avoid Cohen’s ‘small’, ‘medium’, and ‘large’ for power analysis. Trends Cogn Sci. 2020;24(3):200–7.

  26. Ioannidis JP, Stanley TD, Doucouliagos H. The power of bias in economics research. Econ J. 2017;127(605):F236–65.

  27. Stanley T, Carter EC, Doucouliagos H. What meta-analyses reveal about the replicability of psychological research. Psychol Bull. 2018;144(12):1325–46.

  28. Gelman A, Tuerlinckx F. Type S error rates for classical and Bayesian single and multiple comparison procedures. Comput Stat. 2000;15(3):373–90.

  29. Gelman A, Carlin J. Beyond power calculations: assessing type S (sign) and type M (magnitude) errors. Perspect Psychol Sci. 2014;9(6):641–51.

  30. Gurevitch J, Koricheva J, Nakagawa S, Stewart G. Meta-analysis and the science of research synthesis. Nature. 2018;555(7695):175–82.

  31. O’Dea RE, Lagisz M, Jennions MD, Koricheva J, Noble DW, Parker TH, et al. Preferred reporting items for systematic reviews and meta-analyses in ecology and evolutionary biology: a PRISMA extension. Biol Rev. 2021;96(5):1695–722.

  32. Hedges LV. Estimation of effect size from a series of independent experiments. Psychol Bull. 1982;92(2):490–9.

  33. Hedges LV, Gurevitch J, Curtis PS. The meta-analysis of response ratios in experimental ecology. Ecology. 1999;80(4):1150–6.

  34. Nakagawa S, Cuthill IC. Effect size, confidence interval and statistical significance: a practical guide for biologists. Biol Rev. 2007;82(4):591–605.

  35. Wood JL, Yates MC, Fraser DJ. Are heritability and selection related to population size in nature? Meta-analysis and conservation implications. Evol Appl. 2016;9(5):640–57.

  36. Murren CJ, Maclean HJ, Diamond SE, Steiner UK, Heskel MA, Handelsman CA, et al. Evolutionary change in continuous reaction norms. Am Nat. 2014;183(4):453–67.

  37. Caruso CM, Eisen KE, Martin RA, Sletvold N. A meta-analysis of the agents of selection on floral traits. Evolution. 2019;73(1):4–14.

  38. Yates MC, Fraser DJ. Does source population size affect performance in new environments? Evol Appl. 2014;7(8):871–82.

  39. Barrientos R. Adult sex-ratio distortion in the native European polecat is related to the expansion of the invasive American mink. Biol Conserv. 2015;186:28–34.

  40. Wehi P, Nakagawa S, Trewick S, Morgan-Richards M. Does predation result in adult sex ratio skew in a sexually dimorphic insect genus? J Evol Biol. 2011;24(11):2321–8.

  41. Koricheva J, Gurevitch J. Uses and misuses of meta-analysis in plant ecology. J Ecol. 2014;102(4):828–44.

  42. Nakagawa S, Santos ES. Methodological issues and advances in biological meta-analysis. Evol Ecol. 2012;26(5):1253–74.

  43. Viechtbauer W. Bias and efficiency of meta-analytic variance estimators in the random-effects model. J Educ Behav Stat. 2005;30(3):261–93.

  44. Viechtbauer W. Conducting meta-analyses in R with the metafor package. J Stat Softw. 2010;36(3):1–48.

  45. Nakagawa S, Lagisz M, Jennions MD, Koricheva J, Noble D, Parker TH, et al. Methods for testing publication bias in ecological and evolutionary meta-analyses. Methods Ecol Evol. 2022;13(1):4–21.

  46. Noble DW, Lagisz M, O'Dea RE, Nakagawa S. Nonindependence and sensitivity analyses in ecological and evolutionary meta-analyses. Mol Ecol. 2017;26(9):2410–25.

  47. Senior AM, Grueber CE, Kamiya T, Lagisz M, O'Dwyer K, Santos ES, Nakagawa S. Heterogeneity in ecological and evolutionary meta-analyses: its magnitude and implications. Ecology. 2016;97(12):3293–9.

  48. Senior AM, Viechtbauer W, Nakagawa S. Revisiting and expanding the meta-analysis of variation: the log coefficient of variation ratio, lnCVR. Res Synth Methods. 2020;11(4):553–67.

  49. Stanley TD, Doucouliagos H, Ioannidis JP. Finding the power to reduce publication bias. Stat Med. 2017;36(10):1580–98.

  50. Schielzeth H. Simple means to improve the interpretability of regression coefficients. Methods Ecol Evol. 2010;1(2):103–13.

  51. Fanelli D, Costas R, Ioannidis JP. Meta-assessment of bias in science. Proc Natl Acad Sci. 2017;114(14):3714–9.

  52. Nakagawa S, Samarasinghe G, Haddaway NR, Westgate MJ, O’Dea RE, Noble DW, et al. Research weaving: visualizing the future of research synthesis. Trends Ecol Evol. 2019;34(3):224–38.

  53. Leone F, Nelson L, Nottingham R. The folded normal distribution. Technometrics. 1961;3(4):543–50.

  54. Nakagawa S, Lagisz M. Visualizing unbiased and biased unweighted meta-analyses. J Evol Biol. 2016;29(10):1914–6.

  55. Morrissey MB. Meta-analysis of magnitudes, differences and variation in evolutionary parameters. J Evol Biol. 2016;29(10):1882–904.

  56. Lemoine NP, Hoffman A, Felton AJ, Baur L, Chaves F, Gray J, et al. Underappreciated problems of low replication in ecological field studies. Ecology. 2016;97(10):2554–61.

  57. Cohn LD, Becker BJ. How meta-analysis increases statistical power. Psychol Methods. 2003;8(3):243–53.

  58. Bates D, Mächler M, Bolker B, Walker S. Fitting linear mixed-effects models using lme4. J Stat Softw. 2014;67(1):1–46.

  59. Stanley TD, Doucouliagos H. Meta-regression approximations to reduce publication selection bias. Res Synth Methods. 2014;5(1):60–78.

  60. Stanley TD. Limitations of PET-PEESE and other meta-analysis methods. Soc Psychol Personal Sci. 2017;8(5):581–91.

  61. Wickham H. ggplot2: elegant graphics for data analysis. New York: Springer; 2016.

  62. Nakagawa S, Lagisz M, O'Dea RE, Rutkowska J, Yang Y, Noble DW, et al. The orchard plot: cultivating a forest plot for use in ecology, evolution, and beyond. Res Synth Methods. 2021;12(1):4–12.

  63. Ioannidis JP, Trikalinos TA. The appropriateness of asymmetry tests for publication bias in meta-analyses: a large survey. CMAJ. 2007;176(8):1091–6.

  64. Van Aert RC, Wicherts JM, Van Assen MA. Publication bias examined in meta-analyses from psychology and medicine: a meta-meta-analysis. PLoS One. 2019;14(4):e0215052.

  65. Ferguson CJ, Brannick MT. Publication bias in psychological science: prevalence, methods for identifying and controlling, and implications for the use of meta-analyses. Psychol Methods. 2012;17(1):120–8.

  66. Ioannidis JP. Why most discovered true associations are inflated. Epidemiology. 2008;19(5):640–8.

  67. Kühberger A, Fritz A, Scherndl T. Publication bias in psychology: a diagnosis based on the correlation between effect size and sample size. PLoS One. 2014;9(9):e105825.

  68. Doucouliagos C, Stanley TD. Are all economic facts greatly exaggerated? Theory competition and selectivity. J Econ Surv. 2013;27(2):316–39.

  69. Franco A, Malhotra N, Simonovits G. Underreporting in psychology experiments: evidence from a study registry. Soc Psychol Personal Sci. 2016;7(1):8–12.

  70. Bartoš F, Maier M, Wagenmakers E-J, Nippold F, Doucouliagos H, Ioannidis J, et al. Footprint of publication selection bias on meta-analyses in medicine, economics, and psychology. arXiv preprint. 2022. arXiv:2208.12334.

  71. Yang Y, Nakagawa S, Lagisz M. Decline effects are rare in ecology: comment. EcoEvoRxiv. 2022. doi:10.32942/osf.io/qc7bx.

  72. Costello L, Fox JW. Decline effects are rare in ecology. Ecology. 2022:e3680.

  73. Sánchez-Tójar A, Nakagawa S, Sánchez-Fortún M, Martin DA, Ramani S, Girndt A, et al. Meta-analysis challenges a textbook example of status signalling and demonstrates publication bias. eLife. 2018;7:e37385.

  74. Van Klink R, Bowler DE, Gongalsky KB, Swengel AB, Gentile A, Chase JM. Meta-analysis reveals declines in terrestrial but increases in freshwater insect abundances. Science. 2020;368(6489):417–20.

  75. Clements JC, Sundin J, Clark TD, Jutfelt F. Meta-analysis reveals an extreme “decline effect” in the impacts of ocean acidification on fish behavior. PLoS Biol. 2022;20(2):e3001511.

  76. Fanshawe TR, Shaw LF, Spence GT. A large-scale assessment of temporal trends in meta-analyses using systematic review reports from the Cochrane library. Res Synth Methods. 2017;8(4):404–15.

  77. Pietschnig J, Siegel M, Eder JSN, Gittler G. Effect declines are systematic, strong, and ubiquitous: a meta-meta-analysis of the decline effect in intelligence research. Front Psychol. 2019;10:2874.

  78. Sladekova M, Webb LE, Field AP. Estimating the change in meta-analytic effect size estimates after the application of publication bias adjustment methods. Psychol Methods. 2022.

  79. Kvarven A, Strømland E, Johannesson M. Comparing meta-analyses and preregistered multiple-laboratory replication projects. Nat Hum Behav. 2020;4(4):423–34.

  80. Bartoš F, Maier M, Shanks D, Stanley T, Sladekova M, Wagenmakers E-J. Meta-analyses in psychology often overestimate evidence for and size of effects; 2022.

  81. Yang Y, Hillebrand H, Lagisz M, Cleasby I, Nakagawa S. Low statistical power and overestimated anthropogenic impacts, exacerbated by publication bias, dominate field studies in global change biology. Glob Chang Biol. 2022;28(3):969–89.

  82. Lamberink HJ, Otte WM, Sinke MR, Lakens D, Glasziou PP, Tijdink JK, et al. Statistical power of clinical trials increased while effect size remained stable: an empirical analysis of 136,212 clinical trials between 1975 and 2014. J Clin Epidemiol. 2018;102:123–8.

  83. Ioannidis JP, Stanley TD, Doucouliagos H. The power of bias in economics research. Econ J. 2017;127(605):F236–65.

  84. van Zwet EW, Cator EA. The significance filter, the winner’s curse and the need to shrink. Statistica Neerlandica. 2021;75(4):437–52.

  85. Berner D, Amrhein V. Why and how we should join the shift from significance testing to estimation. J Evol Biol. 2022;35(6):777–87.

  86. Amrhein V, Greenland S, McShane B. Scientists rise up against statistical significance. Nature. 2019;567(7748):305–7.

  87. Coles NA, Hamlin JK, Sullivan LL, Parker TH, Altschul D. Building up big-team science. Nature. 2022;601(7894):505–7.

  88. Harpole WS, Sullivan LL, Lind EM, Firn J, Adler PB, Borer ET, et al. Addition of multiple limiting resources reduces grassland diversity. Nature. 2016;537(7618):93–6.

  89. Crossley MS, Meier AR, Baldwin EM, Berry LL, Crenshaw LC, Hartman GL, et al. No net insect abundance and diversity declines across US long term ecological research sites. Nat Ecol Evol. 2020;4(10):1368–76.

  90. Wu PP-Y, Mengersen K, McMahon K, Kendrick GA, Chartrand K, York PH, et al. Timing anthropogenic stressors to mitigate their impact on marine ecosystem resilience. Nat Commun. 2017;8(1):1–11.

  91. Bakbergenuly I, Hoaglin DC, Kulinskaya E. Estimation in meta-analyses of response ratios. BMC Med Res Methodol. 2020;20(1):1–24.

  92. Nakagawa S, Poulin R, Mengersen K, Reinhold K, Engqvist L, Lagisz M, et al. Meta-analysis of variation: ecological and evolutionary applications and beyond. Methods Ecol Evol. 2015;6(2):143–52.

  93. Fidler F, Chee YE, Wintle BC, Burgman MA, McCarthy MA, Gordon A. Metaresearch for evaluating reproducibility in ecology and evolution. BioScience. 2017;67(3):282–9.

  94. Gallagher RV, Falster DS, Maitner BS, Salguero-Gómez R, Vandvik V, Pearse WD, et al. Open Science principles for accelerating trait-based science across the tree of life. Nat Ecol Evol. 2020;4(3):294–303.

  95. Parker T, Fraser H, Nakagawa S. Making conservation science more reliable with preregistration and registered reports. Conserv Biol. 2019;33(4):747–50.

  96. Parr CS, Cummings MP. Data sharing in ecology and evolution. Trends Ecol Evol. 2005;20(7):362–3.

  97. Culina A, van den Berg I, Evans S, Sánchez-Tójar A. Low availability of code in ecology: a call for urgent action. PLoS Biol. 2020;18(7):e3000763.

Acknowledgements

We thank Valentin Amrhein for his comments on this manuscript. We thank the Faculty of Science and the Office of the Deputy Vice-Chancellor (Research), UNSW Sydney, for their support of YY and SN. YY was funded by the National Natural Science Foundation of China (No. 32102597). SN, YY, and ML were funded by an Australian Research Council Discovery Grant (DP210100812). DN was supported by an ARC Discovery Grant (DP210101152).

Author information

Authors and Affiliations



Contributions

YY: conceptualising the paper, collecting the data, analysing the data, and drafting the manuscript. AST: collecting the data, commenting, and editing the manuscript. REO: collecting the data, analysing the data, commenting, and editing the manuscript. DWAN: collecting the data, commenting, and editing the manuscript. JK: collecting the data, commenting, and editing the manuscript. MDJ: collecting the data, commenting, and editing the manuscript. THP: collecting the data, commenting, and editing the manuscript. ML: visualising, collecting the data, commenting, editing the manuscript, and supervising the project. SN: conceptualising the paper, collecting the data, analysing the data, commenting, editing the manuscript, and supervising the project. The authors read and approved the final manuscript.

Corresponding authors

Correspondence to Yefeng Yang or Shinichi Nakagawa.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Supporting Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit the Creative Commons website. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Yang, Y., Sánchez-Tójar, A., O’Dea, R.E. et al. Publication bias impacts on effect size, statistical power, and magnitude (Type M) and sign (Type S) errors in ecology and evolutionary biology. BMC Biol 21, 71 (2023).



Keywords

  • Open science
  • Replicability
  • Reproducibility
  • Transparency
  • Selective reporting
  • Questionable research practices
  • P-hacking
  • Registered report
  • Many labs
  • Generalizability
  • Meta-research