- Registered Report
- Open access
- Published:

# Publication bias impacts on effect size, statistical power, and magnitude (Type M) and sign (Type S) errors in ecology and evolutionary biology

*BMC Biology*
**volume 21**, Article number: 71 (2023)

## Abstract

Collaborative efforts to directly replicate empirical studies in the medical and social sciences have revealed alarmingly low rates of replicability, a phenomenon dubbed the ‘replication crisis’. Poor replicability has spurred cultural changes targeted at improving reliability in these disciplines. Given the absence of equivalent replication projects in ecology and evolutionary biology, two inter-related indicators offer the opportunity to retrospectively assess replicability: publication bias and statistical power. This registered report assesses the prevalence and severity of small-study (i.e., smaller studies reporting larger effect sizes) and decline effects (i.e., effect sizes decreasing over time) across ecology and evolutionary biology using 87 meta-analyses comprising 4,250 primary studies and 17,638 effect sizes. Further, we estimate how publication bias might distort the estimation of effect sizes, statistical power, and errors in magnitude (Type M or exaggeration ratio) and sign (Type S). We show strong evidence for the pervasiveness of both small-study and decline effects in ecology and evolution. There was widespread prevalence of publication bias that resulted in meta-analytic means being over-estimated by (at least) 0.12 standard deviations. The prevalence of publication bias distorted confidence in meta-analytic results, with 66% of initially statistically significant meta-analytic means becoming non-significant after correcting for publication bias. Ecological and evolutionary studies consistently had low statistical power (15%) with a 4-fold exaggeration of effects on average (Type M error rates = 4.4). Notably, publication bias reduced power from 23% to 15% and increased type M error rates from 2.7 to 4.4 because it creates a non-random sample of effect size evidence. The sign errors of effect sizes (Type S error) increased from 5% to 8% because of publication bias. 
Our research provides clear evidence that many published ecological and evolutionary findings are inflated. Our results highlight the importance of designing high-power empirical studies (e.g., via collaborative team science), promoting and encouraging replication studies, testing and correcting for publication bias in meta-analyses, and adopting open and transparent research practices, such as (pre)registration, data- and code-sharing, and transparent reporting.

## Introduction

Replicable prior findings are the foundation of cumulative scientific research. However, large-scale collaborative attempts to repeat studies have demonstrated that prior findings often fail to replicate in the medical and social sciences [1,2,3]. This raises concerns about the reliability of previously published studies (often referred to as the ‘replication crisis’ [4]). A similar issue of low replicability is likely to occur in ecology and evolutionary biology [5] (see also [6]). Yet, systematic assessments of replicability in this field are exceedingly rare [6, 7] perhaps because of the absence of strong incentives towards conducting replication studies [7, 8], and for logistical reasons (e.g. difficulties of conducting studies of rare species or remote ecosystems [9, 10]).

There are, however, two inter-related indicators that can be used to retrospectively gauge replicability in ecology and evolutionary biology: publication bias and statistical power. Publication bias and low statistical power increase the occurrence of unreliable effect size estimates that cannot be replicated. Publication bias commonly occurs when studies with statistically significant results are published more frequently than those with statistically non-significant findings (also referred to as ‘file-drawer problem’ [11]) or are published more quickly [12, 13]. More rapid publication of statistically significant results can also lead to a decline in reported effects over time (‘decline effect’ [12, 13]). When statistically significant effects are preferentially published, smaller studies will tend to report larger effect sizes (known as ‘small-study effects’ [14]). Statistical power, by definition, is the likelihood of identifying a ‘true effect’ when it is present. It is often used as a proxy of ‘replicability probability’ (but see [15]), as studies with high statistical power are more likely to yield findings that can be replicated by other researchers compared to studies with low statistical power [16,17,18].

Several meta-research studies in ecology and evolutionary biology have investigated the prevalence of publication biases and low statistical power. Jennions and Moller [12] reported a statistically significant decline effect in a survey of 44 ecology and evolutionary biology meta-analyses that had been published in 2002. Using 52 meta-analyses published before 2000, Barto and Rillig [19] reached a similar conclusion. In a cumulative meta-analysis, Crystal-Ornelas and Lockwood [20] also identified a statistically significant decline in the magnitude of the effect of invasive species on species richness, using 240 papers published between 1999 and 2016. In their work, this decline effect was present consistently regardless of taxonomic groups, invasion time, or journal quality. Twenty years ago, statistical power in 10 ecology, evolution, and behaviour journals was estimated at 13–16% for small effects and 40–47% for medium effects (where small effects are *r* = 0.1 and medium effects are *r* = 0.3; *sensu* Cohen [21]) [22]. Even lower statistical power was estimated for the journal *Animal Behaviour* in 1996, 2003, and 2009 (7–8% and 23–26% to detect Cohen’s small and medium effect sizes, respectively [23]).

Despite earlier efforts in ecology and evolutionary biology [24], the field still lacks a systematic overview of the extent to which different forms of publication bias would distort the estimation of true effects. Further, no studies have evaluated how such distorted effect sizes prevent us from correctly estimating statistical power. The statistical power of a given study depends on sample size and the estimate of corresponding ‘true’ effect size (e.g. a larger effect size leads to a higher power; see Fig. 1A). Therefore, to avoid overestimating the statistical power of a given study, an unbiased proxy of the ‘true’ effect size should be used. Contrastingly, previous attempts in ecology and evolution often used Cohen’s benchmarks to quantify statistical power for a given study [22, 23]. Yet, these benchmarks were derived from Cohen’s qualitative intuitions for studies in the social sciences rather than a quantitative synthesis of the representative literature [25]. Cohen’s benchmarks are arbitrary, and not necessarily applicable to ecological and evolutionary studies. As with exemplar studies in other fields [16], ‘true’ effects can be estimated via meta-analytic approaches and preferably corrected for potential publication bias [26, 27]. Using publication bias-corrected effect size estimates as ‘true’ effects would, more accurately, quantify statistical power as well as the two related, yet underappreciated, statistical errors: Type M and S errors (Fig. 1B and C; [28]). Type M error, also known as exaggeration ratio (magnitude error), represents the ratio between an estimated effect and a ‘true’ effect, whereas Type S error represents the probability of attaining statistical significance in the direction opposite to the true effect [29]. No study has yet quantified these two quantities systematically across the field of ecology and evolutionary biology.

Here, we capitalise on the rapid growth of ecological and evolutionary meta-analyses to systematically assess the extent to which patterns consistent with publication biases are common across the fields of ecology and evolutionary biology, and, if attributed to actual publication bias, their impacts on estimates of effect size, statistical power, and Type M and S errors [30]. First, we test for the presence and severity of two indices of publication bias (i.e. small-study effects and decline effects) at two levels: (i) the within-meta-analysis level using a newly proposed multilevel meta-regression method and (ii) the between-meta-analysis level using second-order meta-analyses (i.e. meta-meta-analyses). Second, we correct for these publication biases and quantify the degree of decline in bias-corrected effect-size magnitude. Finally, we use uncorrected and bias-corrected mean effect sizes as proxies of the ‘true’ effect to assess statistical power, Type M and S errors in ecology and evolutionary biology both at the primary study (effect-size) and the synthesis (meta-analysis) level.

## Methods

Before submission of stage 1 of this registered report, we finished collection (‘Data collection’ section), retrieval, and cleaning (‘Data retrieval and cleaning’ section) of data from a pre-existing dataset [31]. After this stage 1 registered report was accepted, we commenced the statistical analysis process (‘Statistical analysis’ section).

### Database

#### Data collection

We used a recent meta-analytic database that had been collected to evaluate the reporting quality of systematic reviews and meta-analyses published in ecology and evolutionary biology [31]. The inclusion and screening criteria identified meta-analyses that were broadly representative of meta-analyses published in ecology and evolutionary biology journals from 2010 to 2019. In brief, the database creators compiled a list of ‘Ecology’ and/or ‘Evolutionary Biology’ journals via the categories of the ISI InCites Journal Citation Reports®. Within the included journals, they searched Scopus using the string ‘meta-analy*’ OR ‘metaanaly*’ OR ‘meta-regression’. They restricted the search to articles published from January 2010 to 25 March 2019. Search results were then filtered to the 31 journals most frequently publishing meta-analyses. By taking a random sample of studies within each journal, a total of 297 papers was returned. After screening (search records, and inclusion and screening criteria are available at [31]), the database included a representative sample of 102 ecological or evolutionary meta-analyses.

#### Data retrieval and cleaning

By checking the main text, supplementary materials, and/or online data repositories (e.g. Dryad, GitHub, Open Science Framework) of the 102 meta-analytic papers, and emailing corresponding authors, if necessary, we were able to include 80 papers that reported essential information for our statistical analyses. These 80 papers contained 107 independent meta-analyses. Among these 107, 36 meta-analyses used standardised mean difference (SMD) which includes some well-known estimators such as Hedges’ *g* or Cohen’s *d* [32]; 20 of these meta-analyses provided raw data (i.e. descriptive statistics: mean, standard error or deviation, and sample size) whereas the remaining 16 cases provided only effect sizes and variance. Twenty meta-analyses used the log response ratio (lnRR [33]; also known as the ratio of means, ROM): 10 cases with raw data, and 10 cases without raw data. Thirty-one cases used the correlation coefficient or its Fisher’s transformation, *Zr* (given that the variance of *Zr* and sample size are interconvertible, all cases of *Zr* were treated as having raw data). All correlation coefficients were converted to *Zr* to better approximate normal errors [34]. The remaining 20 meta-analyses used other effect size metrics, such as heritability (*h*^{2} [35]), regression slope (e.g. reaction norm or selection gradient [36, 37]), 2-by-2 binary data (e.g. log odds and risk ratios [38]), raw mean difference [39], and non-standard metrics (proportion [40]).

We decided to only include meta-analytic cases using SMD, lnRR, and *Zr* in our datasets because, in addition to being the most commonly used effect sizes in ecology and evolutionary biology [41, 42], they share statistical properties necessary to fit a formal meta-analytic model: (i) they are ‘unit-less’, which allows comparisons of studies originally using different units, (ii) they are (asymptotically) normally distributed, and (iii) they have readily computable (unbiased) sampling variance [34]. To keep our datasets independent, we only used the effect sizes in their original forms, although data augmentations (e.g. conversions from *Zr* to SMD) could maximise the statistical power of the following statistical analyses by maximising the sample size of each dataset (in this case, the number of effect sizes). Therefore, our final three datasets consisted of (1) 36 meta-analytic cases of SMD, (2) 20 cases of lnRR, and (3) 31 cases of *Zr* (Fig. 2). For each primary study included in the final dataset, we retrieved four key variables: (i) effect sizes reported (i.e. SMD, lnRR, or *Zr*), (ii) standard errors (or sampling variances) of each effect size (to test for small-study effects), (iii) sample sizes per condition where possible (i.e. experimental group *versus* control group for SMD and lnRR); sample sizes are used to create a predictor to test and correct for small-study effects (i.e. ‘effective sample size’; see the ‘Detecting publication bias’ section for details), and (iv) publication year (to test for a decline effect).

### Statistical analysis

#### Multilevel meta-analytic modelling

We used multilevel meta-analytic approaches to (i) estimate the meta-analytic overall mean (i.e. uncorrected effect size estimates), (ii) detect potential publication bias (i.e. test small-study and decline effects), and (iii) correct for publication bias for each meta-analysis included in our datasets (Fig. 2).

##### Estimating uncorrected effect sizes

To obtain uncorrected effect sizes for each meta-analysis (i.e. within-meta-analysis level), we fitted intercept-only multilevel meta-analytic models with SMD, lnRR, and *Zr* as our response variables, as in Equation 1 [42]. Equation 1 can account for dependent data by modelling both between-study variance (heterogeneity) and within-study variance (residual). It was written as:

\[{ES}_{ji}={\beta}_{0\left[\textrm{overall}\right]}+{s}_j+{o}_{ji}+{m}_{ji} \tag{1}\]

where *ES*_{ji} is the extracted effect size, either SMD, lnRR, or *Zr*; *β*_{0[overall]} is the intercept, representing the estimate of overall effect (i.e. meta-analytic estimate of effect size); *s*_{j} is the study-specific (between-study) effect of study *j*; *o*_{ji} is the observation-level (within-study) effect for the effect size *i* (used to account for residual heterogeneity); *m*_{ji} is the measurement (sampling) error effect for the effect size *i*. Between- and within-study effects are normally distributed with mean 0 and variance, *σ*^{2} (i.e. \(\mathcal{N}\left(0,{\sigma}^2\right)\)). In Equation 1, the effect size (*ES*_{ji}) and the sampling variance (of *m*_{ji}) can be calculated from the meta-analytic data. Using the restricted maximum likelihood (REML) method, we can obtain (approximately) unbiased estimates of variance parameters *σ*^{2} for between- and within-study effects (*s*_{j} and *o*_{ji}) [43]. With the REML estimate of *σ*^{2}, we can obtain the maximum likelihood estimate of the model coefficients (i.e. *β*_{0[overall]}). These estimated model coefficients represent the (uncorrected) overall meta-analytic means for SMD, lnRR, or *Zr*. The model fitting was implemented via the *rma.mv* function from the *metafor* R package (version 3.4-0) [44].

##### Detecting publication bias

To test for patterns consistent with publication bias within each meta-analysis, we used a multi-moderator multilevel meta-regression model (an extended Egger’s regression; cf. [45]). This approach addresses two common issues in ecological and evolutionary datasets: (i) data dependency, which is controlled for by using a multilevel model [46], and (ii) between-study heterogeneity, which is accounted for by using a regression method with multiple moderators [47]. We adopted this approach to test for the presence of small-study and decline effects. The model was written as:

\[{ES}_{ji}={\beta}_{0\left[\textrm{bias-corrected}\right]}+{\beta}_{1\left[\textrm{small-study}\right]}{error}_i+{\beta}_{2\left[\textrm{time-lag}\right]}\left({year}_i-{year}_{\textrm{latest}}\right)+{s}_j+{o}_{ji}+{m}_{ji} \tag{2}\]

where *β*_{0[bias − corrected]} is the intercept, representing bias-corrected overall effect/meta-analytic estimate of effect size (see more details below); *error*_{i} is the uncertainty index of effect size (i.e. sampling error of effect size, *se*_{i}), and *β*_{1[small − study]} is the corresponding slope and an indicator of small-study effects; *year*_{i} is the publication year, *year*_{latest} is the latest year of published papers, and *β*_{2[time − lag]} is the corresponding slope and an indicator of decline effect (i.e. time-lag bias).

When assuming there is no small-study effect (i.e. *error*_{i} = 0) and no decline effect (i.e. *year*_{i} − *year*_{latest} = 0), the intercept *β*_{0[bias − corrected]} in Equation 2 becomes a conditional estimate that can be interpreted as the bias-corrected overall effect (i.e. the estimate of ‘true’ effect which is distinct from the unconditional estimate of *β*_{0[overall]} in Equation 1). We centred the ‘year’ variable by subtracting the latest year (*year*_{latest}) from each publication year (*year*_{i}) to set the latest year as the intercept, *β*_{0[bias − corrected]}. This process allowed the estimate of true effect (i.e. *β*_{0[bias − corrected]} in Equation 2) to be conditional on *year*_{i} = *year*_{latest} so that *β*_{0[bias − corrected]} was least affected by a decline effect if it existed. Further, we used a sampling error equivalent (\(\sqrt{1/\tilde{n}_i}=\sqrt{\left({n}_e+{n}_c\right)/{n}_e{n}_c}\)) to replace *se*_{i} when fitting SMD and lnRR where possible (\(4\tilde{n}_i\) is referred to as an effective sample size; *n*_{e} is the sample size of the experimental group, *n*_{c} is the sample size of the control group [45]). This can correct for the ‘artefactual’ correlation between *ES*_{ji} and *error*_{i} as the point estimates of SMD and lnRR are inherently correlated with their sampling variances (see Table 3 in [34], and Equation 10 in [48]).
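The sampling error equivalent described above can be illustrated with a short calculation. This is a minimal Python sketch rather than the authors' R code; the function names are hypothetical and chosen only for this example.

```python
import math


def n_tilde(n_e: int, n_c: int) -> float:
    """Return n-tilde = n_e * n_c / (n_e + n_c) for a two-group comparison."""
    if n_e <= 0 or n_c <= 0:
        raise ValueError("group sample sizes must be positive")
    return n_e * n_c / (n_e + n_c)


def sampling_error_equivalent(n_e: int, n_c: int) -> float:
    """sqrt(1 / n-tilde): stands in for se_i in the linear (Equation 2) model."""
    return math.sqrt(1.0 / n_tilde(n_e, n_c))


def sampling_variance_equivalent(n_e: int, n_c: int) -> float:
    """1 / n-tilde: stands in for v_i when a quadratic error term is used."""
    return 1.0 / n_tilde(n_e, n_c)


def effective_sample_size(n_e: int, n_c: int) -> float:
    """4 * n-tilde, the quantity the text calls the effective sample size."""
    return 4.0 * n_tilde(n_e, n_c)
```

Because these predictors depend only on the group sample sizes, they do not inherit the built-in dependence between SMD or lnRR point estimates and their sampling variances.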

A small-study effect is statistically detected if Equation 2 has a statistically significant *β*_{1[small − study]} (i.e. *p*-value < 0.05). Similarly, the decline effect (i.e. time-lag bias) is indicated by a statistically significant *β*_{2[time − lag]}. Depending on the specific phenomenon tested, *β*_{1[small − study]} and *β*_{2[time − lag]} might be expected to be positive or negative when publication bias exists. For example, for an effect that is expected to be positive, a small-study effect and decline effect would be expressed in a positive value of *β*_{1[small − study]} (i.e. small-size non-statistically significant effects and small-size statistically significant negative effects are underrepresented) and a negative value of *β*_{2[time − lag]} (i.e. overall effect size declines over time), respectively. In such a case, a slope (*β*_{1[small − study]} or *β*_{2[time − lag]}) with opposing direction (unexpected sign) indicates no detectable publication bias and subsequently does not require correction for such a bias. The magnitude of the slope represents the severity of the small-study effect or decline effect. Therefore, using Equation 2, we were able to detect the existence of publication bias and identify its severity for each meta-analysis and each effect size statistic.
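The structure of this regression can be sketched in a deliberately simplified form. The Python example below fits only the fixed part of Equation 2 by inverse-variance weighted least squares; it omits the study-level and observation-level random effects, so it is not the multilevel model the authors fitted with *metafor*. All function names are hypothetical.

```python
from typing import Dict, List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def egger_type_regression(
    effect_sizes: Sequence[float],
    std_errors: Sequence[float],
    years: Sequence[int],
) -> Dict[str, float]:
    """Weighted least squares of ES on error_i and (year_i - year_latest)."""
    latest = max(years)
    # Design matrix columns: intercept, error_i, year_i - year_latest.
    rows = [[1.0, se, float(yr - latest)] for se, yr in zip(std_errors, years)]
    weights = [1.0 / se ** 2 for se in std_errors]  # inverse-variance weights
    p = 3
    xtwx = [
        [sum(w * r[i] * r[j] for w, r in zip(weights, rows)) for j in range(p)]
        for i in range(p)
    ]
    xtwy = [
        sum(w * r[i] * y for w, r, y in zip(weights, rows, effect_sizes))
        for i in range(p)
    ]
    b0, b1, b2 = solve_linear_system(xtwx, xtwy)
    return {
        "beta0_bias_corrected": b0,  # conditional on error = 0 and latest year
        "beta1_small_study": b1,     # slope on the uncertainty index
        "beta2_time_lag": b2,        # slope on centred publication year
    }
```

With the year variable centred on the latest year, the intercept is read at `error_i = 0` and `year_i = year_latest`, which is what makes it interpretable as a bias-corrected estimate.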

##### Correcting overall estimates for publication bias

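To avoid a biased estimate of *β*_{0[bias − corrected]}, we fitted Equation 3 when Equation 2 detected a statistically significant *β*_{0[bias − corrected]}. Equation 3 was written as:

\[{ES}_{ji}={\beta}_{0\left[\textrm{bias-corrected}\right]}+{\beta}_{1\left[\textrm{small-study}\right]}{error}_i^2+{\beta}_{2\left[\textrm{time-lag}\right]}\left({year}_i-{year}_{\textrm{latest}}\right)+{s}_j+{o}_{ji}+{m}_{ji} \tag{3}\]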

In contrast to Equation 2, Equation 3 used a quadratic term of uncertainty index (i.e. sampling variance *v*_{i} or \(1/\tilde{n}_i\)) to alleviate the downward bias of an effect size estimate (for explanations see [45, 49]). Theoretically, this procedure provided an easy-to-implement method to correct for publication bias for each meta-analysis (i.e. the conditional estimate of intercept in Equation 3). In practice, however, there were two different types of *β*_{0[bias − corrected]} estimates to consider. This is because high heterogeneity [47] can lead the signs of the slopes (*β*_{1[small − study]} and *β*_{2[time − lag]}) to be opposite from that expected from publication bias [45]. We would subsequently misestimate *β*_{0[bias − corrected]} if slopes with unexpected signs are included in Equations 2 and 3.

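Depending on the signs of the slopes (*β*_{1[small − study]} and *β*_{2[time − lag]}), there were two types of estimated *β*_{0[bias − corrected]}. We used a decision tree (Fig. 3) to obtain the estimate of each type of *β*_{0[bias − corrected]} for each meta-analytic case. The decision tree worked as follows: if the slopes (*β*_{1[small − study]} and *β*_{2[time − lag]}) had unexpected signs, we removed the corresponding slope-related term(s) from the full models to form reduced models (Equations 4 and 5) to better estimate *β*_{0[bias − corrected]}. The reduced models, shown here with the quadratic error term of Equation 3, were written as Equations 4 and 5, respectively:

\[{ES}_{ji}={\beta}_{0\left[\textrm{bias-corrected}\right]}+{\beta}_{1\left[\textrm{small-study}\right]}{error}_i^2+{s}_j+{o}_{ji}+{m}_{ji} \tag{4}\]

\[{ES}_{ji}={\beta}_{0\left[\textrm{bias-corrected}\right]}+{\beta}_{2\left[\textrm{time-lag}\right]}\left({year}_i-{year}_{\textrm{latest}}\right)+{s}_j+{o}_{ji}+{m}_{ji} \tag{5}\]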

Specifically, the first type of estimate of *β*_{0[bias − corrected]} was obtained by fitting Equation 2 or 3 (termed as full models). That included all cases of *β*_{0[bias − corrected]} without consideration of the signs of *β*_{1} and *β*_{2} (i.e. conditional *β*_{0[bias − corrected]} estimated from the full model; see Fig. 3). The second type of estimate of *β*_{0[bias − corrected]} was obtained under the following four scenarios: (i) *β*_{0[bias − corrected]} estimated under expected signs of *β*_{1[small − study]} and *β*_{2[time − lag]} (i.e. conditional *β*_{0[bias − corrected]} estimated from the direction-controlled full model; see Fig. 3), which meant a co-occurrence of a small-study effect and a decline effect, (ii) *β*_{0[bias − corrected]} estimated under the expected sign of *β*_{1[small − study]} and the unexpected sign of *β*_{2[time − lag]}, which signalled the existence of a small-study effect but no decline effect (i.e. conditional *β*_{0[bias − corrected]} estimated from reduced model 1; see Equation 4 and Fig. 3), (iii) *β*_{0[bias − corrected]} estimated under the unexpected sign of *β*_{1} and the expected sign of *β*_{2}, which indicated the occurrence of a decline effect but no small-study effect (i.e. conditional *β*_{0[bias − corrected]} estimated from reduced model 2; see Equation 5 and Fig. 3), and (iv) *β*_{0[bias − corrected]} estimated under unexpected signs of *β*_{1[small − study]} and *β*_{2[time − lag]}, which suggested little concerns about a small-study effect or a decline effect.
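The four scenarios above amount to a small sign-based rule, which can be sketched in Python as follows. This is an illustration, not the authors' implementation: the function name and return labels are hypothetical, the expected direction of the effect is assumed to be supplied by the analyst, a slope of exactly zero is treated as unexpected, and the handling of scenario (iv) as "no bias terms" is an assumption because the text states only that there is little concern about bias in that case.

```python
def select_bias_correction_model(
    beta1_small_study: float,
    beta2_time_lag: float,
    expected_effect_sign: int,
) -> str:
    """Pick a model from the signs of the two bias slopes.

    expected_effect_sign is +1 if the effect is expected to be positive
    and -1 if it is expected to be negative.
    """
    if expected_effect_sign not in (1, -1):
        raise ValueError("expected_effect_sign must be +1 or -1")

    # A small-study effect inflates estimates in the expected direction,
    # so its slope shares the sign of the expected effect.
    small_study_expected = beta1_small_study * expected_effect_sign > 0
    # A decline effect means magnitudes shrink towards the latest year,
    # so its slope has the opposite sign to the expected effect.
    decline_expected = beta2_time_lag * expected_effect_sign < 0

    if small_study_expected and decline_expected:
        return "full"        # scenario (i): direction-controlled full model
    if small_study_expected:
        return "reduced_1"   # scenario (ii): Equation 4, error term only
    if decline_expected:
        return "reduced_2"   # scenario (iii): Equation 5, year term only
    return "none"            # scenario (iv): assumed to need no bias terms
```

For an effect expected to be positive, `select_bias_correction_model(0.3, 0.02, 1)` selects reduced model 1, because the time-lag slope has the unexpected (positive) sign.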

#### Second-order meta-analysis

In this section, we statistically aggregated the above-mentioned regression coefficients (i.e. *β*_{0[bias − corrected]}, *β*_{1[small − study]} and *β*_{2[time − lag]}) to (i) reveal the patterns of potential publication bias across the fields of ecology and evolutionary biology, and (ii) quantify the extent to which publication bias might cause a reduction in effect-size magnitude across meta-analyses (Fig. 2).

##### Estimating the overall extent and severity of publication bias

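To allow for aggregations of *β*_{1[small − study]} (i.e. an indicator of small-study effect) and *β*_{2[time − lag]} (i.e. an indicator of decline effect) over different effect size metrics (i.e. SMD, lnRR, and *Zr*), we standardised coefficients to eliminate scale-dependency [50]. This was achieved by *z*-scaling (i.e. mean-centring and dividing by the standard deviation) *error*_{i} and *year*_{i} − *year*_{latest}, and by standardising the response variable *ES*_{ji} (dividing by the standard deviation without mean-centring) prior to modelling, as given by Equation 6:

\[\frac{{ES}_{ji}}{\textrm{SD}\left[{ES}_{ji}\right]}={\eta}_{0\left[\textrm{bias-corrected}\right]}+{\eta}_{1\left[\textrm{small-effect}\right]}z\left({error}_i\right)+{\eta}_{2\left[\textrm{time-lag}\right]}z\left({year}_i-{year}_{\textrm{latest}}\right)+{s}_j+{o}_{ji}+{m}_{ji} \tag{6}\]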

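Equation 6 indicates that one standard deviation change in *error*_{i} and *year*_{i} − *year*_{latest} would change *ES*_{ji} by *η*_{1[small − effect]} and *η*_{2[time − lag]} standard deviations, respectively. Further, to interpret *η*_{0[bias − corrected]} as a bias-corrected overall effect, it was set conditional on *error*_{i} = 0 (i.e. without small-study effect) and *year*_{i} − *year*_{latest} = 0 (i.e. without decline effect). As such, we replaced *z*(*error*_{i}) by *z*(*error*_{i}) − *z*(*error*_{0}) and replaced *z*(*year*_{i} − *year*_{latest}) by *z*(*year*_{i}) − *z*(*year*_{latest}), as shown in Equation 7:

\[\frac{{ES}_{ji}}{\textrm{SD}\left[{ES}_{ji}\right]}={\eta}_{0\left[\textrm{bias-corrected}\right]}+{\eta}_{1\left[\textrm{small-effect}\right]}\left[z\left({error}_i\right)-z\left({error}_0\right)\right]+{\eta}_{2\left[\textrm{time-lag}\right]}\left[z\left({year}_i\right)-z\left({year}_{\textrm{latest}}\right)\right]+{s}_j+{o}_{ji}+{m}_{ji} \tag{7}\]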

where *z*(*error*_{0}) denotes the *z*-score when *error*_{i} = 0, which is equal to \(\frac{0-\textrm{mean}\left[{error}_i\right]}{\textrm{SD}\left[{error}_i\right]}\); *z*(*year*_{latest}) is the *z*-score when *year*_{i} is the latest year. Likewise, to obtain the best estimate of standardised bias-corrected effects, we introduced Equation 8 where a quadratic error term was used:

Therefore, fitting Equation 8 created two datasets: (1) the full dataset containing *η*_{0[bias − corrected]}, *η*_{1[small − effect]} and *η*_{2[time − lag]} without consideration of their signs (standardised slopes of the first type estimate), and (2) the reduced dataset containing *η*_{0[bias − corrected]}, *η*_{1[small − effect]} and *η*_{2[time − lag]} with expected directions (standardised slopes of the second type estimate: scenarios i–iv, Fig. 3). We then conducted a series of second-order meta-analyses to statistically aggregate these standardised regression coefficients across meta-analyses [51, 52]. We employed a random-effects meta-analytic model with the inverse of each coefficient’s squared standard error as weights to fit such second-order meta-analyses [44]. For both the full and reduced datasets, we obtained a weighted average of the regression coefficient *η*_{1[small − effect]} (or *η*_{2[time − lag]}) to indicate the occurrence of small-study effects (or decline effects) across the fields of ecology and evolutionary biology. To compare the severity of publication bias between different types of effect size, we further incorporated effect-size types as a moderator (i.e. a fixed factor or predictor with three levels: SMD, lnRR, and *Zr*) in these random-effects models.
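To make the pooling step concrete, the following Python sketch aggregates a set of coefficients and their standard errors with a random-effects model. It is an assumption-laden illustration: it uses the DerSimonian-Laird moment estimator of between-meta-analysis variance because it has a closed form, whereas the authors fitted their models in *metafor*, and the function name is hypothetical.

```python
import math
from statistics import NormalDist
from typing import Dict, Sequence


def random_effects_pool(
    estimates: Sequence[float],
    std_errors: Sequence[float],
    conf_level: float = 0.95,
) -> Dict[str, float]:
    """Pool coefficients with a DerSimonian-Laird random-effects model."""
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two estimates with matching SEs")

    variances = [se ** 2 for se in std_errors]
    w = [1.0 / v for v in variances]  # inverse of squared standard errors
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q and the moment estimate of between-unit variance tau^2.
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effects weights add tau^2 to each sampling variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    mean = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    z = NormalDist().inv_cdf(0.5 + conf_level / 2.0)
    return {
        "mean": mean,
        "se": se,
        "ci_low": mean - z * se,
        "ci_high": mean + z * se,
        "tau2": tau2,
    }
```

A moderator such as effect-size type could be examined in this simplified setting by pooling each subgroup separately, although that is not equivalent to the single moderator model described in the text.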

##### Quantifying the reduction in effect-size magnitude after controlling for publication bias

Likewise, to quantify the differences between uncorrected effect sizes and their bias-corrected estimates for the different types of effect-size metrics, we required standardised estimates of these effect sizes to draw comparisons. The term *η*_{0[bias − corrected]} in the full dataset provided a standardised bias-corrected effect size (i.e. an intercept estimated using the full model, where all cases of *η*_{1[small − effect]} and *η*_{2[time − lag]} were included regardless of their directions). Also, *η*_{0[bias − corrected]} in the reduced dataset provided standardised bias-corrected effect sizes, which were obtained using expected directions of *η*_{1[small − effect]} and *η*_{2[time − lag]}. In contrast, the standardised uncorrected effect sizes were obtained via standardising *ES*_{ji} by dividing by standard deviation before fitting Equation 1 (that is, standardised intercept in the null model: *η*_{0[overall]}). We then used the absolute mean difference as a metric to quantify the reduction in effect-size magnitude following correction for publication bias, where the point estimate and sampling variance were written as Equations 9 and 10:

\[D=\left|{\upgamma}_{\textrm{uncorrected}-\textrm{effect}}^s-{\upgamma}_{\textrm{corrected}-\textrm{effect}}^s\right| \tag{9}\]

\[\textrm{Var}(D)={\textrm{SE}}_{\upgamma_{\textrm{uncorrected}-\textrm{effect}}^s}^2+{\textrm{SE}}_{\upgamma_{\textrm{corrected}-\textrm{effect}}^s}^2-2r\,{\textrm{SE}}_{\upgamma_{\textrm{uncorrected}-\textrm{effect}}^s}{\textrm{SE}}_{\upgamma_{\textrm{corrected}-\textrm{effect}}^s} \tag{10}\]

where \({\upgamma}_{\textrm{uncorrected}-\textrm{effect}}^s\) and \({\upgamma}_{\textrm{corrected}-\textrm{effect}}^s\) are the values of the standardised uncorrected effect size (standardised *η*_{0[overall]} in the null model) and its bias-corrected version (standardised *η*_{0[bias − corrected]} in the full or reduced models), respectively; \({\textrm{SE}}_{\upgamma_{\textrm{uncorrected}-\textrm{effect}}^s}\) and \({\textrm{SE}}_{\upgamma_{\textrm{corrected}-\textrm{effect}}^s}\) are the associated standard errors; *r* is the correlation between standard errors (\({\textrm{SE}}_{\upgamma_{\textrm{uncorrected}-\textrm{effect}}^s}\) *vs.* \({\textrm{SE}}_{\upgamma_{\textrm{corrected}-\textrm{effect}}^s}\)), which is assumed to be 1 because the two estimates should be strongly correlated.

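Given that *D* is an absolute variable, it follows a ‘folded’ normal distribution because taking the absolute value will force probability density on its left side (*x*-axis < 0) to be folded to the right [53, 54]. The corresponding folded mean and variance could be derived from its ‘folded’ normal distribution as Equations 11 and 12:

\[{D}_f=\sqrt{\frac{2}{\pi }}\sqrt{\textrm{Var}(D)}\,\exp \left(-\frac{D^2}{2\textrm{Var}(D)}\right)+D\left[1-2\Phi \left(-\frac{D}{\sqrt{\textrm{Var}(D)}}\right)\right] \tag{11}\]

\[\textrm{Var}\left({D}_f\right)={D}^2+\textrm{Var}(D)-{D}_f^2 \tag{12}\]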

where Φ is the standard normal cumulative distribution function (see more details in [53, 55]). Equations 9 to 12 enable us to calculate *D*_{f} and Var(*D*_{f}) for both full and reduced datasets. We used a random-effects meta-analytic model (*rma.uni* function [44]) to synthesise these *D*_{f} with Var(*D*_{f}) as sampling variance across meta-analyses. Also, we incorporated effect size type as a moderator to compare the differences in effect size reduction between SMD, lnRR, and *Zr*.
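The calculation of the absolute difference and its folded-normal moments can be sketched in Python as below. The formulas are the standard mean and variance of a folded normal distribution; the function names are hypothetical, and the guard for a zero variance (which arises when *r* = 1 and the two standard errors are equal) is an assumption added so that the example does not divide by zero.

```python
import math
from statistics import NormalDist
from typing import Tuple


def absolute_difference(
    uncorrected: float,
    corrected: float,
    se_uncorrected: float,
    se_corrected: float,
    r: float = 1.0,
) -> Tuple[float, float]:
    """Return D = |uncorrected - corrected| and Var(D) for correlation r."""
    d = abs(uncorrected - corrected)
    var_d = (
        se_uncorrected ** 2
        + se_corrected ** 2
        - 2.0 * r * se_uncorrected * se_corrected
    )
    return d, max(var_d, 0.0)  # clip tiny negative values from rounding


def folded_normal_moments(d: float, var_d: float) -> Tuple[float, float]:
    """Mean and variance of |X| when X ~ Normal(d, var_d)."""
    if var_d <= 0.0:
        return abs(d), 0.0  # degenerate case: no sampling uncertainty
    sigma = math.sqrt(var_d)
    cdf = NormalDist().cdf
    mean_f = (
        sigma * math.sqrt(2.0 / math.pi) * math.exp(-d ** 2 / (2.0 * var_d))
        + d * (1.0 - 2.0 * cdf(-d / sigma))
    )
    var_f = d ** 2 + var_d - mean_f ** 2
    return mean_f, var_f
```

When *D* is large relative to its standard error, the folded mean and variance are almost identical to *D* and Var(*D*); the correction matters mainly when *D* is close to zero.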

#### Estimating statistical power, and Type M and S errors

We assessed the statistical power and Type M and S errors in the primary studies with experimental effects that were approximated by uncorrected and bias-corrected effect sizes [27, 56]. Although meta-analyses can increase power over primary studies [57], they might still be underpowered to detect the true effect (i.e. *p*-value > 0.05 despite the existence of a true effect). Therefore, we also calculated the statistical power, Type M and S errors for each meta-analysis. To obtain average statistical power, and Type M and S errors at the primary study level, we used a linear mixed-effects model to aggregate over the estimates of power, and Type M and S errors from primary studies. We used the *lmer* function in the *lme4* R package (version 1.1-26) to fit these mixed-effects models [58], which incorporated the identity of the primary study as a random factor to account for between-study variation. Similarly, we used a weighted regression to aggregate meta-analysis level power, and Type M and S errors, with the number of effect sizes (*k*) within each meta-analysis as weights. We implemented the weighted regression via the base R (version 4.0.3) function *lm*.
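For a single effect size, the three quantities can be computed from an assumed ‘true’ effect and the standard error of the estimate. The Python sketch below is illustrative only: it assumes a normally distributed estimate, a two-sided test at α = 0.05, and closed-form truncated-normal expectations for the exaggeration ratio. The function name is hypothetical and this is not the authors' code.

```python
from statistics import NormalDist
from typing import Dict

_STD_NORMAL = NormalDist()


def power_type_s_type_m(
    true_effect: float, std_error: float, alpha: float = 0.05
) -> Dict[str, float]:
    """Power, Type S and Type M error for a normally distributed estimate."""
    if true_effect == 0 or std_error <= 0:
        raise ValueError("need a non-zero true effect and a positive SE")

    theta = abs(true_effect)  # results are symmetric in the sign of the effect
    crit = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0) * std_error
    upper = (crit - theta) / std_error    # standardised upper threshold
    lower = (-crit - theta) / std_error   # standardised lower threshold

    p_sig_correct = 1.0 - _STD_NORMAL.cdf(upper)  # significant, correct sign
    p_sig_wrong = _STD_NORMAL.cdf(lower)          # significant, wrong sign
    power = p_sig_correct + p_sig_wrong
    type_s = p_sig_wrong / power

    # E[|estimate| * 1(significant)] from truncated-normal expectations.
    exp_abs_sig = (
        theta * p_sig_correct + std_error * _STD_NORMAL.pdf(upper)
        - theta * p_sig_wrong + std_error * _STD_NORMAL.pdf(lower)
    )
    type_m = exp_abs_sig / power / theta  # exaggeration ratio
    return {"power": power, "type_s": type_s, "type_m": type_m}
```

Applying such a function to every primary study, once with the uncorrected and once with the bias-corrected meta-analytic mean as the ‘true’ effect, would give the study-level inputs that are then averaged with the mixed-effects model described above.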

#### Deviations and additions

Stage 2 of this registered report has three deviations from the Stage 1 protocol. First, in the ‘Correcting overall estimates for publication bias’ section, the best estimate of the bias-corrected overall effect (i.e. model intercept *β*_{0[bias − corrected]}) was initially planned to be obtained by a two-step procedure: when a zero effect exists (i.e. statistically non-significant *β*_{0[bias − corrected]}), the uncertainty index (i.e. sampling error *error*_{i} or \(\sqrt{1/\tilde{n}_i}\)) was used (Equation 2) to estimate *β*_{0[bias − corrected]}, whereas when a non-zero effect exists (i.e. statistically significant *β*_{0[bias − corrected]}), a quadratic term of the uncertainty index (i.e. sampling variance *v*_{i} or \(1/\tilde{n}_i\)) was used (Equation 3) to estimate *β*_{0[bias − corrected]} [59, 60]. We decided to only use Equation 3 to estimate *β*_{0[bias − corrected]} because there is no need to estimate *β*_{0[bias − corrected]} when no statistically significant effect exists (Equation 2).

Second, in the ‘Estimating the overall extent and severity of publication bias’ section, we changed the treatment of the response variable *ES*_{ji} prior to model fitting from *z*-scaling (i.e. mean-centring and dividing by the standard deviation) to standardising by dividing by the standard deviation without mean-centring. This is because centring the response variable would make estimating the model intercept (*β*_{0[bias − corrected]}) unfeasible [50]. The same change was made in the ‘Quantifying the reduction in effect-size magnitude after controlling for publication bias’ section.

Third, we added a post hoc analysis where we removed the meta-analyses with statistically non-significant mean effects and subsequently calculated the average statistical power, Type M and S error rates. The reason for adding this post hoc analysis was that the underlying true effect sizes in some meta-analyses were likely to be so trivially small (and biologically insignificant) that the corresponding power calculation was meaningless. In such a case, if we included those effects when estimating average power across meta-analyses in ecology and evolution, we would get a downwardly biased average power estimate. Note that the relevant results are reported in the Supplementary Material (Table S4).

## Results

### The pattern of small-study effects in ecology and evolutionary biology

#### Within-meta-analysis level

Of the 87 ecological and evolutionary meta-analyses tested, 15 (17%) meta-analyses showed evidence for small-study effects (i.e. statistically significant *β*_{1[small − study]}; see Fig. 4A), where smaller studies reported larger effect sizes. Importantly, *β*_{1[small − study]} from 54 (62%) meta-analyses were in the expected direction (Fig. 4A), indicating that these meta-analyses exhibited a (statistically non-significant) tendency for a small-study effect (note that the likelihood of a meta-analysis to show this tendency is 50% if there is no real effect).
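The 50% benchmark mentioned above can be made concrete with a one-sided sign test. The sketch below is not part of the analyses in this report; it assumes, for illustration, that 54 of the 87 slopes fall in the expected direction and that the meta-analyses are independent, and it computes the probability of observing at least that many under a 50:50 null. The function name `binomial_upper_tail` is hypothetical.

```python
from math import comb


def binomial_upper_tail(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p ** k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Assumption for illustration: 54 of 87 slopes in the expected direction.
p_value = binomial_upper_tail(54, 87)
print(f"one-sided sign-test p-value: {p_value:.4f}")
```

A sign test of this kind uses only the direction of each slope and discards its magnitude and precision, which is why the second-order meta-analysis reported next is the more informative analysis.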

#### Between-meta-analysis level

When conducting a second-order meta-analysis by aggregating the *β*_{1[small − study]} obtained from the 87 meta-analyses, there was a statistically significant pooled *β*_{1[small − study]} (grand mean *β*_{1[small − study]} = 0.084, 95% confidence intervals (CI) = 0.034 to 0.135, *p*-value = 0.001, *N* = 87; Fig. 5A). This provides statistical evidence for the existence of small-study effects across the meta-analyses. Furthermore, the heterogeneity among the *β*_{1[small − study]} estimates obtained from the 87 meta-analyses was low (\({\sigma}_{among- meta- analysis}^2\) = 0.0050; \({I}_{among- meta- analysis}^2\) = 10%), indicating that these results are highly generalizable. Three per cent of this heterogeneity could be explained by the types of effect sizes (SMD, lnRR, *Zr*) being meta-analyzed (\({R}_{marginal}^2\) = 0.031). The non-random pattern of the small-study effect was mainly driven by SMD (grand mean *β*_{1[small − study]} = 0.091, 95% CI = 0.018 to 0.165, *p*-value = 0.015, *N* = 36) and *Zr* (grand mean *β*_{1[small − study]} = 0.119, 95% CI = 0.026 to 0.212, *p*-value = 0.013, *N* = 20), but not lnRR (grand mean *β*_{1[small − study]} = 0.029, 95% CI = −0.072 to 0.130, *p*-value = 0.571, *N* = 31).
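To show what pooling slopes across meta-analyses involves, the following minimal sketch combines a handful of invented slope estimates with a classical random-effects model, using the DerSimonian-Laird estimator of between-estimate variance. This is a simplification swapped in for illustration: the analyses in this report used multilevel models, and the heterogeneity statistics reported above follow that framework rather than the simple \(I^2\) computed here. The function name `random_effects_pool` and all input values are hypothetical.

```python
from math import sqrt


def random_effects_pool(estimates, variances):
    """Pool estimates with a DerSimonian-Laird random-effects model.

    Returns the pooled mean, a normal-approximation 95% CI, the estimated
    between-estimate variance (tau2) and I2 in per cent.
    """
    k = len(estimates)
    w = [1.0 / v for v in variances]
    sw = sum(w)
    mu_fixed = sum(wi * y for wi, y in zip(w, estimates)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (y - mu_fixed) ** 2 for wi, y in zip(w, estimates))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    sw_star = sum(w_star)
    mu = sum(wi * y for wi, y in zip(w_star, estimates)) / sw_star
    se = sqrt(1.0 / sw_star)
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return {"mean": mu, "ci": (mu - 1.96 * se, mu + 1.96 * se),
            "tau2": tau2, "I2": i2}


# Invented slopes and sampling variances from five hypothetical meta-analyses.
slopes = [0.15, 0.02, 0.10, -0.03, 0.12]
slope_vars = [0.004, 0.006, 0.003, 0.008, 0.005]
pooled = random_effects_pool(slopes, slope_vars)
print(pooled)
```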

### The pattern of decline effects in ecology and evolutionary biology

#### Within-meta-analysis level

Out of the 87 ecological and evolutionary meta-analyses reviewed, 13 (15%) revealed evidence of a decline effect, where the effect sizes significantly decreased over time (Fig. 4B). Additionally, 54 (62%) of the meta-analyses showed a statistically non-significant decline in effect size over time.

#### Between-meta-analysis level

There was a statistically significant pooled *β*_{2[time − lag]} (grand mean *β*_{2[time − lag]} = −0.006, 95% CI = −0.009 to −0.002, *p*-value < 0.001; Fig. 6A) across 87 meta-analyses, providing statistical evidence for the existence of decline effects. The estimates of *β*_{2[time − lag]} were homogeneous across these meta-analyses, indicating high generalizability of the results, with a low relative heterogeneity (\({\sigma}_{among- meta- analysis}^2\) = 0.0001; \({I}_{among- meta- analysis}^2\) < 1%). Five per cent of that heterogeneity could be explained by the types of effect sizes (\({R}_{marginal}^2\) = 0.05); SMD and *Zr* exhibited a statistically significant pattern of decline effect (SMD: pooled *β*_{2[time − lag]} = −0.005, 95% CI = −0.010 to −0.001, *p*-value = 0.013, *N* = 36; *Zr*: pooled *β*_{2[time − lag]} = −0.008, 95% CI = −0.015 to −0.001, *p*-value = 0.023, *N* = 31; Fig. 6B), but lnRR did not (pooled *β*_{2[time − lag]} = −0.004, 95% CI = −0.010 to 0.003, *p*-value = 0.289, *N* = 20).

### The inflation of effect size estimates and distortion of meta-analytic evidence by publication bias

Among the 87 meta-analyses examined, the estimated absolute mean difference between the original (uncorrected) effect size (*β*_{0[overall]}) and its bias-corrected version (*β*_{0[bias − corrected]}) was statistically significant (pooled *D* = 0.225, 95% CI = 0.180 to 0.269, *p*-value < 0.001; Fig. S1A). An overestimation of 0.189, 0.195 and 0.333 standard deviation units was found in SMD, lnRR, and *Zr*, respectively (Fig. S1B). After back-transformation to the original scale, publication bias led to an exaggeration of the estimates of SMD, lnRR, and *Zr* by an average of 0.217, 0.116 and 0.128 (Fig. 7), respectively. Additionally, after correcting for publication bias, 33 out of 50 initially statistically significant meta-analytic means became non-significant.

### Statistical power and type S and M error rates

#### Sampling level (primary studies)

Overall, primary studies or single experiments (i.e. at the sampling level) had a low statistical power of only 23% to detect the ‘true’ effect, as indicated by the original (uncorrected) meta-analytic estimate of effect sizes, *β*_{0[overall]}. This was found to be the case across the different types of effect sizes, with power of 19%, 24% and 28% for sampling level of SMD, lnRR, and *Zr*, respectively (see Fig. 8 and Table S1). When bias correction was applied, the overall power to detect the ‘true’ effect (*β*_{0[bias − corrected]}) decreased further to 15% (12%, 16%, and 18% for sampling level of SMD, lnRR, and *Zr*, respectively; see Fig. 8A and Table S1).

The primary studies infrequently showed incorrect estimation of the signs of the true effect sizes (overall Type S error = 5%; Fig. 9 and Table S2). For example, the primary studies (i.e. at sampling level) using lnRR and SMD had only 5% and 6% probabilities of having a direction that was opposite to the meta-analytic mean estimated as *β*_{0[overall]}. When correcting for publication bias the Type S error increased from 5% to 8%.

By contrast, the primary studies tended to exaggerate the magnitude of the meta-analytic mean estimated as *β*_{0[overall]}, due to the limitation of finite sample size (overall Type M error = 2.7; Fig. 10 and Table S3). For example, the magnitudes of lnRR, SMD and *Zr* were overestimated by an average of 2.5, 3.5 and 2 times, respectively. When correcting for publication bias (*β*_{0[bias − corrected]}), the Type M errors increased to 4.4 (3.5 for lnRR, 6 for SMD and 3.4 for *Zr*).
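The link between power, Type S and Type M errors can be sketched with a design calculation in the spirit of Gelman and Carlin [29]. The code below assumes that an estimate is normally distributed around a positive true effect with a known standard error and that significance is judged with a two-sided test; under these assumptions all three quantities have closed forms. The function name `retrodesign` and the numerical inputs are illustrative assumptions, not values or code from this report.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def retrodesign(true_effect, se, alpha=0.05):
    """Return (power, type_s, type_m) for a normally distributed estimate.

    Assumes true_effect > 0, a known standard error and a two-sided test.
    type_s: probability that a significant estimate has the wrong sign.
    type_m: expected |significant estimate| divided by the true effect.
    """
    if true_effect <= 0 or se <= 0:
        raise ValueError("true_effect and se must be positive")
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    ratio = true_effect / se
    p_hi = 1 - STD_NORMAL.cdf(z_crit - ratio)  # significant, correct sign
    p_lo = STD_NORMAL.cdf(-z_crit - ratio)     # significant, wrong sign
    power = p_hi + p_lo
    type_s = p_lo / power
    # E[|estimate| ; significant] in SE units, from truncated-normal means.
    expected_abs = (
        ratio * p_hi + STD_NORMAL.pdf(z_crit - ratio)
        - ratio * p_lo + STD_NORMAL.pdf(-z_crit - ratio)
    )
    type_m = expected_abs / power / ratio
    return power, type_s, type_m


# Hypothetical inputs: the same standard error evaluated against an
# uncorrected effect and a smaller, bias-corrected effect.
uncorrected = retrodesign(true_effect=0.30, se=0.25)
corrected = retrodesign(true_effect=0.18, se=0.25)
print("uncorrected:", uncorrected)
print("corrected:  ", corrected)
```

Holding the standard error fixed, a smaller assumed true effect yields lower power and higher Type S and Type M errors, which is the qualitative pattern reported above when the bias-corrected estimate replaces the uncorrected one.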

#### Meta-analysis level

On average, at the level of individual meta-analyses, lnRR and *Zr* had statistical power that was at or above the nominal 80% level for detecting the true effects estimated as *β*_{0[overall]}. Specifically, the power was found to be 81% for both lnRR and *Zr* (Fig. 8 and Table S1). In contrast, the estimated power of SMD was only 41%, which falls short of the nominal 80% level. When detecting true effects indicated by *β*_{0[bias − corrected]}, the statistical power of each meta-analysis decreased further, with lnRR, SMD, and *Zr* decreasing to 63%, 25% and 51%, respectively.

Ecological and evolutionary meta-analyses had a relatively low probability of reporting an opposite sign to the true direction of both *β*_{0[overall]} and *β*_{0[bias − corrected]} (Type S = 5%–8%; Fig. 9 and Table S2). The meta-analyses were also able to considerably reduce the overestimation of the true effect size for lnRR (Type M = 1.1 for *β*_{0[overall]} and 1.3 for *β*_{0[bias − corrected]}; Fig. 10 and Table S3), SMD (Type M = 1.9 for *β*_{0[overall]} and 2.5 for *β*_{0[bias − corrected]}) and *Zr* (Type M = 1.1 for *β*_{0[overall]} and 1.6 for *β*_{0[bias − corrected]}).

## Discussion

We have conducted the first comprehensive investigation of the prevalence and severity of two common forms of publication bias (small-study and decline effects) in the fields of ecology and evolutionary biology using modern analytic techniques. Overall, we found strong support for small-study and decline effects (time-lag bias) with little heterogeneity across studies. The prevalence of such publication bias resulted in overestimating meta-analytic mean effect size estimates by at least 0.12 standard deviations and substantially distorted the ecological and evolutionary evidence. The statistical power of ecological and evolutionary studies and experiments was found to be consistently low at 15%. Ecological and evolutionary studies also showed a 4-fold overestimation of effects (Type M error = 4.4) and a low but nontrivial rate of misidentifying the sign of the effects (Type S error = 8%; error in the direction that leads to the opposite conclusion). To place these in perspective with the replication crisis [5, 6], we conclude that prior published findings in ecology and evolutionary biology, at least for the dataset used in this study (87 meta-analyses, 4250 primary studies, 17,638 effect sizes), are likely to have low replicability.

### The persistent and non-negligible publication bias in ecological and evolutionary meta-analyses

#### Small-study and decline effects are general phenomena

We have found that 17% of ecological and evolutionary meta-analyses show evidence for small-study effects (i.e. smaller studies reporting larger effect sizes). Medical researchers found a similar percentage of meta-analyses showing small-study effects (7–18%) in a survey of 6873 meta-analyses (the large sample is because medical research has a bigger pool of meta-analyses to draw from and because that study extracted a much narrower scope of data from each meta-analysis than did our study [7, 63]). Similarly, 13–25% of psychological meta-analyses presented evidence for small-study effects [64, 65]. These values may seem relatively small, but this is in part because, for a given meta-analysis, bias detection methods often lack sufficient statistical power to identify a small-study effect [45, 63, 66]. Indeed, simulations have shown that the power to detect a moderate small-study effect in a medical meta-analysis with 10 studies was as low as 21% [14].

Given the limited power to detect a small-study effect [14], it seems reasonable to focus on the sign and magnitude of the relationship between effect size and sampling error rather than on *p*-values (i.e. null-hypothesis significance testing). By doing so, we found that more than 60% of meta-analyses had a positive, statistically non-significant relationship between the effect size and its sampling error, indicating that small studies (i.e. with large sampling error or small precision) tend to report larger effects (note that the likelihood of a meta-analysis showing this tendency is 50% under the null hypothesis). We confirmed these results by employing a more powerful approach, i.e. a second-order meta-analysis or meta-meta-analysis, which showed a statistically significant positive estimate of the relationship between effect size and sampling error. This result is in line with recent investigations revealing a negative mean association of effect size and sample size in psychology and psychiatry meta-analyses [51, 67]. Moreover, our analysis also showed a small amount of heterogeneity among these 87 slopes. This positive and homogeneous effect implies that small-study effects are commonplace in ecology and evolutionary biology. Similar conclusions were reached in investigations of economic and psychological meta-analyses: small-study effects are widespread phenomena [68,69,70].

We conclude that decline effects are also widespread in the field. More than 50% of ecological and evolutionary meta-analyses showed a negative relationship between effect size and their year of publication, indicating that effect sizes decrease over time. As mentioned above, the principal reason for failing to detect a decline effect in a single meta-analysis lies in the low statistical power of the available detection methods [13, 45, 71]. The observed power to determine a decline effect in the current set of 87 meta-analyses was low (median = 13%), which is similar to that observed in another much larger survey of 464 ecological meta-analyses (median = 17%; [71, 72]). Importantly, our second-order meta-analysis found a statistically significant and homogeneous effect (Fig. 6A), corroborating that decline effects are common in both sub-fields previously explored (status signalling [73], plant and insect biodiversity [20, 74] and ocean acidification [75]) and more generally in ecology and evolutionary biology [12, 71]. Evidence from other disciplines also reveals the pervasiveness of decline effects (medical and social sciences [51, 76, 77]).

#### The distorted meta-analytic estimate of effect sizes and evidence by publication bias

By combining the observed bias from both small-study and decline effects, we found evidence that magnitudes of effect sizes might have been overestimated by 0.217, 0.116 and 0.128 SDs of their original units for SMD, lnRR and *Zr*, respectively. A recent investigation of 433 psychological meta-analyses also showed a statistically significant, albeit small, decrease in meta-analytic estimates after correcting for publication bias [78]. A comparison of meta-analyses that were published without pre-registration versus registered reports (which are less prone to publication bias) has also shown that unregistered meta-analyses substantially overestimated effect sizes, although bias-correction methods like the one used in this study can correct for the difference in results between meta-analyses and registered reports [79]. In our dataset, correcting for publication bias led to 33 of 50 initially statistically significant meta-analytic estimates of the mean effect becoming non-significant, suggesting unmerited confidence in the outcomes of 66% of published ecological and evolutionary meta-analyses (when using a frequentist approach with a significance threshold of 0.05). Recent psychological investigations revealed a similar percentage (60%) of erroneous conclusions of meta-analytic evidence because of publication bias [80].

### Low statistical power and high type M error in ecological and evolutionary studies

#### Ecological and evolutionary studies lack power and are prone to type M error

Primary studies in ecology and evolutionary biology included in our sample of meta-analyses, on average, only had a power of 15% to detect the bias-corrected effect size identified in the meta-analysis, which is consistent with earlier findings in the sub-fields of global change biology [56, 81] and animal behaviour [10, 23]. When excluding studies with effects that are not statistically significant, the corresponding average power of primary studies was still very low (17%; Table S4). As a result, only studies with largely exaggerated effect sizes (4-fold) have reached statistical significance. In contrast, Type S error was small but not trivial (8%); note that making an error in direction can result in a completely opposite conclusion. A lack of statistical power seems to be a general phenomenon in scientific research: low power has been identified in many disciplines (medical sciences = 20% [82], neuroscience = 21% [16], psychological sciences = 36% [27], economics = 18% [83]). Given this, meta-analysis with appropriate bias correction is an important way to generate more reliable estimates of effect sizes [30]. Statistically speaking, meta-analysis is an effective way to approximate population-level estimates by combining sampling-level estimates, despite its shortcomings, some of which were shown above. Science is a process of evidence accumulation in which primary studies are the basis that can be used to produce high-order and high-quality evidence (e.g. via systematic review and meta-analysis).

#### Publication bias aggravates the low power and high Type M error

Correcting for publication bias is expected to reveal lower power and higher Type M error rates because publication bias creates a non-random sample of the effect size evidence used in meta-analyses. We show that correcting for publication bias resulted in a decrease in statistical power from 23% to 15%, an increase in Type S error rate from 5% to 8%, and an increase in Type M error rate from 2.7 to 4.4. Psychological and economic research also confirms that meta-analyses without bias adjustments overestimate statistical power [27, 28]. The exaggeration of power and effect size is even more severe in ecological and evolutionary studies if no bias correction is made [5], providing further support for recent concerns about the likelihood of low replicability (‘the replication crisis’) in ecology and evolutionary biology [6, 10].

### Limitations

There are four limitations in the present registered report. First, when calculating statistical power to detect true effects in ecology and evolutionary studies, we used the meta-analytic mean effect size (and corresponding bias-corrected version) as the true effect for each primary study within the same meta-analysis. This means that we assumed that the multiple primary studies included in the same meta-analysis share a common true effect. However, the high heterogeneity in ecology and evolutionary meta-analyses indicates that each primary study may have a specific true effect size that is dependent on the research context (e.g. population, species, methodology, lab effects [47]). Therefore, using such context-dependent effects as the proxies of true effect is probably more reasonable [81]. Second, in the post hoc analysis, we used the statistical significance (*p*-value < 0.05) of the meta-analytic mean effect size as the threshold to decide whether the true effect in a meta-analysis is so tiny that it can be biologically neglected and subsequently excluded from the calculation of average power. We acknowledge that this categorisation is arbitrary because statistical significance does not represent biological significance [4]. In some fields, very small effects still have biological importance. Third, the meta-analytic effect size estimates after correcting for publication bias may still be overestimated or underestimated because the incomplete reporting of important moderators in meta-analyses prevented us from accurately correcting for publication bias using our regression-based method [42, 46]. Fourth, notably, in testing for publication bias at both the within- and between-meta-analysis levels, we used statistical significance at the 0.05 level as a criterion to determine if there was publication bias. We acknowledge that this process, which is commonly referred to as a "significance filter", is prone to exaggeration and might result in a so-called "winner's curse" [84,85,86]. To partially mitigate this issue, the percentage of both statistically significant and non-significant results was reported in Figs. 4, 5 and 6. Furthermore, to avoid drawing conclusions based solely on statistically significant results, downstream analyses were conducted to assess the extent to which publication bias distorted the estimates of effect size (as shown in Fig. 7) and the calculation of power and Type M/S error rates (as shown in Figs. 8, 9 and 10).

### Implications

#### How to properly test for publication bias and correct for its impacts?

Given the strong and widespread evidence of publication bias found in this study (and others), publication bias tests should be a standard part of meta-analyses. A recent survey showed that publication bias tests have become more widespread in ecology and evolution in recent years [45]; however, inappropriate bias detection methods still dominate the literature [45]. Generally, regression-based methods are more powerful than other methods such as correlation-based methods [14, 63]. The regression-based method in the multilevel model framework used in the current study can further handle non-independence and high heterogeneity, which are common in the field, to bring down the rate of false positives [45,46,47]. Importantly, the method used here provides an intuitive quantification of the severity of publication bias. For example, the pooled *β*_{1[small − study]} of *Zr* was larger than that of SMD (0.119 *vs.* 0.091), suggesting publication bias in *Zr* is more severe than in SMD. Regression-based methods to correct for publication bias have been shown to produce effect size estimates similar to those of registered reports [79]. We strongly recommend that meta-analysts employ the regression-based method used in the current paper to routinely test for the presence of publication bias, correct for its impact, and report the corrected effect sizes, allowing stakeholders to better judge how robust the reported effects are.

#### How to increase power and mitigate overestimation of effect for primary studies and meta-analyses?

For primary studies, a fundamental solution to increase statistical power and mitigate effect size overestimation is to increase sample sizes by building up more big-team science [87] or global-scale collaborative scientific networks such as Nutrient Network [88], US Long-Term Ecological Research network [89], and Zostera Experimental Network [90]. Our results confirm that lnRR is a more powerful effect size metric than SMD [81]. The power of meta-analyses using lnRR was almost twice that of meta-analyses using SMD (lnRR *vs.* SMD: 81% *vs.* 41%). Moreover, lnRR was less prone to exaggeration (lnRR *vs.* SMD: 1 *vs.* 2). Practically, we recommend using lnRR as the main effect size when conducting meta-analyses if the biological questions focus on mean differences (but see [91]), and conducting sensitivity analyses using SMD (see [81, 92] for comparisons of the pros and cons of lnRR and SMD).

## Conclusions

We indirectly examined the extent of the replication crisis in ecology and evolutionary biology using two inter-related indicators: publication bias and statistical power. Our results indicate that two expected outcomes of publication bias, small-study effects and decline effects, are persistent and non-negligible in the field. Primary studies in ecology and evolutionary biology are often underpowered and prone to overestimate the magnitude of the effect (i.e. Type M error). Pervasive publication bias leads to exaggerated effect sizes, inflated meta-analytic evidence and overestimated statistical power, and to underestimated Type M error rates, undermining the reliability of previous findings. Although no single indicator can capture the true extent or all relevant evidence of the replication crisis [93], we have provided clear evidence that, as in many other disciplines [1, 2, 4], previously published findings in ecology and evolutionary biology are likely to have low replicability. The likely replication crisis in these fields highlights the importance of (i) designing high-power primary studies by building up big-team science [7, 87] where possible, (ii) adopting appropriate publication bias detection and correction methods for meta-analyses [45], and (iii) embracing publication-bias-robust publication forms (e.g. Registered Reports, like the current article) for both empirical studies and meta-analyses alike. More generally, researchers need to adhere more closely to open and transparent research practices [94], such as (pre-)registration [95], data and code sharing [96, 97], and transparent reporting [5], to achieve credible, reliable and reproducible ecology and evolutionary biology.

## Availability of data and materials

The relevant data and code that reproduce the results of this registered report are available at GitHub Repository (https://github.com/Yefeng0920/EcoEvo_PB) and Zenodo (Yefeng0920/EcoEvo_PB: Registered Report - publicaiton bias in Eco & Evo. DOI: https://doi.org/10.5281/zenodo.7762126).

## References

Collaboration OS. Estimating the reproducibility of psychological science. Science. 2015;349:aac4716.

Camerer CF, Dreber A, Forsell E, Ho T-H, Huber J, Johannesson M, et al. Evaluating replicability of laboratory experiments in economics. Science. 2016;351(6280):1433–6.

Ebersole CR, Mathur MB, Baranski E, Bart-Plange D-J, Buttrick NR, Chartier CR, et al. Many labs 5: testing pre-data-collection peer review as an intervention to increase replicability. Adv Methods Pract Psychol Sci. 2020;3(3):309–31.

Baker M. Reproducibility crisis. Nature. 2016;533(26):353–66.

Kelly CD. Rate and success of study replication in ecology and evolution. PeerJ. 2019;7:e7654.

Parker TH, Forstmeier W, Koricheva J, Fidler F, Hadfield JD, Chee YE, et al. Transparency in ecology and evolution: real problems, real solutions. Trends Ecol Evol. 2016;31(9):711–9.

O’Dea RE, Parker TH, Chee YE, Culina A, Drobniak SM, Duncan DH, et al. Towards open, reliable, and transparent ecology and evolutionary biology. BMC Biol. 2021;19(1):1–5.

Fraser H, Barnett A, Parker TH, Fidler F. The role of replication studies in ecology. Ecol Evol. 2020;10(12):5197–207.

Nakagawa S, Parker TH. Replicating research in ecology and evolution: feasibility, incentives, and the cost-benefit conundrum. BMC Biol. 2015;13(1):1–6.

Feng X, Park DS, Walker C, Peterson AT, Merow C, Papeş M. A checklist for maximizing reproducibility of ecological niche models. Nat Ecol Evol. 2019;3(10):1382–95.

Rosenthal R. The file drawer problem and tolerance for null results. Psychol Bull. 1979;86(3):638.

Jennions MD, Møller AP. Relationships fade with time: a meta-analysis of temporal trends in publication in ecology and evolution. Proc R Soc Lond Ser B Biol Sci. 2002;269(1486):43–8.

Koricheva J, Kulinskaya E. Temporal instability of evidence base: a threat to policy making? Trends Ecol Evol. 2019;34(10):895–902.

Sterne JA, Gavaghan D, Egger M. Publication and related bias in meta-analysis: power of statistical tests and prevalence in the literature. J Clin Epidemiol. 2000;53(11):1119–29.

McShane BB, Böckenholt U, Hansen KT. Average power: a cautionary note. Adv Methods Pract Psychol Sci. 2020;3(2):185–99.

Button KS, Ioannidis JP, Mokrysz C, Nosek BA, Flint J, Robinson ES, et al. Power failure: why small sample size undermines the reliability of neuroscience. Nat Rev Neurosci. 2013;14(5):365–76.

Szucs D, Ioannidis JP. Empirical assessment of published effect sizes and power in the recent cognitive neuroscience and psychology literature. PLoS Biol. 2017;15(3):e2000797.

Fraley RC, Chong JY, Baacke KA, Greco AJ, Guan H, Vazire S. Journal N-pact factors from 2011 to 2019: evaluating the quality of social/personality journals with respect to sample size and statistical power. Adv Meth Pract Psychol Sci. 2022;5(4):1–17.

Barto EK, Rillig MC. Dissemination biases in ecology: effect sizes matter more than quality. Oikos. 2012;121(2):228–35.

Crystal‐Ornelas R, Lockwood JL. Cumulative meta‐analysis identifies declining but negative impacts of invasive species on richness after 20 yr. Ecology. 2020;101(8):e03082.

Cohen J. Statistical power analysis for the behavioral sciences. 2nd ed. Hillsdale: Lawrence Erlbaum; 1988.

Jennions MD, Møller AP. A survey of the statistical power of research in behavioral ecology and animal behavior. Behav Ecol. 2003;14(3):438–45.

Smith DR, Hardy IC, Gammell MP. Power rangers: no improvement in the statistical power of analyses published in animal behaviour. Anim Behav. 2011;1(81):347–52.

Jennions MD, Moeller AP. Publication bias in ecology and evolution: an empirical assessment using the ‘trim and fill’ method. Biol Rev. 2002;77(2):211–22.

Correll J, Mellinger C, McClelland GH, Judd CM. Avoid Cohen’s ‘small’, ‘medium’, and ‘large’ for power analysis. Trends Cogn Sci. 2020;24(3):200–7.

Ioannidis JP, Stanley TD, Doucouliagos H. The power of bias in economics research. Econ J. 2017;127(605):F236–65.

Stanley T, Carter EC, Doucouliagos H. What meta-analyses reveal about the replicability of psychological research. Psychol Bull. 2018;144(12):1325–46.

Gelman A, Tuerlinckx F. Type S error rates for classical and Bayesian single and multiple comparison procedures. Comput Stat. 2000;15(3):373–90.

Gelman A, Carlin J. Beyond power calculations: assessing type S (sign) and type M (magnitude) errors. Perspect Psychol Sci. 2014;9(6):641–51.

Gurevitch J, Koricheva J, Nakagawa S, Stewart G. Meta-analysis and the science of research synthesis. Nature. 2018;555(7695):175–82.

O’Dea RE, Lagisz M, Jennions MD, Koricheva J, Noble DW, Parker TH, et al. Preferred reporting items for systematic reviews and meta-analyses in ecology and evolutionary biology: a PRISMA extension. Biol Rev. 2021;96(5):1695–722.

Hedges LV. Estimation of effect size from a series of independent experiments. Psychol Bull. 1982;92(2):490–9.

Hedges LV, Gurevitch J, Curtis PS. The meta-analysis of response ratios in experimental ecology. Ecology. 1999;80(4):1150–6.

Nakagawa S, Cuthill IC. Effect size, confidence interval and statistical significance: a practical guide for biologists. Biol Rev. 2007;82(4):591–605.

Wood JL, Yates MC, Fraser DJ. Are heritability and selection related to population size in nature? Meta-analysis and conservation implications. Evol Appl. 2016;9(5):640–57.

Murren CJ, Maclean HJ, Diamond SE, Steiner UK, Heskel MA, Handelsman CA, et al. Evolutionary change in continuous reaction norms. Am Nat. 2014;183(4):453–67.

Caruso CM, Eisen KE, Martin RA, Sletvold N. A meta-analysis of the agents of selection on floral traits. Evolution. 2019;73(1):4–14.

Yates MC, Fraser DJ. Does source population size affect performance in new environments? Evol Appl. 2014;7(8):871–82.

Barrientos R. Adult sex-ratio distortion in the native European polecat is related to the expansion of the invasive American mink. Biol Conserv. 2015;186:28–34.

Wehi P, Nakagawa S, Trewick S, Morgan-Richards M. Does predation result in adult sex ratio skew in a sexually dimorphic insect genus? J Evol Biol. 2011;24(11):2321–8.

Koricheva J, Gurevitch J. Uses and misuses of meta-analysis in plant ecology. J Ecol. 2014;102(4):828–44.

Nakagawa S, Santos ES. Methodological issues and advances in biological meta-analysis. Evol Ecol. 2012;26(5):1253–74.

Viechtbauer W. Bias and efficiency of meta-analytic variance estimators in the random-effects model. J Educ Behav Stat. 2005;30(3):261–93.

Viechtbauer W. Conducting meta-analyses in R with the metafor package. J Stat Softw. 2010;36(3):1–48.

Nakagawa S, Lagisz M, Jennions MD, Koricheva J, Noble D, Parker TH, et al. Methods for testing publication bias in ecological and evolutionary meta-analyses. Methods Ecol Evol. 2022;13(1):4–21.

Noble DW, Lagisz M, O'Dea RE, Nakagawa S. Nonindependence and sensitivity analyses in ecological and evolutionary meta-analyses. Mol Ecol. 2017;26(9):2410–25.

Senior AM, Grueber CE, Kamiya T, Lagisz M, O'Dwyer K, Santos ES, Nakagawa S. Heterogeneity in ecological and evolutionary meta-analyses: its magnitude and implications. Ecology. 2016;97(12):3293–9.

Senior AM, Viechtbauer W, Nakagawa S. Revisiting and expanding the meta-analysis of variation: the log coefficient of variation ratio, lnCVR. Res Synth Methods. 2020;11(4):553–67.

Stanley TD, Doucouliagos H, Ioannidis JP. Finding the power to reduce publication bias. Stat Med. 2017;36(10):1580–98.

Schielzeth H. Simple means to improve the interpretability of regression coefficients. Methods Ecol Evol. 2010;1(2):103–13.

Fanelli D, Costas R, Ioannidis JP. Meta-assessment of bias in science. Proc Natl Acad Sci. 2017;114(14):3714–9.

Nakagawa S, Samarasinghe G, Haddaway NR, Westgate MJ, O’Dea RE, Noble DW, et al. Research weaving: visualizing the future of research synthesis. Trends Ecol Evol. 2019;34(3):224–38.

Leone F, Nelson L, Nottingham R. The folded normal distribution. Technometrics. 1961;3(4):543–50.

Nakagawa S, Lagisz M. Visualizing unbiased and biased unweighted meta-analyses. J Evol Biol. 2016;29(10):1914–6.

Morrissey MB. Meta-analysis of magnitudes, differences and variation in evolutionary parameters. J Evol Biol. 2016;29(10):1882–904.

Lemoine NP, Hoffman A, Felton AJ, Baur L, Chaves F, Gray J, et al. Underappreciated problems of low replication in ecological field studies. Ecology. 2016;97(10):2554–61.

Cohn LD, Becker BJ. How meta-analysis increases statistical power. Psychol Methods. 2003;8(3):243–53.

Bates D, Mächler M, Bolker B, Walker S. Fitting linear mixed-effects models using lme4. J Stat Softw. 2014;67(1):1–46.

Stanley TD, Doucouliagos H. Meta-regression approximations to reduce publication selection bias. Res Synth Methods. 2014;5(1):60–78.

Stanley TD. Limitations of PET-PEESE and other meta-analysis methods. Soc Psychol Personal Sci. 2017;8(5):581–91.

Wickham H, Chang W. ggplot2: elegant graphics for data analysis. New York: Springer; 2016.

Nakagawa S, Lagisz M, O'Dea RE, Rutkowska J, Yang Y, Noble DW, et al. The orchard plot: cultivating a forest plot for use in ecology, evolution, and beyond. Res Synth Methods. 2021;12(1):4–12.

Ioannidis JP, Trikalinos TA. The appropriateness of asymmetry tests for publication bias in meta-analyses: a large survey. CMAJ. 2007;176(8):1091–6.

Van Aert RC, Wicherts JM, Van Assen MA. Publication bias examined in meta-analyses from psychology and medicine: a meta-meta-analysis. PLoS One. 2019;14(4):e0215052.

Ferguson CJ, Brannick MT. Publication bias in psychological science: prevalence, methods for identifying and controlling, and implications for the use of meta-analyses. Psychol Methods. 2012;17(1):120–8.

Ioannidis JP. Why most discovered true associations are inflated. Epidemiology. 2008:640–8.

Kühberger A, Fritz A, Scherndl T. Publication bias in psychology: a diagnosis based on the correlation between effect size and sample size. PLoS One. 2014;9(9):e105825.

Doucouliagos C, Stanley TD. Are all economic facts greatly exaggerated? Theory competition and selectivity. J Econ Surv. 2013;27(2):316–39.

Franco A, Malhotra N, Simonovits G. Underreporting in psychology experiments: evidence from a study registry. Soc Psychol Personal Sci. 2016;7(1):8–12.

Bartoš F, Maier M, Wagenmakers E-J, Nippold F, Doucouliagos H, Ioannidis J, et al. Footprint of publication selection bias on meta-analyses in medicine, economics, and psychology. arXiv preprint arXiv:2208.12334. 2022.

Yang Y, Nakagawa S, Lagisz M. Decline effects are rare in ecology: comment. EcoEvoRxiv. 2022. https://doi.org/10.32942/osf.io/qc7bx.

Costello L, Fox JW. Decline effects are rare in ecology. Ecology. 2022:e3680.

Sanchez-Tojar A, Nakagawa S, Sanchez-Fortun M, Martin DA, Ramani S, Girndt A, et al. Meta-analysis challenges a textbook example of status signalling and demonstrates publication bias. eLife. 2018;7:e37385.

Van Klink R, Bowler DE, Gongalsky KB, Swengel AB, Gentile A, Chase JM. Meta-analysis reveals declines in terrestrial but increases in freshwater insect abundances. Science. 2020;368(6489):417–20.

Clements JC, Sundin J, Clark TD, Jutfelt F. Meta-analysis reveals an extreme “decline effect” in the impacts of ocean acidification on fish behavior. PLoS Biol. 2022;20(2):e3001511.

Fanshawe TR, Shaw LF, Spence GT. A large-scale assessment of temporal trends in meta-analyses using systematic review reports from the Cochrane library. Res Synth Methods. 2017;8(4):404–15.

Pietschnig J, Siegel M, Eder JSN, Gittler G. Effect declines are systematic, strong, and ubiquitous: a meta-meta-analysis of the decline effect in intelligence research. Front Psychol. 2019:2874.

Sladekova M, Webb LE, Field AP. Estimating the change in meta-analytic effect size estimates after the application of publication bias adjustment methods. Psychol Methods. 2022.

Kvarven A, Strømland E, Johannesson M. Comparing meta-analyses and preregistered multiple-laboratory replication projects. Nat Hum Behav. 2020;4(4):423–34.

Bartoš F, Maier M, Shanks D, Stanley T, Sladekova M, Wagenmakers E-J. Meta-analyses in psychology often overestimate evidence for and size of effects; 2022.

Yang Y, Hillebrand H, Lagisz M, Cleasby I, Nakagawa S. Low statistical power and overestimated anthropogenic impacts, exacerbated by publication bias, dominate field studies in global change biology. Glob Chang Biol. 2022;28(3):969–89.

Lamberink HJ, Otte WM, Sinke MR, Lakens D, Glasziou PP, Tijdink JK, et al. Statistical power of clinical trials increased while effect size remained stable: an empirical analysis of 136,212 clinical trials between 1975 and 2014. J Clin Epidemiol. 2018;102:123–8.

Ioannidis JP, Stanley TD, Doucouliagos H. The power of bias in economics research. Econ J. 2017;127(605):F236–65.

van Zwet EW, Cator EA. The significance filter, the winner’s curse and the need to shrink. Statistica Neerlandica. 2021;75(4):437–52.

Berner D, Amrhein V. Why and how we should join the shift from significance testing to estimation. J Evol Biol. 2022;35(6):777–87.

Amrhein V, Greenland S, McShane B. Scientists rise up against statistical significance. Nature. 2019;567(7748):305–7.

Coles NA, Hamlin JK, Sullivan LL, Parker TH, Altschul D. Building up big-team science. Nature. 2022;601(7894):505–7.

Harpole WS, Sullivan LL, Lind EM, Firn J, Adler PB, Borer ET, et al. Addition of multiple limiting resources reduces grassland diversity. Nature. 2016;537(7618):93–6.

Crossley MS, Meier AR, Baldwin EM, Berry LL, Crenshaw LC, Hartman GL, et al. No net insect abundance and diversity declines across US long term ecological research sites. Nat Ecol Evol. 2020;4(10):1368–76.

Wu PP-Y, Mengersen K, McMahon K, Kendrick GA, Chartrand K, York PH, et al. Timing anthropogenic stressors to mitigate their impact on marine ecosystem resilience. Nat Commun. 2017;8(1):1–11.

Bakbergenuly I, Hoaglin DC, Kulinskaya E. Estimation in meta-analyses of response ratios. BMC Med Res Methodol. 2020;20(1):1–24.

Nakagawa S, Poulin R, Mengersen K, Reinhold K, Engqvist L, Lagisz M, et al. Meta-analysis of variation: ecological and evolutionary applications and beyond. Methods Ecol Evol. 2015;6(2):143–52.

Fidler F, Chee YE, Wintle BC, Burgman MA, McCarthy MA, Gordon A. Metaresearch for evaluating reproducibility in ecology and evolution. BioScience. 2017;67(3):282–9.

Gallagher RV, Falster DS, Maitner BS, Salguero-Gómez R, Vandvik V, Pearse WD, et al. Open Science principles for accelerating trait-based science across the tree of life. Nat Ecol Evol. 2020;4(3):294–303.

Parker T, Fraser H, Nakagawa S. Making conservation science more reliable with preregistration and registered reports. Conserv Biol. 2019;33(4):747–50.

Parr CS, Cummings MP. Data sharing in ecology and evolution. Trends Ecol Evol. 2005;20(7):362–3.

Culina A, van den Berg I, Evans S, Sánchez-Tójar A. Low availability of code in ecology: a call for urgent action. PLoS Biol. 2020;18(7):e3000763.

## Acknowledgement

We thank Valentin Amrhein for his comments on this manuscript. We thank the Faculty of Science and the Office of the Deputy Vice-Chancellor of Research, UNSW Sydney, for their support of YY and SN. YY was funded by the National Natural Science Foundation of China (No. 32102597). SN, YY, and ML were funded by an Australian Research Council Discovery Grant (DP210100812). DN was supported by an ARC Discovery Grant (DP210101152).

## Author information

### Contributions

YY: conceptualising the paper, collecting the data, analysing the data, and drafting the manuscript. AST: collecting the data, commenting, and editing the manuscript. REO: collecting the data, analysing the data, commenting, and editing the manuscript. DWAN: collecting the data, commenting, and editing the manuscript. JK: collecting the data, commenting, and editing the manuscript. MDJ: collecting the data, commenting, and editing the manuscript. THP: collecting the data, commenting, and editing the manuscript. ML: visualising, collecting the data, commenting, editing the manuscript, and supervising the project. SN: conceptualising the paper, collecting the data, analysing the data, commenting, editing the manuscript, and supervising the project. The authors read and approved the final manuscript.

## Ethics declarations

### Competing interests

The authors declare that they have no competing interests.

## Additional information

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

## Supplementary Information

**Additional file 1.**

Supporting Information.

## Rights and permissions

**Open Access** This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

## About this article

### Cite this article

Yang, Y., Sánchez-Tójar, A., O’Dea, R.E. *et al.* Publication bias impacts on effect size, statistical power, and magnitude (Type M) and sign (Type S) errors in ecology and evolutionary biology.
*BMC Biol* **21**, 71 (2023). https://doi.org/10.1186/s12915-022-01485-y

DOI: https://doi.org/10.1186/s12915-022-01485-y