A universal scaling relationship between body mass and proximal limb bone dimensions in quadrupedal terrestrial tetrapods

Background: Body size is intimately related to the physiology and ecology of an organism. Therefore, accurate and consistent body mass estimates are essential for inferring numerous aspects of paleobiology in extinct taxa, and for investigating large-scale evolutionary and ecological patterns in the history of life. Scaling relationships between skeletal measurements and body mass in birds and mammals are commonly used to predict body mass in extinct members of these crown clades, but the applicability of these models for predicting mass in more distantly related stem taxa, such as non-avian dinosaurs and non-mammalian synapsids, has been criticized on biomechanical grounds. Here we test the major criticisms of scaling methods for estimating body mass using an extensive dataset of mammalian and non-avian reptilian species derived from individual skeletons with live weights.
Results: Significant differences in the limb scaling of mammals and reptiles are noted in comparisons of limb proportions and limb length to body mass. Remarkably, however, the relationship between proximal (stylopodial) limb bone circumference and body mass is highly conserved in extant terrestrial mammals and reptiles, in spite of their disparate limb postures, gaits, and phylogenetic histories. As a result, we are able to conclusively reject the main criticisms of scaling methods that question the applicability of a universal scaling equation for estimating body mass in distantly related taxa.
Conclusions: The conserved nature of the relationship between stylopodial circumference and body mass suggests that the minimum diaphyseal circumference of the major weight-bearing bones is only weakly influenced by the varied forces exerted on the limbs (that is, compression or torsion) and is most strongly related to the mass of the animal.
Our results, therefore, provide a much-needed, robust, phylogenetically corrected framework for accurate and consistent estimation of body mass in extinct terrestrial quadrupeds, which is important for a wide range of paleobiological studies (including growth rates, metabolism, and energetics) and meta-analyses of body size evolution.


Background
In extant taxa, body size is recognized as one of the most important biological properties because it strongly correlates with numerous physiological and ecological factors, such as metabolic rate [1-3], growth rate [4,5], fecundity [6], diversity [7], and population density [8,9], as well as home range and land area [6,10,11], which are related to the productivity of the host environment [12].
Due to these relationships, estimates of body mass (the standard measure of body size) are essential for inferring the paleobiology of extinct taxa and for investigating large-scale evolutionary and ecological patterns in the history of life.
Currently, there are two types of methods used to estimate body mass in extinct animals: volumetric reconstructions and skeletal scaling relationships. The latter is commonly used to predict body mass in extinct members of relatively recent crown clades (that is, of Mesozoic origin) such as Mammalia and Aves [21,41-45]. However, in stem groups (for example, non-avian dinosaurs and non-mammalian synapsids), estimates are often based on volumetric reconstructions, which involve physical three-dimensional scale models [46,47], graphic double integration of two-dimensional reconstructions [48-50], or computer-generated life reconstructions [51-55]. Such estimates are widely used in the literature (for example, [35,38]) despite being prone to considerable error. In a typical example, the same research group has recently published estimates of 38 tonnes and 74.4 tonnes for a single mounted skeleton of Brachiosaurus brancai [54,56]. Such discrepancies arise from differing interpretations of a multitude of factors associated with the mass and proportions of an organism's tissues and organs [57] and, perhaps most importantly, from the effects of air sacs and lungs, which likely have a large effect on specific gravity (the total body density of the animal relative to that of water), a parameter needed to estimate mass from a volume. Within non-avian reptiles, specific gravity has been noted to range from 0.8 to 1.2 [46,48]; however, given the varying levels of bone pneumaticity observed in saurischian dinosaurs [58,59], and the fact that birds typically exhibit lower densities than mammals and other reptiles [60], it is almost certain that the specific gravity of extinct animals also varied [59]. As a result, assumptions based on a set density parameter will considerably affect a mass estimate [54,56].
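Because a volumetric estimate is simply a volume multiplied by an assumed density, the specific gravity range quoted above propagates directly into the mass estimate. A minimal Python sketch, assuming a hypothetical reconstructed volume of 40 m³ and a water density of 1,000 kg/m³ (neither value is taken from the studies cited here):

```python
# Minimal sketch: volumetric mass estimate = volume x specific gravity x water density.
# The 40 m^3 volume below is a hypothetical placeholder, not a published reconstruction.
WATER_DENSITY_KG_PER_M3 = 1000.0  # approximate density of fresh water


def mass_from_volume(volume_m3: float, specific_gravity: float) -> float:
    """Return the body mass (kg) implied by a reconstructed volume and an assumed SG."""
    return volume_m3 * specific_gravity * WATER_DENSITY_KG_PER_M3


volume = 40.0  # hypothetical reconstructed body volume (m^3)
for sg in (0.8, 1.0, 1.2):  # the range reported for non-avian reptiles
    tonnes = mass_from_volume(volume, sg) / 1000.0
    print(f"specific gravity {sg}: {tonnes:.1f} t")
```

For the same volume, the 0.8 to 1.2 range alone yields 32 t to 48 t, a 1.5-fold spread before any uncertainty in the volume itself is considered.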
Perhaps more importantly, the numerous assumptions about soft tissue properties and body shape (for example, muscle sizes) in many of these models make it difficult to control for sources of error and to determine the confidence associated with a given mass estimate, although recent computational modelling advances attempt to outline maximum and minimum body mass bounds (for example, [54,61,62]). Despite the complications associated with life reconstructions of extinct taxa, such models are important for testing numerous biomechanical hypotheses [61,63-68]. Therefore, it is important that they be constrained by data derived from extant taxa, such as those obtained from scaling relationships.
An alternative to reconstructions, and one that can be used to test and constrain scale and computational models ([55]), is the use of scaling relationships between body mass and skeletal dimensions derived from extant taxa. A skeletal measure that is strongly related to body mass provides an estimate that avoids the sources of error associated with making a reconstruction, such as the determination of tissue volume and specific gravity, which are virtually impossible to constrain in life reconstructions. Furthermore, skeletal measurements are generally easier to obtain than full-body scale reconstructions, especially for taxa that are only partially preserved, and are therefore more practical estimators in large-scale evolutionary and ecological studies (for example, [15-17,20]). Finally, the variation in the extant dataset can be used to quantify the degree of confidence in the estimated parameter, and can thus provide a range in which a particular body mass is likely to fall, thereby providing a constraint for estimates produced by reconstructed models. Scaling methods are almost universally accepted as a means to estimate body mass accurately for extinct members of crown groups, such as mammals and birds (for example, [17,42]), but have been extensively criticized when applied to more distantly related stem taxa that fall outside the body size range observed in extant representatives, such as Indricotherium [69], xenarthrans [43], and non-avian dinosaurs [70-72]. For the first two groups, subsequent studies have shown that scaling relationships still provide the most reliable mass estimates [43,69].
Dinosaurian body masses are still generally estimated using reconstructions, with the exception of two studies [45,73]. The pioneering work of Anderson et al. [73], herein referred to as the Anderson method, suggested that the body mass of dinosaurs could be estimated using the measured scaling relationship between live mass and total circumference of the stylopodia (humerus + femur) in a sample of 33 species of extant terrestrial mammals. Although the Anderson method provides a more objective way to estimate body mass in extinct taxa, it has been criticized by numerous authors (for example, [49,56,61,70,71,74-76]). Here we use an extensive dataset of extant mammals and non-avian reptiles, compiled from individual skeletons of live-weighed animals, to directly test the three main criticisms of using a universal limb scaling relationship to estimate body mass in extinct terrestrial amniotes:
1. The widely cited Anderson method, especially among non-avian dinosaur researchers, is criticized for its taxonomically biased sample, which is skewed towards ungulates (for example, [70]). Studies examining limb-scaling patterns in mammals have noted that the limb proportions of ungulates differ from those of other mammals [70,77,78]. However, whether ungulates differ from other groups of mammals in the scaling of limb circumference to body mass has not been directly tested.
2. Differences in gait and limb posture impose different stress regimes on the limbs [79,80]. These differences may affect limb morphology, thereby negating the applicability of a single equation to estimate body mass in a variety of extinct vertebrates. We test for differential limb scaling between animals of various gaits and limb postures by comparing differently sized subsamples of mammals, and by comparing parasagittal mammals to sprawling reptiles.
3. Residual outliers (large residual values) and extreme outliers (values at the upper and lower extremes of the dataset) can have a large effect on regression coefficients [81]. The problem of residual outliers in the large-bodied mammalian sample of Anderson et al. [73] was discussed by Packard et al. [82]. We have expanded the sample size of the large-bodied dataset and address the effect that potential residual outliers have on the circumference to body mass relationship. The effect of extreme outliers on limb scaling is, in part, mitigated by logarithmic transformation of the data, but is also assessed through size class comparisons. Although the issue of body mass extrapolation to giant extinct taxa (for example, Sauropoda; [50,72]) will always exist, the vast majority of extinct animals, including most non-avian dinosaurs, fall within the body mass range of extant taxa.
All three of these criticisms are tested for the first time, within the context of 200 mammal and 47 non-avian reptile species [See Additional file 1, Dataset]. Based on our results we develop a universal scaling equation between the total circumference of the stylopodia and body mass that is applicable to all terrestrial quadrupeds, and permits estimation of body mass in extinct taxa along with an error factor that can constrain estimates for use in future paleobiological studies.
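Applying such an equation to a fossil amounts to summing the two stylopodial circumferences, back-transforming the log-log prediction, and bracketing the result with an error factor. The sketch below assumes a base-10 log-log form; the slope and intercept in the example are hypothetical placeholders rather than the fitted coefficients, and using the mean percent prediction error as a symmetric bracket is only one simple choice of error factor.

```python
import math


def predict_body_mass(c_humerus: float, c_femur: float,
                      slope: float, intercept: float) -> float:
    """Back-transform log10(BM) = slope * log10(C_H + C_F) + intercept.

    Units of BM follow whatever units the equation was fitted in.
    """
    total_circumference = c_humerus + c_femur  # C_H+F, total stylopodial circumference
    return 10 ** (slope * math.log10(total_circumference) + intercept)


def mass_bracket(estimate: float, ppe_percent: float) -> tuple[float, float]:
    """Return (lower, upper) bounds at +/- a mean percent prediction error."""
    fraction = ppe_percent / 100.0
    return estimate * (1.0 - fraction), estimate * (1.0 + fraction)


# Hypothetical inputs: circumferences and coefficients are placeholders only.
estimate = predict_body_mass(c_humerus=120.0, c_femur=150.0, slope=2.75, intercept=-1.1)
low, high = mass_bracket(estimate, ppe_percent=25.0)
print(f"estimate {estimate:.0f}, range {low:.0f}-{high:.0f}")
```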

Raw data results
Results from the standardized major axis (SMA) analyses comparing clades on the basis of the raw (non-phylogenetically corrected) data are provided in Figures 1 and 2 and Table 1; comparisons are summarized in Tables 2 and 3. Size class comparisons are presented in Figure 3 and Tables 4 and 5. All variables are strongly correlated with one another and with body mass (that is, size), as indicated by a mean coefficient of determination of 0.9446 ± 0.0093 for the clade comparisons and 0.914 ± 0.014 for the size class comparisons.
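For readers unfamiliar with SMA, the slope is the ratio of the standard deviations of the two (log-transformed) variables, signed by their correlation; equivalently, it is the ordinary least squares slope divided by |r|. A minimal standard-library sketch, fitted to synthetic, noise-free data generated from a hypothetical line with slope 2.75 and intercept -1.1:

```python
import math
from statistics import mean, stdev


def sma_fit(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Standardized major axis fit of y on x; returns (slope, intercept, r_squared)."""
    mx, my = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    r = sxy / (sx * sy)                 # Pearson correlation coefficient
    slope = math.copysign(sy / sx, r)   # SMA slope: +/- sd(y) / sd(x)
    intercept = my - slope * mx         # the SMA line passes through the bivariate mean
    return slope, intercept, r * r


# Synthetic data on a hypothetical line: log10(BM) = 2.75 * log10(C) - 1.1
circumferences = [20.0, 45.0, 90.0, 160.0, 300.0, 520.0]
log_c = [math.log10(c) for c in circumferences]
log_bm = [2.75 * v - 1.1 for v in log_c]
print(sma_fit(log_c, log_bm))
```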
In total, 80 pairwise comparisons are made between mammalian clades (Tables 1 and 2). Of these comparisons, the 95% confidence intervals indicate 12 significant differences between scaling coefficients and 13 significant differences between intercepts. In comparison, the likelihood ratio test, the results of which are adjusted for multiple comparisons using the false discovery rate (FDR), reveals 14 significant differences between slopes, and a t-test of the true intercepts indicates ten significant differences; however, when the intercept is corrected and compared at a more biologically meaningful value, the minimum value along the x-axis, the t-test indicates that there are no significant differences in intercept.
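The false discovery rate adjustment is most commonly implemented as the Benjamini-Hochberg step-up procedure. The text does not name the variant, so the following sketch assumes Benjamini-Hochberg; it mirrors the behaviour of R's `p.adjust(method = "BH")` and is applied to hypothetical p-values.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg (FDR) adjusted p-values in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Step up from the largest p-value, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from four pairwise slope comparisons
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```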
Regardless of the comparison method used, the most significant variation is noted in the scaling of stylopodial proportions (length to circumference) of the humerus and femur, as well as in the scaling of humeral and femoral lengths with body mass (Figure 1; Tables 1 and 2). This is especially true for ungulates, whose stylopodial proportions and lengths scale significantly differently from those of all other groups examined here. No significant differences in scaling coefficients were recovered in the scaling of either humeral or femoral circumference to body mass using the likelihood ratio test, and only two differences were recovered by the 95% confidence interval comparisons in the scaling of humeral circumference to body mass (Marsupialia scales significantly higher than Ungulata and Carnivora).
In total, ten and 13 significant differences were noted in comparisons between intercepts using confidence intervals and a t-test, respectively, including a significant difference between the intercepts of Carnivora and Glires, detected with 95% confidence intervals, in the comparison of total stylopodial (humerus + femur) circumference and body mass. However, visual inspection reveals major overlap between the data points at the minimum values along the x-axis (Figure 1), suggesting that these significant differences may result from extrapolating the SMA line to x = 0. This interpretation is supported by an adjusted t-test comparing the intercepts at the minimum value along the x-axis (Table 2), which indicates that intercepts do not differ significantly between mammalian groups in any of the comparisons made here.
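The reason for comparing intercepts at the minimum observed x value rather than at x = 0 can be illustrated with two hypothetical SMA lines (coefficients invented for illustration) whose intercepts differ at the origin but which coincide within the range of the data:

```python
def elevation_at(slope: float, intercept: float, x_ref: float) -> float:
    """Height of y = slope * x + intercept at x = x_ref (the adjusted intercept b')."""
    return slope * x_ref + intercept


# Two hypothetical groups with different slopes and intercepts
group_a = (2.6, -0.5)
group_b = (2.9, -1.1)
x_min = 2.0  # hypothetical minimum log circumference in the sample

print("at x = 0:", elevation_at(*group_a, 0.0), elevation_at(*group_b, 0.0))
print("at x_min:", elevation_at(*group_a, x_min), elevation_at(*group_b, x_min))
```

At the origin the two lines differ by 0.6 log units, yet at x_min both sit at 4.7, so a difference detected only at x = 0 reflects extrapolation beyond the data.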
Overall, mammalian and reptilian scaling patterns show similar scaling coefficients. Of the eight comparisons, two showed significant differences in scaling coefficients using both the 95% confidence intervals and the likelihood ratio test. More specifically, humeral proportions and humeral length to body mass have significantly higher scaling coefficients in reptiles than in mammals (Figure 2; Tables 1 and 3). Comparison of the confidence intervals revealed significant differences between the intercepts of mammals and reptiles in the relationships of femoral circumference to body mass and of humeral length to body mass. However, these differences were not recovered by either t-test. When the circumferences of the humerus and femur are combined, all tests indicate that the total stylopodial circumference to body mass relationship of reptiles is statistically indistinguishable from that of mammals.
Finally, to ensure that the results obtained for mammals and reptiles are not influenced by differences in the body size ranges of the two samples, we re-ran the analyses using a subset of the mammalian dataset (N = 174) comprising all mammals equal to or below the mass of the Alligator mississippiensis specimen (168 kg), the largest reptile measured in this study. In general, the results of this pruned analysis were similar to those obtained with the entire mammalian dataset (Table 3) [See Additional file 2, Table S1]. In particular, comparisons of slopes based on the likelihood ratio test are identical. Differences between the two analyses were noted in comparisons using the 95% confidence intervals: the pruned analysis revealed an additional difference between mammals and reptiles in the scaling of femoral length to femoral circumference, but failed to recover a significant difference between intercepts in the scaling of femoral circumference to body mass. The t-test on the pruned data also revealed additional differences between the intercepts of mammals and reptiles in the relationships of humeral length to body mass and of femoral length to humeral length. Despite differences in the scaling of stylopodial length, no significant differences were noted in the scaling of stylopodial circumference to body mass between mammals and reptiles.
Campione and Evans BMC Biology 2012, 10:60 http://www.biomedcentral.com/1741-7007/10/60
Size class comparisons based on the mammalian dataset (N = 200), at three different thresholds, reveal greater variation in scaling patterns between subsamples at the lower body size thresholds (Tables 4 and 5), although the fewer differences at the 100 kg threshold may be due to the small sample size of the large body size class at that threshold (N = 36). In particular, the proportions of the humerus scaled differently in animals smaller than 20 kg compared to those larger than 20 kg, a pattern also noted at the 50 kg threshold. A significant difference in the proportional scaling of the femur is also noted at 50 kg. Significant differences were noted in the scaling of humeral length to body mass between size classes at the 20 kg and 50 kg thresholds. As in the mammalian and reptilian comparisons, no significant differences were noted in the scaling of combined circumference and body mass between different size classes (Figure 3; Table 5).
[Figure caption fragment: Lissamphibians are plotted (green), but no line was fitted because of their small sample size and narrow body mass range.]

Independent contrast results
Overall, phylogenetically corrected scaling relationships have lower coefficients of determination than those based on the raw data. The mean R² (0.9126 ± 0.0105) for the corrected data is significantly lower than that obtained from the raw data (two-tailed t-test: t = -4.4721; P << 0.0001). As a result, fewer significant differences were noted between mammalian clades and between mammals and reptiles [See Additional file 3, Tables S1 and S2]. Of the 80 mammalian comparisons made, two showed significant differences recovered by both the 95% confidence intervals and the likelihood ratio test: Carnivora has a significantly lower scaling coefficient than Glires and Ungulata in the scaling of femoral length to humeral length. When the data are corrected, confidence intervals indicate two other differences: the humeral length of reptiles scales significantly higher than that of mammals relative to body mass, and the humeral circumference of ungulates scales higher than that of carnivorans relative to body mass. Most importantly, however, confidence-interval comparisons between scaling coefficients obtained from the raw data (Table 1) and from the phylogenetically corrected data [See Additional file 3, Table S2] reveal only a single significant difference, in the scaling of humeral proportions in Glires. Other than that comparison, the lack of significant differences between the raw and phylogenetically corrected data suggests that phylogeny does not play a significant role in dictating the scaling patterns of the major weight-bearing bones of terrestrial tetrapods tested here. For this reason, and for ease of comparison with previous limb scaling studies, further discussion is based on results obtained from the raw data.
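The contrast procedure itself is not described in this section. As a point of reference only, the sketch below implements the standard Felsenstein (1985) independent contrasts algorithm on a hypothetical four-taxon tree with invented branch lengths and trait values, then fits an SMA slope through the origin to the contrasts, as is usual for contrast regressions. The tuple-based tree encoding and function names are illustrative assumptions. Because the synthetic trait values fall exactly on a line with slope 2.75, the contrasts recover that slope.

```python
import math

# Hypothetical tree encoding: ("leaf", name, branch_length) or
# ("node", left_subtree, right_subtree, branch_length).
# Every non-root branch length must be > 0.


def contrasts(tree, traits):
    """Felsenstein's independent contrasts for one trait.

    Returns (node_value, adjusted_branch_length, standardized_contrasts).
    """
    if tree[0] == "leaf":
        _, name, v = tree
        return traits[name], v, []
    _, left, right, v = tree
    x1, v1, c1 = contrasts(left, traits)
    x2, v2, c2 = contrasts(right, traits)
    contrast = (x1 - x2) / math.sqrt(v1 + v2)            # standardized contrast
    node_value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)  # weighted ancestral estimate
    v_adjusted = v + v1 * v2 / (v1 + v2)                  # lengthen branch for estimation error
    return node_value, v_adjusted, c1 + c2 + [contrast]


def sma_through_origin(cx, cy):
    """SMA slope through the origin, after positivizing the x contrasts."""
    pairs = [(a, b) if a >= 0 else (-a, -b) for a, b in zip(cx, cy)]
    sxy = sum(a * b for a, b in pairs)
    slope = math.sqrt(sum(b * b for _, b in pairs) / sum(a * a for a, _ in pairs))
    return math.copysign(slope, sxy)


tree = ("node",
        ("node", ("leaf", "A", 1.0), ("leaf", "B", 1.0), 0.5),
        ("node", ("leaf", "C", 2.0), ("leaf", "D", 1.5), 0.5),
        0.0)
log_c = {"A": 1.3, "B": 1.6, "C": 2.1, "D": 2.5}        # hypothetical log circumferences
log_bm = {k: 2.75 * v - 1.1 for k, v in log_c.items()}  # synthetic, exactly linear masses

_, _, cx = contrasts(tree, log_c)
_, _, cy = contrasts(tree, log_bm)
print(len(cx), sma_through_origin(cx, cy))
```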

Discussion
Skeletal limb morphology in vertebrates is considered to reflect a trade-off between the energetic requirements imposed by movement and the functional requirements imposed by the loads that behavior and/or body size place on the bone [78,83-88]. Biomechanical studies using in vivo strain gauges and force platforms in mammals and birds have concluded that peak functional strains placed on a limb bone during locomotion, and hence safety factors (the ratio of the strain at which yield or failure occurs to the peak functional strain), are consistent among taxa of different sizes and lifestyles (for example, terrestrial, aquatic, and aerial; [80]). However, safety factors are higher in non-avian reptiles than in mammals, suggesting that functional strains are lower in the former [79,89,90]. Nevertheless, in order to mitigate decreases in safety factors associated with increases in body size, the architecture of the skeletal limb, such as limb robustness, cortical thickness, and/or curvature, is expected to vary [80,86,88,91].
Interspecific limb scaling patterns are often used to test theoretical biomechanical models, such as geometric, elastic, and static similarity, which predict scaling patterns based on biomechanical observations and/or assumptions [70,77,78,84,85,92-94]. These theoretical models were formally presented by McMahon [95,96], who provided empirical support for elastic scaling in terrestrial vertebrates (using ungulates as a proxy), as opposed to strict geometric (isometric) scaling. These models were subsequently revisited by other authors, who presented empirical evidence that elastic similarity is restricted to ungulates, with other mammals either following a geometric trend [77] or not clearly conforming to either the elastic or the geometric model [85,93,94]. In general, empirical scaling studies of terrestrial mammals have found minor support for elastic similarity (see [87] for a full review). In reptiles, however, Blob [84] recovered significant support for elastic similarity in several regressions comparing limb diameters to body mass in varanids and iguanians.
[Table legend: The particular theoretical scaling model (Sim.) followed by the slope is represented by G, geometric similarity; E, elastic similarity; or S, static similarity. Scaling patterns that fall between models are represented by > or <, and those that do not follow any pattern (that is, above or below all predicted models) are represented by 0. BM, body mass; C_F, femoral circumference; C_H, humeral circumference; C_H+F, total humeral and femoral circumference; CI, confidence interval; L_F, femoral length; L_H, humeral length.]
[Table legend: Standardized major axis equations are shown in the format y = mx + b. Symbols: (°) represents differences at 90 to 95% (0.05 < P < 0.1); (*) at 95 to 99% (0.01 < P < 0.05); and (**) at greater than 99% (P < 0.01). Otherwise, P-values are > 0.1. All P-values are adjusted for multiple comparisons using FDR. Hyphens (-) represent duplicate comparisons. Significant differences using 95% CI are assessed on whether the intervals overlap or not; non-overlapping comparisons are indicated with an asterisk (*). 95% CI, comparisons based on 95% confidence intervals; b', intercept adjusted to correspond to the minimum value along the x-axis; BM, body mass; C_F, femoral circumference; C_H, humeral circumference; C_H+F, total humeral and femoral circumference; FDR, false discovery rate; L_F, femoral length; L_H, humeral length; LRT, comparisons based on a likelihood ratio test (slope only); t-test, comparisons based on a two-tailed t-test (intercept only).]
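The predictions of the three similarity models can be derived from two assumptions that are standard in this literature but are stated explicitly here: body mass scales as M ∝ l·d² (length times cross-sectional area), and each model fixes how bone length l scales with diameter d (geometric, l ∝ d; elastic, l ∝ d^(2/3); static stress, l ∝ d^(1/2)). Assuming further that log body mass is the response variable (as in the circumference slope of 2.802 reported in the Outliers section), the expected slopes follow from exponent arithmetic:

```python
from fractions import Fraction

# Exponent k in l ∝ d**k for each theoretical model (assumed, after McMahon).
LENGTH_ON_DIAMETER = {
    "geometric": Fraction(1),
    "elastic": Fraction(2, 3),
    "static": Fraction(1, 2),
}


def predicted_slopes(model: str) -> tuple[Fraction, Fraction]:
    """Predicted slopes of log M on log circumference and on log length.

    With M ∝ l * d**2 and l ∝ d**k, M ∝ d**(k + 2); circumference is proportional
    to d, and d ∝ l**(1/k), so M ∝ l**((k + 2) / k).
    """
    k = LENGTH_ON_DIAMETER[model]
    return k + 2, (k + 2) / k


for model in LENGTH_ON_DIAMETER:
    on_circumference, on_length = predicted_slopes(model)
    print(f"{model}: BM ~ C^{on_circumference}, BM ~ L^{on_length}")
```

Under these assumptions, geometric, elastic, and static similarity predict slopes of 3, 8/3, and 5/2 for circumference and 3, 4, and 5 for length, so a length slope above 3 corresponds to bones that become relatively shorter in larger animals.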
The results obtained here suggest that limb scaling in mammalian and reptilian clades exhibits a great deal of variation with respect to elastic and geometric similarity. As suggested by Christiansen [85,93], clades and subgroups appear to follow a variety of scaling models depending on the variables being compared, and no single theoretical scaling model describes all terrestrial vertebrates. However, this study suggests that elastic similarity is more prevalent than previously thought, especially in the scaling of humeral circumference with body mass. Of the eight clades examined (Table 1), only a single group, Marsupialia, did not follow a significant allometric trend (that is, one significantly different from geometric similarity), and six of the clades follow the model predicted by elastic or static similarity. In contrast, the scaling of humeral length to body mass is more closely associated with geometric similarity: no clade follows elastic similarity, two clades follow geometric similarity, and four are negatively allometric (and therefore fall below all theoretical models).
[Figure 3 caption: All three comparisons plot log total stylopodial circumference against log body mass in the mammalian sample of the dataset. Size class comparisons are based on previously studied thresholds discussed in the text [78,93,94]: mammals above and below 20 kg (A), 50 kg (B), and 100 kg (C). SMA, standardized major axis.]
[Table 4 legend: The particular theoretical scaling model (Sim.) followed by the slope is represented by G, geometric similarity; E, elastic similarity; or S, static similarity. Scaling patterns that fall between models are represented by > or <, and those that do not follow any pattern (that is, above or below all predicted models) are represented by 0. BM, body mass; C_F, femoral circumference; C_H, humeral circumference; C_H+F, total humeral and femoral circumference; L_F, femoral length; L_H, humeral length.]
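One hypothetical way to assign a slope to the Sim. categories is to check which model predictions fall inside its 95% confidence interval. The sketch below assumes log body mass as the response and uses the standard circumference predictions for geometric, elastic, and static similarity (slopes of 3, 8/3, and 5/2). The exact decision rule behind the tables is not given in this section, so a generic "between" label stands in for the > and < notation.

```python
# Hypothetical decision rule for the "Sim." column (circumference, BM as response).
PREDICTED_SLOPES = {"G": 3.0, "E": 8.0 / 3.0, "S": 2.5}


def classify_slope(ci_low: float, ci_high: float,
                   predicted: dict[str, float] = PREDICTED_SLOPES) -> str:
    """Return the model code(s) whose prediction lies inside [ci_low, ci_high]."""
    inside = [code for code, value in predicted.items() if ci_low <= value <= ci_high]
    if inside:
        return "/".join(inside)
    values = sorted(predicted.values())
    if ci_high < values[0] or ci_low > values[-1]:
        return "0"        # above or below every model
    return "between"      # lies between two model predictions without including either


print(classify_slope(2.60, 2.72))  # interval includes 8/3 only
print(classify_slope(2.70, 2.80))  # between elastic (8/3) and geometric (3)
print(classify_slope(3.10, 3.30))  # above all three predictions
```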
Only two groups (Reptilia and Ungulata) are significantly above geometric similarity and therefore exhibit an allometric pattern whereby the humerus becomes relatively shorter as body size increases, approaching a more elastic pattern. A similar pattern is present in the scaling of femoral measurements with body mass. These patterns suggest that circumference measurements tend towards the allometric models proposed by McMahon [95,96], whereas length measurements follow a pattern that, in general, cannot be differentiated from isometry when compared to body mass. The results presented here reveal that general scaling patterns of limb circumference in numerous terrestrial vertebrates, though not always strictly elastic (as defined by McMahon), follow consistent allometric trajectories. Such allometric relationships indicate that, interspecifically, limbs become more robust as animals get larger, with circumference increasing faster relative to body mass than geometric similarity predicts. These changes in limb architecture in relation to size support the dynamic similarity hypothesis proposed by Rubin and Lanyon [80], which predicts changes in limb structure in order to maintain safety factors [86]. The morphological changes in limb skeletal structure suggested by Rubin and Lanyon [80] are not the only shifts to occur with size, and likely work in concert with other shifts, such as postural and behavioral changes [80,84,86,88], to mitigate the response of safety factors to changes in body size. It is important to note in this respect that this study only examines the external dimensions of the bones, and that factors such as posture may influence aspects of cross-sectional bone shape (such as the relative proportions of the anteroposterior and mediolateral diameters) and internal bone distribution that are not captured here.
Nevertheless, the highly conserved relationships between individual and total humeral and femoral circumference and body mass suggest that, in terrestrial quadrupeds, external circumference measurements of the stylopodia are largely independent of posture and gait and are most strongly associated with size. This allows us to put forward the hypothesis that stylopodial circumference is more closely associated with body mass than with the type of force (that is, compression or torsion) acting on the limb. Our results therefore present regressions that are most suitable for body mass estimation of extinct terrestrial quadrupedal vertebrates, regardless of the group under consideration.

Stylopodial scaling as a predictor of body mass
As body mass is correlated with numerous physiological and ecological properties (for example, [4,97]), consistent and accurate estimation of body mass in extinct taxa is important when attempting to reconstruct the dynamics of paleoecosystems and the life history of extinct taxa. The use of skeletal scaling to estimate body mass is common for extinct mammals and birds (for example, [17,41,42,45,98]); however, it is less common for extinct non-avian archosaurs and non-mammalian synapsids ([48,73,99] being notable exceptions). Scaling methods are often criticized when models are extended to more distantly related stem taxa, on the grounds of uneven taxon sampling (ungulate bias), questionable applicability to animals of different gaits and limb postures, and susceptibility to residual and extreme outliers [51,70,72,82]. Our dataset allows us to address these major criticisms with empirical data.

Ungulate uniqueness and bias
Ungulates, and specifically artiodactyls or bovids, are considered to exhibit scaling patterns distinct from those of other mammals. In particular, their limbs are considered to follow an elastic trend [70,77,78,93,96,100]. In addition to finding elastic trends in other mammalian clades and in reptiles, we reject previous interpretations that limb scaling in ungulates is strictly elastic. In the sample of 41 ungulates examined here (including 34 artiodactyls, of which 20 are bovids), elastic similarity was recovered only in the scaling of humeral circumference to body mass, a pattern also noted in most other clades (Table 1). Scaling of other limb measurements in ungulates either cannot be differentiated from geometric similarity or follows allometric patterns significantly different from either theoretical model (Table 1, Sim. = 0). These patterns are robust even when assessed at more exclusive taxonomic levels (artiodactyls or bovids) [See Additional file 4, Table S3]. As a result, a strict relationship between stylopodial scaling patterns and a cursorial lifestyle does not characterize ungulates to the exclusion of other mammalian clades, and cursorial adaptations in the limbs of ungulates may be limited to other stylopodial measurements (for example, diameter) or to more distal limb bones [83,93]. The different patterns of limb scaling observed in ungulates compared to other mammals [70,77,78] are often used to cast doubt on the utility of the Anderson method for estimating body mass in extinct taxa. Our new data confirm some differences in limb scaling between ungulates and other mammalian clades, but only in comparisons of limb proportions (length to circumference) and of length to body mass (Figure 1; Table 2). Circumference to body mass relationships have very high coefficients of determination and show no significant differences between ungulates and other groups of mammals.
The combined circumference of the stylopodia revealed the strongest relationship to body mass (Figure 4A) and shows that a bias towards ungulates does not significantly alter the relationship; ungulates follow the same scaling relationships of this variable to body mass as other mammals, as well as non-avian reptiles.

Limb scaling patterns at different gaits and limb postures
Extant terrestrial vertebrates have a variety of gaits and limb postures [79,80]. In vivo strain studies have also shown that in mammals the limbs of smaller taxa are primarily loaded in tension, whereas compression predominates in larger taxa as a result of postural differences with size (also related to the dynamic similarity hypothesis). Such differences are also noted between reptiles and mammals: reptiles hold their limbs in a sprawling posture, and hence their stylopodia are generally loaded under tension [79]. Given these postural differences, it was hypothesized that the scaling of limb robusticity with body mass should vary in response to differences in limb loading [84,85]. Comparisons made here between differently sized mammals, as well as between mammals and reptiles, reveal significant differences in limb proportions and in the relationships between length and body mass (Figures 2 and 3; Tables 2 and 5), supporting previous studies [78,85,94]. Surprisingly, however, the relationships between limb circumference and body mass are conserved: no significant differences in circumferential scaling were observed between differently sized animals or between mammals and reptiles. Furthermore, we find limited evidence for geometric similarity of limb robusticity in both small and large size class samples. Instead, circumference measurements follow a generally negative allometric pattern, indicating a consistent increase in circumference relative to body size in both small and large mammals. The total stylopodial circumference (Figure 4A) provides the strongest relationship (R² = 0.9861), suggesting that this variable is a strong predictor of body size for parasagittal and sprawling taxa alike, and that combined limb circumference is not strongly correlated with limb posture and gait.
These results concur with other studies on non-avian reptiles [84] and birds [101] that have shown remarkable morphological similarities of limb circumference (or diameter) between taxa with highly variable limb posture.

Outliers
The final criticism made of the use of skeletal scaling methods, such as the Anderson method, to estimate body mass relates to the effect that outliers have on the final predictive equation, especially at large body sizes where sample sizes are low [82]. In the relationship between combined humeral and femoral circumference and body mass, a residual outlier test reveals that none of the largest animals in our greatly expanded dataset, including the buffalo, hippopotamus, and elephant, are residual outliers (Figure 4A). The only outliers identified here appear to be related to unique ecologies, such as suspensory locomotion (Choloepus didactylus) and burrowing (Priodontes maximus, Condylura cristata, Parascalops breweri); because of their highly derived limb morphologies, such ecologies can generally be inferred from skeletal anatomy and flagged as potential confounding factors for mass estimation [102]. Both representatives of Soricomorpha, C. cristata and P. breweri, are the farthest residual outliers and, due to their especially apomorphic anatomy, were removed from the body mass equation. Only one residual outlier, the turtle Trachemys scripta, is difficult to explain; its relatively high weight may reflect captivity or measurement error when the live weight was taken.
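The residual outlier test used is not specified in the text. As a minimal sketch, assuming internally studentized residuals from the log-log OLS fit and an illustrative cutoff of |r| > 3 (both are assumptions, and the helper names are hypothetical), such a screen might look like this in Python:

```python
import math


def ols_fit(x, y):
    """Ordinary least squares fit of y = m * x + b; returns (m, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    m = sxy / sxx
    return m, my - m * mx


def studentized_residuals(x, y):
    """Internally studentized residuals of a simple OLS regression of y on x."""
    n = len(x)
    m, b = ols_fit(x, y)
    resid = [yi - (m * xi + b) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))  # residual standard error
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    out = []
    for xi, e in zip(x, resid):
        h = 1 / n + (xi - mx) ** 2 / sxx  # leverage of this observation
        out.append(e / (s * math.sqrt(1 - h)))
    return out


def flag_residual_outliers(names, log_circ, log_mass, cutoff=3.0):
    """Names whose |studentized residual| exceeds `cutoff` (an assumed choice)."""
    r = studentized_residuals(log_circ, log_mass)
    return [name for name, ri in zip(names, r) if abs(ri) > cutoff]
```

In use, `log_circ` would hold log10 of the combined humeral and femoral circumference and `log_mass` log10 of body mass for each species.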
A recent study by Packard et al. [82] suggested that, because of its amphibious lifestyle, Hippopotamus amphibius may have a high body mass relative to its limb circumference and may therefore represent a residual outlier whose removal from the analysis would be justified. This assertion is based on the observation that if the raw (non-log) data of Anderson et al. [73] are regressed using non-linear least squares, the hippopotamus, the bison, and the elephant are all outliers. The statistical merits and flaws of logarithmically transforming data have been heavily debated (for example, [81,82,103,104]) and will not be discussed further here. However, following the suggestions of Packard et al. [82], we regressed our non-log-transformed expanded dataset using non-linear least squares regression, implemented with the 'nls' function in R, and tested for potential outliers in the residual variance. The results indicate that 40 species are outliers in the non-log residual data. To test whether these species have a significant effect, we removed the 40 outliers and re-ran the log-log ordinary least squares (OLS) regression, which yielded a slope of 2.802 ± 0.055, statistically indistinguishable from that obtained using the complete dataset. This suggests that these data points do not significantly affect the final result. More importantly, the mean percent prediction error (PPE) indicates that, despite the need for back-transformation, the log-transformed linear regression is a significantly better model for predicting body mass than a non-linear model (log PPE = 25% ± 3%; non-log PPE = 43% ± 3%; Figure 4B; two-tailed t-test: t = -8.3245, P << 0.0001).
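The comparison between the log-log OLS model and a non-linear fit on the raw scale can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the raw-scale fit is a plain Gauss-Newton least-squares fit of BM = a·C^b (Gauss-Newton is also the default algorithm of R's `nls`), PPE is assumed to be |observed - predicted| / predicted × 100, and all function names are hypothetical.

```python
import math


def fit_loglog(x, y):
    """OLS on log10 data: log10(y) = m * log10(x) + b. Returns (m, b)."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    m = (sum((a - mx) * (c - my) for a, c in zip(lx, ly))
         / sum((a - mx) ** 2 for a in lx))
    return m, my - m * mx


def fit_power_nls(x, y, a, b, iterations=100):
    """Gauss-Newton least squares for y = a * x**b on the raw (non-log) scale.

    (a, b) are starting values, e.g. back-transformed log-log estimates."""
    for _ in range(iterations):
        j1 = [xi ** b for xi in x]                     # d(model)/da
        j2 = [a * xi ** b * math.log(xi) for xi in x]  # d(model)/db
        res = [yi - a * xi ** b for xi, yi in zip(x, y)]
        s11 = sum(u * u for u in j1)
        s12 = sum(u * v for u, v in zip(j1, j2))
        s22 = sum(v * v for v in j2)
        g1 = sum(u * e for u, e in zip(j1, res))
        g2 = sum(v * e for v, e in zip(j2, res))
        det = s11 * s22 - s12 * s12
        if det == 0:
            break
        # Solve the 2x2 normal equations for the parameter update.
        da = (s22 * g1 - s12 * g2) / det
        db = (s11 * g2 - s12 * g1) / det
        a, b = a + da, b + db
        if abs(da) <= 1e-12 * abs(a) and abs(db) <= 1e-12:
            break
    return a, b


def mean_ppe(observed, predicted):
    """Mean percent prediction error, |obs - pred| / pred * 100 (assumed definition)."""
    return sum(abs(o - p) / p * 100 for o, p in zip(observed, predicted)) / len(observed)


def compare_models(circ, mass):
    """Mean PPE of the back-transformed log-log OLS fit and of the raw-scale fit."""
    m, b = fit_loglog(circ, mass)
    pred_log = [10 ** (b + m * math.log10(c)) for c in circ]
    a, beta = fit_power_nls(circ, mass, 10 ** b, m)
    pred_raw = [a * c ** beta for c in circ]
    return mean_ppe(mass, pred_log), mean_ppe(mass, pred_raw)
```

With real data, `circ` would hold the combined circumferences and `mass` the live body masses. On exact power-law data both models recover the same exponent, so differences in PPE only appear once there is scatter around the line; the raw-scale fit is dominated by the largest animals because it minimizes absolute rather than proportional residuals.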
Extreme outliers, those at the upper and lower extremes of the dataset, also have the potential to significantly affect regression results. In the current dataset, there are no extreme outliers when the data are log transformed. However, as is generally the case with extant size data, there are several positive extreme outliers in the non-log dataset: thirty-three extreme outliers are observed in the body mass and combined humeral and femoral circumference data. When these taxa are removed and the log-log analysis is re-run (m = 2.745 ± 0.057, b = -1.099 ± 0.09), the regression is virtually identical to that obtained with the total dataset. The observation that extreme positive values do not affect the log-log OLS regression is further supported by the non-significant variation in scaling coefficients between different mammalian size classes (Figure 3). The empirical data presented here falsify the main criticisms put forward against skeletal-body mass regression models for predicting body mass in extinct taxa and, given the highly conserved nature of the relationship between stylopodial circumference and body mass in extant terrestrial mammals and reptiles, suggest that circumference measurements represent robust proxies of body mass that can be applied to extinct, phylogenetically and morphologically disparate quadrupedal terrestrial amniotes. An examination of eight terrestrial lissamphibian species (one caudatan and seven anurans [Additional file 1 Dataset]; not included in the final analysis) reveals that, based on their total stylopodial circumference and body mass, they plot within the range of variation present in the mammalian and reptilian dataset (Figure 2).
Although at this time their small sample and range preclude any meaningful statistical comparisons between the limb scaling patterns of lissamphibians and other tetrapods, these preliminary results suggest that the conserved relationship between body mass and proximal limb bone circumference could be extended to encompass the majority of quadrupedal terrestrial tetrapods.
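The text does not state how extreme outliers were identified. One common convention, assumed here, treats values beyond Tukey's outer fences (more than 3 × IQR beyond the quartiles) as extreme. The sketch below also illustrates why a strongly right-skewed variable can have extreme values on the raw scale but none after log transformation; the function names are hypothetical.

```python
import math
import statistics


def extreme_outliers(values, k=3.0):
    """Indices of values beyond Tukey's outer fences, Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]


def raw_vs_log_outliers(values):
    """Extreme-outlier indices on the raw scale and on the log10 scale."""
    return extreme_outliers(values), extreme_outliers([math.log10(v) for v in values])
```

For a toy sample such as 1, 2, ..., 9 and 100, the value 100 lies beyond the upper fence on the raw scale but inside it after log10 transformation.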

Implications for body mass estimation
In extinct taxa, skeletal measurements are often used as proxies of body size in preference to actual body mass estimates. Of the limb measurements taken here, the regression of body mass on the total circumference of the humerus and femur exhibits the strongest relationship, with the highest R² value and the lowest PPE, standard error of the estimate (SEE), and Akaike Information Criterion (AIC) values of all bivariate regression models (Figure 4B; Additional file 5, Table S4). Femur length is among the most commonly cited proxies of size (for example, [15]). However, our analyses indicate that length measurements are generally poor indicators of size, especially compared to circumference (Figure 4B). Femur length exhibits an especially high amount of error, with a mean PPE of 70% in living mammals and reptiles, compared to 25% for the combined humeral and femoral circumference. Caution should therefore be taken when using limb length as a size proxy, especially when examining taxa that span a wide phylogenetic bracket.
Based on our results, we propose the following scaling equation as a robust predictor of body mass in quadrupedal tetrapods:

$$\log BM = 2.749 \cdot \log C_{H+F} - 1.104 \qquad (1)$$

where $C_{H+F}$ is the sum of the minimum humeral and femoral circumferences. This regression exhibits a very high coefficient of determination ($R^2 = 0.988$) and a mean PPE of 25.6%. When adjusted for phylogenetic correlation/covariance between observations (that is, species) using a phylogenetic generalized least squares model, the equation becomes:

$$\log BM = 2.754 \cdot \log C_{H+F} - 1.097 \qquad (2)$$

which has an almost identical mean PPE (25%) to Equation 1 (Figure 4B).
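Applying Equations 1 and 2 is a one-line calculation. The sketch below assumes body mass in grams and circumferences in millimetres (units are not stated in this section) and reads the 25% mean PPE as a symmetric ±25% band around the estimate; the function name is hypothetical.

```python
import math

# (slope, intercept) of Equations 1 and 2 as reported in the text:
# log10(BM) = slope * log10(C_H+F) + intercept.
# ASSUMPTION: BM in grams and circumferences in millimetres.
EQUATIONS = {
    "eq1_ols": (2.749, -1.104),
    "eq2_pgls": (2.754, -1.097),
}


def estimate_body_mass(humerus_circ, femur_circ, equation="eq2_pgls", ppe=0.25):
    """Return (estimate, lower, upper) from minimum humeral + femoral circumference.

    The range is a symmetric +/- `ppe` band around the estimate (an assumed
    reading of the 25% mean PPE used for error ranges)."""
    slope, intercept = EQUATIONS[equation]
    c_total = humerus_circ + femur_circ  # C_H+F
    bm = 10 ** (slope * math.log10(c_total) + intercept)
    return bm, bm * (1 - ppe), bm * (1 + ppe)
```

For a hypothetical specimen, `estimate_body_mass(200.0, 250.0)` returns the Equation 2 estimate together with its lower and upper bounds.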
In addition to examining bivariate estimates of body mass, we tested the predictive power of a variety of multiple regressions by comparing their PPE, SEE, and AIC with those obtained from the bivariate regression of total circumference with body mass. Analyses including all proximal limb bone measurements (humeral and femoral lengths, $L_H$ and $L_F$, and circumferences, $C_H$ and $C_F$) also yield low error statistics, both for the raw data:

$$\log BM = 0.375 \cdot \log L_H + 1.544 \cdot \log C_H - 0.136 \cdot \log L_F + 0.954 \cdot \log C_F - 0.351 \qquad (3)$$

and for the phylogenetically corrected data:

$$\log BM = 0.212 \cdot \log L_H + 1.347 \cdot \log C_H - 0.533 \cdot \log L_F + 0.749 \cdot \log C_F - 0.76 \qquad (4)$$

Equally low error statistics were obtained for the multiple regression including only the circumference measurements, for the raw data:

$$\log BM = 1.78 \cdot \log C_H + 0.939 \cdot \log C_F - 0.215 \qquad (5)$$

and for the phylogenetically corrected data:

$$\log BM = 1.54 \cdot \log C_H + 1.195 \cdot \log C_F - 0.234 \qquad (6)$$

None of the equations presented above is significantly better at predicting body mass than the combined humeral and femoral circumference (Equations 1 and 2); therefore, any of these equations is likely to provide robust estimates of body mass (Figure 4B). However, because Equations 2, 4, and 6 account for phylogenetic non-independence, they are likely to represent the statistical error in the data better than the equations derived from the non-phylogenetically corrected data (Equations 1, 3, and 5).
Not surprisingly, the masses estimated by Equation 2 for several commonly cited non-avian dinosaurs are more consistent with estimates generated from Anderson et al. [73] than with volumetric model-based estimates for the same taxa (Table 6). This technique is also important in that it is specimen-based, and therefore explicit and repeatable, and allows uncertainty to be expressed in the estimate. These predicted masses and prediction error ranges, when compared to previous estimates based on volumetric reconstructions [49,51,71], show that many reconstructed models underestimate body mass, sometimes falling well below the lower bound of the range defined by the mean PPE (Table 6). Given that life reconstructions of extinct taxa are important for addressing several biological questions, including locomotion and weight distribution, our results provide the first objective framework with which to constrain these models and test whether their assumptions conform to the patterns seen in extant terrestrial tetrapods.

Conclusions
Body size is an important biological descriptor and, as a result, is critical to understanding the paleobiology of extinct organisms and ecosystems. This study presents an extensive dataset of extant quadrupedal terrestrial amniotes, which allows testing of the main criticisms that have been put forth against the use of scaling relationships to estimate body mass in extinct taxa. Our results demonstrate a highly conserved relationship between body mass and stylopodial circumference with minimal variation between clades and groups of different gait and size, compared over a large phylogenetic scope. This general relationship allows the estimation of body mass in extinct quadrupedal groups, and is particularly important for a wide range of paleobiological studies, including growth rates [31], metabolism [36], and energetics [105], as well as for quantifying body size changes across major evolutionary transitions that are accompanied by major changes in gait, including shifts in the early evolutionary history of archosaurs [106], and in the evolution of mammals from reptile-like basal synapsids [107,108].

Database construction
In order to test the hypotheses outlined in the introduction, we amassed an extensive dataset of limb bone measurements of 200 mammal and 47 non-avian reptile species from individuals that were weighed on a scale either prior to death or prior to skeletonization; no extant body masses were estimated. For the most part, the dataset was built with newly measured specimens; however, it was augmented with published measurements from Christiansen and Harris [109] and Anderson et al. [73] [See Additional file 1, Dataset]. Measurements were taken from stylopodial elements and included maximum length and minimum circumference. Lengths less than 150 mm were measured with digital callipers, lengths between 150 and 300 mm with longer dial callipers, and lengths greater than 300 mm with fiberglass measuring tape. Following the Anderson method, we use minimum circumference (the thinnest region along the diaphysis) as a proxy for limb robusticity. In addition to allowing us to reproduce the analysis presented by Anderson et al. [73], minimum circumference should provide a proxy of the minimum cross-sectional area of the bone and should therefore be related to the overall compressive strength of the limb. Cross-sectional area was not used due to the cost of collecting these data. Moreover, circumference can be more easily measured on both extant and fossil samples, providing a larger extant dataset and a more inclusive framework for future predictive studies. Circumference measurements were taken with thin paper measuring tapes of different widths, depending on the size of the specimen being measured. All measurements were taken from both sides of the specimen, where possible, and averaged. All specimens measured are of adult body size. For most of the mammalian sample, the ontogenetic status of each specimen was determined based on the level of epiphyseal fusion.
For the non-avian reptile sample, as well as some of the largest mammals, maturity was established by verifying that the body mass of the measured specimen is similar to published reports of average body masses for that species (for example, [84,110-112]). In general, only a single specimen of each species could be obtained; where more than one adult individual was available, the largest individual was used. In these cases, none of the exemplars used seem unusually large compared to the reported adult body mass in that species. Finally, this study compares taxa with different growth strategies (mammals have determinate growth, whereas growth in reptiles is generally considered indeterminate but asymptotic [113]), which may result in differences in size structuring within and between populations of taxa with these different strategies. Whether, and how, these differences affect limb-to-body-mass scaling analyses is currently unknown. However, the masses of the reptiles used here fall within the range of what is considered typical for an adult of each species and, given our large sample and the nature of our results, we expect these effects to be minimal, although they may warrant future consideration.

Table 6 notes: Body masses estimated in this study are based on the phylogenetically corrected total stylopodial circumference equation (Equation 2), and the error range is based on the 25% mean prediction error obtained from that equation. References: A1985, Anderson et al. [73]; C1962, Colbert [46]; H1999, Henderson [51]; P1997, Paul [71]; S2001, Seebacher [49]. Museum abbreviations are given in the dataset file [See Additional File 1 Dataset]. *, limb measurements based on a cast mounted at the Senckenberg Museum, Frankfurt, Germany; †, measurements taken from Anderson et al. [73]; ‡, measurements from Redelstorff and Sander [145]; §, all estimates presented under A1985 are based on the equations presented in that study but use the limb measurements presented in dataset S1, the only exception being B. brancai, which is based on data from A1985; ∫, estimate from Henderson [63].

Taxon sampling
Taxa were chosen based on three criteria:
1) The dataset must include a large range in body mass, so that size-related postural differences can be assessed [83,114]. We significantly expand upon the dataset of Anderson et al. [73], especially for large-bodied mammalian species, to better represent the range of variation in limb proportions at large sizes and to address the contention that certain large taxa are residual outliers [82]. Owing to the practical limitations of measuring limb bone circumference in small specimens, taxa below 50 g were not included in this study.
2) The sample must encompass a wide phylogenetic scope, so that most major mammalian and reptilian clades are sampled.
3) The sample must include taxa from a broad spectrum of lifestyles. Our study focuses on terrestrial taxa; however, we have also included mammalian and reptilian taxa with specialized lifestyles that have the potential to affect limb proportions and their relationship with body size. These include saltators (Macropodidae), brachiators (Hylobates lar and Pongo pygmaeus), burrowers (for example, Talpidae), and amphibious taxa (Hippopotamidae and Crocodylia). The first three categories are associated with salient morphological features that allow these lifestyles to be recognized in the fossil record; however, the amphibious nature of several extinct taxa remains uncertain, and buoyancy may affect how limb measurements scale with body mass. Avian taxa were not included in the current study because they are bipedal. The forces exerted by body mass in a biped are transmitted through two limbs compared to four in a quadruped, and therefore direct comparisons of limb to body mass scaling between birds and quadrupedal tetrapods are difficult to interpret. A small sample of lissamphibians (one caudatan and seven anurans) for which live body mass is known was examined in this study. Unfortunately, the current sample size does not provide enough power to make meaningful slope and intercept comparisons, and lissamphibians are not included in the main comparisons presented in the Results.

Statistical analyses
The distributions of the variables used in this study are all positively skewed and therefore depart strongly from normality; as such, all variables were logarithmically transformed (base 10) so that they more closely approximate a normal distribution. In addition to improving normality, log transformation reduces heteroscedasticity in the data set, minimizes the effect of extreme outliers, and allows the data to be visualized in a linear fashion, which simplifies visual comparisons of slopes [81,115]. The benefits and complications of applying log transformation in predictive scaling relationships were recently debated by Packard et al. [82] and Cawley and Janacek [104]. We agree with the latter study, which demonstrated that log-transformed data are preferable for this type of analysis because the regression assigns equal weight to all data points rather than being dominated by the upper extreme values and, furthermore, the residuals are not significantly related to size [104].

Interspecific limb scaling
All measurements were incorporated into a variety of bivariate plots and analyzed using the standardized major axis (SMA) line-fitting method (also known as reduced major axis) [116]. The analyses compare a variety of measurements, including: 1) limb proportions, such as femur length to humerus length and humerus/femur length to circumference; and 2) limb measurements to body mass, such as humerus/femur length versus body mass and humerus/femur circumference versus body mass. All SMA analyses were conducted using the open-source software R [117] and the package 'smatr' [116,118].
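For reference, the SMA line has a closed form: its slope is the ratio of the standard deviations of y and x, signed by the correlation, and the line passes through the bivariate mean. A minimal Python sketch (hypothetical function names; the study itself used R's 'smatr', whose `sma()` function fits this line) is:

```python
import math


def sma_fit(x, y):
    """Standardized (reduced) major axis fit y = m * x + b.

    Slope = sign(r) * sd(y) / sd(x); the line passes through (mean x, mean y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((c - my) ** 2 for c in y)
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    m = math.copysign(math.sqrt(syy / sxx), sxy)
    return m, my - m * mx


def ols_slope(x, y):
    """OLS slope of y on x, for comparison: equals r * sd(y) / sd(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return (sum((a - mx) * (c - my) for a, c in zip(x, y))
            / sum((a - mx) ** 2 for a in x))
```

Because the OLS slope equals r · sd(y)/sd(x), the SMA slope is the OLS slope divided by |r|, so the two converge as the correlation approaches 1. Unlike OLS, SMA is symmetric: swapping x and y gives the reciprocal slope.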
To address the criticisms raised against the Anderson method, subgroups within the data were compared. These include comparisons between mammalian clades for which a sample size greater than ten could be obtained, such as Ungulata, Carnivora, Marsupialia, Euarchonta, and Glires. In addition, comparisons were made between different size classes. Size class comparisons were based on three body mass thresholds: 20 kg, which was previously used by Economos [94] to show differential scaling in mammals and is also thought to represent the lower size limit for migratory mammals, and hence may affect limb scaling patterns [4]; 50 kg, a threshold at which mammalian limb scaling has been previously noted to vary [93]; and 100 kg, previously used by Bertram and Biewener [78], which allows better representation of the large-bodied portion of the dataset.
Fitted lines of different subsamples were compared based on the 95% confidence intervals of the slope and intercept, and differences were considered significant when the intervals did not overlap. However, because statistical significance can still be obtained even when confidence intervals overlap [81], we also conducted a series of pairwise comparisons of the slopes and intercepts using a likelihood ratio test and a t-test, respectively. These tests have the added benefit that they can be corrected for multiple comparisons using the false discovery rate (FDR), an approach that, as far as we are aware, cannot be applied to confidence intervals [119,120]. The likelihood ratio test was implemented with the 'smatr' package [116,118]. Conventional methods for comparing intercepts (for example, ANCOVA, the Wald statistic, and traditional t-tests) alter the original intercepts by forcing a common slope onto each group being analyzed [115,116]. Although this may make statistical sense [116], it shifts the best-fit lines away from the original biological data. As a result, here we compare intercepts using a two-tailed t-test based on equation 18.25 of Zar [115]:

$$t = \frac{b_1 - b_2}{SE_{SMA}}$$

where $b_1$ and $b_2$ represent the pair of intercepts being compared, and $SE_{SMA}$ is the standard error of the difference in SMA intercepts, calculated as per equation 18.26 of Zar [115]. Comparing intercepts using this method has the added benefit of allowing comparisons of y-values along the true SMA lines at x-values other than 0. This is advisable when comparing biological scaling lines because, first, the intercept at x = 0 is an extrapolation of the line beyond the range of the data [115] and, perhaps more importantly given the type of data used here, a value of x = 0 is biologically meaningless. As a result, in addition to presenting the results of the t-test at the true intercept, we compare y-values at the minimum value of the total dataset along the x-axis using the same t-test method.
The results of the two intercept comparison methods described above are presented, and all P-values are corrected using the FDR [119,120], implemented with the 'p.adjust' function in R. In total, 14 pairwise comparisons are made for each analysis.
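The FDR correction and the intercept t statistic described above can be sketched as follows. `bh_adjust` implements the Benjamini-Hochberg procedure, which is what R's `p.adjust` computes for `method = "fdr"` (an alias of `"BH"`); the standard error of the intercept difference is taken as an input rather than computed, and all names are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg FDR-adjusted p-values, as R's p.adjust(p, method = "fdr")."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])  # indices by ascending p-value
    adjusted = [0.0] * n
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(n, 0, -1):  # from the largest p-value down to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


def intercept_t(b1, b2, se_diff):
    """t statistic for the difference between two intercepts (Zar eq. 18.25 form).

    se_diff is the standard error of the difference (Zar eq. 18.26 in the text);
    it must be supplied, since it is not derived here."""
    return (b1 - b2) / se_diff


def height_at(m, b, x0):
    """y-value of a fitted line at x0, e.g. at the smallest x in the dataset."""
    return m * x0 + b
```

With 14 pairwise comparisons per analysis, `bh_adjust` would be applied to the 14 raw p-values from each set of tests.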
In addition to comparing limb scaling patterns between different groups, scaling coefficients were used to test theoretical scaling models, such as geometric (GS), elastic (ES), and static (SS) similarity [95,96]. The models predict the following relationships between circumference ($C$), length ($L$), and mass ($M$):
under GS: $C \propto L$; $M \propto L^{3}$; $M \propto C^{3}$
under ES: $C \propto L^{1.5}$; $M \propto L^{4}$; $M \propto C^{8/3}$
under SS: $C \propto L^{2}$; $M \propto L^{5}$; $M \propto C^{2.5}$
These models were tested against the empirical slopes obtained in this study using the method described by Warton et al. [116].
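The sketch below encodes the mass-circumference exponents predicted by each similarity model and a test of an observed SMA slope against one of them. It assumes the Pitman-type test described by Warton et al., in which the residual axis y - βx and the fitted axis y + βx are uncorrelated when the true slope equals β. Names are hypothetical, and the p-value step (for example `scipy.stats.f.sf(F, 1, n - 2)`) is left out to keep the block dependency-free; in 'smatr', the equivalent test is requested through the `slope.test` argument of `sma()`.

```python
import math

# Exponent b in mass ∝ circumference**b under each similarity model. ES follows from
# M ∝ L**4 and C ∝ L**1.5, so M ∝ C**(4 / 1.5) = C**(8/3).
MASS_CIRC_EXPONENT = {"GS": 3.0, "ES": 8.0 / 3.0, "SS": 2.5}


def slope_test(x, y, beta0):
    """Test H0: the SMA slope of y on x equals beta0.

    Under H0 the axes (y - beta0*x) and (y + beta0*x) are uncorrelated, so
    F = r**2 * (n - 2) / (1 - r**2) is compared with an F(1, n - 2) distribution.
    Returns (r, F)."""
    u = [yi - beta0 * xi for xi, yi in zip(x, y)]
    v = [yi + beta0 * xi for xi, yi in zip(x, y)]
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (c - mv) for a, c in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((c - mv) ** 2 for c in v)
    r = suv / math.sqrt(suu * svv)
    return r, r * r * (n - 2) / (1 - r * r)
```

For log body mass against log circumference, `slope_test(log_c, log_bm, MASS_CIRC_EXPONENT["GS"])` would test geometric similarity; r is exactly zero when beta0 equals the sample SMA slope.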

Phylogenetic independent contrasts
In addition to plotting the raw data, as was done by Anderson et al. [73], we calculated the phylogenetic independent contrasts (PIC) for the entire dataset in order to correct for non-independence of the raw data as a result of common ancestry [121]. We compared the scaling coefficients from the raw and phylogenetically corrected data to test if nonindependence significantly alters the scaling patterns obtained from the raw data. The phylogenetic tree [See Additional file 6, Figure S1] was constructed in Mesquite [122], based on recent phylogenetic analyses obtained for extant Mammalia [123], and non-avian reptiles [124-130]. Branch lengths are measured in millions of years. For the mammalian portion of the phylogeny we used the branch lengths of Bininda-Emonds et al. [123]. Branch lengths in the reptile portion of the tree were largely calculated using molecular estimates of divergence times [131-138]. However, species-level divergence times of some taxa, such as turtles, are poorly constrained, and as a result, we estimated the branch lengths based on the oldest known fossil occurrence for the species or genus obtained from the Paleobiology Database http://paleodb.org/.
Both theoretical and empirical studies of PIC state that, for contrasts to receive equal weighting and thereby conform to the assumptions of parametric analyses and statistics, branch lengths must be adjusted so that contrasts are standardized, and therefore show no significant relationship with their standard deviations [139]. This criterion was not met by the raw branch lengths, but was achieved by transforming the branch lengths by their natural log. Branch lengths were assigned and transformed in Mesquite, and the tree file was imported into R, where contrasts were calculated using the 'APE' package [140]. A best-fit line was calculated for the contrasts using SMA in the package 'smatr' [116], which allows the line to be forced through the origin, as stipulated by Garland et al. [139]. The PIC slopes for the entire dataset and subsets (as described above) were compared to slopes obtained from the raw data using the 95% confidence intervals.
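Felsenstein's independent contrasts, followed by an SMA fit forced through the origin, can be sketched as below for a fully bifurcating tree. Assumptions: the nested-tuple tree encoding and function names are hypothetical, and branch lengths are used as given, so any transformation must be applied beforehand and must leave every length positive. In R, 'ape' provides contrasts through its `pic()` function.

```python
import math


def pic(tree, traits):
    """Standardized independent contrasts, one per internal node.

    tree: (label, branch_length); a leaf's label is a taxon name (str) and an
    internal node's label is a 2-tuple of child subtrees. The root length is ignored.
    traits: dict mapping taxon name -> trait value (e.g. log10 circumference)."""
    contrasts = []

    def visit(node):
        label, length = node
        if isinstance(label, str):
            return traits[label], length
        (x1, v1), (x2, v2) = visit(label[0]), visit(label[1])
        contrasts.append((x1 - x2) / math.sqrt(v1 + v2))  # standardized contrast
        x = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)       # weighted ancestral value
        return x, length + v1 * v2 / (v1 + v2)            # lengthen the stem branch

    visit(tree)
    return contrasts


def sma_through_origin(cx, cy):
    """SMA slope for contrasts, with the line forced through the origin."""
    sxx = sum(a * a for a in cx)
    syy = sum(b * b for b in cy)
    sxy = sum(a * b for a, b in zip(cx, cy))
    return math.copysign(math.sqrt(syy / sxx), sxy)
```

Because contrasts for both variables are computed in the same traversal order, their signs are paired consistently, which is what the through-origin fit requires.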

Body mass estimation
In order to provide the best estimation parameters for body mass, a Model I (OLS) regression analysis is preferred. It is the most appropriate model for estimating a value of y from x, as it attributes all of the residual error to the y variable [81,141]. The analysis was performed on the entire dataset (N = 247) between body mass and a variety of limb measurements in order to identify the best predictor. The goodness of fit of each predictor was examined using the commonly reported coefficient of determination (R²); however, this value is considered a poor representation of the strength of a regression, due largely to its strong association with sample size [103]. Therefore, given the large dataset presented here, we provide three additional metrics: the SEE, the PPE, and the AIC. The mean PPE is perhaps the best metric of regression strength for these types of analyses, as it measures predictive strength on the original (non-log) scale. In addition, the PPE allows confidence intervals to be calculated around the mean, which facilitates comparison between the mean PPEs of different models.
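The four metrics discussed above can all be computed from a single log-log OLS fit, as sketched below. Assumptions, since the text does not give formulas: PPE is |observed - predicted| / predicted × 100 on the back-transformed scale, its 95% CI uses a normal approximation (mean ± 1.96 SE), and AIC is the Gaussian form n·ln(RSS/n) + 2k, which omits an additive constant and so is only comparable across models fitted to the same data. The function name is hypothetical.

```python
import math
import statistics


def regression_metrics(log_x, log_y):
    """OLS of log10(BM) on a log10 limb measurement, with R^2, SEE, mean PPE (+95% CI) and AIC."""
    n = len(log_x)
    mx, my = statistics.fmean(log_x), statistics.fmean(log_y)
    sxx = sum((a - mx) ** 2 for a in log_x)
    m = sum((a - mx) * (c - my) for a, c in zip(log_x, log_y)) / sxx
    b = my - m * mx
    fitted = [m * a + b for a in log_x]
    rss = sum((c - f) ** 2 for c, f in zip(log_y, fitted))
    tss = sum((c - my) ** 2 for c in log_y)
    see = math.sqrt(rss / (n - 2))  # standard error of the estimate, in log10 units
    # Percent prediction error after back-transforming to the original scale (assumed definition).
    ppe = [abs(10 ** c - 10 ** f) / 10 ** f * 100 for c, f in zip(log_y, fitted)]
    mean_ppe = statistics.fmean(ppe)
    half = 1.96 * statistics.stdev(ppe) / math.sqrt(n)  # normal-approximation 95% CI
    k = 3  # estimated parameters: slope, intercept, residual variance
    aic = n * math.log(rss / n) + 2 * k  # Gaussian AIC without the additive constant
    return {"slope": m, "intercept": b, "r2": 1 - rss / tss, "see": see,
            "mean_ppe": mean_ppe, "ppe_ci": (mean_ppe - half, mean_ppe + half), "aic": aic}
```

Two candidate proxies (for example, femur length versus combined circumference) can then be compared by their `mean_ppe` intervals and `aic` values, provided both are fitted to the same species.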
In addition to the OLS bivariate regression outlined above, we included all limb measurements into a suite of multiple regression analyses and, given that this technique is highly recommended [43,47,142], tested if they are significantly better predictors of body mass than bivariate regressions. The predictive accuracy of each analysis was compared using SEE, PPE, and AIC. Finally, because none of the bivariate or multiple regressions account for correlation and covariance of morphology between taxa as a result of phylogenetic history, we re-analyzed the data using a phylogenetic generalized least squares approach [143], a method recently applied to estimate body mass in extinct bovids [144]. Application of this method is based on the same phylogenetic tree, branch lengths [See Additional file 6, Figure S1], and a Brownian motion model of evolution. This approach was implemented using the 'APE' and 'nlme' packages in R.
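A minimal generalized least squares sketch shows how the multiple regressions and the PGLS fits relate: with an identity covariance matrix it reduces to ordinary (multiple) least squares, and with a Brownian-motion covariance matrix (each entry equal to the root-to-common-ancestor path length shared by a pair of species) it gives PGLS. Function names are hypothetical; in R, the corresponding fit is usually done with `nlme::gls` and a `corBrownian` correlation structure from 'ape'.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [p - f * q for p, q in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def gls(X, y, V):
    """Generalized least squares: beta = (X' V^-1 X)^-1 X' V^-1 y.

    X: design matrix rows (e.g. [1, log C_H, log C_F]); y: log body masses;
    V: covariance among species (identity for OLS, Brownian motion for PGLS)."""
    n, p = len(X), len(X[0])
    vinv_cols = [solve(V, [X[i][j] for i in range(n)]) for j in range(p)]  # V^-1 X
    vinv_y = solve(V, y)                                                     # V^-1 y
    xtvx = [[sum(X[i][r] * vinv_cols[c][i] for i in range(n)) for c in range(p)]
            for r in range(p)]
    xtvy = [sum(X[i][r] * vinv_y[i] for i in range(n)) for r in range(p)]
    return solve(xtvx, xtvy)
```

For Equations 5 and 6, each row of `X` would be [1, log C_H, log C_F]; for a four-species tree ((A:1,B:1):1,(C:1,D:1):1), the Brownian matrix is [[2,1,0,0],[1,2,0,0],[0,0,2,1],[0,0,1,2]].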

Additional material
Additional file 1: Limb measurement and body mass data. Table of measurements of all the extant taxa used in the present study, as well as the limb measurements of the non-avian dinosaurian taxa shown in Table 6.
Additional file 2: Table S1. Raw and PIC stylopodial scaling in a subset of the mammalian dataset and non-avian reptiles. Mammalian subset corresponds to all taxa < 168 kg in order to better approximate body mass range in the sample of non-avian reptiles. Standardized Major Axis equation shown in the format y = mx + b (b = 0 in PIC). The particular theoretical scaling model (Sim.) followed by the slope is represented by G, geometric similarity, E, elastic similarity, or S, static similarity. Scaling patterns that fall between models are represented by > or <, and those that do not follow any pattern (that is, above or below all predicted models) are represented by a 0.
Additional file 3: Table S2. Phylogenetically corrected stylopodial scaling in mammals and non-avian reptiles. Scaling equation shown in the format y = mx. The particular theoretical scaling model (Sim.) followed by the slope is represented by G, geometric similarity, E, elastic similarity, or S, static similarity. Scaling patterns that fall between models are represented by > or <, and those that do not follow any pattern (that is, above or below all predicted models) are represented by a 0.
Additional file 4: Table S3. Raw and PIC stylopodial scaling in Artiodactyla and Bovidae. Standardized Major Axis equation shown in the format y = mx + b (b = 0 in PIC). The particular theoretical scaling model (Sim.) followed by the slope is represented by G, geometric similarity, E, elastic similarity, or S, static similarity. Scaling patterns that fall between models are represented by > or <, and those that do not follow any pattern (that is, above or below all predicted models) are represented by a 0.
Additional file 5: Table S4. Predictive power of various body mass estimation equations. Bivariate and multiple regression statistics for various body mass proxies discussed here (that is, circumference and length of the humerus and femur). Statistics include the Percent Prediction Error (PPE), along with its upper and lower 95% PPE Confidence Intervals (PPE CI), the Standard Error of the Estimate (SEE), the Coefficient of Determination (R²), and the Akaike Information Criterion (AIC) score.
Additional file 6: Figure S1. Phylogenetic tree of mammalian and reptilian taxa included in this study. Topology is based on multiple published analyses mentioned in the text. Numbers indicate the branch lengths used in this study, measured in millions of years. Terminal branch lengths are most often given next to the species name.