October Relative Homoplasy Index: A New Cross-comparable Metric for Quantifying Homoplasy in Discrete Character Datasets Elizabeth M. Steell e.steell@ucl.ac.uk 0 1 2 3 4 5 6 7 Allison Y. Hsiang 0 1 2 3 4 5 6 7 Daniel J. Field 0 1 2 3 4 5 6 7 Address: Department of Earth Sciences, University College London , Kathleen Lonsdale Building, 5 Gower Pl. , London, WC1E 6BS , UK Department of Earth Sciences, University College London , Kathleen Lonsdale Building Department of Earth Sciences, University of Cambridge , Downing Street, Cambridge, CB2 Department of Geological Sciences, Stockholm University , 106 91 Stockholm , Sweden Downing Place , Cambridge, CB2 3EJ , United Kingdom Gower Place , London, WC1E 6BS , United Kingdom University Museum of Zoology, University of Cambridge , David Attenborough Building 2023 13 2023 3 36

Homoplasy is among the main hinderances to phylogenetic inference. However, investigating patterns of homoplasy can also improve our understanding of macroevolution, for instance by revealing evolutionary constraints on morphology, or highlighting convergent form-function relationships. Several methods have been proposed to quantify the extent of homoplasy in discrete character matrices, but the consistency index (CI) and retention index (RI) have remained the most widely used for decades, with little recent scrutiny of their function. Here,

-

we test the performance of CI and RI using simulated and empirical datasets and investigate patterns of homoplasy with different matrix scenarios. In addition, we describe and test a new scaled metric, the relative homoplasy index (RHI), implemented in the R statistical environment. The results suggest that, unlike the RI, the CI does not constitute a direct measure of homoplasy. However, the RI consistently underestimates the extent of homoplasy in phylogenetic character-taxon matrices, particularly in datasets characterised by high levels of homoplasy. By contrast, RHI4the newly proposed metric4outperforms both methods in sensitivity to homoplasy levels, and is scaled between zero and one for comparison of values 34

between different datasets. Using both simulated and empirical phylogenetic datasets,we 26 27 28 29 30 31 32 33 35 36 37 38 39 40 41 42 43 45 46 47 48 49 50 show that relative levels of homoplasy remain constant with the addition of novel characters, and, in contrast to earlier work, decrease with the addition of taxa. Our results help illuminate the inherent properties of homoplasy in cladistic matrices, opening new potential avenuesof research for investigating patterns of homoplasy in macroevolutionary studies.

INTRODUCTION

Homoplasy is an inevitable property of discrete character matrices constructed for 44 phylogenetic inference. In cladistic matrices, homoplasy is defined as the independent acquisition of or reversal to the same character state (Lankester 1870; Sanderson and Donoghue 1989), and is often considered a limitation in morphological datasets as it can obscure phylogenetic relationships (Archie 1996). However, with ever-improving methods to infer phylogenetic relationships and the increasing availability of genomic data for extant clades, accurately quantifying the extent of homoplasy in phylogenetic datasets has potential to be useful for informing estimates of evolvability and constraints on morphological evolution (Sanderson and Donoghue 1989; Wagner 2000; Sidlauskas 2008; Brocklehurst and

Methods to estimate homoplasy received considerable attention in the late 20th 54 century with an extensive body of research exploring parsimony-based metrics with cladistic matrices (summarised in Archie 1996). In recent years, however, few studies have explored patterns of homoplasy in discrete datasets, and the development of new homoplasy estimation methods have lagged behind methodological advances in phylogenetic inference. The two most widely-implemented and easily calculated homoplasy metrics are the parsimony-based consistency index (CI) (Kluge and Farris 1969) and retention index (RI) (Farris 1989). These metrics are implemented in phylogenetic inference programmes such as TNT (Goloboff etla. 2008) and the 8phangorn9 R package (Schliep 2011), and have been subject to numerous comparative analyses on their efficacy for detecting homoplasy, and used to explore patterns of homoplasy within discrete character matrices (Sanderson and Donoghue 1989; Meier et al. 64

1991; Archie 1996; Hauser and Boyajian 1997; Sookias 2020; Murphy et al. 2021). 51 52 53 55 56 57 58 59 60 61 62 63 65 66 67 68 69 70 71 72 73 74 75

However, CI and RI are not cross-comparable methods; that is, they cannot be directly compared across alternative datasets and phylogenetic trees, limiting attempts to infer generalisable conclusions about patterns of homoplasy in phylogenetic datasets. For instance, CI is highly sensitive to variable numbers of taxa (but not characters) in phylogenetic matrices, with a negative relationship detected between CI and the number of taxa included (Sanderson and Donoghue 1989; Hauser and Boyajian 1997; Murphy et al. 2021). By contrast, RI represents a more robust metric with respect to matrix dimensions (Hauser and Boyajian 1997; Murphy et al. 2021). Nonetheless, values of RI are not scaled between zero and one; instead, the minimum value (which indicates the maximum amount of homoplasy possible) is an arbitrary value somewhere in that range (Archie 1996). This property arises because RI is calculated using the theoretical maximum treelength for a given dataset; that is, the number of steps estimated for a matrix when the tree is a complete polytomy (Farris 1989; Archie 1996). Therefore, in any dataset where the tree has more than one resolved node, RI will never be truly scaled between zero and one, rendering values of RI non-comparable between different datasets. To overcome this problematic issue, Archie (1989) developed the homoplasy excess ratio (HER). Unfortunately, however, this method has its own limitations in that it is not applicable to phylogenies inferred under optimality criteria other than maximum parsimony. For instance, phylogenies inferred through Bayesian inference or incorporating topological constraints are not suitable for this method, as HER 84 implements maximum parsimony to infer topologies from randomised versions of character matrices in order to estimate an expected maximum value of homoplasy. Furthermore, HER is particularly time consuming and computationally demanding to calculate, and as such has not seen wide adoption.

Towards improving the rigour of systematic investigations of patterns of homoplasy, we present a novel metric, the relative homoplasy index (RHI), for estimating the extent of homoplasy in phylogenetic datasets. The algorithmic basis for RHI is similar to RI, but the resulting metric is consistently scaled between zero and one for empirical datasets, thus facilitating statistically meaningful comparisons across datasets with differing num brse of taxa and characters. We compare RHI to CI and RI using simulated matrices and empirical 94

datasets, and test the sensitivity of each metric to variation in numbers of taxa, characters, 76 77 78 79 80 81 82 83 85 86 87 88 89 90 91 92 93 95 96 97 98 99 and character state transition rates. We also investigate six empirical datasets deriving from four recent studies on bird phylogenetic relationships (Ksepka et al. 2019; Field et al. 2020; Benito et al. 2022; Steell et al. 2023) to explore these patterns of homoplasy in various bird clades, and analyse simulated character matrices based on these empirical topologies to account for systematic bias from random tree generation methods. MATERIALS & METHODS

Homoplasy Indices

Consistency index4 The consistency index (CI) is defined as: 104 (1) = !!"#

! 101 102 103 105 106 107 108 109 110 111 112 113 115 116 117 118 119 120 121 122 123 125 114 in the matrix). CI is generally interpreted as how well a dataset 8fits9 a particular tree under where L represents tree length (the number of steps for a tree with a given matrix, regardless of whether the tree was inferred from that matrix or not) andLmin represents the theoretical minimum tree length for a matrix with the same dimensions (i.e., same number of taxa, characters, and proportion of multistate characters) 3 that is, the length of the tree if there were no homoplasy in the dataset (Kluge and Farris 1969; Farris 1989).

Lmin is calculated by: (2) "#$

$$ = 3#%& # 2 1 where si is the number of states for theith character, and nc is the total number of characters in the matrix. As Lmin is necessarily always f L by definition, CI is always a value between 0 and 1 (0 < CI f 1; Lmin cannot equal 0 as long as there is at least one non-invariable character the assumption of parsimony (the 8consistency9; Kluge and Farris 1969). The complement 1 3 CI can then be interpreted as how poorly a dataset fits the tree under consideration.

Retention index4The retention index (RI) is defined as: (3) = 1 2

' '!%& = 1 2 length when the matrix is applied to a completely unresolved tree (star tree or single 124 polytomy) (Farris 1989; Archie 1996; Cuthill et al. 2010).

CI and RI can be expressed with respect to each other: (4) (5)

Relative homoplasy index4We introduce a novel metric, the relative homoplasy index (RHI), which is defined as follows: (6) where Lnull (null tree length) represents the tree length for the matrix when no phylogenetic signal is present except that which is present due to chance. Lnull is calculated by taking the median tree length for the empirical matrix with a set of randomised trees. In these randomised trees (null trees), the topology of the empirical tree (i.e., tree shape) is maintained, but the tip names are randomised. Therefore, the properties of the empirical tree4such as number of supported nodes, tree balance, and tree shape4are retained in the set of null trees, avoiding systematic biases associated with these properties(Felsenstein 2004; Simmons et al. 2004).

The relative homoplasy index differs from the consistency index and retention index in that 0 indicates an absence of homoplasy for a matrix (whenL = Lmin), and 1 indicates that 144 a matrix shows as much homoplasy as possible when the corresponding topology is resolved beyond a polytomy (when L = Lnull). In this case, a value of 1 would occur if the matrix contains no phylogenetic information with respect to the tree, beyond that which is present by chance. RHI can be calculated for any discrete matrix with a given phylogeny, regardless of the method or dataset that the phylogeny was inferred from. RHI is a cross-comparable metric that can be applied to matrices with accompanying phylogenetic trees that may have 126 127 128 129 130 131 132 133 135 136 137 138 139 140 141 142 143 145 146 147 148 149 been generated from independent matrices (e.g., molecular data), when the topology is a consensus between multiple datasets, or when the topology is partially constrained by another phylogenetic hypothesis. 154

Data Acquisition and Analyses

We selected published phylogenetic datasets with a focus on birds to investigate homoplasy in this study. Datasets were selected based on their properties such as tree inference method, taxon and character dimensions, and availability of data from public repositories. The six datasets used in this study are summarised in Table 1 and were taken from four published studies with morphological matrices of different bird taxa. Datasets from Benito et al. (2022) and Steell et al. (2023) were formatted so that two different treatments of the matrix and/or empirical tree could be analysed. For Benito et al., two datasets were analysed (Avialae a and Avialae b). Both matrices are the same, but the topologies are based on different phylogenetic inference methods; the topology for Avialaae was inferred under 164 maximum parsimony, and the topology for Avialaeb was inferred under Bayesian inference. For Steell et al., two datasets were analysed with the same characters but slightly different taxon samples and topologies (Passeriformesa and Passeriformesb). Passeriformesa includes only extant taxa, and the topology was not inferred from the morphological matrix, but from entirely independent molecular datasets from other studies (Prum et al. 2015; Oliveros et al. 2019; Harvey et al. 2020). Passeriformes b includes all taxa from Passeriformesa in addition to the fossil specimens analysed in the original publication, with the topology inferred from the morphological dataset with a topological constraint for the extant taxa based on the aforementioned molecular studies. Empirical topologies used in this study are represented in Figure 1. The topologies used in this study were taken as previously 174 published; as such, some are unrooted and all except Passeriformesa are incompletely resolved (Fig. 1; Table 1). We chose to leave the trees in this condition instead of reanalysing them as we were interested in investigating patterns of homoplasy across a range of real datasets and phylogenetic trees, as opposed to reanalysing phylogenetic relationships. *No topological constraints applied in original phylogenetic analysis. Percentage of supported nodes is based on percentage of fully resolved nodes in the original publication of the phylogeny. a Ksepka et al. (2019); b Steell et al. (2023); c Field et al. 2020; d Benito et al. 2022. Character numbers may differ from the original publication because invariant 184

characters were removed prior to analyses in this work. 175 176 177

All analyses were carried out in R v4.2.2 (R Core Team 2021) , primarily using functions within the 8APE9 (Paradis et al. 2004), 8phangorn9 (Schliep 2011), and 8dispRity9 (Guillerme 2018) packages. All code used here, including novel functions, can be found at github.com/LizzySteell/Homoplasy. Invariant characters were removed prior to analysis. Invariant characters are uninformative with respect to homoplasy, and removal was necessary because the RHI function requires the minimum number of states for any character to be two, rather than one. Inapplicable characters (usually represented as 8-9) were treated as 194 ambiguities (missing states; 8?9) for simplicity. Using the 8phyDat9 matrix format implemented in 8phangorn9, we used contrast matrices to specify how steps should be counted for ambiguities and polymorphic scorings. Contrast matrices enable ambiguities (8?9) to be interpreted as all possible numbered states (i.e., 809, 819, 829, etc.), meaning no step is counted from a numbered state to an ambiguity and vice versa. Polymorphisms are interpreted as all numbered states specified in the polymorphism (i.e., 8{01}9 will be interpreted as both 809 and 819). This means that when polymorphisms are present, the least costly state in the polymorphism contributes to the tree length (for further discussion, see Watanabe 2016). In this way, matrices in the 8phyDat9 format implemented by 8phangorn9 can be analysed with the same treatment of polymorphisms as in maximum parsimony 204 packages such as TNT (Goloboff et al. 2008). Therefore, all metrics calculated here represent a minimum estimate of homoplasy, and increasing the proportion of ambiguities in a matrix will further decrease the amount of homoplasy detected.

Consistency index and retention index were calculated for all empirical datasets with all characters treated as unordered. CI and RI were calculated using CI() and RI() functions from 8phangorn9. For illustrative purposes CI and RI values are reported in the figures as 1 3 CI and 1 3 RI, respectively, such that a value closer to 1 indicates more homoplasy than a value closer to 0. This allows for a scale that is consistent across metrics and more easily comparable. A new function was written to calculate the relative homoplasy index (RHI()) that includes the option to specify certain characters as ordered. The arguments for RHI() are 214

8data9 (matrix read into R using read.nexus()), 8tree9 (phylogenetic tree of class 8phylo9), 8n9 198 199 200 201 202 203 205 206 207 208 209 210 211 212 213 215 216 217 218 219 220 221 222 (number of null trees to evaluate), 8contrast9 (object containing a contrast matrix), 8ord9 (a vector specifying ordered characters; default = NULL) and 8cost9 (object containing a cost matrix to specify the number of steps taken during transitions of ordered character states; default = NULL). To obtain a distribution of null tree lengths for RHI, tips were randomised using sample() without replacement while maintaining the overall topology (tree shape). The parsimony() function was then applied to the empirical matrix and each randomised tree to obtain a distribution of values forLnull. Null tree length values are normally distributed (Supplementary Figure 1, Appendix 1) but the range of values varies depending on the properties of the supplied tree and matrix. Generally, we recommendn = 1000; but n = 100 may be adequate based on the simulations and the close similarity of theLnull distributions at different values ofn (Supp. Fig. 1). The function outputs the medianLnull, 5% and 95% quantiles, tree length (L) and theoretical minimum tree length (Lmin). The second output in the list returned by RHI() is a vector containing the null tree lengths.

We carried out three analyses using simulated matrices based on empirical topologies, and three analyses with empirical topologies and associated empirical datasets. Analyses are Telluraves

Mean of log Passeriformes transformed -8 to 2 taxa summarised in Table 2.

Analysis

Datasets Simulated matrices 1. Inflating transition rate 2. Varying of Empirical matrices Comparison

Passeriformes 22 27 27 28 20 1* 2700 2700 2800 2000 1000* homoplasy

Passeriformes metrics across datasets 5. Varying 6. Varying taxa

Table 2 Summary of analyses. * 1 run per metric and 1,000 iterations (n = 1,000) for RHI only. 11 14 15 15 15 15 14 20 20 15 14 14 1100 1400 1500 1500 1500 1500 1400 2000 2000 1500 1400 1400

Investigating Homoplasy in Simulated Datasets

Matrices were simulated with topologies following those in the source publications (Fig. 1) for the six datasets (Table 1). These empirical topologies were selected as reference trees for simulations with the aim of simulating realistic character matrices. Random rtee generation methods, such as the coalescent, birth-death or non-evolutionary model methods implemented in 8APE9 (Paradis et al. 2004) each produce trees with characteristic prpoerties, such as particular tree shapes, with different frequencies. Therefore, publication-derived empirical topologies were used to avoid potential systematic bias resulting from random tree generation methods. Estimating the extent of how homoplasy varies with respect to differing 244 tree shapes is beyond the scope of this study.

Matrix simulation required fully resolved (bifurcating) trees with branch lengths. We used the accelerated transformation (ACCTRAN) algorithm (Farris 1970; Swofford and Maddison 1987) as implemented in phangorn9s acctran() function with each empirical topology and its corresponding empirical matrix to calculate and assign branch lengths. This function also randomly resolves polytomies. Matrices were simulated using sim.morpho() in the 8DispRity9 package (Guillerme 2018). All simulations had a 9:1 binary to three-state character distribution in the matrix, regardless of matrix dimensions. Matrices were simulated under an equal rates Markov model (Lewis 2001) where rates among characters varied based on a predefined lognormal distribution, and each transition within a character occurred at the 254 same rate with equal probabilities. A log-normal distribution was chosen as more appropriate than a gamma distribution as morphological characters are likely affected by hierarchical integration of characters (Wagner 2012). The standard deviation of the log-transformed rate (sd log) was set to 0.5 in all simulations. The mean of the log-transformed rate (mean log) was varied depending on the analysis (described below), as well as the number of simulations generated per run (Table 2). All metrics calculated for the simulated matrices assumed no character state ordering. RHI for the simulations was performed for 100 iterations per run (n = 100).

Analysis 1: inflating transition rate4 I tested the sensitivity of RHI, CI and RI at detecting homoplasy by using the transition rate of character state change as a proxy for 264 homoplasy in simulated matrices. Higher transition rates lead to increased homoplasy because a greater number of character state transitions may result in independent acquisitions or reversals to the same state (Simmons et al. 2004; Harrison and Larsson 2015). However, the efficacy of increasing transition rates as a proxy for homoplasy will decrease with increased numbers of character states present in a matrix. Moreover, the assumption of homoplasy increasing in tandem with transition rates may not be valid if transitions between states do not occur with equal probability. Therefore, the simulations assume a conservative 9:1 ratio of binary to three-state characters and equal probabilities of character state change directionality. We varied transition rates by employing a range of values for the mean log µ() transition rate and simulating 20 matrices per rate category (Table 2). The range of values (-8 274 to 2) forµ was selected based on trialling multiple scenarios with different ranges of values; this range of values was appropriate for the empirical datasets investigated here, as it covers very low transition rates up to very high rates. The median, 5% and 95% quantiles of RHI, 1 3 CI and 1 3 RI were plotted against µ for four topologies (Table 3.2) which were chosen based on their contrasting matrix dimensions. We also normalised median treelength between 259 260 261 262 263 265 266 267 268 269 270 271 272 273 275 276 277 278 279 280 281 282 0 and 1 with the following formula: (7)

!("#$ (!) "/0(!)("#$ (!) where min(L) and max(L) represent the minimum and maximum treelengths from the simulation.

Analysis 2: varying number of taxa with sub-tree sampling4 To test how homoplasy metrics behave with different numbers of taxa, we sampled sub-trees from two empirical topologies (Telluraves and Passeriformesa). A new function, subset.trees(), was written to sample sub-trees from a given topology (argument 8tree9) where the number of tips to be retained (8ntaxa9) and the number of samples to be taken (8n9) are specified. One hundred sub-trees were sampled without replacement per set of taxa for each topology, with number of taxa decreasing in increments (Table 3.2). we investigated simulated matrices with two transition rate categories, µ = -5 (slow rate) and µ = -3 (fast rate). One matrix was simulated for each sub-tree. RHI, 1 3 CI and 1 3 RI were calculated for each simulated matrix and subtree and plotted against number of taxa.

Analysis 3: varying number of characters4 Similar to Analysis 2, we used the 294 topologies from Telluraves and Passeriformesa to investigate metrics with decreasing numbers of characters. Number of characters was specified directly when simulating matrices, and 100 matrices were simulated for each increment of character number, per topology and rate category (Table 2). Rate categories were the same as for Analysis 2. For both Analyses 2 and 3, we performed 20-30 runs per topology and rate category (Table 2).

Homoplasy in Empirical Datasets Analysis 4: comparison of homoplasy metrics across datasets4 RHI, CI and RI were

calculated for the six empirical datasets. Topologies were left as they were in the original tree files; therefore, some topologies were not fully bifurcating, and some were unrooted (Fig. 1). 304

All metrics other than RHI were calculated with character states unordered, and RHI was calculated with the set of ordered characters as specified in the original nexus file for each dataset. RHI was run for 1,000 iterations (n = 1,000) per dataset, for both unordered and ordered runs. The 5% and 95% quantiles for RHI were reported alongside the median value.

Analyses 5 & 6: varying numbers of taxa and characters4 As in Analysis 2, number

of taxa was varied for Analysis 5 with sub-trees sampled from all six empirical topologies. Metrics were calculated from the empirical matrices with each sub-tree. Number of characters was varied for Analysis 6 by randomly sub-setting the empirical matrix without replacement. All six datasets were investigated in Analyses 5 and 6. RHI calculations for Analyses 5 and 6 assumed no character state ordering. Between 10320 increments of 314 taxa/characters were analysed per dataset (Table 2). RESULTS

Simulated Matrices

Analysis 1 evaluated the sensitivity of each metric (RI, CI, and RHI) to increasing character transition rates, a proxy for homoplasy. The results indicate that as transition rate increases (by increasing mean log, µ), homoplasy increases (Fig. 2), although the extent of homoplasy detected is dependent on the metric. Normalised tree length (scaled L) for each dataset shows that at approximately µ = -2, tree length reaches its maximum value for each dataset (Fig. 2). The consistency index rapidly indicates more homoplasy with a small 324 increase in transition rate, and subsequently asymptotes at a much lower µ value than the other metrics. In all datasets except Neornithes (Fig. 2c), 1 3 CI reaches a maximum value of 1, with Neornithes reaching maximum value at 0.9. Retention index is more robust to increases in transition rate, with 1 3 RI reaching an asymptote at approximately equivalent µ characters in simulated matrices. values to the asymptote point on the X axis of the normalised tree length. However, on the Y axis, 1 3 RI does not occupy all homoplasy values between 0 and 1, and instead consistently asymptotes at values below 1. The relative homoplasy index follows the same general pattern as 1 3 RI but occupies the full range of homoplasy values between 0 and 1 on the Y axis, and asymptotes at the same point on the X axis (equivalent µ values) as the normalised tree length in all datasets.

Analysis 2 tested the sensitivity of metrics to changes in number of taxa, while keeping number of characters the same. Across both low (µ = -5) and high (µ = -3) transition rates, Analysis 2 indicates that increasing the number of taxa in a topology impacts the homoplasy values estimated by CI, RI and RHI (Fig. 3a, c, e, g). Greater numbers of taxa cause increases in values of 1 3 CI, which asymptotes at a maximum value on the Y axis with fewer taxa at higher transition rates (Fig. 3c, g). In contrast, both 1 3 RI and RHI values decrease as more taxa are added (Fig. 3a, c, e, g). The extent of decrease in the level of homoplasy detected by 1 3 RI and RHI increases markedly at higher transition rates (Fig. 3c, g). In other words, if the character state transition rate is high, a greater decrease in homoplasy will be detected by RI and RHI when taxa are added into a dataset and all other 344 variables stay the same. In Analysis 3, where the number of characters was varied but taxon number was held constant, there is no observable change across the investigated homoplasy metrics as the number of characters was increased (Fig. 3b, d, f, h). More variance was observed for each metric at lower numbers of both taxa and characters, as well as at lower transition rates (Fig. 3).

Estimates of homoplasy from each investigated method (consistency index, CI; retention index, RI; relative homoplasy index, RHI). L: tree length.

Empirical Matrices

In analysis 4, homoplasy was quantified with each metric for each empirical dataset, and results are summarised in Table 3 and Figure 4. Overall, the two Passeriformes datasets 354 show the highest values for all metrics, across both ordered and unordered character treatments, compared to the other bird matrices analysed here. For RHI, the 90% quantile ranges overlap for ordered and unordered treatments for each dataset (Fig. 4). Neornithes shows the greatest 90% quantile range, which is consistent with the comparatively shallow bell curve of the NeornithesLnull distribution (Supplementary Figure 1, Appendix 1).

Analyses 5 and 6 involved varying the number of taxa and characters, respectively, in each empirical dataset through sampling. Analysis 5 also indicates that with increasing numbers of taxa, 1 3 CI increases, but 1 3 RI and RHI decrease (Fig. 5a, c, e, g, i, k). However, the manner in which values of 1 3 CI, 1 3 RI and RHI change with increasing the size of the taxon samples varies depending on the empirical dataset investigated. Overall, 1 3 364

CI changes more abruptly than 1 3 RI and RHI with increases in taxon sample, which illustrate comparatively shallow decreases with additional taxa. Similar to Analysis 3, Analysis 6 shows that increasing number of characters does not influence any of the investigated homoplasy metrics (Fig. 5b, d, f, h, j, l). characters in empirical matrices. 369

DISCUSSION

Our new metric, the relative homoplasy index (RHI), uses a novel tip randomisation approach to calculate the relative extent of homoplasy in a phylogenetic character-taxon matrix by comparing the observed amount of homoplasy in that dataset to an estimated null value for that matrix and topology. RHI is computationally tractable as implemented in theR 374 statistical environment (github.com/LizzySteell/Homoplasy) and provides better crosscomparable estimates of homoplasy than the commonly used consistency index (CI) and retention index (RI). Through analyses of both simulated and empirical discrete character matrices, we show that RHI is more accurate at detecting homoplasy than CI and more sensitive than RI, particularly in datasets with high levels of homoplasy. 370 371 372 373 375 376 377 378 379 380 381 382 383 385 386 387 388 389 390 391 392 393

Patterns of Homoplasy

Although RI has been considered robust with respect to variable numbers of taxa and characters in phylogenetic datasets (Murphy et al. 2021), its major limitations are that it underestimates the proportional extent of homoplasy in a given dataset, and cannot be 384 appropriately compared across alternative datasets because it is not scaled between zero and one, but rather between zero and an arbitrary value conditioned on each particular dataset (Archie 1996). As such, the main advantage of this newly proposed metric, the relative homoplasy index (RHI) over RI is that values of RHI are scaled consistently (Fig. 2), and therefore should provide a more useful method to accurately estimate the relative extent of homoplasy in empirical datasets. The impact of this scaling consistency becomes particularly important when datasets exhibit elevated levels of homoplasy; the simulations in Analysis 1 predict that matrices with greater overall transition rates, and therefore higher levels of homoplasy, will show a greater divergence between 1 3 RI and RHI values than matrices with lower transition rates (Fig. 2). In other words, RI underestimates levels of homoplasy to 394 a greater extent in datasets exhibiting greater degrees of homoplasy. As expected, Analysis 4 shows that the empirical matrices with the highest levels of homoplasy (Passeriformesa and b) show a greater divergence between 1 3 RI and RHI than the other datasets (Fig. 4). In addition, RHI values are consistent across the alternative versions of the Passeriformes dataset (Passeriformesa and b), but the RI values for each differ slightly (Table 3).

Analyses 2 and 5 show that homoplasy as estimated by RHI and RI decrease as taxon numbers increase, whereas homoplasy detected by CI increases considerably with the addition of taxa (Figs. 3 & 5; Table 2). This pattern in CI mirrors early work investigating homoplasy in discrete matrices, which concluded that adding more taxa into a dataset would increase estimated levels of homoplasy (Sanderson and Donoghue 1989). This longstanding 404 hypothesis has been corroborated by numerous subsequent studies (Meier et al. 1991; Hauser and Boyajian 1997; Murphy et al. 2021), but the explanation of why homoplasy increases with the addition of taxa has been somewhat simplistically attributed to the increased detection of individual homoplastic character state changes with the addition of taxa and/or characters (Sanderson and Donoghue 1989; Hauser and Boyajian 1997; Murphy et al. 2021).

CI decreases due to cumulative homoplastic events in a dataset resulting in a greater 410

tree length. However, when considering CI in this format: 411 (8) '1 !!"# H represents the number of extra steps for a tree with a given matrix, and can be considered an unscaled, arbitrary value of homoplasy (Kluge and Farris 1969). Lmin does not change with 414 respect to additional taxa, and only varies with changes in number of characters or the composition of character states in a matrix. Therefore, although a tree has a greater probability of gaining homoplastic state changes with the addition of more branches (i.e., increases in H via the addition of taxa), this raw quantification of homoplasy in the CI equation is arbitrary and non-comparable; thus, the CI is not a useful tool when attempting to compare estimates of homoplasy from alternative datasets. The original purpose of CI was to measure the consistency of a given dataset with a given tree in order to illustrate how far an empirical topology diverged from a theoretical zero-homoplasy topology derived from a version of the matrix where each character underwent the minimal number of changes (i.e.,L = Lmin; Kluge and Farris 1969; Farris 1989). The use of CI to quantify homoplasy has been 424 warned against previously due to its high sensitivity towards increasing numbers of taxa 420 421 422 423 425 426 427 428 429 430 431 432 433 435 436 437 438 439 (Archie 1989, 1996). We argue that CI should not be considered a measure of homoplasy at all (after all, the opposite of 8consistency9 as described above does not equate to homoplasy). Rather, CI merely provides a proxy for homoplasy that performs poorly in the face of variable taxon numbers4instead of directly tracking homoplasy, CI tracks an increase in tree length that does not scale proportionally with changes in relative homoplasy.

RI and RHI, on the other hand, are expected to measure a decrease in homoplasy with the addition of taxa to a given dataset, as demonstrated in our results (Figs. 3 & 5). The addition of branches to a topology leads to an increase in the total potential amount of homoplasy in a dataset (Lmax in the case of RI andLnull in the case of RHI), due to the (Felsenstein 1978). However, provided the rate distribution of character state transitions for the matrix is maintained, there is no reason to expect more homoplasy in a dataset with more taxa. On the contrary, observed homoplasy values will be proportionally lower compared to the increasing potential homoplasy of the dataset as a taxon sample grows in size.

We find that CI, RI and RHI do not vary with increasing numbers of characters (Figs. 434 exponentially increasing number of possible arrangements of tips as taxa are added 440 3 & 5). For CI, provided the rate distribution of character state transitions is the same 441 regardless of number of characters,L and Lmin (and thereforeH) increase proportionally as 442 characters are added. For RI and RHI, again assuming transition rates are maintained, L and 443 Lmax and L and Lnull, respectively, also increase proportionally as characters are added. 444

Previous studies investigating patterns of detected homoplasy with CI, RI and 445 additional parsimony metrics such as the homoplasy excess ratio (Archie 1989) and the 446 homoplasy slope ratio (Meier et al. 1991) report conflicting results. Archie (1989) observed 447 that RI (therein referred to as the 8homoplasy excess ratio maximum9) detects less homoplasy 448 in subsets of datasets with more taxa, although a slight increase in homoplasy with the 449 addition of characters is observed (Archie 1989). Other studies using multiple non-subset empirical datasets report mixed results regarding RI. For example, Hauser and Boyajian (1997) found a slight decrease in homoplasy with increasing taxa in their meta-analysis of 600 datasets, but Murphy et al. (2021) showed the opposite trend in an analysis of 364 datasets. These variable patterns are likely explained by idiosyncrasies specific to the datasets 454 analysed; each dataset exhibits a particular amount of estimated homoplasy dependent on 450 451 452 453 455 456 457 458 459 460 461 462 463 465 466 467 468 factors such as tree shape, character state space (Simmons et al. 2004) and biological drivers of homoplasy. As our results show, larger datasets do not show more homoplasy based on their dimensions alone.

Future Directions for Homoplasy Quantification

Tree shape is expected to impact patterns of homoplasy estimation (Simmons et al. 2004), which is why a major advantage of the RHI is the tip randomising process. We randomised tips to retain the overall tree shape of each empirical tree, rather than sampling randomly generated trees that may come from a distribution of variable tree shapes with 464 unequal frequencies (Felsenstein 2004). Simmons et al. explored the effect of tree shape on RI and CI with simulated datasets and found that with increasing transition rate, homoplasy detected by RI is affected substantially depending on tree shape. At low transition rates, levels of homoplasy increase significantly more for a completely asymmetrical, unequal branch-length tree than they do for a completely symmetrical, equal branch-length tree (Simmons et al. 2004). Investigating homoplasy patterns with random trees generated by different methods, for example those implemented in 8APE9 (Paradis et al. 2004), was beyond the scope of the current study but could be useful for the development of new, nonparsimony methods for homoplasy quantification.

In the future, the RHI could be adapted and expanded to estimate homoplasy in a 474 probabilistic framework. Instead of counting steps directly from the matrix, branch-lengths representing evolutionary change that are generated from statistical phylogenetic frameworks (i.e., maximum likelihood or Bayesian inference) could be summed to estimate the observed homoplasy in the dataset. However, randomising the tips would not be a sufficient method to generate a null distribution of summed branch lengths because branch lengths remain the same if the tree shape is retained. Therefore, an alternative way to simulate random trees in order to estimate total potential homoplasy would be required. Statistical trait evolution models, such as the birth-death model (Paradis 2010; Stadler 2011), could be utilised with parameters modelled after the empirical topology to retain its characteristics. This would open new avenues for estimating homoplasy beyond morphological datasets, and could 484

potentially be applicable to molecular sequence data. 469 470 471 472 473 475 476 477 478 479 480 481 482 483 485 486 487 488 489 490 491 492 493

The applications of quantifying homoplasy are far reaching. Homoplasy estimates can be used to partition morphological datasets to improve phylogenetic resolution using percharacter homoplasy scores to place characters into rate categories (Rosa et al. 2019) . It would be possible to use a per-character application for RHI for this, although this step may prove time-consuming. Using RI would likely be sufficient and faster for estimating levels of homoplasy for individual characters in a matrix (for instance, by using the 8sitewise9 argument in the RI() function in 8phangorn9 (Schliep 2011)) in order to partition characters into approximate categories. Nonetheless, for accurate estimates of per-character or whoel matrix homoplasy, RHI will perform better. Additionally, quantifying homoplasy enables the 494 exploration of why clades are more morphologically labile or constrained. When accompanied by methods investigating character state space and character state exhaustion (Wagner 2000; Hoyal Cuthill 2015; Brocklehurst and Benson 2021), homoplasy quantification can become a tool to better understand patterns in morphological diversification and evolution (Sidlauskas 2008). The RHI metric provides an important starting point in comparative analyses of homoplasy as it presents a non-computationallydemanding method to directly compare the relative extent of homoplasy across empirical phylogenetic datasets.

ACKNOWLEDGEMENTS

504

We thank D. Ksepka for providing a tree file of the published Telluraves topology for analyses. We are indebted to R. Benson for help and support with the methodological approach. We are grateful to D. Brennan, J. Benito, G. Navalón, M.G. Burton, and A. Chen for insightful discussion. This work was supported by an UK Research and Innovation Future Leaders Fellowship MR/S032177/1 to DJF, a Natural Environment Research Council NE/S007164/1 to EMS, and a Swedish Research Council Starting Grant Within Natural and Engineering Sciences ÄR-NT 2020-03515 to AYH. For the purposes of open access, the authors have applied a Creative Commons Attribution (CC BY) license to any Author Accepted Manuscript version arising. 514

LITERATURE CITED 495 496 497 498 499 500 501

Archie J.W. 1989. Homoplasy excess ratios: New indices for measuring levels of homoplasy in phylogenetic systematics and a critique of the consistency index. Syst. Zool. 38:2533 269.

Archie J.W. 1996. Measures of homoplasy. In: Sanderson M.J., Hufford L., editors.

Homoplasy: the recurrence of similarity in evolution. Elsevier. p. 1533188. supports a neognathous crown bird ancestor. Nature. 612:1003105.

Brocklehurst N., Benson R.J. 2021. Multiple paths to morphological diversification during the origin of amniotes. Nat. Ecol. Evol. 5:124331249. 524

Cuthill J.F.H., Braddy S.J., Donoghue P.C.J. 2010. A formula for maximum possible steps in multistate characters: Isolating matrix parameter effects on measures of evolutionary convergence. Cladistics. 26:983102.

Farris J.S. 1970. Methods for computing Wagner trees. Syst. Biol. 19:83392.

Farris J.S. 1989. the Retention Index and the Rescaled Consistency Index. Cladistics. 5:4173 419.

Felsenstein J. 1978. The number of evolutionary trees. Syst. Zool. 27:27333.

Felsenstein J. 2004. Inferring phylogenies. Sinauer Associates.

Field D.J., Benito J., Chen A., Jagt J.W.M., Ksepka D.T. 2020. Late Cretaceous neornithine from Europe illuminates the origins of crown birds. Nature. 579:3973401. 534

Goloboff P.A., Farris J.S., Nixon K.C. 2008. TNT, a free program for phylogenetic analysis. 519 520 521 522 523 525 526 527 528 529 530 531 532 533 535 536 537 538 539 540 541 542 543

Cladistics. 24:7743786.

Evol. 9:175531763.

Guillerme T. 2018. dispRity: A modular R package for measuring disparity. Methods Ecol. Harrison L.B., Larsson H.C.E. 2015. Among-character rate variation distributions in phylogenetic analysis of discrete morphological characters. Syst. Biol. 64:3073324. Harvey M.G., Bravo G.A., Claramunt S., Cuervo A.M., Derryberry G.E., Battilana J., Seeholzer G.F., Shearer McKay J., O9Meara B.C., Faircloth B.C., Edwards S. V., PérezEmán J., Moyle R.G., Sheldon F.H., Aleixo A., Smith B.T., Chesser R.T., Silveira L.F., Cracraft J., Brumfield R.T., Derryberry E.P. 2020. The evolution of a tropical 544

biodiversity hotspot. Science (80-. ). 370:134331348.

Hauser D.L., Boyajian G. 1997. Proportional change and patterns of homoplasy: Sanderson and Donoghue revisited. Cladistics. 13:973100.

Hoyal Cuthill J. 2015. The size of the character state space affects the occurrence and detection of homoplasy: Modelling the probability of incompatibility for unordered phylogenetic characters. J. Theor. Biol. 366:24332.

Kluge A.G., Farris J.S. 1969. Quantitative phyletics and the evolution of anurans. Syst. Biol. Ksepka D.T., Grande L., Mayr G. 2019. Oldest Finch-Beaked Birds Reveal Parallel

Ecological Radiations in the Earliest Evolution of Passerines. Curr. Biol. 29:657-663.e1. 554

Lankester E.R. 1870. II.4On the use of the term homology in modern zoology, and the 545 546 547 548 549 550 551 552 553 555 556 557 558 559 560 561 562 563 565 566 567 568 564 18:1332. 6:34343.

distinction between homogenetic and homoplastic agreements. Ann. Mag. Nat. Hist. Lewis P.O. 2001. A likelihood approach to estimating phylogeny from discrete morphological character data. Syst. Biol. 50:9133925.

Meier R., Kores P., Darwin S. 1991. Homoplasy slope ratio: a better measurement of observed homoplasy in cladistic analyses. Syst. Zool. 40:74388.

Murphy J.L., Puttick M.N., O9Reilly J.E., Pisani D., Donoghue P.C.J. 2021. Empirical distributions of homoplasy in morphological data. Palaeontology.:pala.12535.

Oliveros C.H., Field D.J., Ksepka D.T., Keith Barker F., Aleixo A., Andersen M.J., Alström P., Benz B.W., Braun E.L., Braun M.J., Bravo G.A., Brumfield R.T., Terry Chesser R., Claramunt S., Cracraft J., Cuervo A.M., Derryberry E.P., Glenn T.C., Harvey M.G., Hosner P.A., Joseph L., Kimball R.T., Mack A.L., Miskelly C.M., Townsend Peterson A., Robbins M.B., Sheldon F.H., Silveira L.F., Smith B.T., White N.D., Moyle R.G., Faircloth B.C. 2019. Earth history and the passerine superradiation. Proc. Natl. Acad.

Sci. U. S. A. 116:791637925.

Paradis E. 2010. Time-dependent speciation and extinction from phylogenies: a least squares approach. Evolution (N. Y). 65:6613672.

language. Bioinformatics. 20:2893290.

Paradis E., Claude J., Strimmer K. 2004. APE: Analyses of phylogenetics and evolution in R 574

Prum R.O., Berv J.S., Dornburg A., Field D.J., Townsend J.P., Lemmon E.M., Lemmon A.R. alternatives in Bayesian analysis of discrete morphological data. Syst. Biol. 68:6573671. Sanderson M.J., Donoghue M.J. 1989. Patterns of variation in levels of homoplasy. Evolution (N. Y). 43:178131795.

Schliep K.P. 2011. phangorn: Phylogenetic analysis in R. Bioinformatics. 27:5923593. Sidlauskas B. 2008. Continuous and arrested morphological diversification in sister clades of characiform fishes: A phylomorphospace approach. Evolution (N. Y). 62:313533156. Simmons M.P., Reeves A., Davis J.I. 2004. Character-state space versus rate of evolution in phylogenetic inference. Cladistics. 20:1913204.

Sookias R.B. 2020. Exploring the effects of character construction and choice, outgroups and analytical method on phylogenetic inference from discrete characters in extant crocodilians. Zool. J. Linn. Soc. 189.

Stadler T. 2011. Simulating trees with a fixed number of extant species. Syst. Biol. 60:6763 684.

Steell E.M., Nguyen J.M.T., Benson R.B.J., Field D.J. 2023. Comparative anatomy of the passerine carpometacarpus helps illuminate the early fossil record of crown 569 570 571 572 573 575 576 577 578 579 580 581 582 583 584 585 586 587 588 589 590 591 592 593 parsimony. Math. Biosci. 87:1993229.

(N. Y). 54:3653386.

Swofford D.L., Maddison W.P. 1987. Reconstructing ancestral character states under Wagner Wagner P.J. 2000. Exhaustion of morphologic character states among fossil taxa. Evolution Wagner P.J. 2012. Modelling rate distributions using character compatibility: implications for morphological evolution among fossil invertebrates. :1433146.

Watanabe A. 2016. The impact of poor sampling of polymorphism on cladistic analysis.

Cladistics. 32:3173334. FIGURE CAPTIONS Figure 1 Phylogenetic trees without branch lengths used in analyses.

A) Unrooted, partially constrained topology of Telluraves from Ksepka et al. (2019); B) Rooted, completely constrained topology of Passeriformes a (extant taxa only) from Steell et al. (2023); C) Unrooted, partially constrained topology of Passeriformes b (extant and fossil taxa) from Steell et al. (2023); D) Unrooted, partially constrained topology of Neornithes from Field et al. (2020); E) Rooted, unconstrained Maximum Parsimony derived topology of Avialae a from Benito et al. (2022) ; F) Unrooted, unconstrained Bayesian derived topology 614

of Avialae b from Benito et al. (2022) . 605

Figure 2 Results from Analysis 1: homoplasy inflation.

Simulated matrices using empirical topologies with varying mean log of character state transition rate (µ). Lighter coloured envelopes represent the 5% and 95% quantiles. A) Telluraves topology; B) Passeriformes a topology; C) Neornithes topology; D) Avialae a topology. Scaled L: Tree length normalised between 0 and 1; RHI: relative homoplasy index; 1 3 CI: 1 3 consistency index; 1 3 RI: 1 3 retention index.

Figure 3 Results from Analyses 2 & 3: varying numbers of taxa and characters in simulated Simulated matrices using the empirical topologies of Telluraves (A3D) and Passeriformesa (E3F) with varying numbers of taxa (A, C, E, G) and varying numbers of characters (B, D, F, H) at two different mean log values of character state transition rate, slow (µ = -5; A, B, E, F) and fast (µ = -3; C, D, G, H). Envelopes represent the 5% and 95% quantiles of each metric.

RHI: relative homoplasy index; 1 3 CI: 1 3 consistency index; 1 3 RI: 1 3 retention index. Figure 4 Results from Analysis 4: homoplasy levels in empirical datasets.

Summary of all metric results presented in Table 3 for each empirical dataset analysed ni this study. RHI envelopes represent 5% and 95% quantile ranges.

Supplementary Figure 1 (Appendix 1) Kernel density plots of null tree length for each empirical dataset.

Null tree length distributions during RHI estimation were visualised for different numbers of null trees. Equivalent normal distributions for n = 100, 1,000 and 5,000 suggest 1,000 or even 644 100 null trees are sufficient to accurately estimate RHI. Kernel density plots were generated 634

Figure 5 Results from Analyses 5 & 6: varying numbers of taxa and characters in empirical

All empirical datasets with varying numbers of taxa (A, C, E, G, I, K) and varying numbers of characters (B, D, F, H, J, L). Envelopes represent the 5% and 95% quantiles of each 630 631 632 using the base R function 8density()9. 648 649 650 651 653 654 SUPPLEMENTARY MATERIAL Data available from the Dryad Digital Repository: DOI: 10.5061/dryad.34tmpg4rp R code and versions of code arising after this manuscript are available from

Supplementary Figure 1 Kernel density plots of null tree length for each empirical dataset.

Benito J. , Kuo P.C. , Widrig K.E. , Jagt J.W.M. , Field D.J. 2022 . Cretaceous ornithurine 2015 . A comprehensive phylogeny of birds (Aves) using targeted next-generation DNA sequencing . Nature . 526 : 5693573 . R Core Team . 2021 . R: A language and environment for statistical computing . . Rosa B.B. , Melo G.A.R. , Barbeitos M.S. 2019 . Homoplasy-based partitioning outperforms