Methods, Approaches, and Outcomes of Genetic Analyses

8.9.1 Genes, Partitions, and Congruent Molecular Analysis

Historical and geographic patterns of migration can be inferred from genetic data when gene flow is expressed as a migration rate (m), measured as the proportion of haplotypes in a population that are of migrant origin in each generation. Patterns of mutation inheritance are reconstructed from a phylogenetic gene tree of the haplotypes (uniquely identified gene sequences). Phylogenetics is the branch of science that tests hypotheses of relatedness among individuals and heuristically seeks the most parsimonious explanations for the inheritance of genetic and morphological attributes; we concentrate on the former. The genetic data are sorted into a parsimonious genealogy, which allows exploration of how populations became structured through time by geographic proximity, gene flow, and common ancestry. Our selected markers record the molecular past according to their individual rates and modes of mutation. Cytochrome b codes for a protein that is an integral part of the electron transport chain and is a "workhorse gene" in phylogeographic research (Avise 2000). In contrast, the ISR is a non-coding segment of DNA that is autapomorphic for ambystomatid salamanders. The ISR has persisted over evolutionary time for approximately 20 Ma, and the haplotypes vary slightly in length, which indicates that this locus does not code for a functionally conserved polypeptide (McKnight and Shaffer 1997; Donovan et al. 2000; Church et al. 2003; Zamudio and Savage 2003; Thompson et al. unpublished). This marker may evolve rapidly, in a manner similar to the D-loop region, another non-coding segment of the mitochondrial genome (Clayton 1984).
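The idea of a "most parsimonious explanation" can be made concrete with Fitch's small-parsimony algorithm. It counts the minimum number of substitutions that a fixed tree requires, and summing that count over sites gives the tree length (L) used in parsimony analyses. The following Python sketch is illustrative only. The haplotype labels, sequences, and function names are hypothetical, and a real analysis (for example in PAUP*) also searches over many candidate topologies rather than scoring a single one.

```python
from typing import Dict, Tuple, Union

# A rooted binary tree: a leaf is a haplotype label (str); an internal node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[set, int]:
    """Return (candidate ancestral states, minimum substitutions) for one site below `tree`."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    common = left_set & right_set
    if common:
        # The two subtrees can share an ancestral state: no extra substitution here.
        return common, left_cost + right_cost
    # The subtrees disagree: keep either state and count one substitution.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree: Tree, seqs: Dict[str, str]) -> int:
    """Parsimony tree length L: minimum substitutions summed over all aligned sites."""
    n_sites = len(next(iter(seqs.values())))
    total = 0
    for i in range(n_sites):
        column = {label: seq[i] for label, seq in seqs.items()}
        total += fitch_site(tree, column)[1]
    return total


# Hypothetical aligned haplotypes
seqs = {"H1": "ACGT", "H2": "ACGA", "H3": "TCGA", "H4": "TCGT"}
print(tree_length((("H1", "H2"), ("H3", "H4")), seqs))  # 3
```

For these four hypothetical haplotypes, the tree ((H1,H2),(H3,H4)) requires one substitution at the first site and two at the last site.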

Given that we sampled mitochondrial genes, our investigation should reveal information about the relatedness of mitochondrial genomes as they drift through the cytoplasm of salamander populations. Mitochondrial genes provide only a single estimate of genealogy because they are inherited without recombination, as a linked group, through the egg cytoplasm; hence, all genes in a single mitochondrial genome share an identical matrilineal history. The effective population size for mitochondrial genes is one quarter that of nuclear autosomal genes because they are haploid and only females transmit them (Avise 2001; Ballard and Whitlock 2004). Simulation studies indicate that these properties of mitochondrial genetics increase the likelihood that spatially restricted populations will exhibit exclusive monophyly (a clade containing an ancestor and all of its descendants) (Avise et al. 1988; Moore 1995; Avise 2000). Such simulations commonly trace descendant relations backward through their antecedents until two lineages coalesce in a single ancestor; we discuss the implications of such simulations in more detail below. Although mitochondrial genes trace only a single history for any particular genome, they mutate at different rates. For example, our two markers record the history of migration at different degrees of resolution because the coalescence time for an ISR haplotype is shorter than that for the functionally conserved cytochrome b haplotypes. It is important, however, to check that each marker gives congruent phylogenetic results. Although it is unlikely that different gene tree topologies would be due to differences in the lineage sorting process (a naturally occurring phenomenon for sexually recombining genes), different rates of mutation could skew the results of a phylogenetic analysis.
If, however, the genetic data are analyzed independently as partitions and then together as a combined set, we expect the trees to express congruent genealogies. Topologically identical phylogenies allow the cross-validation of additional biological inferences relating to each lineage, and assessments of populations that are sampled for one locus but not the other (and vice versa) (Templeton 2004; Thompson et al. unpublished).
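The one-quarter ratio quoted above for mitochondrial effective population size follows from counting gene copies. The minimal derivation below assumes N breeding adults with an equal sex ratio (N_f = N/2 females) and no other departures from an ideal population. Real populations with skewed sex ratios or variable reproductive success will deviate from this ratio.

```latex
\begin{aligned}
\text{nuclear autosomal gene copies} &= 2N && \text{(diploid, transmitted by both sexes)}\\
\text{mitochondrial gene copies} &= N_f = \tfrac{N}{2} && \text{(haploid, transmitted by females only)}\\
\frac{N_f}{2N} &= \frac{N/2}{2N} = \frac{1}{4}
\end{aligned}
```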

When multiple representatives of a population's genetic constitution are sampled, the sampling design frequently becomes unbalanced. The molecular data, however, must be balanced in order to address questions of congruence. Congruence among the phylogenies is therefore examined by reducing the data sets to the intersection of genetic sequences: given two molecular markers, the intersection includes only the sequences from individuals (salamanders in our study) that have been sequenced for both markers. Once the data are balanced in this way, the Incongruence Length Difference (ILD) test (Farris et al. 1994) provides a starting point for testing phylogenetic congruence among the partitions (Hipp et al. 2004). The ILD test is executed in PAUP* V4.0b10 (Swofford 2000) and compares two data partitions, x and y, to yield a test statistic (D) that measures the equality of their phylogenetic signal. Characters (aligned nucleotide sites, each taking one of four states: A, G, C, or T) are randomly reassigned to partitions of the same sizes as the original partitions. Letting L denote the length of the most parsimonious tree for a given data set, the test statistic is calculated as $D_{xy} = L_{xy} - (L_x + L_y)$, the difference between the tree length for the combined sequences ($L_{xy}$) and the summed tree lengths for the separate partitions ($L_x + L_y$). In our study we used 1000 random partitions and generated a distribution of W values of D, comprising the observed value ($D_{prior}$), calculated for the actual process partitions (the cytochrome b and ISR genes in this study), and the values ($D_{rand}$) calculated for random partitions of the combined data. The number of replicates for which $D_{rand}$ is smaller than $D_{prior}$, designated S, is used to derive the estimated type I error rate, or P-value, which equals $1 - (S/W)$ (Farris et al. 1994; Yoder et al. 2001). This statistic does not test partition "combinability", but it does indicate precision insofar as the trees are corroborated.
Two possible causes of incongruence are errors in the analytical methods (e.g., long-branch attraction) and differences in lineage sorting (e.g., hybridization) (Felsenstein 1983; Hipp et al. 2004). After performing 1000 replicates, we failed to reject the null hypothesis (P = 0.62) that the cytochrome b and ISR partitions are congruent, that is, that the characters within the two partitions carry no phylogenetic signal different from that of random partitions of the combined data.
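The ILD procedure described above can be sketched in Python under strong simplifying assumptions. Exactly four specimens remain after the two markers are intersected, so the most parsimonious tree can be found by checking all three unrooted topologies, and tree lengths are Fitch parsimony counts. PAUP* instead uses heuristic searches on many more taxa. All function names and data here are hypothetical. The P-value follows the definition in the text, 1 - S/W.

```python
import random
from typing import Dict, List, Tuple

Column = Dict[str, str]                     # one aligned site: specimen -> nucleotide
Split = Tuple[Tuple[str, str], Tuple[str, str]]


def quartet_splits(taxa: List[str]) -> List[Split]:
    """The three unrooted binary topologies for exactly four taxa."""
    a, b, c, d = taxa
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]


def site_cost(split: Split, col: Column) -> int:
    """Fitch parsimony cost of one site on the quartet ((p,q),(r,s))."""
    cost = 0
    sets = []
    for x, y in split:
        if col[x] == col[y]:
            sets.append({col[x]})
        else:
            sets.append({col[x], col[y]})
            cost += 1
    if not sets[0] & sets[1]:
        cost += 1
    return cost


def min_length(cols: List[Column], taxa: List[str]) -> int:
    """Length L of the most parsimonious quartet tree (exhaustive over 3 topologies)."""
    return min(sum(site_cost(sp, c) for c in cols) for sp in quartet_splits(taxa))


def to_columns(seqs: Dict[str, str], taxa: List[str]) -> List[Column]:
    n_sites = len(seqs[taxa[0]])
    return [{t: seqs[t][i] for t in taxa} for i in range(n_sites)]


def ild_test(seqs_x: Dict[str, str], seqs_y: Dict[str, str],
             n_reps: int = 1000, seed: int = 1) -> Tuple[int, float]:
    # Balance the data: keep only specimens sequenced for both markers.
    taxa = sorted(set(seqs_x) & set(seqs_y))
    if len(taxa) != 4:
        raise ValueError("this sketch handles exactly four shared specimens")
    cols_x, cols_y = to_columns(seqs_x, taxa), to_columns(seqs_y, taxa)
    pooled = cols_x + cols_y
    l_xy = min_length(pooled, taxa)         # combined length does not depend on the partition

    def d_stat(part_x: List[Column], part_y: List[Column]) -> int:
        return l_xy - (min_length(part_x, taxa) + min_length(part_y, taxa))

    d_prior = d_stat(cols_x, cols_y)
    rng = random.Random(seed)
    d_rand = []
    for _ in range(n_reps):
        rng.shuffle(pooled)                 # reassign sites to partitions of the original sizes
        d_rand.append(d_stat(pooled[:len(cols_x)], pooled[len(cols_x):]))
    w = len(d_rand) + 1                     # all D values: observed plus random
    s = sum(1 for d in d_rand if d < d_prior)
    return d_prior, 1 - s / w
```

For example, two hypothetical markers that favour conflicting groupings of four specimens give a large observed D and a small P, whereas markers that support the same grouping give D = 0 and P = 1.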

Phylogenetic incongruence may also result if an inappropriate phylogenetic model is employed; hence, it is advisable to compare trees derived from distinct approaches (Avise 1994). We used MEGA2 (Kumar et al. 2001), MODELTEST (Posada and Crandall 1998), and PAUP* V4.0b10 (Swofford 2000) to analyze the intersected data and produce phylogenetic trees by minimum evolution, maximum likelihood, and parsimony, respectively. The relative support for each clade was assessed by bootstrapping the data with 2000 pseudoreplicates (Felsenstein 1985; Hillis and Bull 1993). Only the phenetically based minimum evolution tree is illustrated (Fig. 3), because the remaining trees exhibit very similar topologies, differing only at the tips, where the most closely related haplotypes occur. Branch lengths in the minimum evolution trees are proportional to genetic distance.
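Nonparametric bootstrapping (Felsenstein 1985) resamples alignment columns with replacement, rebuilds a tree from each pseudoreplicate, and reports how often each clade reappears. The Python sketch below illustrates the idea for four taxa only. It uses uncorrected p-distances and, for each pseudoreplicate, chooses the split ab|cd that minimises d(a,b) + d(c,d). For a quartet, this rule coincides with the minimum-evolution choice under ordinary least-squares branch lengths. The distance measure, names, and data are illustrative assumptions, not the settings used in MEGA2 for this study.

```python
import random
from collections import Counter
from typing import Dict, Tuple

Split = Tuple[Tuple[str, str], Tuple[str, str]]


def p_distance(s1: str, s2: str) -> float:
    """Proportion of aligned sites at which two sequences differ."""
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def best_quartet_split(seqs: Dict[str, str]) -> Split:
    """Split ab|cd with the smallest d(a,b) + d(c,d) (minimum evolution for four taxa)."""
    a, b, c, d = sorted(seqs)
    splits = [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

    def within_pairs(split: Split) -> float:
        (p, q), (r, s) = split
        return p_distance(seqs[p], seqs[q]) + p_distance(seqs[r], seqs[s])

    return min(splits, key=within_pairs)


def bootstrap_support(seqs: Dict[str, str], n_reps: int = 2000,
                      seed: int = 1) -> Dict[Split, float]:
    """Fraction of pseudoreplicates (sites resampled with replacement) recovering each split."""
    rng = random.Random(seed)
    taxa = sorted(seqs)
    n_sites = len(seqs[taxa[0]])
    counts: Counter = Counter()
    for _ in range(n_reps):
        cols = [rng.randrange(n_sites) for _ in range(n_sites)]
        pseudo = {t: "".join(seqs[t][i] for i in cols) for t in taxa}
        counts[best_quartet_split(pseudo)] += 1
    return {split: n / n_reps for split, n in counts.items()}


# Hypothetical alignment: 8 sites group A with B, 2 sites group A with C.
seqs = {"A": "AAAAAAAACC", "B": "AAAAAAAATT", "C": "GGGGGGGGCC", "D": "GGGGGGGGTT"}
print(bootstrap_support(seqs))
```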

Under the nearly neutral theory, selection becomes less effective relative to genetic drift as population size decreases, so slightly deleterious mutations are more likely to become fixed in small populations (Ohta 1992). Note that the branch length for haplotype 15 is noticeably longer in the cytochrome b tree (Fig. 3b) than in the combined (Fig. 3a) or ISR (Fig. 3c) trees. This haplotype (H15) is from Santa Cruz, California, an area into which Ambystoma macrodactylum (and several other salamanders) migrated during the Pleistocene, subsequently becoming isolated by a gap of approximately 300 km from the nearest portion of the remainder of the range (Baily 1948; Stebbins 1949; Russell and Anderson 1956; Moritz et al. 1992; Jockusch and Wake 2002; Thompson et al. unpublished). The longer and shorter branch lengths in the cytochrome b tree (Fig. 3b) may therefore indicate regional differences in population size. The non-coding ISR, in contrast, is likely to remain neutral under such conditions; the asymmetry in branch length relative to cytochrome b (Fig. 3) fits this prediction.
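The effect of population size on weak selection can be illustrated with Kimura's diffusion approximation for the fixation probability of a new mutation. The sketch below uses the standard diploid form, u = (1 - e^(-2s)) / (1 - e^(-4Ns)), with initial frequency 1/(2N). It is offered only to show the trend behind the nearly neutral argument. Mitochondrial genes are haploid and female-transmitted, so their scaling differs. The selection coefficient and population sizes are hypothetical and were not estimated in this study.

```python
import math


def fixation_probability(n: int, s: float) -> float:
    """Kimura's diffusion approximation for a new mutation (initial frequency 1/(2N))
    in a diploid population of effective size N with selection coefficient s."""
    if s == 0:
        return 1 / (2 * n)                  # neutral expectation
    x = -4 * n * s
    if x > 700:                             # math.expm1 would overflow; value is effectively 0
        return 0.0
    # (1 - e^{-2s}) / (1 - e^{-4Ns}), written with expm1 for numerical accuracy
    return math.expm1(-2 * s) / math.expm1(x)


def relative_to_neutral(n: int, s: float) -> float:
    """Fixation probability scaled by the neutral value 1/(2N)."""
    return fixation_probability(n, s) * 2 * n


# A slightly deleterious mutation behaves almost neutrally in small populations
for n in (10, 100, 1000, 10000):
    print(n, relative_to_neutral(n, -0.001))
```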

Fig. 3. Three phylogenetic gene trees produced from the intersected data set for (a) cytochrome b and ISR combined, (b) cytochrome b as a partition, and (c) ISR as a partition. The intersected data set was reduced to 22 haplotypes, seen at the terminal nodes of each branch (H1-H22). Bootstrap values are located at the nodes of each branch, bars to the right of each tree identify regionally based monophyletic clades, and the alphanumeric labels in parentheses refer to clades from the ISR marker. A sub-clade within the Salmon River Mountains (A-16) is incongruent, whereas the Interior, Clearwater, and Coastal Mountain groups are congruent. Note that the haplotype labels in these trees apply only to this figure, because the data were reduced to the intersected sequence partitions.

8.9.2 Nested Cladistic Phylogeographical Analysis

Nested cladistic phylogeographic analysis compares alternative explanations for the migration and movement of populations. This is accomplished by mapping and contrasting the phylogenetic arrangement of genetic events according to their ancestral and derived spatial distributions (Templeton et al. 1995; Templeton 2004).

The last glacial retreat is a prime phylogeographic calibration point for studying the effects of range expansion on genetics and associated demographic parameters. Comparative phylogeography and molecular clocks provide crude-to-precise time indices for speciation, whereas time indices for geological events, such as glacial retreat, are calibrated and refined from global to local scales through radioisotope dating of material from stratified sequences, such as pollen, diatom, and chironomid records, and from ice cores (Walker and Pellat 2003). Paleoecological reconstructions coupled with molecular data enable calibration of molecular rates, so that population histories, such as range expansion, can be mapped in accord with prevailing environmental conditions.
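Calibrating a molecular rate against a dated event can be sketched under a strict molecular clock. In that case, pairwise divergence d accumulates along both descendant lineages, so d = 2rT. Once a split of known age T fixes the rate r, the same relation dates other splits. The split age and divergences in this Python example are hypothetical placeholders, not values from this study. A strict clock is itself an assumption that real data often violate.

```python
def calibrate_rate(divergence: float, split_age_years: float) -> float:
    """Substitutions per site per lineage per year under a strict clock (d = 2 * r * T)."""
    return divergence / (2 * split_age_years)


def divergence_time(divergence: float, rate: float) -> float:
    """Age of a split implied by pairwise divergence d under the same strict clock."""
    return divergence / (2 * rate)


# Hypothetical calibration: two lineages assumed to have split 12,000 years ago
rate = calibrate_rate(0.006, 12_000)
print(rate, divergence_time(0.003, rate))
```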

Phylogeography includes the analysis of geography and haplotype (or allele) lineages and provides biogeographical explanations for spatial associations. For example, Thompson et al. (unpublished) measured effective geographic distances and established a statistical association between the topographic relief of western North America and genetic distances for the ISR marker. Effective distances account for environmental constraints imposed upon a species by the landscape (Verbeylen et al. 2003), and these were approximated through a measurement of genetic isolation in Ambystoma macrodactylum (Thompson et al. unpublished). An effective distance was measured by demonstrating that mitochondrial gene flow, and hence the migration of Ambystoma macrodactylum, is sufficiently restricted by Cordilleran relief to engender geographical associations among evolutionarily related haplotypes. Closely related individuals are statistically more strongly linked by geographic distances that trace through the valleys of the landscape and circumvent the topographic features than by distances that pass over the mountains. In other words, genetic patterns are spatially regionalized according to their evolutionary position within the haplotype tree, and these patterns are significantly correlated with Cordilleran topography. We narrowed the number of alternative explanations for these statistically supported patterns by applying Templeton et al.'s (1995) Nested Cladistic Phylogeographical Analysis (NCPA), which incorporates principles from both cladistic theory and coalescence simulations. Although statistical probabilities are the test criterion for depicting fully resolved networks of haplotype genealogies, a cladistic component is retained in the approach. It is easy to recognize why the haplotype networks are statistical in nature, but it is less clear why the techniques and phylogenetic networks are cladistic in nature.
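One common way to compute a landscape-weighted effective distance is a least-cost path over a resistance raster. Cells that are hard to cross, such as high ridges, carry larger costs, and Dijkstra's algorithm finds the cheapest route. The text does not specify how Thompson et al. computed their distances, so the following Python sketch shows the general technique rather than their method. The grid values, the four-neighbour moves, and the mean-of-two-cells step cost are illustrative assumptions.

```python
import heapq
from typing import List, Tuple

Cell = Tuple[int, int]


def least_cost_distance(cost: List[List[float]], start: Cell, goal: Cell) -> float:
    """Cheapest accumulated cost from start to goal moving between 4-neighbouring cells.
    Stepping between two cells costs the mean of their per-cell resistance values."""
    rows, cols = len(cost), len(cost[0])
    best = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == goal:
            return d
        if d > best[(r, c)]:
            continue                        # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + (cost[r][c] + cost[nr][nc]) / 2
                if nd < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return float("inf")


# Hypothetical resistance grid: 1 = valley floor, 10 = mountain ridge.
grid = [
    [1, 10, 1],
    [1, 10, 1],
    [1, 1, 1],
]
print(least_cost_distance(grid, (0, 0), (0, 2)))  # 6.0: the valley route beats crossing the ridge (11.0)
```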

A genealogical perspective is instructive for understanding the cladistic basis of Templeton et al.'s (1992) haplotype networks. Simply put, gene trees are graphical representations of intraspecific genetic relations. Gene trees differ considerably from the classical cladogenetic trees used to map interspecific relations (Hennig 1966; Crandall and Templeton 1996; Posada and Crandall 2001), and neutral drift theory becomes relevant for gene trees because it allows genetic simulations to assign probabilities to the branching process (Avise 2000).

Simulations of neutral genetic drift run backward in time to the mutational origin of each haplotype and are used to obtain statistical information about the phylogenetic, or branching, process relative to effective population sizes and mutation rates. This is the basis of coalescence theory. Coalescent simulations and controlled genetic studies reveal that the numerically most abundant haplotype is the most likely to display the ancestral state (Castelloe and Templeton 1994; Crandall 1994; Crandall and Templeton 1996). The probability that a polymorphic site between two random haplotypes experienced more than one mutation, the non-parsimonious state (Templeton et al. 1992), can be estimated through the coalescent parameter $\theta = M\mu$, where $\mu$ is the mutation rate and M equals N for haploid and 2N for diploid populations (Templeton et al. 1992; Schneider et al. 2000).
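The backward-in-time logic of the coalescent can be illustrated with a small simulation. It assumes a neutral Wright-Fisher population of M gene copies (M = N for a haploid population, as in the text). In each generation back, two sampled lineages pick parents at random and coalesce when they pick the same one, so the expected waiting time is M generations. The function names and the value M = 50 are illustrative assumptions, not values from this study.

```python
import random
from statistics import mean


def pairwise_coalescence_time(m: int, rng: random.Random) -> int:
    """Generations back until two sampled gene copies pick the same parent
    in a neutral Wright-Fisher pool of m gene copies."""
    generations = 0
    while True:
        generations += 1
        # Each lineage chooses its parent uniformly at random from the previous generation.
        if rng.randrange(m) == rng.randrange(m):
            return generations


def theta(m: int, mu: float) -> float:
    """Coalescent parameter as defined in the text: theta = M * mu."""
    return m * mu


rng = random.Random(42)
m = 50                                      # hypothetical haploid population (M = N)
times = [pairwise_coalescence_time(m, rng) for _ in range(10000)]
print(mean(times))                          # expected value is M = 50 generations
```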

Although branches in statistical haplotype networks are not ranked according to their relative bootstrap probabilities, 95% statistical support can be achieved with the coalescent parameter. For realistic estimates of $\theta$, however, a non-parsimonious relationship between randomly drawn haplotypes cannot be ruled out often enough for general use in phylogenetics (Hudson 1989; Templeton et al. 1992). If attention is focused on non-random haplotypes, parsimony increases the effectiveness of statistical estimates of $\theta$. Non-random haplotypes are identified through parsimoniously reconstructed genealogies. In gene tree terminology, a polymorphic site that experienced more than one mutation is in a non-parsimonious state, but if two haplotypes differ by only one mutational step (the parsimonious state), they are monophyletic. A statistical parsimony network, with statistics based on coalescence theory, reduces the frequency of non-monophyletic (i.e., paraphyletic) connections (Fig. 4). Given that a pair of haplotypes share m sites but differ at j sites, the probability of a parsimonious relationship is estimated by

This equation incorporates the probability of a mutation occurring between two haplotypes after their point of divergence ($q_j$), which is estimated using Hudson's (1989) mathematically derived probability for the non-parsimonious state (H). Hudson's (1989) H statistic serves as the upper bound of a Bayesian prior distribution on the interval (0, H). Because non-random haplotypes are compared, $q_j$ always remains less than H and is easily quantified by comparing haplotype sequences. For each non-random haplotype pair, the probability of homoplasy among the m shared sites, plus the probability of change among the j differing sites, is computed. The accumulated probabilities determine the total probability that two haplotypes differ at j - 1 sites in addition to the single site under consideration, while having m sites in common. Haplotypes are thus connected within a parsimony network if they satisfy the type I error rate ($\alpha = 0.05$). The procedure starts at j = 1, runs through the data to yield the set of estimators $q_1, \ldots, q_n$, and stops linking haplotypes into the network once the probability exceeds $\alpha$ (Templeton et al. 1992; Crandall 1994).
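The linking rule can be sketched in Python with heavy simplification. Haplotypes are joined in order of increasing number of differing sites j, only when they belong to separate networks, and never beyond a connection limit supplied by the caller. In the procedure described above, that limit would come from the 95% parsimony probability. TCS also infers missing intermediates and resolves loops, and this sketch omits both. The function names and sequences are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def hamming(a: str, b: str) -> int:
    """Number of aligned sites (j) at which two haplotypes differ."""
    return sum(x != y for x, y in zip(a, b))


def connect_haplotypes(haps: Dict[str, str], limit: int
                       ) -> Tuple[List[Tuple[str, str, int]], List[List[str]]]:
    """Link haplotypes in order of increasing j = 1..limit, merging separate networks.
    Returns the links made and the resulting networks; pairs beyond the limit stay apart."""
    parent = {h: h for h in haps}

    def find(h: str) -> str:                # union-find root with path halving
        while parent[h] != h:
            parent[h] = parent[parent[h]]
            h = parent[h]
        return h

    pairs = sorted(combinations(sorted(haps), 2),
                   key=lambda p: hamming(haps[p[0]], haps[p[1]]))
    edges = []
    for a, b in pairs:
        j = hamming(haps[a], haps[b])
        if j > limit:
            break                           # beyond the connection limit: stop linking
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            edges.append((a, b, j))
    networks: Dict[str, List[str]] = {}
    for h in haps:
        networks.setdefault(find(h), []).append(h)
    return edges, sorted(sorted(v) for v in networks.values())


haps = {"H1": "AAAA", "H2": "AAAT", "H3": "AATT", "H4": "GGGG"}
print(connect_haplotypes(haps, limit=2))
```

With a limit of two steps, H4 remains an unconnected network, which is analogous to the separate ISR networks reported below.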

Fig. 4. A graphical illustration of a genealogy evolving within a localized portion of a population tree (a). The illustrated population has an effective breeding number (Ne) of 5 individuals. Five sexual generations (x) are shown with a haplotype genealogy superimposed. A single line of descent, leading to individuals 2 to 4, is isolated in (b); these individuals share a haplotype derived from a single mutation between generations $x_0$ and $x_1$ in (a). If individuals 2 and 3 are sampled at random, a monophyletic clade unites their haplotypes by a single index mutation (the parsimonious state); but if individuals 3 and 4 (or 2 and 4) are sampled (b), a paraphyletic union is constructed that requires two mutational steps (the non-parsimonious state).

Three statistical parsimony networks at 95% statistical confidence were resolved for the two mitochondrial markers (Fig. 5a,b): one for cytochrome b and two for the ISR. The haplotypes in the networks are either interiors or tips. Interiors also include parsimoniously inferred nodes (PINs) (Thompson et al. unpublished) and always have more than one connection to the remaining network.
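The interior/tip distinction can be read directly off a network's links. A tip has a single connection to the network, and an interior (including a PIN) has more than one. The Python sketch below classifies nodes by counting connections in an edge list. The labels, including "PIN1", are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def classify_nodes(edges: List[Tuple[str, str]]) -> Dict[str, str]:
    """Label each node 'tip' (one connection) or 'interior' (more than one)."""
    degree: Counter = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return {node: ("tip" if d == 1 else "interior") for node, d in degree.items()}


edges = [("H1", "PIN1"), ("PIN1", "H2"), ("PIN1", "H3"), ("H3", "H4")]
print(classify_nodes(edges))
```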

Fig. 5. In (a), the cytochrome b genealogy is resolved as a single TCS network, and the nested design contains 12 haplotypes (H1-H12). In (b), the ISR marker genealogy is statistically resolved, and the relationships among 29 haplotypes (H1-H29) are summarized in two TCS networks with a nested design. The ISR network (b) contains six haplotypes with labels beginning with U, which refer to published GenBank sequences (McKnight and Shaffer 1997). A dashed line connecting two PINs (b) identifies the closest connection in the unresolved relationship between the two ISR networks. A shaded key distinguishes nesting levels A to E, and numbers in circles reference each nested category. Clades are referenced by the nesting level followed by the nested address (e.g., A-16 contains haplotypes H9 and H20, and one PIN).

The internal haplotypes tend to be ancestral and geographically widespread, and to occur more frequently in populations. Tips tend to be the most recently derived mutations and have only a single connection to the network. Hypothetical unobserved or extinct haplotypes are termed missing intermediates and are represented in the network by zeros (Crandall 1996). A nesting algorithm is applied to the underlying statistical parsimony network that groups haplotypes into a nested set of clades. The nesting design segregates genealogical units that require independent statistical investigation of their spatial patterns, while contrasting older (interior) versus younger (tip) clades (Fig. 5; Templeton et al. 1995). In effect, this allows the investigation of the ancestral migration of a lineage versus the migration of the lineage's descendants, up to the spatial extent of contemporary haplotypes.

Table 1. Nested cladistic phylogeographical analysis results for the (a) ISR and (b) cytochrome b mtDNA markers in Ambystoma macrodactylum. Alternative hypotheses (Ha) from the inference key are listed for each clade in which there is more than a single interpretation. Clades with no genetic or geographical variation are not included.

(a) ISR

Clade      Inference key†     Inferred pattern
A-9        1-2-11-17-No       Inconclusive
Total-1    1-2-?              Inconclusive
Total-2    1-2-?              Inconclusive

Remaining Ha and inferred-pattern entries (row positions not recoverable): CRE; RGF and IBD; CRE; CRE; AF; Inconclusive

* The probability refers to the frequency with which the 1000 permutational chi-square statistics are equal to or greater than the observed value.
† CRE = contiguous range expansion; RGF = restricted gene flow; IBD = isolation by distance; PFRE = past fragmentation followed by range expansion; LDC = long distance colonization; FR = fragmentation; AF = allopatric fragmentation; ? = indiscriminate; ns = not significant.

Thompson et al. (unpublished) demonstrated, by cross-referencing the combined and cytochrome b nesting designs, that the Clearwater Mountains clade D-2 is more appropriately united with the Rocky Mountain Interior clade D-3 (Fig. 5b). Spatial statistics for the nested clade design test the observed clade distance (Dc), nested clade distance (Dn), and interior-minus-tip clade distance (I-T) for haplotypes (referred to as 0-step clades) and their respective nested clade categories against null distributions (Templeton et al. 1995; Posada et al. 2000). We used effective geographic distances, weighted by the topography of the landscape, to better reflect the habitat grain and effective mobility of Ambystoma macrodactylum. Results from the GEODIS analysis were interpreted according to Templeton's inference key (see Table 1; programs and an up-to-date inference key are available at http://inbio.byu.edu/Faculty/kac/crandallplab/programs.html).
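The three NCPA distance statistics can be sketched as follows. Dc is the mean distance of a clade's sampled individuals from that clade's own geographic centre. Dn is their mean distance from the centre of the nesting clade. I-T contrasts interior and tip clades. This Python sketch makes several simplifying assumptions. It uses straight-line planar coordinates rather than the topography-weighted effective distances used in this study, it averages interior and tip values without frequency weighting, and it omits the permutation test. It is not GEODIS's implementation, and the coordinates are hypothetical.

```python
from math import dist
from statistics import mean
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]


def centre(points: List[Point]) -> Point:
    """Geographic centre (mean coordinates) of a set of sampled individuals."""
    return (mean(p[0] for p in points), mean(p[1] for p in points))


def clade_distance(points: List[Point]) -> float:
    """Dc: mean distance of a clade's individuals from the clade's own centre."""
    c = centre(points)
    return mean(dist(p, c) for p in points)


def nested_clade_distance(points: List[Point], nesting_points: List[Point]) -> float:
    """Dn: mean distance of a clade's individuals from the centre of its nesting clade."""
    c = centre(nesting_points)
    return mean(dist(p, c) for p in points)


def interior_minus_tip(clades: Dict[str, List[Point]], interior: Set[str]) -> Tuple[float, float]:
    """(I-T)c and (I-T)n as unweighted means over interior versus tip clades."""
    all_points = [p for pts in clades.values() for p in pts]
    dc = {k: clade_distance(v) for k, v in clades.items()}
    dn = {k: nested_clade_distance(v, all_points) for k, v in clades.items()}
    tips = [k for k in clades if k not in interior]
    it_c = mean(dc[k] for k in interior) - mean(dc[k] for k in tips)
    it_n = mean(dn[k] for k in interior) - mean(dn[k] for k in tips)
    return it_c, it_n


# Hypothetical nested clade: a widespread interior clade and a localized tip clade
clades = {"interior": [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)],
          "tip": [(10.0, 10.0), (10.0, 11.0)]}
print(interior_minus_tip(clades, {"interior"}))
```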
