Geography of speciation

Speciation requires the cessation of gene flow between populations.  This is most readily achieved when two populations are allopatric.
    Allopatry simply means that two populations do not come into physical contact as a consequence of some physical barrier (e.g., a mountain range, an ocean).  Allopatric speciation refers to the case in which two populations become reproductively isolated as a consequence of independent evolution in allopatry.  There is a large body of evidence supporting the idea that allopatric speciation is the predominant mode of speciation.  For example, we often observe that populations that are geographically farther apart are more phenotypically distinct, and also are often partially reproductively isolated.  Another common example of allopatric speciation can be found in island endemic species that are clearly recently derived from large mainland populations.  A variant of allopatric speciation is the "peripheral isolate model" or "founder effect model".  These models are based on the notion that if a population bottleneck occurs during the establishment of a new subpopulation, then genetic drift can set off a series of genetic changes that result in rapid, massive "reorganization" of genes.  The models are rather vague, and we will not consider them as serious players in the study of speciation.
    Sympatric speciation refers to the case in which reproductive isolation evolves between two populations, even though they are never physically impeded from exchanging genes.  There is still some doubt as to the theoretical conditions under which sympatric speciation is possible.  There are only a few possible examples of sympatric speciation.  Perhaps the most compelling case is from the crater lake cichlids of Cameroon.  The crater lakes have little topographical variation in their lake beds; each lake is essentially like a bowl.  Therefore, it is difficult to imagine that there is any geographical impediment to gene flow - this might not be true for many deep lakes, in which a drastic drop in water level could isolate fish populations from one another.  In spite of this feature of the crater lakes, there are several cases of very closely related fish species that almost definitely diverged within a single lake.

Allopatry results in speciation because the lack of gene flow allows populations to evolve differences that cause incompatibilities if/when the populations come back into contact.  Even relatively small amounts of gene flow are effective in homogenizing populations - this is why most speciation will require allopatry.

Reproductive isolation should be thought of as an incidental effect of evolution.  In principle, evolutionary changes caused by genetic drift or by natural selection could contribute to reproductive isolation.  It seems more likely, however, that isolation is an incidental effect of adaptive evolution.

Reinforcement
An interesting idea put forward by Dobzhansky is that speciation is not usually completed in allopatry, but rather is completed when two previously allopatric populations that are partially isolated come back into contact.   It seems unlikely this is how most speciation events are completed, but it may still occur in nature.  The idea is that when two partially reproductively isolated populations come into contact, there is some hybridization that occurs at the contact zone.  Hybrids have reduced fitness.  Therefore, individuals that produce hybrids, that is, individuals that mate with members of the other species, have reduced fitness.  If there is variation in traits that affect mate choice, then individuals that show stronger pre-mating discrimination against members of the other species are favored by natural selection.  Thus, there should be an increase in the degree of pre-mating isolation in the area where two species meet.  In principle, this could lead to the "completion" of the speciation event in the sense that complete behavioral  isolation results from reinforcement.  There are some problems with the idea.  Perhaps one of the biggest problems is that in the model, there is gene flow occurring between species, yet divergence still continues.  This is similar to the problem of sympatric speciation.

In order to demonstrate that reinforcement is occurring one must show that there is gene flow between two species, that hybrids have reduced fitness, and that pre-mating isolation is stronger in areas of sympatry than in areas of allopatry.


Fig 14.10 from Ridley
 

The distinction between reinforcement and simple character displacement is important.  If two species hybridize yet produce completely sterile or inviable progeny, we still expect selection to favor the evolution of greater pre-mating isolation in sympatric regions, because individuals that make a bad choice in mate selection (i.e., pick a mate from the wrong species) have zero fitness.  However, because there is no gene flow, speciation has already occurred without reinforcement.  Reinforcement refers specifically to the case in which selection drives the evolution of stronger pre-mating isolation in the face of ongoing gene flow between populations.

One of the strongest pieces of evidence that reinforcement may be fairly common, at least in Drosophila, is Coyne and Orr's summary of data on reproductive isolation in Drosophila.  They gathered data on the strength of pre-zygotic and post-zygotic isolation for many species pairs.  They also gathered data from the literature on allozyme divergence between the same species pairs.  Allozyme divergence becomes greater as more time has elapsed since speciation, so it could serve as an indirect estimate for the time since divergence.  Finally, they collected data  on the ranges of the species pairs to determine whether species were sympatric or were allopatric.  The interesting result from the study was that sympatric species tend to evolve strong pre-mating isolation much faster than allopatric pairs (see figure below), while post-mating isolation showed no major differences.  This is exactly what one would expect if reinforcement were common in Drosophila populations.


 

Genetics of reproductive incompatibilities
The genetic basis of species incompatibilities is unlikely to often be attributable to divergence at a single locus.  Consider an ancestral population that has genotype AA at a single locus.  Split it into two populations, one of which evolves such that it becomes fixed for the a allele (so has genotype aa).  Hybrids are sterile, and have genotype Aa.  Could the Aa genotype be the cause of the sterility?  No, because the population fixed for the a allele was polymorphic for A and a during its history, and the a allele could not have spread if Aa genotypes were sterile.  This sort of reasoning led to the Dobzhansky-Muller model of reproductive incompatibilities.  An ancestral population has a two-locus genotype of AABB.  Subsequently, evolution occurs in two subpopulations such that one has genotype aaBB and the other has genotype AAbb.  The hybrids are AaBb.  Though A and a are compatible, and B and b are compatible, a and b are not necessarily compatible because the genetic combination was never "tested" by evolution.  These types of incompatibilities are probably the most important source of hybrid sterility and inviability.
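The two-locus argument can be checked by brute force.  The sketch below is only a toy illustration: it assumes a single incompatibility between the derived alleles a and b, and all function and variable names are hypothetical.  It lists the genotypes each population passes through as its new allele spreads, and confirms that neither population ever contains the a-b combination, while the F1 hybrid does.

```python
# Toy check of the Dobzhansky-Muller argument.
# Assumption (for illustration): the only incompatibility is between
# the derived alleles 'a' and 'b'; all names here are hypothetical.

def is_incompatible(genotype):
    """genotype is a pair of two-allele strings, e.g. ('Aa', 'BB')."""
    locus_a, locus_b = genotype
    return "a" in locus_a and "b" in locus_b

def f1_hybrid(parent_1, parent_2):
    """F1 of two homozygous parents: one allele from each parent per locus."""
    return tuple("".join(sorted(p1[0] + p2[0])) for p1, p2 in zip(parent_1, parent_2))

# Genotypes each population passes through while its new allele spreads.
population_1 = [("AA", "BB"), ("Aa", "BB"), ("aa", "BB")]
population_2 = [("AA", "BB"), ("AA", "Bb"), ("AA", "bb")]

blocked_1 = any(is_incompatible(g) for g in population_1)
blocked_2 = any(is_incompatible(g) for g in population_2)
hybrid = f1_hybrid(population_1[-1], population_2[-1])
hybrid_incompatible = is_incompatible(hybrid)

print(hybrid, blocked_1, blocked_2, hybrid_incompatible)
```

Neither substitution is ever exposed to the incompatibility within its own population, so both can spread freely; the untested combination appears only in the hybrid.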

How rapidly do these incompatibilities accumulate?  We have no good answers to this question, though the subject is being pursued by several labs.  A good species pair for such analyses is Drosophila simulans and its very close relative, Drosophila mauritiana.  D. simulans is native to Africa, but has recently spread throughout the world in association with humans.  D. mauritiana is an island endemic, found on Mauritius.  They probably became separated about 500,000 years ago.  Genetic analyses can be carried out on these species because even though male hybrids are sterile, female hybrids are fertile.  By making fertile female F1 hybrids and backcrossing them to pure species males, one can create large numbers of recombinant progeny that have part of their genome from one species and part from the other species.  Parts of the genome from the two species can be followed through the crosses by using either visible mutations (e.g., eye color mutants) or molecular markers that distinguish the two species' genomes.  One can measure the fertility of these recombinant progeny, and also determine their genotypes to find which parts of the genome come from D. simulans and which come from D. mauritiana.  One can then determine which parts of the genome are associated with sterility.  Results from such analyses showed that there are regions of the mauritiana genome that individually have no effect on male sterility, but when present together cause almost complete sterility.  This suggests that the genetics of these incompatibilities is complex.  Extrapolation from some limited analyses suggests there may be as many as 100 genes contributing to hybrid male sterility between simulans and mauritiana.  Considering that most genes show no differences between these species, it would appear that incompatibilities are accumulating quite rapidly, and therefore almost definitely as a result of natural selection rather than as a result of genetic drift.

These species are typical in that when there is partial reproductive isolation between two species, it is almost always the heterogametic sex that is sterile or inviable.  In flies and mammals and several other taxa, males are XY, or heterogametic.  Of course, it is possible that the sterility and inviability of the heterogametic sex is really a phenomenon of males, rather than being a phenomenon of heterogamety per se.  This can be addressed through examination of hybrid sterility in taxa in which females, rather than males are the heterogametic sex.  In such taxa, it is female F1 hybrids, rather than male F1 hybrids that tend to be sterile or inviable.  This shows that heterogamety rather than sex is the important factor.  The fact that the heterogametic sex is usually sterile or inviable in species that are partially isolated is known as Haldane's Rule.

Genetics of species differences
Crosses between species can also be used to understand the genetic basis of interspecific differences in morphological or other types of phenotypes.  Research in this area is still in its infancy.  One example of such an approach comes from Bradshaw and his collaborators.

Two species of Mimulus are quite different in many aspects of their morphology.  They are reproductively isolated in nature, primarily because they are pollinated by different pollinators.

Figures showing the two species and their pollinators.
 


 
 
 

In the greenhouse it is relatively easy to make crosses and produce fertile progeny.  These fertile F1 can be selfed to generate large numbers of recombinants that have intermediate phenotypes (figure below).

By assessing correlations between phenotypes and the portion of the genome that has derived from each of the two species, one can roughly estimate the number of genes contributing to the interspecific differences.  The conclusion from the Mimulus study is that interspecific differences are not the result of only very few genes, but also are not the result of extremely large numbers of genes, each of which has a very small effect.

This type of analysis is occurring in several closely related species pairs in plants and animals.  Such studies will eventually teach us about the number and types of genes that cause phenotypic differences between species and result in genetic incompatibilities  in hybrids.
 
  PHYLOGENETICS
Phylogenetics is the practice of reconstructing the history of life.  The histories may be relatively recent, or extremely ancient.  There are a number of reasons why reconstructing history is interesting. 

First, simply knowing the evolutionary relationships of different organisms is inherently interesting to many people.  More generally, phylogenetic trees help us determine the direction of evolutionary change.  For example, if we know that a trait differs in two species, we would further like to know the direction of the change of that trait.  We can do this by using a phylogeny to infer what the trait was like in the ancestor of the two species.  Phylogenies are also very useful in helping us determine the temporal sequence of appearance of various traits.  For example, if a particular type of mating system is found with particular morphologies or particular behaviors, phylogenies could help us determine if the behavioral trait evolved prior to, or subsequent to the morphological trait.  More generally, phylogenies help us to understand the extent to which the evolution of different traits is correlated (this will become more clear in the next lecture).

A phylogenetic tree showing the inferred relationships of HIV viruses derived from a dentist and several of his patients.  Most of the infected patients were not considered to belong to a high-risk group.  The similarity of the virus in the dentist and his patients supports the idea that the patients were infected by the dentist.
 

The units of phylogenetic analysis are taxa and characters.  Depending on the type of data and question, taxa can be anything from individuals within a population, to species, genera, families, etc.  The characters that can be used for phylogenetic inference are virtually unlimited.  However, we can usefully divide characters into two types, phenotypic and molecular.  Phenotypic characters can be any aspect of an organism's phenotype that can be quantified, including morphological, behavioral, etc.  Molecular characters are almost always nucleotide sequences or amino acid sequences.  Characters have states - you will see the phrase "character state".  Here are two examples of character states.  Imagine the number of rays in the dorsal fin of a fish is the character.  Then the state could be 12 rays or 13 rays - different in different species.  Alternatively, a particular amino acid position in a protein is the character.  The character state is the amino acid at that position, let's say, leucine vs. valine.

The use of molecular characters has dramatically increased over the last few years.  Therefore, it would be useful to understand the relative strengths and weaknesses of phenotypic vs. molecular characters.  It is important to appreciate that the same types of problems with data quality occur with phenotypic and molecular characters - so molecules are not a panacea.  However, with molecular data it is often clear when there is a problem, while problems may be more difficult to detect with phenotypic data.

Phenotypic characters: one clear advantage is that such characters can be analyzed for extinct organisms and museum specimens.  Also, these characters require no access to a molecular biology laboratory, though that isn't a major issue anymore.  Some aspects of phenotypes are difficult to quantify, which can be a problem.  Furthermore, phenotypic characters in very different species may appear similar because of convergent evolution.  If one unknowingly uses convergently evolved traits, incorrect phylogenies result.

Molecular characters are easy to measure (quantify).  It is easy to collect huge amounts of data (think of the number of nucleotides in the genome - each one could, in principle, be a character).  Many phylogenetic methods rely on a particular model of evolution, and it is true that models of DNA evolution are probably closer to the truth than models of phenotypic evolution.  One very nice feature of molecular data is that different genes can be picked for different questions (for questions of ancient history we prefer more slowly evolving genes, while for questions of recent history we prefer more rapidly evolving genes).  Along the same lines, highly diverged taxa are more easily compared using molecular characters because highly conserved proteins can be identified and aligned, while figuring out which phenotypes are homologous can sometimes be difficult.
 
 Homology: a homologous trait is one that is shared between two or more taxa because they inherited it through a common ancestor.  When doing phylogenetic analysis, we always strive to use homologous characters. 
 

For both molecular and phenotypic characters, convergent evolution may be a problem.  Convergent evolution occurs when two taxa share a character state not because of shared ancestry, but rather because of independent evolution to the same state.  For phenotypic characters, convergent evolution usually results from natural selection (i.e., adaptation).  Though the same can be true for sequences, it is probably less likely.  For molecular data, convergence more typically is a consequence of the fact that the number of possible character states is relatively small.  For example, at the nucleotide level, there are only four possible character states, A, G, C, and T, so two species could share the same base just by chance, rather than as a result of common ancestry.  This is less of a problem for amino acid sequences because there are 20 possible states.  However, it is often the case that the number of acceptable states for many amino acid positions is much smaller than the theoretical maximum of 20 (because many amino acid changes are probably deleterious mutations, and such changes never contribute to species differences).
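The effect of the number of available states can be put in rough numbers.  The sketch below makes a deliberately crude assumption that is not in the text above: two lineages have each changed so much at a site that their states are independent draws from equally likely states.  Under that assumption the chance of a match is simply one over the number of states; the function name and the constrained example with three acceptable amino acids are hypothetical.

```python
from fractions import Fraction

def chance_match_probability(n_states):
    """Probability that two independent draws from n_states
    equally likely states happen to agree (crude assumption)."""
    return Fraction(1, n_states)

nucleotide = chance_match_probability(4)    # A, G, C, T
amino_acid = chance_match_probability(20)   # all 20 amino acids acceptable
constrained = chance_match_probability(3)   # hypothetical: only 3 acceptable residues

print(nucleotide, amino_acid, constrained)
```

Under these assumptions a constrained amino acid position can be more prone to chance matches than a nucleotide site, which is the point of the last sentence above.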
 

Some examples of convergence between marsupial and placental mammals 



 
 
 
 
 
 
 
 







There are three main approaches for reconstructing phylogenies. 

Distance methods summarize the overall similarity between species, for example, the sequence differences for a homologous piece of DNA.  Distance methods are based on the idea that differences between species accumulate at a constant rate as time increases.  Thus, species that split more recently will have greater similarity.
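As a minimal sketch of the first step of a distance method, the code below computes the proportion of differing sites between each pair of aligned sequences and reports the most similar pair.  The sequences are made up for illustration, and real distance methods usually correct these raw proportions for multiple changes at the same site before building a tree.

```python
from itertools import combinations

# Hypothetical aligned sequences (made up for illustration).
sequences = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTTC",
    "C": "ACGAACCTTG",
}

def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    differences = sum(1 for x, y in zip(seq1, seq2) if x != y)
    return differences / len(seq1)

# Distance for every pair of taxa, keyed by the pair of names.
distances = {
    pair: p_distance(sequences[pair[0]], sequences[pair[1]])
    for pair in combinations(sorted(sequences), 2)
}
closest_pair = min(distances, key=distances.get)

print(distances, closest_pair)
```

With these toy sequences, A and B differ at one site out of ten, so a distance method would join them first.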

Parsimony is based on the idea that the best hypothesis of the phylogenetic relationships for a group of organisms is the one that requires the smallest number of evolutionary changes.  Parsimony approaches use only shared, derived character states to attempt phylogeny reconstruction.  In phylogenetics the principle of parsimony states that if two different hypotheses on the phylogenetic relationships of a group of organisms require different numbers of evolutionary changes of characters, then we should prefer the tree with fewer changes.  This would be equivalent to the tree supported by the greatest number of characters.  This makes sense, because a reconstructed phylogeny is a hypothesis - the history of the taxa cannot be directly observed.  If we have correctly identified homologous, shared derived character states in our taxa, then each character should give us the same reconstructed phylogeny.  This is because there is only one true evolutionary history, and the patterns of shared derived character states must reflect this one history.  In reality, however, different characters support different trees.  This is a result of convergent evolution, which can be a particular problem if a character (e.g., an amino acid position in a protein) is evolving very fast or is the result of adaptation occurring independently but in a similar way in two or more species (see above).  In this case the principle of parsimony would dictate that we favor the tree that is supported by the greatest number of characters.
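Counting changes on competing trees can be sketched with Fitch's algorithm, a standard way to find the minimum number of changes for one character on a given tree.  The taxa, the 0/1 character matrix, and the nested-tuple tree representation below are hypothetical choices made for this example.  Characters 1 and 2 group W with X, character 4 conflicts (as convergence would), and characters 3 and 5 are unique to one taxon.

```python
def fitch(tree, states):
    """Return (possible states at this node, minimum changes below it).

    tree is a taxon name (leaf) or a pair (left, right) of subtrees.
    states maps taxon name -> observed state for one character.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_changes + right_changes
    # No state in common: at least one extra change is required here.
    return left_set | right_set, left_changes + right_changes + 1

def tree_length(tree, matrix):
    """Total minimum number of changes summed over all characters."""
    n_chars = len(next(iter(matrix.values())))
    total = 0
    for i in range(n_chars):
        states = {taxon: row[i] for taxon, row in matrix.items()}
        total += fitch(tree, states)[1]
    return total

# Hypothetical character matrix: one string of 0/1 states per taxon.
matrix = {"W": "00000", "X": "00011", "Y": "11010", "Z": "11100"}

trees = {
    "(W,X),(Y,Z)": (("W", "X"), ("Y", "Z")),
    "(W,Y),(X,Z)": (("W", "Y"), ("X", "Z")),
    "(W,Z),(X,Y)": (("W", "Z"), ("X", "Y")),
}
lengths = {name: tree_length(tree, matrix) for name, tree in trees.items()}
best = min(lengths, key=lengths.get)

print(lengths, best)
```

The tree grouping W with X needs 6 changes, against 8 and 7 for the alternatives, so parsimony prefers it even though one character disagrees.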

Maximum likelihood is based on determining, given a model of evolution (e.g., DNA evolution) and a data set of homologous characters, which of the possible phylogenies is the most probable. 
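To make this concrete, here is a minimal sketch that scores three candidate trees for four taxa under the Jukes-Cantor model of DNA evolution, using Felsenstein's pruning algorithm.  Several things are assumptions made only for illustration: the alignment is made up, every branch is given the same fixed length (0.1 expected substitutions per site), and trees are written as nested tuples.  A real analysis would also optimize branch lengths and model parameters for each tree.

```python
import math

BASES = "ACGT"

def jc69_prob(from_base, to_base, t):
    """Jukes-Cantor probability of to_base at the end of a branch of length t."""
    decay = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * decay if from_base == to_base else 0.25 - 0.25 * decay

def partial_likelihoods(tree, site, t):
    """Probability of the data below a node, for each possible state at that node."""
    if isinstance(tree, str):
        return {b: 1.0 if b == site[tree] else 0.0 for b in BASES}
    children = [partial_likelihoods(child, site, t) for child in tree]
    result = {}
    for b in BASES:
        value = 1.0
        for child in children:
            value *= sum(jc69_prob(b, x, t) * child[x] for x in BASES)
        result[b] = value
    return result

def log_likelihood(tree, alignment, t=0.1):
    """Log-likelihood of an alignment on a tree with all branch lengths equal to t."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for i in range(n_sites):
        site = {taxon: seq[i] for taxon, seq in alignment.items()}
        root = partial_likelihoods(tree, site, t)
        total += math.log(sum(0.25 * root[b] for b in BASES))
    return total

# Hypothetical alignment in which W matches X and Y matches Z at every site.
alignment = {"W": "AAGCT", "X": "AAGCT", "Y": "ACGTT", "Z": "ACGTT"}

trees = {
    "(W,X),(Y,Z)": (("W", "X"), ("Y", "Z")),
    "(W,Y),(X,Z)": (("W", "Y"), ("X", "Z")),
    "(W,Z),(X,Y)": (("W", "Z"), ("X", "Y")),
}
scores = {name: log_likelihood(tree, alignment) for name, tree in trees.items()}
best = max(scores, key=scores.get)

print(scores, best)
```

The tree that pairs W with X explains the two variable sites with a single change each, so it receives the highest likelihood of the three.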


One of the major differences between distance methods and parsimony or likelihood approaches is that rate variation is more of a problem for distance methods.  If the rate of character evolution is constant across lineages and over time, then the amount of time since two species last shared a common ancestor will be reliably correlated with their degree of similarity.  However, if the rate of evolution is very heterogeneous, then overall similarity can give a misleading picture of the relationships of the taxa.  To understand why this is true, consider three species, A, B, and C.  A and B are more closely related to each other than either is to C.  We would say that A and B are sister taxa.  If the rate of evolution from the common ancestor of A, B, and C to each of the three species were the same, then the overall similarity between A and B would be greater than the similarity between A and C, or B and C.  Then the phenetic approach (grouping taxa by overall similarity, as distance methods do) will correctly group A and B together (I say correctly because I already told you that A and B are sister taxa).  However, imagine that for some reason, the rate of character evolution rapidly accelerated along the lineage leading to species B, while remaining constant in other lineages.  Then B will look very different from both A and C, or in other words, A and C will be more similar to each other than either is to B.  Thus, phenetics will incorrectly group A and C together.  An advantage of the phylogenetic approach (grouping taxa by shared, derived character states, as parsimony does) is that it is less susceptible to such rate variation.  The reason this is the case is that the method only employs character states that are shared and derived.  A shared character state is one which is present in at least two, but not all, of the taxa being analyzed.  Characters with such states are referred to as informative characters.  The use of only "informative" characters is another way in which phylogenetics differs from phenetics.  Phenetics uses all the data, while phylogenetics uses only subsets of the data.
Shared, derived character states won't mislead you because the sharing of derived character states MUST reflect patterns of inheritance through common ancestors.
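The A, B, C argument can be worked through with simple arithmetic.  In the sketch below the true tree is ((A,B),C), and the distance between two species is the sum of the branch lengths separating them.  The branch lengths are hypothetical numbers chosen for illustration: in the first case all lineages evolve at the same rate, and in the second the branch leading to B is much longer.

```python
def pairwise_distances(a, b, c, internal):
    """Path-length distances on the true tree ((A,B),C).

    a, b     : branch lengths from the A-B ancestor to A and to B
    c        : branch length from the root to C
    internal : branch length from the root to the A-B ancestor
    """
    return {
        ("A", "B"): a + b,
        ("A", "C"): a + internal + c,
        ("B", "C"): b + internal + c,
    }

def most_similar_pair(distances):
    """The pair that a method based on overall similarity would group."""
    return min(distances, key=distances.get)

# Equal rates: every tip is the same distance (2) from the root.
clocklike = pairwise_distances(a=1, b=1, c=2, internal=1)
# Accelerated evolution along the lineage leading to B only.
accelerated = pairwise_distances(a=1, b=6, c=2, internal=1)

print(most_similar_pair(clocklike), most_similar_pair(accelerated))
```

With equal rates the smallest distance is between A and B (2 versus 4), but with the accelerated B lineage the smallest distance is between A and C (4 versus 7 and 9), so overall similarity picks the wrong pair.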

Trees
Phylogenetic trees have two properties, topology and branch length.  The topology of a tree is a graphical representation of the evolutionary relationships of the taxa.  That is, the topology is a statement about ancestor-descendant relationships.  The branch length represents the amount of character evolution that has occurred along a lineage.  A node  represents a hypothetical ancestor.
 
 

Types of groups
There are three types of groups that scientists can construct when trying to organize taxa.  These are monophyletic, paraphyletic, and polyphyletic.  The goal of evolutionary biologists should be to have all groups be monophyletic.  A monophyletic group contains an ancestor and all of its descendants.  In other words, for a monophyletic group, all included taxa are more closely related to each other than any are to taxa outside the monophyletic group.  A paraphyletic group is a group that includes an ancestor and some, but not all of its descendants.  A good example of a paraphyletic group would be reptiles.  Reptiles traditionally include snakes, lizards, turtles, and crocodilians.  It turns out, however, that crocodilians are more closely related to birds than to the other reptiles.  In other words, in order to make the group "reptile" monophyletic, either crocodiles should not be grouped with reptiles (but with birds), or birds should be grouped with reptiles.  Paraphyletic groups are often a result of incorrectly relying on shared ancestral traits, rather than shared derived traits, when attempting to reconstruct phylogenies.  Polyphyletic groups result from using convergent traits to group organisms.  Polyphyletic groups are organisms that are grouped together simply because they appear similar as a result of convergent evolution.
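Whether a proposed group is monophyletic can be checked mechanically once a tree is given: the group must be exactly the set of tips descending from some node.  The sketch below uses a simplified, hypothetical tree written as nested tuples; turtles are left out because the example does not depend on their placement, and the function names are my own.

```python
def leaves(tree):
    """Set of taxon names at the tips of a tree (nested tuples of names)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))

def clades(tree):
    """Tip sets of every clade: each node together with all of its descendants."""
    found = {leaves(tree)}
    if not isinstance(tree, str):
        for child in tree:
            found |= clades(child)
    return found

def is_monophyletic(group, tree):
    """True if the group is exactly the set of tips below some node."""
    return frozenset(group) in clades(tree)

# Simplified tree for illustration: crocodilians are sister to birds.
tree = ((("lizards", "snakes"), ("crocodilians", "birds")), "mammals")

traditional_reptiles = {"lizards", "snakes", "crocodilians"}
reptiles_with_birds = {"lizards", "snakes", "crocodilians", "birds"}

print(is_monophyletic(traditional_reptiles, tree),
      is_monophyletic(reptiles_with_birds, tree))
```

On this tree the traditional reptile grouping fails the test because it leaves out birds, while adding birds makes the group monophyletic.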
 
 

Ancestral states
As mentioned earlier, one of the goals of tree reconstruction is to understand patterns of character evolution.  In order to do this one must "polarize" the characters, or determine which character states are ancestral and which states are derived.  There are three major ways of doing this.


For trees built using morphological characters, one possibility is to use the fossil record to determine the ancestral state.  Clearly this requires that the taxa/characters of interest have a good fossil record.  Even so, it is possible that, by chance, the derived state appears in the fossil record before the ancestral state, which would mislead the inference.

Another possibility is to use developmental biology to try to polarize characters.  The idea is that character states that appear early in development (rather than later) tend to represent the ancestral state.  Recall, this is von Baer's Law, which is often true, but which is sometimes violated.  Finally, outgroups can be used to infer the ancestral state. 

The outgroup method is the most common method for inferring the ancestral character state (remember, the ancestral state is inferred, rather than directly observed).  An outgroup is a taxon that is thought to be "outside" the "ingroup" (the group of interest).  For example, consider a monophyletic group and a single amino acid position (i.e., character) in a homologous protein.  The character has two states, leucine and isoleucine.  In other words, some of the ingroup taxa have a leucine at the position, and others have an isoleucine.  We might like to know if the variation represents a change from a leucine to an isoleucine, or vice-versa.  To infer the state in the common ancestor of the ingroup taxa, we pick an outgroup.  The state of the outgroup is inferred to represent the ancestral state of the ingroup.  One of the problems of the outgroup method is that it presumes that we have at least a reasonable hypothesis about the taxa that could serve as outgroups.  Ideal outgroups are groups that are closely related to the ingroup.  The reason this is relevant is because we use the outgroup to infer the ancestral state.  If the outgroup is too far "out" then there is a possibility that a lot of character evolution has occurred along the lineage leading to the outgroup. Then, the inference of the ancestral state is less reliable.  Similarly, whenever possible, it is best to use two outgroups rather than one.  If both outgroups provide the same inference of the ancestral state, then we should have greater confidence that the inference is correct.
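The outgroup logic, including the advice to use two outgroups, can be sketched in a few lines.  The taxon names and states below are hypothetical, and the rule of declining to infer anything when the outgroups disagree is a simplifying assumption of this example rather than a standard procedure.

```python
def polarize(ingroup_states, outgroup_states):
    """Infer the ancestral state of one character from outgroup taxa.

    Returns (ancestral_state, ingroup taxa carrying a derived state), or
    (None, []) when the outgroups disagree and no inference is made.
    """
    observed = set(outgroup_states.values())
    if len(observed) != 1:
        return None, []
    ancestral = next(iter(observed))
    derived = sorted(t for t, s in ingroup_states.items() if s != ancestral)
    return ancestral, derived

# Hypothetical taxa and states for one amino acid position.
ingroup = {"sp1": "Leu", "sp2": "Leu", "sp3": "Ile", "sp4": "Ile"}
agreeing_outgroups = {"out1": "Leu", "out2": "Leu"}
conflicting_outgroups = {"out1": "Leu", "out2": "Ile"}

ancestral, derived_taxa = polarize(ingroup, agreeing_outgroups)
unresolved = polarize(ingroup, conflicting_outgroups)

print(ancestral, derived_taxa, unresolved)
```

When both outgroups have leucine, leucine is inferred to be ancestral and isoleucine derived in sp3 and sp4; when the outgroups conflict, the sketch makes no inference, which mirrors the reduced confidence described above.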