Speciation requires the cessation of gene flow between populations. This is most readily achieved when two populations are allopatric. Allopatry simply means that two populations do not come into physical contact because of some physical barrier (e.g., a mountain range or an ocean). Allopatric speciation refers to the case in which two populations become reproductively isolated as a consequence of independent evolution in allopatry. There is a large body of evidence supporting the idea that allopatric speciation is the predominant mode of speciation. For example, we often observe that populations that are geographically farther apart are more phenotypically distinct, and are also often partially reproductively isolated. Further evidence comes from island endemic species that are clearly recently derived from large mainland populations.
A variant of allopatric speciation is the "peripheral isolate model" or "founder effect model". These models are based on the notion that if a population bottleneck occurs during the establishment of a new subpopulation, then genetic drift can set off a series of genetic changes that result in a rapid, massive "reorganization" of genes. The models are rather vague, and we will not treat them as serious contenders in the study of speciation.
Sympatric speciation refers to the case in which reproductive isolation evolves between two populations even though they are never physically impeded from exchanging genes. There is still some doubt about the theoretical conditions under which sympatric speciation is possible, and there are only a few possible examples of it. Perhaps the most compelling case is that of the crater lake cichlids of Cameroon. The crater lakes have little topographic variation in their lake beds; each lake is essentially a bowl. It is therefore difficult to imagine any geographical impediment to gene flow. (This might not be true for many deep lakes, where a drastic drop in water level could isolate fish populations from one another.) In spite of this feature of the crater lakes, there are several cases of very closely related fish species that almost certainly diverged within a single lake.
Allopatry results in speciation because the lack of gene flow allows populations to evolve differences that cause incompatibilities if and when the populations come back into contact. Even relatively small amounts of gene flow are effective at homogenizing populations, which is why most speciation is thought to require allopatry.
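To make the homogenizing power of gene flow concrete, here is a minimal Python sketch. It assumes two populations that each replace a fraction m of their individuals with migrants from the other population every generation, with no selection and no drift (a deliberate simplification; the function names are hypothetical). Under these assumptions the allele-frequency difference between the populations shrinks by a factor of (1 - 2m) every generation.

```python
def migrate(p1, p2, m):
    """One generation of symmetric migration: a fraction m of each population
    is replaced by migrants from the other population."""
    return (1 - m) * p1 + m * p2, (1 - m) * p2 + m * p1


def difference_after(p1, p2, m, generations):
    """Allele-frequency difference after repeated migration (no selection, no drift)."""
    for _ in range(generations):
        p1, p2 = migrate(p1, p2, m)
    return abs(p1 - p2)


# Hypothetical start: the two populations are fixed for different alleles.
for m in (0.001, 0.01, 0.05):
    print(f"m = {m}: difference after 200 generations = {difference_after(1.0, 0.0, m, 200):.4f}")
```

With m = 0.01 (one migrant in a hundred individuals per generation), a fixed difference falls to under 2% of its starting value within 200 generations, unless something such as selection keeps pushing the populations apart.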
Reproductive isolation should be thought of as an incidental effect of evolution. In principle, evolutionary changes caused by genetic drift or by natural selection could contribute to reproductive isolation. It seems more likely, however, that isolation is an incidental effect of adaptive evolution.
Reinforcement
An interesting idea put forward by Dobzhansky is that speciation is not usually completed in allopatry, but rather is completed when two previously allopatric populations that are partially isolated come back into contact. It seems unlikely that this is how most speciation events are completed, but it may still occur in nature. The idea is that when two partially reproductively isolated populations come into contact, some hybridization occurs at the contact zone. Hybrids have reduced fitness. Therefore, individuals that produce hybrids, that is, individuals that mate with members of the other species, have reduced fitness. If there is variation in traits that affect mate choice, then individuals that show stronger pre-mating discrimination against members of the other species are favored by natural selection. Thus, there should be an increase in the degree of pre-mating isolation in the area where the two species meet. In principle, this could lead to the "completion" of the speciation event, in the sense that reinforcement produces complete behavioral isolation. There are some problems with the idea. Perhaps the biggest is that, in the model, gene flow continues between the species, which should tend to homogenize them, yet divergence still proceeds. This is similar to the problem of sympatric speciation.
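A toy haploid model can illustrate why discriminating individuals are favored. It assumes, hypothetically, a single mate-choice locus in one species at the contact zone: non-discriminating individuals mate heterospecifically with probability h, carriers of a discriminating allele D avoid a fraction r of those matings, and each hybridizing mating loses a fraction s of fitness. The model deliberately ignores gene flow and recombination between the species, which is exactly where the difficulty noted above arises.

```python
def next_freq(p, h, r, s):
    """One generation of selection on a discrimination allele D at frequency p.

    h: probability that a non-discriminating individual mates heterospecifically
    r: fraction of those heterospecific matings that D carriers avoid
    s: fitness lost by a mating that produces hybrids
    """
    w_d = 1 - h * (1 - r) * s   # discriminating individuals hybridize less often
    w_n = 1 - h * s             # non-discriminating individuals
    w_bar = p * w_d + (1 - p) * w_n
    return p * w_d / w_bar


def trajectory(p0, h, r, s, generations):
    """Frequencies of D over successive generations, starting at p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_freq(freqs[-1], h, r, s))
    return freqs


# Hypothetical parameters: a rare discriminating allele in the contact zone.
freqs = trajectory(0.01, h=0.2, r=0.8, s=0.5, generations=300)
print(f"start {freqs[0]:.2f}, after 100 gens {freqs[100]:.3f}, after 300 gens {freqs[-1]:.3f}")
```

If s = 0 (hybrids are as fit as pure-species offspring), the two alleles have equal fitness and D does not spread, matching the requirement that hybrids have reduced fitness.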
To demonstrate that reinforcement is occurring, one must show that there is gene flow between the two species, that hybrids have reduced fitness, and that pre-mating isolation is stronger in areas of sympatry than in areas of allopatry.

Fig 14.10 from Ridley
The distinction between reinforcement and simple character displacement is important. If two species hybridize yet produce completely sterile or inviable progeny, we still expect selection to favor the evolution of greater pre-mating isolation in sympatric regions, because individuals that make a bad choice in mate selection (i.e., pick a mate from the wrong species) have zero fitness. However, because there is no gene flow, speciation has already occurred without reinforcement. Reinforcement refers specifically to the case in which selection drives the evolution of stronger pre-mating isolation in the face of ongoing gene flow between populations.
One of the strongest pieces of evidence that reinforcement may be fairly common, at least in Drosophila, is Coyne and Orr's summary of data on reproductive isolation in Drosophila. They gathered data on the strength of pre-zygotic and post-zygotic isolation for many species pairs. They also gathered data from the literature on allozyme divergence between the same species pairs. Allozyme divergence increases with the time elapsed since speciation, so it can serve as an indirect estimate of the time since divergence. Finally, they collected data on the geographic ranges of the species pairs to determine whether each pair was sympatric or allopatric. The interesting result from the study was that sympatric species pairs tend to evolve strong pre-mating isolation much faster than allopatric pairs (see figure below), while post-mating isolation showed no major differences. This is exactly what one would expect if reinforcement were common in Drosophila populations.
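One widely used measure of allozyme divergence is Nei's standard genetic distance, D = -ln(J_XY / sqrt(J_X J_Y)), where J_X and J_Y are the expected homozygosities within populations X and Y (averaged over loci) and J_XY is the probability that alleles drawn one from each population are identical. The text does not say exactly which measure Coyne and Orr used, so treat this Python sketch, with hypothetical allele frequencies, as an illustration only.

```python
from math import log, sqrt


def nei_distance(pop_x, pop_y):
    """Nei's standard genetic distance D = -ln(J_XY / sqrt(J_X * J_Y)).

    pop_x, pop_y: one list of allele frequencies per locus, with alleles listed
    in the same order in both populations.
    """
    n_loci = len(pop_x)
    j_x = sum(sum(f * f for f in locus) for locus in pop_x) / n_loci
    j_y = sum(sum(f * f for f in locus) for locus in pop_y) / n_loci
    j_xy = sum(
        sum(fx * fy for fx, fy in zip(lx, ly)) for lx, ly in zip(pop_x, pop_y)
    ) / n_loci
    if j_xy == 0:
        return float("inf")  # no shared alleles at any locus
    return -log(j_xy / sqrt(j_x * j_y))


# Hypothetical allele frequencies at two allozyme loci.
species_1 = [[0.9, 0.1], [0.5, 0.3, 0.2]]
species_2 = [[0.2, 0.8], [0.5, 0.3, 0.2]]
print(round(nei_distance(species_1, species_2), 3))
```

D is 0 for identical allele frequencies and grows as populations diverge, which is what makes it usable as a rough clock when divergence accumulates steadily.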

Genetics of reproductive incompatibilities
The genetic basis of species incompatibilities is unlikely, in most cases, to lie in divergence at a single locus. Consider an ancestral population that has genotype AA at a single locus. Split it into two populations, one of which evolves such that it becomes fixed for the a allele (so it has genotype aa). Hybrids are sterile and have genotype Aa. Could the Aa genotype be the cause of the sterility? No, because the population fixed for the a allele was polymorphic for A and a during its history, and the a allele could not have spread if Aa genotypes were sterile. This sort of reasoning led to the Dobzhansky-Muller model of reproductive incompatibilities. An ancestral population has a two-locus genotype of AABB. Subsequently, evolution occurs in two subpopulations such that one has genotype aaBB and the other has genotype AAbb. The hybrids are AaBb. Although A and a are compatible, and B and b are compatible, a and b are not necessarily compatible, because that genetic combination was never "tested" by evolution. These types of incompatibilities are probably the most important source of hybrid sterility and inviability.
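Both arguments above can be checked with a short Python sketch. The first function iterates one generation of selection at a single locus with random mating, assuming AA and aa have fitness 1 and Aa has fitness w_het; with sterile heterozygotes (w_het = 0) a rare a allele declines instead of spreading. The second part encodes the Dobzhansky-Muller scenario, assuming the only incompatibility is between the derived alleles a and b, and confirms that no genotype along either hypothetical substitution path carries both, whereas the F1 hybrid does.

```python
def next_a_freq(q, w_het):
    """Frequency of allele a after one generation of selection with random mating.

    Assumes fitnesses AA = aa = 1 and Aa = w_het.
    """
    p = 1 - q
    w_bar = p * p + 2 * p * q * w_het + q * q
    return (q * q + p * q * w_het) / w_bar


def has_untested_pair(genotype):
    """True if a two-locus genotype string carries both derived alleles a and b."""
    return "a" in genotype and "b" in genotype


# Single locus: a rare a allele cannot spread if heterozygotes are sterile.
print(next_a_freq(0.01, w_het=0.0))  # smaller than the starting 0.01

# Two loci: hypothetical substitution paths from the ancestral AABB genotype.
lineage_1 = ["AABB", "AaBB", "aaBB"]  # a arises and fixes
lineage_2 = ["AABB", "AABb", "AAbb"]  # b arises and fixes
hybrid = "AaBb"
print([has_untested_pair(g) for g in lineage_1 + lineage_2])  # all False
print(has_untested_pair(hybrid))  # True
```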
How rapidly do these incompatibilities accumulate? We have no good answers to this question yet, though the subject is being pursued by several labs. A good species pair for such analyses is Drosophila simulans and its very close relative, Drosophila mauritiana. D. simulans is native to Africa, but has recently spread throughout the world in association with humans. D. mauritiana is an island endemic, found on Mauritius. The two species probably became separated about 500,000 years ago. Genetic analyses can be carried out on these species because, even though male hybrids are sterile, female hybrids are fertile. By making fertile female F1 hybrids and backcrossing them to pure-species males, one can create large numbers of recombinant progeny that have part of their genome from one species and part from the other. Parts of the genome from the two species can be followed through the crosses by using either visible mutations (e.g., eye color mutants) or molecular markers that distinguish the two species' genomes. One can measure the fertility of these recombinant progeny, and also determine their genotypes to see which parts of the genome come from D. simulans and which come from D. mauritiana. One can then determine which parts of the genome are correlated with sterility. Results from such analyses showed that there are regions of the mauritiana genome that individually have no effect on male sterility, but that cause almost complete sterility when present together. This suggests that the genetics of these incompatibilities is complex. Extrapolation from some limited analyses suggests there may be as many as 100 genes contributing to hybrid male sterility between simulans and mauritiana. Considering that most genes show no differences between these species, it would appear that incompatibilities are accumulating quite rapidly, and therefore almost certainly as a result of natural selection rather than genetic drift.
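The logic of this mapping experiment can be illustrated with simulated (not real) data. The Python sketch below assumes two unlinked genome regions, each carrying mauritiana material with probability 0.5 in backcross males, and a hypothetical sterility rule under which males are nearly always sterile only when they carry mauritiana material at both regions. Tabulating sterility by genotype class then shows that males with mauritiana material at only one of the regions are about as fertile as those with none, while the two together cause near-complete sterility.

```python
import random


def simulate_backcross(n, seed=1):
    """Simulated (not real) backcross males scored at two unlinked genome regions.

    Each region carries mauritiana material with probability 0.5. Under the
    hypothetical rule used here, a male is sterile with probability 0.95 if he
    carries mauritiana material at both regions, and 0.05 otherwise.
    """
    rng = random.Random(seed)
    males = []
    for _ in range(n):
        region_1 = rng.random() < 0.5
        region_2 = rng.random() < 0.5
        p_sterile = 0.95 if (region_1 and region_2) else 0.05
        males.append((region_1, region_2, rng.random() < p_sterile))
    return males


def sterility_by_class(males):
    """Fraction of sterile males in each (region_1, region_2) genotype class,
    where True means mauritiana material is present."""
    counts = {}
    for region_1, region_2, sterile in males:
        total, n_sterile = counts.get((region_1, region_2), (0, 0))
        counts[(region_1, region_2)] = (total + 1, n_sterile + sterile)
    return {cls: n_sterile / total for cls, (total, n_sterile) in counts.items()}


table = sterility_by_class(simulate_backcross(4000))
for cls in sorted(table):
    print(cls, round(table[cls], 2))
```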
These species are typical in that, when there is partial reproductive isolation between two species, it is almost always the heterogametic sex that is sterile or inviable. In flies, mammals, and several other taxa, males are XY, or heterogametic. Of course, it is possible that the sterility and inviability of the heterogametic sex is really a phenomenon of males, rather than of heterogamety per se. This can be addressed by examining hybrid sterility in taxa in which females, rather than males, are the heterogametic sex (for example, birds and butterflies, in which females are ZW). In such taxa, it is female F1 hybrids, rather than male F1 hybrids, that tend to be sterile or inviable. This shows that heterogamety, rather than sex, is the important factor. The fact that the heterogametic sex is usually the sterile or inviable one in species that are partially isolated is known as Haldane's Rule.
Genetics of species differences
Crosses between species can also be used to understand the genetic basis of interspecific differences in morphological or other types of phenotypes. Research in this area is still in its infancy. One example of such an approach comes from Bradshaw and his collaborators.
Two species of Mimulus (M. lewisii and M. cardinalis) are quite different in many aspects of their morphology. They are reproductively isolated in nature, primarily because they attract different pollinators (bumblebees and hummingbirds, respectively).
Figures showing the two species and their pollinators.

In the greenhouse it is relatively easy to cross the two species and produce fertile F1 progeny. These F1 plants can be selfed to generate large numbers of recombinant F2 plants, many with intermediate phenotypes (figure below).

By assessing correlations between phenotypes and the portion of the genome that derives from each of the two species, one can roughly estimate the number of genes contributing to the interspecific differences.
The conclusion from the Mimulus study is that interspecific differences are not the result of only a very few genes, but also are not the result of an extremely large number of genes, each of which has a very small effect. This type of analysis is being carried out in several closely related species pairs in plants and animals. Such studies will eventually teach us about the number and types of genes that cause phenotypic differences between species and that cause genetic incompatibilities in hybrids.
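The idea of correlating phenotype with species ancestry along the genome can be sketched as a single-marker scan on simulated F2 data. This is a simplified illustration, not Bradshaw's exact method: it assumes unlinked loci, a marker sitting on each candidate locus, purely additive effects, normally distributed noise, and hypothetical effect sizes. For each locus it estimates the additive effect as half the difference in mean phenotype between the two homozygous classes.

```python
import random
from statistics import mean


def simulate_f2(n, effects, n_loci, seed=2):
    """Simulated F2 plants scored at unlinked loci.

    Each genotype is the number (0, 1 or 2) of alleles from species 2; the
    phenotype is the sum of effects[locus] * genotype plus normal noise.
    """
    rng = random.Random(seed)
    plants = []
    for _ in range(n):
        genotypes = [(rng.random() < 0.5) + (rng.random() < 0.5) for _ in range(n_loci)]
        phenotype = sum(effects.get(i, 0.0) * g for i, g in enumerate(genotypes))
        plants.append((genotypes, phenotype + rng.gauss(0.0, 1.0)))
    return plants


def marker_effects(plants, n_loci):
    """Half the difference in mean phenotype between the two homozygous
    classes at each locus: a crude single-marker estimate of additive effect."""
    estimates = []
    for i in range(n_loci):
        hom_1 = [y for g, y in plants if g[i] == 0]
        hom_2 = [y for g, y in plants if g[i] == 2]
        estimates.append((mean(hom_2) - mean(hom_1)) / 2)
    return estimates


true_effects = {0: 2.0, 3: 1.0, 7: 0.5}  # hypothetical, not Mimulus data
plants = simulate_f2(2000, true_effects, n_loci=10)
for locus, est in enumerate(marker_effects(plants, 10)):
    print(locus, round(est, 2))
```

Loci with large estimated effects stand out from the background of near-zero estimates; counting them gives a rough, and usually conservative, estimate of the number of genes involved, since small-effect loci are hard to detect.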
PHYLOGENETICS
Phylogenetics is the practice of reconstructing the history of life. The histories may be relatively recent or extremely ancient. There are a number of reasons why reconstructing history is interesting.
First, simply knowing the evolutionary relationships of different organisms is inherently interesting to many people. More generally, phylogenetic trees help us determine the direction of evolutionary change. For example, if we know that a trait differs between two species, we would further like to know the direction of the change in that trait. We can do this by using a phylogeny to infer what the trait was like in the ancestor of the two species. Phylogenies are also very useful in helping us determine the temporal sequence in which various traits appeared. For example, if a particular type of mating system is found with particular morphologies or particular behaviors, phylogenies could help us determine whether the behavioral trait evolved before or after the morphological trait. More generally, phylogenies help us to understand the extent to which the evolution of different traits is correlated (this will become clearer in the next lecture).

A phylogenetic tree showing the inferred relationships of HIV isolated from a dentist and several of his patients. Most of the infected patients were not considered to belong to a high-risk group. The similarity of the virus in the dentist and his patients supports the idea that the patients were infected by the dentist.
The units of phylogenetic analysis are taxa and characters. Depending on the type of data and question, taxa can be anything from individuals within a population to species, genera, families, etc. The characters that can be used for phylogenetic inference are virtually unlimited. However, we can usefully divide characters into two types, phenotypic and molecular (genetic). Phenotypic characters can be any aspect of an organism's phenotype that can be quantified, including morphological, behavioral, etc. Molecular characters are almost always nucleotide sequences or amino acid sequences. Characters have states; you will see the phrase "character state". Here are two examples of characters and their states. Imagine that the character is the number of rays in the dorsal fin of a fish. Then the state could be 12 rays or 13 rays, differing among species. Alternatively, the character could be a particular amino acid position in a protein. The character state is then the amino acid at that position, say, leucine vs. valine.
The use of molecular characters has increased dramatically in recent years. Therefore, it is useful to understand the relative strengths and weaknesses of phenotypic vs. molecular characters. It is important to appreciate that the same types of problems with data quality occur with phenotypic and molecular characters; molecules are not a panacea. However, with molecular data it is often clear when there is a problem, while problems may be more difficult to detect with phenotypic data.
Phenotypic characters: one clear advantage is that such characters can be analyzed for extinct organisms and museum specimens. Also, these characters require no access to a molecular biology laboratory, though that isn't a major issue anymore. Some aspects of phenotypes are difficult to quantify, which can be a problem. Furthermore, phenotypic characters in very different species may appear similar because of convergent evolution. If one unknowingly uses convergently evolved traits, incorrect phylogenies result.
Molecular characters are easy to measure (quantify). It is easy to collect huge amounts of data (think of the number of nucleotides in the genome: each one could, in principle, be a character). Many phylogenetic methods rely on a particular model of evolution, and models of DNA evolution are probably closer to the truth than models of phenotypic evolution. One very nice feature of molecular data is that different genes can be picked for different questions (for questions of ancient history we prefer more slowly evolving genes, while for questions of recent history we prefer more rapidly evolving genes). Along the same lines, highly diverged taxa are more easily compared using molecular characters, because highly conserved proteins can be identified and aligned, while figuring out which phenotypes are homologous can sometimes be difficult.
Homology: a homologous trait is one that is shared between two or more taxa because they inherited it from a common ancestor. When doing phylogenetic analysis, we always strive to use homologous characters.
For both molecular and phenotypic characters, convergent evolution may be a problem. Convergent evolution occurs when two taxa share a character state not because of shared ancestry, but because of independent evolution to the same state. For phenotypic characters, convergent evolution usually results from natural selection (i.e., adaptation). Though the same can be true for sequences, it is probably less likely. For molecular data, convergence is more typically a consequence of the small number of possible character states. For example, at the nucleotide level there are only four possible character states, A, G, C, and T, so two species could share the same base just by chance rather than as a result of common ancestry. This is less of a problem for amino acid sequences, because there are 20 possible states. However, the number of acceptable states at many amino acid positions may be much smaller than the theoretical maximum of 20, because many amino acid changes are probably deleterious mutations, and such changes rarely if ever contribute to species differences.
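Two quick calculations make the point about limited character states concrete. Assuming all states are equally frequent (a simplification), two unrelated sites match by chance with probability 1/4 for nucleotides but only 1/20 for amino acids. A related consequence is that the observed proportion of differing sites, p, underestimates the number of changes that actually occurred, because later changes can hit the same site or revert it. The Jukes-Cantor model, a standard model of DNA evolution that is not described in the text, corrects for this with d = -(3/4) ln(1 - 4p/3) expected substitutions per site.

```python
from math import log


def chance_match(n_states):
    """Probability that two sites match by chance if all states are equally frequent."""
    return 1 / n_states


def jukes_cantor(p):
    """Jukes-Cantor estimate of substitutions per site from the observed
    proportion p of differing sites; undefined (infinite) once p reaches 0.75."""
    if p >= 0.75:
        return float("inf")
    return -0.75 * log(1 - 4 * p / 3)


print(chance_match(4), chance_match(20))
for p in (0.05, 0.3, 0.6):
    print(p, round(jukes_cantor(p), 3))
```

The corrected distance stays close to p when p is small but grows much faster than p as sequences approach saturation, which is one reason slowly evolving genes are preferred for questions of ancient history.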
Some examples of convergence between marsupial and placental mammals

There are three main approaches for reconstructing phylogenies.
Distance methods summarize the overall similarity between species, for example, the sequence differences for a homologous piece of DNA. The simplest distance methods are based on the idea that differences between species accumulate at a constant rate over time. Thus, species that split more recently will be more similar.
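The simplest distance between two aligned homologous sequences is the proportion of sites at which they differ (often called the p-distance). A minimal Python sketch with hypothetical sequences:

```python
from itertools import combinations


def p_distance(seq1, seq2):
    """Proportion of aligned sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(a != b for a, b in zip(seq1, seq2))
    return diffs / len(seq1)


def distance_matrix(seqs):
    """Pairwise p-distances for a dict mapping taxon name -> aligned sequence."""
    return {(x, y): p_distance(seqs[x], seqs[y]) for x, y in combinations(seqs, 2)}


seqs = {"A": "ACGTTA", "B": "ACGTTG", "C": "ATGCTA"}  # hypothetical sequences
for pair, d in distance_matrix(seqs).items():
    print(pair, round(d, 3))
```

A distance method then builds a tree from this matrix alone; the individual characters are no longer used.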
Parsimony is based on the idea that the best hypothesis of the phylogenetic relationships of a group of organisms is the one that requires the smallest number of evolutionary changes. Parsimony approaches use only shared, derived character states to reconstruct phylogeny. In phylogenetics, the principle of parsimony states that if two different hypotheses about the phylogenetic relationships of a group of organisms require different numbers of evolutionary changes in characters, then we should prefer the tree with fewer changes. This amounts to preferring the tree supported by the greatest number of characters. This makes sense, because a reconstructed phylogeny is a hypothesis; the history of the taxa cannot be directly observed. If we have correctly identified homologous, shared derived character states in our taxa, then each character should give us the same reconstructed phylogeny. This is because there is only one true evolutionary history, and the patterns of shared derived character states must reflect this one history. In reality, however, different characters support different trees. This is a result of convergent evolution, which can be a particular problem if a character (e.g., an amino acid position in a protein) is evolving very fast, or if adaptation has occurred independently but in a similar way in two or more species (see above). In this situation, parsimony dictates that we favor the tree supported by the greatest number of characters.
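Counting changes on a tree can be done with Fitch's algorithm, a standard parsimony method for unordered characters (named here because the text does not specify an algorithm). For each site it works from the tips toward the root: if the state sets of two sister nodes overlap, their parent gets the intersection; otherwise it gets the union and one change is counted. The Python sketch compares the three possible unrooted topologies for four hypothetical taxa, writing each one rooted on its internal branch (the count does not depend on where the root is placed).

```python
def fitch(tree, states):
    """Fitch's algorithm for one character on a rooted binary tree.

    tree: a taxon name (leaf) or a pair (left_subtree, right_subtree)
    states: dict mapping taxon name -> character state
    Returns (candidate state set at this node, minimum changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_changes = fitch(left, states)
    right_set, right_changes = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_changes + right_changes
    return left_set | right_set, left_changes + right_changes + 1


def tree_length(tree, seqs):
    """Total minimum number of changes, summed over all aligned sites."""
    n_sites = len(next(iter(seqs.values())))
    return sum(
        fitch(tree, {taxon: seq[i] for taxon, seq in seqs.items()})[1]
        for i in range(n_sites)
    )


seqs = {"A": "ACGTTA", "B": "ACGTTG", "C": "ATGCTA", "D": "ATGCTG"}  # hypothetical
trees = {
    "((A,B),(C,D))": (("A", "B"), ("C", "D")),
    "((A,C),(B,D))": (("A", "C"), ("B", "D")),
    "((A,D),(B,C))": (("A", "D"), ("B", "C")),
}
for name, tree in trees.items():
    print(name, tree_length(tree, seqs))
```

The second and fourth sites support ((A,B),(C,D)) while the last site conflicts with them and supports ((A,C),(B,D)); parsimony prefers ((A,B),(C,D)), which needs 4 changes versus 5 and 6. Constant sites add zero changes to every tree, so they cannot favor one tree over another.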
Maximum likelihood is based on determining, given a model of evolution (e.g., of DNA evolution) and a data set of homologous characters, which of the possible phylogenies makes the observed data most probable.
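The simplest case of maximum likelihood involves just two sequences, where the only "tree" is a single branch and the question is its length d. Under the Jukes-Cantor model (an assumption; other models exist), the probability that a site differs after d expected substitutions per site is (3/4)(1 - e^(-4d/3)), so the likelihood of seeing k differences among n sites can be maximized numerically. Comparing topologies for more taxa requires summing over unobserved ancestral states (Felsenstein's pruning algorithm), which this sketch does not attempt.

```python
from math import exp, log


def jc_log_likelihood(d, n_sites, n_diff):
    """Log-likelihood (up to a constant) of n_diff differing sites out of n_sites
    for two sequences separated by d expected substitutions per site (Jukes-Cantor)."""
    p_diff = 0.75 * (1 - exp(-4 * d / 3))
    return (n_sites - n_diff) * log(1 - p_diff) + n_diff * log(p_diff)


def ml_distance(n_sites, n_diff, step=0.001, d_max=3.0):
    """Grid search for the branch length that maximizes the likelihood."""
    grid = [step * i for i in range(1, int(d_max / step))]
    return max(grid, key=lambda d: jc_log_likelihood(d, n_sites, n_diff))


print(round(ml_distance(100, 30), 3))
```

The grid-search estimate agrees, to within the grid spacing, with the closed-form Jukes-Cantor distance -(3/4) ln(1 - 4k/3n), as it should, because that formula is the maximum-likelihood estimate under this model.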
One of the major differences between distance methods and parsimony or likelihood approaches is that rate variation is more of a problem for distance methods. If the rate of character evolution is constant across lineages and over time, then the amount of time since two species last shared a common ancestor will be reliably correlated with their degree of similarity. However, if the rate of evolution is very heterogeneous, then overall similarity can give a misleading picture of the relationships of the taxa. To understand why, consider three species, A, B, and C. A and B are more closely related to each other than either is to C; we say that A and B are sister taxa. If the rate of evolution from the common ancestor of A, B, and C to each of the three species were the same, then the overall similarity between A and B would be greater than the similarity between A and C, or between B and C. Then the phenetic (overall-similarity) approach will correctly group A and B together (correctly, because we already know that A and B are sister taxa). However, imagine that for some reason the rate of character evolution accelerated rapidly along the lineage leading to species B, while remaining constant in the other lineages. Then B will look very different from both A and C; in other words, A and C will be more similar to each other than either is to B. Thus, phenetics will incorrectly group A and C together. An advantage of the phylogenetic (cladistic) approach is that it is less susceptible to such rate variation, because it employs only character states that are shared and derived. A shared character state is one that is present in at least two, but not all, of the taxa being analyzed. Characters with such states are referred to as informative characters. The use of only "informative" characters is another way in which phylogenetics differs from phenetics: phenetics uses all the data, while phylogenetics uses only a subset of the data.
Correctly identified (homologous) shared, derived character states won't mislead you, because the sharing of homologous derived character states MUST reflect patterns of inheritance through common ancestors.
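The A, B, C example can be reproduced with UPGMA, a simple distance method that repeatedly joins the closest pair of clusters and, in effect, assumes a constant rate of evolution. The distances below are hypothetical: in the first set all lineages evolve at the same rate, and in the second the lineage leading to B has accelerated. The function and variable names are illustrative.

```python
from itertools import combinations


def upgma(taxa, dist):
    """UPGMA clustering.

    taxa: list of taxon names
    dist: dict mapping frozenset({x, y}) -> distance between x and y
    Returns the tree as nested tuples, joining the closest pair first.
    """
    d = dict(dist)
    size = {t: 1 for t in taxa}
    active = list(taxa)
    while len(active) > 1:
        x, y = min(combinations(active, 2), key=lambda pair: d[frozenset(pair)])
        new = (x, y)
        for z in active:
            if z not in (x, y):
                # Distance to the new cluster is the size-weighted average.
                d[frozenset((new, z))] = (
                    size[x] * d[frozenset((x, z))] + size[y] * d[frozenset((y, z))]
                ) / (size[x] + size[y])
        size[new] = size[x] + size[y]
        active = [z for z in active if z not in (x, y)] + [new]
    return active[0]


clock = {frozenset({"A", "B"}): 0.2, frozenset({"A", "C"}): 0.6, frozenset({"B", "C"}): 0.6}
fast_b = {frozenset({"A", "B"}): 0.7, frozenset({"A", "C"}): 0.4, frozenset({"B", "C"}): 0.9}
print(upgma(["A", "B", "C"], clock))   # joins A and B first
print(upgma(["A", "B", "C"], fast_b))  # joins A and C first
```

With equal rates, A and B are joined first; with the accelerated B lineage, UPGMA joins A and C, which is the error described above.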
Trees
Phylogenetic trees have two properties, topology and branch length. The topology of a tree is a graphical representation of the evolutionary relationships of the taxa. That is, the topology is a statement about ancestor-descendant relationships. The branch length represents the amount of character evolution that has occurred along a lineage. A node represents a hypothetical ancestor.
Types of groups
There are three types of groups that scientists can construct when trying to organize taxa: monophyletic, paraphyletic, and polyphyletic. Evolutionary biologists should aim for all groups to be monophyletic. A monophyletic group contains an ancestor and all of its descendants. In other words, for a monophyletic group, all included taxa are more closely related to each other than any are to taxa outside the group. A paraphyletic group includes an ancestor and some, but not all, of its descendants. A good example of a paraphyletic group is the reptiles. Reptiles traditionally include snakes, lizards, turtles, and crocodilians. It turns out, however, that crocodilians are more closely related to birds than to the other reptiles. In other words, to make the group "reptile" monophyletic, either crocodilians should not be grouped with reptiles (but with birds), or birds should be grouped with reptiles. Paraphyletic groups often result from incorrectly relying on shared ancestral traits, rather than shared derived traits, when attempting to reconstruct phylogenies. Polyphyletic groups result from using convergent traits to group organisms: they consist of organisms that are grouped together simply because they appear similar as a result of convergent evolution.
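A group can be checked for monophyly by finding the smallest clade that contains all of its members and asking whether that clade contains anything else. The tree in this Python sketch is simplified and hypothetical: it includes only the relationship stated in the text (crocodilians closer to birds than to lizards and snakes), uses a mammal as the outgroup, and omits turtles.

```python
def leaves(tree):
    """All taxon names in a tree written as nested tuples."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree):
    """Yield the set of leaves below every node, including single leaves."""
    yield leaves(tree)
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def smallest_clade(tree, group):
    """Leaves of the smallest clade containing every member of group."""
    return min((c for c in clades(tree) if group <= c), key=len)


def is_monophyletic(tree, group):
    return smallest_clade(tree, group) == group


tree = ("mammal", (("lizard", "snake"), ("crocodile", "bird")))  # hypothetical
reptiles = {"lizard", "snake", "crocodile"}
print(is_monophyletic(tree, reptiles))                  # False
print(smallest_clade(tree, reptiles) - reptiles)        # the descendants left out
print(is_monophyletic(tree, reptiles | {"bird"}))       # True
```

The taxa returned by the difference (here, birds) are the descendants that the paraphyletic group leaves out.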
Ancestral states
As mentioned earlier, one of the goals of tree reconstruction is to understand patterns of character evolution. To do this, one must "polarize" the characters, that is, determine which character states are ancestral and which are derived. There are three major ways of doing this.
For trees built using morphological characters, one possibility is to use the fossil record to determine the ancestral state. Clearly this requires that the taxa and characters of interest have a good fossil record. Even so, because the record is incomplete, the derived state may happen to be fossilized earlier than the ancestral state, which would suggest the wrong polarity. Another possibility is to use developmental biology to try to polarize characters. The idea is that character states that appear early in development (rather than later) tend to represent the ancestral state. Recall that this is von Baer's Law, which is often true but is sometimes violated. Finally, outgroups can be used to infer the ancestral state.
The outgroup method is the most common method for inferring the ancestral character state (remember, the ancestral state is inferred, rather than directly observed). An outgroup is a taxon that is thought to be "outside" the "ingroup" (the group of interest). For example, consider a monophyletic group and a single amino acid position (i.e., character) in a homologous protein. The character has two states, leucine and isoleucine: some of the ingroup taxa have a leucine at the position, and others have an isoleucine. We might like to know whether the variation represents a change from leucine to isoleucine, or vice versa. To infer the state in the common ancestor of the ingroup taxa, we pick an outgroup. The state of the outgroup is inferred to represent the ancestral state of the ingroup. One problem with the outgroup method is that it presumes we already have at least a reasonable hypothesis about which taxa could serve as outgroups. Ideal outgroups are closely related to the ingroup. This matters because we use the outgroup to infer the ancestral state: if the outgroup is too far "out", a lot of character evolution may have occurred along the lineage leading to it, and the inference of the ancestral state is then less reliable. Similarly, whenever possible, it is best to use two outgroups rather than one. If both outgroups provide the same inference of the ancestral state, then we should have greater confidence that the inference is correct.
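The outgroup rule, including the two-outgroup check, can be sketched in a few lines of Python. The taxon names and states are hypothetical, and the sketch simply declines to infer a state when the outgroups disagree.

```python
def polarize(outgroup_states):
    """Infer the ingroup's ancestral state from one or more outgroups.

    Returns the shared state if all outgroups agree, otherwise None.
    """
    states = set(outgroup_states)
    return states.pop() if len(states) == 1 else None


def derived_taxa(ingroup, ancestral):
    """Ingroup taxa whose state differs from the inferred ancestral state."""
    return [taxon for taxon, state in ingroup.items() if state != ancestral]


ingroup = {"sp1": "Leu", "sp2": "Ile", "sp3": "Leu"}  # hypothetical ingroup states
ancestral = polarize(["Leu", "Leu"])                   # two outgroups agree
print(ancestral, derived_taxa(ingroup, ancestral))
print(polarize(["Leu", "Ile"]))                        # outgroups disagree: None
```

Here both outgroups carry leucine, so leucine is inferred to be ancestral and isoleucine in sp2 is the derived state.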