Measuring genetic diversity from molecular data
Determining genetic structure and genetic variability between and within breeds

To understand the influence of selection, mating systems and other breeding interventions on population genetics, it is important to describe and quantify the amount of genetic variation in a population and the pattern of genetic variation among populations. Genetic variation may be measured at various levels, e.g. allelic variation at structural loci (see Module 2, Section 3). Genetic variation within breeds decreases as a result of selection for economically important traits, yet genetic variation between and within breeds is important as raw material for genetic improvement. Populations showing a great deal of variation will be able to adapt to changing circumstances, whereas populations with less genetic variability will be less adaptable to sudden environmental changes.

Allele frequency determination and allelic variability

Allele frequencies at each locus are calculated by direct counting. The mean number of alleles (MNA) observed over a range of loci for different populations is considered to be a reasonable indicator of genetic variation. This holds true provided that the populations are at mutation-drift equilibrium and that the sample size is almost the same for each population. A low MNA indicates low genetic variation, which may be due to genetic isolation, historical population bottlenecks or founder effects. A high MNA implies great allelic diversity, which could have been influenced by cross breeding or admixture. Bar charts can be created for individual breeds to show variability in allelic distributions at loci. Because sample sizes are seldom the same for each population analysed, other indicators of allelic variability are also used, including the effective number of alleles (ENA) and allelic richness (Ar). ENA denotes the number of equally frequent alleles it would take to achieve a given level of gene diversity.
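The counting steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure of any particular program: the function names and the microsatellite genotypes are hypothetical, and ENA is computed with the standard formula 1 / sum(p_i^2), which the text above does not state explicitly.

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele frequencies at one locus by direct counting.

    `genotypes` is a list of (allele, allele) tuples, one per diploid animal.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())  # 2N allele copies for N animals
    return {allele: n / total for allele, n in counts.items()}


def mean_number_of_alleles(loci):
    """MNA: number of distinct alleles observed, averaged over loci."""
    return sum(len(allele_frequencies(g)) for g in loci.values()) / len(loci)


def effective_number_of_alleles(freqs):
    """ENA, assuming the standard formula 1 / sum(p_i^2)."""
    return 1.0 / sum(p * p for p in freqs.values())


# Hypothetical genotypes (allele sizes in base pairs) for one breed.
breed = {
    "locusA": [(100, 102), (100, 100), (102, 104), (100, 104)],
    "locusB": [(150, 150), (150, 152), (152, 152), (150, 152)],
}

freqs_a = allele_frequencies(breed["locusA"])
print("locusA frequencies:", freqs_a)
print("MNA:", mean_number_of_alleles(breed))
print("ENA at locusA:", effective_number_of_alleles(freqs_a))
```

At locusA three alleles are observed, but because they are not equally frequent the effective number of alleles is lower than three.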
ENA allows one to compare populations in which the number and distribution of alleles differ drastically. Ar also measures the number of alleles per locus, but it allows comparisons between samples of different sizes by using the rarefaction technique or a Bayesian simulation approach to standardize populations to a uniform sample size.

Variation in gene frequencies

The variation in gene frequencies at each locus can be used to determine genetic variability between breeds. Chi-square analysis is used to test differences among loci and breeds.

Variation in genotype frequencies

Variability between breeds can be measured using the observed genotypes at each locus and between pairs of breeds. The assumption that genotypes are distributed independently over all breeds can be tested by contingency chi-square analysis. Comparisons are then made between pairs of breeds.

Testing for Hardy-Weinberg equilibrium

Most deductions in population and quantitative genetics depend on the relationship between gene frequencies and genotype frequencies. A population is said to be in Hardy-Weinberg equilibrium (HWE) when gene and genotype frequencies remain constant from generation to generation. Factors such as selection, migration and mutation can change these frequencies, resulting in non-random union of gametes. Deviation from HWE in a population indicates possible inbreeding, population stratification and sometimes problems with the genotyping. In populations where individuals may be affected by particular ailments or may be under different selective pressures, these deviations can also provide evidence of association. The data required to perform HWE tests are the gene and genotype frequencies and the sample size at each locus. The deviation from HWE can be tested using any one of the following three methods:
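As a rough illustration of how genotype counts at a single locus can be compared with their Hardy-Weinberg expectations, the sketch below applies a chi-square goodness-of-fit test to a biallelic locus. The choice of test, the function name `hwe_chi_square` and the genotype counts are assumptions made for illustration; dedicated programs offer more robust procedures, particularly for multi-allelic microsatellite loci and small samples.

```python
import math


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square goodness-of-fit test for HWE at one biallelic locus.

    Returns (chi-square statistic, P-value) with 1 degree of freedom
    (3 genotype classes - 1 - 1 allele frequency estimated from the data).
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A by direct counting
    q = 1.0 - p
    if p in (0.0, 1.0):
        raise ValueError("locus is monomorphic; HWE cannot be tested")

    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 30 AA, 40 Aa and 30 aa animals (heterozygote deficit).
chi2, p_value = hwe_chi_square(30, 40, 30)
print(f"chi-square = {chi2:.2f}, P = {p_value:.4f}")
```

With these counts the expected genotype numbers are 25, 50 and 25, so the observed shortage of heterozygotes produces a statistic of 4 and a P-value just below 0.05.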
With the exception of the Bayesian approach, these tests can be carried out in GENEPOP, FSTAT, ARLEQUIN and the R programming language.

Estimating average heterozygosity

Heterozygosity is a measure of genetic variation within a population. High heterozygosity values for a breed may be due to long-term natural selection for adaptation, to the mixed nature of the breeds or to historic mixing of strains of different populations. A low level of heterozygosity may be due to isolation with the subsequent loss of unexploited genetic potential. Locus heterozygosity is related to the polymorphic nature of each locus. A high level of average heterozygosity at a locus could be expected to correlate with high levels of genetic variation at loci with critical importance for adaptive response to environmental changes (Kotzé and Muller, 1994). The observed heterozygosity is defined as the percentage of loci heterozygous per individual or the proportion of individuals heterozygous per locus. Average heterozygosity at each locus and for each breed can be estimated from allele frequencies at each locus. The average heterozygosity of an individual breed is estimated by summing the heterozygosities at each locus and averaging these values over all loci. Locus heterozygosity is estimated by summing the heterozygosity at that locus for each breed and averaging this quantity over all breeds. The expected heterozygosity (also called gene diversity) is calculated from individual allele frequencies (Nei, 1987). The FSTAT (Goudet, 1995), GENETIX (Belkhir et al., 1996-2004), R packages, Microsatellite Analyzer (Dieringer and Schlötterer, 2003) and MSToolkit (Park, 2001) computer programs can be used to estimate both observed and expected heterozygosity per locus and population and across all populations analysed.

Estimating levels of inbreeding

Molecular data can also be used to estimate inbreeding, although two marker alleles may be identical for reasons other than common descent.
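The heterozygosity estimates described above, and the comparison of observed with expected heterozygosity that underlies marker-based inbreeding estimates, can be sketched as follows. The function names and genotypes are hypothetical. Expected heterozygosity is taken as 1 - sum(p_i^2), with Nei's small-sample correction as an option, and the inbreeding coefficient is assumed to be the common estimator F = 1 - Ho / He; the programs named above may use different estimators.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Proportion of individuals that are heterozygous at one locus."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def expected_heterozygosity(genotypes, unbiased=False):
    """Gene diversity at one locus: 1 - sum(p_i^2).

    With unbiased=True, the correction 2N / (2N - 1) is applied.
    """
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    n_copies = sum(counts.values())  # 2N allele copies
    he = 1.0 - sum((c / n_copies) ** 2 for c in counts.values())
    if unbiased:
        he *= n_copies / (n_copies - 1)
    return he


def inbreeding_coefficient(genotypes):
    """Assumed estimator F = 1 - Ho / He for one locus."""
    he = expected_heterozygosity(genotypes)
    if he == 0.0:
        raise ValueError("locus is monomorphic; F is undefined")
    return 1.0 - observed_heterozygosity(genotypes) / he


# Hypothetical genotypes (allele sizes in base pairs) for one breed.
breed = {
    "locusA": [(100, 102), (100, 100), (102, 104), (100, 104)],
    "locusB": [(150, 150), (150, 152), (152, 152), (150, 152)],
}

# Breed averages: mean over loci, as described in the text.
mean_ho = sum(observed_heterozygosity(g) for g in breed.values()) / len(breed)
mean_he = sum(expected_heterozygosity(g) for g in breed.values()) / len(breed)
mean_f = sum(inbreeding_coefficient(g) for g in breed.values()) / len(breed)
print(f"Ho = {mean_ho:.4f}, He = {mean_he:.4f}, F = {mean_f:.4f}")
```

A positive F reflects fewer heterozygotes than expected under HWE, while a value near zero or below it reflects heterozygotes at or above the expected proportion.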
The observed and expected numbers of heterozygotes at different loci can be used to estimate the extent of inbreeding. The locus inbreeding coefficients are averaged to estimate the average inbreeding coefficient for each population. Inbreeding coefficients should only be estimated for breeds which show significant deviation from HWE. A large value reflects a shortage of heterozygous genotypes and an excess of homozygous genotypes relative to HWE expectations. A small value indicates that heterozygous genotypes occur at close to, or above, the expected proportion.

Genetic differentiation

Population differentiation can be assessed by determining whether allelic composition is independent of population assignment (Raymond and Rousset, 1995a). The statistical test is based on analysis of contingency tables using a Markov chain procedure to derive an unbiased estimate of the exact probability of being wrong in rejecting the null hypothesis, i.e. that allelic composition is independent of population assignment (no differentiation). The test is performed for pair-wise inter-population comparisons on contingency tables containing data from each of the microsatellite loci studied. The FSTAT, GENETIX and POPULATIONS statistical programs can be used to perform the computations.

Analysis of gene flow, genetic admixture and structure
Tests for linkage disequilibrium

Linkage disequilibrium (LDE) is the non-random association of alleles at different loci, which may arise from: (i) admixture of populations with different gene frequencies; (ii) chance in small populations (e.g. endangered breeds); (iii) selection favouring one combination of alleles over another; or (iv) the close association between markers in the same linkage group (Falconer and Mackay, 1996). A test can be carried out to check for association between the markers studied. The null hypothesis for the LDE test is that the genotypes at one locus are independent of those at another locus. The GENEPOP program (Raymond and Rousset, 1995b) and FSTAT (Goudet, 1995) can be used to test for LDE. GENEPOP prepares contingency tables for all pairs of loci in each population and in a pooled sample of all populations. A probability test (Fisher's exact test) is then performed on each table, with P-values obtained by the Markov chain method.

Distribution of genetic diversity (population differentiation)

When a population is divided into subpopulations, there is less heterozygosity than there would be if the population were undivided. Founder effects acting on different subpopulations generally lead to subpopulations with allele frequencies that differ from those of the larger population. Since the allele frequency in each generation represents a sample of the previous generation's allele frequency, there will be greater sampling error in these small groups than there would be in a larger undifferentiated population. Hence, genetic drift will push these smaller demes toward different allele frequencies and allele fixation more quickly than would take place in a larger undifferentiated population. There are two commonly used approaches to quantify the distribution of genetic diversity within and between populations.
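The loss of heterozygosity under subdivision can be illustrated with a small numerical sketch. It compares the average expected heterozygosity within subpopulations (H_S) with the expected heterozygosity of the pooled population (H_T) and summarises the difference as (H_T - H_S) / H_T, in the sense of Nei's G_ST. That statistic is not named in the text above and is used here only as an assumption for illustration, as are the function names, the allele frequencies and the equal subpopulation sizes.

```python
def expected_het(freqs):
    """Expected heterozygosity 1 - sum(p_i^2) from a list of allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def subdivision_summary(subpop_freqs):
    """Return (H_S, H_T, F_ST) for one locus.

    `subpop_freqs` holds one list of allele frequencies per subpopulation,
    with alleles in the same order; equal subpopulation sizes are assumed.
    """
    k = len(subpop_freqs)
    # H_S: mean expected heterozygosity within subpopulations.
    h_s = sum(expected_het(f) for f in subpop_freqs) / k
    # H_T: expected heterozygosity from the pooled (mean) allele frequencies.
    pooled = [sum(column) / k for column in zip(*subpop_freqs)]
    h_t = expected_het(pooled)
    f_st = (h_t - h_s) / h_t
    return h_s, h_t, f_st


# Two hypothetical demes that have drifted in opposite directions.
h_s, h_t, f_st = subdivision_summary([[0.2, 0.8], [0.8, 0.2]])
print(f"H_S = {h_s:.2f}, H_T = {h_t:.2f}, F_ST = {f_st:.2f}")
```

Each deme on its own has an expected heterozygosity of 0.32, whereas a single undivided population with the pooled frequencies would have 0.50, so about a third of the total gene diversity lies between the demes.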
Analysis of molecular variance (AMOVA) can be used to: (1) describe the partitioning of genetic variation among and within groups; and (2) test user-defined groupings of populations. AMOVA differs from a simple analysis of variance (ANOVA) in that the data are arranged hierarchically and mean squares are computed for groupings at all levels of the hierarchy. This allows for hypothesis tests of between-group and within-group differences at several hierarchical levels.
Last Updated on Thursday, 03 November 2011 09:31