Background

Advancement in population and evolutionary genetic research has been accompanied by, or perhaps more accurately has been a consequence of, continuous improvement in the way genetic similarity or dissimilarity between genomes is assessed. Seen in a long-term perspective, genetic marker methodology has evolved from a focus on phenotypes, via immunological parameters and proteins, to genotypes. Since their introduction to the study of natural populations about 15 years ago [1–3], microsatellites, or short simple tandem repeats, have been the genotype-based marker of choice for many applications in which the relatedness between individuals, populations or species is sought. Before this, and subsequently in parallel with it, non-repetitive DNA sequence variation has been assessed through various approaches, including DNA sequencing, restriction fragment length polymorphism (RFLP) analysis, single-strand conformation polymorphism (SSCP) analysis, random amplified polymorphic DNA (RAPD) analysis and amplified fragment length polymorphism (AFLP) analysis [4]. More recently, single nucleotide polymorphisms (SNPs) have increasingly found application in studies of natural populations [5, 6].

The benefits of microsatellites are several and well known. They are multi-allelic, show high heterozygosity and are relatively easy to analyse at moderate cost. Because of their high polymorphism information content, a rather limited number of markers suffices for many applications in molecular ecology and population genetics. It is usually not too difficult to isolate the required markers from DNA libraries [7] or to employ markers originally developed for related species [8]. SNPs have merit as genetic markers for other reasons. They are very common, with genomic densities exceeding that of microsatellites by orders of magnitude. Large numbers of individuals may be genotyped at a large number of loci by simple, fast automated methods, and data interpretation is usually straightforward [5, 9]. Moreover, SNP variation at protein-coding genes and in other functionally constrained regions of the genome is likely to form the main genetic basis of phenotypic variation. Furthermore, biallelic SNPs evolve in a manner well described by simple mutation models. There are good reasons to believe that in many cases they will gradually replace microsatellites in molecular ecology and population genetics/genomics research [6].

Unfortunately, however useful they are, both microsatellites and SNPs suffer from some shortcomings. The complex and heterogeneous mutation pattern of microsatellites [10] introduces ambiguities into subsequent data analysis. Genotyping errors may occur because of stutter bands and technical artefacts (allelic dropout, null alleles, false alleles, size homoplasy) [11]. As for SNPs, many more markers are needed to obtain the same amount of information [6, 9]. Moreover, despite the many elegant SNP genotyping methods available [9], most of them are relatively costly at small or medium scales, and high-throughput genotyping requires special equipment.
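The information-content argument can be made concrete with the standard expected-heterozygosity formula, assuming Hardy–Weinberg proportions (a textbook result, not a calculation from this study). For a locus with $k$ alleles at frequencies $p_1, \dots, p_k$, the Cauchy–Schwarz inequality gives an upper bound reached when all alleles are equally frequent:

```latex
H_e = 1 - \sum_{i=1}^{k} p_i^2 \;\le\; 1 - \frac{1}{k},
\qquad \text{since} \quad \sum_{i=1}^{k} p_i^2 \;\ge\; \frac{1}{k}\Big(\sum_{i=1}^{k} p_i\Big)^2 = \frac{1}{k}.
```

For a biallelic marker ($k = 2$) this gives $H_e = 2p(1 - p) \le 0.5$, whereas a microsatellite with ten equally frequent alleles could reach $H_e = 0.9$. Each biallelic locus therefore carries less information, which is why more of them are needed.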

With a lag phase of a few years, the introduction of new genetic markers to the study of natural populations has generally followed methodological developments in the genetic analysis of model organisms [4]. Currently, there is an increasing focus on polymorphisms in the form of short insertions and deletions (indels) in genomic research on humans [12, 13] and on model species such as Drosophila melanogaster [14] and chicken (Gallus gallus) [15]. Indels have been recognised as an abundant source of genetic markers that are widely spread across the genome, though not as common as SNPs. For instance, Mills et al. [13] used data from resequencing surveys to identify 415,436 indels segregating in human populations, and they estimated that of the more than 10 million polymorphisms known in humans, some 1.5 million are indels. This suggests that indels could form a very common class of genetic markers in non-model species as well, particularly given that genetic diversity in many natural populations typically appears to be higher than in humans [5, 6, 16]. Most importantly, indels can be genotyped with simple procedures based on size separation. Another advantage is the minuscule chance of two indel mutations of exactly the same length occurring at the same genomic position, meaning that shared indels can confidently be seen as representing identity by descent [cf. [17]].

In this study we present a test of the usefulness of indel markers in natural populations. We use a bioinformatics approach to survey dog shotgun reads …

Using conventional genotyping based on fragment length separation in a DNA sequencing instrument, 81 of the 94 putative markers were found to be polymorphic in a screen of 7 dogs, and 76 of them were polymorphic in a global sample of 18 wolves (Figure 1A). As PCR primers were designed to generate amplicons of varying size within the 70–120 bp interval, multiplex reactions (three markers per PCR) were readily formed. This allowed simultaneous amplification, and consequently simultaneous genotyping within a single capillary, of several markers even when using the same fluorophore (Figure 1B).
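Packing markers of different amplicon size into one reaction and one dye channel can be sketched as a simple first-fit grouping. This is a minimal illustration, not the procedure used in the study: the `IndelMarker` record, the function names and the 10 bp minimum spacing between allele-size windows are hypothetical choices; only the limit of three markers per PCR comes from the text.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class IndelMarker:
    """Hypothetical marker record: amplicon sizes (bp) of the short and long allele."""
    name: str
    short_bp: int
    long_bp: int


def compatible(a: IndelMarker, b: IndelMarker, min_gap: int) -> bool:
    """True if the allele-size windows of a and b are separated by at least min_gap bp."""
    if a.long_bp <= b.short_bp:
        return b.short_bp - a.long_bp >= min_gap
    if b.long_bp <= a.short_bp:
        return a.short_bp - b.long_bp >= min_gap
    return False  # the two size windows overlap


def build_multiplex_panels(markers: Sequence[IndelMarker],
                           max_per_panel: int = 3,
                           min_gap: int = 10) -> List[List[IndelMarker]]:
    """Greedy first-fit grouping of markers into same-dye multiplex panels."""
    panels: List[List[IndelMarker]] = []
    for m in sorted(markers, key=lambda mk: mk.short_bp):
        for panel in panels:
            if len(panel) < max_per_panel and all(compatible(m, o, min_gap) for o in panel):
                panel.append(m)
                break
        else:  # no existing panel can take this marker
            panels.append([m])
    return panels


# Toy amplicon sizes within 70-120 bp (illustrative, not the study's markers).
markers = [
    IndelMarker("A", 70, 74), IndelMarker("B", 72, 80), IndelMarker("C", 90, 95),
    IndelMarker("D", 110, 118), IndelMarker("E", 100, 106),
]
panels = build_multiplex_panels(markers)
print([[m.name for m in p] for p in panels])  # [['A', 'C', 'D'], ['B', 'E']]
```

Sorting by size and placing each marker in the first panel it fits keeps the logic simple; a real design would also have to check primer compatibility and leave room for the artefact fragments seen in Figure 1B, which this sketch ignores.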

Figure 1

(a) Genotyping of a 4-bp indel locus in wolves showing (upper panel) a homozygote for the longer allele, (middle panel) a heterozygote and (lower panel) a homozygote for the shorter allele. (b) Multiplex amplification and simultaneous genotyping in a single capillary of five indel markers in one individual heterozygous for all of these markers. The long and short alleles of markers 1–5 are labelled. All markers show some extra fragments that likely represent PCR artefacts. These may be either shorter (marker 2) or longer (markers 3–5) than the amplified allele, or both shorter and longer (marker 1).

In wolves, 74% of the polymorphic loci had a minor allele frequency of >10% and 49% of >20%. The average observed and expected heterozygosities were 19.4% and 26.1%, respectively, in wolves, and 26.8% and 35.5% in dogs. The distribution of wolf heterozygosities is shown in Figure 2.
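These summary statistics can be computed per locus from biallelic genotype calls. The sketch below assumes genotypes coded as the number of copies of the long allele (0, 1 or 2, with `None` for a missing call); this coding and the function names are hypothetical. Expected heterozygosity is the uncorrected Hardy–Weinberg value 2p(1 − p); whether the study applied a sample-size correction is not stated here.

```python
from typing import Dict, Optional, Sequence, Tuple


def locus_stats(genotypes: Sequence[Optional[int]]) -> Tuple[float, float, float]:
    """Return (minor allele frequency, observed het., expected het.) for one biallelic locus.

    Genotypes are hypothetical counts of the long allele (0, 1 or 2); None = missing call.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        raise ValueError("locus has no called genotypes")
    p = sum(called) / (2 * n)              # long-allele frequency
    maf = min(p, 1 - p)                    # minor allele frequency
    ho = sum(g == 1 for g in called) / n   # proportion of heterozygous individuals
    he = 2 * p * (1 - p)                   # Hardy-Weinberg expectation, no correction
    return maf, ho, he


def summarise(loci: Sequence[Sequence[Optional[int]]]) -> Dict[str, float]:
    """Mean Ho/He over all loci, and MAF classes among polymorphic loci.

    Assumes at least one locus is polymorphic (MAF > 0).
    """
    stats = [locus_stats(g) for g in loci]
    poly = [s for s in stats if s[0] > 0]
    return {
        "mean_ho": sum(s[1] for s in stats) / len(stats),
        "mean_he": sum(s[2] for s in stats) / len(stats),
        "frac_maf_gt_10": sum(s[0] > 0.10 for s in poly) / len(poly),
        "frac_maf_gt_20": sum(s[0] > 0.20 for s in poly) / len(poly),
    }


# Toy genotypes at three loci (illustrative only, not data from the study).
loci = [[0, 1, 1, 2, 0, 0, 1, None], [0, 0, 0, 0], [0, 0, 0, 1]]
print(summarise(loci))
```

Keeping the per-locus function separate makes it easy to report locus-level values, such as the distribution in Figure 2, as well as the averages.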

Figure 2

Distribution of observed heterozygosities at 94 indel loci genotyped in 18 wolves from five populations worldwide.

The 76 indels found to be polymorphic in the global sample of wolves were subsequently genotyped in 27 wolves from a Swedish population. Fifty-one loci were polymorphic and showed a mean observed heterozygosity of 25.3%, or 17.0% when all 76 markers were included. The same wolves were also genotyped for a set of 20 microsatellites known to be informative in this population [e.g. [21]]. Expected heterozygosities for these loci ranged from 28% to 75%. There was a positive correlation between mean heterozygosity at indel and microsatellite loci in individual wolves (r² = 0.41, P < 0.001; Figure 3).
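The individual-level comparison amounts to computing, for each wolf, the fraction of called loci that are heterozygous in each marker set and then correlating the two vectors. A minimal sketch, assuming indel genotypes coded 0/1/2 (copies of the long allele) and microsatellite genotypes stored as pairs of allele sizes; these encodings and the function names are hypothetical. It returns the squared Pearson correlation only; the P value reported in the text would require a separate significance test, which is not shown.

```python
import math
from typing import Optional, Sequence, Tuple


def indel_heterozygosity(genos: Sequence[Optional[int]]) -> float:
    """Fraction of an individual's called indel loci that are heterozygous (coded 1)."""
    called = [g for g in genos if g is not None]
    return sum(g == 1 for g in called) / len(called)


def msat_heterozygosity(genos: Sequence[Optional[Tuple[int, int]]]) -> float:
    """Fraction of an individual's called microsatellite loci whose two allele sizes differ."""
    called = [g for g in genos if g is not None]
    return sum(a != b for a, b in called) / len(called)


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy data for three hypothetical wolves (not data from the study).
indels = [[1, 1, 0, 2], [0, 0, 0, 1], [1, 1, 1, None]]
msats = [[(150, 152), (200, 200)], [(150, 150), (200, 200)], [(150, 154), (198, 200)]]

x = [indel_heterozygosity(g) for g in indels]
y = [msat_heterozygosity(g) for g in msats]
r = pearson_r(x, y)
print(f"r^2 = {r ** 2:.3f}")  # prints r^2 = 0.964
```

Missing calls are dropped per individual, so each wolf's heterozygosity is a proportion of the loci actually scored for it.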

Figure 3

Correlation between average observed individual heterozygosity at indel (51 loci) and microsatellite (20 loci) markers in 22 Swedish wolves.