
Camels (Camelus, Camelini) include two extant domestic species, the one-humped dromedary (Camelus dromedarius) and the two-humped Bactrian camel (Camelus bactrianus)1,2. Whereas dromedaries are raised mainly in North Africa and West Asia, Bactrian camels live in the cold desert areas of Northeast and Central Asia. The wild Bactrian camel (Camelus ferus), the only living representative of the wild Camelini following the extinction of the wild dromedary3, is listed as critically endangered by the International Union for Conservation of Nature4, with an estimated population of a few hundred to 2000 individuals5,6. Historically, the wild Bactrian camel was widely distributed throughout Asia, extending from the great bend of the Yellow River westward to central Kazakhstan (KAZA), but today it is found only in the Mongolian Gobi and the Chinese Taklimakan and Lop Noor deserts7. Fossil and molecular evidence suggests that the ancestor of camels lived in North America and spread to Asia via the Bering land bridge around 11 or 16 million years ago8,9. Within the Camelini, dromedaries and Bactrian camels then split around 4 or 5 million years ago9,10. The domestication of camels, like other innovations involving domestic mammals such as horse-based transport11, greatly promoted human mobility and represented a great leap forward for human civilization. For example, Bactrian camels were rightfully considered the principal means of transport across the bridge between eastern and western cultures in the time of the Silk Road12. Today, they still serve as valuable sources of meat, milk, and wool for people’s livelihoods in arid and semi-arid areas1.

The origin of domestic dromedaries was recently revealed by world-wide sequencing of modern and ancient mitochondrial DNA (mtDNA), which suggested that they were first domesticated in the southeast Arabian Peninsula13. However, the origin of domestic Bactrian camels remains controversial. One intuitive possibility was that the extant wild Bactrian camels were the progenitors of the domestic form, which then gradually dispersed westward from the Mongolian Plateau7,12. This hypothesis was supported by the presence of camelid faunal remains at Neolithic sites near Mongolia (MG), although it was unclear whether these remains came from domestic or wild animals12. Nevertheless, molecular studies based on mtDNAs9,14 and Y chromosomes15 discovered dramatic sequence variation between the wild and domestic Bactrian camels, suggesting that the extant wild Bactrian camel was a separate lineage14. Another possible place of origin was Iran (IRAN)1, where early skeletal remains of domestic Bactrian camels (around 2500–3000 BC) were discovered16. Although prehistoric mtDNAs of Bactrian camels supported the idea that domestication took place in Central Asia rather than in MG or East Asia17, the incomplete archaeological findings and limited molecular markers provided little decisive information about the actual domestication history.

Whole-genome sequences contain many more molecular markers than mtDNAs and have been successfully used to portray the origin, migration, and admixture of humans18,19,20 and domestic animals13). Re-calculation of the pairwise Fst after removing introgression still indicated that IRAN was the most differentiated population (0.04–0.06) among all the domestic populations (Fig. 3b). To gain more insight into the population phylogeny, we reconstructed the NJ tree based on the pairwise Fst and performed the bootstrap test (Fig. 3b). The tree confirmed that IRAN was the first to separate among all the domestic Bactrian populations, followed by KAZA and RUS. The Bayesian binary Markov Chain Monte Carlo (MCMC) analysis based on the phylogeny strongly supported Central Asia as the ancestral area of domestic Bactrian camels (probability = 99.78%) and a subsequent dispersal route from Central to East Asia (Supplementary Fig. 14).

a BABA/ABBA analysis for introgression of dromedaries into domestic Bactrian camels. To focus on this introgression, one dromedary with Bactrian camel ancestry and three wild camels with domestic ancestry were removed. The number of 100 kb segments with significant fd (|Z-score| > 2) for each tree configuration is shown. b Neighbor-joining (NJ) tree of the populations after the introgressed segments were removed. The heatmap represents average pairwise Fst for 5.1 × 10^4 10 kb-sliding windows. Bootstrap values of the NJ tree were calculated by randomly sampling five thousand 10 kb windows 100 times. c Maximum likelihood tree of full-length mtDNAs. Populations are represented by different colors and sequences from GenBank are indicated by dots. Bootstrap values for the main branches are labeled.

As independent evidence, we also reconstructed the maximum likelihood tree of full-length mtDNAs based on the 128 samples we sequenced in this study, as well as 39 additional samples available from GenBank (Fig. 3c and Supplementary Table 11). Introgression of mtDNAs could easily be identified and excluded from the tree. For example, two newly sequenced mtDNAs from KAZA and RUS clustered with dromedaries. Within the clade of domestic Bactrian camels, although most camels from different geographic regions were mixed, two mtDNAs from IRAN formed the most basal branches of the domestic populations (Fig. 3c). The Bayesian binary MCMC analysis again supported the Central Asian origin of domestic Bactrian camels (probability = 76.43%).

Demographic history of Bactrian camels

We performed several parametric modeling analyses to infer the demographic history of the camels. Consistent with a previous study10, the long-term trajectories of Bactrian camels based on the pairwise sequentially Markovian coalescent (PSMC) model41 revealed a sharp decrease in the effective population size of the ancestral camels more than one million years ago (Supplementary Fig. 15). Although the long-term trajectories of the wild and domestic Bactrian camels were generally similar, they clearly diverged from each other as early as 0.4 million years ago, which excludes the former as direct progenitors of the latter, in agreement with previous mtDNA analyses9,14.

To explore the divergence times among the camel populations in more detail, we used the generalized phylogenetic coalescent sampler (G-PhoCS)42. Given the phylogeny of the camel populations, G-PhoCS can estimate the mutation-scaled population sizes and population divergence times based on unlinked neutral loci in individual genomes from each population. To reduce the model complexity, we only included dromedaries, wild Bactrian camels, and three representative populations (IRAN, KAZA, and MG) of domestic Bactrian camels (Supplementary Fig. 16 and Supplementary Table 12). According to Fig. 3b, IRAN and KAZA were the first two Central Asian populations to separate, and the split of MG could indicate the dispersal from Central to East Asia. The time scale was calibrated by assuming a Bactrian camel and dromedary divergence of 5.73 million years according to the Timetree database43. When no migration band was incorporated, convergence of all parameter estimates could easily be achieved (Supplementary Fig. 17 and Supplementary Table 13). Similar to the PSMC results, the effective population size generally decreased from ancestral to modern populations (Fig. 4). The divergence time between wild and domestic Bactrian camels was estimated as 0.43 million years ago (95% confidence interval [CI]: 0.13–0.73 Mya; Fig. 4), which was more recent than that based on mtDNAs (0.714 or 1.1 Mya9). Among the domestic populations, IRAN separated from the others about 4.45 thousand years ago (95% CI: 0.07–17.6 Kya) and the Central and East Asian populations then separated about 2.40 thousand years ago (95% CI: 0.01–7.84 Kya; Fig. 4).
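The calibration step can be illustrated with a small sketch. It assumes a constant mutation rate across the tree, so that a mutation-scaled divergence time τ converts to years by simple proportion against the root calibration of 5.73 million years. The function name and the τ values below are hypothetical placeholders, not estimates from this study.

```python
# Calibration point quoted in the text: Bactrian camel / dromedary split.
ROOT_DIVERGENCE_YEARS = 5.73e6


def tau_to_years(tau, tau_root, root_years=ROOT_DIVERGENCE_YEARS):
    """Convert a mutation-scaled divergence time to years.

    Assumes a constant mutation rate, so time in years is proportional to tau.
    """
    if tau_root <= 0:
        raise ValueError("tau_root must be positive")
    return tau / tau_root * root_years


# Hypothetical tau values chosen only to illustrate the arithmetic.
tau_root = 5.73e-3   # root (dromedary vs. Bactrian camel) split, mutation-scaled
tau_split = 4.3e-4   # a younger split, mutation-scaled
years = tau_to_years(tau_split, tau_root)
print(f"{years:,.0f} years")
```

Under the same assumption, the implied mutation rate per site per year is simply `tau_root / root_years`.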

The change in mutation-scaled effective population size θ is represented by heat colors. The time in years was calibrated by the divergence time between dromedaries and Bactrian camels. The 95% confidence intervals are shown by bars on the time axis. The red and blue bars indicate the IRAN-MG and KAZA-MG divergence, respectively. These estimates are based on the model without migration.

To allow for gene flow, we also tried to introduce migration bands from dromedaries to Bactrian camel populations (Supplementary Fig. 16 and Supplementary Table 12). The estimates could only converge when a migration band from Iranian dromedaries to IRAN and a migration band from a ghost population to KAZA were introduced (Supplementary Fig. 18 and Supplementary Table 13). Although the divergence time between wild and domestic Bactrian camels was not changed with the migration model (0.46 Mya, 95% CI: 0.24–0.71 Mya), the first divergence time of the domestic populations (0.19 Mya, 95% CI: 0.08–0.31 Mya) became unrealistic, because it was far beyond the known history of livestock domestication (11.5 Kya44). Besides, the total migration rate was only estimated as 0.27% and 0.16% for the migration band to IRAN and KAZA, respectively, much lower than that estimated with Admixture (1–10%). A possible reason for the poor estimation would be a more complex admixture history than the continuous migration model with constant rates assumed by G-PhoCS.


In this study, we characterized for the first time the whole-genome variation of camels across Asia, including domestic Bactrian camels representing a major subset of recognized breeds, extant wild Bactrian camels, and dromedaries. As the extant wild Bactrian camels are heading towards extinction, our research provided extremely valuable genetic resources for this living fossil. Also, considering the extensive use of domestic camels in transportation and in milk and wool production, our data provided new options for implementing genetic association studies and marker-assisted selection to improve livestock productivity and future breeding effects. In addition, these data provided an unprecedented opportunity to trace the origin and migration of domestic Bactrian camels in history.

Previous studies, relying on limited archaeological records and molecular markers, suggested that Bactrian camels were first domesticated in Central Asia rather than in East Asia17. Here we provided more solid evidence on the basis of whole-genome sequences. The earliest branching among the domestic Bactrian camels occurred between IRAN and all the others, which was followed by the split between the Central and East Asian populations. Although evident introgression of dromedaries was observed in Central Asia, we demonstrated, by removing the introgressed genomic segments, that it did not influence our results. In contrast, although the extant wild and domestic Bactrian camels live in close proximity in MG, our whole-genome analyses agreed with previous mtDNA analyses that the two populations have been separated for so long that the latter are unlikely to have originated from the former9,14. Furthermore, the extant wild camels contributed little to the gene pool of domestic populations, implying that the domestic populations in MG may have immigrated there in more recent periods.

Based on these results, we proposed a comprehensive scenario for the origin and migration of the Bactrian camels (Supplementary Fig. 19). After the ancestor of camels moved from North America and split into dromedaries and Bactrian camels, the wild Bactrian camels spread from East to Central Asia about 0.43 million years ago (95% CI: 0.13–0.73 Mya). The Bactrian camels were first domesticated in Central Asia before 4.45 thousand years ago (95% CI: 0.07–17.6 Kya) and then migrated back to East Asia around 2.40 thousand years ago (95% CI: 0.01–7.84 Kya) with the increasing economic exchange and cooperation between the West and East. This scenario could resolve the mystery of why the wild and domestic Bactrian camels from the Mongolian Plateau have such a large genetic distance. It should be noted that the timing of these events was based on the model without admixture. Considering that the domestic Bactrian camels in Central Asia further hybridized with dromedaries out of Arabia, the origin and migration ages of the domestic Bactrian camels may be overestimated because of the increased genomic divergence.

Despite the insights gleaned from our data, it is important to note that the direct wild progenitor of domestic Bactrian camels has not been found in Central Asia and may no longer exist. However, historical records suggest that the wild Bactrian camels were once more widely distributed throughout Asia, extending from the great bend of the Yellow River westward to central Kazakhstan7. In future work, sequencing of ancient genomes from camel fossils will add to the picture of their early domestication. Another issue in our study was that although gene flow between dromedaries and Bactrian camels in Central Asia was convincingly detected, the actual admixture history remained largely unknown. First, the TreeMix analysis suggested that although the Iranian Bactrian camels and dromedaries were directly mixed, those from KAZA and RUS appeared to be mixed with a ghost population (Fig. 2c). Second, when the excess shared alleles between Iranian dromedaries and Central Asian Bactrian camels were removed, KAZA continued to show reduced divergence from the dromedaries compared with the other populations (population tree in Fig. 3b). This branching pattern was also consistent with the ghost admixture model for KAZA. Third, the continuous migration model with constant rates implemented by G-PhoCS could capture only a small fraction of the admixture, even when a ghost population was assumed. These results hinted at a more complex and possibly multistage admixture history between Bactrian camels and dromedaries. As we only had a few dromedaries from IRAN, future collection of dromedaries from more diverse populations could help to decipher the admixture history.


Sample preparation

Blood samples of 105 domestic Bactrian camels were collected from villages in China (55), MG (28), KAZA (6), RUS (10), and IRAN (6). Blood samples of four dromedaries were also collected from IRAN. The collections were made during routine veterinary treatments following the guidelines of the Camel Protection Association of Inner Mongolia. An effort was made to collect samples from unrelated individuals based on the information provided by the owners and local farmers. We collected 50 ml of blood from the jugular vein of each camel after disinfection, placed it in EDTA anticoagulant tubes, and then stored it at −80 °C. Ear skin samples (0.5 cm) of 19 wild Bactrian camels were collected from the Great Gobi-Strictly Protected Area A in MG. The wild Bactrian camels chosen were artificially reared and the research was reviewed and approved by the Great Gobi National Park. Proper surgical procedures were adopted in the collection. Local anesthesia (5% procaine hydrochloride) was applied to the ear and the wound was disinfected with iodophor and sulfonamide powder. The samples were eluted with phosphate-buffered saline solutions, placed in cryotubes, and stored at −80 °C.

Genome sequencing

The genomic DNA was extracted from 200 μl blood samples with the QIAamp DNA Blood Mini kit (Qiagen) and from the skin samples with a standard phenol–chloroform method. The quality and integrity of the DNA were checked by the OD260/280 ratio and agarose gel electrophoresis. For sequencing library preparation, the genomic DNA was sheared to fragments of 300–500 bp, which were then end repaired, ‘A’-tailed, and ligated to Illumina sequencing adapters. The ligated products with sizes of 370–470 bp were selected on 2% agarose gels and then amplified by PCR. The libraries were sequenced on the Illumina HiSeq platform in standard paired-end mode.

Variant calling

We used an in-house script to perform quality control on raw sequencing reads. Low-quality reads with ambiguous bases >10% were excluded. The 3′-ends with base quality score <20 were trimmed and reads with length <35 bp were removed after trimming. Trimmed reads were mapped to the reference genome assembly of the Bactrian camel ( using BWA-MEM (v0.7.12)45 for each individual and then processed with SAMtools (v1.3.1)46. We followed the GATK pipeline (v3.2–2)47 for variant calling. First, PCR duplicates were removed using Picard tools (v1.135) and local indel realignment was performed. Second, SNPs and small indels were called with UnifiedGenotyper across all 128 individuals simultaneously. Finally, the raw variants were filtered with the following criteria: variant quality score >40, sequencing depth summed over all individuals >200 and <5000, minor allele frequency >1%, <20% of individuals with missing genotypes, root mean square of mapping quality >30, and biallelic variants only. The total number of SNPs was reduced from 17.76 to 13.83 million after filtering. Functional annotation of variants was performed with ANNOVAR (v2013-06-21)48 according to RefSeq (
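The filtering criteria listed above can be sketched as a single predicate. This is a minimal illustration rather than the pipeline used in the study; the dictionary keys (`qual`, `total_depth`, `maf`, `missing_fraction`, `rms_mapq`, `n_alleles`) are hypothetical names chosen for this sketch and are not GATK annotation names.

```python
def passes_filters(variant):
    """Return True if a variant record meets the filtering criteria in the text.

    `variant` is a plain dict with hypothetical keys (not GATK field names).
    """
    return (
        variant["qual"] > 40                       # variant quality score > 40
        and 200 < variant["total_depth"] < 5000    # depth summed over all individuals
        and variant["maf"] > 0.01                  # minor allele frequency > 1%
        and variant["missing_fraction"] < 0.20     # < 20% of individuals missing
        and variant["rms_mapq"] > 30               # RMS mapping quality > 30
        and variant["n_alleles"] == 2              # biallelic variants only
    )


# Made-up record used only to exercise the predicate.
example = {
    "qual": 55.0,
    "total_depth": 1500,
    "maf": 0.12,
    "missing_fraction": 0.05,
    "rms_mapq": 58.0,
    "n_alleles": 2,
}
print(passes_filters(example))
```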

Population statistics and structure

Summary population statistics, including pairwise nucleotide diversity π, Watterson’s θ, and Weir’s Fst across 10 kb-sliding windows, were calculated by VCFtools (v0.1.12b)49. Pairwise kinships between the samples were inferred by KING (v2.1.3)50 and one individual from each closely related pair was removed. For population structure analyses, SNPs in approximate linkage disequilibrium with each other were pruned by PLINK (v1.07)51 (--indep-pairwise 50 5 0.5). SNPs located within exons and flanking 1 kb regions were also excluded. As a result, 2.08 million SNPs were retained. MDS and the pairwise distance matrix based on IBS were calculated using the --mds-plot 4 and --distance-matrix options in PLINK, respectively. The distance matrix was used to construct the NJ tree by Phylip (v3.69)52. One hundred random datasets were generated with the --thin 0.1 option in PLINK and bootstrap values were retrieved from the consensus tree reconstructed by Phylip. The population ancestry was inferred by ADMIXTURE (v1.3.0)35 with a fast maximum likelihood method. The optimum number of ancestral clusters K was estimated with the fivefold cross-validation procedure.
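As a rough illustration of an IBS-based distance, the sketch below computes one minus the proportion of alleles shared identical-by-state between two individuals. It assumes genotypes coded as alternate-allele counts (0, 1, 2) with `None` for missing calls; this coding and the function name are assumptions for the example, and the code is not a reimplementation of PLINK.

```python
def ibs_distance(genotypes_a, genotypes_b):
    """Return 1 - IBS similarity for two genotype vectors.

    Genotypes are alternate-allele counts (0, 1, 2); None marks a missing call.
    Sites missing in either individual are skipped.
    """
    pairs = [
        (a, b)
        for a, b in zip(genotypes_a, genotypes_b)
        if a is not None and b is not None
    ]
    if not pairs:
        raise ValueError("no sites genotyped in both individuals")
    # Two individuals share 2 - |a - b| alleles identical-by-state at a site.
    shared = sum(2 - abs(a - b) for a, b in pairs)
    return 1 - shared / (2 * len(pairs))


# Toy genotypes for two individuals at four sites (one site missing).
camel_1 = [0, 1, 2, None]
camel_2 = [0, 2, 0, 1]
print(ibs_distance(camel_1, camel_2))
```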

TreeMix analysis and admixture tests

Migration events among camel populations were inferred using TreeMix (v1.12)37 with migration number m = 0–5. The threepop/fourpop module from the TreeMix package was used to perform the F3/F4 test38,53 with -k 500. In the F3 test (Z; X, Y), one focal population (Z) was tested as a mixture of populations X and Y. A large negative F3 score (standardized to a Z-score with the Jackknife procedure) would indicate a very strong signal of Z as a mixture of X and Y. In our analysis, we ran F3 tests with all configurations of the populations. In the more sensitive F4 test (Y, Z; W, X), where W is an outgroup of Y and Z, the admixture bias between Y and Z with X was tested. If Y (or Z) has more admixture with X, the test will show a significantly negative (or positive) F4 score (standardized to a Z-score with the Jackknife procedure). To focus on the admixture between the domestic Bactrian camels and dromedaries, we set the population configuration as (Y, Z; wild, drom), where Y and Z were two domestic populations.
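The sign logic of the two tests can be seen in a small sketch that computes plug-in F3 and F4 scores from per-SNP allele frequencies. It omits the sample-size bias corrections and the block Jackknife used by the threepop/fourpop programs, so it is an illustration of the definitions only; the function names and toy frequencies are assumptions.

```python
from statistics import mean


def f3(p_z, p_x, p_y):
    """Plug-in F3(Z; X, Y): mean of (pZ - pX)(pZ - pY) over SNPs."""
    return mean((z - x) * (z - y) for z, x, y in zip(p_z, p_x, p_y))


def f4(p_y, p_z, p_w, p_x):
    """Plug-in F4(Y, Z; W, X): mean of (pY - pZ)(pW - pX) over SNPs."""
    return mean((y - z) * (w - x) for y, z, w, x in zip(p_y, p_z, p_w, p_x))


# Toy case 1: Z is an exact 50:50 mixture of X and Y, so F3 is negative.
x_freq = [0.0, 1.0, 0.2]
y_freq = [1.0, 0.0, 0.8]
z_freq = [0.5, 0.5, 0.5]
print(f3(z_freq, x_freq, y_freq))

# Toy case 2: Y carries alleles otherwise private to X, so F4(Y, Z; W, X) is negative.
print(f4([0.1, 0.2, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]))
```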

Local introgression test

To select the local genomic regions with significant introgression between dromedaries and Bactrian camels after their divergence, we used an in-house script to perform the BABA/ABBA test39 across 100 kb-sliding windows. For the tree configuration (Y, Z; W, X), the original Patterson’s D statistic can be calculated as a normalized F4 score53:

$$D = \frac{{E\left( {\left( {p_{\mathrm{Y}} - p_{\mathrm{Z}}} \right)\left( {p_{\mathrm{W}} - p_{\mathrm{X}}} \right)} \right)}}{{E\left( {\left( {p_{\mathrm{Y}} + p_{\mathrm{Z}} - 2p_{\mathrm{Y}}p_{\mathrm{Z}}} \right)\left( {p_{\mathrm{W}} + p_{\mathrm{X}} - 2p_{\mathrm{W}}p_{\mathrm{X}}} \right)} \right)}}$$

where pX is the frequency of a given allele in population X and the expectations E() are estimated by averaging all SNPs in a window. The more robust fd statistic for local genomic regions, which is a special form of F4 ratio and directly measures the proportion of introgression40, can be formulated as:

$$f_{\mathrm{d}} = \left\{ {\begin{array}{*{20}{c}} {\frac{{E\left( {\left( {p_{\mathrm{Y}} - p_{\mathrm{Z}}} \right)\left( {p_{\mathrm{W}} - p_{\mathrm{X}}} \right)} \right)}}{{E\left( {max\left( {\left( {p_{\mathrm{X}} - p_{\mathrm{Y}}} \right)\left( {p_{\mathrm{X}} - p_{\mathrm{W}}} \right),\left( {p_{\mathrm{Z}} - p_{\mathrm{Y}}} \right)\left( {p_{\mathrm{Z}} - p_{\mathrm{W}}} \right)} \right)} \right)}}\quad (D \, > \, 0)} \\ {\frac{{E\left( {\left( {p_{\mathrm{Y}} - p_{\mathrm{Z}}} \right)\left( {p_{\mathrm{W}} - p_{\mathrm{X}}} \right)} \right)}}{{E\left( {max\left( {\left( {p_{\mathrm{X}} - p_{\mathrm{Z}}} \right)\left( {p_{\mathrm{X}} - p_{\mathrm{W}}} \right),\left( {p_{\mathrm{Y}} - p_{\mathrm{Z}}} \right)\left( {p_{\mathrm{Y}} - p_{\mathrm{W}}} \right)} \right)} \right)}}\quad (D \, < \, 0)} \end{array}} \right.$$

We used the population configuration (East Asian, Central Asian; wild, drom) to perform the test. The fd statistic in each window was evaluated by the Z-score with the Jackknife procedure:

$$Z(f_{\mathrm{d}}) = \frac{{E(\widehat {f_{\mathrm{d}}})}}{{\sqrt {var\left( {\widehat {f_{\mathrm{d}}}} \right) \times n} }}$$

where \(\widehat {f_{\mathrm{d}}}\) is estimated with a 10 kb block removed each time and n is the number of repetitions.
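The three formulas above can be put together in a short sketch for a single window. It is not the in-house script used in the study. Each SNP is assumed to be a tuple of allele frequencies ordered (pY, pZ, pW, pX), each block is a list of such tuples, and the sample variance is assumed for var(); all names are hypothetical.

```python
from statistics import mean, variance


def _numerator(snps):
    # Shared numerator of D and fd: E[(pY - pZ)(pW - pX)]
    return mean((y - z) * (w - x) for y, z, w, x in snps)


def patterson_d(snps):
    """Patterson's D for one window, written as a normalized F4 score."""
    den = mean((y + z - 2 * y * z) * (w + x - 2 * w * x) for y, z, w, x in snps)
    return _numerator(snps) / den if den else 0.0


def f_d(snps):
    """fd for one window; the denominator depends on the sign of D."""
    num = _numerator(snps)
    # The denominator of D is non-negative, so D has the sign of the numerator.
    if num > 0:
        den = mean(max((x - y) * (x - w), (z - y) * (z - w)) for y, z, w, x in snps)
    elif num < 0:
        den = mean(max((x - z) * (x - w), (y - z) * (y - w)) for y, z, w, x in snps)
    else:
        return 0.0
    return num / den if den else 0.0


def jackknife_z(blocks):
    """Z-score of fd from delete-one-block estimates: mean / sqrt(var * n)."""
    n = len(blocks)
    estimates = []
    for i in range(n):
        kept = [snp for j, block in enumerate(blocks) if j != i for snp in block]
        estimates.append(f_d(kept))
    return mean(estimates) / (variance(estimates) * n) ** 0.5


# Toy window: ten blocks in which Z shares 10-28% of alleles private to X.
blocks = [[(0.0, 0.10 + 0.02 * i, 0.0, 1.0)] * 5 for i in range(10)]
all_snps = [snp for block in blocks for snp in block]
print(patterson_d(all_snps), f_d(all_snps), jackknife_z(blocks))
```

In this toy window, fd recovers the average frequency of the X-private allele in Z, which matches its interpretation as a proportion of introgression.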

Population phylogeny and mtDNA analysis

Population distance was measured with average Fst across 10 kb windows. To minimize linkage disequilibrium and perform the bootstrap test, five thousand 10 kb windows located at least 100 kb apart were randomly sampled 100 times. The NJ tree and consensus tree were reconstructed by Phylip52. We compiled the full-length mtDNA sequences of camels from those we sequenced in the study and those collected from GenBank. The sequences were aligned using ClustalW54. The control regions were deleted, because they were missing in many sequences and not well aligned. The maximum likelihood tree was constructed using MEGA (v6.06)55 with 1000 random bootstrap runs. The Tamura-Nei model and uniform substitution rates among sites were adopted. The ancestral area inference was performed with the Bayesian binary MCMC method implemented in RASP (v4.0)56. The MCMC was run for 50,000 iterations with 100 iterations between two samples and the first 100 samples were discarded.
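One simple way to draw windows that are mutually at least 100 kb apart is a greedy pass over a shuffled list, sketched below. The text does not state how the sampling was implemented, so this is only one possible approach; it measures spacing between window start positions on the same scaffold, and the function and variable names are hypothetical.

```python
import random


def sample_spaced_windows(windows, k, min_gap, rng):
    """Greedily pick up to k windows whose starts are >= min_gap apart.

    `windows` is a list of (scaffold, start) pairs for fixed-size windows.
    Windows on different scaffolds are always considered far enough apart.
    """
    shuffled = list(windows)
    rng.shuffle(shuffled)
    chosen = []
    starts_by_scaffold = {}
    for scaffold, start in shuffled:
        starts = starts_by_scaffold.setdefault(scaffold, [])
        if all(abs(start - other) >= min_gap for other in starts):
            starts.append(start)
            chosen.append((scaffold, start))
            if len(chosen) == k:
                break
    return chosen


# Toy genome: one 1 Mb scaffold tiled with 10 kb windows.
toy_windows = [("scaffold1", start) for start in range(0, 1_000_000, 10_000)]
picked = sample_spaced_windows(toy_windows, k=5, min_gap=100_000, rng=random.Random(1))
print(sorted(picked))
```

Repeating the call with different random seeds would give the replicate window sets from which bootstrap distance matrices could be built.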

G-PhoCS analysis

To prepare the G-PhoCS (v1.3)42 input, we excluded the following regions of the genome to reduce the effects of selection and sequencing errors: exons and 1 kb flanking regions; gap regions in the genome assembly; and regions with repeat sequence annotations. Altogether, 47% of the genome was excluded. We then randomly collected ten thousand 1 kb loci located at least 30 kb apart, to ensure sufficient inter-locus recombination. Multiple sequence alignments for the loci from individual genomes per population were retrieved by vcf-consensus in VCFtools49, with heterozygous genotypes represented by the International Union of Pure and Applied Chemistry (IUPAC) code and missing genotypes masked by “N.” Recommended gamma priors were used in the G-PhoCS analysis for the mutation-scaled population size θ, population divergence time τ, as well as migration rate m. The MCMC was run for 100,000 burn-in iterations and 500,000 sampling iterations with 10 iterations between two traced samples. The automatic fine-tuning procedure was done during the first 10,000 iterations. The convergence and mixing of the MCMC trace were monitored by Tracer (v1.6, available from Because of the stochastic nature of the MCMC algorithm, we tested the models on independent datasets and accepted the results only if two independent runs achieved similar estimates. We explored four models as follows: no migration (model 1); a single migration band from the dromedary to IRAN (model 2); two migration bands from the dromedary to IRAN and KAZA, respectively (model 3); and a migration band from the dromedary to IRAN and another from a ghost population to KAZA (model 4). All loci of the ghost population were set as “N.” Only models 1 and 4 showed convergence within ten independent runs. The time scale in years was calibrated according to a consensus divergence time of Bactrian camels and dromedaries (5.73 Mya in TimeTree43). 
The total migration rate per band M was calculated as M = m × τm, where m was the mutation-scaled migration rate per generation and τm was the mutation-scaled time span of the migration band.
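The genotype encoding used for the input alignments, with heterozygous sites written as IUPAC ambiguity codes and missing genotypes as “N,” can be sketched as follows. The function name and the use of `None` for a missing allele are assumptions for this example; the mapping itself follows the standard IUPAC nucleotide codes.

```python
# Standard IUPAC ambiguity codes for two-base combinations.
IUPAC_HET = {
    frozenset("AG"): "R",
    frozenset("CT"): "Y",
    frozenset("CG"): "S",
    frozenset("AT"): "W",
    frozenset("GT"): "K",
    frozenset("AC"): "M",
}


def encode_genotype(allele_1, allele_2):
    """Return one alignment character for a diploid SNP genotype.

    A missing allele (None) gives "N", a homozygote gives the base itself,
    and a heterozygote gives the matching IUPAC ambiguity code.
    """
    if allele_1 is None or allele_2 is None:
        return "N"
    if allele_1 == allele_2:
        return allele_1
    return IUPAC_HET[frozenset((allele_1, allele_2))]


# Toy locus with four sites: homozygous, two heterozygous, and missing.
genotypes = [("C", "C"), ("A", "G"), ("T", "C"), (None, None)]
print("".join(encode_genotype(a, b) for a, b in genotypes))
```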

Statistics and reproducibility

Standard statistical tests were performed with R (v3.4.2). Specifically, the count of population-specific variants was compared between East Asia (n = 4) and Central Asia (n = 3) with the two-tailed t-test. The nucleotide diversity π was compared between the populations with the two-tailed t-test based on twenty thousand 10 kb windows separated by 100 kb from each other. The ANOVA of the missing count of variants was performed for domestic Bactrian camels (n = 105), wild Bactrian camels (n = 19), and dromedaries (n = 4), with sequencing depth as a covariate.

