Introduction

It is generally agreed that the peopling of East Asia resulted mainly from Late Pleistocene south-to-north migrations. These migrations were initiated from Southeast Asia by the earliest settlers after they had left Africa via a coastal route ~60 kilo-years ago (kya)1,2,3,4,5,6,7. However, how these initial settlers moved into the interior of East Asia remains elusive. Modern humans may have continued along the coastal route, following the coastline of the ancient Sundaland until they reached and eventually settled East Asia. The alternative, that the settlers took an inland route into the interior of East Asia via river valleys, cannot be ruled out. Indeed, GIS-based analysis suggests that river valleys likely played an important role in the peopling of the interior of South Asia by modern humans8. This view echoes the suggestion that the major river systems of northern mainland Southeast Asia, such as the Ayeyarwady, Salween and Mekong, created diverse environments and paths for human dispersal and thus greatly aided early hominine adaptation9. The modern human cranium excavated in northern Laos, dated to 51–46 kya10, attests to the very early presence of modern humans in the interior of Southeast Asia. It lends further support to this hypothesized inland dispersal in the Late Pleistocene.

Unfortunately, no genetic trace of such inland dispersal(s) has been observed so far, although much progress has been made in dissecting the genetic landscape of East Asians2,3,7,11,12,13,7,15,16,17,22. One possible reason is that few studies have considered genetic data from both East and Southeast Asian populations. More importantly, another reason may be the scarcity of genetic information from Myanmar, the largest country in mainland Southeast Asia, which lies at the junction of South, Southeast and East Asia. Myanmar likely served as the corridor through which the initial settlers entered and colonized southeastern Asia during their migration along the Asian coast4,6. Even so, previous studies have focused either on the genetic structure of certain ethnic populations18 or on the distribution of a single haplogroup (viz. M31) in Myanmar19.

Given its location, if this hypothesized ancient inland route did exist, Myanmar likely served as its corridor. In fact, the two major rivers in Myanmar (i.e., the Ayeyarwady and Salween) can be traced upstream to southwestern China, and such river valleys would have facilitated population movements northwards into the interior. Coincidentally, our recent study observed an enrichment of new basal mtDNA lineages in southwestern China (especially Yunnan Province). It suggested that this region was likely a genetic reservoir of modern humans after they entered East Asia13. This finding further supports the possibility of a direct genetic contribution from Southeast Asia (e.g., Myanmar) to southwestern China in the Late Pleistocene.

Results

Classification of mtDNA sequences in Myanmar populations

As shown in Supplementary Table S1 online, 845 Myanmar mtDNAs were analyzed for their control region and additional coding-region sites. The majority of them (532/845, 62.96%) could be unambiguously allocated to East Eurasian haplogroups, such as D, G, M7-M13, A, N9a, R9 and B4,11,13,16,20,21,22,23,24. By contrast, 4.26% (36/845) were assigned to haplogroups of South Asian ancestry6,25,26,27,28,29. Surprisingly, a high proportion of samples (269/845, 31.83%) could not be classified on the basis of control-region variation and partial coding-region information. We completely sequenced 64 representatives of these unclassified mtDNAs. This revealed that 225 of the 269 unclassified mtDNAs in fact belong to sub-clades of already defined haplogroups, e.g., M4, M5, M7, M20, M21, M24, M30, M33, M35, M45, M46, M49, M50, M51, M54, M55, M58, M60, M72, M76, M90, M91, R22, R31, N21 and HV (Supplementary Fig. S1 online). The remaining 44 mtDNAs represent 3 previously undefined basal lineages. No sister clades could be found for these lineages after comparison with over 20,666 mtDNA genomes worldwide (mtDNA tree Build 1630; http://www.phylotree.org/), so they were named M82, M83 and M84 (Supplementary Fig. S1 online).

Genetic relationship of Myanmar populations and their surrounding groups

After all 845 Myanmar mtDNAs under study had been unambiguously classified, the proportions of haplogroups of East Eurasian (66.51%) and South Asian (17.40%) ancestry in the whole Myanmar population remained stable. The East Asian-prevalent haplogroups (i.e., M9a, A, D4, G and C2,11,Supplementary Table S1 online) which were chosen under the guideline of newly obtained mtDNA genome information (Supplementary Fig. S1 online). For any mtDNA whose phylogenetic status could not yet be identified, further complete sequencing was carried out. When naming the newly identified basal lineages, we followed the nomenclature of the PhyloTree website (mtDNA tree Build 1630; http://www.phylotree.org/) and our recently suggested haplogroup scheme46. For haplogroups of interest, 28 additional representative samples were also chosen for complete sequencing. The experiments were carried out in accordance with the approved guidelines of the Chinese Academy of Sciences.

Data quality control

To ensure the quality of the complete genome data, we followed our previously proposed quality-control measures, such as independent amplification, error detection by phylogenetic analysis, and the matching or near-matching method6,11,28. Furthermore, four pairs of PCR primers were designed to amplify the whole mtDNA genome. This avoids amplifying NUMTs47 and the artificial recombination that is easily introduced when dozens of primer sets are involved48. Each fragment was amplified independently. Each amplified fragment overlaps its neighbors by more than 400 bp49. The fragments were sequenced with 48 inner primers (12 for each fragment) reported in our previous studies6,20.
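The amplicon design described above requires four fragments to tile the circular genome, each overlapping its neighbors by more than 400 bp. That constraint can be checked programmatically. The following is a minimal sketch assuming the rCRS genome length (16,569 bp), 1-based inclusive coordinates, and fragments listed in genome order. The fragment coordinates in the usage line are hypothetical and are not the study's primer positions.

```python
MT_GENOME_LENGTH = 16569  # length of the revised Cambridge Reference Sequence (rCRS)


def fragment_positions(start, end, length=MT_GENOME_LENGTH):
    """Return the set of 1-based positions covered by one amplicon.

    If end < start, the fragment wraps across position 16569 -> 1 of the circular genome.
    """
    if start <= end:
        return set(range(start, end + 1))
    return set(range(start, length + 1)) | set(range(1, end + 1))


def check_tiling(fragments, min_overlap=400, length=MT_GENOME_LENGTH):
    """Check that fragments cover the whole genome and overlap neighbors by > min_overlap bp.

    fragments: list of (start, end) tuples in genome order (hypothetical input).
    Returns (full_coverage, overlaps_with_next_fragment, design_ok).
    """
    covered = set()
    overlaps = []
    for i, (start, end) in enumerate(fragments):
        current = fragment_positions(start, end, length)
        # the last fragment's neighbor is the first one, because the genome is circular
        next_start, next_end = fragments[(i + 1) % len(fragments)]
        overlaps.append(len(current & fragment_positions(next_start, next_end, length)))
        covered |= current
    full_coverage = len(covered) == length
    design_ok = full_coverage and all(o > min_overlap for o in overlaps)
    return full_coverage, overlaps, design_ok


# hypothetical four-amplicon layout, the last fragment wrapping past the origin
print(check_tiling([(1, 4600), (4101, 8700), (8201, 12800), (12301, 500)]))
# → (True, [500, 500, 500, 500], True)
```

Representing coverage as position sets is wasteful but keeps the wrap-around logic obvious for a genome this small.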

Data analyses

Previously published mtDNA data were taken into account to facilitate comparison with neighboring populations and thus to distill the genetic trace left by the Pleistocene inland immigrants. These comprised 1,524 mtDNAs from mainland Southeast Asia, 3,120 from East Asia, 690 from northeast India and 302 from Bangladesh. In addition, recently published Myanmar data (116 Bamar and 155 Karen18) were also considered (Fig. 4 and Supplementary Table S6 online).

Principal component analysis (PCA) and correspondence analysis (CA) were performed on haplogroup frequencies using the Statistical Package for the Social Sciences (SPSS) 16.0. A reduced median network for each basal haplogroup was constructed manually and checked with the program NETWORK 4.510 (www.fluxus-engineering.com/sharenet.htm). Phylogenetic trees of the haplogroups were reconstructed manually from complete sequences and confirmed with the mtPhyl software. Contour maps of haplogroup spatial frequencies were constructed using the linear Kriging model in Surfer 8.0 (Golden Software Inc., Golden, Colorado, USA). Spatial analysis was performed using the PASSAGE software package, with Moran's I as the metric in correlogram analysis50. The time to the most recent common ancestor (TMRCA) of a haplogroup was estimated using the ρ statistic as described previously51. Nei's dA genetic distances52 and AMOVA were calculated using Arlequin 3.11. Admixture was estimated by the weighted least squares (WLS) method53 using SPSS 16.0.
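The ρ statistic used for TMRCA estimation is the mean number of mutations separating the sampled sequences from the clade's root. The sketch below computes it, together with a commonly used standard error in which every mutation shared by k of n sequences contributes k²/n² to the variance. It is a sketch under stated assumptions, not the study's code. The tree is encoded as hypothetical (mutations on branch, number of sampled descendants) pairs. The calibration `years_per_mutation` is a placeholder parameter that should be replaced by whatever rate the analysis actually adopts.

```python
import math


def rho_tmrca(branches, n_samples, years_per_mutation):
    """Estimate a clade age with the rho statistic.

    branches: (mutations_on_branch, descendant_samples) for every branch below the
              clade root; descendant_samples counts sampled sequences carrying those
              mutations (hypothetical tree encoding).
    years_per_mutation: calibration factor (assumed placeholder, not the study's value).
    Returns (rho, sigma, age_years, age_sd_years).
    """
    # rho: average number of mutations from the root to each sampled sequence
    rho = sum(m * k for m, k in branches) / n_samples
    # variance: each mutation carried by k of n samples adds k^2 / n^2
    sigma = math.sqrt(sum(m * k * k for m, k in branches)) / n_samples
    return rho, sigma, rho * years_per_mutation, sigma * years_per_mutation


# hypothetical 4-sample clade: two internal branches, each followed by private mutations
tree = [(2, 2), (1, 1), (3, 1),   # branch with 2 mutations -> samples s1, s2 (1 and 3 private)
        (1, 2), (2, 1)]           # branch with 1 mutation  -> samples s3, s4 (2 and 0 private)
rho, sigma, age, age_sd = rho_tmrca(tree, n_samples=4, years_per_mutation=3624)
print(rho, round(sigma, 4))
```

Here the root-to-tip distances are 3, 5, 3 and 1, so ρ = 3.0, and σ = √18 / 4 ≈ 1.0607 mutations. Zero-mutation branches can simply be omitted from the list.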
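A spatial correlogram evaluates Moran's I separately for successive distance classes. In each class, pairs of populations whose separation falls in the class get weight 1 and all other pairs weight 0. The following minimal sketch computes this for one haplogroup's frequencies. It assumes planar coordinates and Euclidean distances; the PASSAGE analysis may have used geographic (great-circle) distances and its own class boundaries. Input values and coordinates are hypothetical.

```python
import math


def morans_i(values, coords, d_min, d_max):
    """Moran's I for the distance class (d_min, d_max] with binary weights.

    values: haplogroup frequency in each population (hypothetical input).
    coords: (x, y) per population; Euclidean distance is an assumption here.
    Returns None if no pair falls in the class or all values are identical.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0     # sum over ordered pairs i != j of w_ij * dev_i * dev_j
    w_total = 0.0   # sum of the weights w_ij
    for i in range(n):
        for j in range(n):
            if i != j and d_min < math.dist(coords[i], coords[j]) <= d_max:
                cross += dev[i] * dev[j]
                w_total += 1.0
    ss = sum(d * d for d in dev)
    if w_total == 0 or ss == 0:
        return None
    return (n / w_total) * cross / ss


def correlogram(values, coords, bounds):
    """Moran's I for consecutive distance classes defined by bounds[k] < d <= bounds[k+1]."""
    return [(lo, hi, morans_i(values, coords, lo, hi)) for lo, hi in zip(bounds, bounds[1:])]


freqs = [0.6, 0.6, 0.2, 0.2]                      # hypothetical frequencies
sites = [(0, 0), (1, 0), (2, 0), (3, 0)]          # hypothetical sampling locations
for lo, hi, i_value in correlogram(freqs, sites, [0, 1, 2, 3]):
    print(lo, hi, round(i_value, 3))
```

In this toy case I is about 0.333 for the nearest class and about -1.0 for the two farther classes. Values are compared against the null expectation -1/(n - 1). Significance testing (e.g. by permutation) is omitted from the sketch.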
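A two-source admixture proportion m can be estimated from haplogroup frequencies by treating each haplogroup i as one observation of h_i = m·p1_i + (1 - m)·p2_i. Rearranged as (h_i - p2_i) = m·(p1_i - p2_i), this is a regression through the origin that weighted least squares solves in closed form. The following is a generic sketch of that estimator only. It does not reproduce the specific weighting scheme of the cited WLS method. The `weights` argument is a hypothetical hook that defaults to equal weights (ordinary least squares), and the example frequencies are hypothetical.

```python
def wls_admixture(hybrid, parent1, parent2, weights=None):
    """Estimate the contribution m of parent1 to a hybrid population.

    hybrid, parent1, parent2: haplogroup frequencies in the same haplogroup order.
    weights: per-haplogroup weights (hypothetical; None means all weights are 1).
    The estimate is not clamped, so it can fall outside [0, 1] for noisy data.
    """
    if weights is None:
        weights = [1.0] * len(hybrid)
    num = 0.0
    den = 0.0
    for h, p1, p2, w in zip(hybrid, parent1, parent2, weights):
        x = p1 - p2   # how much the two sources differ for this haplogroup
        y = h - p2    # how far the hybrid has moved from source 2
        num += w * x * y
        den += w * x * x
    if den == 0:
        raise ValueError("parental frequencies are identical; m is not identifiable")
    return num / den


# hypothetical frequencies built with m = 0.25
p1 = [0.6, 0.3, 0.1]
p2 = [0.1, 0.2, 0.7]
hyb = [0.25 * a + 0.75 * b for a, b in zip(p1, p2)]
print(round(wls_admixture(hyb, p1, p2), 6))
```

Because the example data are noise-free, any positive weights recover m = 0.25. With real frequencies, the choice of weights (for example, down-weighting rare haplogroups with large sampling variance) changes the estimate.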

Additional Information

Accession numbers: All sequences obtained in the present study have been deposited in GenBank under accession numbers KP345975-KP346066 (whole mtDNA genomes) and KP346067-KP346911 (control-region sequences).