Abstract
Background
The underrepresentation of Hmong-Mien (HM) people in Asian genomic studies has hindered a comprehensive understanding of their evolutionary history and complex trait architecture. South China is a multi-ethnic region indigenously settled by ethnolinguistically diverse HM, Austroasiatic (AA), Tai-Kadai (TK), Austronesian (AN), and Sino-Tibetan (ST) people, and it is regarded as East Asia’s initial cradle of biodiversity. However, previous fragmented genetic studies have presented only a fraction of the genetic diversity in this region, and haplotype-based genomic resources are especially lacking. A deep characterization of the demographic history and the natural-selection-relevant genetic architecture of HM people is therefore needed.
Results
We reported one HM-specific genomic resource and comprehensively explored the fine-scale genetic structure and adaptive features inferred from the genome-wide SNP data of 440 HM individuals from 33 ethnolinguistic populations, including previously unreported She. We identified solid genetic differentiation between HM people and Han Chinese at 7.64‒15.86 thousand years ago (kya) and a split between southern Chinese inland (Miao/Yao) and coastal (She) HM people in the middle Bronze Age period; the latter received more gene flow from Ancient Northern East Asians. Multiple admixture models further confirmed that extensive gene flow from surrounding ST, TK, and AN people contributed to the gene pool of Chinese coastal HM people. Unique ancestral components identified from shared alleles and haplotypes showed that HM people from the Yungui Plateau carried a breadth of previously unknown genomic diversity. We identified a direct and recent genetic connection between Chinese inland and Southeast Asian HM people, who shared the longest identity-by-descent fragments, supporting the long-distance migration hypothesis. Uniparental phylogenetic topology and network-based phylogenetic relationship reconstruction found ancient uniparental founding lineages in southwestern HM people. Finally, the population-specific biological adaptation study identified shared and differentiated natural selection signatures among inland and coastal HM people, which were associated with physical features and immune functions. The allele frequency spectrum of cancer susceptibility alleles and pharmacogenomic genes showed significant differences between HM and northern Chinese people.
Conclusions
Our extensive genetic evidence combined with the historical documents supported the view that ancient HM people originated from the Yungui regions associated with ancient “Three-Miao tribes” descended from the ancient Daxi-Qujialing-Shijiahe people. Then, some have recently migrated rapidly to Southeast Asia, and some have migrated eastward and mixed respectively with Southeast Asian indigenes, Liangzhu-related coastal ancient populations, and incoming southward ST people. Generally, complex population migration, admixture, and adaptation history contributed to the complicated patterns of population structure of geographically diverse HM people.
Background
China is rich in human biodiversity, and over six language families exist here, including Altaic (Mongolic, Tungusic, and Turkic), Sino-Tibetan (Sinitic and Tibeto-Burman (TB)), Hmong-Mien (HM), Tai-Kadai (TK), Austronesian (AN), and Austroasiatic (AA). The genetic patterns of modern Chinese populations revealed population stratification among ethnolinguistically different people, which was strongly correlated with geography, culture, and language families [1,2,3]. Recent genetic cohorts from the China Metabolic Analytics Project (ChinaMAP) [4] and NyuWa genome resources [5] have provided crucial genetic variation data from geographically different Chinese populations and offered new insights into the population structure and medical relevance of Chinese people. However, all these genetic studies in China mainly included Han Chinese as their major study subjects, which introduces a Han bias into Chinese population genetic studies and may worsen health inequality in the benefits of genomics in the era of genome-driven precision medicine [4,5,6]. China had two independent agricultural innovation centers, in the Yellow River Basin (YRB, millet agriculture) and the Yangtze River Basin (YZRB, rice agriculture). The rich history of social organization and technological innovation in the middle Holocene facilitated the formation of the ancient Yangshao and Dawenkou tribes in North China and of the Sanmiao tribe and the Liangzhu society in South China. Recent ancient DNA studies have identified genetic differentiation between Ancient Northern East Asian (ANEA) and Ancient Southern East Asian (ASEA) people since the early Neolithic period, followed by extensive population admixture events along different geographical corridors [7, 8].
The patterns of evolutionary history observed in East Asia differed from those in Europe and Oceania, which had undergone large-scale population admixture and replacement processes [9]. Ancient human gene flow events from outside East Asia have had limited influence on the genetic backgrounds of East Asians [10]. Ancient DNA from spatiotemporally diverse East Asians has identified regionally restricted ancient founding lineages and contributed to the reconstruction of subsequent extensive population migration and admixture events [7, 15,16]. Nevertheless, these efforts have provided only foundational knowledge for dissecting the evolutionary and adaptive history of genetically different Chinese populations. The fine-scale genetic structure of ethnolinguistically different Chinese populations and the patterns of genetic relationship and admixture among them should be further explored, especially for ethnolinguistically underrepresented groups in South China.
HM-speaking populations include those who speak Hmongic (Miao, She, and Hmong) and Mienic (Yao and Dao) languages in mountainous areas of South China, Vietnam, and Thailand [17]. The original homeland of HM people was suggested to be in Central China, associated with the Neolithic Shijiahe, Qujialing, and Daxi cultures in the middle YZRB. Historical documents showed that the expansion of proto-Han Chinese or other ANEA people through Central China promoted the southward migration of ancient HM people [18]. The complex migration and admixture history of HM people and their interaction with other southern Chinese populations (ST, TK, AN, and AA) must be further explored. Recent findings based on uniparental markers and genome-wide evidence have identified different evolutionary processes between inland TK and HM people and between coastal AN and TK people [19,20,15, 30]. Whole-genome sequences of mitochondrial DNA and the Y-chromosome could provide additional evolutionary traces based on shared or novel haplotype groups, also referred to as haplogroups. Poznik et al. analyzed 1244 worldwide Y-chromosome sequences from the 1000 Genomes Project (1KGP) to characterize the landscape of Y-chromosome diversity of 26 worldwide populations. Karmin et al. investigated 456 geographically diverse high-coverage Y-chromosome sequences to construct a revised phylogenetic topology with divergence time estimates for key mutation events [32, 33]. These studies reported punctuated bursts and population bottlenecks associated with cultural exchanges among worldwide populations [32, 33]. Maternal lineages can likewise trace the evolutionary past of different populations. Recent mitochondrial studies of modern and ancient Tibetan genomes have illuminated the Neolithic expansion of YRB farmers and the Paleolithic peopling of the Tibetan Plateau [34,35,36]. Li et al. also reported that the maternal structure of Han Chinese was stratified along three main Chinese river boundaries [37]. Fine-scale and large-scale uniparental genetic studies should therefore be conducted to explore the evolutionary history of understudied Chinese populations [38, 51].
Other risk alleles possessed a low frequency in East Asians but a high frequency in other populations, or vice versa (Fig. 5a). The similar AFS patterns across some cancer types also suggested a comparable genetic basis or pleiotropy at cancer-risk loci. In addition, interpopulation differences in drug responses are generally recognized, and drugs such as clopidogrel, warfarin, carbamazepine, and peginterferon have been confirmed to show the greatest population differences in predicted adverse drug reactions [5, 27]. Thus, we assessed the AFS of 25 known pharmacogenomic variants from the ADME (absorption, distribution, metabolism, and excretion) core genes and found that some variants showed significant allele frequency differences between HM speakers, East Asians, and other intercontinental populations. One example is SLC15A2, which is associated with the absorption of β-lactam antibiotics and peptide-like drugs, suggesting the necessity of genomic testing for drug response phenotypes (Fig. 5b).
Medical relevance and natural selection signatures among HM-speaking populations. a Allele frequency spectrum (AFS) of 106 previously reported cancer susceptibility variants among HM people and worldwide reference populations from the NyuWa, GnomAD, and 1KGP. b The AFS of 25 previously reported pharmacogenomic loci in our dataset and reference groups. GDX_Guizhou included Dongjia, Gejia, and **, and quality control
We collected 349 HM individuals from 25 ethnically or geographically diverse populations (Miao, Yao, and She) from Sichuan, Chongqing, Guizhou, and Fujian provinces in South China (Fig. 1a; Additional file 1: Table S1); the 38 She people (SYS: 14; PSS: 7; GSS: 17) from Fujian in coastal South China are reported here for the first time. We also sampled four AN-speaking Gaoshan people in Fujian to explore the genetic interaction between coastal HM and AN populations. We genotyped 661,134 autosomal, 28,320 X-chromosomal, 24,047 Y-chromosomal, and 3746 mitochondrial SNPs in all HM people and Gaoshan people using the Infinium Global Screening Array (Illumina, CA, USA). We used PLINK v.1.90 [67] and KING [68] to identify close relatives within three generations. We estimated PI_HAT values using PLINK with the “--genome” parameter. For individual pairs with PI_HAT values larger than 0.15, kinship coefficients were further estimated using KING with the “--related --ibs” parameters. We then used PLINK v.1.90 [67] to filter out variants with missing call rates exceeding 0.05 (--geno 0.05) and to remove samples with missing call rates exceeding 0.1 (--mind 0.1). Additionally, variants with minor allele frequencies below 0.05 (--maf 0.05) or deviating from Hardy–Weinberg equilibrium (--hwe 1e-6) were filtered out. The final HM-related Illumina dataset included 533,935 SNPs.
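The missingness and allele-frequency thresholds above can be illustrated on a toy genotype matrix. The following is a minimal sketch and not PLINK itself: the function name `qc_filter`, the genotype coding (0/1/2 alternate-allele copies, `None` for a missing call), and the filter order (variants, then samples, then allele frequency) are assumptions made for illustration, and the Hardy–Weinberg test is omitted.

```python
def qc_filter(genotypes, geno=0.05, mind=0.1, maf=0.05):
    """Toy QC on a samples x variants matrix (hypothetical helper, not PLINK).

    genotypes: list of per-sample lists with entries 0/1/2 or None (missing).
    Returns (kept_sample_indices, kept_variant_indices).
    """
    n_samples = len(genotypes)
    n_variants = len(genotypes[0])

    # Variant missing call rate, analogous to --geno: keep if rate <= threshold.
    variants = [
        j for j in range(n_variants)
        if sum(row[j] is None for row in genotypes) / n_samples <= geno
    ]

    # Sample missing call rate over retained variants, analogous to --mind.
    samples = [
        i for i, row in enumerate(genotypes)
        if variants
        and sum(row[j] is None for j in variants) / len(variants) <= mind
    ]

    # Minor allele frequency over retained samples, analogous to --maf.
    kept = []
    for j in variants:
        calls = [genotypes[i][j] for i in samples if genotypes[i][j] is not None]
        if not calls:
            continue
        freq = sum(calls) / (2 * len(calls))
        if min(freq, 1 - freq) >= maf:
            kept.append(j)
    return samples, kept


# Made-up data: variant 2 has a missing call, variant 3 is monomorphic.
toy = [
    [0, 1, None, 0],
    [1, 1, 0, 0],
    [2, 0, 0, 0],
    [1, 2, 0, 0],
]
kept_samples, kept_variants = qc_filter(toy)
```

In this toy run, variant 2 fails the missingness filter and variant 3 fails the frequency filter, so only variants 0 and 1 survive.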
Ethics approval
All included individuals signed written informed consent forms and were unrelated indigenous people from the sampling places. We also provided the necessary genetic counseling and genetic health reports to sample donors who were interested. The study protocol was approved by the medical ethics committees of North Sichuan Medical College and West China Hospital of Sichuan University.
Dataset arrangement and reference populations
To present a fully resolved picture of the genetic diversity of HM people, we also collected 20 HM people (10 Miaos from Hunan and 10 Shes from Fujian) from the HGDP [69] and 71 HM individuals (12 Daos, 8 IuMiens, 12 PaThens, and 39 Hmongs) from previously investigated populations from Vietnam [41] and Thailand [42] that were genotyped using the Affymetrix Human Origins array (personal communication). These HM people were merged with the above HM-related Illumina dataset to generate an HM-specific dataset, which consisted of 56,814 SNPs and included 440 HM people from 33 populations belonging to seven ethnic groups. To explore the genetic structure of HM-speaking populations in the context of modern eastern Eurasian reference populations, we first merged our HM-related Illumina dataset with published genome-wide SNP data that was genotyped using the same Illumina array to generate the high-density Illumina dataset [2, 15, 16, 23, 77,78,79]. The Illumina dataset contained 533,935 SNPs and also included two AA-speaking Blang and Wa; nine Mongolic-speaking Baoan, Dongxiang, Mongolian, and Yugur; Sinitic-speaking Han and Hui populations from Guizhou, Sichuan, Fujian, Gansu, and Hainan provinces; six TB-speaking Pumi, Bai, Hani, Lahu, Tibetan, and Tujia; one Tungusic-speaking Manchu; and two Turkic-speaking Kazakh and Salar (Fig. 1d). The high-density dataset was mainly used to perform the haplotype-based analyses and phylogenetic reconstruction of uniparental lineages. We then merged the high-density Illumina dataset with modern and ancient populations genotyped via the Affymetrix Human Origins array from the AADR [80] to form the merged low-density HO dataset, including 56,814 SNPs, which was used to explore the general patterns of population structure as this dataset included more modern and ancient reference populations. 
We then imputed the low-density genome-wide SNP data of modern populations in the merged HO dataset using the WBBC (Westlake BioBank for Chinese) and 1KGP haplotype reference panels [31, 81], which generated the imputed merged HO dataset covering 458,786 SNPs. The HO modern reference populations included 343 TK people from 26 populations in China and Southeast Asia, 27 Han Chinese people from 6 populations, 276 TB people from 30 Chinese and Southeast Asian populations, 224 AA people from 18 populations, 115 AN people from 13 populations, 30 Japanese and 6 Korean, 140 Mongolic people from 18 populations, and 62 Tungusic people from 62 populations (Additional file 1: Table S1) [13, 40,41,42, 71, 82,83,84]. To analyze the comprehensive admixture and interaction landscape between HM people and other ancestral source groups, we merged the high-density Illumina dataset with ancient eastern Eurasians included in the 1240K dataset to form the merged middle-density 1240K dataset, including 146,802 SNPs. Ancient eastern Eurasians were included in both the merged HO and 1240K datasets, which included 47 ancient YRB farmers from 19 populations in Shandong, Henan, Shaanxi and Qinghai [7, 13, 85]; 30 ancient people from 13 populations in Amur River Basin or West Liao River Basin [51] to estimate Fst genetic distances among each population pair. Pairwise genetic distances were designed with two parameters (--within and --keep-cluster-names).
Inference of population admixture events
To reconstruct the phylogenetic relationships among these ethnolinguistically diverse populations, we used TreeMix v.1.13 [95]. PLINK v.1.90 [67] was used to calculate the allele frequencies of each population, which served as the input for the TreeMix-based analysis. We adopted the French population from the 1KGP as the outgroup (-root French) and ran TreeMix with the number of migration edges ranging from 0 to 7, with five replicates for each setting, to explore possible gene flow events. We grouped SNPs into blocks of 500 (-k 500) to account for LD, and used additional parameters (-bootstrap and -global) to obtain the best-fitting model. We used the plotting_funcs.R script to visualize the phylogenetic topology and the corresponding residual matrix of each model. We also ran MEGA (Molecular Evolutionary Genetics Analysis) [96] on the Fst genetic distance matrix to validate the obtained phylogenetic topology and obtained a consistent pattern for the major clades.
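The run grid described above (migration edges 0 to 7, five replicates each) can be laid out as a list of command lines. This is only a sketch of how such a grid might be assembled: the input file name `hm_pops.treemix.frq.gz`, the output directory, and the helper `treemix_commands` are hypothetical, and `-i`/`-o` are TreeMix's input and output options, which the text does not mention explicitly.

```python
def treemix_commands(infile, outdir, root="French", k=500, max_m=7, reps=5):
    """Build one TreeMix argument list per (migration edges, replicate) pair."""
    commands = []
    for m in range(max_m + 1):          # 0..7 migration edges
        for rep in range(1, reps + 1):  # five replicates per setting
            commands.append([
                "treemix",
                "-i", infile,
                "-root", root,
                "-k", str(k),
                "-m", str(m),
                "-bootstrap",
                "-global",
                "-o", f"{outdir}/m{m}_rep{rep}",
            ])
    return commands


# Hypothetical file and directory names, for illustration only.
cmds = treemix_commands("hm_pops.treemix.frq.gz", "treemix_out")
```

Each argument list could then be passed to `subprocess.run`; the sketch stops short of executing anything.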
Runs of homozygosity
We estimated genomic autozygosity using PLINK v.1.90 [67] based on the high-density Illumina dataset. We required each ROH to contain at least 50 SNPs and to span at least 500 kilobases (--homozyg-snp 50 and --homozyg-kb 500). Two consecutive SNPs more than 100 kb apart (--homozyg-gap 100) were assigned to separate ROH. Default settings were used for the remaining parameters: at least one SNP per 50 kb on average (--homozyg-density 50), a scanning window of 50 SNPs (--homozyg-window-snp 50), at most one heterozygous call per scanning-window hit (--homozyg-window-het 1), and a hit rate of at least 0.05 across all scanning windows containing a SNP (--homozyg-window-threshold 0.05). We further visualized the ROH distribution of each studied population as box plots using R v.3.5.2.
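To make the three user-set thresholds concrete (minimum SNP count, minimum length, maximum gap), the sketch below scans one chromosome for uninterrupted homozygous runs. It is a deliberately simplified stand-in and not PLINK's algorithm: there is no sliding window, no tolerance for heterozygous or missing calls, and no density check. The function name `find_roh` and the toy data are assumptions.

```python
def find_roh(positions, is_het, min_snps=50, min_kb=500, max_gap_kb=100):
    """Return (start_bp, end_bp, n_snps) for homozygous runs passing thresholds.

    positions: sorted base-pair positions on one chromosome.
    is_het:    parallel booleans, True where the genotype is heterozygous.
    """
    runs = []

    def close(first, last):
        # Keep the run only if it is long enough in SNPs and in kilobases.
        n_snps = last - first + 1
        length_kb = (positions[last] - positions[first]) / 1000
        if n_snps >= min_snps and length_kb >= min_kb:
            runs.append((positions[first], positions[last], n_snps))

    start = None  # index of the first SNP of the current run
    for i, (pos, het) in enumerate(zip(positions, is_het)):
        if het:
            if start is not None:
                close(start, i - 1)
            start = None
            continue
        # A gap larger than max_gap_kb splits the run in two.
        if start is not None and (pos - positions[i - 1]) / 1000 > max_gap_kb:
            close(start, i - 1)
            start = None
        if start is None:
            start = i
    if start is not None:
        close(start, len(positions) - 1)
    return runs


# Made-up data: 120 SNPs spaced 10 kb apart, one heterozygous call at index 30.
toy_positions = [i * 10_000 for i in range(120)]
toy_het = [i == 30 for i in range(120)]
toy_runs = find_roh(toy_positions, toy_het)
```

Here the first 30 homozygous SNPs are discarded because they fall below the 50-SNP minimum, while the 89-SNP run after the heterozygous site (880 kb) is retained.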
Shared genetic drift and admixture signal estimation based on shared alleles
To measure the genetic affinity within HM people and between HM and other geographically close modern populations, we computed outgroup f3-statistics using the qp3Pop program in ADMIXTOOLS [44]. As the merged HO dataset included the most comprehensive set of modern and ancient reference populations, we used f3(HM people, modern Eurasian; Yoruba) to explore the shared genetic drift between HM people and modern reference populations and f3(HM people, ancient Eurasian; Yoruba) to measure their genetic relationship with ancient reference populations. We also conducted the three-population tests on the merged Illumina and 1240K datasets. Similarly, we computed admixture f3-statistics in the form f3(ancestral source1, ancestral source2; HM people) on the three datasets to identify the possible ancestral sources that produce statistically significant values. Here, negative f3 values with a Z-score lower than − 3 indicated that the two sources might be proxies for the ancestral sources of the target population and that the target population was admixed.
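The sign logic of the two f3 forms can be seen with a naive allele-frequency estimator, f3(A, B; C) = mean over SNPs of (c − a)(c − b). The sketch below assumes known population allele frequencies and omits what qp3Pop adds in practice (finite-sample bias correction and block-jackknife standard errors); the function name and all frequencies are made up for illustration.

```python
def f3(src1, src2, target):
    """Naive f3(src1, src2; target): mean of (c - a) * (c - b) across SNPs."""
    terms = [(c - a) * (c - b) for a, b, c in zip(src1, src2, target)]
    return sum(terms) / len(terms)


# Made-up allele frequencies at three SNPs.
north = [0.9, 0.2, 0.7]
south = [0.1, 0.8, 0.3]
# A 50/50 mixture of the two sources lies between them at every SNP.
admixed = [0.5 * n + 0.5 * s for n, s in zip(north, south)]
outgroup = [0.0, 1.0, 0.0]

admixture_f3 = f3(north, south, admixed)   # negative: target is intermediate
outgroup_f3 = f3(north, south, outgroup)   # positive: drift shared since the outgroup
```

A target whose frequencies sit between the two sources yields a negative value, which is the admixture signal described above; with an outgroup in the third position the statistic is positive and grows with the drift shared by the first two populations.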
Genome-wide admixture models based on the f4-statistic tests
We conducted four-population tests for the targeted HM people based on individual and merged populations. We used qpDstat in ADMIXTOOLS [44] to compute f4(HM1, HM2; reference populations, Mbuti), f4(reference population1, reference population2; studied populations, Mbuti), and f4(reference population1, studied populations; reference population2, Mbuti). The first form was used to explore the genetic homogeneity and heterogeneity between two HM populations. The latter two forms were used to test for differentiated genetic ancestry between our targeted and reference populations. We also used qpWave to confirm the genetic homogeneity between two HM-speaking populations and used qpAdm [44] to estimate admixture proportions with the following outgroups: Mbuti, Russia_Ust_Ishim, Russia_Kostenki14, Papuan, Australia, Mixe, Russia_MA1_HG, Onge, Atayal, and China_Tianyuan. We next used qpGraph to identify the optimal frequency-based admixture model with gene flow events among various alternative models [44].
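A four-population test reduces to the per-SNP product (a − b)(c − d) plus a standard error from which a Z-score is formed. The sketch below pairs a naive f4 with an unweighted delete-one-block jackknife over equal-size contiguous blocks. This is a simplification of what ADMIXTOOLS does (it uses a weighted block jackknife with blocks defined by genetic distance), and the function names are hypothetical.

```python
import math


def f4_values(a, b, c, d):
    """Per-SNP terms of f4(A, B; C, D) = (a - b) * (c - d)."""
    return [(pa - pb) * (pc - pd) for pa, pb, pc, pd in zip(a, b, c, d)]


def jackknife_z(values, n_blocks):
    """Unweighted delete-one-block jackknife: returns (estimate, se, z)."""
    size = len(values) // n_blocks
    values = values[: size * n_blocks]  # drop any remainder SNPs
    n = len(values)
    total = sum(values)
    estimate = total / n

    # Recompute the mean with each block left out in turn.
    leave_one_out = []
    for k in range(n_blocks):
        block_sum = sum(values[k * size:(k + 1) * size])
        leave_one_out.append((total - block_sum) / (n - size))

    mean_loo = sum(leave_one_out) / n_blocks
    variance = (n_blocks - 1) / n_blocks * sum(
        (x - mean_loo) ** 2 for x in leave_one_out
    )
    se = math.sqrt(variance)
    if se == 0:
        z = 0.0 if estimate == 0 else math.copysign(math.inf, estimate)
    else:
        z = estimate / se
    return estimate, se, z


# Made-up per-SNP terms split into three blocks of two SNPs each.
est, se, z = jackknife_z([1, 1, 2, 2, 3, 3], n_blocks=3)
```

With real data, |Z| > 3 is the conventional significance threshold, matching the cutoff used for the f3 tests in the text.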
Admixture time estimation based on the decay of LD
Population admixture generates LD that decays exponentially with genetic distance, at a rate that depends on the time since admixture. We used MALDER to test for admixture LD decay and to estimate the possible admixture times of HM people [97]. We used multiple modern northern and southern East Asian populations as potential ancestral sources and tested all possible source combinations. The exponential curve fitting used a minimum distance between SNP bins of 0.005 Morgan (mindis: 0.005) and leave-one-chromosome-out jackknifing (jackknife: YES).
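The dating idea can be sketched with a single-pulse model in which weighted LD at genetic distance d (in Morgans) follows A·exp(−n·d), where n is the number of generations since admixture. The code below recovers n by a log-linear least-squares fit on noise-free synthetic data. It is an assumption-laden illustration rather than MALDER's procedure, which fits one or more exponentials with an affine term and derives standard errors by jackknifing; the function name and the synthetic values are invented.

```python
import math


def fit_admixture_generations(distances_morgan, ld_values, min_dis=0.005):
    """Fit ld = A * exp(-n * d) by linear regression of log(ld) on d.

    Bins closer than min_dis Morgans are ignored, mirroring the mindis setting.
    Returns (n_generations, amplitude).
    """
    points = [
        (d, math.log(y))
        for d, y in zip(distances_morgan, ld_values)
        if d >= min_dis and y > 0
    ]
    count = len(points)
    mean_d = sum(d for d, _ in points) / count
    mean_log = sum(v for _, v in points) / count
    slope = sum((d - mean_d) * (v - mean_log) for d, v in points) / sum(
        (d - mean_d) ** 2 for d, _ in points
    )
    intercept = mean_log - slope * mean_d
    return -slope, math.exp(intercept)


# Synthetic, noise-free curve for a pulse 50 generations ago.
dist = [0.005 * k for k in range(1, 41)]
curve = [0.01 * math.exp(-50 * d) for d in dist]
generations, amplitude = fit_admixture_generations(dist, curve)
```

Converting generations to calendar years requires a generation time, which this sketch leaves to the reader because the text does not state one here.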
Haplotype-based fine-scale population structure reconstruction
Segmented haplotype estimation
A stricter filter on per-SNP and per-individual missingness was applied using PLINK v.1.90 [67] with two parameters (--geno 0.01 and --mind 0.01). We used the Segmented HAPlotype Estimation & Imputation Tool (SHAPEIT v2.r904) [98] to estimate haplotypes based on the high-density Illumina dataset and the modern populations included in the merged HO dataset. Haplotypes were phased with the following parameters to find a good starting point for the estimated haplotypes and obtain more parsimonious graphs: 10 burn-in iterations (--burn 10), 10 iterations of the pruning stage (--prune 10), and 30 main iterations (--main 30). We used the default settings for the model parameters and HapMap phase II b37 as the genetic map. The obtained haplotype data were used to explore the fine-scale population structure via fineSTRUCTURE, to identify ancestral source proxies and estimate their admixture proportions and times, and to screen for natural selection signatures of local adaptation.
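The phasing settings above translate into one SHAPEIT2 invocation per autosome. The sketch below only assembles the argument lists: the file-naming scheme and the helper `shapeit_commands` are hypothetical, and `--input-bed`, `--input-map`, and `--output-max` are SHAPEIT2's input/output options rather than settings quoted in the text.

```python
def shapeit_commands(prefix, map_dir, burn=10, prune=10, main=30):
    """Build one SHAPEIT2 argument list per autosome (file names are made up)."""
    commands = []
    for chrom in range(1, 23):
        stem = f"{prefix}.chr{chrom}"
        commands.append([
            "shapeit",
            "--input-bed", f"{stem}.bed", f"{stem}.bim", f"{stem}.fam",
            "--input-map", f"{map_dir}/genetic_map_chr{chrom}.txt",
            "--burn", str(burn),
            "--prune", str(prune),
            "--main", str(main),
            "--output-max", f"{stem}.phased.haps", f"{stem}.phased.sample",
        ])
    return commands


# Hypothetical prefix and map directory, for illustration only.
phasing_cmds = shapeit_commands("hm_illumina", "hapmap2_b37")
```

Splitting by chromosome keeps each run independent, so the 22 jobs can be executed in parallel.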
Admixture events inferred from ChromoPainter and fastGLOBETROTTER
To identify ancestral sources, date, and describe admixture events of our targeted HM people, we used ChromoPainterv2 [71] to paint the ancestral haplotype composition of our sampled HM populations. We merged our data with 929 lift-over high-coverage whole-genomes from 54 worldwide ethnolinguistically diverse populations and obtained haplotype data using SHAPEIT v2.r904 [98]. Han people from ** and quality control
We extracted 24,047 Y-chromosomal SNPs and 3746 mitochondrial SNPs from the merged Illumina dataset to explore the paternal and maternal population history based on shared haplogroups and coalescence processes. We used PLINK v.1.90 to conduct quality control based on SNP missingness and genotyping rates with two parameters (--geno 0.1 and --mind 0.1) [107]. After quality control, we retained 11,369 Y-SNPs in 203 individuals and 1428 mtDNA SNPs for uniparental evolutionary history reconstruction.
Haplogroup classification, haplogroup frequency spectrum estimation, and clustering analysis
For Y-chromosome haplogroup classification, we used the Python script hGrpr2.py implemented in HaploGrouper [108] and Y-LineageTracker [109]. Two additional reference files were used in the HaploGrouper-based analysis: treeFileNEW_isogg2019.txt and snpFile_b38_isogg2019.txt. The Chip version was used in the Y-LineageTracker-based analysis. We also used Y-LineageTracker to estimate haplogroup frequencies at different levels of the focal terminal lineages. HaploGrouper and HaploGrep were used to classify the maternal haplogroups.
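Estimating haplogroup frequencies "at different levels" amounts to collapsing terminal labels to a chosen depth and counting within each population. The sketch below does this by truncating label strings, which is a simplifying assumption (real tools walk the phylogenetic tree); the function name, population labels, and haplogroup assignments are made up.

```python
from collections import Counter, defaultdict


def haplogroup_frequencies(assignments, level=3):
    """Frequency of haplogroups truncated to `level` characters, per population.

    assignments: iterable of (population, haplogroup_label) pairs.
    """
    counts = defaultdict(Counter)
    for population, label in assignments:
        counts[population][label[:level]] += 1
    return {
        population: {hg: n / sum(c.values()) for hg, n in c.items()}
        for population, c in counts.items()
    }


# Made-up assignments for two populations.
toy_assignments = [
    ("Miao", "O2a2a1"),
    ("Miao", "O2a1b"),
    ("Miao", "O1b1a"),
    ("She", "O1a1a"),
    ("She", "O2a2b"),
]
spectrum = haplogroup_frequencies(toy_assignments, level=3)
```

Raising `level` resolves finer sub-lineages at the cost of smaller counts per category.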
Phylogeny analysis and network analysis
We used BEAST 2.0 [110] and Y-LineageTracker [109] to reconstruct the phylogenetic topology, focusing on population divergence, expansion, and migration events. BEAUti, Tracer v1.7.2, and FigTree v1.4.4 were used to prepare the intermediate files for the BEAST-based analysis and to visualize the resulting phylogeny. BEAST 2.0 was also used to reconstruct the maternal phylogeny. Finally, we used the median-joining network implemented in PopART [111] to reconstruct the network relationships among different haplogroups and populations based on the obtained maternal and paternal genetic variations.
Availability of data and materials
All data generated or analyzed during this study are included in this published article, its supplementary information files, and publicly available repositories. The genome-wide variation data were collected from the public dataset of the Allen Ancient DNA Resource (AADR) (https://reich.hms.harvard.edu/allen-ancient-dna-resource-aadr-downloadable-genotypes-present-day-and-ancient-dna-data). The secondary analysis results are provided in the supplementary materials and have also been deposited in the OMIX database (https://ngdc.cncb.ac.cn/omix/preview/lJsduf4A) under accession number OMIX005474. The raw allele frequency data are available in Zenodo (https://zenodo.org/records/10453202). The newly reported genotyping data from this study have been deposited in Zenodo (https://zenodo.org/records/10453218) and the Genome Variation Map (https://ngdc.cncb.ac.cn/gvm/home) under accession number GVM000631. The acquisition and use of the data shall comply with the regulations of the People’s Republic of China on the administration of human genetic resources. Requests for access to raw data can be directed to Guanglin He (Guanglinhescu@163.com) and Mengge Wang (Menggewang2021@163.com).
Abbreviations
- TB: Tibeto-Burman
- HM: Hmong-Mien
- TK: Tai-Kadai
- AN: Austronesian
- AA: Austroasiatic
- ChinaMAP: China Metabolic Analytics Project
- YRB: Yellow River Basin
- YZRB: Yangtze River Basin
- ANEA: Ancient Northern East Asian
- ASEA: Ancient Southern East Asian
- AFS: Allele frequency spectrum
- WGS: Whole-genome sequencing
- HGDP: Human Genome Diversity Project
- 1KGP: 1000 Genomes Project
- PSS: **shui She
- GSS: Guanshe She
- SYS: Shanyang She
- HGDPS: She from the Human Genome Diversity Project
- HGDPM: Miao from the Human Genome Diversity Project
- HO: Human Origins
- AADR: Allen Ancient DNA Resource
- PCA: Principal component analysis
- CQM: Chongqing Miao
- FJS: Fujian She
- STM: Songtao Miao
- GZM: Guizhou Miao
- GZY: Guizhou Yao
- HFS: Haplogroup frequency spectrum
- rCCR: Relative cross coalescence rate
- IBD: Identity-by-descent
- ROH: Runs of homozygosity
- ADME: Absorption, distribution, metabolism, and excretion
- PBS: Population branch statistics
- iHS: Integrated haplotype score
- XPEHH: Cross-population extended haplotype homozygosity
- SXH: Han_Shaanxi
- 10K_CPGDP: 10K Chinese People Genomic Diversity Project
- CCHCR1: Coiled-coil alpha-helical rod protein 1
- HDVs: Highly differentiated variants
- CR1: Complement receptor 1
- TRIM31: Tripartite motif containing 31
- GPCR: G protein-coupled receptor
- GO: Gene Ontology
- KEGG: Kyoto Encyclopedia of Genes and Genomes
- WBBC: Westlake BioBank for Chinese
- SHAPEIT: Segmented HAPlotype Estimation & Imputation Tool
References
Wang M, He G, Zou X, Chen P, Wang Z, Tang R et al. Reconstructing the genetic admixture history of Tai-Kadai and Sinitic people: insights from genome-wide SNP data from South China. J Syst Evol. 2022:0.
He G, Wang M, Zou X, Yeh HY, Liu C, Liu C et al. Extensive ethnolinguistic diversity at the crossroads of North China and South Siberia reflects multiple sources of genetic diversity. J Syst Evol. 2022, n/a(n/a).
He G, Li YX, Wang MG, Zou X, Yeh HY, Yang XM, et al. Fine-scale genetic structure of Tujia and central Han Chinese revealing massive genetic admixture under language borrowing. J Syst Evol. 2021;59(1):1–20.
Cao Y, Li L, Xu M, Feng Z, Sun X, Lu J, et al. The ChinaMAP analytics of deep whole genome sequences in 10,588 individuals. Cell Res. 2020;30(9):717–31.
Zhang P, Luo H, Li Y, Wang Y, Wang J, Zheng Y, et al. NyuWa Genome resource: a deep whole-genome sequencing-based variation profile and reference panel for the Chinese population. Cell Rep. 2021;37(7):110017.
He GG, Yao H, Sun Q, Duan S, Tang R, Chen J et al. Whole-genome sequencing of ethnolinguistic diverse northwestern Chinese Hexi Corridor people from the 10K_CPGDP project suggested the differentiated East-West genetic admixture along the Silk Road and their biological adaptations. bioRxiv. 2023:2023.02.26.530053.
Yang MA, Fan X, Sun B, Chen C, Lang J, Ko YC, et al. Ancient DNA indicates human population shifts and admixture in northern and southern China. Science. 2020;369(6501):282–8.
Sun Y, Wang M, Sun Q, Liu Y, Duan S, Wang Z et al. Distinguished biological adaptation architecture aggravated population differentiation of Tibeto-Burman-speaking people inferred from 500 whole-genome data from 39 populations. J Genet Genomics. 2023.
Damgaard PB, Marchi N, Rasmussen S, Peyrot M, Renaud G, Korneliussen T, et al. 137 ancient human genomes from across the Eurasian steppes. Nature. 2018;557(7705):369–74.
Robbeets M, Bouckaert R, Conte M, Savelyev A, Li T, An DI, et al. Triangulation supports agricultural spread of the Transeurasian languages. Nature. 2021;599(7886):616–21.
Mao X, Zhang H, Qiao S, Liu Y, Chang F, Xie P, et al. The deep population history of northern East Asia from the Late Pleistocene to the Holocene. Cell. 2021;184(12):3256–3266 e3213.
Wang T, Wang W, Xie G, Li Z, Fan X, Yang Q, et al. Human population history at the crossroads of East and Southeast Asia since 11,000 years ago. Cell. 2021;184(14):3829–3841 e3821.
Wang CC, Yeh HY, Popov AN, Zhang HQ, Matsumura H, Sirak K, et al. Genomic insights into the formation of human populations in East Asia. Nature. 2021;591(7850):413–9.
Wang M, Du W, Tang R, Liu Y, Zou X, Yuan D, et al. Genomic history and forensic characteristics of Sherpa highlanders on the Tibetan Plateau inferred from high-resolution InDel panel and genome-wide SNPs. Forensic Sci Int Genet. 2022;56: 102633.
He G, Li Y, Zou X, Yeh HY, Tang R, Wang P et al. The northern gene flow into southeastern East Asians inferred from genome‐wide array genotyping. J Syst Evol. 2022, n/a(n/a).
He GL, Wang MG, Li YX, Zou X, Yeh HY, Tang RK, et al. Fine-scale north-to-south genetic admixture profile in Shaanxi Han Chinese revealed by genome-wide demographic history reconstruction. J Syst Evol. 2021;60(4):955–72.
Handel Z. Review of Ratliff (2010): Hmong-Mien Language History. Diachronica. 2012;29(3):385–98.
Diamond J, Bellwood P. Farmers and their languages: the first expansions. Science. 2003;300(5619):597–603.
He G, Wang J, Yang L, Duan S, Sun Q, Li Y, et al. Genome-wide allele and haplotype-sharing patterns suggested one unique Hmong-Mein-related lineage and biological adaptation history in Southwest China. Hum Genomics. 2023;17(1):3.
Wang J, Wu J, Sun Q, Wu Q, Li Y, Duan S, et al. Extensive genetic admixture between Tai-Kadai-speaking people and their neighbours in the northeastern region of the Yungui Plateau inferred from genome-wide variations. BMC Genomics. 2023;24(1):317.
Xia Z-Y, Yan S, Wang C-C, Zheng H-X, Zhang F, Liu Y-C et al. Inland-coastal bifurcation of southern East Asians revealed by Hmong-Mien genomic history. bioRxiv. 2019:730903.
Wen B, Li H, Gao S, Mao X, Gao Y, Li F, et al. Genetic structure of Hmong-Mien speaking populations in East Asia as revealed by mtDNA lineages. Mol Biol Evol. 2004;22(3):725–34.
Wang Y, Zou X, Wang M, Yuan D, Yang L, Zeng Y, et al. The genomic history of southwestern Chinese populations demonstrated massive population migration and admixture among proto-Hmong-Mien speakers and incoming migrants. Mol Genet Genomics. 2022;297(1):241–62.
Liu Y, Xie J, Wang M, Liu C, Zhu J, Zou X, et al. Genomic insights into the population history and biological adaptation of southwestern Chinese Hmong-Mien people. Front Genet. 2021;12: 815160.
Jeong C, Wang K, Wilkin S, Taylor WTT, Miller BK, Bemmann JH, et al. A dynamic 6,000-year genetic history of Eurasia’s Eastern Steppe. Cell. 2020;183(4):890–904 e829.
Pagani L, Lawson DJ, Jagoda E, Morseburg A, Eriksson A, Mitt M, et al. Genomic analyses inform on migration events during the peopling of Eurasia. Nature. 2016;538(7624):238–42.
GenomeAsia KC. The GenomeAsia 100K Project enables genetic discoveries across Asia. Nature. 2019;576(7785):106–11.
Mallick S, Li H, Lipson M, Mathieson I, Gymrek M, Racimo F, et al. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations. Nature. 2016;538(7624):201–6.
Sirugo G, Williams SM, Tishkoff SA. The missing diversity in human genetic studies. Cell. 2019;177(1):26–31.
Almarri MA, Bergstrom A, Prado-Martinez J, Yang F, Fu B, Dunham AS, et al. Population structure, stratification, and introgression of human structural variation. Cell. 2020;182(1):189–199 e115.
Byrska-Bishop M, Evani US, Zhao X, Basile AO, Abel HJ, Regier AA, et al. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell. 2022;185(18):3426–3440 e3419.
Poznik GD, Xue Y, Mendez FL, Willems TF, Massaia A, Wilson Sayres MA, et al. Punctuated bursts in human male demography inferred from 1,244 worldwide Y-chromosome sequences. Nat Genet. 2016;48(6):593–9.
Karmin M, Saag L, Vicente M, Wilson Sayres MA, Jarve M, Talas UG, et al. A recent bottleneck of Y chromosome diversity coincides with a global change in culture. Genome Res. 2015;25(4):459–66.
Ding M, Wang T, Ko AM, Chen H, Wang H, Dong G, et al. Ancient mitogenomes show plateau populations from last 5200 years partially contributed to present-day Tibetans. Proc Biol Sci. 1923;2020(287):20192968.
Li YC, Tian JY, Liu FW, Yang BY, Gu KS, Rahman ZU, et al. Neolithic millet farmers contributed to the permanent settlement of the Tibetan Plateau by adopting barley agriculture. Natl Sci Rev. 2019;6(5):1005–13.
Qi X, Cui C, Peng Y, Zhang X, Yang Z, Zhong H, et al. Genetic evidence of paleolithic colonization and neolithic expansion of modern humans on the tibetan plateau. Mol Biol Evol. 2013;30(8):1761–78.
Li YC, Ye WJ, Jiang CG, Zeng Z, Tian JY, Yang LQ, et al. River valleys shaped the maternal genetic landscape of Han Chinese. Mol Biol Evol. 2019;36(8):1643–52.
Wang M, Wang Z, He G, Wang S, Zou X, Liu J, et al. Whole mitochondrial genome analysis of highland Tibetan ethnicity using massively parallel sequencing. Forensic Sci Int Genet. 2020;44:102197.
Mengge W, Guanglin H, Yongdong S, Shouyu W, **ng Z, **g L, et al. Massively parallel sequencing of mitogenome sequences reveals the forensic features and maternal diversity of tai-kadai-speaking hlai islanders. Forensic Sci Int Genet. 2020;47:102303.
Bergstrom A, McCarthy SA, Hui R, Almarri MA, Ayub Q, Danecek P et al. Insights into human genetic variation and population history from 929 diverse genomes. Science. 2020, 367(6484).
Liu D, Duong NT, Ton ND, Van Phong N, Pakendorf B, Van Hai N, et al. Extensive ethnolinguistic diversity in Vietnam reflects multiple sources of genetic diversity. Mol Biol Evol. 2020;37(9):2503–19.
Kutanan W, Liu D, Kampuansai J, Srikummool M, Srithawong S, Shoocongdej R, et al. Reconstructing the human genetic history of Mainland Southeast Asia: insights from genome-wide data from Thailand and Laos. Mol Biol Evol. 2021;38(8):3459–77.
Choin J, Mendoza-Revilla J, Arauna LR, Cuadros-Espinoza S, Cassar O, Larena M, et al. Genomic insights into population history and biological adaptation in Oceania. Nature. 2021;592(7855):583–9.
Patterson N, Moorjani P, Luo Y, Mallick S, Rohland N, Zhan Y, et al. Ancient admixture in human history. Genetics. 2012;192(3):1065–93.
Mallick S, Micco A, Mah M, Ringbauer H, Lazaridis I, Olalde I et al. The Allen Ancient DNA Resource (AADR): A curated compendium of ancient human genomes. bioRxiv. 2023:2023.2004.2006.535797.
Wang M, He G, Zou X, Liu J, Ye Z, Ming T, et al. Genetic insights into the paternal admixture history of Chinese Mongolians via high-resolution customized Y-SNP SNaPshot panels. Forensic Sci Int Genet. 2021;54:102565.
Yang M, He G, Ren Z, Wang Q, Liu Y, Zhang H et al. Genomic Insights Into the Unique Demographic History and Genetic Structure of Five Hmong-Mien-Speaking Miao and Yao Populations in Southwest China. Front Ecol Evol. 2022, 10.
Patin E, Lopez M, Grollemund R, Verdu P, Harmant C, Quach H, et al. Dispersals and genetic adaptation of Bantu-speaking populations in Africa and North America. Science. 2017;356(6337):543–6.
Sud A, Kinnersley B, Houlston RS. Genome-wide association studies of cancer: current insights and future perspectives. Nat Rev Cancer. 2017;17(11):692–704.
Benton ML, Abraham A, LaBella AL, Abbot P, Rokas A, Capra JA. The influence of evolutionary history on human health and disease. Nat Rev Genet. 2021;22(5):269–83.
Michailidou K, Hall P, Gonzalez-Neira A, Ghoussaini M, Dennis J, Milne RL, et al. Large-scale genoty** identifies 41 new loci associated with breast cancer risk. Nat Genet. 2013;45(4):353–61, 361e351-352.
Kamberov YG, Wang S, Tan J, Gerbault P, Wark A, Tan L, et al. Modeling recent human evolution in mice by expression of a selected EDAR variant. Cell. 2013;152(4):691–702.
Lamason RL, Mohideen MA, Mest JR, Wong AC, Norton HL, Aros MC, et al. SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans. Science. 2005;310(5755):1782–6.
Ma X, Yang W, Gao Y, Pan Y, Lu Y, Chen H, et al. Genetic origins and sex-biased admixture of the huis. Mol Biol Evol. 2021;38(9):3804–19.
Chen H, Lin R, Lu Y, Zhang R, Gao Y, He Y, et al. Tracing Bai-Yue Ancestry in Aboriginal Li People on Hainan Island. Mol Biol Evol. 2022;39(10):msac210.
Ma L, Wang R, Feng S, Yang X, Li J, Zhang Z et al. Genomic insight into the population history and biological adaptations of high-altitude Tibetan highlanders in Nagqu. Front Ecol Evol. 2022, 10.
Song M, Wang Z, Zhang Y, Zhao C, Lang M, **e M, et al. Forensic characteristics and phylogenetic analysis of both Y-STR and Y-SNP in the Li and Han ethnic groups from Hainan Island of China. Forensic Sci Int Genet. 2019;39:e14–20.
Wang M, Huang Y, Liu K, Yuan H, Duan S, Wang Z et al. Ancient farmer and steppe pastoralist-related founding lineages contributed to the complex landscape of episodes in the diversification of Chinese paternal lineages. bioRxiv. 2023:2023.2008. 2028.555114.
Li H, Huang Y, Mustavich LF, Zhang F, Tan JZ, Wang LE, et al. Y chromosomes of prehistoric people along the Yangtze River. Hum Genet. 2007;122(3–4):383–8.
Kutanan W, Shoocongdej R, Srikummool M, Hubner A, Suttipai T, Srithawong S, et al. Cultural variation impacts paternal and maternal genetic lineages of the Hmong-Mien and Sino-Tibetan groups from Thailand. Eur J Hum Genet. 2020;28(11):1563–79.
He G, Wang M, Miao L, Chen J, Zhao J, Sun Q, et al. Multiple founding paternal lineages inferred from the newly-developed 639-plex Y-SNP panel suggested the complex admixture and migration history of Chinese people. Hum Genomics. 2023;17(1):29.
Kutanan W, Kampuansai J, Srikummool M, Kangwanpong D, Ghirotto S, Brunelli A, et al. Complete mitochondrial genomes of Thai and Lao populations indicate an ancient origin of Austroasiatic groups and demic diffusion in the spread of Tai-Kadai languages. Hum Genet. 2017;136(1):85–98.
McColl H, Racimo F, Vinner L, Demeter F, Gakuhari T, Moreno-Mayar JV, et al. The prehistoric peopling of Southeast Asia. Science. 2018;361(6397):88–92.
Lipson M, Cheronet O, Mallick S, Rohland N, Oxenham M, Pietrusewsky M, et al. Ancient genomes document multiple waves of migration in Southeast Asian prehistory. Science. 2018;361(6397):92–5.
Yunusbayev B, Metspalu M, Metspalu E, Valeev A, Litvinov S, Valiev R, et al. The genetic legacy of the expansion of Turkic-speaking nomads across Eurasia. PLoS Genet. 2015;11(4):e1005068.
Pan Y, Zhang C, Lu Y, Ning Z, Lu D, Gao Y, et al. Genomic diversity and post-admixture adaptation in the Uyghurs. Natl Sci Rev. 2022;9(3):nwab124.
Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4:7.
Manichaikul A, Mychaleckyj JC, Rich SS, Daly K, Sale M, Chen W-M. Robust relationship inference in genome-wide association studies. Bioinformatics. 2010;26(22):2867–73.
Bergstrom A, McCarthy SA, Hui R, Almarri MA, Ayub Q, Danecek P et al. Insights into human genetic variation and population history from 929 diverse genomes. Human Genome Diversity Project: The International Genome Sample Resource https://www.internationalgenome.org/ (2020).
He G, Fan ZQ, Zou X, Deng X, Yeh HY, Wang Z et al. Demographic model and biological adaptation inferred from the genome‐wide single nucleotide polymorphism data reveal tripartite origins of southernmost Chinese Huis. Am J Biol Anthropol. 2022, n/a(n/a).
Chen J, He G, Ren Z, Wang Q, Liu Y, Zhang H et al. Fine-Scale Population Admixture Landscape of Tai–Kadai-Speaking Maonan in Southwest China Inferred From Genome-Wide SNP Data. Front Genet. 2022, 13.
Zhang X, He G, Li W, Wang Y, Li X, Chen Y, et al. Genomic insight into the population admixture history of tungusic-speaking manchu people in Northeast China. Front Genet. 2021;12(1761):754492.
Yao H, Wang M, Zou X, Li Y, Yang X, Li A, et al. New insights into the fine-scale history of western-eastern admixture of the northwestern Chinese population in the Hexi Corridor via genome-wide genetic legacy. Mol Genet Genomics. 2021;296(3):631–51.
Wang M, Yuan D, Zou X, Wang Z, Yeh HY, Liu J, et al. Fine-Scale Genetic Structure and Natural Selection Signatures of Southwestern Hans Inferred From Patterns of Genome-Wide Allele, Haplotype, and Haplogroup Lineages. Front Genet. 2021;12:727821.
He G, Zhang Y, Wei L-H, Wang M, Yang X, Guo J et al. The genomic formation of Tanka people, an isolated “Gypsies in water” in the coastal region of Southeast China. Am J Biol Anthropol. 2021.
Chen J, He G, Ren Z, Wang Q, Liu Y, Zhang H, et al. Genomic insights into the admixture history of mongolic- and tungusic-speaking populations from Southwestern East Asia. Front Genet. 2021;12(880):685285.
Luo T, Wang R, Wang CC. Inferring the population structure and admixture history of three Hmong-Mien-speaking Miao tribes from southwest China based on genome-wide SNP genoty**. Ann Hum Biol. 2021;48(5):418–29.
Tan H, Wang R, Wang C-C. Fine-scale genetic profile and admixture history of two Hmong-Mien–speaking Miao tribes from Southwest China inferred from genome-wide data. Hum Biol. 2021;93(3):179–99.
He G, Sun Q, Liu Y, Yang J, Chen P, Liu C et al. Admixture-introduced complex landscape of genetic diversity contributed to the extensive Southwestern Chinese ethnolinguistic diversity. 2023.
Mallick S, Micco A, Mah M, Ringbauer H, Lazaridis I, Olalde I et al. The Allen Ancient DNA Resource (AADR): A curated compendium of ancient human genomes. The Allen Ancient DNA Resourcehttps://reich.hms.harvard.edu/allen-ancient-dna-resource-aadr-downloadable-genotypes-present-day-and-ancient-dna-data (2022).
Cong PK, Bai WY, Li JC, Yang MY, Khederzadeh S, Gai SR, et al. Genomic analyses of 10,376 individuals in the Westlake BioBank for Chinese (WBBC) pilot project. Nat Commun. 2022;13(1):2939.
Jeong C, Balanovsky O, Lukianova E, Kahbatkyzy N, Flegontov P, Zaporozhchenko V, et al. The genetic history of admixture across inner Eurasia. Nat Ecol Evol. 2019;3(6):966–76.
Lazaridis I, Patterson N, Mittnik A, Renaud G, Mallick S, Kirsanow K, et al. Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature. 2014;513(7518):409–13.
Skoglund P, Posth C, Sirak K, Spriggs M, Valentin F, Bedford S, et al. Genomic insights into the peopling of the Southwest Pacific. Nature. 2016;538(7626):510–3.
Ning C, Li T, Wang K, Zhang F, Li T, Wu X, et al. Ancient genomes from northern China suggest links between subsistence changes and human migration. Nat Commun. 2020;11(1):2700.
Liu CC, Witonsky D, Gosling A, Lee JH, Ringbauer H, Hagan R, et al. Ancient genomes from the Himalayas illuminate the genetic history of Tibetans and their Tibeto-Burman speaking neighbors. Nat Commun. 2022;13(1):1203.
Chen N, Ren L, Du L, Hou J, Mullin VE, Wu D, et al. Ancient genomes reveal tropical bovid species in the Tibetan Plateau contributed to the prevalence of hunting game until the late Neolithic. Proc Natl Acad Sci U S A. 2020;117(45):28150–9.
Sikora M, Pitulko VV, Sousa VC, Allentoft ME, Vinner L, Rasmussen S, et al. The population history of northeastern Siberia since the Pleistocene. Nature. 2019;570(7760):182–8.
de Barros Damgaard P, Martiniano R, Kamm J, Moreno-Mayar JV, Kroonen G, Peyrot M et al. The first horse herders and the impact of early Bronze Age steppe expansions into Asia. Science. 2018, 360(6396).
Flegontov P, Altinisik NE, Changmai P, Rohland N, Mallick S, Adamski N, et al. Palaeo-Eskimo genetic ancestry and the peopling of Chukotka and North America. Nature. 2019;570(7760):236–40.
Zhang F, Ning C, Scott A, Fu Q, Bjorn R, Li W, et al. The genomic origins of the Bronze Age Tarim Basin mummies. Nature. 2021;599(7884):256–61.
Yang MA, Gao X, Theunert C, Tong H, Aximu-Petri A, Nickel B, et al. 40,000-Year-Old Individual from Asia Provides Insight into Early Population Structure in Eurasia. Curr Biol. 2017;27(20):3202–3208 e3209.
Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS Genet. 2006;2(12):e190.
Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 2009;19(9):1655–64.
Pickrell JK, Pritchard JK. Inference of population splits and mixtures from genome-wide allele frequency data. PLoS Genet. 2012;8(11): e1002967.
Kumar S, Stecher G, Tamura K. MEGA7: molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol. 2016;33(7):1870–4.
Loh PR, Lipson M, Patterson N, Moorjani P, Pickrell JK, Reich D, et al. Inferring admixture histories of human populations using linkage disequilibrium. Genetics. 2013;193(4):1233–54.
Delaneau O, Marchini J, Zagury JF. A linear complexity phasing method for thousands of genomes. Nat Methods. 2011;9(2):179–81.
Wangkumhang P, Greenfield M, Hellenthal G. An efficient method to identify, date, and describe admixture events using haplotype information. Genome Res. 2022;32(8):1553–64.
Lawson DJ, Hellenthal G, Myers S, Falush D. Inference of population structure using dense haplotype data. PLoS Genet. 2012;8(1):e1002453.
Browning BL, Browning SR. Improving the Accuracy and Efficiency of Identity-by-Descent Detection in Population Data. Genetics. 2013;194(2):459–71.
Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25(14):1754–60.
Freed D, Aldana R, Weber JA, Edwards JS. The Sentieon Genomics Tools–A fast and accurate solution to variant calling from next-generation sequence data. BioRxiv. 2017:115717.
Schiffels S, Wang K. MSMC and MSMC2: the multiple sequentially markovian coalescent. Methods Mol Biol. 2020;2090:147–66.
Yi X, Liang Y, Huerta-Sanchez E, ** X, Cuo ZX, Pool JE, et al. Sequencing of 50 human exomes reveals adaptation to high altitude. Science. 2010;329(5987):75–8.
Zhou Y, Zhou B, Pache L, Chang M, Khodabakhshi AH, Tanaseichuk O, et al. Metascape provides a biologist-oriented resource for the analysis of systems-level datasets. Nat Commun. 2019;10(1):1523.
Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience. 2015;4(1):7.
Jagadeesan A, Ebenesersdottir SS, Guethmundsdottir VB, Thordardottir EL, Moore KHS, Helgason A. HaploGrouper: a generalized approach to haplogroup classification. Bioinformatics. 2021;37(4):570–2.
Chen H, Lu Y, Lu D, Xu S. Y-LineageTracker: a high-throughput analysis framework for Y-chromosomal next-generation sequencing data. BMC Bioinformatics. 2021;22(1):114.
Dellicour S, Gill MS, Faria NR, Rambaut A, Pybus OG, Suchard MA, et al. Relax, keep walking - a practical guide to continuous phylogeographic inference with BEAST. Mol Biol Evol. 2021;38(8):3486–93.
Leigh JW, Bryant D, Nakagawa S. popart: full-feature software for haplotype network construction. Methods Ecol Evol. 2015;6(9):1110–6.
Acknowledgements
We thank Prof. Etienne Patin at Institut Pasteur for sharing high-coverage WGS data from Taiwan Island, Island Southeast Asia, and Oceania. We thank Prof. Wibhu Kutanan, Prof. Mark Stoneking, and Dr. Dang Liu for sharing genome-wide SNP data from Vietnam, Thailand, and Laos. We also thank all volunteers who participated in this project.
Funding
This study was supported by the National Natural Science Foundation of China (82202078), the Major Project of the National Social Science Foundation of China (23&ZD203), the Open Project of the Key Laboratory of Forensic Genetics of the Ministry of Public Security (2022FGKFKT05), the Center for Archaeological Science of Sichuan University (23SASA01), the 1.3.5 Project for Disciplines of Excellence, West China Hospital, Sichuan University (ZYJC20002), and the Sichuan Science and Technology Program.
Author information
Contributions
G.L.H., M.G.W., and C.L. conceived and designed the study. L.H.W. and R.H. collected samples. G.L.H., M.G.W., P.X.W., J.C., Y.L., Y.T.S., R.H., S.H.D., Q.X.S., R.K.T., J.B.Y., Z.Y.W., L.B.Y., L.P.H., J.W.Y., and S.J.N. performed the analyses and revised the manuscript. G.L.H. and M.G.W. wrote the manuscript draft. All authors read, revised, and approved the final manuscript.
Ethics declarations
Ethics approval and consent to participate
All procedures strictly followed the Declaration of Helsinki 2000. All volunteers participating in the study provided written informed consent before sampling. The study was approved by the Ethics Committee of West China Hospital of Sichuan University (2023–306) and North Sichuan Medical College (No: 2021-A9).
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Supplementary Information
Additional file 1:
Table S1. The populations newly genotyped in this study and reference populations collected from the Allen Ancient DNA Resource and publicly available databases.
Table S2. Pairwise Fst values between coastal, inland, Southeast Asian HM people and other reference populations.
Table S3. Outgroup-f3 values focused on the coastal Hmong-Mien people.
Table S4. Admixture signatures inferred from the admixture-f3-statistics in the form of f3(source1, source2; targeted populations).
Table S5. Admixture time estimated using ALDER.
Table S6. The signatures of natural selection identified in HM people based on Fst, iHS and XPEHH analyses.
Table S7. The signatures of natural selection specific to Inland HM1 identified based on Fst, iHS and XPEHH analyses.
Table S8. The signatures of natural selection specific to Coastal HM identified based on Fst, iHS and XPEHH analyses.
Additional file 2:
Fig. S1. Genetic structure of modern and ancient East Asians.
Fig. S2. The cross-validation error of model-based ADMIXTURE analysis of 254 modern and ancient populations in the merged Human Origins (HO) dataset.
Fig. S3. Population admixture and genetic ancestry among 153 ethnolinguistically diverse modern eastern Eurasians and 101 ancient populations from East Asia and surrounding regions.
Fig. S4. The phylogenetic relationships between geographically diverse HM-speaking populations from China and Southeast Asia revealed by TreeMix analysis with the French as the outgroup population.
Fig. S5. Model-based ADMIXTURE results of modern and ancient East Asians inferred with predefined ancestral sources ranging from 2 to 11.
Fig. S6. Model-based ADMIXTURE results of newly genotyped populations and HM-speaking reference populations from China and Southeast Asia inferred with predefined ancestral sources ranging from 2 to 10.
Fig. S7. A formal test of genomic continuity and admixture in She_**shui people inferred from f4-statistics in the form f4(Reference population1, Reference population2; She_**shui, Mbuti).
Fig. S8. A formal test of genomic continuity and admixture in She_Guanshe people inferred from f4-statistics in the form f4(Reference population1, Reference population2; She_Guanshe, Mbuti).
Fig. S9. A formal test of genomic continuity and admixture in She_Shanyang people inferred from f4-statistics in the form f4(Reference population1, Reference population2; She_Shanyang, Mbuti).
Fig. S10. A formal test of genomic continuity and admixture in Gaoshan_Huaan people inferred from f4-statistics in the form f4(Reference population1, Reference population2; Gaoshan_Huaan, Mbuti).
Fig. S11. A formal test of genomic continuity and admixture in She_Guanshe people inferred from f4-statistics in the form f4(Reference population1, She_Guanshe; Reference population2, Mbuti).
Fig. S12. A formal test of genomic continuity and admixture in She_**shui people inferred from f4-statistics in the form f4(Reference population1, She_**shui; Reference population2, Mbuti).
Fig. S13. A formal test of genomic continuity and admixture in She_Shanyang people inferred from f4-statistics in the form f4(Reference population1, She_Shanyang; Reference population2, Mbuti).
Fig. S14. A formal test of genomic continuity and admixture in Gaoshan_Huaan people inferred from f4-statistics in the form f4(Reference population1, Gaoshan_Huaan; Reference population2, Mbuti).
Fig. S15. Demographic history of newly genotyped coastal Huaan Gaoshan population.
Fig. S16. Demographic history of newly genotyped coastal Guanshe She population.
Fig. S17. Demographic history of newly genotyped coastal **shui She population.
Fig. S18. Demographic history of newly genotyped coastal Shanyang She population.
Fig. S19. The phylogenetic relationship inferred from paternal and maternal lineages.
Fig. S20. The effective population sizes (Ne) of inland and coastal HM-speaking populations, with the Yoruba genome added as a benchmark. An enlarged view of the Ne of geographically diverse HM-speaking populations is shown in Fig. 3a.
Fig. S21. The annotation results of candidate signatures of natural selection identified based on PBS, Fst, iHS, and XPEHH approaches.
Fig. S22. The signatures of natural selection specific to Inland HM1 and Coastal HM identified based on the PBS approach.
Fig. S23. The annotation results of candidate genes of natural selection specific to one regional HM-speaking population identified based on Fst, iHS, and XPEHH approaches.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
He, G., Wang, P., Chen, J. et al. Differentiated genomic footprints suggest isolation and long-distance migration of Hmong-Mien populations. BMC Biol 22, 18 (2024). https://doi.org/10.1186/s12915-024-01828-x
DOI: https://doi.org/10.1186/s12915-024-01828-x