Background

Austronesian is one of the most important linguistic families, spread across most regions of Island Southeast Asia, the Pacific Ocean, and the Indian Ocean, and comprising more than one fifth of all the languages in the world [1]. This linguistic family was originally proposed by Murdock [2], who brought together two groups of speakers, i.e. Malayo-Polynesians (Island Southeast Asians (ISEA), Malagasy, Micronesians, and Polynesians) and Taiwan aborigines, as a monophyletic unit based on their linguistic similarity [3, 4]. Later, Benedict found that another linguistic family in East Asia, Daic, shares many resemblances with Austronesian, and therefore proposed a super-phylum, Austro-Tai [5]. Daic is a linguistic family located to the north of the ISEA groups, mainly in South China; some Daic populations spread to Laos, Thailand, and as far as India [1]. Substantial resemblances among Taiwan aborigines, Malayo-Polynesians, and Daic speakers have been reported by ethnologists [6–10] and linguists [11–15], linking Taiwan aborigines and Malayo-Polynesians to coastal populations in Southeast China, primarily Daic speakers and their ancestors, the Baiyue.

The origin of Austronesian has always been a controversial subject in linguistics and related fields. The Express Train Hypothesis, a well-accepted linguistic theory on the origin of Austronesian [3, 4, 16, 17], postulates that proto-Austronesians originated in Taiwan and began to expand southward about 5,000–6,000 years ago by way of the Philippines and Eastern Indonesia. They eventually navigated eastward to Micronesia and Polynesia, and westward to Western Indonesia and Madagascar. The 'express train' refers to a rapid dispersal across the present Austronesian range starting from Eastern Indonesia. The hypothesis of a Taiwan origin of all Austronesians (Taiwan Homeland Hypothesis, or THH hereafter) is primarily based on the observation that linguistic diversity is much higher among the languages of Taiwan aborigines than among those of the Malayo-Polynesians [3, 4]. However, some linguists found evidence against the THH and suggested that Kalimantan or Sulawesi may be the homeland of Austronesian [15, 18, 19]. The THH was further challenged by ethnologists [6–9], archaeologists [10], and geneticists [20–25].

Genetic evidence has been equally controversial. Some mitochondrial DNA (mtDNA) studies suggested a Taiwan origin of Polynesians [20–22]. A recent mtDNA study on Taiwan aborigines found a root of the "Polynesian Motif" in Taiwan, which suggests that the THH may hold for maternal lineages [26]. On the other hand, the THH was challenged in paternal lineages by Y chromosome studies that showed a lack of resemblance between the Polynesians and Taiwan aborigines [23]. It was also challenged by other mtDNA studies, which suggest an Indonesian origin of Polynesians [24, 25]. The conflicts in the genetic evidence can be attributed to the lack of data on populations from two crucial regions: (1) coastal populations in Southeast Asia ancestral to three Austronesian groups (Taiwan aborigines, ISEA, and Polynesians), and (2) ISEA populations, including Indonesians, from which Polynesians derived.

Another important factor in the genetic structure of Austronesians is that Eastern Austronesians are distinctly different from Western Austronesians (ISEA and Taiwan aborigines, Figure 1). Autosomal STR variation studies [27] revealed a pronounced genetic division between Polynesians and Western Austronesians. These studies suggest that the Polynesians might have undergone natural selection or have been admixed with Melanesians, and that this process changed their genetic structure [16, 20, 28]. Genetic drift and founder effects during the dispersal of Polynesians are also possible. The genetic structure of Western Austronesians, especially that of the ISEA, is therefore more pivotal to the origin of Austronesians (Figure 1). The high Y chromosome diversity of Indonesian populations, such as Bali and Sumba islanders, suggests that these populations have existed since the Palaeolithic age [29, 30]. Because of this high genetic diversity, it appears that the ISEA, especially the Indonesians, are not solely of Taiwanese origin.

Figure 1

Geographic distribution of sampled populations and migration routes suggested by Y chromosome analysis. The codes for the population samples are the same as those in Table 1. Green arrows indicate expansion of Daic; blue arrows, Taiwanese; orange arrows, ISEA. The origin of Polynesians, purple arrows, remains controversial in paternal lineages.

Here, we examined the THH for ISEA by studying the Y chromosome diversity of the relevant population groups: the Daic, Indonesians, and Taiwan aborigines. We show that the paternal lineages of both ISEA and Taiwan aborigines derived from the Daic, although independently of each other. In addition, our findings indicate that Taiwan is unlikely to be the homeland of the paternal lineages of the ISEA populations.

Results and Discussion

To determine the genetic affinity between the Daic populations and the Western Austronesians, we typed twenty single nucleotide polymorphisms (SNPs) and seven short tandem repeats (STRs) in the non-recombining region of 1,509 Y chromosomes sampled from 30 Daic populations, 23 ISEA populations, and 11 Taiwan aboriginal populations (see Figure 1 for locations of the populations and Table 1 for population information). Almost all of the Daic populations in China and all of the Taiwan aboriginal populations were sampled in this study.

In addition, principal component (PC) analysis of 134 East Asian populations encompassing all linguistic groups in East and Southeast Asia was performed using the frequencies of haplogroups defined by SNPs. The result showed that Daic populations are closer to the Western Austronesian groups than any other East and Southeast Asian populations are (Figure 2), indicating a strong genetic affinity between Daic speakers and Western Austronesians. The separation of the Daic-ISEA-Taiwan cluster from the other ethnic groups is attributable to PC2 rather than to PC1, and O1a* is the haplogroup that shows the strongest correlation with PC2 (r2 = -0.875, P < 10⁻⁴; see Additional file 1 for details). Furthermore, O1a-M119 is the dominant haplogroup in Taiwan aborigines (average 77%, ranging from 54% to 100%; Table 2, sum of O1a* and O1a2). This lineage is also highly prevalent in Daic speakers (20.5%) and in ISEA (21.2%), but not in the other East Asians (< 5%) [23, 2.

Data analysis

Population relationships were investigated with principal component analyses of Y chromosome haplogroup frequencies, using SPSS 11.0 software (SPSS Inc.). Some of the SNPs, such as M175 and M117, were not typed in the previously published populations; therefore, in our PC analysis, our O*-M175 data were combined into haplogroup K, and O3a5a-M117 into O3a5*. Correlation analysis among haplogroups and PCs was also conducted using SPSS 11.0.
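The two steps described above (a PC analysis of haplogroup frequencies, followed by correlating a haplogroup with a PC) can be sketched in a few lines. This is only an illustration, not the SPSS procedure used in the study: it assumes a covariance-based PCA computed by singular value decomposition, whereas the SPSS settings are not stated in the text. The function names, the haplogroup labels other than O1a* and K, and the frequency table are hypothetical toy values.

```python
import numpy as np

def pca_haplogroup_frequencies(freqs, n_components=2):
    """PCA of a populations x haplogroups frequency matrix via SVD.

    Assumption: covariance-based PCA (columns centred, not standardised).
    Returns the PC scores and the fraction of variance each PC explains.
    """
    X = np.asarray(freqs, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each haplogroup column
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return scores, explained[:n_components]

def correlation_with_pc(freqs, scores, haplogroup_index, pc_index):
    """Pearson correlation between one haplogroup's frequencies and one PC."""
    X = np.asarray(freqs, dtype=float)
    return float(np.corrcoef(X[:, haplogroup_index], scores[:, pc_index])[0, 1])

# Hypothetical toy data: 5 populations (rows) x 4 haplogroups (columns).
haplogroups = ["O1a*", "O2a", "O3", "K"]
toy_freqs = np.array([
    [0.70, 0.05, 0.15, 0.10],
    [0.25, 0.30, 0.35, 0.10],
    [0.20, 0.20, 0.40, 0.20],
    [0.03, 0.10, 0.70, 0.17],
    [0.02, 0.15, 0.60, 0.23],
])

scores, explained = pca_haplogroup_frequencies(toy_freqs)
r = correlation_with_pc(toy_freqs, scores, haplogroups.index("O1a*"), 0)
print("variance explained by PC1, PC2:", explained)
print("correlation of O1a* with PC1:", r)
```

The sign of a PC is arbitrary, so only the magnitude of such a correlation is meaningful when comparing runs or software packages.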

The admixture analysis was performed using the ADMIX 2.0 program [52] in order to evaluate the genetic influence of Han Chinese on the Daic populations. We assumed that the potential admixture started 2,500 years ago, when the Qin army entered the Daic area in Canton. The admixture proportions of the Indonesians were also estimated by ADMIX 2.0, with the admixture history assumed to start 5,000 years ago.

The genetic distances among Daic, Taiwan aborigines, and Malayo-Polynesians were estimated by RST and linearized RST [53] using ARLEQUIN software [54], and the diversity of each of the three groups was evaluated by average gene diversity, haplotype diversity [55], and variance of the STR allele sizes [56].
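As an illustration of two of the diversity measures named above, the sketch below computes haplotype diversity and the mean variance of STR allele sizes for a small sample. It assumes the standard unbiased estimator h = n/(n-1) × (1 - Σ p_i²) for haplotype diversity and an unweighted mean of per-locus sample variances; whether these match the exact definitions in [55] and [56] is an assumption. The function names and the haplotypes (tuples of repeat counts) are hypothetical.

```python
from collections import Counter
from statistics import variance

def haplotype_diversity(haplotypes):
    """Unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two haplotypes are required")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)

def mean_str_variance(haplotypes):
    """Sample variance of allele size at each STR locus, averaged over loci."""
    per_locus = [variance(locus) for locus in zip(*haplotypes)]
    return sum(per_locus) / len(per_locus)

# Hypothetical toy sample: 4 chromosomes typed at 3 STR loci (repeat counts).
sample = [(13, 24, 15), (13, 24, 15), (14, 24, 15), (13, 25, 16)]

h = haplotype_diversity(sample)
v = mean_str_variance(sample)
print("haplotype diversity:", h)
print("mean STR allele-size variance:", v)
```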

A Median-Joining network of O1a* STR haplogroups was drawn with Network 4.1 software (Fluxus Technology Ltd). The age of O1a* was estimated from the network. The mutation rate used in the time estimate is 1.932 × 10⁻⁴ per year, the sum of the mutation rates [57] of all the STRs used in the network. We assumed 25 years for one generation.
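The arithmetic behind such a time estimate can be sketched as follows. The sketch assumes the age is obtained as a rho-type statistic (the mean number of mutational steps separating the sampled haplotypes from a root haplotype) divided by the summed mutation rate; the text does not state which estimator was used. It also approximates rho by direct stepwise distances to the root rather than by path lengths along the network. Only the mutation rate and the generation time come from the text; the function names, the root, and the haplotypes are hypothetical.

```python
MUTATION_RATE_PER_YEAR = 1.932e-4  # from the text: summed over the STRs in the network
YEARS_PER_GENERATION = 25          # from the text

def rho(root, haplotypes):
    """Mean number of single-repeat steps between the root and each haplotype.

    Assumption: stepwise distance to the root stands in for the path length
    along the network.
    """
    total = sum(sum(abs(a - b) for a, b in zip(root, h)) for h in haplotypes)
    return total / len(haplotypes)

def age_in_years(rho_value, rate_per_year=MUTATION_RATE_PER_YEAR):
    """Age estimate: expected mutations per lineage divided by the yearly rate."""
    return rho_value / rate_per_year

# Hypothetical root and sample (repeat counts at 3 STR loci).
root = (13, 24, 15)
sample = [(13, 24, 15), (13, 24, 15), (14, 24, 15), (13, 25, 16)]

rho_value = rho(root, sample)
age = age_in_years(rho_value)
rate_per_generation = MUTATION_RATE_PER_YEAR * YEARS_PER_GENERATION
print("rho:", rho_value)
print("age in years:", age)
print("age in generations:", age / YEARS_PER_GENERATION)
print("summed mutation rate per generation:", rate_per_generation)
```

With a 25-year generation, the yearly rate of 1.932 × 10⁻⁴ corresponds to a summed rate of 4.83 × 10⁻³ per generation.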