Background

Zingiber Boehm. is a diverse genus of the family Zingiberaceae comprising approximately 100–150 species that are widely distributed in the tropical and subtropical regions of Asia and Far East Asia [1, 2]. Zingiber contains many economically important species. Some species have long-lasting inflorescences with an assemblage of tightly clasped, brightly colored bracts and flowers that are often highly showy. They are widely used in landscaping and as cut flowers in floral arrangements; examples include chocolate pinecone ginger (Z. montanum) and Chiang Mai Princess (Z. citriodorum) [1, 2, 71].

MIcroSAtellite (MISA) (http://pgrc.ipk-gatersleben.de/misa/) was used to detect simple sequence repeat (SSR, or microsatellite) motifs in fourteen sequenced chloroplast genomes with the following minimum repeat numbers: 8 for mono-, 5 for di-, 4 for tri-, and 3 for tetra-, penta-, and hexa-nucleotide SSRs (Fig. 3). The REPuter software was employed to identify long repeats, including forward, palindromic, reverse, and complement repeats. The criteria for determining long repeats were as follows: (1) a minimal repeat size of more than 30 bp; (2) a repeat identity of more than 90%; and (3) a Hamming distance equal to 3 (Fig. 4).
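To illustrate how the minimum repeat numbers listed above act as thresholds, the following is a minimal Python sketch of a perfect-SSR scan. It is not the MISA implementation: the function names (`find_ssrs`, `is_primitive`) and the toy sequence are hypothetical, and compound SSRs and MISA's other options are not modeled.

```python
# Minimum repeat counts per motif length (mono- to hexa-nucleotide),
# taken from the settings stated in the text.
MIN_REPEATS = {1: 8, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return not any(
        k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)
    )


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs.

    Illustrative only: each motif length is scanned independently and
    overlapping candidates are resolved left to right.
    """
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            # Skip motifs with ambiguous bases or a shorter underlying period.
            if set(motif) - set("ACGT") or not is_primitive(motif):
                i += 1
                continue
            n = 1
            while seq[i + n * k:i + (n + 1) * k] == motif:
                n += 1
            if n >= min_rep:
                hits.append((i, motif, n))
                i += n * k  # jump past the repeat just recorded
            else:
                i += 1
    return sorted(hits)


if __name__ == "__main__":
    toy = "GG" + "A" * 9 + "CC" + "AT" * 5 + "TT" + "AGC" * 4 + "TT"
    for start, motif, count in find_ssrs(toy):
        print(start, motif, count)
```

In this sketch a run of nine A's passes the mononucleotide threshold of 8, whereas the same run would not be reported as a dinucleotide repeat because `AA` is not a primitive motif.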

Genome comparison and nucleotide variation analysis

To detect contractions and expansions of the IR regions in the chloroplast genomes of Zingiber, 20 whole genomes within the genus were compared (Fig. 5). The online tool mVISTA in Shuffle-LAGAN mode [72] was used to make pairwise alignments among these 20 whole chloroplast genomes, with the annotated chloroplast genome of Z. cochleariforme as the reference (Fig. 6). The 20 chloroplast genomes of Zingiber were first aligned using MAFFT v7 [73] and then manually adjusted using BioEdit v7.0.9 [74]. DnaSP v5.10 [75] was used to calculate the nucleotide variability (Pi) of the 20 chloroplast genomes in a sliding window analysis with a step size of 200 bp and a window length of 800 bp (Fig. 7).
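The sliding window calculation can be sketched as follows. This is a simplified Python illustration of nucleotide diversity (Pi, the mean number of pairwise differences per site), not a reproduction of DnaSP: the function names are hypothetical, and the handling of gaps (alignment columns containing any character other than A, C, G, or T are dropped) is an assumption that may differ from DnaSP's treatment.

```python
from itertools import combinations


def window_pi(seqs, start, end):
    """Mean pairwise differences per site within alignment columns [start, end)."""
    # Assumption: ignore columns that contain gaps or ambiguous bases.
    columns = [
        col
        for col in zip(*(s[start:end].upper() for s in seqs))
        if all(base in "ACGT" for base in col)
    ]
    if not columns:
        return 0.0
    n = len(seqs)
    n_pairs = n * (n - 1) / 2
    diffs = sum(
        sum(a != b for a, b in combinations(col, 2)) for col in columns
    )
    return diffs / (n_pairs * len(columns))


def sliding_pi(seqs, window=800, step=200):
    """Return (window_start, Pi) for each window along an aligned set of sequences."""
    length = len(seqs[0])
    return [
        (start, window_pi(seqs, start, start + window))
        for start in range(0, length - window + 1, step)
    ]


if __name__ == "__main__":
    # Toy alignment with a tiny window so the arithmetic is easy to follow.
    aln = ["AAAA", "AAAT", "AATT"]
    print(sliding_pi(aln, window=2, step=2))
```

With the 800 bp window and 200 bp step stated in the text, consecutive windows overlap by 600 bp, so each alignment position away from the ends contributes to four windows.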

Positive selection analysis

To identify genes under selection, we scanned the chloroplast genomes of fourteen species within Zingiber using EasyCodeML [76]. The software was used to calculate the non-synonymous (dN) and synonymous (dS) substitution rates, along with their ratio (ω = dN/dS). The analyses of selective pressure were conducted along the ML tree of these fourteen species in Newick format. Each single-copy CDS was aligned according to its amino acid sequence. Five site models (M0, M1a, M2a, M7, and M8) were employed to identify signatures of adaptation across the chloroplast genomes. These models allow the ω ratio to vary among sites while keeping it fixed across all branches. The site-model comparisons M1a (nearly neutral) vs. M2a (positive selection) and M7 (β) vs. M8 (β & ω) were used to detect positive selection [77]. A likelihood ratio test (LRT) was applied to each comparison (M1a vs. M2a and M7 vs. M8) to evaluate the strength of selection, and a chi-square (χ2) p value smaller than 0.05 was considered significant. The Bayes Empirical Bayes (BEB) inference [78] was implemented in site models M2a and M8 to estimate the posterior probabilities of positively selected sites in the selected genes.
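As a worked illustration of the likelihood ratio test, the sketch below computes the test statistic 2(lnL_alt − lnL_null) and its χ2 p value. It assumes two degrees of freedom, which is the conventional choice for both the M1a vs. M2a and the M7 vs. M8 comparisons (each alternative model adds two parameters); the text itself does not state the degrees of freedom used. The log-likelihood values are made up for the example, and `lrt_df2` is a hypothetical helper name.

```python
import math


def lrt_df2(lnl_null, lnl_alt):
    """Likelihood ratio test with 2 degrees of freedom.

    Returns (statistic, p_value). For a chi-square distribution with
    df = 2, the upper-tail probability is exactly exp(-x / 2), so no
    external statistics library is needed.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    p_value = math.exp(-stat / 2.0)
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical log-likelihoods for one gene under M7 (null) and M8 (alternative).
    stat, p = lrt_df2(lnl_null=-1000.0, lnl_alt=-996.0)
    print(f"2*delta_lnL = {stat:.2f}, p = {p:.4f}, significant = {p < 0.05}")
```

With df = 2, the 0.05 significance threshold corresponds to a test statistic of about 5.99, so a gene is flagged only when the alternative model improves the log-likelihood by roughly three units or more.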

Phylogenetic analyses

The phylogenetic analyses of 20 Zingiber species were performed based on chloroplast genomic data. The Maximum Likelihood (ML) method in Geneious R11 was used to construct the phylogenetic tree with default settings, including 1000 bootstrap replications and the general time-reversible model with a gamma distribution of substitution rates among sites (GTR + G). In addition, Bayesian Inference (BI) was performed using MrBayes v3.2 [79] with the GTR substitution model and the following run parameters: the Markov chain Monte Carlo algorithm was run for 2 million generations with four Markov chains, trees were sampled every 100 generations, and the first 10% of trees were discarded as burn-in. FigTree v1.4 was used to edit and visualize the final BI and ML trees (Fig. 8). Furthermore, to clarify the phylogenetic position of Zingiber within the Zingiberaceae, we constructed a maximum likelihood tree based on a chloroplast genome dataset of 55 Zingiberaceae species.
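The sampling scheme described for the Bayesian analysis implies a specific number of retained trees, which the short Python sketch below works out. The helper name `retained_samples` is hypothetical, and the count ignores the starting tree at generation 0, which some programs also write to the sample file.

```python
def retained_samples(generations, sample_freq, burnin_percent):
    """Return (total_sampled, discarded, retained) tree counts for one run."""
    total = generations // sample_freq
    discarded = total * burnin_percent // 100  # integer arithmetic avoids float rounding
    return total, discarded, total - discarded


if __name__ == "__main__":
    total, discarded, kept = retained_samples(2_000_000, 100, 10)
    print(f"sampled={total}, burn-in={discarded}, retained={kept}")
```

Under these settings, 20,000 trees are sampled per run, 2,000 are discarded as burn-in, and 18,000 remain for summarizing the posterior.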