Introduction

Chloroplast genome (plastome) comparative analysis has proven to be a valuable tool in phylogeny reconstruction and resolving complex evolutionary relationships1,2,3,4,5. In angiosperms, it has been observed that the number and order of genes in the plastome are generally conserved6. This conservation is attributed to the relatively slower evolution rate of chloroplast sequences compared to nuclear regions7,8. However, it is worth noting that sequence rearrangements in plastome have been reported in various plant species9,10,11. Inverted repeats region (IR) expansions or contractions into single-copy areas containing inversions, as well as significant inversions in large single-copy regions (LSC), are some examples of these rearrangements12,13. These inversion occurrences were most likely caused by intragenomic recombination in areas with varying G + C concentrations14,15 or tRNA activity16. The importance of gene rearrangements and inversions in plastomes for phylogenetic analyses lies in their rarity, ease of homology estimation, and simplicity in determining the polarity of inversion events17,18,19. The comparisons facilitate the investigation of molecular evolutionary patterns linked to structural rearrangement and the clarification of the molecular mechanisms responsible for those occurrences.

With a global distribution, the Ranunculaceae family has about 2000 primarily herbaceous species20,21,22 and is considered one of the oldest families to diverge from the eudicots. It is a large family that includes approximately 59 genera, and numerous Ranunculaceae plants have significant medicinal uses23. In recent years, molecular phylogenetics has enabled new discoveries and a reevaluation of the taxonomy of Ranunculaceae. The results of molecular phylogenetic research have led to the reduction of several genera and the proposal of a new genus21,24,25,26,27,28. To date, several widely used plastid regions and tandemly repeated DNA have been the primary data sources for molecular research on the family, rather than complete plastomes. Few entire plastomes have been published and made available through GenBank (http://www.ncbi.nlm.nih.gov).

Nigella, commonly known as fennel flower, constitutes a compact genus within the Nigelleae tribe, comprising 18 species in the Ranunculaceae family29,30. This genus is indigenous to Southern Europe, North Africa, South Asia, Southwest Asia, and the Middle East31,32. Nigella comprises fourteen species, of which N. sativa L. (black cumin) stands out as the most popular medicinal plant. Moreover, the seeds of N. sativa L. are used as a spice in various culinary applications. N. damascena L. and N. arvensis are annual plants known for their ornamental and medicinal qualities33,34,35. A limited number of studies have examined genetic variation in N. sativa using DNA-based molecular markers36,37. Plastid phylogenomic investigations can be especially effective in elucidating the generic relationships within the Ranunculaceae family. Structural variations in the plastome, such as gene inversions, gene transpositions, and expansion–contraction of the inverted repeat (IR), offer valuable systematic insights into the family22,38,39.

In this study, we sequenced, assembled, and analyzed the complete plastome of N. sativa, a member of the Ranunculaceae family, for the first time. We compared it with ten previously published chloroplast genome sequences from the Ranunculaceae family obtained from the National Center for Biotechnology Information (NCBI). We analyzed the general characteristics of the plastomes of all species, including structure, gene composition, and other relevant attributes, and compared them with those of N. sativa. Furthermore, we identified microsatellites (SSRs), long repeat sequences, and highly variable regions within the chloroplast genomes of N. sativa and the other studied species.

Results

General features and composition of plastome

This research investigates the plastome structure of N. sativa and compares it with the plastomes of ten additional species within the Ranunculaceae family. The complete plastome of N. sativa exhibits a quadripartite structure, consistent with the typical organization found in most land plant plastomes (Fig. 1). The plastome of N. sativa is 154,120 bp in size and is divided into four main sections: the LSC region, which spans 85,538 bp, the SSC region, covering 17,984 bp, and two IR regions of 25,299 bp each. In this study, the plastome of P. anemonoides emerged as the largest, spanning a length of 164,383 bp, whereas the plastome of N. sativa was identified as the shortest among the 11 selected plastomes. The plastome of N. sativa contains a total of 128 genes, consisting of 83 protein-coding genes, 37 transfer RNA (tRNA) genes, and eight ribosomal RNA (rRNA) genes (Table 1). This gene count is the lowest among all plastomes, with A. coerulea displaying a larger total of 140 genes. The number of protein-coding genes (PCGs) varies across the studied species, ranging from 81 to 94; N. sativa possesses 83. Among all species in the study, A. glaucifolium has the highest number of PCGs, while A. coerulea has the lowest. Within the plastome of N. sativa, 11 genes (rps11, rps12, rps14, rps15, rps18, rps19, rps2, rps3, rps4, rps7 and rps8) encode small ribosomal subunit proteins, while another eight genes (rpl14, rpl16, rpl2, rpl20, rpl22, rpl23, rpl33 and rpl36) encode large ribosomal subunit proteins. Furthermore, 45 genes are associated with photosynthesis-related proteins, and four genes (rpoA, rpoB, rpoC1, and rpoC2) encode DNA-dependent RNA polymerase. Lastly, nine genes (accD, ccsA, cemA, matK, clpP, infA, ycf1, ycf2, and ycf4) encode other proteins, as outlined in Table 2. The tRNA gene count ranges from 36 (in A. glaucifolium and A. raddeana) to 45 (in A. coerulea), while the rRNA gene count remains constant at 8 across all plastomes. We found 11 intron-containing genes in the N. sativa plastome, eight of which (atpF, ndhA, ndhB, petB, petD, rpl16, rpl2, and rpoC1) contain a single intron, whereas three (clpP, rps12 and ycf3) have two introns each (Table 3). The GC content of the plastomes was generally similar among the 11 species, with N. sativa exhibiting a GC content of approximately 38%. In contrast, A. coerulea displayed the highest GC content (39%) among the plastomes examined. The total PCG length in the N. sativa plastome was 76,339 bp. Comparative analysis across species revealed diverse PCG lengths, ranging from 75,870 bp (N. damascena) to 84,105 bp (A. glaucifolium). Additionally, IR lengths varied from 25,162 bp (N. damascena) to 31,279 bp (A. raddeana), indicating a positive correlation between overall plastome length and IR size across species (Table 1). We examined the codon usage frequency of protein-coding genes in the N. sativa plastome: phenylalanine had the most codons (1982), followed by lysine (1912), while alanine was the least common amino acid (260 codons). Of the total codons analyzed, 35 exhibited a relative synonymous codon usage (RSCU) greater than 1 in the N. sativa plastome. The most favored codon was AGA, encoding arginine, with an RSCU value of 1.78, followed by CAU, which encodes histidine, with an RSCU value of 1.44 (Table S1).
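The region lengths reported above can be checked against the total plastome size with a short calculation. The sketch below assumes that 25,299 bp is the length of one IR copy (so that two copies are counted); the function and variable names are illustrative only.

```python
# Region lengths (bp) reported in the text for the N. sativa plastome.
LSC = 85_538
SSC = 17_984
IR = 25_299            # assumed to be the length of ONE inverted-repeat copy
REPORTED_TOTAL = 154_120


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


total = quadripartite_length(LSC, SSC, IR)
ir_share = 2 * IR / total  # fraction of the plastome occupied by both IR copies

print(f"computed total: {total} bp, matches reported: {total == REPORTED_TOTAL}")
print(f"share of plastome in the two IR copies: {ir_share:.1%}")
```

Under this assumption the four regions sum exactly to the reported 154,120 bp.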

Figure 1 

Plastome genome map of N. sativa. Genes drawn outside the circle are transcribed anti-clockwise, while those inside the circle are transcribed clockwise. Large single copy (LSC) region, inverted repeat (IRA, IRB) regions and small single copy (SSC) region are shown in the figure. The darker green color in the inner circle corresponds to GC content whereas the lighter green corresponds to AT content. Different colors of genes represent their different functions.

Table 1 Basic features of the plastome of the N. sativa species and related species.
Table 2 List of genes annotated in the plastome of N. sativa.
Table 3 The genes with introns in the plastome of N. sativa and the length of exons and introns.

Comparative analysis and divergence

The mVISTA analysis uncovered sequence variability among 11 plastomes. In our results, the coding regions displayed comparatively low sequence divergence, while more significant divergence was observed in the non-coding regions. The results of the analysis revealed a noteworthy resemblance between N. damascena and N. sativa in comparison to other species. However, a distinctive pattern of divergence was observed in the region spanning from trnL to ycf1, particularly in the SSC region, as illustrated in Fig. 2. The analysis of various species revealed a variable number of divergences, with a notable pattern observed across different genomic regions. The most substantial divergences were identified within the LSC region, with A. raddeana and A. glaucifolium. Noteworthy divergences were also observed in other species, especially across the psbA to the atpH, rpoB to the trnT, and ycf3 to the ndhJ regions. A striking divergence pattern was also evident in A. coerulea, exhibiting significant distinctions, especially within the rbcL to clpP region in the LSC position. In the SSC region, all plastomes exhibited pronounced divergences compared to N. sativa (Fig. 2). High divergence was noted from ndhF to ycf1, with A. glaucifolium showcasing a particularly significant divergence. Contrastingly, the IR region displayed relatively lower levels of divergence compared to the LSC and SSC regions. The ycf2 gene, however, demonstrated substantial divergence in the IR region across all species, with P. anemonoides exhibiting heightened distinctions. Furthermore, the rpl2 gene displayed notable divergence, particularly in A. coerulea.

Figure 2

Alignment visualization of the N. sativa plastome sequences with related species. VISTA-based identity plot showing sequence identity among the 10 species using N. sativa as a reference. The vertical scale indicates percent identity, ranging from 50 to 100%. The horizontal axis indicates the coordinates within the plastome. Arrows indicate the annotated genes and their transcription direction. The thick black lines show the inverted repeats (IRs).

The average pairwise sequence divergence was also calculated for the complete plastome and protein-coding genes. The plastome of A. glaucifolium displayed the highest average pairwise sequence divergence from N. sativa (0.2851), followed by A. raddeana (0.2290) and A. coerulea (0.1222). In contrast, N. damascena exhibited a low pairwise sequence divergence of 0.0117 from N. sativa (Table S1 and Fig. 3). The divergence of protein-coding genes in the selected plastomes shows a distinct pattern, depicted in a heatmap. Notably, the ycf1 gene exhibits significant divergence from N. sativa, and other divergent genes include rpl14, rpl16, rpl20, ccsA, cemA, matK, psbT, ndhA, and ndhF across all species except N. damascena, which resembles N. sativa. The highest pairwise sequence divergence is observed in ycf1, at 0.2283. This study provides valuable insights into the evolutionary dynamics and genetic divergence among these species.

Figure 3

Pairwise sequence distance of 73 protein coding genes of N. sativa and related species (A). Nucleotide diversity (Pi) analysis for whole plastomes of N. sativa species. Sliding window length was 200 bp and step size was selected as 100 bp. X-axis: position of the midpoint of a window, Y-axis: nucleotide diversity (Pi) of each window. (B) Sliding window analysis of N. sativa and N. damascena. (C) Sliding window analyses of N. sativa with other 7 species.

The complete plastome of N. sativa was aligned with that of N. damascena, and nucleotide variability (Pi) was calculated with DnaSP to identify mutational hotspots. Nine highly variable loci with elevated Pi values were detected between the chloroplast genomes of the two species. These include six divergent hotspots in the LSC region, trnD-GUC-psbD (0.055), trnS-GGA-trnL-UAA (0.09), atpB-psaI (0.06), ycf4-cemA (0.08), psbE-petL (0.065), and rps8-rpl16 (0.1), and three in the SSC region, ndhF-ndhG (0.8), ndhI-rps15 (0.065), and ycf1 (0.21) (Fig. 3B). We also performed a multiple alignment of nine plastomes, excluding A. glaucifolium and A. raddeana due to their substantial divergence from N. sativa. The analysis revealed 12 divergent hotspot regions with Pi values exceeding 0.1. Noteworthy loci in the LSC region include trnH-GUG-psbA, matK-trnQ-UUG, atpF-atpI, rpoB-psbD, ycf3-ndhJ, ndhC-cemA, and petA-psaJ. In the IR region, trnN-GUU-ndhF, trnV-GAC-rps12, and ycf2-trnI-CAU exhibited divergence. In the SSC region, the ndhA-ycf1 locus (0.27) stands out, as depicted in Fig. 3B. High Pi values in these regions highlight significant variation across the plastome relative to N. sativa. Specifically, the ndhC-cemA region shows the highest Pi value at 0.31, followed closely by ndhA-ycf1 at 0.27.
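As an illustration of how per-window nucleotide diversity is obtained, the following is a minimal sketch of a sliding-window Pi calculation over aligned sequences. It is not the DnaSP implementation: it assumes equal-length aligned sequences, drops any alignment column containing a gap or ambiguous base, and uses hypothetical function names. The defaults follow the window (200 bp) and step (100 bp) stated in this paper.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site (Pi) for aligned, equal-length sequences.

    Simplifying assumption: columns with a gap or non-ACGT symbol in any
    sequence are excluded entirely.
    """
    seqs = [s.upper() for s in seqs]
    columns = [col for col in zip(*seqs) if set(col) <= VALID_BASES]
    if len(seqs) < 2 or not columns:
        return 0.0
    pairs = list(combinations(range(len(seqs)), 2))
    differences = sum(col[i] != col[j] for col in columns for i, j in pairs)
    return differences / (len(pairs) * len(columns))


def sliding_pi(seqs, window=200, step=100):
    """Return (window midpoint, Pi) tuples along the alignment."""
    length = len(seqs[0])
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        chunk = [s[start:start + window] for s in seqs]
        results.append((start + window // 2, nucleotide_diversity(chunk)))
    return results


# Toy alignment: two 400 bp sequences differing at 20 consecutive sites.
toy = ["A" * 400, "A" * 200 + "C" * 20 + "A" * 180]
for midpoint, pi_value in sliding_pi(toy):
    print(midpoint, round(pi_value, 3))
```

Windows whose Pi exceeds a chosen cutoff (for example the 0.1 threshold mentioned above) would then be reported as candidate hotspots.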

Plastome structure variations, inversions, and divergence hotspots

Although the plastome of the Ranunculaceae family is typically highly conserved, our study revealed variations in certain species compared to the N. sativa plastome. A significant 36 kb inversion in the LSC region (ycf3 to atpA genes) was identified in the plastomes of A. raddeana and A. glaucifolium. Additionally, a 19 kb inversion between the ycf1 and ndhF genes in the SSC region was observed in the latter species (Fig. 4). Similarly, in the plastome of A. coerulea, a 22 kb inversion from atpB to clpP in the LSC region was observed (Fig. 4). A. raddeana and A. glaucifolium displayed minor inversions and shifts in the psbA and trnH-GUG to psbK region (LSC region). Rearrangements in A. raddeana and A. glaucifolium included the relocation of trnR-UCU, trnG-UCC, and trnS-GCU near ndhJ, as well as the movement of rps4 and rps16 to the start of the genome. Notably, trnK-UUU and matK shifted between the rps16 and psbA genes. The rps16 gene was absent in N. sativa and A. angustius. Additionally, the ycf15 gene was exclusively present in A. glaucifolium (Fig. 4), highlighting distinct genomic variations and structural rearrangements in these chloroplast genomes.

Figure 4

Synteny plot of N. sativa and ten other plastomes from Ranunculaceae family. The synteny plot shows normal links with chocolate color, inverted link with lime-green color, and gene feature with sky-blue color.

IR expansion and contraction

To explore the potential expansion and contraction of IRs, the distributions of IR and SC border regions in the plastomes of 11 taxa within the family Ranunculaceae were compared. The rps19 gene, present in all species except A. raddeana, A. glaucifolium, P. anemonoides, and A. coerulea, exhibited an unusual behavior by crossing the boundary between the LSC and IRb regions. Notably, the rpl22 gene consistently resided in the LSC region across species, except for A. raddeana, A. glaucifolium, and A. coerulea, where it was absent (Fig. 5). Additionally, the typical placement of the rpl2 gene in the IRb region shifted to the LSC region in A. coerulea. The ycf1 gene in A. glaucifolium fully overlaps the JSB boundary, while across all species, it spans the JSA boundary, predominantly in the IRa region. In N. damascena and N. sativa, ycf1 is in the SSC region. The ndhF gene is closer to the JSB boundary in all species except A. glaucifolium, where it extends beyond the JSA boundary. The psbA gene is absent in A. glaucifolium, N. damascena, and N. sativa. The trnH gene is absent in A. raddeana and A. glaucifolium. A. coerulea has the rpl23 gene in the IRb region, which is absent in other species. A. raddeana and A. glaucifolium exhibit unique gene arrangements, with rps11 in the LSC region, infA in the IRb region in A. raddeana, and rps4 exclusively in the LSC region of A. glaucifolium (Fig. 5). This analysis highlights distinct plastome patterns among species. Structural variations in the IR and SSC regions can lead to gene rearrangements40,41. In this study, the IR regions were extended in P. anemonoides (30,979 bp), A. glaucifolium (31,256 bp), and A. raddeana (31,279 bp). This extension may contribute to the comparatively larger plastome sizes observed in these species, in contrast to the shorter IR regions of N. sativa (25,299 bp) and N. damascena (25,162 bp). Contraction and extension were identified in the IR and SSC regions across all studied species. Additionally, in species such as A. coerulea, which has an extended plastome, there is an observable extension in the LSC region (Fig. 5).

Figure 5

Comparison of junctions between the large single-copy (LSC), small single-copy (SSC) and inverted repeat (IR) regions among plastome of N. sativa and other ten plastomes. Boxes above or below the main line indicate the adjacent border genes. The numbers above the gene features indicate the distance between the ends of genes and border sites.

Repeat and SSR analysis

The number of repeats identified in all selected species ranges from 46 to 50, encompassing 16 to 28 palindromic repeats, 17 to 26 forward repeats, and 0 to 15 reverse repeats (Fig. 6). In N. sativa, there are 48 repeats in total, including 23 palindromic repeats and 25 forward repeats, with no reverse repeats observed. Across the selected species, all repeat types are predominantly about 18–30 bp in length (Fig. 6). Tandem repeats vary from 14 to 49 in all species, most falling within the 11–20 bp range. Specifically, N. sativa exhibits 24 tandem repeats (Fig. 6C). The SSR analysis of the 11 plastomes revealed diversity in microsatellite counts. Notably, N. sativa displayed 32 SSRs, predominantly mononucleotide repeats, together with some di- and trinucleotide repeats. P. anemonoides exhibits the highest number of SSRs among all species, totaling 65 (Fig. 7A). The predominant type of SSR across all plastomes was mononucleotide repeats, followed by dinucleotide and trinucleotide repeats. However, tetranucleotide, pentanucleotide, and hexanucleotide repeats were absent in all plastomes. A and T repeats constitute a larger proportion of mononucleotide repeats than G and C repeats. Similarly, among dinucleotide repeats, AT motifs represent a larger proportion than GC motifs (Fig. 7B).
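To make the SSR classification concrete, here is a minimal regex-based sketch of microsatellite detection by motif length. The tool and thresholds used for the SSR analysis are not stated in this section, so the minimum repeat counts below (10, 5, 4, 3, 3, 3 for mono- to hexanucleotide motifs) are an assumption chosen for illustration, and `find_ssrs` is a hypothetical helper, not the authors' pipeline.

```python
import re

# Assumed minimum number of repeat units per motif length (illustrative only).
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # One motif of unit_len bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # since those runs are already reported under the shorter motif.
            if any(
                unit_len % k == 0 and motif == motif[:k] * (unit_len // k)
                for k in range(1, unit_len)
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


example = "GC" + "A" * 12 + "GGC" + "AT" * 6 + "CCG"
for start, motif, count in find_ssrs(example):
    print(start, motif, count)
```

Counting the hits by motif length and by motif base composition gives the kind of summaries shown per species in Fig. 7.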

Figure 6

Analysis of repeated sequences in N. sativa and 10 other Ranunculaceae plastomes: (A) total numbers of three repeat types, (B) number of palindromic repeats by length, (C) number of tandem repeats by length, (D) number of forward repeats by length, and (E) number of reverse repeats by length.

Figure 7

Number of different types of SSRs in the plastome of N. sativa and other plastomes (A) and number of SSR motifs (B).

Phylogenetic analysis

This study inferred phylogenetic relationships within Ranunculaceae from 73 shared protein-coding genes. The Glaucidioideae, Hydrastidoideae, and Coptidoideae emerged as the earliest diverging lineages within the Ranunculaceae family. The plastid phylogenomic analysis revealed a well-supported sister relationship between the subfamily Thalictroideae and the tribe Adonideae, with a strong bootstrap value of 95. The tribes Asteropyreae and Caltheae formed a single clade, but the support for this grouping is relatively low, with a bootstrap value of 44. Within Ranunculoideae, our analysis resolved the sister relationship between the tribes Anemoneae and Ranunculeae, with a robust bootstrap support value of 100 (Fig. 8). Based on the protein-coding gene data set, Nigelleae is positioned between Callianthemum and Cimicifugeae. This tribe showed its closest relationship with Cimicifugeae, a connection supported by a robust bootstrap value of 100. The phylogenetic trees strongly indicate that N. sativa is most closely related to N. damascena, which also belongs to the genus Nigella and falls in the same clade.

Figure 8

Phylogenetic trees were constructed for 75 members of the family Ranunculaceae, representing 11 different genera, using different methods. The tree shown is based on the data set of 73 commonly shared genes and was constructed by the Maximum Likelihood (ML) and Bayesian inference (BI) methods. The number above each node represents the bootstrap value. The red diamond marks the position of N. sativa.

Discussion

In recent years, the plastome has frequently been employed as a DNA super barcode for the identification, classification, and phylogenetic research of medicinal plants42,59. The identification and classification of Ranunculaceae species are crucial for understanding their evolutionary relationships and ecological roles59,60. Previous research revealed that combining markers such as ndhC-trnV-UAC, psbE-petL, rps8-rpl14, petN-psbM, atpF-atpI, trnT-GGU-psbD, rpl32-trnL-UAG, rpl16-rps3, rps16-trnQ-UUG, ndhG-ndhI, accD-psaI, trnG-GCC-trnfM-CAU, trnT-UGU-trnL-UAA, psbZ-trnG-GCC, and trnK-UUU-rps16 resulted in a 100% species identification rate, which is significantly higher than the rates achieved by individual markers59,60,61,62. That study also revealed that combined markers can identify seven-fold more variant sites than conventional single-specific barcode markers (Kim et al.). This observation aligns with previous findings in the Ranunculaceae family, where over 20 divergent hotspot regions were identified59. Similarly, nine divergent hotspot regions were previously identified in seven species of Pulsatilla (Ranunculaceae), including six intergenic spacer regions (rps4-rps16, rps16-matK, ndhC-trnV, psbE-petL, ndhD-ccsA and ccsA-ndhF) and three protein-coding regions (ycf1, ndhF and ndhI)60. These findings underscore the value of using multiple markers to account for the varying rates of nucleotide variation across different loci. Combined markers can be particularly advantageous for identifying closely related species, where individual markers may not be sufficient to distinguish between them. Earlier research found that the most effective multi-locus barcode for identifying Pulsatilla species from the Ranunculaceae family consisted of cpDNA barcodes such as rbcL, matK and trnH-psbA60. Furthermore, the ycf1 gene was also found to be the most efficient barcode for Aconitum species identification61.

Additionally, our findings indicate that Angiosperms tend to accumulate variations at the genus level in the LSC and SSC regions of the plastome. This pattern is consistent with the distribution of variations reported in the plastomes of other genera, such as Cymbidium, Oenothera, and Pyrus63. Moreover, the observed distribution of divergence regions, predominantly in the LSC and SSC regions, aligns with previous reports on Chaenomeles and Lancea species64,65. Previously, five types of plastome were identified based on distinctions in the LSC region. N. damascena (Type I) represents an ancestral condition. A. raddeana and A. glaucifolium exhibit the second type (Type II) with a unique gene arrangement pattern involving inversions. Likewise, A. coerulea (Type V) features an inversion between accD and clpP1, distinguishing it from Type I chloroplast genomes. In the Ranunculaceae, the Type I plastome is considered the most primitive. According to39, all other types have originated from Type I through the inversion of different genes.

Codon usage bias (CUB) refers to the differential frequency with which synonymous codons encoding the same amino acid are observed in the coding sequences of a given organism’s genome48. CUB preferences are specific to different genes in different species and can even vary within a particular species. This variability is shaped by a combination of factors, including mutation, selection, and genetic drift, which act during the long-term evolution of genes and species66. In our study, we examined the codon usage frequency of protein-coding genes in the N. sativa plastome; among all amino acids, phenylalanine had the highest number of codons (1982). Additionally, 35 of the codons analyzed exhibited a relative synonymous codon usage (RSCU) greater than 1, and the most favored codon was AGA, encoding arginine, with an RSCU value of 1.78.
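For readers unfamiliar with the RSCU statistic, the sketch below shows the standard definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The codon table is the standard nuclear code written compactly, which is assumed here to be adequate for illustration; the `rscu` function is hypothetical, and codons are reported in the DNA alphabet (so CAU appears as CAT).

```python
from collections import Counter, defaultdict
from itertools import product

BASES = "TCAG"
# Amino acids for the 64 codons in TCAG order ('*' marks stop codons).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(cds_list):
    """Relative synonymous codon usage for a collection of coding sequences.

    RSCU(codon) = count(codon) * (number of synonymous codons) / (family total).
    A value above 1 means the codon is used more often than expected under
    equal usage of synonymous codons.
    """
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper().replace("U", "T")
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:  # ignore codons with ambiguous bases
                counts[codon] += 1

    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        families[aa].append(codon)

    values = {}
    for aa, codons in families.items():
        family_total = sum(counts[c] for c in codons)
        if family_total == 0:
            continue  # amino acid not observed; RSCU undefined
        for c in codons:
            values[c] = counts[c] * len(codons) / family_total
    return values


# Toy example: arginine encoded three times by AGA and once by CGU.
toy_values = rscu(["AGAAGAAGACGU"])
print(toy_values["AGA"], toy_values["CGT"])
```

In the toy example the six-codon arginine family is observed four times, so AGA scores 3 × 6 / 4 = 4.5 and CGT scores 1.5.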

The plastome of higher plants is known for its high degree of conservation. However, variations in genome length between species do arise due to the dynamic processes of extension and contraction occurring in the IR, LSC, and SSC regions67,68,69,70,71. Throughout plastome evolution, the IR region undergoes dynamic changes involving expansion and contraction, with genes entering either the IR region or the LSC and SSC regions72. We thoroughly compared 11 species, examining the two IRs and the two single-copy regions. In N. sativa, a notable contraction was observed in the IRs, while only a slight expansion was noted in the SSC region due to the shifting of the rpl2 and ycf1 genes, leading to a shortened plastome length (Fig. 5). In contrast, the IR region is extended in P. anemonoides. The larger genome size of this species might be due to the rps19 gene entering the junction of the LSC and IR borders, so that 107 bp of the gene lies in the IR region and is duplicated. Similarly, A. raddeana and A. glaucifolium exhibit expanded IR regions, with the genes infA, rps8, rpl2, ycf1, and rpl36 extending to the JLB junction. Additionally, the rps11 and rps4 genes are situated in the LSC region, contributing to increased genome size. The expanded genome size in A. coerulea results from LSC region enlargement, while the SSC and IR regions simultaneously contract. This aligns with previous research indicating significant structural changes in land plant plastomes, including loss of the IR region or of specific gene families73. The events of expansion and contraction in IRs are crucial in evolution as they can lead to alterations in gene content and plastome size47,74. The expansion of IRs has been documented in Araceae74,75. In certain cases, the LSC region expands while the SSC region decreases, reaching a size of only 7000 bp in Pothos76. The expansion and contraction of IR regions can result in the duplication of certain genes or their conversion from duplicate to single copy, respectively47,74. Modifications in IR size can also prompt rearrangements of genes in the SSC region, as recently observed in Zantedeschia74.

Long repeats are crucial contributors to the complete plastome’s variation, expansion, and rearrangement77. N. sativa was found to have approximately 48 long repeats. In comparison, the long repeats in these plastomes ranged from 46 (A. coerulea) to 50 (A. raddeana, A. macrophylla, A. angustius). The SSRs and long repeats in the 11 plastomes showed considerable variation. SSRs were mainly present in the non-coding region, and their sequence variation was higher compared to the coding region78. Additionally, SSRs can be employed for studying conservation genetics in endangered plant species, molecular identification, and exploring genetic relationships among related species79,80. The analysis of SSRs in the plastome of N. sativa revealed variations in the number of SSRs among 11 species, ranging from 24 (A. raddeana) to 65 (P. anemonoides). Mononucleotide repeats are the most common, followed by dinucleotide repeats, and the prevalent motifs across all species are A and T. Our results align with previous reports indicating that mononucleotide and dinucleotide repeats were the most and second most abundant SSRs in the plastomes of two Caldesia species81. Additionally, our findings are in line with earlier research suggesting that SSRs in plastome predominantly consist of polythymine (polyT) or polyadenine (polyA) repeats and less frequently contain tandem cytosine (C) and guanine (G) repeats82. This consistency supports the previous observation that plastome SSRs are primarily dominated by ‘A’ or ‘T’ mononucleotide repeats83,84.

The current classification of Ranunculaceae, as proposed by85, relies on a comprehensive analysis that combines both morphological and molecular phylogenetic data. This classification results from examining 6957 molecular characters and 65 morphological characters. In this proposed classification, Ranunculaceae is categorized into five monophyletic subfamilies: Glaucidioideae, Hydrastidoideae, Coptidoideae, Thalictroideae, and Ranunculoideae. The Ranunculoideae subfamily is further subdivided into ten strongly supported monophyletic tribes. The findings of our study align with previous research, supporting Glaucidium as the first diverging taxon and sister to all other Ranunculaceae species85,86,87. Our results are consistent with the findings of85, indicating that Hydrastis is the second diverging taxon with robust support, and Coptidoideae represents the third diverging clade. In earlier studies, the position of Nigelleae within the Ranunculaceae family has been inconsistent. A previous analysis of plastomes from 38 Ranunculaceae species found that Nigelleae is closely related to Delphinieae. This relationship was strongly supported by a bootstrap value of 100, providing robust evidence for the clustering of Nigelleae and Delphinieae in the same clade88. Furthermore, based on 77 protein-coding genes and four rRNA genes, the analysis revealed that Caltheae is the sister group to Asteropyreae. In turn, Asteropyreae is identified as the sister group to the combined clade of Caltheae, Delphinieae, and Nigelleae39. Nevertheless, our findings align with the research conducted by89,90, which identified Nigelleae as the sister group to Cimicifugeae. Similar results about Nigelleae were reported previously91. Furthermore, in line with our study, they also identified the sister relationship between Thalictroideae and Adonideae. Moreover, in our research, the most strongly supported grouping (with a bootstrap value of 100) among the tribes of Ranunculoideae is the sister group relationship between Anemoneae and Ranunculeae. This finding is consistent with results from previous studies, providing additional confirmation of the observed relationship between these two tribes85,92,93.

Materials and methods

Fresh leaves were collected from N. sativa cultivated at the Agriculture Research Center, KPK, Pakistan, and transported in liquid nitrogen to the − 80 °C facility. The specimens were submitted to the herbarium of the Agriculture Research Center, KPK, Pakistan, under the voucher number AGN-NG1 (N. sativa). Dr. Muhammad Waqas, one of the leading agronomists at the Agriculture Research Center, KPK, Pakistan, identified the plants. The plant samples were collected and processed per the national guidelines and legislation. Accordingly, a permit (NJ334/15/78) was obtained from the Environmental Protection Agency, Khyber Pakhtunkhwa, Pakistan.

DNA extraction and sequencing

To extract high-quality DNA from young and immature leaves of N. sativa, the leaves were first ground into a fine powder in liquid nitrogen, which ensures that the DNA is released from the cells effectively. DNA was then isolated with the DNeasy Plant Mini Kit from Qiagen (Valencia, CA, USA), following the kit's protocol. Once the DNA was successfully isolated, we sequenced the chloroplast DNA on an Illumina HiSeq-2000 platform at Macrogen (Seoul, Korea). This generated around 578,630,881 raw reads for N. sativa. To ensure the reliability and accuracy of our analysis, low-quality sequences were filtered out: reads with a Phred score of less than 30 were removed, so that only high-quality sequences were retained for further analysis. To assemble the plastome, we employed two different methods. First, we used the GetOrganelle v 1.7.5 pipeline95, a tool specifically designed for organelle genome assembly. Additionally, we employed SPAdes version 3.10.1 (http://bioinf.spbau.ru/spades) as an assembler to enhance the accuracy and reliability of the assembly process.
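The quality-filtering step can be illustrated with a small FASTQ filter. The section does not say whether the Phred cutoff of 30 was applied per base or to the read average, nor which tool was used, so this sketch makes two explicit assumptions: the cutoff applies to the mean read quality, and qualities use the Phred+33 encoding. The function names are hypothetical.

```python
import io


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (assumes Phred+33 encoding)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_fastq(handle, min_mean_q: float = 30):
    """Yield (header, sequence, quality) for reads whose mean Phred >= min_mean_q.

    Assumes plain four-line FASTQ records with no blank lines between them.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        if quality and mean_phred(quality) >= min_mean_q:
            yield header, sequence, quality


# Toy input: read r1 has Q40 bases ('I'), read r2 has Q2 bases ('#').
toy_fastq = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\n####\n")
kept = list(filter_fastq(toy_fastq))
print([header for header, _, _ in kept])
```

In practice a dedicated read-trimming tool would be used on data of this size; the sketch only shows what the threshold means.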

Genome annotation

The plastome was annotated with established tools. Initial annotation was carried out with the online tools CpGAVAS296 and GeSeq (https://chlorobox.mpimp-golm.mpg.de/geseq.html), and tRNA genes were identified with tRNAscan-SE (v.1.21)97. To verify the annotations, the plastome was compared with reference genomes in Geneious Pro v.10.2.398, which allowed start and stop codons and intron boundaries to be checked and corrected manually where necessary. Structural features of the plastome were visualized with Chloroplot99. Genomic divergence was assessed using mVISTA in Shuffle-LAGAN mode, with the plastome of N. sativa as the reference55. The average pairwise sequence divergence between the N. sativa plastome and those of ten related species (N. damascena, A. asiatica, A. angustius, A. raddeana, A. coerulea, A. glaucifolium, P. anemonoides, L. fumarioides, D. fargesii and A. macrophylla) was determined. Gene order was compared and multiple sequence alignments were examined to identify missing or unclear gene annotations. Whole-genome alignment was performed with MAFFT version 7.222 using default parameters100, and pairwise sequence divergence was calculated using Kimura’s two-parameter (K2P) model. Nucleotide diversity (Pi) was calculated in DnaSP version 6.13.03101 by sliding window analysis with a window size of 200 bp and a step size of 100 bp.
To visualize shared genes and gene divergence among the plastomes of the different species, we used the heatmap.2 function (gplots package) in R. A synteny plot was created with pyGenomeViz version 0.2.1 (https://github.com/moshi4/pyGenomeViz) in pgv-mmseqs mode with an identity threshold of 50%.
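
The sliding-window calculation of nucleotide diversity (200 bp window, 100 bp step) can be sketched in Python as follows. This is a simplified illustration, not a reimplementation of DnaSP: it assumes an alignment given as equal-length strings, skips any column containing a gap or ambiguous base, and averages the per-site proportion of differing sequence pairs over the usable sites of each window. The function names are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple

VALID_BASES = set("ACGT")


def site_diversity(column: str) -> float:
    """Proportion of sequence pairs that differ at one alignment column."""
    pairs = list(combinations(column, 2))
    return sum(a != b for a, b in pairs) / len(pairs)


def sliding_pi(alignment: List[str], window: int = 200,
               step: int = 100) -> List[Tuple[int, float]]:
    """Return (1-based window start, Pi) for each window along the alignment."""
    if len(alignment) < 2:
        raise ValueError("at least two aligned sequences are required")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    seqs = [seq.upper() for seq in alignment]
    results = []
    last_start = max(length - window, 0)
    for start in range(0, last_start + 1, step):
        stop = min(start + window, length)
        columns = ("".join(seq[i] for seq in seqs) for i in range(start, stop))
        # Simplifying assumption: columns with gaps or ambiguity codes are skipped.
        usable = [col for col in columns if set(col) <= VALID_BASES]
        pi = sum(site_diversity(col) for col in usable) / len(usable) if usable else 0.0
        results.append((start + 1, pi))
    return results
```

Windows with high Pi values point to the most variable regions of the aligned plastomes.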

Characterization of repetitive sequences and SSRs

Repetitive sequences were characterized in the plastomes of N. sativa and 10 other species belonging to the Ranunculaceae family. Palindromic, forward, and reverse repeats were identified with the online tool REPuter102, using a minimum repeat size of 8 bp and a maximum of 50 computed repeats. Simple sequence repeats (SSRs) were detected with the MISA software103 using the following thresholds: ≥ 8 repeat units for mononucleotide repeats, ≥ 6 repeat units for dinucleotide repeats, ≥ 4 repeat units for tri- and tetranucleotide repeats, and ≥ 3 repeat units for penta- and hexanucleotide repeats. Tandem repeats were identified with the online tool Tandem Repeats Finder v.4.09104.
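
To make the SSR thresholds concrete, the following Python sketch applies the same minimum repeat counts with regular expressions. It is an illustration of the criteria only and is not MISA itself: compound SSRs are not handled, motifs that are themselves repeats of a shorter unit (for example ATAT) are simply skipped, and the function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Minimum number of repeat units per motif length, as listed in the text.
MIN_REPEATS: Dict[int, int] = {1: 8, 2: 6, 3: 4, 4: 4, 5: 3, 6: 3}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit."""
    n = len(motif)
    return not any(n % k == 0 and motif == motif[:k] * (n // k)
                   for k in range(1, n))


def find_ssrs(sequence: str,
              min_repeats: Dict[int, int] = MIN_REPEATS
              ) -> List[Tuple[int, str, int]]:
    """Return sorted (1-based start, motif, repeat count) for each perfect SSR."""
    seq = sequence.upper()
    hits = []
    for unit, min_rep in min_repeats.items():
        # A motif of `unit` bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start() + 1, motif, len(match.group(0)) // unit))
    return sorted(hits)
```

For example, a run of ten A bases is reported as a mononucleotide SSR, whereas three copies of AAG fall below the trinucleotide threshold and are ignored.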

Genome divergence

We assessed the variation in shared protein-coding genes and complete plastomes among N. sativa and its related species. Gene order was compared and multiple sequence alignments were examined to resolve missing and ambiguous gene annotations. Plastomes were aligned using MAFFT version 7.222100 with default parameters, and pairwise sequence divergence was calculated using Kimura’s two-parameter (K2P) model100.
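
For reference, the K2P distance between two aligned sequences is d = −(1/2) ln(1 − 2P − Q) − (1/4) ln(1 − 2Q), where P and Q are the proportions of sites showing transitions and transversions, respectively. A minimal Python sketch follows; it assumes pairwise deletion of sites with gaps or ambiguous bases, which may differ from the settings of the software used in this study, and the function name is hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        # Assumption: sites with a gap or ambiguity code in either sequence are skipped.
        if a not in BASES or b not in BASES:
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    # math.log raises ValueError when the sequences are too divergent (saturated).
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)
```

Unlike the raw proportion of differing sites, this distance corrects for multiple substitutions at the same site and weights transitions and transversions separately.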

Phylogenetic analyses

To determine the phylogenetic position of N. sativa within the family Ranunculaceae, 76 published plastome sequences of Ranunculaceae species were downloaded from the NCBI database. The analysis used a dataset of 73 genes shared among 75 members of the family Ranunculaceae, representing 11 different genera. The nucleotide sequences of these 73 protein-coding genes were aligned with MAFFT using the default settings105 and then concatenated. The best-fitting model of nucleotide evolution, TVM + F + I + G4, was determined by jModelTest 2106. Two approaches were used to infer the phylogenetic relationships of N. sativa. First, a Bayesian inference (BI) tree was constructed in MrBayes 3.1.2 using Markov chain Monte Carlo sampling. Second, a maximum likelihood (ML) tree was generated in PAUP* 4.0107, with node support estimated from 1000 bootstrap replicates. The BI analysis used four chains (three heated and one cold) run for 10,000,000 generations, with a sampling frequency of 1000 and a print frequency of 10,000. The first 2500 samples were discarded as burn-in (25% of the 10,000 samples obtained by dividing the total number of generations by the sampling frequency). Finally, a 50% majority-rule consensus tree was derived from the resulting trees, and FigTree108 was used to visualize the relationships among the Ranunculaceae species based on their plastome sequences.
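
The burn-in value follows directly from the run settings, as the short Python check below shows. The function name is hypothetical, and the calculation ignores the initial sample that some programs record at generation 0.

```python
def burn_in_samples(generations: int, sample_freq: int,
                    fraction: float = 0.25) -> int:
    """Number of sampled trees discarded as burn-in."""
    n_samples = generations // sample_freq  # trees sampled during the run
    return int(n_samples * fraction)


# Settings reported in the text: 10,000,000 generations, sampled every 1000.
print(burn_in_samples(10_000_000, 1000))  # 2500
```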

Ethics approval and consent to participate

The authors declare that the experimental research on the plant described in this paper complies with institutional, national, and international guidelines. Field studies were conducted in accordance with local legislation, with permission from the provincial Department of Forest and Grass of Khyber Pakhtunkhwa Province, Pakistan.