Introduction

Patrinia species have traditionally been used by Chinese medicine practitioners for various disorders, especially colon cancer. In recent years, a number of phytochemical and pharmacological studies related to their traditional usage have been reported1. Specific anti-cancer studies have also been documented by various research groups2,3. In addition, the herbs are commonly used as food or supplements, and hence they are widely cultivated in various provinces of China for different purposes4. According to Flora Reipublicae Popularis Sinicae (FRPS), Patrinia has ten species, three subspecies and two varieties. In Flora of China (FOC), Patrinia is classified into eleven species and three subspecies. Most of these species are believed to have medicinal value. However, only five Patrinia species, P. scabiosifolia Link, P. scabra Bunge, P. heterophylla Bunge, P. villosa (Thunb.) Juss. and P. rupestris (Pall.) Juss., are documented in Zhonghua Bencao5 and Zhongyao Da Cidian6, where their herbal materials are named “Baijiang”, “Yanbaijiang” or “Mutouhui”. There is, however, no official record of their source plants or quality control requirements in the Chinese Pharmacopoeia (2020)7. Moreover, the Patrinia materials available on the market have never been well identified. Hence, there is a need to verify the source species of commercially available Patrinia materials to support quality control and clinical application.

In preclinical research on Patrinia, three species have been used in most studies, mainly because of their regional usage and retail availability. Firstly, P. villosa appears to be the most frequently used and studied species. It has been used as a traditional medicinal herb with various clinical applications, including anti-cancer, anti-diarrheal and sedative uses. A specific application was recorded for its treatment of colorectal cancer by activating the PI3K/Akt signaling pathway8. Secondly, P. scabiosifolia has abundant records of use relating to cancer. Various compounds and crude extracts of the herb have been tested against human carcinoma cell lines, suggesting that it is a promising anti-cancer herb9,10. Lignans and monoterpenes from the species also showed potential cytotoxic activity against human colon HCT-116 cells11. The herb was also found to inhibit the growth of 5-fluorouracil-resistant colorectal carcinoma cells12. Thirdly, P. heterophylla has also been the subject of a number of preclinical studies. Its active components, including phenylpropanoids, flavonoids, iridoids and coumarins, also showed cytotoxic activity against different tumor cells13,14,15.

It is important to note that the identity of the medicinal Patrinia species has always been confusing, and only a few reports address the authentication of their source materials. One of the major concerns is the inconsistency between botanical descriptions and phenotypic structures. Variations in the authenticating characters, such as the involucral bract and leaf segments, are commonly observed. More varying character states were found from the cultivated populations2), that two P. scabiosifolia (M. C. Li 082 and 083) samples showed comparatively weaker intensity of spots S1 and S5, while one P. heterophylla (M. C. Li 476) showed greater intensity of spot S5, which was even absent in other samples of P. heterophylla. Moreover, if two herbs are closely related, their highly similar chemical profiles may not be distinguishable.

To develop a platform for accurate authentication, DNA analysis was further employed. The time cost of DNA analysis was greatly increased by a series of sequential procedures, i.e. DNA extraction, PCR amplification, purification of PCR products, Sanger sequencing and nucleotide analyses, including SNP and InDel analysis and phylogenetic tree reconstruction. Completing all procedures could take about two to three weeks. Yet species identity could be strongly reinforced by the species-specific SNPs and InDels, as well as by the monophyly of single-species clades in the phylogenetic trees. More importantly, this molecular evidence is less influenced by environmental factors than chemical markers are. The integration of morphological, chemical and molecular evidence contributed to an accurate authentication platform for Patrinia herbal medicines.

In this study, the application of complete chloroplast genomes in authenticating plant species was demonstrated. The method and results are valuable for DNA barcoding authentication, particularly for plant taxa that can hardly be differentiated to species level using universal barcode regions. In the case of Patrinia, the universal barcode regions were not useful in differentiating the four targeted species. Although eighteen variable nucleotides, including eight species-specific nucleotides, were identified in the universal barcode region ITS2, this region showed no resolution in species discrimination. This contrasts with the work of Kim et al.22, in which 22 species-specific nucleotides were found in this nuclear region. The difference is probably due to the different primer pairs used and the different species studied. The primer pair ITS-S2F and ITS-S3R was used in this study, while ITS-S2F and ITS4 were adopted in the study of Kim et al.22. In addition, P. heterophylla, P. monandra, P. scabra and P. villosa subsp. punctifolia were included in this study, whereas P. saniculifolia and P. rupestris, which were examined by Kim et al.22, were not.

When the targeted regions identified from the full alignment of the complete chloroplast genomes were used, the amplicon sequences could help in species differentiation, although the discrimination success rate varied among the targeted regions. It is suggested to use petA as the key barcode region, as this single region carries enough genetic information to resolve all six studied taxa into monophyletic clades. When two loci were combined for phylogenetic reconstruction, both atpB + petA and petA + rpl2-rpl23 increased the discrimination success rate to 100%, further revealing the importance of petA in species authentication. When three loci were used, the combinations atpB + petA + rpl2-rpl23, petA + psaI-ycf4 + rpl2-rpl23 and atpB + petA + psaI-ycf4 provided the best discrimination success rate (100%). Therefore, if resources are limited, it is suggested to simply use the single locus petA for differentiating Patrinia species. However, two to three loci should be considered in order to provide better resolution. In particular, the 24-bp species-specific insertion (3′ GAAGGGGTATGTTATTATTTTATT 5′) of P. villosa subsp. villosa is highly informative. Although atpB has the same discrimination success rate (33.3%) as psaI-ycf4, it is not preferred because only four informative variable nucleotides were found in it. The region rpl2-rpl23 has twenty informative variable nucleotides, nine of which are species-specific, contributing to its moderate discrimination success rate of 50%. Therefore, psaI-ycf4 and rpl2-rpl23 should be considered as auxiliary markers for better resolution and greater bootstrap values.

The topological differences among the NJ, UPGMA and ML trees of the four-loci combination were probably caused by the lack of species-specific nucleotides distinguishing P. monandra from the other taxa. Notably, P. monandra did not form a monophyletic clade in the ML tree. Future studies capturing species-specific chloroplast SNPs and InDels of P. monandra and other Patrinia species would help to increase the resolution of phylogenetic analyses.

Compared with the RAPD genomic profiling and SCAR markers in the study of Moon et al.23, the use of complete chloroplast genomes in authenticating medicinal Patrinia is relatively reliable and stable. Firstly, heavy work in screening suitable markers and primers was required in Moon’s study. Forty-seven out of eighty-six Operon primers produced distinct RAPD profiles, with twenty-eight primers showing polymorphic fragments. Based on forty-six species-specific amplicons, forty-three SCAR primer pairs were designed and eight of them were selected to capture the species-specific amplicons. In contrast, primer design and selection using chloroplast genomes is more convenient and less labor intensive. As demonstrated in this study, three divergence hotspots were identified from the chloroplast genome alignment, and four of the six primer pairs were selected through amplification trials. Secondly, the multiplex-SCAR assay was restricted by the range of amplicon sizes, as the primer sets had been chosen so that species could be distinguished by species-specific amplicon size. In our study, the amplicons were simply purified from agarose gel for sequencing, phylogenetic tree reconstruction and identification of informative nucleotides. Thirdly, the stability and specificity of RAPD profiling are affected by PCR conditions. In contrast, all PCR amplifications of the targeted chloroplast regions were conducted under the same conditions. Even at a relatively low, and therefore less specific, annealing temperature of 40 °C, they yielded strong bands at the desired amplicon size for most of the samples, as shown in our electrophoresis gels (Supplementary Figs. S29–S36). In addition, chloroplast genomes allow the design of specific primers that capture DNA fragments originating from the chloroplast, thereby avoiding the fungal contamination that can occur with ITS.

In summary, this study reflects the power of integrating plant taxonomy, chemical fingerprinting and DNA analysis. It is hard to start any authentication without knowing the names of the plants and their DNA sequences. Traditional morphological identification of plant species is the most efficient and direct way of authentication, but phytochemicals become important markers when morphology or genomic DNA is not available. When sample quality is good enough for DNA extraction, complete chloroplast genomes can overcome the limitations of universal barcode regions, as demonstrated here. This study is also the first to report long species-specific InDels in chloroplast regions for the differentiation of Patrinia species. In conclusion, the three authentication approaches complement each other and can cope with samples in various forms and states, supporting better quality control of Chinese medicines.

Methods

Morphological authentication

Fresh samples of Patrinia species available in the markets were purchased from various provinces in China (Table 4). All fresh parts with flowers or fruits were used to prepare herbarium specimens. The identity of each sample was morphologically confirmed by studying the characters of the bracteole, peduncle indumentum, involucral bract, leaf texture, basal or cauline leaf arrangement, stamen structure and corolla color.

Table 4 Herbal materials used for morphological, chemical and molecular authentication.

After standardized specimen processing, all authenticated samples were deposited in the Shiu-Ying Hu Herbarium (herbarium code: CUHK) as voucher specimens with collector numbers. Materials for chemical and molecular analysis were reserved from each specimen. The samples were classified into six taxa. Among all samples, four specimens with well-preserved and clear structures were adopted as our reference specimens (authenticated specimens in Table 4). Their collector numbers are given below:

Patrinia heterophylla Bunge (M. C. Li 089)

Patrinia monandra C. B. Clarke (M. C. Li 103)

Patrinia scabiosifolia Link (M. C. Li 083)

Patrinia villosa (Thunb.) Juss. subsp. villosa (M. C. Li 403)

Chemical authentication

For each herbal sample, the test solution was prepared by extracting 2 g of dried and pulverized herb with 20 mL of methanol under ultrasonication at room temperature (approximately 21 °C) for 60 min, followed by filtration. The filtrate was then evaporated to dryness under reduced pressure at 50 °C. The extract was dissolved in 5 mL of methanol and used for TLC analysis on silica gel 60 F254 TLC plates (20 cm × 10 cm, Merck, Germany). Extracts (2 μL) were applied to the plates as 8 mm bands using the CAMAG Automatic TLC Sampler 4 (ATS4, Muttenz, Switzerland). Development to a distance of 8.5 cm up the plate was performed in a TLC developing chamber. A mixture of ethyl acetate: methanol: water (8:1:1, v/v, upper layer) was used as the developing solvent system. After spraying with a 10 % solution of sulfuric acid in ethanol, the plate was heated on a TLC plate heater (CAMAG, Muttenz, Switzerland) at about 105 °C until the spots were distinctly colored. High-definition images of the TLC plate were captured using a Visualizer 3 (CAMAG, Muttenz, Switzerland) linked with WinCATS software28 under UV light (λ = 366 nm).

Molecular authentication

Sliding window analysis and primer design

Nine accessions of complete Patrinia chloroplast genomes (Supplementary Table S1) available in NCBI GenBank were downloaded for alignment. MAFFT version 729 was used to align the chloroplast genomes. Sliding window analysis was performed using DNA Sequence Polymorphism (DnaSP) version 6.12.0330 to calculate nucleotide diversity (Pi) from the aligned chloroplast genomes, with the window length and the step size set to 600 bp and 200 bp, respectively. The result was then visualized in a line chart (Fig. 4). Hotspot regions were identified with a threshold value of Pi = 0.05. Loci above this value were considered potential candidates for species differentiation.
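The sliding window calculation described above can be illustrated with a minimal Python sketch. It is not DnaSP's implementation: the function names are hypothetical, Pi is taken as the mean pairwise proportion of differing sites within each window, and columns containing gaps or ambiguous bases are skipped per sequence pair, which is a simplifying assumption that may differ from how DnaSP treats gaps.

```python
from itertools import combinations


def pairwise_diff(a, b):
    """Return (differences, compared sites) for two aligned sequences.

    Columns where either sequence is not A, C, G or T are skipped
    (a simplifying assumption, not necessarily DnaSP's gap handling).
    """
    diffs = sites = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "ACGT" and y in "ACGT":
            sites += 1
            diffs += x != y
    return diffs, sites


def window_pi(seqs, start, length):
    """Mean pairwise proportion of differing sites within one window."""
    chunks = [s[start:start + length] for s in seqs]
    values = []
    for a, b in combinations(chunks, 2):
        diffs, sites = pairwise_diff(a, b)
        if sites:
            values.append(diffs / sites)
    return sum(values) / len(values) if values else 0.0


def sliding_pi(seqs, length=600, step=200):
    """Yield (0-based window start, Pi) along an alignment."""
    n = len(seqs[0])
    for start in range(0, max(n - length, 0) + 1, step):
        yield start, window_pi(seqs, start, length)


def hotspots(seqs, threshold=0.05, length=600, step=200):
    """Return the windows whose Pi exceeds the threshold."""
    return [(start, pi) for start, pi in sliding_pi(seqs, length, step)
            if pi > threshold]
```

With the default arguments the sketch uses the 600 bp window, 200 bp step and Pi threshold of 0.05 stated in the text; `seqs` is assumed to be a list of equal-length aligned sequences, for example read from the MAFFT output.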

Figure 4

Identification of hotspot regions through sliding window analysis. The X-axis represents the alignment positions of the complete chloroplast genomes. The Y-axis shows nucleotide diversity (Pi). In total, three hotspot regions (atpB, psaI-ycf4-petA and rpl2-rpl23) were identified above the threshold Pi > 0.05, which is marked by the red line.

Three hotspots were identified, namely atpB (alignment position: 57,125–58,621 bp), psaI-ycf4-petA (66,232–69,774 bp) and rpl2-rpl23 (94,020–95,295 bp). All the hotspot regions were located in the Large Single Copy (LSC) region. Since the hotspot psaI-ycf4-petA, at over 3500 bp, was too long for PCR amplification, only its hypervariable regions were targeted. As a result, two loci, psaI-ycf4 (66,232–67,096 bp) and petA (69,520–69,774 bp), were considered for primer design.

Based on the hotspot regions, six pairs of primers were designed (Table 5) to capture the hypervariable positions with differentiating power. Since the intergenic spacer between the protein-coding genes rpl2 and rpl23 exceeds 1200 bp, which is not favorable for PCR amplification, two pairs of primers were designed to amplify two separate fragments (630 bp and 460 bp) of this locus. The same treatment was applied to the locus atpB, since this protein-coding gene exceeds 1400 bp.

Table 5 Primer pairs used for PCR amplification.

DNA extraction

About 50 mg of each silica gel-dried leaf sample (Table 4) was taken for DNA extraction (Supplementary Table S2). Weighed samples were placed in 2 mL Precellys Hard Tissue Grinding MK28 tubes (Bertin Corp., Maryland, USA) and homogenized with a Precellys Evolution Tissue Homogenizer (Bertin Technologies, Montigny-le-Bretonneux, France) using the hard tissue mode. Total genomic DNA of all studied samples was extracted using the i-genomic Plant DNA Extraction Mini Kit (iNtRON Biotechnology, Daejeon, Korea) following the manufacturer's instructions. The quality and quantity of extracted DNA were assessed by 1.5 % agarose gel electrophoresis and a NanoDrop Lite Spectrophotometer (Thermo Fisher Scientific, Massachusetts, USA), respectively.

PCR amplification, agarose gel electrophoresis and DNA sequencing

PCR amplification using both the designed and the universal primer pairs (Table 5) was first conducted on the four authenticated specimens representing P. scabiosifolia (M. C. Li 083), P. villosa subsp. villosa (M. C. Li 403), P. monandra (M. C. Li 103) and P. heterophylla (M. C. Li 089) (Table 4). After amplifiability had been assessed, the primer pairs were used to amplify the target sequences from thirty-three test samples collected from various locations in mainland China. These samples include six samples of P. heterophylla, six samples of P. monandra, eight samples of P. scabiosifolia and eleven samples of P. villosa subsp. villosa. In addition, to test amplifiability in other Patrinia taxa, one sample of P. scabra Bunge and one of P. villosa subsp. punctifolia H. J. Wang were amplified using the selected primer pairs.

Extracted total genomic DNA of each sample was amplified using GoTaq® G2 Flexi DNA Polymerase (Promega, Wisconsin, USA). Each 30-μL reaction contained 6 μL 1X Green GoTaq® Flexi Buffer, 3 μL MgCl2 (2.5 mM), 0.6 μL Promega dNTPs mix (0.2 mM), 1.5 μL forward primer (500 nM), 1.5 μL reverse primer (500 nM), 0.2 μL GoTaq polymerase (1 U/μL), 1 μL template DNA and 16.2 μL double-distilled water. Thermocycling was performed in an Applied Biosystems VeritiPro 96-Well Thermal Cycler (Thermo Fisher Scientific, Massachusetts, USA), starting with an incubation at 95 °C for 4 min, followed by 35 cycles of denaturation at 95 °C for 30 s, annealing at 40 °C (or 45 °C for ITS2 and psbA-trnH) for 30 s and elongation at 72 °C for 40 s, and finishing with a final extension at 72 °C for 4 min. PCR products were held at 12 °C or stored in a refrigerator at 4 °C until being subjected to electrophoresis in 1.5 % agarose gels for purification. The QIAquick Gel Extraction Kit (Qiagen Co., Hilden, Germany) was used to purify PCR products following the manufacturer’s instructions. Purified PCR products were sent to Tech Dragon Limited (Shatin, Hong Kong, China) for Sanger sequencing using an Applied Biosystems 3730xl DNA Analyzer. Bidirectional sequences were assembled using CodonCode Aligner (Centerville, Massachusetts, USA)31, and low-quality nucleotides with a QV below 30 at the two ends were discarded. All assembled sequences were uploaded to NCBI GenBank under the accession numbers OR712158 to OR712225, PP277662 to PP277698 and PP280905 to PP281021 (Supplementary Table S3).
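As a quick arithmetic check on the reaction composition above, the following Python sketch sums the listed per-reaction volumes and scales the shared reagents for a batch of reactions. The volumes are taken from the text; the function names and the 10 % pipetting overage are hypothetical and are not part of the described protocol.

```python
# Per-reaction volumes in microlitres, as listed in the text.
PER_REACTION_UL = {
    "Green GoTaq Flexi Buffer": 6.0,
    "MgCl2": 3.0,
    "dNTP mix": 0.6,
    "forward primer": 1.5,
    "reverse primer": 1.5,
    "GoTaq polymerase": 0.2,
    "double-distilled water": 16.2,
}
TEMPLATE_UL = 1.0  # template DNA, added to each tube separately


def reaction_volume():
    """Total volume of one reaction, including template DNA."""
    return round(sum(PER_REACTION_UL.values()) + TEMPLATE_UL, 2)


def master_mix(n_reactions, overage=0.10):
    """Scale the shared reagents for n reactions.

    `overage` is a hypothetical allowance for pipetting loss; it is an
    assumption of this sketch, not a value given in the text.
    """
    factor = n_reactions * (1 + overage)
    return {name: round(vol * factor, 2)
            for name, vol in PER_REACTION_UL.items()}


print(reaction_volume())
print(master_mix(10))
```

The listed components add up to the stated 30 μL per reaction, which confirms that the recipe is internally consistent.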

Phylogenetic analysis

Single-locus and multiple-loci combinations of sequences were used for phylogenetic analysis to assess their power to differentiate taxa down to species level. The sequences were first aligned using MAFFT version 729 and then used for phylogenetic tree construction in MEGA X version 10.2.532. The best-fit model with the lowest Bayesian Information Criterion (BIC) was selected. To explore possible barcoding gaps in the four targeted chloroplast loci, the genetic distance-based methods Neighbour-Joining (NJ) and Unweighted Pair Group Method with Arithmetic mean (UPGMA), as well as the character-based method Maximum Likelihood (ML), were adopted for phylogenetic analysis of the studied Patrinia species. For the multiple-loci combinations, the amplicon sequences of each specimen were concatenated into a single sequence, and the concatenated sequences were then aligned using MAFFT version 729. NJ, UPGMA and ML trees were constructed from single loci and the four-loci combination, while only NJ trees were constructed from two-loci and three-loci combinations.
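The multi-loci step described above can be sketched in Python as follows. This is an illustrative outline only: the locus names come from the text, while the function names, the dictionary layout and the rule of keeping only specimens sequenced at every requested locus are assumptions of the sketch.

```python
from itertools import combinations

# The four targeted chloroplast loci named in the text.
LOCI = ["atpB", "petA", "psaI-ycf4", "rpl2-rpl23"]


def locus_combinations(loci, size):
    """List all combinations of `size` loci, e.g. 'atpB+petA'."""
    return ["+".join(combo) for combo in combinations(loci, size)]


def concatenate(per_locus, loci):
    """Concatenate amplicon sequences per specimen in the given locus order.

    per_locus maps locus -> {specimen: sequence}. Only specimens with a
    sequence at every requested locus are kept (an assumption of this sketch).
    """
    shared = set.intersection(*(set(per_locus[locus]) for locus in loci))
    return {specimen: "".join(per_locus[locus][specimen] for locus in loci)
            for specimen in sorted(shared)}


print(locus_combinations(LOCI, 2))  # six two-loci combinations
print(locus_combinations(LOCI, 3))  # four three-loci combinations
```

The concatenated sequences would then be written to a FASTA file and aligned, as in the text, before tree construction.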

To root the trees, Valeriana officinalis L. from the family Caprifoliaceae was selected as the outgroup species. Fragments of ITS2 and of all chloroplast regions (psbA-trnH, atpB, petA, rpl2-rpl23 and psaI-ycf4) were extracted from the NCBI accessions ON685480 (3713 bp) and NC_045052 (complete chloroplast genome of 151,505 bp33), respectively. To further demonstrate the authentication power of this method, two additional well-authenticated Patrinia taxa, P. scabra Bunge (M. C. Li 483) and P. villosa (Thunb.) Juss. subsp. punctifolia H. J. Wang (M. C. Li 484), were included in the phylogenetic analysis.

Informative variable nucleotides, including species-specific and non-species-specific nucleotides, were manually identified using BioEdit34 based on the unrooted alignments of each locus (Table 3). These variable nucleotides were classified as Single-Nucleotide Polymorphisms (SNPs) and Insertions–deletions (InDels). The numbers of variable nucleotides in the multi-loci combinations were then calculated. The discrimination success rate was calculated by dividing the number of monophyletic clades containing a single taxon in the NJ tree by the total number of studied Patrinia taxa (six) and multiplying by 100%.
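The discrimination success rate defined above can be expressed as a short Python sketch. The representation of a tree as a list of clades (each a set of tip labels), the function names and the toy labels in the usage example are all hypothetical; in particular, a taxon with a single sample is counted as resolved only if the caller lists that tip as a clade of its own, since the scoring of such taxa is not specified in the text.

```python
def is_monophyletic(taxon_tips, clades):
    """True if the taxon's tips exactly match one clade of the tree."""
    return frozenset(taxon_tips) in clades


def discrimination_success_rate(tips_by_taxon, clades):
    """Percentage of taxa that form a monophyletic, single-taxon clade.

    tips_by_taxon maps taxon -> list of tip labels; clades is an iterable
    of tip-label sets read from the tree (an assumed input format).
    """
    clade_set = {frozenset(clade) for clade in clades}
    resolved = sum(is_monophyletic(tips, clade_set)
                   for tips in tips_by_taxon.values())
    return 100.0 * resolved / len(tips_by_taxon)


# Toy example with six hypothetical taxa, two of which are resolved.
toy_taxa = {"A": ["a1", "a2"], "B": ["b1", "b2"], "C": ["c1", "c2"],
            "D": ["d1", "d2"], "E": ["e1"], "F": ["f1"]}
toy_clades = [{"a1", "a2"}, {"b1", "b2"}, {"c1", "d1"},
              {"a1", "a2", "b1", "b2"}]
print(round(discrimination_success_rate(toy_taxa, toy_clades), 1))
```

With six taxa, two resolved clades correspond to 33.3%, three to 50% and six to 100%.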