Introduction

Chamaecyparis taiwanensis Masam. & Suzuki [= Chamaecyparis obtusa (Sieb. & Zucc.) Endl. var. formosana (Hayata) Hayata] (Cupressaceae) is a gymnosperm endemic to Taiwan. It is the dominant species in the mixed conifer and broadleaf forest of the island's mid-altitude region (1700 m to 2600 m)1. The lowest-latitude boundary of the natural distribution of cypress falls in Taiwan, which gives the species considerable biogeographic significance2. As an indispensable resource for fine buildings, furniture and handicrafts, the species plays a vital role as a wood source for the timber industry. C. taiwanensis is well known for its wood quality and high price (4400 USD/m³) (woodprice.forest.gov.tw), which drives persistent illegal felling. Developing an individual identification system for C. taiwanensis is therefore of great importance3.

Illegal felling remains a persistent problem in timber-producing countries all over the world. For decades, illegal logging has endangered precious and valuable tree species such as cypress4, ash5, mahogany6, and Brazilian rosewood7. In some cases, law enforcement authorities, such as forestry police, arrest the suspects in time. However, the lack of direct scientific evidence linking timbers to stumps makes conviction difficult and ineffective. Individual identification is thus critical to the forestry industry.

Illegal logging has received attention since 1995. A growing number of national and international regulations mandate tracking systems that ensure traceability in the wood market8,9,10. Wood anatomy and dendrochronology are common visual identification methods. The former identifies wood from its anatomical characteristics, usually to the genus level11; the latter is often used to reconstruct past climates, but may also provide the age and origin of trees12. Compounds synthesized by trees and other plants, often called phytochemicals, are frequently used to identify species or distinguish genera. Intraspecific variation can also be detected in some species through methods such as mass spectrometry12,13, near infrared spectroscopy14, detector dogs15, stable isotopes16, and radiocarbon17. Genetic analysis can provide species-level identification, usually through DNA sequence polymorphism18. Simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) can be used to identify individuals, and in population genetics or phylogeography to determine the geographical region of origin within a species19. A DNA fingerprint is inherent to each organism and cannot be forged20. When enough markers are developed, in principle every individual has its own unique DNA fingerprint. DNA fingerprinting has the potential to track wood products independently within complex global supply networks21. Theoretically, DNA fingerprinting is the only forensic wood identification technology that could be used to connect seized timber to illegally felled stumps8.

SSRs are the most common markers used in individual identification because of their short length, high polymorphism, easy polymerase chain reaction (PCR) amplification, high reproducibility, and high sensitivity20,22,23. SSRs fall into two broad categories according to their source: genomic SSRs (gSSRs) and expressed sequence tag SSRs (EST-SSRs)24. gSSR markers are derived from amplified genomic libraries, whereas EST-SSRs are mined from EST sequence collections. gSSR markers have been reported to be more polymorphic than EST-SSRs in gymnosperms4 and crops25,26 because their nucleotide sequences are more diverse. With the arrival of high-throughput sequencing, marker development techniques have advanced continuously. Wang et al. (2018) published the first report of gSSRs developed by de novo genome sequencing27. EST-SSRs, derived from expressed sequences, are in contrast a fast, cost-effective and labor-saving alternative for non-model organisms24. Because gene coding regions are conserved24, newly developed EST-SSRs can usually be transferred to closely related species for marker development. The first EST-SSRs based on an Illumina de novo transcriptome were also published in 2018, by Zhou et al.28. Developing both marker types in one study makes it possible to benefit from the strengths of each.

For C. taiwanensis, evaluating genetic variation and population structure is necessary for its preservation2,29 because the species has been used extensively. After the mid-twentieth century, the number of C. taiwanensis trees plunged, which also led to a significant decrease in both genetic variation and population structure. As an important tool for genetics and subsequent breeding, SSR markers are helpful for breeding polymorphic maternal plants and increasing the diversity of progeny. The objective of this study is to establish a scientifically valid SSR-based individual identification system for C. taiwanensis, in order to provide court evidence linking seized wood to the victim tree and to provide proof of traceability for the wood supply network. At the start of the research, we used next-generation sequencing (NGS) technology to establish DNA and RNA libraries of C. taiwanensis and so accelerate the development of gSSR and EST-SSR markers. A total of 96 samples from four populations were used to evaluate the polymorphism, discriminative power, and random match rate of the selected SSRs. Linkage disequilibrium between markers was calculated to assess the applicability and credibility of the individual identification system. In this study, we successfully linked 3 stolen timbers back to 3 victim trees (case number MJIB-DNA-1080413 combined with 1080328), marking the first successful application of the C. taiwanensis individual identification system. Ultimately, our work should help deter illegal felling of these precious species by making law enforcement more effective.

Results and discussion

Developing the C. taiwanensis individual identification system

Choice of template and library preparation

gSSRs are characterized by high polymorphism and are suitable for developing individual identification markers. EST-SSRs are highly conserved and can be used to develop markers that categorize species and populations20,22,23. In this study, both DNA and RNA libraries were constructed, as sources of gSSR and EST-SSR markers, respectively (Fig. 1, Supplementary Sect. 1). Sequences from the three DNA libraries and the one RNA library prepared for the study were compared between individual plants as well as between groups (Supplementary Sect. 1). With these two types of nucleic acid markers, we aimed to differentiate samples within or among species.

Figure 1

Flowchart describing the procedure for developing SSR markers and matching illegally felled timbers to victim trees of C. taiwanensis. (a) 35 SSR markers for individual identification were selected from the DNA and RNA libraries of C. taiwanensis. The cumulative random matching rate of the system reaches CPI = 5.596 × 10⁻¹², which can be used to identify 18 million individuals with a credibility of 99.99%. (b) 11 seized timbers were compared with 7 victim trees, and 3 timbers were successfully matched with 3 victim trees. The credibility values in all matched cases were over 95%. (N, number of individuals; P, number of populations).

Nucleic acid sequencing and analysis

Next-generation sequencing makes it possible to obtain a large number of sequences in a short time. In this study, we used the Illumina MiSeq platform (2 × 300 bp) to sequence the DNA and RNA libraries (Fig. 1). A total of 13,651,578 and 11,763,646 raw reads were produced from the DNA and RNA libraries, respectively. The raw reads were deposited in the NCBI Sequence Read Archive (PRJNA506084). After quality trimming and merging, 4,236,284 DNA contigs and 4,392,534 RNA contigs were assembled. Contig lengths ranged from 120 to 579 bases for DNA and from 120 to 529 bases for RNA, with an average of 420 bases. According to published work by timber researchers23,30, nucleic acid markers with fragment lengths of around 250 bases best meet our research goals. The contigs derived from the four libraries were long enough to screen for markers within a 250 bp length. A target band size below 250 bp implies a higher PCR success rate, because the DNA in most wood samples from seized timber and victim trees was severely degraded.

SSR discovery and primer design

A total of 318,153 gSSR and 63,390 EST-SSR candidate sequences were identified with the Simple Sequence Repeat Identification Tool (SSRIT)31 (Fig. 1). The proportions of SSR-containing sequences in the genomic DNA and RNA libraries were 7.51% and 1.44%, respectively. Study by Squirrell et al.21. This system would also deter dishonest traders from slipping illegal material into legal timber auctions, which would further forestall illegal logging. In addition, these markers can also be used in population genetic studies that facilitate the conservation and breeding of C. taiwanensis.

Conclusions

In this study, we developed an individual identification system for C. taiwanensis and provided the scientific evidence to support it. The methodology can be adopted by the courts to link seized timber to victim trees. The system includes 23 gSSR and 12 EST-SSR polymorphic markers. When the 30 unlinked markers were applied to C. taiwanensis identification, the lowest CPI was 5.596 × 10⁻¹² and the highest CPD was 0.999999999994404, which is sufficient to identify 18 million random samples of C. taiwanensis (CL = 99.99%). When applied to criminal cases of illegal C. taiwanensis logging, the SSR marker system successfully matched five seized illegally felled timbers to three victim trees with a CL of at least 99.99%. To the best of our knowledge, this is the first time SSR technology has been applied to provide molecular evidence for court conviction in C. taiwanensis illegal logging. Our system not only provides scientific evidence linking seized timber to a victim tree, but could also serve as an inherent unique serial number identifying every single C. taiwanensis timber. We demonstrated the feasibility of matching seized or illegally felled timber with the victim tree by modern SSR technology, which should help prevent illegal logging by warning criminals that woodland trees can be identified at the molecular level. Additionally, these markers can also be used in population genetic studies that facilitate the conservation and breeding of C. taiwanensis.
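The relation between the reported CPI, CPD and the figure of 18 million individuals can be checked with a few lines of arithmetic. The sketch below assumes that the confidence level is the probability that none of N unrelated individuals shares a given profile by chance, CL = (1 − CPI)^N. That reading is an assumption offered for illustration, not the authors' stated calculation, although it reproduces the numbers quoted above.

```python
import math

CPI = 5.596e-12   # combined probability of identity reported in the text
CL = 0.9999       # confidence level reported in the text

# CPD is the complement of CPI.
cpd = 1.0 - CPI

# Assumed model: CL = (1 - CPI) ** N, solved for the largest N.
# log1p keeps precision for a value as small as CPI.
max_individuals = math.log(CL) / math.log1p(-CPI)

# Confidence level obtained for exactly 18 million individuals.
cl_at_18_million = math.exp(18_000_000 * math.log1p(-CPI))

print(f"CPD = {cpd:.15f}")
print(f"N at CL 99.99% = {max_individuals:,.0f}")
print(f"CL at 18 million = {cl_at_18_million:.6f}")
```

Under this assumption the solved population size is a little under 18 million, which agrees with the rounded figure in the text.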

Materials and methods

Developing the C. taiwanensis individual identification system

Library preparation and SSR enrichment

In this study, we constructed both DNA and RNA libraries of C. taiwanensis (Fig. 1). Three DNA libraries were created from individuals of TP (Voucher no. Chung 2448) and 100R (Voucher no. Chung 2603, 2621) (Supplementary Sect. 1). To build the DNA libraries, genomic DNA was extracted from fresh leaves using the cetyltrimethylammonium bromide (CTAB) method49. The quality and concentration of DNA were measured with a NanoDrop 2000 (Thermo Fisher Scientific, San Diego, California, USA) and a Qubit 2.0 Fluorometer (Thermo Fisher Scientific). SSR-containing fragments were enriched from the total genomic DNA following the magnetic bead enrichment method of Glenn and Schable50. Briefly, DNA was digested using AluI/XmnI and HaeIII/XmnI (New England Biolabs, Ipswich, Massachusetts, USA). The double-stranded SuperSNX linkers (SuperSNX24 Forward: 5′-GTTTAAGGCCTAGCTAGCAGAATC-3′; SuperSNX24 + 4p: 5′-pGATTCTGCTAGCTAGGCCTTAAACAAA-3′) were ligated to the digested DNA fragments. The linker-ligated DNA fragments were hybridized with biotin-labeled microsatellite probes in three mixes: Mix 2: (AG)12, (TG)12; Mix 3: (AAC)6, (AAG)8, (AAT)12, (ATC)8, (ACT)12; Mix 4: (AAAC)6, (AAAG)6, (AATC)6, (AATG)6, (ACAG)6, (ACCT)6, (ACTG)6, (ACTC)6, (AAAT)8, (AACT)8, (ACAT)8, (AAGT)8, and (AGAT)8. The SSR-hybridized fragments were captured using Streptavidin M-280 Dynabeads (Invitrogen, Carlsbad, California, USA) and recovered by PCR using the SuperSNX24 Forward primer. The concentration and quality of the SSR-enriched libraries were measured with the NanoDrop 2000 and the Qubit 2.0 Fluorometer.

One individual of C. taiwanensis (Voucher no.: Chung 2627) from XI was used to prepare the RNA library. RNA was extracted from fresh leaves using the CTAB method51. The quality and concentration of RNA were measured with the NanoDrop 2000 and the Qubit 2.0 Fluorometer. The RNA was reverse transcribed into complementary DNA (cDNA) using the Ovation RNA-Seq System V2 (NuGEN, San Carlos, California, USA), and the cDNA was quantitated using the NanoDrop 2000 and an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, California, USA) by Tri-I Biotech, Inc. (New Taipei City, Taiwan). The cDNA was fragmented with a Covaris S220 focused-ultrasonicator (Covaris, Woburn, Massachusetts, USA), and the cDNA library was prepared according to the manual of the Ovation Ultralow DR Multiplex System 1–96 (NuGEN).

Sequencing and analysis

Three DNA libraries and one RNA library were sequenced using the Illumina MiSeq System (2 × 300 bp paired-end; Illumina, San Diego, California, USA) at Tri-I Biotech (New Taipei City, Taiwan). The raw reads were prescreened to remove adapter sequences and reads with greater than 0.1% error or with an average quality less than QV30. High-quality filtered DNA and cDNA reads were merged with CLC Genomics Workbench version 7.5 (QIAGEN, Aarhus, Denmark).
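The two thresholds named above describe the same cutoff, because a Phred quality score Q corresponds to an error probability p through Q = −10·log10(p), so QV30 equals a 0.1% error rate. The following is a minimal sketch of an average-quality read filter, assuming Phred+33 encoded quality strings. The function names are illustrative and are not part of the CLC Genomics Workbench workflow used in the study.

```python
def phred_to_error(q: float) -> float:
    """Convert a Phred quality score to a base-call error probability."""
    return 10 ** (-q / 10)


def mean_quality(quality_string: str, offset: int = 33) -> float:
    """Average Phred score of a read from its ASCII-encoded quality string."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores)


def passes_filter(quality_string: str, min_mean_q: float = 30.0) -> bool:
    """Keep a read only if its average quality is at least min_mean_q."""
    return mean_quality(quality_string) >= min_mean_q


# 'I' encodes Q40 and '5' encodes Q20 under the assumed Phred+33 offset.
print(phred_to_error(30))        # error probability at QV30
print(passes_filter("IIIIII"))   # high-quality read is kept
print(passes_filter("555555"))   # low-quality read is discarded
```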

SSR screening and primer design

SSRIT was applied to screen the contigs for gSSR- and EST-SSR-containing sequences. To design gSSR and EST-SSR primers, sequences with at least five di-, tri-, tetra-, penta-, or hexa-nucleotide repeats were selected, and primers were designed using BatchPrimer352 with the following optimized conditions: primer length of 18–23 bp, melting temperature of 45–62 ℃, and product size of 80–300 bp.
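The selection rule above (motifs of two to six nucleotides repeated at least five times) can be illustrated with a small regular-expression scan. This is a sketch of the general idea only. It is not SSRIT, and the function name and output format are assumptions made for the example.

```python
import re

MIN_REPEATS = 5             # at least five tandem copies of the motif
MOTIF_LENGTHS = range(2, 7) # di- to hexa-nucleotide motifs


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return (motif, repeat_count, start, end) for each SSR in the sequence."""
    seq = sequence.upper()
    hits = []
    covered = []  # spans already reported for a shorter motif
    for k in MOTIF_LENGTHS:
        # The negative lookahead rejects motifs made of one repeated base.
        pattern = re.compile(
            r"((?!([ACGT])\2{%d})[ACGT]{%d})\1{%d,}" % (k - 1, k, min_repeats - 1)
        )
        for m in pattern.finditer(seq):
            # Skip regions already described by a shorter motif, e.g. (AG)n
            # that would otherwise also be reported as (AGAG)n.
            if any(m.start() < end and start < m.end() for start, end in covered):
                continue
            covered.append((m.start(), m.end()))
            hits.append((m.group(1), len(m.group(0)) // k, m.start(), m.end()))
    return sorted(hits, key=lambda hit: hit[2])


example = "TTC" + "AG" * 6 + "CCGT" + "AAC" * 5 + "G"
for motif, count, start, end in find_ssrs(example):
    print(motif, count, start, end)
```

Scanning shorter motifs first is a design choice that keeps each repeat region labeled by its simplest motif.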

Marker validation

A total of 75 markers, comprising the 23 gSSR and 12 EST-SSR markers newly designed in this study and 40 published SSRs4,53,54 (Supplementary Sect. 2), were validated on 96 samples from four C. taiwanensis populations (TP, SY, DS and FR; see Supplementary Sect. 1). In addition, we tested the cross-species transferability of the designed gSSR and EST-SSR markers (Supplementary Sect. 3). DNA from the samples used for marker validation and cross-species transferability was extracted using the VIOGENE plant DNA extraction kit (VIOGENE, New Taipei City, Taiwan). PCR was conducted in a final volume of 20 μL containing 2 ng of genomic DNA, 0.25 μL of each primer (10 μM) and 10 μL of Q-Amp 2× Screening Fire Taq Master Mix (Bio-Genesis Technologies, Taipei, Taiwan). The following PCR conditions were used: an initial denaturation at 95 ℃ for 2 min; 30 cycles of 95 ℃ for 45 s, a primer-specific annealing temperature (Tables 1, 2) for 45 s, and 72 ℃ for 45 s; followed by a 15-min extension at 72 ℃. The amplified products were evaluated on the ABI 3130XL (Applied Biosystems, Waltham, Massachusetts, USA) with the GeneScan 500 ROX Size Standard (Applied Biosystems). Fragment size was determined using GeneMapper version 3.2 (Applied Biosystems).
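As a quick check on the reaction setup described above, the final primer concentration and the programmed cycling time can be derived from the stated volumes and hold times. The helper names below are illustrative, and the time estimate ignores instrument ramp times, which the text does not specify.

```python
def final_concentration_um(stock_um, stock_volume_ul, total_volume_ul):
    """Dilution formula C1 * V1 = C2 * V2, solved for C2 (in µM)."""
    return stock_um * stock_volume_ul / total_volume_ul


def programmed_time_s(initial_s, cycles, step_seconds, final_extension_s):
    """Sum of programmed hold times in seconds, excluding ramping."""
    return initial_s + cycles * sum(step_seconds) + final_extension_s


# 0.25 µL of a 10 µM primer in a 20 µL reaction.
primer_um = final_concentration_um(10, 0.25, 20)

# 2 min denaturation, 30 cycles of 3 x 45 s, 15 min final extension.
total_s = programmed_time_s(120, 30, (45, 45, 45), 15 * 60)

print(f"Final primer concentration: {primer_um} µM")
print(f"Programmed hold time: {total_s / 60} min")
```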

Marker analysis

GenAlEx 6.51b255 was used to calculate the number of alleles (A), observed heterozygosity (Ho) and expected heterozygosity (He), and to test Hardy–Weinberg equilibrium (HWE), for the newly developed gSSR and EST-SSR markers. PowerMarker V3.2556 was used to calculate the polymorphism information content, or power of information content (PIC)57. The following statistics were calculated in Microsoft Excel (Microsoft Office 2016): power of discrimination (PD)58, $PD = 1 - \sum_i P_i^2$, where $P_i$ is the frequency of genotype $i$; power of exclusion, or probability of exclusion (PE)58, $PE = h^2[1 - 2h(1 - h)^2]$, where $h$ is the frequency of heterozygotes; probability of identity (PI)59, $PI = 1 - PD$; combined power of discrimination (CPD)58, calculated here over 30 markers as $CPD = 1 - [(1 - PD_1)(1 - PD_2)\cdots(1 - PD_{30})]$; and combined probability of identity (CPI)59. GENEPOP 4.260 was used to test for linkage disequilibrium.
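The per-locus and combined statistics defined above translate directly into a few functions. The sketch below follows the formulas as given for PD, PE, PI and CPD. The text gives no explicit formula for CPI, so taking it as the product of the per-locus PI values is an assumption, chosen because it is consistent with CPD = 1 − CPI in the reported results. The genotype frequencies in the example are invented for illustration.

```python
from math import prod


def power_of_discrimination(genotype_freqs):
    """PD = 1 - sum of squared genotype frequencies at one locus."""
    return 1.0 - sum(p * p for p in genotype_freqs)


def probability_of_identity(genotype_freqs):
    """PI = 1 - PD at one locus."""
    return 1.0 - power_of_discrimination(genotype_freqs)


def power_of_exclusion(h):
    """PE = h^2 * [1 - 2h(1 - h)^2], with h the frequency of heterozygotes."""
    return h ** 2 * (1.0 - 2.0 * h * (1.0 - h) ** 2)


def combined_pd(pd_values):
    """CPD = 1 - product of (1 - PD_i) over all loci."""
    return 1.0 - prod(1.0 - pd for pd in pd_values)


def combined_pi(pi_values):
    """CPI as the product of per-locus PI values (assumed, see lead-in)."""
    return prod(pi_values)


# Invented genotype frequencies for two hypothetical loci.
locus_a = [0.5, 0.3, 0.2]
locus_b = [0.5, 0.5]
pds = [power_of_discrimination(f) for f in (locus_a, locus_b)]
pis = [probability_of_identity(f) for f in (locus_a, locus_b)]
print(combined_pd(pds), combined_pi(pis))
```

Because each PI is below one, every added unlinked marker lowers the CPI, which is why the combined value over many markers becomes very small.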

Aligning seized timbers to victim trees

Samples were collected from five seized timbers held by the Taiwan Yilan District Prosecutors Office, six illegally felled timbers found in the woodland at the crime scene, and seven victim trees (Supplementary Sect. 4). Duplicate samples from one victim tree (7TA and 7TB) were taken to confirm that the SSR type is reproducible within an individual tree. Two grams of each sample were powdered in liquid nitrogen, and total genomic DNA was extracted following the protocol of the VIOGENE plant DNA extraction kit (VIOGENE, New Taipei City, Taiwan). Nineteen unlinked markers were selected for DNA typing. Samples that were typed successfully were then combined with the aforementioned database to calculate the CPI.
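The comparison step can be pictured as matching multi-locus genotype profiles and then computing the CPI over the loci that were typed in both samples. The sketch below is a simplified illustration under stated assumptions: an exact match is required at every shared locus, the CPI is taken as the product of per-locus PI values, and the marker names, allele sizes and PI values are all invented. It does not reproduce the authors' actual procedure or data.

```python
from math import prod


def normalise(genotype):
    """Order the two allele sizes so that (152, 148) equals (148, 152)."""
    return tuple(sorted(genotype))


def shared_loci(profile_a, profile_b):
    """Loci that were typed successfully in both samples."""
    return sorted(set(profile_a) & set(profile_b))


def profiles_match(profile_a, profile_b):
    """True if the genotypes agree at every locus typed in both samples."""
    loci = shared_loci(profile_a, profile_b)
    return bool(loci) and all(
        normalise(profile_a[locus]) == normalise(profile_b[locus]) for locus in loci
    )


def match_cpi(loci, pi_by_locus):
    """Random match probability over the compared loci (product of PI)."""
    return prod(pi_by_locus[locus] for locus in loci)


# Hypothetical profiles: locus name -> pair of allele sizes in bp.
timber = {"M1": (148, 152), "M2": (201, 201), "M3": (96, 110)}
stump = {"M1": (152, 148), "M2": (201, 201)}   # M3 failed to amplify
other = {"M1": (148, 150), "M2": (201, 201), "M3": (96, 110)}
pi_by_locus = {"M1": 0.1, "M2": 0.2, "M3": 0.5}  # invented PI values

print(profiles_match(timber, stump))
print(profiles_match(timber, other))
print(match_cpi(shared_loci(timber, stump), pi_by_locus))
```

Restricting the CPI to shared loci reflects the fact that degraded wood DNA may not amplify at every marker, so the strength of a match depends on how many loci were compared.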