Introduction

Biological exoskeletons are usually composed of highly ordered hierarchical micro- and nanostructures occluded with organic molecules, and they exhibit superior mechanical, optical, thermal and magnetic properties 1,2,3. Calcium carbonate is one of the most abundant biominerals in nature, and the biomineralization and bioinspired mineralization of calcium carbonate have been investigated for many decades. Molluscan shells have well-defined micro- and nanostructures and excellent mechanical properties; they are composed of 95 wt% CaCO3 and less than 5 wt% organic macromolecules (proteins, glycoproteins, polysaccharides and lipids) 4,5. The polymorphs of CaCO3 generally include aragonite, calcite, vaterite, calcium carbonate monohydrate and calcium carbonate hemihydrate 6. Biogenic CaCO3 exhibits complex microstructures such as prismatic, nacreous, foliated and cross-lamellar microstructures 7,8. Importantly, the protein components of molluscan shells are responsible for the nucleation and crystal growth of biogenic CaCO3 and can enhance the mechanical properties of the shells 9. Matrix proteins occluded in molluscan shells have been extracted, sequenced and identified as biomineralization-related proteins, with functions that include stabilizing amorphous calcium carbonate and inhibiting or accelerating calcium carbonate nucleation 10,11,12,13. Examples of biomineralization-related proteins include mantle protein N25 14, perlucin and perlustrin 15, aspein 16, prisilkin-39 17, Shematrin-2 18, PfN44 19, and enzymes such as carbonic anhydrase 20. Most matrix proteins in molluscan shells are specifically secreted by the mantle tissue. In other words, proteins from the mantle tissue can tune the formation of the different structural layers of the shell 12.

The numbers of transition (261,292) and transversion (202,192) mutations accounted for 56.14% and 43.45% of the SNPs, respectively.
The most abundant type of base variation is the C/T polymorphism (70,283; 15.1%), because the C in CpG dinucleotides is often methylated and converts to thymine after spontaneous deamination. Another abundant type is the A/G polymorphism, such as G to A (70,176; 15.08%) and A to G (61,051; 13.12%) (Fig. 7). In total, 62,103 potential indels were also identified among 21,630 unigenes, a frequency of about 2.87 indels per unigene (Excel S6). The number of unigenes containing indels decreases gradually as indel length increases from 1 to 10 bp (Fig. S1), and most indels are 1, 2 or 3 bp long. Deletions outnumber insertions when the indel length is more than 10 bp (Fig. S1).
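The transition/transversion split above follows the standard definition: a transition exchanges a purine for a purine (A↔G) or a pyrimidine for a pyrimidine (C↔T), and every other single-base substitution is a transversion. The minimal Python sketch below illustrates that classification; the (REF, ALT) pairs are hypothetical toy calls rather than data from this study, and the percentages are computed over the toy calls only.

```python
from typing import Dict, List, Tuple

# Transitions: purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def classify_snp(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a biallelic single-base SNP."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or ref not in BASES or alt not in BASES:
        raise ValueError(f"not a valid single-base substitution: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def summarize(snps: List[Tuple[str, str]]) -> Tuple[Dict[str, int], Dict[str, float]]:
    """Count each class and express it as a percentage of all classified SNPs."""
    counts = {"transition": 0, "transversion": 0}
    for ref, alt in snps:
        counts[classify_snp(ref, alt)] += 1
    percent = {k: 100.0 * v / len(snps) for k, v in counts.items()}
    return counts, percent


if __name__ == "__main__":
    # Hypothetical toy calls (REF, ALT), not data from this study.
    calls = [("C", "T"), ("G", "A"), ("A", "G"), ("A", "T"), ("G", "C")]
    counts, percent = summarize(calls)
    print(counts)   # {'transition': 3, 'transversion': 2}
    print(percent)  # {'transition': 60.0, 'transversion': 40.0}
```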

Figure 7

SNP types and frequencies in the P. placenta mantle tissue. REF: genotype of the reference sequence at each site; ALT: alternative genotype at that site.

In addition, 26,048 potential SSR markers were identified in 21,640 unigenes (Fig. 8). The most abundant repeat type is p1 (mono-nucleotide; 15,224; 58.4%), followed by p2 (di-nucleotide; 6227; 23.9%), p3 (tri-nucleotide; 2700; 10.3%) and p4 (tetra-nucleotide; 361; 1.3%) repeats. A further 1524 SSRs (5.8%) are c-type (complex repeat motifs) made up of more than one motif. The frequencies of SSRs with different numbers of repeat units were also calculated. Among p1 SSRs, those with 9–12 repeat units dominate, accounting for more than 50%, followed by those with 13–16 (17.74%) and 17–20 repeat units (12.85%). More than half of the p2 SSRs have 5–8 repeat units, followed by 9–12 repeat units, whereas all p3 and p4 SSRs have only 5–8 repeat units (Fig. 8). A/T mono-nucleotide repeats are dominant, accounting for 55.57%. Among the p2 motifs, AT/TA (9.42%) is dominant, followed by TG/CA (4.94%) and GA/GT (4.16%) (Fig. S2). Among the p3 motifs, ATG/GAT (2.2%) is dominant, followed by ATC/TGA (1.71%) and CAT/TCA (1.2%). The remaining, more complicated motif types (c-type, complex repeat motifs) are less frequent, accounting for 7.28% in total. To further test the SSR markers, forward and reverse primer pairs were designed using Primer 3.0 (Excel S7).
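SSR markers like these are typically found by scanning each unigene for perfect tandem repeats of 1–6 bp motifs. The sketch below is a simplified, hypothetical scanner, not the tool used in this study: the minimum repeat counts in `MIN_REPEATS` are illustrative assumptions, motifs that are themselves repeats of a shorter unit (such as ATAT) are skipped so each tract is reported under its shortest motif, and compound (c-type) SSRs are not merged.

```python
from typing import Dict, List, NamedTuple

# Hypothetical minimum repeat counts per motif length (1-6 bp); illustrative only.
MIN_REPEATS: Dict[int, int] = {1: 9, 2: 5, 3: 5, 4: 5, 5: 5, 6: 5}


class SSR(NamedTuple):
    start: int       # 0-based start of the repeat tract
    end: int         # 0-based exclusive end
    motif: str
    repeats: int
    ssr_class: str   # "p1" ... "p6", following the labels used in Fig. 8


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    return (motif + motif).find(motif, 1) == len(motif)


def find_ssrs(seq: str, min_repeats: Dict[int, int] = MIN_REPEATS) -> List[SSR]:
    """Report perfect tandem repeats; compound (c-type) SSRs are not merged."""
    seq = seq.upper()
    hits: List[SSR] = []
    for k, min_n in sorted(min_repeats.items()):
        i = 0
        while i + k <= len(seq):
            motif = seq[i:i + k]
            n = 1
            # Extend while the next k bases repeat the motif exactly.
            while seq[i + n * k:i + (n + 1) * k] == motif:
                n += 1
            if n >= min_n and set(motif) <= set("ACGT") and is_primitive(motif):
                hits.append(SSR(i, i + n * k, motif, n, f"p{k}"))
                i += n * k  # skip past the reported tract
            else:
                i += 1
    return sorted(hits, key=lambda h: (h.start, len(h.motif)))


if __name__ == "__main__":
    toy = "GG" + "A" * 12 + "CG" + "AT" * 7 + "C" + "GAT" * 5 + "TT"  # invented sequence
    for ssr in find_ssrs(toy):
        print(ssr)
```

On the invented sequence this reports one p1, one p2 and one p3 tract; a real pipeline would also bin the repeat counts (9–12, 13–16, ...) as in Fig. 8.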

Figure 8

Distribution of SSRs based on the number of repeat units. p1: mono-nucleotide, p2: di-nucleotide, p3: tri-nucleotide, p4: tetra-nucleotide, p5: penta-nucleotide, p6: hexa-nucleotide, c: complex repeat motifs.

Identification of genes involved in biomineralization process

To better understand the proteins related to shell formation, the annotated unigenes of the P. placenta mantle were compared, using the nr database, with the sequences of proteins known to be associated with biomineralization in molluscan shells. When several annotated unigenes were assigned to the same reference protein, the unigene with the lowest E-value was selected as the representative. In total, 178 unigenes homologous to 51 shell matrix proteins, such as calmodulin, perlucin, ferritin and carbonic anhydrase, were found to be related to biomineralization in the mantle transcriptome of P. placenta (Excel S10). To our knowledge, this is the first report of potential biomineralization-related unigenes in the mantle tissue of P. placenta. Many researchers have focused on identifying genes related to shell formation in molluscs, and an increasing number of such genes have been identified 55,56,57,58. Proteomic analysis identified 259 proteins in the shells of the oyster C. gigas 56. By comparison with the proteomic data of the C. gigas shell matrix, we identified a set of 158 unigenes that are probably related to shell formation, including the housekeeping protein elongation factor 1α and the extracellular matrix protein collagen (Excel S9). Many shell-formation-related proteins are enzymes, such as glutathione peroxidase, hemicentin and tyrosinase, that may be involved in matrix construction or modification 57,58. In this study, only one of these enzymes, tyrosinase (three unigenes), was found to be related to shell formation in the transcriptome of P. placenta (Excel S9). Sixty-six unigenes homologous to 21 proteins were identified as related to calcitic shell formation in P. placenta (Table 4), and eighteen of these 66 unigenes are highly expressed in the mantle tissue.
Highly expressed unigenes with fragments per kilobase of feature per million mapped reads (FPKM) values > 15 include carbonic anhydrase, calreticulin, ferritin, perlucin, gigasin-2 and tyrosinase-like proteins.
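The selection rule above (one representative unigene per reference protein, chosen by lowest E-value) and the FPKM > 15 cut-off can be sketched as follows. The `Hit` records, unigene IDs, fragment counts and lengths are hypothetical; FPKM uses its standard definition, fragments × 10^9 / (feature length in bp × total mapped fragments).

```python
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    unigene: str     # hypothetical unigene ID
    reference: str   # reference shell matrix protein the unigene was assigned to
    evalue: float


def representatives(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep, for each reference protein, the hit with the lowest E-value."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.reference)
        if current is None or hit.evalue < current.evalue:
            best[hit.reference] = hit
    return best


def fpkm(fragments: int, length_bp: int, total_mapped_fragments: int) -> float:
    """FPKM = fragments * 10^9 / (feature length in bp * total mapped fragments)."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


if __name__ == "__main__":
    hits = [
        Hit("u1", "perlucin", 1e-30),
        Hit("u2", "perlucin", 1e-50),
        Hit("u3", "ferritin", 1e-10),
    ]
    best = representatives(hits)
    print({ref: h.unigene for ref, h in best.items()})  # {'perlucin': 'u2', 'ferritin': 'u3'}

    # Hypothetical counts: 300 fragments on a 1,500 bp unigene, 10 million mapped fragments.
    value = fpkm(300, 1_500, 10_000_000)
    print(value, value > 15)  # 20.0 True
```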

Table 4 Identification of genes involved in the calcitic shell formation of P. placenta.

Comparison of the mantle transcriptomes of different molluscans

The shells of the scallops P. yessoensis, C. farreri and P. placenta are composed of foliated calcite. The mantle transcriptomes of P. yessoensis, C. farreri and P. placenta were compared to find similarities and differences in the biomineralization-related proteins of these molluscs 36,37. We found 117 biomineralization-related unigenes in the mantle of P. placenta, fewer than in P. yessoensis (162 unigenes) but many more than in C. farreri (42 unigenes) (Fig. 9). Six biomineralization-related unigenes are expressed in the mantles of all three species: sarcoplasmic calcium-binding protein, calcineurin A, calmodulin-like protein, perlucin, alkaline phosphatase and tyrosinase-like protein tyr-3. P. placenta shares more homologous biomineralization-related unigenes with P. yessoensis than with C. farreri: 31 unigenes in total, about 27% of the biomineralization-related unigenes of P. placenta. For example, collagen, chitin synthase 1/2, carbonic anhydrase-like, heat shock protein 70 and calmodulin were found in both P. yessoensis and P. placenta, potentially indicating functional similarities in their biomineralization processes. More mineralization proteins probably remain to be discovered among the unigenes of the P. placenta mantle (Fig. 9, Excel S10).
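The shared and exclusive counts in Fig. 9 are set intersections and differences. The sketch below assumes, for illustration, that unigenes from different species are matched by the name of their annotated homolog; the labels `SMP1` to `SMP6` are hypothetical placeholders, not the study's data.

```python
from itertools import combinations
from typing import Dict, Set


def compare_species(sets: Dict[str, Set[str]]) -> Dict[str, object]:
    """Shared-by-all, species-exclusive and pairwise-shared items across species."""
    names = list(sets)
    shared_all = set.intersection(*sets.values())
    exclusive = {
        name: sets[name] - set().union(*(sets[other] for other in names if other != name))
        for name in names
    }
    pairwise = {(a, b): sets[a] & sets[b] for a, b in combinations(names, 2)}
    return {"shared_all": shared_all, "exclusive": exclusive, "pairwise": pairwise}


if __name__ == "__main__":
    # Hypothetical annotation labels, not the study's data.
    toy = {
        "P. placenta": {"SMP1", "SMP2", "SMP3", "SMP4"},
        "P. yessoensis": {"SMP1", "SMP2", "SMP5"},
        "C. farreri": {"SMP1", "SMP4", "SMP6"},
    }
    result = compare_species(toy)
    print(sorted(result["shared_all"]))  # ['SMP1']
    shared = result["pairwise"][("P. placenta", "P. yessoensis")]
    print(f"{len(shared)}/{len(toy['P. placenta'])} P. placenta items shared with P. yessoensis")  # 2/4
```

The same pairwise set divided by the P. placenta total gives the kind of percentage quoted above (31 of 117, about 27%).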

Figure 9

Comparison of the mantle transcriptomes of three scallops whose shells are composed of foliated calcite crystals. Blue circle: 168 biomineralization-related unigenes (124 exclusive) discovered from P. yessoensis mantle 36. Yellow circle: 42 biomineralization-related unigenes (25 exclusive) discovered from C. farreri mantle 37. Green circle: 117 biomineralization-related unigenes (84 exclusive) discovered from P. placenta mantle.

Quantitative Real-Time PCR (qRT-PCR) analysis

Ten selected potential biomineralization-related unigenes were examined by qRT-PCR in different tissues of P. placenta: adductor muscle, gonad, gill, mantle, hepatopancreas, mouthparts and intestine (Fig. 10). Four of the ten unigenes, c76266_g1 (gigasin-2), c73086_g1 (tyrosinase-like), c66761_g1 (pif-like) and c59513_g1 (teneurin-2), are expressed much more highly in the mantle than in the other tissues (Fig. 10a–d). Pif is an important macromolecule for in vivo nacre formation 44, and it has also been shown to induce the formation of aragonite and vaterite crystals in vitro 77. Tyrosinases are abundant in shells, and their high expression in the mantle of the Pacific oyster C. gigas indicates that their functions are probably related to shell formation 56. Gigasin-2 and teneurin-2 were first identified in the shell of C. gigas through shell matrix proteome characterization 78,79. According to the Pfam analysis, teneurin-2 in P. placenta is predicted to contain an epidermal growth factor (EGF) domain and gigasin-2 a zona pellucida (ZP) domain (Excel S11). In shell matrix proteins (SMPs), EGF domains mostly occur as tandem repeats, and in Pinctada they are present only in the prismatic (calcitic) layer and not in the nacreous layer 80. The EGF domain is a calcium-binding motif of 45 amino acids arranged in two small β-sheets with six conserved cysteine residues 81. Both EGF-like and ZP domains have been reported in the shells of Lottia gigantea 82 and C. gigas 78. ZP domains are present in a range of extracellular filament and matrix proteins from a wide variety of eukaryotes and are characterized by eight conserved cysteine residues, which are involved in protein polymerization 83. We therefore propose that these four unigenes are primarily related to biomineralization.
In contrast, the other six unigenes are not expressed as highly in the mantle as in some of the other tissues, suggesting that they probably have functions beyond biomineralization (Fig. 10e–j). The functions of all ten unigenes are discussed further in the Discussion.

Figure 10

Differential expression of gigasin-2 (c76266_g1) (a), tyrosinase-like (c73086_g1) (b), pif-like (c66761_g1) (c), teneurin-2 (c59513_g1) (d), perlucin (c67461_g2) (e), calmodulin-like (c81494_g1) (f), carbonic anhydrase-like (c84941_g5) (g), caltractin-like (c70548_g1) (h), calreticulin-like (c84621_g1) (i) and insoluble matrix shell protein 6 (c69385_g1) (j) in the adductor muscle, gonad, gill, mantle, hepatopancreas, mouthparts and intestine of P. placenta, determined by real-time PCR. Error bars represent the standard error of three biological replicates; statistical significance was set at p < 0.05.
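This section does not state how relative expression was calculated; one widely used option is the 2^-ΔΔCt method, shown here purely as an assumption. The sketch normalizes a target gene's Ct to a reference gene in each replicate, expresses the result relative to a calibrator tissue, and summarizes three biological replicates as mean ± standard error, the statistic named in the caption. All Ct values, the reference gene and the calibrator ΔCt are invented.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def delta_ct(ct_target: Sequence[float], ct_reference: Sequence[float]) -> List[float]:
    """Per-replicate dCt = Ct(target gene) - Ct(reference gene)."""
    return [t - r for t, r in zip(ct_target, ct_reference)]


def relative_expression(d_cts: Sequence[float], calibrator_dct: float) -> List[float]:
    """Per-replicate fold change 2^-(dCt - dCt_calibrator), i.e. 2^-ddCt."""
    return [2 ** -(d - calibrator_dct) for d in d_cts]


def mean_se(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error (sample SD / sqrt(n)) across biological replicates."""
    return mean(values), stdev(values) / math.sqrt(len(values))


if __name__ == "__main__":
    # Invented Ct values for three biological replicates of one tissue.
    target = [20.0, 21.0, 22.0]
    reference = [18.0, 18.0, 18.0]   # hypothetical reference (housekeeping) gene
    calibrator = 5.0                 # hypothetical mean dCt of the calibrator tissue
    fold = relative_expression(delta_ct(target, reference), calibrator)
    print(fold)                      # [8.0, 4.0, 2.0]
    print(mean_se(fold))
```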

Pfam database analysis

A Pfam database search helps clarify the possible biomineralization-related functions of molluscan shell matrix proteins, which very often contain repetitive, highly conserved, low-complexity domains. The functions of these protein domains in biomineralization have been studied by many groups 13,84. For example, the nacre protein perlucin contains a C-type lectin domain with broad carbohydrate-binding specificity, which is thought to facilitate calcium-dependent glycoprotein-protein interactions within the skeletal matrix 23. Molluscan pif-like proteins contain von Willebrand factor type A (VWA), chitin-binding and laminin G domains, which allow them to bind the chitin framework, accelerate CaCO3 precipitation inside the chitin membrane and then regulate the vertical alignment of the crystals 85,86. In bivalves, most EF-hand proteins from the mantle tissue, such as calmodulin, troponin C and myosin light chains, are Ca2+ sensors or signal modulators that may undergo conformational changes upon binding Ca2+ 87.

In this study, the sequences of potential biomineralization genes were obtained from the transcriptome analysis of the P. placenta mantle, and the domain composition of their deduced amino acid sequences was determined with Pfam. Unigenes with possible biomineralization-related functions were then identified by keyword searches for Pfam domains previously reported to be related to biomineralization. Finally, we speculate on the functions of the potential biomineralization-related proteins of P. placenta based on the characteristics of the domains found in molluscan biomineralization proteins. According to this Pfam analysis, many unigenes in the mantle transcriptome of P. placenta are proposed to be involved in shell formation (Excel S11).
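The keyword search over Pfam annotations can be sketched as a case-insensitive substring match of domain descriptions against a keyword list. The domain descriptions, unigene IDs and `KEYWORDS` below are hypothetical illustrations based on domains named in this section, not the study's actual search terms or Pfam output format.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical keywords based on domains discussed in the text (lower-case).
KEYWORDS = (
    "c-type lectin",
    "von willebrand",
    "ef-hand",
    "carbonic anhydrase",
    "chitin-binding",
    "zona pellucida",
    "egf",
)


def flag_candidates(
    annotations: Iterable[Tuple[str, str]], keywords: Tuple[str, ...] = KEYWORDS
) -> Dict[str, Set[str]]:
    """Map each unigene to the keywords found in any of its domain descriptions."""
    hits: Dict[str, Set[str]] = defaultdict(set)
    for unigene, description in annotations:
        text = description.lower()
        for keyword in keywords:
            if keyword in text:
                hits[unigene].add(keyword)
    return dict(hits)


if __name__ == "__main__":
    toy = [
        ("u1", "C-type lectin domain"),
        ("u2", "Zona pellucida-like domain"),
        ("u2", "EGF-like domain"),
        ("u3", "Ribosomal protein L7"),
    ]
    print(flag_candidates(toy))
```

Substring matching is deliberately loose; a real search would more likely match Pfam accessions than free-text descriptions.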

Calmodulin, calponin and mucin proteins are thought to be associated with molluscan shell formation. Among them, calponin is highly expressed in the mantle of P. placenta (FPKM > 551), whereas most unigenes of the other two proteins have relatively low expression. Only the c81494_g1 unigene (calmodulin-like protein) has a high expression level (FPKM 38.17–76.44) (Excel S11), but qRT-PCR shows that it is expressed more highly in the mouthparts than in the other tissues (Fig. 10f), suggesting that calmodulin may play other roles in P. placenta. Insoluble matrix shell protein 6 is expressed more highly in the hepatopancreas and intestine than in the other tissues of P. placenta (Fig. 10j).

Cadherin and collagen proteins are enriched in von Willebrand factor type A and epidermal growth factor domains, indicating that they derive from the extracellular matrix 88. However, most of them show low expression in the mantle tissue of P. placenta; only c83310_g1 (collagen alpha-1) shows a relatively high expression level (FPKM 11.18–32.32) (Excel S11). Perlucin extracted from abalone nacre contains a functional C-type lectin domain and can increase the precipitation rate of calcium carbonate from a saturated solution, indicating that it may promote the nucleation and/or growth of calcium carbonate crystals 15,23. Among the proteins containing perlucin domains, only the c67461_g2 unigene (perlucin) shows a high expression level (FPKM 65.55–100.36) in the mantle tissue of P. placenta (Excel S11). However, qRT-PCR shows that c67461_g2 is expressed more highly in the hepatopancreas than in the other tissues (Fig. 10e). The Carbonic_anhydrase domain has been reported to be involved in the formation of both nacreous and prismatic layers 20,24,89. All the carbonic anhydrase unigenes, such as c71108_g1, c85950_g1, c81423_g3 and c84941_g5, have relatively high expression values (FPKM > 27) (Excel S11), and qRT-PCR shows that c84941_g5 is expressed more highly in the gill than in the other tissues (Fig. 10g). In addition, some extracellular enzymes and inorganic ion-binding proteins with relatively high expression in the P. placenta mantle, such as chitinase, hemicentin and peroxidasin, are probably involved in shell formation (Excel S11). Chitinase in the shell matrix may remodel the chitinaceous scaffold and promote the interaction between chitin and chitin-binding proteins 82.
Compared with the other ion-binding proteins, the c71155_g1 unigene (chitinase 1) has a relatively high expression level (FPKM 96.85–125.15) (Excel S11), but its expression across the different tissues of P. placenta has not yet been determined.

Discussion

Shell formation is a complex process involving many proteins and genes, through which living organisms produce biominerals with superior mechanical properties under biological control 1. The main objective of this study is to identify unigenes involved in biomineralization. Our study indicates that a high-coverage expression profile can be obtained with short-read Illumina sequencing combined with effective analysis tools such as Trinity (for assembly) and ESTScan (for coding-region prediction). A total of 113,325 unigenes with an average length of 697 bp were generated from the mantle tissue of P. placenta using Illumina HiSeq™ 4000 sequencing, and 66.39% of these unigenes (76,237) are shorter than 500 bp (Fig. 1). The P. placenta mantle unigenes are longer than those reported for P. penguin 34 and comparable to those reported for the Yesso scallop P. yessoensis (93,204 unigenes; 733 bp) 36 and the freshwater pearl mussel C. plicata (98,501 unigenes; 689 bp) 33. However, both the number and the mean length of the P. placenta mantle unigenes are larger than those of the Zhikong scallop C. farreri (77,975 unigenes; 538 bp) 37 and the pearl oyster P. maxima (108,704 unigenes; 407 bp) 90. This difference may be attributed to the use of different sequencing platforms. Although large quantities of genomic data are available for many bivalve species, only 18.99% of the P. placenta mantle unigenes were annotated in the nr database, which means that about 81% of them have no known homologs. This low annotation rate could result from the limited genomic information available for P. placenta, as is the case for many other bivalve species 33,36.
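Summary statistics such as the mean unigene length and the share of unigenes shorter than 500 bp can be computed directly from a list of unigene lengths. The sketch below also reports N50, a common assembly statistic that is not given in this section; the lengths in the usage example are toy values, not the study's data.

```python
from typing import Dict, List


def length_stats(lengths: List[int], short_cutoff: int = 500) -> Dict[str, float]:
    """Count, mean length, fraction shorter than `short_cutoff`, and N50."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    total = sum(lengths)
    # N50: length L such that sequences of length >= L cover at least half the total bases.
    running, n50 = 0, 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {
        "count": len(lengths),
        "mean": total / len(lengths),
        "fraction_short": sum(1 for n in lengths if n < short_cutoff) / len(lengths),
        "n50": n50,
    }


if __name__ == "__main__":
    toy = [200, 300, 400, 500, 600]  # hypothetical unigene lengths (bp)
    print(length_stats(toy))
    # {'count': 5, 'mean': 400.0, 'fraction_short': 0.6, 'n50': 500}
```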

The candidate SNPs identified in the mantle tissue of P. placenta may include sequencing errors as well as true SNPs. In our data, the SNP density was 0.0059 SNPs/bp (0.59%) (Excel S5, Table 2), much higher than the sequencing error rate (0.01%) (Table 1), which supports the reliability of the SNP data. These SNPs are potentially useful for genetic linkage mapping and for the analysis of quantitative traits in P. placenta. To our knowledge, the transcriptome presented here provides the most comprehensive polymorphism data for P. placenta to date. The SNP density of the eastern oyster C. virginica is 0.042 SNPs/bp 54, higher than that of the P. placenta mantle (0.0059 SNPs/bp) (Excel S5, Table 2). In total, 465,392 potential SNPs were identified in 60,371 unigenes, a frequency of about 7.56 SNPs per unigene in the mantle tissue of P. placenta, consistent with the results reported for the mantle tissue of the pearl oyster P. martensii 35. The indel density in the P. placenta mantle was 2.87 indels per unigene (Excel S6), much lower than that reported for the mantle tissue of P. martensii 35. Illumina sequencing is considered robust against homopolymer errors and may therefore be well suited for identifying indels 91.
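The reported SNP density can be reproduced from figures given in this Discussion under one assumption: that density is the SNP count divided by the total assembled length, with that length approximated as 113,325 unigenes × 697 bp mean length. The sketch below checks that arithmetic, the ratio to the 0.01% sequencing error rate, and the indel rate of 62,103 indels over 21,630 unigenes.

```python
def snp_density(n_snps: int, total_bp: int) -> float:
    """SNPs per base pair of assembled sequence."""
    return n_snps / total_bp


def per_unigene(n_variants: int, n_unigenes: int) -> float:
    """Average number of variants per variant-containing unigene."""
    return n_variants / n_unigenes


# Assumption: total assembled length approximated as unigene count x mean length.
N_UNIGENES = 113_325
MEAN_LENGTH_BP = 697
TOTAL_BP = N_UNIGENES * MEAN_LENGTH_BP  # 78,987,525 bp

density = snp_density(465_392, TOTAL_BP)
indels_per_unigene = per_unigene(62_103, 21_630)
SEQ_ERROR_RATE = 0.0001  # the 0.01% sequencing error quoted in the text

print(f"SNP density: {density:.4f} SNPs/bp ({density * 100:.2f}%)")  # 0.0059 SNPs/bp (0.59%)
print(f"Indels per unigene: {indels_per_unigene:.2f}")              # 2.87
print(f"Density / error rate: {density / SEQ_ERROR_RATE:.0f}x")     # 59x
```

Because 697 bp is itself a rounded mean, the reconstructed total length, and hence the density, is approximate.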

Differences in unigene expression between tissues have been shown to be correlated with shell formation in molluscs such as C. gigas 22, P. penguin 34 and T. pyramis 32. The biomineralization functions of proteins such as perlucin 15 and pif 85 have been investigated through in vitro and in vivo mineralization studies. Based on the nr database, 178 potential biomineralization-related unigenes were identified in the mantle transcriptome of P. placenta in this work, and ten selected potential biomineralization-related unigenes (FPKM > 15) were examined in different tissues of P. placenta by qRT-PCR. Four of the ten unigenes (gigasin-2, tyrosinase-like, pif-like and teneurin-2) are expressed much more highly in the mantle than in the other tissues, suggesting that they are closely related to the biomineralization of the P. placenta shell (Fig. 10a–d). Three gigasin-2 isoforms were identified in the water-soluble matrix of the C. gigas shell; they are proposed to be involved in bone remodeling and could be responsible for the biocompatibility between bone and nacre grafts 79. Gigasin-2 is also highly expressed in the mantle tissue of C. gigas 56. However, no homologous proteins have been identified in other species since gigasin-2 was reported in 2012 56. Members of the tyrosinase family are potentially involved in the melanin biosynthetic pathway in various organisms. Moreover, molluscan tyrosinases have been reported to be secreted from the mantle and transported to the prismatic layer of the shell, where they contribute to melanin biosynthesis and shell pigmentation 56,92,93. In P. placenta, however, neither the mantle tissue nor the transparent shell is pigmented. Therefore, the high expression of tyrosinase in the mantle suggests that its function is not limited to melanin biosynthesis but is also related to shell formation.
After injection of pif dsRNA, both the calcite laths of the C. gigas shell and the nacreous layer of the P. fucata shell grew into disordered structures in vivo, indicating that pif might be essential for the normal growth of the prismatic and nacreous layers 44,77. Teneurin-2 was first identified in the diverse shell matrix proteome of C. gigas, where it contains signal peptides, and it was proposed to be secreted from the mantle into the shell 78. Based on the expression of these four unigenes and their reported biomineralization functions in other molluscs, we propose that gigasin-2, tyrosinase-like, pif-like and teneurin-2 may play important roles in biomineralization. It will be important to extract these four proteins and investigate their functions in the in vitro crystallization of CaCO3. We are considering studying the full-length cDNA sequences, gene expression and recombinant proteins of these four unigenes to understand their functions in the in vivo and in vitro crystallization of CaCO3.

The remaining six unigenes do not exhibit higher expression in the mantle than in the other tissues of P. placenta, so the qRT-PCR analysis alone cannot show whether they participate in biomineralization. These findings are somewhat similar to those of previous studies. For example, six types of perlucin with different tissue expression levels were identified in T. pyramis: some were expressed at the highest levels in the digestive gland, whereas others were expressed at high levels in the mantle or the gonad 32. Perlucin isolated from the nacreous layer of the marine snail Haliotis laevigata promotes the nucleation of CaCO3 crystals on calcite surfaces in vitro 94. This expression pattern and the in vitro crystallization experiments suggest that the perlucin family may play important roles in both biomineralization and digestion 32,94. As with perlucin, two types of calmodulin in T. pyramis were expressed at their lowest levels in the mantle compared with the other tissues 32. By contrast, calmodulin-like protein is expressed at its highest level in the mantle of P. fucata and potentially has a high affinity for calcium 96; it can also induce aragonite nucleation by binding the 16-kDa protein and regulates calcite growth in the prismatic layer of this pearl oyster 95. Members of the carbonic anhydrase family are expressed in the mantle and associated with shell formation in the European abalone Haliotis tuberculata 97. Nacrein, which contains a carbonic anhydrase domain, was found in both the nacreous and prismatic layers of P. fucata 24, and it is also highly expressed in the mantle of T. pyramis and P. penguin 32,34.
However, an analysis of five carbonic anhydrase isoforms in different tissues of C. gigas found that four of them were more highly expressed in hemocytes than in the gills or mantle 22. In P. placenta, the perlucin unigene examined by qRT-PCR showed its highest expression in the digestive gland (hepatopancreas) (Fig. 10e). The c81494_g1 unigene (calmodulin-like) was expressed most highly in the mouthparts, moderately in the gills and at a very low level in the mantle (Fig. 10f). Carbonic anhydrase-like expression was highest in the gill and intermediate in the mantle of P. placenta (Fig. 10g). Based on the qRT-PCR results and the in vitro crystallization results reported in the literature, we conclude that these three unigenes, perlucin, calmodulin-like and carbonic anhydrase-like, are potentially related to the biomineralization process of P. placenta.

As a member of the calmodulin subfamily of EF-hand Ca2+-binding proteins, caltractin was first identified in C. gigas 56,98. Calreticulin, encoded by another of the examined unigenes, is also a calcium-binding protein; it is primarily involved in the unfolded protein response of the endoplasmic reticulum to cellular stress (temperature, salinity, air exposure and heavy metals) 14,56,99. Both calcium-binding proteins, calreticulin-like and caltractin-like, exhibit relatively low expression in the mantle tissue according to the qRT-PCR results (Fig. 10h,i). Nevertheless, because they bind calcium, they may still have some function in biomineralization 100,101,102.

Conclusions

In conclusion, the mantle transcriptome of P. placenta was characterized in detail using the Illumina HiSeq™ 4000 platform and public databases. The identified and annotated unigenes provide valuable genomic resources for understanding the biomineralization mechanism. More than half of the annotated unigenes of the P. placenta mantle match proteins of the Pacific oyster C. gigas in the nr database. SNP, SSR and indel markers were identified in the mantle transcripts of P. placenta; these markers and the designed primers may be used to construct a genetic linkage map and in gene-based association studies. Sixty-six unigenes homologous to 21 shell matrix proteins in the P. placenta mantle transcriptome were found to be related to calcitic shell formation, eighteen of which are highly expressed in the mantle (FPKM > 15). Furthermore, qRT-PCR analysis of ten highly expressed homologous unigenes (FPKM > 50) related to biomineralization, in the mantle and six other tissues of P. placenta, indicates that seven of them are potentially related to the biomineralization of the calcitic shells of P. placenta. In particular, four of the ten examined unigenes, teneurin-2, gigasin-2, pif-like and tyrosinase-like, are most highly expressed in the mantle, suggesting that their primary function is in biomineralization. This study contributes to understanding the molecular mechanisms and the protein components involved in the biomineralization of the foliated calcite plates of P. placenta, and the transcriptomic data generated here provide a basis for further studies of the P. placenta genome.
Moreover, the comparison of potential biomineralization genes reveals similarities and differences in the shell-forming matrices of different molluscan animals.