Background

MicroRNAs (miRNAs) are a class of non-coding RNAs that function as gene expression regulators across various animal model systems [1]. Typically, miRNAs interact with the 3’ untranslated region of target mRNAs, leading to mRNA degradation and the downregulation of gene expression [2]. Dysregulated miRNA expression is closely associated with a spectrum of diseases, including cancer [3,4,5], cardiovascular diseases [6, 7], and inflammatory diseases [8, 9].

Ablation models

We used two ablation models to assess the effectiveness of the proposed architecture. In ablation model 1 (AM1), the output from the first module was passed directly to the second module without adding the pre-miRNA secondary structure embeddings. In ablation model 2 (AM2), all CU1 components were removed, meaning that the original input information for each module was not enhanced prior to the convolution process. Both AM1 and AM2 followed the same training procedure as the proposed model, with minor adjustments to the layer parameters.

Performance evaluation

In the context of binary classification, we computed several metrics to assess the model’s performance on the independent test sets. These metrics included accuracy (ACC), specificity (Spe), sensitivity (Sen), F1 score, and Matthews’ correlation coefficient (MCC) [45]. The definitions of these binary classification metrics are provided below:

$$Acc=\frac{TP+TN}{TP+TN+FP+FN},$$
$$Spe=\frac{TN}{TN+FP},$$
$$Sen=\frac{TP}{TP+FN},$$
$$F1= \frac{2\times TP}{2\times TP+FP+FN},$$
$$MCC= \frac{TP\times TN-FP\times FN}{\sqrt{\left(TP+FP\right)\times \left(TP+FN\right)\times \left(TN+FP\right)\times (TN+FN)}},$$

where \(TP\), \(TN\), \(FP\), and \(FN\) denote the counts of true positives, true negatives, false positives, and false negatives, respectively.

For the evaluation of the multi-class classification models, we adopted a macro-averaging approach for \(Acc\), \(Spe\), and \(Sen\) to calculate the average of each metric across all classes. In addition, we utilized an extension of the binary MCC to the multi-class scenario for evaluation [46]. MCC for the multi-class classification is defined as below:

$${MCC}_{multi}=\frac{c\times s-{\sum }_{k}^{K}{p}_{k}\times {t}_{k}}{\sqrt{\left({s}^{2}-{\sum }_{k}^{K}{p}_{k}^{2}\right)\times ({s}^{2}-{\sum }_{k}^{K}{t}_{k}^{2})}},$$

where \({t}_{k}\) is the number of occurrences for class \(k\), \({p}_{k}\) signifies the number of predictions for class \(k\), \(c\) denotes the total number of samples correctly predicted, and \(s\) represents the total number of samples.
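All four quantities in the multi-class MCC can be read off a confusion matrix. A sketch under the convention that `conf[i][j]` counts samples of true class \(i\) predicted as class \(j\) (the function name and matrix layout are illustrative, not taken from the DiCleave code):

```python
import math

def multiclass_mcc(conf):
    """Multi-class MCC from a K x K confusion matrix.

    conf[i][j] = number of samples with true class i predicted as class j.
    """
    K = len(conf)
    s = sum(sum(row) for row in conf)                 # total samples
    c = sum(conf[k][k] for k in range(K))             # correctly predicted
    t = [sum(conf[k]) for k in range(K)]              # occurrences per class
    p = [sum(conf[i][k] for i in range(K)) for k in range(K)]  # predictions per class
    num = c * s - sum(p[k] * t[k] for k in range(K))
    den = math.sqrt((s * s - sum(x * x for x in p)) *
                    (s * s - sum(x * x for x in t)))
    return num / den if den else 0.0
```

For K = 2 this reduces to the binary MCC above, which is a quick sanity check on an implementation.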

Results

Performance of binary classification models

The performance of the best models among replications is listed in Table 2. The test set was balanced, containing an equal number of positive and negative patterns (see Sect. "Methods"). There were no substantial differences between the performances of the models for 5’ and 3’ patterns; both models achieved accuracy, specificity, sensitivity, and F1 scores of 0.90 or higher, and MCC reached 0.82. Receiver operating characteristic (ROC) curves and precision-recall (PR) curves for DiCleave in the binary classification task are shown in Fig. 5. Although the area under the ROC curve (AUC) score of the 3’ pattern model was marginally higher than that of the 5’ pattern model, both models demonstrated nearly identical performance. On an unbalanced test set, created by randomly selecting 50 positive patterns and all negative patterns from the original test set, performance decreased slightly compared with the balanced dataset, although the F1 score and MCC remained above 0.80 and 0.75, respectively (Additional file 1: Tables S1 and S2).
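The unbalanced test set described here can be reproduced with a simple subsampling step. A sketch, assuming the patterns are held in plain Python lists and a fixed seed is used for reproducibility (names and the seed are our choices, not from the paper):

```python
import random

def make_unbalanced(positives, negatives, n_pos=50, seed=0):
    """Keep every negative pattern but only n_pos randomly chosen positives,
    mirroring the unbalanced test set described in the text."""
    rng = random.Random(seed)              # fixed seed for reproducibility
    kept = rng.sample(positives, n_pos)    # sample without replacement
    samples = [(x, 1) for x in kept] + [(x, 0) for x in negatives]
    rng.shuffle(samples)                   # avoid label-ordered batches
    return samples
```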

Table 2 Performance of different models
Fig. 5
figure 5

Receiver-operating characteristic curves (a) and precision-recall curves (b) for DiCleave in the binary classification task. Blue and green lines indicate the performance of the 5’ classifier and the 3’ classifier, respectively

The proposed model outperformed the ablation models in all classification tasks (Table 2). AM1, which lacked pre-miRNA secondary structure embeddings, exhibited the lowest accuracy in all classification tasks. Although AM1 achieved the highest specificities in the binary classification task, its sensitivities were the lowest. Conversely, the inclusion of secondary structure embeddings notably enhanced the overall performance of AM2, yielding significantly improved sensitivities at the acceptable cost of decreased specificity. However, the performance of AM2 still lagged behind that of DiCleave. This can be attributed to the absence of CU1, which hindered the preservation of information within the input.

To assess the effectiveness of our proposed models, we compared them with ReCGBM. We computed the average performance of DiCleave and ReCGBM across 10 replications with different initial conditions to provide a comprehensive view of their performance. As shown in Fig. 6 and Additional file 1: Tables S3 and S4, the performance of our models surpassed the average performance of ReCGBM in 5’ pattern prediction. In addition, our models demonstrated similar performance in 5’ and 3’ pattern predictions, whereas ReCGBM exhibited unequal performance between the two (i.e., inferior performance in 5’ pattern prediction compared with 3’ pattern prediction). Figure 6 also includes a performance comparison with PHDcleav [28] and LBSizeCleav [29]. The PHDcleav model utilized binary input features and a window size of 14 [28], whereas the LBSizeCleav model employed parameter k = 1 and a window size of 14 [29]. We selected these two models because their input features closely resembled those of our architecture. As shown in Fig. 6, both SVM-based models struggled to compete with ReCGBM and DiCleave.

Fig. 6
figure 6

Performance comparison between DiCleave and other existing methods for binary classification tasks for both a the 5’ cleavage site classification task and b the 3’ cleavage site classification task. DiCleave and ReCGBM show average performances of 10 replications. The performances of PHDcleav and LBSizeCleav (indicated by an asterisk) are sourced from their original articles [28, 29]

Performance of the multi-class classification model

Our model can perform multi-class classification tasks by adapting its output layer. When presented with a cleavage pattern sequence, our model was able to predict whether the sequence represented a 5’ pattern, a 3’ pattern, or a negative pattern. The performance of the multi-class classification model is summarized in Table 2. It achieved an accuracy, sensitivity, and F1 score of 0.89, and a specificity of 0.94. Furthermore, on an unbalanced dataset for the multi-class classification tasks, the F1 score and MCC exceeded 0.85 and 0.75, respectively (Additional file 1: Tables S1 and S2).

The confusion matrix for the multi-class classification model is shown in Fig. 7. Only one 5’ pattern sequence was misclassified as a 3’ pattern, and no 3’ patterns were misclassified as 5’ patterns. This outcome highlights our model’s capability to differentiate positive cleavage patterns from different arms. However, there is substantial room for improvement when distinguishing between positive and negative patterns, particularly in the differentiation of 3’ patterns from non-cleavage patterns.

Fig. 7
figure 7

Confusion matrix for the multi-class classification model. Rows and columns represent the counts of true and predicted labels, respectively. The values along the diagonal represent the counts of correctly predicted labels by the model

The ROC and PR curves for the multi-class classification model using a one-vs-all strategy are shown in Fig. 8. The AUC scores for both 5’ and 3’ patterns reached 0.97, surpassing that of the negative pattern (0.93). In the PR curves, the precision for 3’ patterns decreased more rapidly as recall increased, compared with that for 5’ and non-cleavage patterns.

Fig. 8
figure 8

Receiver-operating characteristic curves (a) and precision-recall curves (b) for the multi-class classification model. In this experiment, the model was transformed into a binary classifier by designating one class as the positive class and the other two classes as the negative classes
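The one-vs-all transformation used for Fig. 8 amounts to relabeling one class as positive and pooling the remaining classes as negative. A sketch, paired with a rank-based (Mann-Whitney) AUC for illustration; the function names and score format are our assumptions, not DiCleave's API:

```python
def one_vs_all(labels, scores, positive_class):
    """Collapse multi-class predictions into a binary problem for one class.

    labels: true class per sample; scores: per-sample dict of class scores.
    """
    y_true = [1 if y == positive_class else 0 for y in labels]
    y_score = [s[positive_class] for s in scores]
    return y_true, y_score

def auc_score(y_true, y_score):
    """AUC via the Mann-Whitney statistic; ties count as half a win."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```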

Discussion

We introduced a DNN model to predict the presence of a human Dicer cleavage site within a short sequence segment, defined as a cleavage pattern. Our model’s input consisted of a combination of one-hot encodings for sequences, including pattern sequences and their complementary sequences, and secondary structure information. Rather than merging these inputs into twelve types of dimers that include combinations of the four bases (“A”, “C”, “G”, “U”) and the three secondary structure indicators (“(”, “.”, “)”) [47], we opted to stack them. This approach was chosen to avoid creating a sparse input space that could yield an unnatural convolution output. The combination of sequence patterns and the secondary structure embeddings extracted by the autoencoder significantly improved the discriminative capacity of the DNN model, resulting in accuracies of 0.91 and 0.89 for the binary and multi-class classification tasks, respectively. This result demonstrated that incorporating pre-miRNA secondary structure embeddings and leveraging the shortcut structure of CU1 significantly enhanced the performance of the model. It is worth noting that the multi-class classification model was trained with an unbalanced dataset in terms of the number of patterns for each class. Although no further measures were taken to address this imbalance apart from adjusting the weight of the negative patterns in the loss calculation, our model still yielded satisfactory results. This success could be attributed to the valuable information provided by the secondary structures, which proved effective in distinguishing between cleavage and non-cleavage patterns.
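The stacking strategy described above can be illustrated with a toy encoder. The 4 + 4 + 3 = 11 channel ordering below is our assumption for illustration and may not match the actual DiCleave implementation:

```python
BASES = "ACGU"     # nucleotide alphabet
STRUCT = "(.)"     # dot-bracket secondary structure symbols

def one_hot(symbols, alphabet):
    """One row (channel) per alphabet symbol, one column per position."""
    return [[1 if s == a else 0 for s in symbols] for a in alphabet]

def stacked_input(pattern, complement, structure):
    """Stack the three one-hot encodings along the channel axis
    (4 + 4 + 3 = 11 rows) instead of merging base and structure symbols
    into sparse base-structure dimer tokens."""
    return (one_hot(pattern, BASES)
            + one_hot(complement, BASES)
            + one_hot(structure, STRUCT))
```

Stacking keeps each channel densely populated, whereas a 12-symbol dimer vocabulary would spread the same information over a much sparser input space.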

The computational experiments conducted in this study involved a random selection of training and test datasets from the main dataset. However, even when sequences with 80% or higher similarity were pre-filtered from the main dataset using CD-HIT-EST software [48, 49], DiCleave still demonstrated high performance for predicting cleavage patterns. Specifically, the best model among 10 replications achieved an F1 score of 0.85 or higher for both binary and multi-class classification tasks in the balanced case, and 0.80 or higher in the unbalanced case (Additional file 1: Table S5).

One limitation of this study was the relatively small size of the human pre-miRNA dataset used for training the deep learning model, which led to overfitting during training. To mitigate this, we employed a small batch size of 20 samples per mini-batch, as preliminary experiments indicated that a larger batch size caused the models to become stuck in local minima. We set the learning rate to 0.005, which is larger than the default value in PyTorch’s Adam implementation. Low learning rates (e.g., 1e-4 or 5e-4) were found to slow down the training process and hinder model convergence, whereas high learning rates led to overfitting. One potential solution for the small and unbalanced dataset is data augmentation, which can increase the number of training samples. However, effective data augmentation for nucleotide sequences requires expert knowledge in biology. Another limitation pertains to the definition of the cleavage pattern within pre-miRNA sequences. In this study, a cleavage pattern was defined as a 14-nt-long sequence with a Dicer cleavage site at the center. However, in real scenarios, the cleavage site could occur at any position within a given sequence, which could significantly increase the dataset size if each possible position were considered.
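The 14-nt cleavage-pattern definition above can be sketched as a simple windowing step; how the center is placed for an even window width, and the function name itself, are our assumptions rather than details taken from the paper:

```python
def cleavage_pattern(sequence, cut_index, width=14):
    """Return the width-nt window centered on the cut site, where the cut
    falls between positions cut_index - 1 and cut_index; return None if the
    window would run off either end of the sequence."""
    half = width // 2
    start, end = cut_index - half, cut_index + half
    if start < 0 or end > len(sequence):
        return None
    return sequence[start:end]
```

Sliding this window over every position of a pre-miRNA, rather than only over annotated cut sites, is what would inflate the dataset in the scenario the text describes.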

Finally, as identified by Liu and colleagues, the features near the pre-miRNA center were observed to have great significance [30], suggesting an intrinsic interplay among the bases within pre-miRNA. Consequently, our future endeavors include the integration of a Transformer-based model, which has the potential to harness these intrinsic features through the attention mechanism [47]. We also aim to create an end-to-end generative model to directly generate mature miRNA sequences from pre-miRNAs based on our cleavage prediction method in forthcoming research.

Conclusions

We have demonstrated the effectiveness of our deep learning models in predicting the presence of a human Dicer cleavage site within a given pre-miRNA sequence using both its sequence and secondary structure information. Our binary classification model exhibited superior or comparable performance compared with existing models. Furthermore, our model’s ability to function as a multi-class classifier is highly advantageous and practical. This versatility allows our model to make predictions without requiring prior information for any sequence segment, ensuring accessibility to a broad range of data in the miRBase database even when the available information is incomplete.