Unlocking the microbial studies through computational approaches: how far have we reached?

Kumar, Rajnish; Yadav, Garima; Kuddus, Mohammed; Ashraf, Ghulam Md; Singh, Rachana

doi:10.1007/s11356-023-26220-0

Review Article
Published: 15 March 2023

Volume 30, pages 48929–48947, (2023)

Environmental Science and Pollution Research

Rajnish Kumar^1,3,
Garima Yadav¹,
Mohammed Kuddus²,
Ghulam Md Ashraf⁴ &
…
Rachana Singh ORCID: orcid.org/0000-0002-4485-752X¹

Abstract

The metagenomics approach has accelerated the study of genetic information from uncultured microbes and complex microbial communities. In silico research has also facilitated an understanding of protein-DNA interactions, protein–protein interactions, docking between proteins and phyto/biochemicals for drug design, and modeling of the 3D structure of proteins. These in silico approaches have provided insight into pathogenic and nonpathogenic strains, which has helped in identifying probable genes for vaccines and antimicrobial agents and in comparing whole-genome sequences to study microbial evolution. Artificial intelligence, more precisely machine learning (ML) and deep learning (DL), has proven to be a promising approach in microbiology for handling, analyzing, and utilizing the large datasets generated through nucleic acid sequencing and proteomics. This has enabled the understanding of the functional and taxonomic diversity of microorganisms. ML and DL have been used in the prediction and forecasting of diseases and applied to trace environmental contaminants and assess environmental quality. This review presents an in-depth analysis of recent applications of in silico approaches in microbial genomics, proteomics, functional diversity, vaccine development, and drug design.

Introduction

The discipline of microbiology means exploring the structure and function, interrelationships, and mechanisms within communities of microorganisms and their interactions with the immediate environments or hosts. Microscopy has been the key technique for the identification of microbes, which is complementarily followed by culture techniques to elucidate their physiology, genetic constructs, metabolism, and pathogenicity. However, these procedures are time-consuming and labor-intensive. The incorporation of advanced techniques such as high-throughput sequencing and next-generation sequencing in the field of microbiology has presented a plethora of genomic data. This accumulation of data from various domains of microbial genomics has enabled the development of new diagnostic and genotyping tools, deciphered microbial genetic diversity, and identified virulence and resistance mechanisms. Additionally, in silico methods assist in gathering genetic information that can be used to identify therapeutic targets, investigate host–pathogen interactions, and establish mechanisms of antibiotic resistance and virulence.

Therefore, an optimal analysis and interpretation of these large intricate data is the next challenge to achieve these promising advances. This task is beyond human expertise with a high risk of errors involved and calls out for advanced computational techniques that can detect meaningful patterns from the heaps of data. Artificial intelligence can fill these gaps with techniques such as machine learning, which uses structured data and recognizes meaningful patterns with supervised and unsupervised learning methods.

Bioinformatics, an application of information technology, helps in the processing and analysis of the data generated in biological research and experiments by applying computer-based algorithms. It helps in DNA barcoding, in modeling the patterns of disease outbreaks, and in designing new biological products. In proteomics, it facilitates the study of protein structures and the identification of protein–protein interaction sites (Rao et al. 2014). In metabolomics, the study of dynamics in cells and cellular interactions is possible with the help of bioinformatics (Kushwaha et al. 2017). Bioinformatics has not only helped in genome sequencing and presented accomplishments in gene allocations but also helped to draw phylogenetic relationships and detect transcription factor-binding sites of genes. Microarray data analysis is made possible by bioinformatics tools. Biological data are growing exponentially due to the availability of low-cost sequencing technologies. The enormous amount of data generated has led to the development of databases of nucleic acid sequences, protein sequences, and their structures. For example, Swiss-Prot and PIR for protein sequences, GenBank and DDBJ for genome sequences, and the Protein Data Bank for protein structures are established primary databases. Various software and tools that could be helpful in microbiological studies are summarized in Table 1.

Table 1 Useful software and tools for microbial studies

In silico approaches for microbial genomics

Metagenomics applies advanced genomics techniques to study microbial communities directly from their natural environments, without cultivation in the lab or isolation of individual species (DeLong 2002; Riesenfeld et al. 2004a, b; Handelsman 2004; Rodriguez-Valera 2004; Streit and Schmitz 2004; Edwards and Rohwer 2005). This is the culture-independent approach for retrieval of 16S rRNA genes, established two decades ago by Pace and colleagues (Olsen et al. 1986). In 2002, Hugenholtz reported that until that time, 99% of microbial species had not been cultivated due to limitations, but metagenomics approaches revolutionized microbiology by eliminating the need for clonal isolates (Hugenholtz 2002; Rappe and Giovannoni 2003; Singh and Porwal 2021).

Metagenomic assembly facilitates gene prediction and annotation and is therefore considered a significant step when studying the functional constitution and size of microbiomes (Van der Walt et al. 2017).

To facilitate microbial identification studies, various techniques have emerged. DNA pyrosequencing, also known as sequencing by synthesis, was developed in the mid-1990s (Ronaghi et al. 1996). The major limitation of this method is its inability to read the long stretches of DNA sequence (sequences hardly exceed 100–200 base pairs with first- and second-generation pyrosequencing chemistries) (Joseph et al. 2009).

With the advent of sequencing technology, next-generation sequencing (NGS) has emerged as a rapid and reliable method for the identification of bacterial pathogens. NGS has evolved as a molecular microscope, expanding its applications into every field of microbial research (Buermans and den Dunnen 2014). The application of NGS in the microbial world includes both wet lab and bioinformatics tools/computational methods (Fig. 1) (Ghannam and Techtmann 2021). The first step of this technique is the molecular profiling of the microbial community, which incorporates sample collection (from the patient or environment), nucleic acid extraction, and library preparation. Several biases can be introduced by wet lab methods (Hazen et al. 2013). After sequencing, the primary analysis is performed using bioinformatics tools. Several studies have addressed the processing of sequencing reads, including methods for binning marker genes into operational taxonomic units (OTUs) that are representative of biologically meaningful categories (Edgar 2010). Liu et al. (2021) elaborated the stepwise analysis methods used for high-throughput analysis of the microbiome. The collected samples are first diluted and then distributed in a 96-well microtiter plate. The wells are then subjected to amplicon sequencing, and candidates are selected. The candidates are further subjected to full-length 16S rDNA Sanger sequencing (Fig. 2).
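The idea of binning marker-gene reads into OTUs can be illustrated with a minimal sketch. It is not the algorithm of Edgar (2010) or of any published tool: the function names, the toy reads, the identity measure (position-wise matches between equal-length reads, with no alignment), and the threshold value are all assumptions made for illustration.

```python
def identity(a, b):
    """Fraction of matching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("this sketch only compares equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otu_clusters(reads, threshold):
    """Assign each read to the first centroid it matches at >= threshold identity."""
    centroids = []  # representative sequence of each OTU
    clusters = []   # reads assigned to each OTU, in the same order as centroids
    for read in reads:
        for i, centroid in enumerate(centroids):
            if identity(read, centroid) >= threshold:
                clusters[i].append(read)
                break
        else:
            # No existing centroid is similar enough: the read founds a new OTU.
            centroids.append(read)
            clusters.append([read])
    return clusters


# Hypothetical toy reads; real marker-gene reads are far longer and need alignment.
reads = [
    "ACGTACGTACGTACGTACGT",
    "ACGTACGTACGTACGTACGA",  # one mismatch in 20 positions relative to the first read
    "TTTTGGGGCCCCAAAATTTT",
]
otus = greedy_otu_clusters(reads, threshold=0.90)
print(len(otus), "OTUs")
```

Because the procedure is greedy, the result depends on the order of the reads, which is one reason real pipelines sort reads (for example by abundance) before clustering.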

Peker et al. compared three methods for NGS data analysis in terms of speed and diagnostic accuracy: de novo assembly followed by the Basic Local Alignment Search Tool (BLAST), operational taxonomic unit (OTU) clustering, and mapping to an in-house developed database (16S–23S rRNA encoding region). They directly used patient samples to perform NGS of the 16S and 23S rRNA encoding regions for reliable identification of pathogens. NGS data analysis is tedious and laborious, and a database for the complete 16S–23S rRNA coding region is not available. The study recommended de novo assembly followed by BLAST as the better method. This method showed the shortest turnaround time (2 h and 5 min), which is two hours less than OTU clustering and 4.5 h less than mapping, with a sensitivity of 80%. The analysis concluded that the combination of de novo assembly and BLAST seems to be the best approach for the analysis of the data (Peker et al. 2019). Additionally, comprehension of protein-DNA interactions, protein–protein interactions, docking between proteins and phyto/biochemicals for drug design, and modelling of the three-dimensional structure of proteins were made possible by in silico research (Qiu et al. 2020; Bryant et al. 2022; Baig et al. 2016; Ali et al. 2021; Fatoki et al. 2021).

Machine learning for metagenomic data analysis

With the evolution of technology and machine learning (ML) models, metagenomics has become a popular field of bioinformatics. One can create more competent models to address the problems of DNA sequencing and genome classification. As the technology becomes more sophisticated, new and more precise DNA sequencing techniques have been developed, helped by the enhanced computational power of modern computers. As a result, much larger quantities of data can now be processed and used to train more complex machine learning models, which was not feasible earlier due to several limitations. The advantage of ML is that it can fully exploit the depth of data generated during microbiome studies and build predictive models of outcomes from the data obtained from the microbial community (Ghannam and Techtmann 2021). ML approaches take several forms, including unsupervised, semisupervised, reinforced, and supervised learning (Kumar et al. 2018; Saxena et al. 2019; Sathya and Abraham 2013; Zitnik et al. 2019) (Fig. 2). A model that uses a labeled training set falls under supervised learning (Stoter et al. 2019). Statistical classification and regression analysis are common supervised learning algorithms (Kumar et al. 2011). Clustering, a form of unsupervised learning, can be implemented with k-means, which determines a centroid for each cluster and reduces the error by iterative descent to achieve the grouping (Omer et al. 2014).
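The k-means procedure mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the points are two-dimensional, the starting centroids are supplied by the caller, and the data and function name are hypothetical rather than taken from any cited study.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means on 2-D points; `centroids` holds the initial guesses."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        groups = [[] for _ in centroids]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            groups[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its group.
        new_centroids = []
        for group, old in zip(groups, centroids):
            if group:
                new_centroids.append((sum(p[0] for p in group) / len(group),
                                      sum(p[1] for p in group) / len(group)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, groups


# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.0, 3.0), (3.0, 1.0), (9.0, 9.0), (9.0, 11.0), (11.0, 9.0)]
centroids, groups = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
```

Each iteration can only lower (or keep) the within-cluster squared distance, which is the sense in which the error is reduced by iteration.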

The progression of ML has led to the use of this technique in various fields of research (Chen et al. 2016; Li et al. 2016; Zou et al. 2016; Ding et al. 2017; Feng et al. 2017; Yu et al. 2017; Zeng et al. 2017; Pan et al. 2018; Liu et al. 2018; He et al. 2019; Kumar et al. 2021; Zhang et al. 2019). Exemplary applications are drug repurposing (Yu et al. 2016, 2017), discovery of new antibiotics (Steele et al. 2009), identification of novel biocatalysts, personalized medicine (Virgin and Todd 2011; Pires et al. 2020a, b; Villasana et al. 2020), identification of disease-related microRNAs (Chen and Huang 2017; Zhao et al. 2018), identification of disease-related noncoding RNAs (Chen and Yan 2013; Hu et al. 2017, 2018), and bioremediation of agricultural, industrial, and domestic wastes (Mani and Kumar 2014; Pires et al. 2020a, b). Oudah and Henschel defined the four key stages of ML algorithm development (Oudah and Henschel 2018). The first and most critical stage is the extraction of features (Liu et al. 2015), such as OTUs obtained by clustering. Next, the significant features that enhance precision and proficiency are selected. The third stage is training, in which the training dataset is used to fit the algorithm. Finally, a test set is used to evaluate the model.
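The four stages (feature extraction, feature selection, training, and evaluation on a test set) can be laid out as a small end-to-end sketch. Everything here is an assumption chosen for brevity: the OTU count tables are invented, features are relative abundances, selection ranks features by the gap between class means, and the classifier is a simple nearest-centroid rule rather than any method used in the cited studies.

```python
def extract_features(otu_counts):
    """Stage 1: turn raw OTU counts into relative abundances."""
    total = sum(otu_counts)
    return [c / total for c in otu_counts]


def select_features(X, y, keep):
    """Stage 2: keep the `keep` features whose class means differ most."""
    def mean(rows, j):
        return sum(r[j] for r in rows) / len(rows)
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label == 0]
    gaps = [abs(mean(pos, j) - mean(neg, j)) for j in range(len(X[0]))]
    return sorted(range(len(gaps)), key=lambda j: gaps[j], reverse=True)[:keep]


def train(X, y, cols):
    """Stage 3: fit a nearest-centroid classifier on the selected features."""
    model = {}
    for label in set(y):
        rows = [[x[j] for j in cols] for x, l in zip(X, y) if l == label]
        model[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return model


def predict(model, cols, x):
    """Return the label whose centroid is closest to sample `x`."""
    v = [x[j] for j in cols]
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(v, model[lab])))


# Hypothetical OTU count tables (rows = samples, columns = OTUs); 1 = disease, 0 = healthy.
train_counts = [[80, 10, 10], [70, 20, 10], [10, 80, 10], [20, 70, 10]]
train_labels = [1, 1, 0, 0]
test_counts = [[75, 15, 10], [15, 75, 10]]
test_labels = [1, 0]

X_train = [extract_features(c) for c in train_counts]
X_test = [extract_features(c) for c in test_counts]
cols = select_features(X_train, train_labels, keep=2)
model = train(X_train, train_labels, cols)

# Stage 4: evaluate on samples that were never used for selection or training.
predictions = [predict(model, cols, x) for x in X_test]
accuracy = sum(p == t for p, t in zip(predictions, test_labels)) / len(test_labels)
print(accuracy)
```

Keeping the test samples out of both the selection and the training stage is the design choice that makes the final accuracy an honest estimate.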

Machine learning for disease prediction and classification

Various normal microflora residing in the gut play vital roles in human health. Disturbances in intestinal microorganisms may cause inflammatory diseases of the intestine (Chen et al. 2017a, b, c), such as colorectal cancer, tumors, diabetes, ulcerative colitis, and obesity. Consequently, it becomes essential to interpret the relationship between microbes and disease in order to develop better clinical prognostic tests and new drugs (Yu et al. 2015, 2016; Shi et al. 2016; Su et al. 2018; Fan et al. 2019; Arango-Argoty et al. 2018; Steiner et al. 2020).

For the analysis of microbiome–host interactions in the context of disease, Fan et al. (2019) proposed an approach that combines several data sources of the human microbiome–host disease consortium with HeteSim scores. Initially, they constructed heterogeneous networks and then weighted microbe–disease pairs with the standardized HeteSim measure. This was followed by the integration of the HeteSim scores of the microbe–disease–disease path with those of the microbe–microbe–disease path and, finally, calculation of the corresponding scores of probable microbe–disease associations.

Amgarten et al. (2018) proposed a new tool, MARVEL, for the prediction of double-stranded DNA bacteriophage sequences in metagenomic bins. MARVEL uses a random forest (RF) approach with a large training dataset containing 1247 phage genomes and 1029 bacterial genomes, along with a test dataset consisting of 335 bacterial and 177 phage genomes. Six features were proposed for the identification of phages, and RF was then used for feature selection. Finally, the three most informative features were retained (Grazziotin et al. 2017).

Over the last few years, many studies have explored and scrutinized the role of microbiome communities in the prediction of diseases. Researchers have incorporated whole-genome sequencing and whole-transcriptome sequencing data of 33 types of cancer from The Cancer Genome Atlas (TCGA) to examine the potential of microbial signatures as cancer predictors by using gradient boosting ML models (Poore et al. 2020). The ML models successfully discriminated different cancer types and distinguished between cancer and normal tissues, suggesting that the microbiome is specific to each cancer type and cancer stage. The authors concluded that the proposed model could serve as a potential tool in microbiome-based cancer diagnosis. A similar study investigated the role of the vaginal microbial community, based on bacterial signatures, in the prediction of cervical intraepithelial neoplasia (CIN) using a random forest model (Lee et al. 2020). Sequencing data of the V3 region of 16S rRNA from vaginal swabs of 66 subjects were investigated for taxonomic composition. A set of 33 bacterial species was obtained as marker communities differentiating between the CIN1 and CIN2 groups, with an area under the curve (AUC) of 0.952. This finding supports the potential of the RF model in the prediction of CIN staging and of the vaginal microbiome (VM) as a biomarker.
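Because the AUC is the headline metric in such studies, a short sketch of how it can be computed may help. The AUC equals the probability that a randomly chosen positive sample receives a higher classifier score than a randomly chosen negative one; the scores and labels below are invented for illustration and are not data from Lee et al. (2020).

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical classifier scores for three positive and three negative samples.
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
auc = roc_auc(scores, labels)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a reported value of 0.952 indicates that positives are ranked above negatives in roughly 95% of such pairs.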

Cai et al. (2019) focused on investigating the underlying mechanism of pathogenesis in human diseases using genomics with the help of in silico applications. They used a novel ML-based approach and recognized two genes, OTOF and SOCS1, that contribute to the pathogenesis mechanism of rhinovirus (Xu et al. 2019). The expression levels of these two genes could potentially determine the infected or noninfected state of an individual. Alongside depicting the significance of these two genes in rhinovirus pathogenesis, this study also demonstrated the effectiveness of in silico applications in studying pathogenesis mechanisms. In 2019, Wang et al. proposed a spectral rotation method based on the triplet periodicity property to solve planted motif finding problems (Wang et al. 2019). With the proposed method, genes carrying several substitutions can be detected against arbitrarily generated background sequences. The results of an experiment based on the genomic dataset of Saccharomyces cerevisiae showed that genes could be visually distinguished. The authors suggested that genes having approximately 50% mutations could be easily identified in background sequences.

Several studies have explored viral genomics with the help of in silico approaches. Remita et al. developed a machine learning-based virus classification tool called CASTOR and used different datasets of hepatitis B virus (HBV), human papillomaviruses (HPV), and HIV-1 as testing datasets (Remita et al. 2017). The model imitates the restriction fragment length polymorphism (RFLP) technique in silico and simulates fragmentation of genomic material by different restriction endonucleases. The authors noted positive cases of 99% for HPV alpha species, 99% for HBV genotyping, and 98% for HIV-1 M subtyping. They concluded that this model is a great fit for accurate large-scale virus studies owing to its generality and robustness (Lebatteux et al. 2019). Ren et al. proposed VirFinder (a novel k-mer-based tool) for the identification of viral sequences from metagenomic data (Ren et al. 2017). This model identifies viral sequences based on the differences in k-mer signatures of viruses and hosts. The model was trained on sequences of host and viral genomes that were sequenced before January 1, 2014, and evaluated on sequences obtained after January 1, 2014. When compared to the gene-based virus classification tool VirSorter (Roux et al. 2015), the proposed model had better true positive rates (TPRs), and it also worked comparatively better for small viral contigs. The authors concluded that the proposed model is an effective tool to improve viral sequence identification, especially for viral metagenomic data.
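The in silico RFLP idea can be sketched as a simulated digest that turns a sequence into a list of fragment lengths, which could then serve as classifier features. This is only an illustration of the principle and not CASTOR's implementation: the function name and the toy sequence are hypothetical, only the given strand of a linear molecule is scanned, and the single enzyme used is EcoRI, whose recognition site GAATTC is cut after the first G.

```python
def digest(sequence, site, cut_offset):
    """Cut `sequence` at every occurrence of `site`; return the fragment lengths."""
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)          # position of the cut within the sequence
        start = sequence.find(site, start + 1)   # look for the next recognition site
    bounds = [0] + cuts + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


# Hypothetical 20-nt linear sequence containing two EcoRI sites (G^AATTC).
genome = "AAGAATTCTTTTGAATTCCC"
fragments = digest(genome, site="GAATTC", cut_offset=1)
print(fragments)
```

Repeating the digest with several enzymes yields one fragment-length pattern per enzyme, and sequences from different viral groups would be expected to differ in these patterns.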

Through their intricate multilayered learning models, deep neural networks have been shown to be a promising approach for the analysis of feature-rich and high-dimensional omics data with their complex multilevel structure. Various studies have developed deep learning-based computational models for the analysis of complex genomic and metagenomic datasets. Arango-Argoty et al. proposed DeepARG networks to analyze metagenomic datasets and predict antibiotic resistance genes (ARGs) (Arango-Argoty et al. 2018). This network comprises two models, DeepARG-SS for short-read sequences and DeepARG-LS for full-length gene sequences. The models were trained using 30 ARG categories and showed high precision (> 0.97) and recall (> 0.90) when evaluated on different databases (Berglund et al. 2017; Lakin et al. 2017). On the basis of the results, the authors concluded that DeepARG facilitates the identification of a wide range of ARGs.
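For readers less familiar with the two metrics, the following minimal sketch shows how precision and recall are derived from a set of predicted genes and a set of true genes. The gene identifiers are placeholders invented for illustration; they are not DeepARG output.

```python
def precision_recall(predicted, actual):
    """Precision and recall for a set of predicted items against the true set."""
    tp = len(predicted & actual)   # true positives: predicted and real
    fp = len(predicted - actual)   # false positives: predicted but not real
    fn = len(actual - predicted)   # false negatives: real but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical identifiers for predicted and true resistance genes.
predicted = {"gene1", "gene2", "gene3", "gene4"}
actual = {"gene1", "gene2", "gene3", "gene5", "gene6"}
precision, recall = precision_recall(predicted, actual)
print(precision, recall)
```

High precision means that few predicted ARGs are false alarms, whereas high recall means that few true ARGs are missed; a tool can score well on one and poorly on the other, which is why both are reported.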

Quang et al. developed a model named deleterious annotation of genetic variants using neural networks (DANN), based on a deep neural network, to annotate the pathogenicity of coding and noncoding genetic variants while also capturing nonlinear relationships among the features (Quang et al. 2022).

Table 2 Various computational pipelines used for microbial genomics, proteomics, and functional diversity

Application of ML in antimicrobial resistance

The widespread transmission of infectious diseases contributes to human mortality worldwide. The prevention and treatment of such diseases is a major challenge for health workers. To address any such threat, accurate identification and characterization of pathogens is the foremost requirement, and it demands expertise along with high-end equipment and facilities. Machine learning can automate this task with precision and accuracy with the help of image and metagenomics data (Goodswen et al. 2021).

Drug-resistant tuberculosis (TB) poses major health concerns worldwide. Earlier, the identification of drug resistance was based on single nucleotide polymorphisms (SNPs). Currently, research is based on the association between genetic variants and multivariate variants (Zhang et al. 2013; Walker et al. 2015). Yang et al. (2018) studied the multivariate association with different ML models, such as RF, support vector machine (SVM), and logistic regression (LR), for the classification of multidrug resistance against eight anti-TB drugs. SVM was reported to be the best model on data derived from 1839 TB samples. A similar study was conducted by Kouchaki et al. (2019) with 13,402 samples tested against 11 drugs. In this study, LR performed best, indicating that ML algorithms perform differently with different training datasets.

Antimalarial resistance in Plasmodium falciparum is the greatest challenge in Africa. The efficacy of antimalarial therapy was assessed by genotyping malaria parasites once the infection was identified and treated (Plucinski et al. 2015; Talundzic et al. 2016; Halsey et al. 2017), followed by parasite genotyping from the same patient if malaria recurred, by sequencing a well-defined set of microsatellite repeats. This microsatellite comparison makes it possible to determine whether the patient is infected with a new strain or whether the recurrence is due to treatment failure (Plucinski et al. 2015). For this purpose, an unsupervised Bayesian classifier was developed (Slater et al. 2005), as the manual interpretation of these profiles is difficult and prone to bias. Jones et al. (2020) evaluated this approach and reported that the Bayesian approach was highly specific and enabled a more precise assessment of treatment failure rates than manual analysis.
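The reasoning behind such a Bayesian comparison can be illustrated with a deliberately simplified sketch. It is not the published classifier: the model, the function name, and every number (prior, genotyping error rate, allele frequencies) are assumptions for illustration. The sketch assumes one allele per locus, that a recurring parasite of the original strain shows the same allele unless a genotyping error occurs, and that a new infection matches only by chance, with probability equal to the population frequency of the original allele.

```python
def posterior_recrudescence(matches, allele_freqs, prior=0.5, error=0.05):
    """Posterior probability that a recurrent infection is the original strain.

    matches      -- per-locus booleans: does the recurrent allele match day 0?
    allele_freqs -- per-locus population frequency of the day-0 allele (assumed known)
    prior        -- assumed prior probability of treatment failure
    error        -- assumed per-locus genotyping error rate
    """
    like_recr = 1.0  # likelihood of the data if the original strain came back
    like_new = 1.0   # likelihood of the data if this is a new infection
    for match, freq in zip(matches, allele_freqs):
        if match:
            like_recr *= 1.0 - error
            like_new *= freq
        else:
            like_recr *= error
            like_new *= 1.0 - freq
    numerator = prior * like_recr
    return numerator / (numerator + (1.0 - prior) * like_new)


# Hypothetical profiles over three microsatellite loci.
freqs = [0.2, 0.1, 0.3]
all_match = posterior_recrudescence([True, True, True], freqs)
none_match = posterior_recrudescence([False, False, False], freqs)
print(round(all_match, 3), round(none_match, 4))
```

Matching at loci where the allele is rare shifts the posterior strongly toward treatment failure, whereas a match at a common allele carries little weight, which is the kind of judgment that is hard to make consistently by eye.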

Limitations and conclusions

As indicated by the performance metrics of the research listed above, AI algorithms have shown excellent gains in microbial studies. However, large-scale clinical applications outside of limited clinical investigations are needed. This could aid in obtaining government regulatory approval for clinical applications of AI-based models in conventional patient treatment, which is currently absent.

Multiple variables must be addressed before AI may be used in regular healthcare procedures on a wide scale, such as model training, high-quality data/images, data labeling, and model validation methodologies.

In general, AI models necessitate correctly annotated genomic and proteomic data. Otherwise, the study could lead to AI model bias. AI models that are based on single-source and nonblinded microbiological data frequently produce incorrect results. AI models are typically built in a single institution with a specific patient population, which can limit the models’ ability to be applied outside of the institutional clinical setting. The accuracy achieved in AI-assisted microbiological research may not always imply efficacy in clinical practice. Furthermore, ethical concerns are likely to arise from biased AI models and exaggerated accuracy, which could result in unintended misidentification or predictions with false negatives and positives. The majority of AI-driven microbiological technologies are largely research-based and not in widespread use. While several research groups strive to make AI technologies easier to integrate and implement with traditional software systems, this requires additional formal training for microbiologists and technical employees. Moreover, microbial institutions must develop uniform standards for the use of AI in relevant settings. All of these drawbacks must be addressed before regulatory organizations provide final permission for the use of AI-based technology in microbiology research.

Predictive models for metagenomics studies, disease prediction and classification, and microbial proteomics studies could be extremely beneficial, not only in the case of early disease detection and improved patient survival rates but also in terms of gaining a better understanding of pathogenic and beneficial microorganisms. During the previous decade, AI algorithms’ prediction performance improved considerably. Similarly, modern microbiological study predictive models are improving. However, to take advantage of advances in AI algorithms for data mining and building valuable patterns for better decision support, we must appropriately utilize data collected from microbial research. These prediction models are not intended to replace traditional microbiological research but rather to provide an additional layer of protection for disease detection and treatment. Additionally, these AI-based systems are capable of extracting key information with predictive significance. In regard to a tangible benefit, only models with knowledge-driven approaches provide a genuine difference when compared to traditional techniques. Fair restrictions from relevant authorities, as well as the adoption of AI approaches in microbial metagenomics, proteomics, and disease predictions, are necessary conditions for incorporating AI technology into the current healthcare environment.

Data availability

NA

References

Ali M, Aurongzeb M, Rashid Y (2021) In-silico three dimensional structure prediction of important Neisseria meningitidis proteins. Pak J Pharm Sci 34(2):553–560
Amgarten D, Braga LPP, da Silva AM, Setubal JC (2018) MARVEL, a tool for prediction of bacteriophage sequences in metagenomic bins. Front Genet 9:304. https://doi.org/10.3389/fgene.2018.00304
Andrighetti T, Bohar B, Lemke N, Sudhakar P, Korcsmaros T (2020) MicrobioLink: an integrated computational pipeline to infer functional effects of microbiome–host interactions. Cells 9:1278. https://doi.org/10.3390/cells9051278
Arango-Argoty G, Garner E, Pruden A, Heath LS, Vikesland P, Zhang L (2018) DeepARG: a deep learning approach for predicting antibiotic resistance genes from metagenomic data. Microbiome 6:23. https://doi.org/10.1186/s40168-018-0401-z
Argimón S, Abudahab K, Goater RJE, Fedosejev A, Bhai J, Glasner C, Feil EJ, Holden MTG, Yeats CA, Grundmann H, Spratt BG, Aanensen DM (2016) Microreact: visualizing and sharing data for genomic epidemiology and phylogeography. Microb Genom 2(11):e000093. https://doi.org/10.1099/mgen.0.000093
Arisdakessian C, Cleveland SB, Belcaid M (2020) MetaFlow|mics: scalable and reproducible nextflow pipelines for the analysis of microbiome marker data. In: Practice and Experience in Advanced Research Computing (PEARC ’20), pp 26–30. https://doi.org/10.1145/3311790.3396664
Aziz RK, Bartels D, Best AA et al (2008) The RAST Server: Rapid Annotations using Subsystems Technology. BMC Genomics 9:75. https://doi.org/10.1186/1471-2164-9-75
Baeriswyl S, Gan BH, Siriwardena TN, Visini R, Robadey M, Javor S, Stocker A, Darbre T, Reymond JL (2019) X-ray crystal structures of short antimicrobial peptides as Pseudomonas aeruginosa lectin B complexes. ACS Chem Biol 14:758–766. https://doi.org/10.1021/acschembio.9b00047
Baig MH, Ahmad K, Roy S, Ashraf JM, Adil M, Siddiqui MH, Khan S, Kamal MA, Provazník I, Choi I (2016) Computer aided drug design: success and limitations. Curr Pharm Des 22(5):572–581. https://doi.org/10.2174/1381612822666151125000550
Balaji A, Kille B, Kappell AD et al (2022) SeqScreen: accurate and sensitive functional screening of pathogenic sequences via ensemble learning. Genome Biol 23:133. https://doi.org/10.1186/s13059-022-02695-x
Berglund F, Marathe NP, Österlund T, Bengtsson-Palme J, Kotsakis S, Flach CF, Larsson DGJ, Kristiansson E (2017) Identification of 76 novel B1 metallo-β-lactamases through large-scale screening of genomic and metagenomic data. Microbiome 5:134. https://doi.org/10.1186/s40168-017-0353-8
Bryant P, Pozzati G, Elofsson A (2022) Improved prediction of protein-protein interactions using AlphaFold2. Nat Commun 13(1):1265. https://doi.org/10.1038/s41467-022-28865-w
Brynildsrud O, Bohlin J, Scheffer L, Eldholm V (2016) Rapid scoring of genes in microbial pan-genome-wide association studies with Scoary. Genome Biol 17(1):238. https://doi.org/10.1186/s13059-016-1108-8
Buermans HPJ, den Dunnen JT (2014) Next generation sequencing technology: advances and applications. Biochimica et Biophysica Acta (BBA) – Mol Basis of Dis 1842:1932–1941. https://doi.org/10.1016/j.bbadis.2014.06.015
Cai L, Wu Y, Gao J (2019) DeepSV: accurate calling of genomic deletions from high-throughput sequencing data using deep convolutional neural network. BMC Bioinformatics 20(1):665. https://doi.org/10.1186/s12859-019-3299-y
Capecchi A, Cai X, Personne H, Köhler T, van Delden C, Reymond JL (2021) Machine learning designs nonhemolytic antimicrobial peptides. Chem Sci 12:9221–9232. https://doi.org/10.1039/d1sc01713f
Chavali AK, Rhee SY (2018) Bioinformatics tools for the identification of gene clusters that biosynthesize specialized metabolites. Briefings in Bioinfo 19:1022–1034. https://doi.org/10.1093/bib/bbx020
Chen X, Huang L (2017) LRSSLMDA: laplacian regularized sparse subspace learning for MiRNA-disease association prediction. PLoS Comput Biol 13:e1005912. https://doi.org/10.1371/journal.pcbi.1005912
Chen X, Yan GY (2013) Novel human lncRNA-disease association inference based on lncRNA expression profiles. Bioinfo 29:2617–2624. https://doi.org/10.1093/bioinformatics/btt426
Chen XX, Tang H, Li WC, Wu H, Chen W, Ding H et al (2016) Identification of bacterial cell wall lyases via pseudo amino acid composition. Biomed Res Int. https://doi.org/10.1155/2016/1654623
Chen J, Guo MY, Li SM, Liu B (2017a) ProtDec-LTR2.0: an improved method for protein remote homology detection by combining pseudo protein and supervised Learning to Rank. Bioinfo 33:3473–3476. https://doi.org/10.1093/bioinformatics/btx429
Chen X, Huang Y-A, You Z-H, Yan G-Y, Wang X-S (2017) A novel approach based on KATZ measure to predict associations of human microbiota with non-infectious diseases. Bioinfo 33:733–739. https://doi.org/10.1093/bioinformatics/btw715
Chen X, Huang YA, You ZH, Yan GY, Wang XS (2017) A novel approach based on KATZ measure to predict associations of human microbiota with diseases. Bioinfo 33:733–739. https://doi.org/10.1093/bioinformatics/btw715
Chen T, Liu Y-X, Huang L (2022) ImageGP: an easy-to-use data visualization web server for scientific researchers. iMeta 1:e5. https://doi.org/10.1002/imt2.5
Croucher NJ, Page AJ, Connor TR, Delaney AJ, Keane JA, Bentley SD, Parkhill J, Harris SR (2015) Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins. Nucleic Acids Res. 43(3):e15. https://doi.org/10.1093/nar/gku1196
Davis-Turak J, Courtney SM, Hazard ES, Glen WB Jr, da Silveira WA et al (2017) Genomics pipelines and data integration: challenges and opportunities in the research setting. Expert Rev Mol Diagn 17:225–237. https://doi.org/10.1080/14737159.2017.1282822
DeLong EF (2002) Microbial population genomics and ecology. Curr Opin Microbiol 5:520–524. https://doi.org/10.1016/s1369-5274(02)00353-3
Didelot X, Wilson DJ (2015) ClonalFrameML: efficient inference of recombination in whole bacterial genomes. PLoS Comput Biol 11(2):e1004041. https://doi.org/10.1371/journal.pcbi.1004041
Dimopoulos EA, Carmagnini A, Velsko IM, Warinner C, Larson G, Frantz LAF et al (2022) HAYSTAC: a Bayesian framework for robust and rapid species identification in high-throughput sequencing data. PLoS Comput Biol 18:e1010493. https://doi.org/10.1371/journal.pcbi.1010493
Ding YJ, Tang JJ, Guo F (2017) Identification of drug-target interactions via multiple information integration. Inf Sci 418:546–560. https://doi.org/10.1016/j.ins.2017.08.045
Djemiel C, Dequiedt S, Karimi B et al (2020) BIOCOM-PIPE: a new user-friendly metabarcoding pipeline for the characterization of microbial diversity from 16S, 18S and 23S rRNA gene amplicons. BMC Bioinformatics 21:492. https://doi.org/10.1186/s12859-020-03829-3
Edgar RC (2010) Search and clustering orders of magnitude faster than BLAST. Bioinfo 26:2460–2461. https://doi.org/10.1093/bioinformatics/btq461
Edwards RA, Rohwer F (2005) Viral metagenomics. Nat Rev Microbiol 3:504–510. https://doi.org/10.1038/nrmicro1163
Emamjomeh A, Goliaei B, Zahiri J, Ebrahimpour R (2014) Predicting protein–protein interactions between human and hepatitis C virus via an ensemble learning method. Mol Biosyst 10:3147–3154. https://doi.org/10.1039/c4mb00410h
Fan CY, Lei XJ, Guo L, Zhang AD (2019) Predicting the associations between microbes and diseases by integrating multiple data sources and path-based HeteSim scores. Neurocomput 323:76–85. https://doi.org/10.1016/j.neucom.2018.09.054
Fatoki TH, Ibraheem O, Ogunyemi IO, Akinmoladun AC, Ugboko HU, Adeseko CJ, Awofisayo OA, Olusegun SJ, Enibukun JM (2021) Network analysis, sequence and structure dynamics of key proteins of coronavirus and human host, and molecular docking of selected phytochemicals of nine medicinal plants. J Biomol Struct Dyn 39(16):6195–6217. https://doi.org/10.1080/07391102.2020.1794971
Feng PM, Zhang JD, Tang H, Chen W, Lin H (2017) Predicting the organelle location of noncoding RNAs using pseudo nucleotide compositions. Interdiscip Sci Comput Life Sci 9:540–544. https://doi.org/10.1007/s12539-016-0193-4
Francisco AP, Vaz C, Monteiro PT et al (2012) PHYLOViZ: phylogenetic inference and data visualization for sequence based typing methods. BMC Bioinformatics 13:87. https://doi.org/10.1186/1471-2105-13-87
Ghannam RB, Techtmann SM (2021) Machine learning applications in microbial ecology, human microbiome studies, and environmental monitoring. Compu and Struc Biotech J 19:1092–1107. https://doi.org/10.1016/j.csbj.2021.01.028
Goodswen SJ, Kennedy PJ, Ellis JT (2013) A novel strategy for classifying the output from an in silico vaccine discovery pipeline for eukaryotic pathogens using machine learning algorithms. BMC Bioinfo 14:315. https://doi.org/10.1186/1471-2105-14-315
Goodswen SJ, Barratt JLN, Kennedy PJ, Kaufer A, Calarco L, Ellis JT (2021) Machine learning and applications in microbiology. FEMS Micro Rev 45:fuab015. https://doi.org/10.1093/femsre/fuab015
Grazziotin AL, Koonin EV, Kristensen DM (2017) Prokaryotic Virus Orthologous Groups (pVOGs): a resource for comparative genomics and protein family annotation. Nucleic Acids Res 45:491–498. https://doi.org/10.1093/nar/gkw975
Hadfield J, Croucher NJ, Goater RJ, Abudahab K, Aanensen DM, Harris SR (2018) Phandango: an interactive viewer for bacterial population genomics. Bioinformatics 34(2):292–293. https://doi.org/10.1093/bioinformatics/btx610
Halsey ES, Venkatesan M, Plucinski MM et al (2017) Capacity development through the US President’s malaria initiative-supported antimalarial resistance monitoring in Africa Network. Emerg Infect Dis 23. https://doi.org/10.3201/eid2313.170366
Handelsman J (2004) Metagenomics: application of genomics to uncultured microorganisms. Microbiol Mol Biol Rev 68:669–685. https://doi.org/10.1128/MMBR.68.4.669-685.2004
Hazen TC, Rocha AM, Techtmann SM (2013) Advances in monitoring environmental microbes. Curr Opin Biotech 24:526–533. https://doi.org/10.1016/j.copbio.2012.10.020
He WY, Jia CZ, Zou Q (2019) 4mCPred: machine learning methods for DNA N-4-methylcytosine sites prediction. Bioinfo 35:593–601. https://doi.org/10.1093/bioinformatics/bty668
Hu H, Zhu CY, Ai HX, Zhang L, Zhao J, Zhao Q et al (2017) LPI-ETSLP: lncRNA-protein interaction prediction using eigenvalue transformation-based semi-supervised link prediction. Mol Biosyst 13:1781–1787. https://doi.org/10.1039/c7mb00290d
Hu H, Zhang L, Ai HX, Zhang H, Fan YT, Zhao Q et al (2018) HLPI-Ensemble: prediction of human lncRNA-protein interactions based on ensemble strategy. RNA Biol 15:797–806. https://doi.org/10.1080/15476286.2018.1457935
Hugenholtz P (2002) Exploring prokaryotic diversity in the genomic era. Genome Biol 3. https://doi.org/10.1186/gb-2002-3-2-reviews0003
Hyatt D, Chen GL, LoCascio PF et al (2010) Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11:119. https://doi.org/10.1186/1471-2105-11-119
Jones S, Plucinski M, Kay K et al (2020) A computer modelling approach to evaluate the accuracy of microsatellite markers for classification of recurrent infections during routine monitoring of antimalarial drug efficacy. Antimicrob Agents Chemother 64. https://doi.org/10.1128/AAC.01517-19
Joseph RM, Devineni AV, King IF, Heberlein U (2009) Oviposition preference for and positional avoidance of acetic acid provide a model for competing behavioral drives in Drosophila. Proc Natl Acad Sci U S A 106(27):11352–11357. https://doi.org/10.1073/pnas.0901419106
Kalantar KL, Carvalho T, de Bourcy CFA, Dimitrov B, Dingle G, Egger R et al (2020) IDseq—an open source cloud-based pipeline and analysis service for metagenomic pathogen detection and monitoring. GigaScience 9:giaa111. https://doi.org/10.1093/gigascience/giaa111
Kim B, Alguwaizani S, Zhou X, Huang DS, Park B, Han K (2017) An improved method for predicting interactions between virus and human proteins. J Bioinform Comput Biol 15:1650024. https://doi.org/10.1142/S0219720016500244
Kouchaki S, Yang Y, Walker TM et al (2019) Application of machine learning techniques to tuberculosis drug resistance analysis. Bioinfo 35:2276–2282. https://doi.org/10.1093/bioinformatics/bty949
Kozlov AM, Darriba D, Flouri T, Morel B, Stamatakis A (2019) RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics 35(21):4453–4455. https://doi.org/10.1093/bioinformatics/btz305
Kumar R, Sharma A, Varadwaj P, Ahmad A, Ashraf GM (2011) Classification of oral bioavailability of drugs by machine learning approaches: a comparative study. J Comp Int Sci 2:1–18. https://doi.org/10.6062/JCIS.2011.02.03.0045
Kumar R, Sharma A, Siddiqui MH, Tiwari RK (2018) Promises of machine learning approaches in prediction of absorption of compounds. Mini Rev Med Chem 18(3):196–207. https://doi.org/10.2174/1389557517666170315150116
Kumar R, Sharma A, Srivastava JK, Siddiqui MH, Uddin MS, Aleya L (2021) Hydroxychloroquine in COVID-19: therapeutic promises, current status, and environmental implications. Environ Sci Pollut Res Int 28(30):40431–40444. https://doi.org/10.1007/s11356-020-12200-1
Kushwaha UKS, Deo I, Jaiswal JP, Prasad B (2017) Role of bioinformatics in crop improvement. GJSFR 17(1):13–23
Lakin SM, Dean C, Noyes NR, Dettenwanger A, Ross AS, Doster E, Rovira P, Abdo Z (2017) MEGARes: an antimicrobial resistance database for high throughput sequencing. Nucleic Acids Res 45:574–580. https://doi.org/10.1093/nar/gkw1009
Lebatteux D, Remita AM, Diallo AB (2019) Toward an alignment-free method for feature extraction and accurate classification of viral sequences. J Comput Biol 26:519–535. https://doi.org/10.1089/cmb.2018.0239
Lee YH, Kang GU, Jeon SY, Tagele SB, Pham HQ, Kim MS, Ahmad S, Jung DR, Park YJ, Han HS et al (2020) Vaginal microbiome-based bacterial signatures for predicting the severity of cervical intraepithelial neoplasia. Diagnostics (Basel) 10:1013. https://doi.org/10.3390/diagnostics10121013
Letunic I, Bork P (2021) Interactive Tree Of Life (iTOL) v5: an online tool for phylogenetic tree display and annotation. Nucleic Acids Res 49(W1):W293–W296. https://doi.org/10.1093/nar/gkab301
Li Z, Tang JJ, Guo F (2016) Learning from real imbalanced data of 14-3-3 proteins binding specificity. Neurocomput 217:83–91. https://doi.org/10.1016/j.neucom.2016.03.093
Lima DB, Dupré M, Santos MDM, Carvalho PC, Chamot-Rooke J (2021) DiagnoTop: a computational pipeline for discriminating bacterial pathogens without database search. J Am Soc Mass Spectrom 32:1295–1299. https://doi.org/10.1021/jasms.1c00014
Liu B, Liu F, Wang X, Chen J, Fang L, Chou KC (2015) Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences. Nucleic Acids Res 43:65–71. https://doi.org/10.1093/nar/gkv458
Liu B, Jiang S, Zou Q (2018) HITS-PR-HHblits: protein remote homology detection by combining PageRank and Hyperlink-Induced Topic Search. Brief Bioinform 2018:bby104. https://doi.org/10.1093/bib/bby104
Liu YX, Qin Y, Chen T et al (2021) A practical guide to amplicon and metagenomic analysis of microbiome data. Protein Cell 12:315–330. https://doi.org/10.1007/s13238-020-00724-8
Mani D, Kumar C (2014) Biotechnological advances in bioremediation of heavy metals contaminated ecosystems: an overview with special reference to phytoremediation. Int J Environ Sci Technol 11:843–872
Marinier E, Zaheer R, Berry C, Weedmark KA, Domaratzki M, Mabon P, Knox NC, Reimer AR, Graham MR, Chui L, Patterson-Fortin L, Zhang J, Pagotto F, Farber J et al (2017) Neptune: a bioinformatics tool for rapid discovery of genomic variation in bacterial populations. Nucleic Acids Res 45(18):e159. https://doi.org/10.1093/nar/gkx702
Maritan M, Romeo M, Oberti L, Sormanni P, Tasaki M, Russo R, Ambrosetti A, Motta P, Rognoni P, Mazzini G, Barbiroli A et al (2020) Inherent biophysical properties modulate the toxicity of soluble amyloidogenic light chains. J Mol Biol 432:845–860. https://doi.org/10.1016/j.jmb.2019.12.015
Naccache SN, Federman S, Veeraraghavan N, Zaharia M, Lee D, Samayoa E et al (2014) A cloud-compatible bioinformatics pipeline for ultrarapid pathogen identification from next-generation sequencing of clinical samples. Genome Res 24:1180–1192. https://doi.org/10.1101/gr.171934.113
Nascimento M, Sousa A, Ramirez M, Francisco AP, Carriço JA, Vaz C (2017) PHYLOViZ 2.0: providing scalable data integration and visualization for multiple phylogenetic inference methods. Bioinformatics 33:128–129. https://doi.org/10.1093/bioinformatics/btw582
Nguyen LT, Haney EF, Vogel HJ (2011) The expanding scope of antimicrobial peptide structures and their modes of action. Trends Biotechnol 29:464–472. https://doi.org/10.1016/j.tibtech.2011.05.001
O’Donoghue SI (2021) Grand challenges in bioinformatics data visualization. Front Bioinformatics 1:13. https://doi.org/10.3389/fbinf.2021.669186
O’Donoghue SI, Baldi BF, Clark SJ, Darling AE, Hogan JM, Kaur S, Maier-Hein L et al (2018) Visualization of biomedical data. Annu Rev Biomed Data Sci 1:275–304
Olawoye IB, Frost SDW, Happi CT (2020) The Bacteria Genome Pipeline (BAGEP): an automated, scalable workflow for bacteria genomes with Snakemake. PeerJ 8:e10121. https://doi.org/10.7717/peerj.10121
Olsen GJ, Lane DJ, Giovannoni SJ, Pace NR, Stahl DA (1986) Microbial ecology and evolution: a ribosomal RNA approach. Annu Rev Microbiol 40:337–365. https://doi.org/10.1146/annurev.mi.40.100186.002005
Omer A, Singh P, Yadav NK, Singh RK (2014) An overview of data mining algorithms in drug induced toxicity prediction. Mini Rev Med Chem 14:345–354. https://doi.org/10.2174/1389557514666140219110244
Osbourn A (2010) Secondary metabolic gene clusters: evolutionary toolkits for chemical innovation. Trends Genet 26:449–457. https://doi.org/10.1016/j.tig.2010.07.001
Oudah M, Henschel A (2018) Taxonomy-aware feature engineering for microbiome classification. BMC Bioinformatics 19:227. https://doi.org/10.1186/s12859-018-2205-3
Page AJ, Cummins CA, Hunt M, Wong VK, Reuter S, Holden MT, Fookes M, Falush D, Keane JA, Parkhill J (2015) Roary: rapid large-scale prokaryote pan genome analysis. Bioinformatics 31(22):3691–3693. https://doi.org/10.1093/bioinformatics/btv421
Pan GF, Jiang LM, Tang JJ, Guo F (2018) A novel computational method for detecting DNA methylation sites with DNA sequence information and physicochemical properties. Int J Mol Sci 19:E511. https://doi.org/10.3390/ijms19020511
Pandolfo M, Telatin A, Lazzari G, Adriaenssens EM, Vitulo D (2022) MetaPhage: an automated pipeline for analyzing, annotating, and classifying bacteriophages in metagenomics sequencing data. mSystems 7. https://doi.org/10.1128/msystems.00741-22
Parks DH, Mankowski T, Zangooei S, Porter MS, Armanini DG, Baird DJ et al (2013) GenGIS 2: geospatial analysis of traditional and genetic biodiversity, with new gradient algorithms and an extensible plugin framework. PLoS ONE 8(7):e69885. https://doi.org/10.1371/journal.pone.0069885
Peker N, Garcia-Croes S, Dijkhuizen B, Wiersma HH, van Zanten E, Wisselink G, Friedrich AW, Kooistra-Smid M, Sinha B, Rossen JWA, Couto N (2019) A comparison of three different bioinformatics analyses of the 16S–23S rRNA encoding region for bacterial identification. Front Microbiol 10:620. https://doi.org/10.3389/fmicb.2019.00620
Pelosi B (2022) Developing a bioinformatics pipeline for comparative protein classification analysis. BMC Genom Data 23:43. https://doi.org/10.1186/s12863-022-01045-x
Pires I, Souza G, Junior J (2020a) An analysis of the relation between garbage pickers and women’s health risk. Acta Sci Agric 4:12–16
Pires IM, Marques G, Garcia NM, Flórez-Revuelta F, Ponciano V, Oniani S (2020b) A research on the classification and applicability of the mobile health applications. J Pers Med 10:11
Plucinski MM, Morton L, Bushman M et al (2015) Robust algorithm for systematic classification of malaria late treatment failures as recrudescence or reinfection using microsatellite genotyping. Antimicrob Agents Chemother 59:6096–6100. https://doi.org/10.1128/AAC.00072-15
Poore GD, Kopylova E, Zhu Q, Carpenter C, Fraraccio S, Wandro S, Kosciolek T, Janssen S, Metcalf J, Song SJ, Kanbar J, Miller-Montgomery et al (2020) Microbiome analyses of blood and tissues suggest cancer diagnostic approach. Nature 579:567–574. https://doi.org/10.1038/s41586-020-2095-1
Posada-Céspedes S, Seifert D, Topolsky I, Jablonski KP, Metzner KJ, Beerenwinkel N (2021) V-pipe: a computational pipeline for assessing viral genetic diversity from high-throughput data. Bioinformatics 37:1673–1680. https://doi.org/10.1093/bioinformatics/btab015
Price MN, Dehal PS, Arkin AP (2010) FastTree 2 – approximately maximum-likelihood trees for large alignments. PLOS ONE 5(3):e9490. https://doi.org/10.1371/journal.pone.0009490
Qiu Y, Li X, He X, Pu J, Zhang J, Lu S (2020) Computational methods-guided design of modulators targeting protein-protein interactions (PPIs). Eur J Med Chem 207:112764. https://doi.org/10.1016/j.ejmech.2020.112764
Quang D, Chen Y, Xie X (2015) DANN: a deep learning approach for annotating the pathogenicity of genetic variants. Bioinfo 31:761–763. https://doi.org/10.1093/bioinformatics/btu703
Rao VS, Srinivas K, Sujini GN, Kumar GN (2014) Protein-protein interaction detection: methods and analysis. Int J Proteomics 2014:147648. https://doi.org/10.1155/2014/147648
Rappe M, Giovannoni S (2003) The uncultured microbial majority. Annu Rev Microbiol 57:369–394. https://doi.org/10.1146/annurev.micro.57.030502.090759
Remita MA, Halioui A, Malick Diouara AA, Daigle B, Kiani G, Diallo AB (2017) A machine learning approach for viral genome classification. BMC Bioinformatics 18:208. https://doi.org/10.1186/s12859-017-1602-3
Ren J, Ahlgren NA, Lu YY, Fuhrman JA, Sun F (2017) VirFinder: a novel k-mer based tool for identifying viral sequences from assembled metagenomic data. Microbiome 5:69. https://doi.org/10.1186/s40168-017-0283-5
Riesenfeld CS, Goodman RM, Handelsman J (2004a) Uncultured soil bacteria are a reservoir of new antibiotic resistance genes. Environ Microbiol 6:981–989. https://doi.org/10.1111/j.1462-2920.2004.00664.x
Riesenfeld CS, Schloss P, Handelsman J (2004b) Metagenomics: genomic analysis of microbial communities. Annu Rev Genet 38:525–552. https://doi.org/10.1146/annurev.genet.38.072902.091216
Petit RA III, Read TD (2020) Bactopia: a flexible pipeline for complete analysis of bacterial genomes. mSystems 5(4). https://doi.org/10.1128/mSystems.00190-20
Rodriguez-Valera F (2004) Environmental genomics, the big picture? FEMS Microbiol Lett 231:153–158. https://doi.org/10.1016/S0378-1097(04)00006-0
Ronaghi M, Karamohamed S, Pettersson B, Uhlen M, Nyren P (1996) Real-time DNA sequencing using detection of pyrophosphate release. Anal Biochem 242:84–89. https://doi.org/10.1006/abio.1996.0432
Roux S, Enault F, Hurwitz BL, Sullivan MB (2015) VirSorter: mining viral signal from microbial genomic data. PeerJ 3:e985. https://doi.org/10.7717/peerj.985
Sathya R, Abraham A (2013) Comparison of supervised and unsupervised learning algorithms for pattern classification. Int J Adv Res Artif Intell 2:34–38. https://doi.org/10.14569/IJARAI.2013.020206
Saxena D, Sharma A, Siddiqui MH, Kumar R (2019) Blood brain barrier permeability prediction using machine learning techniques: an update. Curr Pharm Biotechnol 20(14):1163–1171. https://doi.org/10.2174/1389201020666190821145346
Schwengers O, Hoek A, Fritzenwanker M, Falgenhauer L, Hain T et al (2020) ASA³P: An automatic and scalable pipeline for the assembly, annotation and higher-level analysis of closely related bacterial isolates. PLOS Comput Biol 16:e1007134. https://doi.org/10.1371/journal.pcbi.1007134
Seemann T (2014) Prokka: rapid prokaryotic genome annotation. Bioinformatics 30(14):2068–2069. https://doi.org/10.1093/bioinformatics/btu153
Shean RC, Makhsous N, Stoddard GD et al (2019) VAPiD: a lightweight cross-platform viral annotation pipeline and identification tool to facilitate virus genome submissions to NCBI GenBank. BMC Bioinformatics 20:48. https://doi.org/10.1186/s12859-019-2606-y
Shi JY, Li JX, Lu HM (2016) Predicting existing targets for new drugs base on strategies for missing interactions. BMC Bioinfo 17:282. https://doi.org/10.1186/s12859-016-1118-2
Singh R, Singh PK, Kumar R, Kabir MT, Kamal MA, Rauf A, Albadrani GM, Sayed AA, Mousa SA, Abdel-Daim MM, Uddin MS (2021) Multi-omics approach in the identification of potential therapeutic biomolecule for COVID-19. Front Pharm 12:652335. https://doi.org/10.3389/fphar.2021.652335
Singh R, Porwal P (2021) Innovative technologies for enzyme production from extremophilic microbes. In: Kuddus M (ed) Microbial extremozymes: novel sources and industrial applications. Elsevier Academic Press, pp 30–37. ISBN: 978-0-12-822945-3. https://doi.org/10.1016/B978-0-12-822945-3.00009-9
Singh R, Chuhan N, Kuddus Md (2021a) Exploring the therapeutic potential of marine-derived bioactive compounds against COVID-19. Env Sci Pol Res 1–12. https://doi.org/10.1007/s11356-021-16104-6
Slater M, Kiggundu M, Dokomajilar C et al (2005) Distinguishing recrudescences from new infections in antimalarial clinical trials: major impact of interpretation of genotyping results on estimates of drug efficacy. Am J Trop Med Hyg 73:256–262. https://doi.org/10.4269/ajtmh.2005.73.256
Steele HL, Jaeger KE, Daniel R, Streit WR (2009) Advances in recovery of novel biocatalysts from metagenomes. J Mol Microbiol Biotechnol 16:25–37
Steiner MC, Gibson KM, Crandall KA (2020) Drug resistance prediction using deep learning techniques on HIV-1 sequence data. Viruses 12(5):560. https://doi.org/10.3390/v12050560
Stokes JM, Yang K, Swanson K, Jin W, Cubillos-Ruiz A, Donghia NM, MacNair CR, French S, Carfrae LA, Bloom-Ackermann Z, Tran VM et al (2020) A deep learning approach to antibiotic discovery. Cell 181:475–483. https://doi.org/10.1016/j.cell.2020.01.021
Stöter FR, Chakrabarty S, Edler B, Habets EAP (2019) CountNet: estimating the number of concurrent speakers using supervised learning. IEEE/ACM Trans Audio Speech Lang Process 27:268–282. https://doi.org/10.1109/taslp.2018.2877892
Streit WR, Schmitz RA (2004) Metagenomics—the key to the uncultured microbes. Curr Opin Microbiol 7:492–498. https://doi.org/10.1016/j.mib.2004.08.002
Su R, Wu H, Xu B, Liu X, Wei L (2018) Developing a multi-dose computational model for drug-induced hepatotoxicity prediction based on toxicogenomics data. IEEE/ACM Trans Comput Biol Bioinform. https://doi.org/10.1109/tcbb.2018.2858756
Su X, Xu J, Yin Y, Quan X, Zhang H (2019) Antimicrobial peptide identification using multiscale convolutional network. BMC Bioinfo 20:730. https://doi.org/10.1186/s12859-019-3327-y
Talundzic E, Plucinski MM, Biliya S et al (2016) Advanced molecular detection of malarone resistance. Antimicrob Agents Chemother 60:3821–3823. https://doi.org/10.1128/AAC.00171-16
Tatusova T, DiCuccio M, Badretdin A, Chetvernin V, Nawrocki EP, Zaslavsky L, Lomsadze A, Pruitt KD, Borodovsky M, Ostell J (2016) NCBI prokaryotic genome annotation pipeline. Nucleic Acids Res 44(14):6614–6624. https://doi.org/10.1093/nar/gkw569
Treangen TJ, Ondov BD, Koren S, Phillippy AM (2014) The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes. Genome Biol 15(11):524. https://doi.org/10.1186/s13059-014-0524-x
Vallenet D, Engelen S, Mornico D, Cruveiller S, Fleury L, Lajus A, Rouy Z, Roche D, Salvignol G, Scarpelli C, Médigue C (2009) MicroScope: a platform for microbial genome annotation and comparative genomics. Database (Oxford) 2009:bap021. https://doi.org/10.1093/database/bap021
Vamathevan J, Clark D, Czodrowski P, Dunham I, Ferran E, Lee G, Li B, Madabhushi A, Shah P, Spitzer M, Zhao S (2019) Applications of machine learning in drug discovery and development. Nat Rev Drug Discov 18(6):463–477. https://doi.org/10.1038/s41573-019-0024-5
van der Walt A, van Goethem M, Ramond JB et al (2017) Assembling metagenomes, one community at a time. BMC Geno 18:521. https://doi.org/10.1186/s12864-017-3918-9
Veltri D, Kamath U, Shehu A (2018) Deep learning improves antimicrobial peptide recognition. Bioinfo 34:2740–2747. https://doi.org/10.1093/bioinformatics/bty179
Villasana MV, Pires IM, Sá J, Garcia NM, Zdravevski E, Chorbev I, Lameski P, Flórez-Revuelta F (2020) Promotion of healthy nutrition and physical activity lifestyles for teenagers: a systematic literature review of the current methodologies. J Pers Med 10:12
Virgin HW, Todd JA (2011) Metagenomics and personalized medicine. Cell 147:44–56
Walker TM, Kohl TA, Omar SV (2015) Whole-genome sequencing for prediction of Mycobacterium tuberculosis drug susceptibility and resistance: a retrospective cohort study. Lancet Infect Dis 15:1193–1202. https://doi.org/10.1016/S1473-3099(15)00062-6
Wang X, Wang S, Song T (2019) A spectral rotation method with triplet periodicity property for planted motif finding problems. Comb Chem High Throughput Screen 22:683–693. https://doi.org/10.2174/1386207322666191129112433
Wimley WC, Hristova K (2011) Antimicrobial peptides: successes, challenges and unanswered questions. J Membr Biol 239:27–34. https://doi.org/10.1007/s00232-011-9343-0
Wylie TN, Wylie KM (2021) ViroMatch: a computational pipeline for the detection of viral sequences from complex metagenomic data. Microbiol Resour Announc 10:e01468-20. https://doi.org/10.1128/MRA.01468-20
Xu Y, Zhang YH, Li J, Pan XY, Huang T, Cai YD (2019) New computational tool based on machine-learning algorithms for the identification of rhinovirus infection-related genes. Comb Chem High Throughput Screen 22:665–674. https://doi.org/10.2174/1386207322666191129114741
Yang Y, Niehaus KE, Walker TM et al (2018) Machine learning for classifying tuberculosis drug-resistance from DNA sequencing data. Bioinfo 34:1666–1671. https://doi.org/10.1093/bioinformatics/btx801
Yu L, Huang JB, Ma ZX, Zhang J, Zou YP, Gao L (2015) Inferring drug-disease associations based on known protein complexes. BMC Med Genomics 8:S2. https://doi.org/10.1186/1755-8794-8-s2-s2
Yu L, Wang BB, Ma XK, Gao L (2016) The extraction of drug-disease correlations based on module distance in incomplete human interactome. BMC Syst Biol 10:111. https://doi.org/10.1186/s12918-016-0364-2
Yu L, Zhao J, Gao L (2017) Drug repositioning based on triangularly balanced structure for tissue-specific diseases in incomplete interactome. Artif Intell Med 77:53–63. https://doi.org/10.1016/j.artmed.2017.03.009
Zeng XX, Ding NX, Rodriguez-Paton A, Zou Q (2017) Probability-based collaborative filtering model for predicting gene-disease associations. BMC Med Genomics 10:76. https://doi.org/10.1186/s12920-017-0313-y
Zhang HT, Li DF, Zhao LL et al (2013) Genome sequencing of 161 Mycobacterium tuberculosis isolates from China identifies genes and intergenic regions associated with drug resistance. Nat Genet 45:1255–1260. https://doi.org/10.1038/ng.2735
Zhang X, Zou Q, Rodriguez-Paton A, Zeng XX (2019) Meta-path methods for prioritizing candidate disease miRNAs. IEEE/ACM Trans Comput Biol Bioinform 16:283–291. https://doi.org/10.1109/tcbb.2017.2776280
Zhao G, Wu G, Lim ES, Droit L, Krishnamurthy S, Barouch DH, Virgin HW, Wang D (2017) VirusSeeker, a computational pipeline for virus discovery and virome composition analysis. Virology 503:21–30. https://doi.org/10.1016/j.virol.2017.01.005
Zhao Q, Yu H, Ming Z, Hu H, Ren G, Liu H (2018) The bipartite network projection-recommended algorithm for predicting long noncoding RNA-protein interactions. Mol Ther Nucleic Acids 13:464–471. https://doi.org/10.1016/j.omtn.2018.09.020
Zitnik M, Nguyen F, Wang B, Leskovec J, Goldenberg A, Hoffman MM (2019) Machine learning for integrating data in biology and medicine: principles, practice, and opportunities. Int J Inf Fusion 50:71–91. https://doi.org/10.1016/j.inffus.2018.09.012
Zou Q, Li JJ, Song L, Zeng XX, Wang GH (2016) Similarity computation strategies in the microRNA-disease network: a survey. Brief Funct Genomics 15:55–64. https://doi.org/10.1093/bfgp/elv024

Author information

Authors and Affiliations

Amity Institute of Biotechnology, Amity University Uttar Pradesh Lucknow Campus, Lucknow, Uttar Pradesh, India
Rajnish Kumar, Garima Yadav & Rachana Singh
Department of Biochemistry, College of Medicine, University of Hail, Hail, Saudi Arabia
Mohammed Kuddus
Department of Veterinary Medicine and Surgery, College of Veterinary Medicine, University of Missouri, Columbia, MO, USA
Rajnish Kumar
Department of Medical Laboratory Sciences, College of Health Sciences, and Sharjah Institute for Medical Research, University of Sharjah, Sharjah, 27272, United Arab Emirates
Ghulam Md Ashraf

Contributions

All authors contributed to the article and approved the submitted version.

Rachana Singh and Rajnish Kumar were involved in the design, conception, and critical revision of the manuscript for intellectual content. Garima Yadav was involved in the compilation of the manuscript and incorporation of important relevant information. Mohammed Kuddus and Ghulam Md Ashraf critically examined and revised the manuscript.

Corresponding author

Correspondence to Rachana Singh.

Ethics declarations

Ethics approval and consent to participate

NA

Consent for publication

All the authors have given their consent to publish this review.

Conflict of interest

The authors declare no competing interests.

Additional information

Responsible Editor: Philippe Garrigues

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kumar, R., Yadav, G., Kuddus, M. et al. Unlocking the microbial studies through computational approaches: how far have we reached?. Environ Sci Pollut Res 30, 48929–48947 (2023). https://doi.org/10.1007/s11356-023-26220-0

Received: 23 March 2022
Accepted: 24 February 2023
Published: 15 March 2023
Issue Date: April 2023
DOI: https://doi.org/10.1007/s11356-023-26220-0
