Introduction

The novel coronavirus SARS-CoV-2 spread worldwide within a few months of the beginning of 2020, becoming a global pandemic1. By the end of 2020, nearly 100 million people had confirmed infections, leading to more than 2 million deaths, exceeding the yearly deaths from lung cancer worldwide. This rapid outbreak has placed intense pressure on healthcare systems, creating an urgent demand for effective diagnostic, prognostic, and therapeutic procedures for COVID-19. Scientists and clinicians are addressing this call with remarkable effort and energy by collecting data and information across diverse domains, as evidenced by the thousands of articles published on the topic since the beginning of the outbreak38.

For large datasets, JADBio may decide to use a simple hold-out protocol. Notice that configurations are cross-validated as atoms, i.e., all the combined algorithmic steps are cross-validated simultaneously as one unit (a minimal code sketch of this principle appears at the end of this passage). This avoids serious methodological errors, such as performing feature selection on the whole dataset first and then estimating performance by cross-validating only the modeling algorithm (see [39, page 245] for an eye-opening experiment on the severity of this type of error). Once all decisions are made, JADBio searches the space of admissible configurations to identify the one leading to optimal performance20. The final model and the final selection of features are produced by applying the winning configuration to the full dataset; on average, this leads to the optimal model out of all tries. Thus, JADBio does not lose samples to estimation.

To estimate the performance of the final model, JADBio uses the Bootstrap Bias Correction estimation method, or BBC38. BBC is conceptually equivalent to adjusting p-values in hypothesis testing for the fact that many hypotheses have been tested; similarly, BBC adjusts prediction performance for the fact that many configurations (combinations of algorithms) have been tried. JADBio's performance estimation has been validated in large computational experiments20: on average, it is conservative. Hence, JADBio removes the need for an external validation set, provided it is applied to populations with the same distribution as the training data (it does not, however, remove the need to externally validate against other factors that may affect performance, such as systematic biases, batch effects, and data distribution drifts).

JADBio provides an option for an "aggressive feature selection" preference, which tries feature selection algorithms that on average return fewer selected features, at a possible expense in predictive performance. The feature selection algorithms may also return multiple selected feature subsets (signatures) that lead to equally predictive models, up to statistical equivalence based on the training set. Regarding the specific algorithms tried, for classification tasks JADBio employs Lasso40 and Statistically Equivalent Signatures (SES)41 for feature selection. Such algorithms not only remove irrelevant features, as differential expression analysis does, but also features that are redundant given the selected ones, i.e., features that carry no added informational value for prediction. Hence, feature selection considers features in combination, while differential expression analysis does not.
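To make the "configurations as atoms" principle concrete, the following is a minimal sketch using scikit-learn on synthetic data; it illustrates the general technique only, not JADBio's proprietary implementation. Wrapping a Lasso-style selector and a classifier in a single Pipeline ensures that feature selection is re-fit inside every training fold:

```python
# Cross-validating a configuration "as one atom": feature selection and
# modeling are wrapped in a single Pipeline, so the selector only ever
# sees the training portion of each fold.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=100, n_features=1000,
                           n_informative=10, random_state=0)

config = Pipeline([
    # L1-penalized logistic regression as a Lasso-style feature selector
    ("select", SelectFromModel(LogisticRegression(
        penalty="l1", C=0.1, solver="liblinear"))),
    ("model", RandomForestClassifier(n_estimators=200, random_state=0)),
])

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
aucs = cross_val_score(config, X, y, cv=cv, scoring="roc_auc")
print(f"CV AUC: {aucs.mean():.3f}")

# The error warned about above would instead be:
#   X_sel = SelectFromModel(...).fit_transform(X, y)    # uses ALL samples
#   cross_val_score(RandomForestClassifier(), X_sel, y) # optimistic estimate
```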
In addition, unlike Lasso, SES performs multiple rather than single feature selection: as its name suggests, it reports multiple feature subsets that lead to equally predictive models (up to statistical equivalence). This is important because it offers the designers of diagnostic assays and laboratory tests a choice among markers, allowing them to measure those that are cost-effective to measure reliably. For modeling, JADBio tries Decision Trees, Ridge Logistic Regression, Random Forests, and (linear, polynomial, and RBF) Support Vector Machines, as well as the baseline algorithm that always predicts the most prevalent class. We note, however, that this list is only indicative, as we constantly enrich JADBio's algorithmic arsenal. All of the above are transparent to the user, who is not required to make any analysis decisions. JADBio outputs (i) the (bio)signature, i.e., the minimal subset of features that ensures maximal predictive power; (ii) the optimal predictive model associated with the selected biosignature; (iii) estimates of the predictive performance of the final model along with its confidence interval; and (iv) numerous other visuals for interpreting the results. These include graphs that explain the role of the features in the optimal model (called ICE plots42), estimates of the added value of each feature in the model, residual plots, samples identified as "hard to predict", and others.
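For readers unfamiliar with ICE plots, the following is a minimal sketch using scikit-learn on synthetic data; it demonstrates the general technique, not JADBio's own rendering:

```python
# An ICE plot draws one curve per sample, showing how the predicted
# probability changes as one feature is varied while the rest are held fixed.
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import PartialDependenceDisplay

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# kind="individual" requests ICE curves (one per sample) for feature 0
PartialDependenceDisplay.from_estimator(model, X, features=[0],
                                        kind="individual")
plt.show()
```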

Threshold selection and optimization for clinical application is an important issue. Using a default 0.5 threshold on the predicted probability is problematic for two reasons: (i) the probabilities output by Random Forests and similar machine learning models are not trustworthy; they indicate the relative risk of the patient but are not calibrated43. (ii) A 0.5 threshold assumes that the cost of a false negative prediction equals the cost of a false positive one. This is obviously not the case for severe COVID-19: falsely predicting patients as "non-severe" critically affects their survival, while patients falsely considered "severe" may merely receive stronger treatments and/or make unnecessary use of medical resources. This means that, for clinical applications, the classification threshold needs to be optimized. JADBio facilitates threshold optimization as follows: the circles on the model's ROC curve correspond to different classification thresholds, each reporting a different trade-off between the false positive rate (FPR) and the true positive rate (TPR). The user can click on a circle to select the threshold that optimizes this trade-off for the clinical application at hand. We note that the FPR, TPR, and all other metrics (along with confidence intervals) reported in each circle are also adjusted for multiple tries and the "winner's curse" using the BBC protocol, to avoid overestimation.
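The sketch below mirrors this workflow programmatically, assuming scikit-learn and synthetic data: it computes the ROC operating points and selects the threshold that minimizes an expected misclassification cost. The 5:1 false-negative-to-false-positive cost ratio is an illustrative assumption, not a value from the study:

```python
# Cost-sensitive threshold selection on a ROC curve.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, weights=[0.7], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
probs = model.predict_proba(X_te)[:, 1]

# Each (FPR, TPR) point corresponds to one classification threshold,
# like the circles on JADBio's ROC curve.
fpr, tpr, thresholds = roc_curve(y_te, probs)

# Expected cost per sample: a missed severe case (false negative) is
# assumed to cost 5x a false alarm (false positive).
c_fn, c_fp = 5.0, 1.0
prev = y_te.mean()
cost = c_fn * (1 - tpr) * prev + c_fp * fpr * (1 - prev)
best = np.argmin(cost)
print(f"threshold={thresholds[best]:.2f}  "
      f"TPR={tpr[best]:.2f}  FPR={fpr[best]:.2f}")
```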

To avoid comparing predictive performance between training and test sets with different class balance, we employ only performance metrics that are invariant to the class distribution (class balance), i.e., the Area Under the ROC Curve (AUC) and the Average Precision (equivalent to the area under the precision-recall curve). For the purposes of the current analysis, all comparisons employ the AUC metric.
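As a minimal illustration of why AUC suits such comparisons, the sketch below (synthetic scores, assuming scikit-learn) checks that the AUC is essentially unchanged when the class mix of the evaluation set changes, since it estimates the probability that a randomly chosen positive sample scores above a randomly chosen negative one:

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)
pos = rng.normal(1.0, 1.0, 1000)   # model scores of positive samples
neg = rng.normal(0.0, 1.0, 1000)   # model scores of negative samples

def auc(n_neg):
    """AUC when only the first n_neg negatives are kept."""
    y = np.r_[np.ones(len(pos)), np.zeros(n_neg)]
    s = np.r_[pos, neg[:n_neg]]
    return roc_auc_score(y, s)

print(auc(1000), auc(100))  # nearly identical despite a 10:1 imbalance

# Average Precision, the second metric used, is the area under the
# precision-recall curve.
y_all = np.r_[np.ones(len(pos)), np.zeros(len(neg))]
print(average_precision_score(y_all, np.r_[pos, neg]))
```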

Datasets

Principal Case: Proteomic and metabolomic profiles from sera of COVID-19 patients with severe and non-severe disease were retrieved from Shen et al.'s publication9. Three datasets were downloaded: a training dataset (C1 Training) of 13 severe and 18 non-severe COVID-19 patients with 1638 features, a validation dataset (C2 Validation) of four severe and six non-severe COVID-19 patients with 1589 features, and another validation dataset (C3 Validation) of 12 severe and seven non-severe COVID-19 patients with 29 targeted features. Although the authors kindly agreed to provide all data and information, we could not validate our models on C3, as its values were obtained with a different technology (targeted metabolomics). In any case, the features measured in C3 only partially overlapped with those selected through AutoML, so such a validation would not have been relevant.

Case Study 1: Gene expression profiles from host/viral metagenomic sequencing of upper airway samples of COVID-19 patients were compared with those of patients with other viral and non-viral acute respiratory illnesses (ARIs). The samples of 93 COVID-19 patients, 100 patients with other viral ARIs, and 41 patients with non-viral ARIs were retrieved from Mick et al.'s publication29 and the GSE156063 dataset in the GEO database. Specifically, three datasets were used: (A) a training dataset of 93 COVID-19 patients and 141 patients with ARIs (viral and non-viral), with 15,981 features; (B) a training dataset of 93 COVID-19 patients and 100 patients with other viral ARIs, with 15,981 features; and (C) another training dataset of 93 COVID-19 patients and 41 patients with non-viral ARIs, with 15,981 features.

Case Study 2: RNA-sequencing profiles of nasopharyngeal swab samples from COVID-19 patients were compared with those of non-COVID-19 patients. The dataset, retrieved from Lieberman et al.'s publication30 and the GSE152075 dataset in the GEO database, contained 35,787 features from nasopharyngeal swab samples of 430 individuals with PCR-confirmed SARS-CoV-2 presence (COVID-19 patients) and 54 non-COVID-19 patients (negative controls).

Data preprocessing

The C1 and C2 cohort data of Shen et al. were used as publicly deposited. The raw Mick et al. data were preprocessed as in the original publication (variance stabilization), using the code provided by the authors. For the Lieberman et al. dataset, we followed the preprocessing performed in the original publication and filtered out of all analyses those genes whose average counts were 1 or 0.
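A minimal sketch of this low-count filter on a hypothetical genes-by-samples count matrix (the synthetic data and the reading of "average counts were 1 or 0" as a mean of at most 1 are assumptions):

```python
import numpy as np
import pandas as pd

# Hypothetical genes-by-samples count matrix with illustrative values
rng = np.random.default_rng(0)
counts = pd.DataFrame(rng.poisson(2, size=(1000, 50)),
                      index=[f"gene_{i}" for i in range(1000)])

# Keep only genes whose average count across samples exceeds 1
kept = counts[counts.mean(axis=1) > 1]
print(f"kept {len(kept)} of {len(counts)} genes")
```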

Pathway analysis

The biological involvement and related pathways of the identified features were searched using GeneCards, the human gene database (https://www.genecards.org/).

Results and Software availability

The complete analysis results for each dataset, such as the configurations tried, the configurations selected to produce the final model, the estimation protocol, and the out-of-sample predictions of the cross-validations, are available via unique links in Supplementary Table 1. These results can be employed to compare JADBio against any other methodology, architecture, or algorithm on the same datasets. JADBio is available as a SaaS platform at www.jadbio.com, where a free trial version is offered. In addition, a free research license is granted to researchers who wish to reproduce results or compare against the current version of JADBio (restrictions apply); researchers can apply for this license by sending a request on the platform's webpage. JADBio mainly uses in-house implementations of individual, widely accepted algorithms (e.g., Decision Trees, Random Forests) and of unique algorithms, namely the SES multiple feature selection algorithm and BBC-CV for adjusting cross-validation performance for multiple tries. Open-source implementations of the latter two are available in the MXM R package41. Supplementary Table 6 reports all individual algorithms employed by JADBio.