1 Introduction

The World Health Organization (WHO) declared the novel coronavirus disease 2019 (COVID-19) a Public Health Emergency of International Concern on January 30th 2020 [1, 2]. By then there was a total of 7818 confirmed cases of COVID-19 globally, with more than 1370 severe cases and 170 deaths, the bulk of which were in China [3]. Over the course of a few weeks the disease propagated beyond the borders of China, reaching nearly every country. At the time of writing this paper (May 01, 2020) there is a total of 2,397,216 confirmed cases globally with 162,956 deaths [4]. Symptoms of the disease include dry cough, sore throat, and fever. Although the majority of cases are mild, some can lead to Acute Respiratory Distress Syndrome (ARDS), severe pneumonia, pulmonary oedema, and organ failure [5]. Since the WHO emergency declaration, several modeling and prediction studies have sought to understand the propagation of the disease, evaluate the preventive measures put in place by authorities, and provide early and accurate detection of the disease, to name a few. Mathematical modeling has been used for many years in epidemiological studies [6]. Mathematical modeling of disease transmission and propagation helps to predict the course of epidemics and to design mass vaccination programs, and it can provide guidance on what types of data are relevant to the study of an epidemic [7]. Studies carried out on COVID-19 include modeling the dynamics of the disease, exploring the effect of prevention methods such as travel restrictions, and studying the effect of climate on its propagation [8]. Artificial intelligence (AI), on the other hand, is a tool used for prediction. AI is the study and development of algorithms (machines) that mimic human intelligence.
AI has been successfully used in several fields such as computer vision, online advertising, spam filtering, robotics, and fraud detection [9, 10]. In healthcare, AI has also gained attention for disease detection, treatment selection, patient monitoring, drug discovery, gene function annotation, automated experiments, and automated data collection [11, 12]. For COVID-19, AI has been used in medical image acquisition, image segmentation and diagnosis [13]. This paper presents a review of the mathematical modeling and artificial intelligence used in the study, estimation and prediction of COVID-19. The paper is divided into three parts: the first presents the mathematical models used in the study of the pandemic, the second presents the various AI applications in disease diagnosis and estimation, and the third lists available datasets for COVID-19.

2 Material and method

The review is divided into three parts, each dealing with a specific aspect: mathematical modeling, AI applications and available datasets. For each of the three parts, the items reviewed were grouped into topics and each group was then summarized. In all, 61 journal articles, reports, fact sheets and websites were reviewed, all published between December 2019 and April 2020. Table 1 shows the structure of the review, including the number of items reviewed and the main focus of the reviewed items.

Table 1 The breakdown of the review showing number of items covered per part

3 Mathematical modeling and COVID-19

Various research works in the literature model the dynamics and spread of COVID-19. Most of these are based on the Susceptible-Exposed-Infected-Removed (SEIR) model and the Susceptible-Infected-Recovered (SIR) model. These models have been widely used in the past to study epidemic spreading over various forms of transmission networks [14,15,16,17,18,19,20,21,22]. Table 2 summarizes the various models used in COVID-19 studies. The following sections review these models.

Table 2 Summary of the various mathematical models used in COVID-19 studies

3.1 Susceptible-exposed-infected-removed (SEIR)

Choujun et al. [23] used daily intercity migration data together with a SEIR model to generate a new model that describes the dynamics of COVID-19 in China. They collected the daily intercity migration data from 367 cities using a mobile application that tracks human migration. They concluded that the number of infections in most cities in China would peak between the middle of February and early March 2020. Anca and Kieran adapted a traditional SEIR model to study the specific dynamic compartments and epidemic parameters of COVID-19 [24]. They analyzed the current management strategy of the pandemic, including social distancing, travel bans, and service interruptions and closures, to generate predictions and assess the efficiency of these control measures. In [25], a combination of SEIR and regression models was used with the Johns Hopkins University dataset on COVID-19 to predict changes in the spread of COVID-19. The study presented in [26] used an age-structured SEIR model to evaluate physical distancing measures. The authors showed that physical distancing measures were most effective if the gradual return to work started in April. The transmission of COVID-19 and its association with temperature and humidity was studied using the SEIR model by **ao-**g et al. [27]. The study showed that raising the temperature and humidity contributed to the control of transmission of the disease. In [28], the SEIR model was adapted to investigate the potential community-wide impact of public use of face masks on the transmission dynamics and control of the COVID-19 pandemic. It was suggested that face masks should be used nation-wide and that their use should be implemented immediately (Table 3).
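For readers unfamiliar with the compartmental structure shared by the studies above, the following is a minimal sketch of the standard SEIR equations integrated with forward Euler steps. It is not the implementation of any of the reviewed papers. The function name simulate_seir and the parameter values (beta, sigma, gamma, the population size and the initial conditions) are illustrative assumptions, not fitted COVID-19 estimates.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, days, dt=0.1):
    """Integrate the standard SEIR model with forward Euler steps.

    beta  : transmission rate per day (assumed value, not fitted)
    sigma : rate of leaving the exposed compartment (1 / latent period)
    gamma : removal rate (1 / infectious period)
    Returns a list of (S, E, I, R) tuples, one per day, including day 0.
    """
    n = s0 + e0 + i0 + r0  # total population, constant in this model
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    steps_per_day = int(round(1 / dt))
    history = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_exposed = beta * s * i / n * dt   # S -> E
            new_infectious = sigma * e * dt       # E -> I
            new_removed = gamma * i * dt          # I -> R
            s -= new_exposed
            e += new_exposed - new_infectious
            i += new_infectious - new_removed
            r += new_removed
        history.append((s, e, i, r))
    return history


# Hypothetical scenario: 10,000 people, 10 initially infectious.
history = simulate_seir(beta=0.5, sigma=0.2, gamma=0.1,
                        s0=9990, e0=0, i0=10, r0=0, days=160)
peak_day = max(range(len(history)), key=lambda d: history[d][2])
print("Day with the most infectious individuals:", peak_day)
```

Interventions such as distancing or mask use are typically represented in models of this kind by lowering beta, possibly as a function of time; an Euler integrator is used here only to keep the sketch dependency-free.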

Table 3 Summary of the classification methods used in COVID-19 studies

3.2 Susceptible-infected-recovered (SIR)

A time-dependent susceptible-infected-recovered (SIR) model to track the transmission rate and the recovering rate at a particular time was proposed in [29]. The authors obtained a prediction error of 3% or less for confirmed cases and predicted that the recovering rate overtook the transmission rate on February 17, 2020 in the Hubei province of China. Wang et al. [30] modified the SIR model by adding different types of time-varying quarantine strategies, such as government-imposed mass isolation policies and micro-inspection measures at the community level, to establish a method of calibrating cases of under-reported infections. The SIR model was also used to fit the cumulative data of COVID-19 in China to an empirical form [31]. It was reported that, for given parameter values, the SIR model on the Euclidean network obtained high accuracy on data from China and could predict when the pandemic would be expected to end. In [32], a simple age-sensitive SIR model, which integrated known age-interaction contact patterns, was used to examine the potential effects of age-heterogeneous mitigations on an epidemic in a COVID-19-like parameter regime. The authors found that strict age-targeted mitigation strategies had the potential to reduce mortality. An age-structured SIR model with social contact matrices and Bayesian imputation was used to evaluate the progress of the pandemic in India [33]. The authors evaluated the influence of social distancing measures such as workplace non-attendance and school closure on the transmission of the novel coronavirus. It was found that a three-week lockdown would be insufficient to prevent the spread of the disease. A simple SIR model, modified to include certain variables describing containment measures taken worldwide, was used to study these measures [34]. By comparing various scenarios, it was shown that the progress of the infection was strongly affected by the measures taken.
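To make the idea of time-dependent transmission and recovering rates concrete, the sketch below inverts a discrete-time SIR update to recover per-day rates from daily counts. It is an illustrative assumption about how such rates can be estimated, not necessarily the procedure of [29]; the function names estimate_rates and first_crossover_day and the input series are hypothetical.

```python
def estimate_rates(infected, removed, population):
    """Estimate per-day transmission and recovering rates from daily counts.

    Inverts the discrete-time SIR update
        R[t+1] - R[t] = gamma_t * I[t]
        I[t+1] - I[t] = beta_t * S[t] * I[t] / N - gamma_t * I[t]
    with S[t] = N - I[t] - R[t].
    Returns a list of (beta_t, gamma_t) tuples, one per day transition.
    """
    rates = []
    for t in range(len(infected) - 1):
        i_t, r_t = infected[t], removed[t]
        s_t = population - i_t - r_t
        if i_t <= 0 or s_t <= 0:
            raise ValueError("rates are undefined when I[t] or S[t] is not positive")
        delta_i = infected[t + 1] - i_t
        delta_r = removed[t + 1] - r_t
        gamma_t = delta_r / i_t
        beta_t = (delta_i + delta_r) * population / (s_t * i_t)
        rates.append((beta_t, gamma_t))
    return rates


def first_crossover_day(rates):
    """Return the first day index on which the recovering rate exceeds the
    transmission rate, or None if that never happens in the series."""
    for day, (beta_t, gamma_t) in enumerate(rates):
        if gamma_t > beta_t:
            return day
    return None
```

In practice, daily counts are noisy, so estimates obtained this way would usually be smoothed or regularized before being used for prediction.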

3.3 Other models

A Susceptible-Infectious-Quarantined-Recovered (SIQR) model was used for the analysis of data in Brazil [35]. It was found that the number of quarantined individuals grew exponentially and then stabilized. The SEIQR (Susceptible-Exposed-Infectious-Quarantined-Recovered) model with time delays for latency and an asymptomatic phase was investigated in [36]. It was reported that time-varying social distancing, using the SEIQR model, could reduce the number of infections by about 50%. Recently, a novel model known as the Bats-Hosts-Reservoir-People transmission network model was used to simulate the potential transmission from bats (the infection source) to humans [37]. In another approach, the age-specific Susceptible-Exposed-Symptomatic-Asymptomatic-Recovered-Seafood Market (SEIARW) model, based on two suspected transmission routes, was used to quantify age-specific transmission [38]. The two routes were from market to person and from person to person. The authors concluded that COVID-19 transmissibility is higher in elderly persons than in young persons. In [39], the influence of interventions and self-protection measures (travel restriction, quarantine of entry, contact tracing, isolation and wearing masks) on COVID-19 transmission dynamics in mainland China excluding Hubei province was modeled using Markov Chain Monte Carlo (MCMC). The results showed that the containment strategies were effective and substantially suppressed the transmission of the pandemic. It was also found that relaxing personal protection too early might lead to the spread of the disease. The SPSS modeler was also used to investigate the correlation between average daily temperatures and the growth rate of COVID-19 in infected countries [40]. It was shown that the pandemic growth rates were higher in case studies where the average temperature was lower. Finally, in [41] a coupled ordinary differential equation metapopulation model for different courses of the disease in different age groups was developed.
It was shown that the economic lockdown could be safely reversed at any time without a substantial effect on the course of the disease. In addition, it was concluded that strict quarantines might not be necessary to keep the number of infected people low.

4 Artificial intelligence and COVID-19

Artificial intelligence (AI) has been used mostly for medical image segmentation and diagnosis, to classify whether a patient has COVID-19 or to assess the severity of the infection. The images used in these works were mostly from X-ray radiography or Computed Tomography (CT). Before presenting the AI methodologies used in COVID-19 detection and classification, a brief description of these medical imaging modalities is given.

4.1 COVID-19 detection based on CT scan

X-ray radiography consists of beaming X-ray photons onto the part of the body to be imaged and collecting the photons that pass through it. Depending on its type, the body’s tissue attenuates (blocks) some of the incident photons. This creates a shadow image of the body on a detector located behind the body. X-ray radiography is used to examine bone structure and detect infections in the lungs. Computed tomography (CT) takes the idea of X-ray radiography further by taking X-ray images of the body from multiple angles to produce cross-sectional images without dissecting the body. These cross-sectional images, also called slices, are tomographic images and contain more detailed medical information than conventional X-ray radiography. CT images are used to detect abnormalities in the body such as tumors and hemorrhage. They can also be used to detect pulmonary embolisms, excess fluid, and pneumonia in the lungs [42, 43]. This makes CT suitable for the diagnosis of COVID-19, which is a disease that attacks the lungs and the respiratory system.
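The attenuation described above is commonly modeled with the Beer-Lambert law, in which the transmitted fraction of a monoenergetic beam decays exponentially with the attenuation coefficient and thickness of each tissue layer. The sketch below illustrates this relationship; the function name transmitted_fraction and the attenuation coefficients in the example are hypothetical placeholders, not measured tissue values.

```python
import math


def transmitted_fraction(layers):
    """Fraction of incident X-ray photons that pass through a stack of layers.

    layers : list of (mu_per_cm, thickness_cm) tuples, where mu_per_cm is the
             linear attenuation coefficient of the layer.
    Applies the Beer-Lambert law: I / I0 = exp(-sum(mu_k * x_k)).
    """
    total_attenuation = sum(mu * thickness for mu, thickness in layers)
    return math.exp(-total_attenuation)


# Hypothetical coefficients: a weakly attenuating path versus a path that
# crosses a denser region of the same total thickness.
clear_path = transmitted_fraction([(0.05, 10.0)])
dense_path = transmitted_fraction([(0.05, 6.0), (0.20, 4.0)])
print("Denser path transmits fewer photons:", dense_path < clear_path)
```

The difference in transmitted fraction between neighbouring paths is what produces contrast on the detector; CT reconstructs the spatial distribution of the attenuation coefficient from many such measurements taken at different angles.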

In their study, Pan Feng et al. sought to characterize the changes in chest CT images of patients with COVID-19 pneumonia. Scans were taken at 4-day intervals from the first day of diagnosis to the day of total recovery. Patients with complicated pneumonia with severe respiratory distress were excluded from the study. For non-severe cases, chest CT showed the severity of lesions progressing during the first 10 days and stabilizing thereafter. According to this study, almost all patients showed a peak of the disease around the 10th day and signs of improvement around the 14th day after the onset of symptoms [44]. In a series of experiments carried out over 3 days on 51 patients, Yicheng Fang et al. studied the performance of two methods of medical examination on patients with COVID-19. The results indicate that the sensitivity of chest CT to COVID-19 is higher than that of the RT-PCR technique (98% for CT versus 71% for RT-PCR). When RT-PCR tests are negative, chest CT can therefore be used on patients with clinical and epidemiological characteristics of COVID-19 to confirm or refute the previous results [45]. Li Yan et al. also conducted a study to determine the rate of false diagnoses and the performance of CT scans on COVID-19. Their study was carried out on the first 51 patients confirmed by nucleic acid tests. The study confirmed the high performance of chest CT, which produced a low rate of false diagnosis of COVID-19 [46].
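Sensitivity, the metric used in the comparison above, is the fraction of truly infected patients that a test flags as positive. The following sketch shows the calculation; the counts in the example are hypothetical, chosen only because they are consistent with a 51-patient cohort and the reported percentages, and are not taken from [45].

```python
def sensitivity(true_positives, false_negatives):
    """Sensitivity (true positive rate) = TP / (TP + FN)."""
    total_infected = true_positives + false_negatives
    if total_infected == 0:
        raise ValueError("sensitivity is undefined without infected patients")
    return true_positives / total_infected


# Hypothetical counts for a cohort of 51 confirmed patients.
ct_sensitivity = sensitivity(true_positives=50, false_negatives=1)
pcr_sensitivity = sensitivity(true_positives=36, false_negatives=15)
print(f"CT: {ct_sensitivity:.0%}, RT-PCR: {pcr_sensitivity:.0%}")
```

Sensitivity alone does not describe false alarms; a complete comparison of two tests would also report specificity, which requires a group of patients known not to be infected.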

4.2 Image-based (X-ray, CT) AI COVID-19 detection and classification

Classification consists of separating images into groups. The three standard, well-known approaches to doing so are supervised learning, unsupervised learning, and semi-supervised learning.

Supervised learning is the machine learning task of learning a function that maps inputs to outputs from example input-output pairs [47]. The purpose of a supervised learning algorithm is to produce a function that maps each input (a vector) to its output (a supervisory signal). In an optimal scenario, the algorithm correctly labels the data and thereby determines their classes. In the parallel world of human psychology, it is called concept learning [48]. Among the supervised learning algorithms used for the detection of COVID-19 are Convolutional Neural Network (CNN), Support Vector Machines (SVM), Logistic Regression (LR), Naive Bayes (NB), Linear Discriminant Analysis (LDA), Decision Trees (DT) and Random Forest (RF). Table 4 summarizes the classification methods used in COVID-19 studies.
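As an illustration of learning a mapping from labeled input-output pairs, the sketch below trains a logistic regression classifier, one of the algorithms listed above, with plain batch gradient descent. It is a toy example under stated assumptions: the function names, the two-dimensional feature vectors and the labels are hypothetical and do not come from any COVID-19 dataset or reviewed study.

```python
import math


def train_logistic_regression(features, labels, learning_rate=0.1, epochs=1000):
    """Fit weights and a bias by batch gradient descent on the logistic loss."""
    n_features = len(features[0])
    n_samples = len(features)
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            predicted = 1.0 / (1.0 + math.exp(-z))  # sigmoid
            error = predicted - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - learning_rate * g / n_samples
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(weights, bias, x):
    """Return class 1 if the linear score is non-negative, else class 0."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1 if z >= 0 else 0


# Toy input-output pairs: two hypothetical image-derived features per sample,
# label 1 for the positive class and 0 for the negative class.
features = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4],
            [0.8, 0.9], [0.9, 0.7], [0.7, 0.8]]
labels = [0, 0, 0, 1, 1, 1]
weights, bias = train_logistic_regression(features, labels,
                                          learning_rate=0.5, epochs=2000)
print("Prediction for a new sample:", predict(weights, bias, [0.85, 0.85]))
```

The labels play the role of the supervisory signal: the learned weights are adjusted until the function reproduces them on the training pairs, after which it can be applied to unlabeled inputs.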

Table 4 A collection of the open source dataset sources and their links

4.2.1 Convolutional neural network (CNN)

The principle of a Neural Network (NN) is based on a collection of nodes (called artificial neurons), which loosely model neurons in the brain. Based on examples, without any prior knowledge and without being explicitly programmed, the system automatically generates identifying characteristics. When the algorithm uses multiple layers of neurons it is known as Deep Learning. A Convolutional Neural Network (CNN) is a Deep Learning algorithm which takes an image as input and assigns learnable weights to various features (objects) in the image so as to be able to differentiate one image from another [9, 10].
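The core operation behind the learnable weights of a CNN can be sketched as a small kernel sliding over the image, followed by a nonlinearity. The example below is a minimal, dependency-free illustration (the sliding dot product, strictly a cross-correlation, as commonly used in CNN layers), not a full network. The function names, the toy image and the hand-set edge kernel are assumptions; in a trained CNN the kernel values are learned from data.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding ('valid' mode) and
    return the resulting feature map as a list of rows."""
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            acc = 0.0
            for i in range(kernel_h):
                for j in range(kernel_w):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        feature_map.append(row)
    return feature_map


def relu(feature_map):
    """Apply the rectified linear unit element-wise."""
    return [[max(0.0, value) for value in row] for row in feature_map]


# Toy 3x4 image with a dark left half and a bright right half, and a
# hand-set 2x2 kernel that responds to dark-to-bright vertical edges.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
edge_kernel = [[-1, 1],
               [-1, 1]]
activation = relu(conv2d_valid(image, edge_kernel))
print(activation)
```

Stacking many such kernels, interleaved with pooling and followed by fully connected layers, gives the kind of architecture used for the image classification tasks discussed in this section.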

Wang et al. in [84].

In COVID-19 and other pandemic studies, other datasets such as population density, mobility, security incidents, economic situation, humanitarian conditions, and healthcare workforce are important for ensuring the accuracy of the studies. Several sources provide those datasets. One such source is WorldPop, which shares spatial demographic datasets from Africa, Asia, and Central and South America [85]. Some of the datasets provided by WorldPop are population data, births, internal migration, age and sex data, administrative areas and global flight data. The Humanitarian Data Exchange (HDX), coordinated by the UN Office for the Coordination of Humanitarian Affairs (OCHA), shares more than 17,000 humanitarian datasets from 253 locations around the globe [86]. The WHO, for its part, shares the Global Health Workforce Statistics [87]. The dataset includes data on the number of health workers as well as hospital bed capacity in each country. The tech giants Apple and Google both released mobility reports on COVID-19. Apple called its dataset Mobility Trends Reports [88] while Google called its dataset Google COVID-19 Community Mobility Reports [89]. Both present aggregated data that register the daily use of various modes of transportation (walking, driving, transit) since the start of February 2020, as well as places visited or stayed in by users of their services. The data were collected from customer requests for directions or location in Apple Maps and Google Maps. Both also offer a useful tool for visualizing the data. Our World in Data, for its part, provides a COVID-19 Testing dataset, which collects data on tests carried out to establish whether a person is currently infected [90].
ACAPS [91] provides the Government Measures Dataset, which records the measures implemented by governments around the world in response to COVID-19, while the Armed Conflict Location & Event Data Project (ACLED) [92] provides a dataset of security incidents related to COVID-19. The International Monetary Fund (IMF) [93] and BFA Global [94] both provide datasets on the key economic responses of governments and the effect of COVID-19 management measures on the economy.

Lastly, the software provider C3.ai compiled, cleaned, structured and standardized COVID-19 data from most of the sources presented in this paper [95]. The initiative, known as the C3.ai COVID-19 Data Lake, contains analysis-ready COVID-19 data in one place. The service is free and the datasets are updated continuously. It contains everything from time-series data to case reports. Also, a GitHub repository was created to collect AI research papers and datasets on COVID-19 images. It contains 19 datasets, 11 review papers, 18 clinical papers on COVID-19 images, 54 AI-related papers, 54 articles on CXR methods, and 1 paper on Line Artefact Quantification in Lung Ultrasound Images [96].

6 Discussion and conclusion

The use of mathematical modeling and AI with COVID-19 data will increase our knowledge of the disease propagation and help in evaluating prevention measures as well as in the early and accurate detection of the disease in patients. However, to arrive at this end a lot of data is needed to explore various models and AI algorithms. The data available up till now are mostly medical images (for diagnosis) and text-based data (for social impact analysis). While the latter may be generated by and readily available to a large number of people, the former can only be generated in a specialized institution by a specialized professional. This means that data from low-resource settings are not available, as these places do not have the sophisticated imaging equipment needed to generate such images [97]. It is also well known in data science that datasets from different geographical locations may not hold the same information, and this is especially true of healthcare data. More data types that can be easily generated anywhere on the globe are therefore needed, so as to make the application of mathematical models and AI algorithms possible for many. These data types could be physiological measurements such as ECG, SpO2 and body temperature, which could be obtained using wearable devices [98]. Data concerning the type of preventive measures implemented by authorities are also not well documented. In this work, only a few of the datasets found provided that information. However, this information could help in the examination and optimization of the measures put in place, thereby improving the situation.

In mathematical modeling, most of the articles found during the writing of this paper address COVID-19 dynamics. However, with appropriate datasets, modeling can also be used to explore the effect of variables such as climate and preventive measures on the spread of COVID-19, as explained earlier. There are also few studies on the correlation between environmental and climatic conditions and COVID-19 propagation: in this work only two articles were found that address this issue, and they offer different and interesting ways of looking at the propagation of the disease [27, 28]. Simulation of second and third waves of COVID-19 outbreaks will also help to enhance surveillance. As countries start easing social restriction measures, studies are needed to estimate possible hotspots for new outbreaks.

AI (deep learning) is a powerful tool for early and accurate diagnosis of COVID-19, and many articles have addressed it. Most of them apply convolutional neural networks (CNN) for medical image classification. A few other studies apply Random Forest and Support Vector Machines. There are also some that applied U-Net and its variations for the segmentation of CT and X-ray images. The authors of the AI algorithms reviewed here all claimed that their algorithms perform very well on test data. However, it is well known that good performance of an algorithm on test data does not mean that it will perform similarly when deployed in the field. This is due to the fact that in real life the data are more prone to noise and other artefacts that are not usually present in the training and test data. The lack of diverse annotated images is also not helping the situation. In this review, only 2 out of 18 studies were found to have used data annotated by radiologists. Collaboration is needed between clinicians and AI experts in order to build a large collection of annotated images of COVID-19. Human-in-the-loop approaches or human augmentation could also be a solution to the problem caused by the disparity between an algorithm's performance on test data and in the real world. Most of the studies reviewed used existing models, while a few used well-known models with some modifications. The modified models performed slightly better than the others, stressing the need to develop hybrid models to build better and more robust architectures. Much work also needs to be done in terms of drug and/or vaccine discovery, treatment selection and contamination risk assessment for medical personnel [99]. Finally, since the objective of most AI research on COVID-19 is to find the optimal solution for diagnosis, other algorithms such as Genetic Programming and Boosting (AdaBoost) should be explored so as to clear any doubt regarding their performance.

In conclusion, COVID-19 has spread rapidly all over the world, creating an emergency situation. Mathematical modeling and AI have both been shown to be reliable tools in the fight against this pandemic. Most of the modeling was based on the Susceptible-Exposed-Infected-Removed (SEIR) model and the Susceptible-Infected-Recovered (SIR) model, while most of the AI implementations were Convolutional Neural Networks (CNN) applied to X-ray and CT images. Several datasets concerning COVID-19 have been collected and shared as open source. However, much work remains to be done in providing the public with a wide variety of data types from as many regions as possible. Other AI and modeling applications in healthcare should also be explored with regard to COVID-19.