Predicting COVID-19 Severity: A Machine Learning and Deep Learning Approach
Introduction
The COVID-19 pandemic has had a devastating impact worldwide, with hundreds of millions of confirmed cases and millions of deaths. Predicting which patients are likely to develop severe disease is a key challenge in managing the pandemic, because this information helps clinicians prioritize care and allocate resources effectively.
Machine learning and deep learning algorithms have shown promise in predicting the severity of COVID-19, with varying degrees of success. This study aims to compare the performance of several machine learning and deep learning classifiers in predicting COVID-19 severity. We also employ explainable artificial intelligence (XAI) techniques to interpret the predictions of the models and identify the most critical features contributing to the disease’s severity.
Data and Methods
We collected data on 1000 COVID-19 patients from a hospital in Wuhan, China. The data included demographic information, clinical symptoms, laboratory test results, and patient outcomes. We used 80% of the data to train the models and 20% to test them.
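The 80/20 split described above can be sketched as follows. This is a minimal illustration, not the study's code: the stratification by class, the random seed, and the 300/700 class balance are all assumptions, since the text does not state how the split was drawn.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) with class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Hypothetical labels: 1000 patients, 300 severe (1) and 700 non-severe (0).
labels = [1] * 300 + [0] * 700
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 800 200
```

Stratifying keeps the share of severe cases the same in both partitions, which matters when the classes are imbalanced.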
We trained and tested several machine learning and deep learning classifiers, including support vector machines (SVMs), random forests, gradient boosting machines (GBMs), neural networks (NNs), convolutional neural networks (CNNs), and long short-term memory networks (LSTMs). We also evaluated stacking, hard-voting, and soft-voting ensembles. To select the most relevant features, we used mutual information and three metaheuristic algorithms: the bat algorithm, the flower pollination algorithm, and the Jaya algorithm.
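As an illustration of filter-style feature selection, the sketch below scores discrete features by their mutual information with the severity label and ranks them. The feature names and toy values are hypothetical. Continuous laboratory values would first need binning or a nearest-neighbour estimator such as the one behind scikit-learn's `mutual_info_classif`, and the text does not say which estimator the study used.

```python
import math
from collections import Counter

def mutual_information(feature, target):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(feature)
    joint = Counter(zip(feature, target))
    count_x = Counter(feature)
    count_y = Counter(target)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x, y) * log( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (count / n) * math.log(count * n / (count_x[x] * count_y[y]))
    return mi

# Hypothetical toy data: 1 = severe, 0 = non-severe.
severity = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "crp_high": [0, 0, 0, 0, 1, 1, 1, 1],  # perfectly informative
    "noise":    [0, 1, 0, 1, 0, 1, 0, 1],  # independent of the label
}

scores = {name: mutual_information(col, severity) for name, col in features.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

A perfectly informative binary feature scores ln 2 (about 0.693 nats), while a feature independent of the label scores zero.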
Furthermore, we employed five XAI techniques to interpret the models’ predictions and identify the most critical features contributing to COVID-19 severity: SHAP, LIME, QLattice, Eli5, and Anchor.
Results
With mutual information as the feature selection method, the stacked model achieved the highest precision, 94%, which the soft-voting and hard-voting classifiers matched. The bat algorithm also performed commendably: with it, the stack, hard-voting, and soft-voting classifiers obtained precisions of 91%, 91%, and 90%, respectively. With the flower pollination algorithm, the same classifiers obtained precisions of 87%, 86%, and 84%, and with the Jaya algorithm, 87%, 90%, and 89%.
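The difference between the two voting schemes can be sketched as follows. Hard voting takes a majority over each base model's predicted label, whereas soft voting thresholds the average predicted probability. Stacking instead trains a further model on the base models' outputs. The probabilities below are hypothetical and the 0.5 threshold is an assumption.

```python
def hard_vote(probabilities, threshold=0.5):
    """Majority vote over each model's thresholded class label."""
    votes = [1 if p >= threshold else 0 for p in probabilities]
    return 1 if sum(votes) > len(votes) / 2 else 0

def soft_vote(probabilities, threshold=0.5):
    """Threshold the mean of the predicted probabilities."""
    mean_p = sum(probabilities) / len(probabilities)
    return 1 if mean_p >= threshold else 0

# Hypothetical P(severe) from three base classifiers for one patient.
probs = [0.45, 0.48, 0.95]
print(hard_vote(probs))  # 0: two of the three models vote non-severe
print(soft_vote(probs))  # 1: the mean probability is about 0.63
```

The two schemes disagree here because one confident model outweighs two marginal ones under soft voting, which is one reason their reported metrics can differ.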
With mutual information, the recall values obtained by the stack, hard-voting, and soft-voting classifiers were 93%, 95%, and 94%, respectively. The bat algorithm was the next best-performing feature selection method, with recall values of 90%, 93%, and 91% for the same classifiers. The flower pollination algorithm also performed well, with recall values of 86%, 90%, and 90%. With the Jaya algorithm, the recall values were 87%, 91%, and 90%.
With mutual information, the accuracies of the stack, hard-voting, and soft-voting classifiers were 90%, 95%, and 94%, respectively. The bat algorithm also yielded excellent results, with accuracies of 92%, 95%, and 91% for the same classifiers. The flower pollination algorithm performed relatively well, with accuracies of 87%, 85%, and 86%. With the Jaya algorithm, all three classifiers reached an accuracy of 89%.
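For reference, the three metrics reported above follow directly from the confusion-matrix counts. The sketch below computes them for a hypothetical set of ten predictions, treating the severe class (1) as positive. That choice of positive class is an assumption.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Precision, recall, and accuracy for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,  # of predicted severe, how many were severe
        "recall": tp / (tp + fn) if tp + fn else 0.0,     # of truly severe, how many were found
        "accuracy": (tp + tn) / len(pairs),
    }

# Hypothetical labels: 4 severe and 6 non-severe patients.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)  # precision 0.75, recall 0.75, accuracy 0.8
```

In a severity-prediction setting, recall on the severe class is often the metric to watch, because a missed severe case is usually costlier than a false alarm.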
Among the three deep learning models, the deep neural network (DNN) performed best, with an accuracy of 89%. The 1D-CNN and LSTM obtained accuracies of 85% and 83%, respectively.
SHAP beeswarm plots provide a global interpretation of the models. A vertical line at a SHAP value of zero separates contributions toward the non-severe class (left) from contributions toward the severe class (right). Red points indicate higher feature values, and blue points indicate lower ones. The markers are ordered by importance, with the most important feature at the top. The plots reveal that the most critical markers are basophils, CRP, LDH, lymphocytes, albumin, protein, and ferritin. CRP, LDH, and ferritin levels were elevated in severe COVID-19 patients, while basophil, lymphocyte, albumin, and protein levels were decreased.
SHAP force plots provide local interpretations. Figures 12a and 12c indicate a non-severe prognosis, where markers such as lymphocytes, SPO2, basophils, and CRP contribute to this prediction. Conversely, Figures 12b and 12d indicate a severe COVID-19 prognosis, where markers such as CRP, AST, basophils, and lymphocytes contribute to this prediction.
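The quantity behind a force plot is the Shapley value of each feature for a single prediction. The sketch below computes exact Shapley values by enumerating feature subsets for a toy linear risk score. The model, its coefficients, and the patient and baseline values are hypothetical and are not the study's. The SHAP library uses faster, model-specific approximations instead of this brute-force enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, instance, baseline):
    """Exact Shapley values of `instance` relative to `baseline`."""
    names = list(instance)
    n = len(names)

    def value(subset):
        # Features in `subset` take the patient's values; the rest stay at baseline.
        mixed = {k: (instance[k] if k in subset else baseline[k]) for k in names}
        return model(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                chosen = set(subset)
                total += weight * (value(chosen | {name}) - value(chosen))
        phi[name] = total
    return phi

def toy_risk(f):
    """Hypothetical severity score; higher means more likely severe."""
    return 0.01 * f["crp"] - 0.5 * f["lymphocytes"] + 0.002 * f["ldh"]

patient = {"crp": 120.0, "lymphocytes": 0.6, "ldh": 450.0}
baseline = {"crp": 10.0, "lymphocytes": 1.8, "ldh": 200.0}
phi = shapley_values(toy_risk, patient, baseline)
print(phi)
```

The values sum to the difference between the patient's score and the baseline score. Positive values push the prediction toward the severe class, so elevated CRP and low lymphocytes both raise this toy patient's score.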
The LIME explanations in Figures 13a and 13b predict a severe prognosis, while those in Figures 13c and 13d indicate a non-severe prognosis. The attributes are arranged in descending order of importance. The plots show that the most critical markers are albumin, D-Dimer, LDH, CRP, basophils, protein, AST, SPO2, and lymphocytes.
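According to Eli5, the most crucial attributes are albumin, urea, lymphocytes, CRP, NLR, and basophil count. Eli5 also reports a “<BIAS>” term, which corresponds to the model's intercept (its baseline output before any feature contributes) rather than to an error rate.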
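In Eli5's explanation tables, the row labelled `<BIAS>` is the model's intercept. The sketch below shows, for a hypothetical logistic-regression-style model, how a prediction decomposes into per-feature contributions plus that intercept. The weights, the intercept, and the patient values are invented for illustration and are not taken from the study.

```python
import math

# Hypothetical linear model on three markers (not the study's coefficients).
weights = {"albumin": -0.8, "urea": 0.05, "crp": 0.02}
intercept = -1.0  # the term an explainer would list as <BIAS>

def explain(features):
    """Split a linear score into feature contributions plus the intercept."""
    contributions = {name: weights[name] * features[name] for name in weights}
    contributions["<BIAS>"] = intercept
    score = sum(contributions.values())
    probability = 1 / (1 + math.exp(-score))  # logistic link
    return contributions, score, probability

patient = {"albumin": 3.0, "urea": 40.0, "crp": 100.0}
contributions, score, probability = explain(patient)
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>8}: {value:+.2f}")
```

Here the intercept shifts every prediction by the same amount regardless of the patient, which is why it appears alongside the feature contributions rather than as a measure of error.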
According to Anchor, the most important markers are lymphocytes, CRP, and D-Dimer.
According to QLattice, the most critical markers are basophils, albumin, lymphocytes, CRP, D-Dimer, neutrophils, protein, and NLR.
Discussion
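Our findings demonstrate that machine learning and deep learning classifiers can predict COVID-19 severity with high accuracy. With mutual information feature selection, the stacked model achieved the joint-highest precision (94%), with a recall of 93% and an accuracy of 90%; the hard-voting classifier matched that precision and reached higher recall and accuracy (95% each). Feature selection with the bat algorithm, the flower pollination algorithm, and the Jaya algorithm also produced good results.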
The XAI techniques we employed successfully interpreted the models’ predictions and identified the most critical features contributing to COVID-19 severity. The most important features were CRP, lymphocytes, basophils, albumin, D-Dimer, protein, AST, SPO2, and NLR.
This study contributes to the development of predictive models for COVID-19 severity, aiding clinicians in identifying patients at high risk of severe disease and enabling them to provide timely and appropriate care.