Epidemiological Models for Influenza and COVID-19

Robert B. Nachbar
Wolfram Research, Inc.

Table of Contents

Introduction

Background and Notes

Data and Functions

Compartmental Models

Hubei Outbreak—SEIRD

Hubei Outbreak—SEIQRD

Hubei Outbreak—SEIsaRD

Summary of Results

References

Initialization

Introduction

The COVID-19 outbreak, initially in China and now throughout the world, has captured the interest of a large number of organizations and individuals alike. Some effort has been spent to model (or at least visualize) the geographic spread (Gardner 2020; Espigule-Pons 2020). There have also been some reports of epidemiological models (Götz 2020; Peng et al. 2020; Zhou et al. 2020; Jia et al. 2020; García Moreno Esteva 2020; Antonov 2020) that have been developed in an effort to estimate parameters that can be used to project the severity of the outbreak, its duration and the mortality rate.
This report aims to put some of these modeling efforts into perspective so that conclusions and predictions can be better understood. We will be using compartmental models that allow one to describe the flow of individuals from one health state to another. We will attempt to employ just the right kinds of compartments and connections that are supported by the available data.
We explore several epidemiological models for the COVID-19 outbreak and use data from various provinces to estimate model parameters. We primarily want to see how similar or dissimilar the outbreaks are to each other, point out the shortcomings and make some suggestions for improvements.

Background and Notes

The following subsections contain useful information about the COVID-19 infection and the outbreak in Wuhan, China; it is by no means comprehensive.

Summary

Symptom reports for COVID-19 first occurred in Wuhan, capital and largest city in China’s Hubei province, in late December. COVID-19 is caused by the SARS-CoV-2 virus, which appears to have jumped to humans from an animal at the Huanan Seafood Market (though the exact animal of origin is unclear).
The incubation period for COVID-19 is usually around five days, with reports varying from one to 14 days (and in one case, 27 days). The median time from development of symptoms to death between six and 41 days, with a median of 14 days. Available data from South Korea suggests approximately a 14-day recovery period as well.
See this Wikipedia article for more information.

Timeline of Events

This is a timeline of major events related to the outbreak in Hubei province.
In[]:=
TimelinePlotLabeled[#1,Tooltip[#2,#3]]&@@@
Events
,
options

Out[]=

Data and Functions

This section contains the computational resources required for replicating the results in this paper, including data sources and custom functions for cleaning and processing the data.

Epidemic Data

Time series for reported cases, deaths and recoveries from each province are available from the Wolfram Data Repository. This can be imported using ResourceData.
Import and view the epidemic data:
In[]:=
covid19Data=ResourceData["Epidemic Data for Novel Coronavirus COVID-19"]
Out[]=

Functions

To simplify this workflow, I have made use of a custom modeling package and written several functions. The following are brief descriptions of these functions and their uses. (Full code for these functions can be found in the Initialization section of this notebook.)
The CompartmentalModeling package is used for computations directly dealing with compartmental models. Given a list of transitions for a model, KineticComparmentalModel computes the equations for a model, and CompartmentalModelGraph generates a graph of the model. For clarity, we use a form of schematic diagram known as a Petri net.
Graph a basic SIR model:
In[]:=
CompartmentalModelGraph
βλ[t]
→
ℐ,ℐ
γ
→
ℛ,

Out[]=
Generate the equations for the SIR model:
In[]:=
KineticCompartmentalModel
βλ[t]
→
ℐ,ℐ
γ
→
ℛ,t//Last
Out[]=
{
′
ℐ
[t]-γℐ[t]+β[t]λ[t],
′
ℛ
[t]γℐ[t],
′

[t]-β[t]λ[t]}
The fitDataWDR function returns reported infections, deaths and recoveries for a given locale and start date. It includes a "makeCorrectionsForHubei" option for removing the jump in confirmed cases in Hubei that occurred when the testing method changed.
Get usage info for the function:
In[]:=
?fitDataWDR
Out[]=
Symbol
fitDataWDR[locale, startDate] gives data for infected, recovered, and dead relative to the given date for the onset of infection for the given locale, either an administrative division or a country. For reference, cases were first reported on 31 Dec 2019.
Use the function to retrieve the cases in Hubei for a plot:
In[]:=
ListPlotfitDataWDR
Hubei, China
ADMINISTRATIVE DIVISION
,"1 Jan 2020",

Out[]=
A sumSquaredError function is defined to give the sum of squared errors for a model, given a set of parameters. I have also included additional helper functions toLog, fromLog, toLogit and fromLogit that are useful for passively constraining parameters to the domain
[0,∞]
or
[0,1]
, respectively, during optimization.

Compartmental Models

Basic Compartmental Models: SIR, SEIR

We will be using compartmental models, which have had numerous applications in biology, ecology, chemistry and medicine. For our purposes, each compartment represents a group of individuals in the same health state, for example susceptible or infectious. The connections between compartments indicate the direction and rate of movement from one health state to another.
One of the simplest compartmental models for epidemiology has three compartments: susceptible, infectious, and recovered or SIR. Susceptible individuals come in contact with infectious individuals and become infected. After some period of time, infectious individuals recover, are not longer infectious and have permanent immunity. The process can be described as a set of transitions.
Starting with a list of transitions and their labels, we can use CompartmentalModelGraph to visualize this model.
Define and display transitions for the SIR model:
In[]:=
modelSIR=
βλ[t]
→
ℐ,ℐ
γ
→
ℛ;​​GridThread@{modelSIR,{"infection","recovery"}},

Out[]=

βλ[t]
→
ℐ
infection
ℐ
γ
→
ℛ
recovery
Generate a graph of the SIR model:
In[]:=
CompartmentalModelGraphmodelSIR,

Out[]=
There is code in the Initialization section at the end of the notebook for generating these graphs.
This transmission model describes the rate of change of each of the compartments, which can be modeled mathematically as a system of ordinary differential equations (ODEs).
Many epidemiologic models include population demographics, that is, birth and natural death. For comparison, this is the SIR model with demographics.
From the base SIR model, we add an exposed compartment to form an SEIR, which is the base model used in this exploration.

Additional Compartments and Assumptions

More complex models will be considered throughout this report, adding new compartments to the SEIR model to account for dead, quarantined and infectious (symptomatic/asymptomatic) populations. We’ll apply each of these models to the Hubei outbreak, hoping to find one with a reasonable structure that recapitulates the reported data.
This paper generally assumes:
◼
  • No birth or natural mortality due to the short time span of the coverage of the model. Death due to disease will be included.
  • ◼
  • No migration into or out of the population.
  • ◼
  • First infections occurring on December 20, 2019, or December 30, 2019 (with the first confirmed reports). We don’t know with certainty when the first infection occurred. Du et al. (2020) assumed December 1, 2019.
  • ◼
  • No variation due to age or gender. Even though the severity of the infection and risk of death have been reported to be age and gender dependent, the available data for modeling does not have sufficient detail to support such stratification.
  • ◼
    All current cases are reported. Underreporting of cases is a serious concern, primarily due to the infection being asymptomatic in many individuals, especially children (Cai et al. 2020). This paper makes no attempt to extrapolate and only uses numbers directly from the reports.
    ◼
  • This work was carried out over a number of days, with the available Wolfram Data Repository data keeping pace. However, to assure a consistent analysis and interpretation of results, we’ll work with data for the period of January 21, 2020, to February 26, 2020.
  • ◼
  • More recent data can be used to test the models.
  • SEIRD Model

    Adding the dead compartment brings the model to five compartments and four transitions.
    Define the transitions for the SEIRD model:
    Label the transitions and show them in a grid:
    Display the graph for the model:
    Next we define and solve the ODEs for the model.
    Generate the ODEs for the model:
    Display the ODEs in a column:
    Solve for each variable:

    SEIQRD Model

    The additional quarantine transition is between infection and recovery for a total of six compartments and seven transitions.
    Define the transitions for the SEIQRD model:
    Label the transitions and show them in a grid:
    Display the graph for the model:
    Now we can define and solve the ODEs for the model.
    Generate the variables and ODEs for the SEIQRD model:
    Display the ODEs in a column:
    Solve for each variable:

    SEIsaRD Model

    Finally, we add symptomatic and asymptomatic infectious compartments (ignoring quarantine) for an SEIsaRD model. The duration of infection for symptomatic and asymptomatic infectious individuals is the same. We have two different recovered compartments for bookkeeping. Death due to disease occurs only for symptomatic infectious individuals.
    This model has a total of seven compartments and seven transitions.
    Define the transitions for the SEIsaRD model:
    Label the transitions and show them in a grid:
    Display the graph for the model:
    Now we can define and solve the ODEs for the model.
    Generate the variables and ODEs for the SEIsaRD model:
    Display the ODEs in a column:
    Solve for each variable:

    Hubei Outbreak—SEIRD

    In this section, we apply the SEIRD model to the Hubei outbreak with two different start dates and two effective population sizes, then compare the results from each fit.

    Functions

    Here, several functions are defined to compute and visualize the fit for the SEIRD model in Hubei. Note that fitErrorSEIRD expects the parameters to be log transformed and uses fromLog to back transform before using them in the functions for the model variables.
    Function for computing the sum of squared errors between the data and the SEIRD model:
    Function for plotting the data and the SEIRD model:

    Outbreak Start—December 20, 2019

    Fitting Data

    We use the fitDataWDR function described above to fit a model to the data for Hubei, with an assumed initial infection date of December 20, 2019.
    Compute a fit for the data:
    Plot the fit:
    We assume the effective population size is 100,000.
    Examine the distribution of the sum of squared errors with respect to the parameter values to ensue adequate sampling has been done.
    Five low-error diverse samples were selected for optimization.
    ◼
  • The following table shows the estimates for the epidemiological parameters.
  • ◼
  • The average duration of infection takes into account both recovery and death.
  • We assume the effective population size is 50,000.
    Examine the distribution of the sum of squared errors with respect to the parameter values to ensue adequate sampling has been done.
    Six low-error diverse samples were selected for optimization.
    ◼
  • The following table shows the estimates for the epidemiological parameters.
  • ◼
  • The average duration of infection takes into account both recovery and death.
  • ◼
  • The first three and last two fits are essentially the same.
  • ◼
  • The fourth fit underpredicts the deaths.
  • Outbreak Start—December 30, 2019

    Fitting Data

    Since we don’t know with certainty when the first infection occurred, let’s try a later date of December 30, 2019, to see what effect that has.
    Compute a fit for the data:
    Plot the fit:
    We assume the effective population size is 100,000.
    Examine the distribution of the sum of squared errors with respect to the parameter values to ensue adequate sampling has been done.
    Five low-error diverse samples were selected for optimization.
    ◼
  • The table below shows the estimates for the epidemiological parameters.
  • ◼
  • The average duration of infection takes into account both recovery and death.
  • ◼
  • The first two fits are essentially the same.
  • We assume the effective population size is 50,000.
    Examine the distribution of the sum of squared errors with respect to the parameter values to ensue adequate sampling has been done.
    Seven low-error diverse samples were selected for optimization.
    ◼
  • The following table shows the estimates for the epidemiological parameters.
  • ◼
  • The average duration of infection takes into account both recovery and death.
  • ◼
  • They are essentially the same.
  • Comparisons

    ◼
  • Increasing the population size makes ℐ peak broader.
  • ◼
  • Shifting the initial infection to a later date (closer to the data) does not have a noticeable effect.
  • ◼
  • In all cases, the susceptible compartment goes to 0 and the epidemic size is 100% of the population.
  • Hubei Outbreak—SEIQRD

    This section applies the SEIQRD model to the Hubei outbreak with two different start dates and three effective population sizes, comparing the results from each fit.

    Functions

    Outbreak Start—December 20, 2019

    Fitting Data

    We use the fitDataWDR function described previously to fit a model to the data for Hubei, with an assumed initial infection date of December 20, 2019.
    Compute a fit for the data:
    Plot the fit:
    We assume the effective population size is 500,000.
    Examine the distribution of the sum of squared errors with respect to the parameter values to ensue adequate sampling has been done.
    Use clustering to select eight low-error diverse samples for optimization.
    Examine the correlation between β and δ.
    ◼
  • The table below shows the estimates for the epidemiological parameters.
  • ◼
  • The average duration of infection takes into account both recovery and death.
  • ◼
  • The first two fits are essentially the same.
  • ◼
  • Only the first two fits have acceptable mortality rates.
  • We assume the effective population size is 100,000.
    Examine the distribution of the sum of squared errors with respect to the parameter values to ensue adequate sampling has been done.
    Use clustering to select eight low-error diverse samples for optimization.
    ◼
  • The following table shows the estimates for the epidemiological parameters.
  • ◼
  • The average duration of infection takes into account both recovery and death.
  • ◼
  • The first four fits are very similar.
  • ◼
  • The third and fourth fits have too short a time to quarantine.
  • ◼
  • All four have too short an incubation period.
  • We assume the effective population size is 50,000.
    Examine the distribution of the sum of squared errors with respect to the parameter values to ensue adequate sampling has been done.
    Use clustering to select eight low-error diverse samples for optimization.
    ◼
  • β and δ are correlated.
  • ◼
  • The following table shows the estimates for the epidemiological parameters.
  • ◼
  • The average duration of infection takes into account both recovery and death.
  • ◼
  • The first two fits are similar.
  • ◼
  • The third through eighth fits have much too short a time to quarantine and/or a much too low mortality rate.
  • Outbreak Start—December 30, 2019

    Fitting Data

    ◼
  • Since we don’t know with certainty when the first infection occurred, let’s try a later date to see what effect that has.
  • Plot the fit:
    We assume the effective population size is 500,000.
    Examine the distribution of the sum of squared errors with respect to the parameter values to ensue adequate sampling has been done.
    Use clustering to select eight low-error diverse samples for optimization.
    ◼
  • The following table shows the estimates for the epidemiological parameters.
  • ◼
  • The average duration of infection takes into account both recovery and death.
  • ◼
  • The fits are essentially the same.
  • ◼
  • The last two fits have unreasonably low mortality rates.
  • ◼
  • The transmission and quarantine rate constants are highly correlated.
  • We assume the effective population size is 100,000.
    Examine the distribution of the sum of squared errors with respect to the parameter values to ensue adequate sampling has been done.
    Use clustering to select eight low-error diverse samples for optimization.
    ◼
  • β and δ are correlated.
  • ◼
  • The following table shows the estimates for the epidemiological parameters.
  • ◼
  • The average duration of infection takes into account both recovery and death.
  • ◼
  • The first seven fits are similar.
  • ◼
  • The time to quarantine is too short.
  • ◼
  • The transmission and quarantine rate constants are highly correlated.
  • We assume the effective population size is 50,000.
    Examine the distribution of the sum of squared errors with respect to the parameter values to ensue adequate sampling has been done.
    Use clustering to select eight low-error diverse samples for optimization.
    ◼
  • The following table shows the estimates for the epidemiological parameters.
  • ◼
  • The average duration of infection takes into account both recovery and death.
  • ◼
  • The fits are similar.
  • ◼
  • The time to quarantine is too short for the last two fits.
  • ◼
  • The transmission and quarantine rate constants are highly correlated.
  • Comparisons

    ◼
  • Increasing the population size makes  and ℛ rise more slowly and decreases the epidemic size.
  • ◼
  • Shifting the initial infection to a later date (closer to the data) appears to make the  peak narrower and reverses the population size trends for β and δ.
  • Hubei Outbreak—SEIsaRD

    This section applies the SEIsaRD model to the Hubei outbreak with one start date and two effective population sizes, comparing the results from each fit.

    Functions

    Outbreak Start—December 20, 2019

    Fitting Data

    We use the fitDataWDR function described above to fit a model to the data for Hubei, with an assumed initial infection date of December 20, 2019.
    Compute a fit for the data:
    Plot the fit:
    We assume the effective population size is 100,000.
    Examine the distribution of the sum of squared errors with respect to the parameter values to ensue adequate sampling has been done.
    Use clustering to select eight low-error diverse samples for optimization.
    ◼
  • The following table shows the estimates for the epidemiological parameters.
  • ◼
  • The average duration of infection takes into account both recovery and death.
  • ◼
  • The first six fits are essentially the same.
  • We assume the effective population size is 50,000.
    Examine the distribution of the sum of squared errors with respect to the parameter values to ensue adequate sampling has been done.
    Use clustering to select eight low-error diverse samples for optimization.
    ◼
  • The following table shows the estimates for the epidemiological parameters.
  • ◼
  • The average duration of infection takes into account both recovery and death.
  • ◼
  • The fits are essentially the same.
  • Comparisons

    ◼
  • Increasing the population size increases the recovery rate.
  • ◼
  • In all cases, the susceptible compartment goes to 0 and the epidemic size is 100% of the population.
  • Summary of Results

    This report has provided some insight into what kind of compartmental model might be suitable to represent the COVID-19 pandemic. All three models (SEIRD, SEIQRD and SEIsaRD) exhibit long exponential tails for the ℐ or  peak. Only the model with quarantine (SEIQRD) permits an epidemic size less than the full population size.
    More work is needed.

    References

    Antonov, Anton. “Basic Experiments Workflow for Simple Epidemiological Models.” Wolfram Community. Accessed on April 8, 2020. https://community.wolfram.com/groups/-/m/t/1895686.
    Arino, Julien and Pauline van den Driessche. 2006. “Time delays in Epidemic Models; Modeling and Numerical Considerations” In Delay Differential Equations and Applications, edited by O. Arino, M. L. Hbid, and E. Ait Dads, 539–578. Dordrecht, The Netherlands: Springer.
    Biggerstaff, Matthew, Srimon Cauchemez, Carrie Reed, Manoj Gambhir, and Lyn Finelli. 2014. “Estimates of the Reproduction Number for Seasonal, Pandemic, and Zoonotic Influenza: A Systematic Review of the Literature.” BMC Infectious Diseases 14 (September). https://doi.org/10.1186/1471-2334-14-480.
    Boseley, Sarah and Lily Kuo. 2020. “Huge Rise in Coronavirus Cases Casts Doubt over scale of Epidemic.” The Guardian. February 13, 2020. https://www.theguardian.com/world/2020/feb/13/huge-rise-coronavirus-cases-raises-doubts-scale-epidemic-china.
    Brauer, Fred. “Reproduction Numbers and Final Size Relations.” Accessed on April 8, 2020. https://www.fields.utoronto.ca/programs/scientific/10-11/drugresistance/emergence/fred1.pdf.
    Cai, Jiehao, Jing Xu, Daojiong Lin, zhi Yang, Lei Xu, Zhenghai Qu, Yuehua Zhang, Hua Zhang, Ran Jia, pengcheng Liu, Xiangshi Wang, Yangling Ge, Aimei Xia, He Tian, Hailing Chang, Chuning Wang, Jingjing Li, Jianshe Wang, and Mei Zheng. 2020. “A Case Series of Children with 2019 Novel Coronavirus Infection: Clinical and Epidemiological Features.” Clinical Infectious Diseases. February 28, 2020. https://doi.org/10.1093/cid/ciaa198.
    Coburn, Brian J., Bradley G. Wagner, and Sally Blower. 2009. “Modeling Influenza Epidemics and Pandemics: Insights into the Future of Swine Flu (H1N1).” BMC Medicine, 7, no. 30 (June). http://www.biomedcentral.com/1741-7015/7/30.
    Du, Zhanwei, Ling Wang, Simon Cauchemex, Xiaoke Xu, Xianwen Wang, Benjamin J. Cowling, and Lauren Ancel Meyers. 2020. “Risk for Transportation of 2019 Novel Coronavirus (COVID-19) from Wuhan to Cities in China.” February 17, 2020. https://doi.org/10.1101/2020.01.28.20019299.
    Feng, Zhilan and Horst R. Thieme. 1995. “Recurrent Outbreaks of Childhood Diseases Revisited: The Impact of Isolation.” Mathematical Biosciences, 128, no. 1–2 (July–August): 93–130. https://doi.org/10.1016/0025-5564(94)00069-C.
    García Moreno Esteva, Enrique. “An SEIR Like Model That Fits the Coronavirus Infection Data.” Wolfram Community. April 8, 2020. https://community.wolfram.com/groups/-/m/t/1888335.
    Gardner, Lauren. “Mapping 2019-nCoV.” Johns Hopkins Whiting School of Engineering. Accessed April 8, 2020. https://systems.jhu.edu/research/public-health/ncov.
    Götz, Thomas. 2020. “First Attempts to Model the Dynamics of the Coronavirus Outbreak 2020.” February 10, 2020. https://arxiv.org/pdf/2002.03821.pdf.
    Jia, Jiwei, Jian Ding, Siyu Liu, Guidong Liao, Jingzhi Li, Ben Duan, Guoqing Wang, and Ran Zhang. 2020. “Modeling the Control of COVID-19: Impact of Policy Interventions and Meteorological Factors.” March 6, 2020. https://arxiv.org/pdf/2003.02985.pdf.
    Martcheva, Maia. 2015. An Introduction to Mathematical Epidemiology. New York, NY: Springer.
    “News and Notes.” 1978. British Medical Journal 1, no. 6112 (March): 586–590. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1603269/pdf/brmedj00115-0064.pdf.
    Peng, Liangrong, Wuyue Yang, Dongyan Zhang, Changjing Zhuge, and Liu Hong. 2020. “Epidemic Analysis of COVID-19 in China by Dynamical Modeling.” February 18, 2020. https://www.medrxiv.org/content/10.1101/2020.02.16.20023465v1.
    Rose, H. J. 1980. “The Use of Amantadine and Influenza Vaccine in a Type A Influenza Epidemic in a Boarding School.” Journal of Royal College of General Practitioners 30, no. 219 (October): 619–621. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2159753.
    Zhou, Yimin, Zuguo Chen, Xiangdong Wu, Zengwu Tian, Liang Cheng, and Lingjian Ye. 2020. “The Outbreak Evaluation of COVID-19 in Wuhan District of China.” February 22, 2020. https://arxiv.org/pdf/2002.09640.pdf.

    Initialization

    General

    Fitting Data

    ◼
  • Note: the option "makeCorrectionForHubei" was not well conceived and, in general, should not be used.
  • Fitting Error