COVID-19 in Iran: An Issue of Underdiagnosis

Mads Bahrami

Wolfram Research, Inc.

Analysing the case-fatality ratio for Iran and comparing it with the rest of the world in order to estimate the level of underdiagnosis in Iran.

Countries with the largest number of deaths due to COVID-19 show different case-fatality ratios.

As of early March, Iran seemed to have the greatest disparity between reported deaths and confirmed cases, very likely implying an underdiagnosis problem. Iran announced a national door-to-door plan to tackle this problem. This exploration attempts to estimate the number of underdiagnosed COVID-19 cases in Iran as of March 1. In this short essay, I analyse the case-fatality ratio for Iran and compare it with the world's in order to estimate the level of underdiagnosis.

Importing Data

For this exploration, I'm using the COVID-19 epidemic data available from the 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE, which we have curated in the Wolfram Data Repository.

Import the data directly from the Wolfram Data Repository:

In[]:=

ds0=ResourceData["Epidemic Data for Novel Coronavirus COVID-19"];

Comparing Case-Fatality Ratios

Let's start by looking at the confirmed cases and deaths in the world data. For an overview, we can plot deaths against confirmed cases, as well as the daily case-fatality ratio (deaths divided by confirmed cases). (Note that I am using cumulative data.)
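For reference, the ratio plotted for each day is cumulative deaths divided by cumulative confirmed cases. In the sketch below, D(t) and C(t) are notation introduced here (not names from the code) for those cumulative counts on day t:

```latex
\mathrm{CFR}(t) = \frac{D(t)}{C(t)}, \qquad C(t) > 0
```

When a country has no cases yet, both counts are zero and the ratio is the undefined 0/0. This is why the ratio computation below replaces Indeterminate values with None.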

Create plots of the deaths versus confirmed cases and the daily case-fatality ratio:

In[]:=

GraphicsGridListLinePlotvalues=Query[Transpose[{#ConfirmedCases["Values"],#Deaths["Values"]}]&]@ds0[Total,{"ConfirmedCases","Deaths"}],

,ListLinePlot#2#1&@@@values,

,FrameAll

Out[]=

Computing the case-fatality ratio for each country and comparing it with Iran's is a good way to check whether Iran's data is anomalous. We'll do this for the five countries with the most deaths.

Take the five countries with the largest number of COVID-19 deaths as of March 1:

In[]:=

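(* analysis window: January 22 to March 1, 2020 (GMT-7) *)
tMinMax = {DateObject[{2020, 1, 22, 0, 0, 0.}, "Instant", "Gregorian", -7.], DateObject[{2020, 3, 1, 0, 0, 0.}, "Instant", "Gregorian", -7.]};
(* total each country's rows, then keep the five countries with the most deaths at the end of the window *)
top = ds0[GroupBy["Country"], Total, {"ConfirmedCases", "Deaths"}][TakeLargestBy[TimeSeriesWindow[#Deaths, tMinMax]["LastValue"] &, 5]];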

Total the data over all countries and prepend it to the previous result under the key "World":

In[]:=

PrependTo[top, "World" -> Normal@ds0[Total, {"ConfirmedCases", "Deaths"}]];
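Note that the "World" entry sums the counts over all rows of the dataset before dividing. Its ratio is therefore a ratio of totals, not an average of per-country ratios. Equivalently, it is a case-weighted average. With i indexing the rows (notation assumed here) and the sum in the second form taken over rows with C_i(t) > 0:

```latex
\mathrm{CFR}_{\mathrm{World}}(t)
  = \frac{\sum_i D_i(t)}{\sum_i C_i(t)}
  = \sum_i \frac{C_i(t)}{\sum_j C_j(t)} \cdot \frac{D_i(t)}{C_i(t)}
```

China accounted for the large majority of cases up to March 1, so its weight dominates this average. That is consistent with the near-identical World and China curves in the plot that follows.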

Divide deaths by confirmed cases to get the case-fatality ratio:

In[]:=

(* deaths / confirmed cases for each series; 0/0 (no cases yet) becomes None *)
ratioData = #2/#1 & @@@ Normal[Values[Values[top]]] /. Indeterminate -> None // Quiet;

Plot the ratios from January 22 to March 1 for all six series, followed by Iran on its own:

In[]:=

ColumnDateListPlotratioDataMarch1=TimeSeriesWindow[#,tMinMax]&/@ratioData,

,DateListPlotratioDataMarch1[[3]],

,FrameAll

Out[]=

World

China

Iran

Italy

South Korea

Japan

Note that the world data is almost the same as the China data, as the overlaid plot above shows. Also, Iran's data started with deaths alone, with no previously confirmed cases (see the initial values in the Iran plot).

Distributions across Countries

A box-and-whisker chart can effectively show the distribution of case-fatality ratios for the different countries.

Make a box-and-whisker chart for the case-fatality ratios:

In[]:=

BoxWhiskerChartN[Values/@ratioDataMarch1],"Outliers",



Out[]=

As the chart shows, Iran's ratios are much higher than those of the other countries, suggesting possible undiagnosed cases of COVID-19 in Iran. To estimate the level of underdiagnosis, let us find the empirical distribution of the world case-fatality ratio.

Find the empirical distribution based on the data values for the world:

In[]:=

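(* empirical distribution of the daily world ratios (first entry of ratioDataMarch1) *)
dist = EmpiricalDistribution[ratioDataMarch1[[1]]["Values"]];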
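For reference, the quartiles used below can be read off this distribution using the inverse-CDF definition of the quantile. I am assuming this definition matches what Quantile computes for an EmpiricalDistribution, so check the documentation before relying on it. With n daily world ratios sorted as r_(1) ≤ … ≤ r_(n) (notation introduced here):

```latex
\hat F(r) = \frac{1}{n}\,\#\{\, t : \mathrm{CFR}_{\mathrm{World}}(t) \le r \,\},
\qquad
Q_p = \min\{\, r : \hat F(r) \ge p \,\} = r_{(\lceil n p \rceil)}, \quad 0 < p \le 1
```

Q_{0.25} and Q_{0.75} are the lower and upper quartiles. These are the same quantities a box-and-whisker chart draws as the edges of its box, up to the choice of quantile definition.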

Now I assume that the true number of COVID-19 deaths in Iran exceeds the reported number by an amount x, and that the true number of confirmed cases exceeds the reported number by an amount y.

Add these corrections to the last values of Iran's data and form the corrected ratio:

In[]:=

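(* Iran's corrected ratio (deaths + x)/(confirmed + y) on the last day of the window *)
correction = Query[Divide @@ ({#Deaths["LastValue"], #ConfirmedCases["LastValue"]} + {x, y}) &]@top[Entity["Country", "Iran"], All, TimeSeriesWindow[#, tMinMax] &]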

Out[]=

54+x

978+y
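As a sanity check on this expression, setting x = y = 0 recovers Iran's reported ratio on March 1:

```latex
\left.\frac{54 + x}{978 + y}\right|_{x = y = 0} = \frac{54}{978} \approx 0.0552
```

In other words, about 5.5% of Iran's confirmed cases had died.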

Now we can compute the possible values of x and y for which Iran's corrected ratio lies between the 25% and 75% quantiles of the world data.
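Written out, the condition is the inequality below, together with the constraint y ≥ x used in the region plot. My reading of that constraint is that every unreported death is presumably also an unreported case. Assuming both quantiles are positive and since 978 + y > 0, multiplying through gives explicit bounds on y for each x. The admissible region is therefore a wedge between two straight lines:

```latex
Q_{0.25} \le \frac{54 + x}{978 + y} \le Q_{0.75}, \quad 0 \le x \le y
\;\Longleftrightarrow\;
\frac{54 + x}{Q_{0.75}} - 978 \;\le\; y \;\le\; \frac{54 + x}{Q_{0.25}} - 978, \quad 0 \le x \le y
```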

Plot the region of (x, y) values for which the corrected ratio lies between the 25% and 75% quantiles of the world data, with y ≥ x:

In[]:=

RegionPlotQuantile[,0.25]≤correction≤Quantile[,0.75]&&y≥x,{x,0,300},{y,0,10000},



Out[]=

The best-case scenario would be no additional deaths, only additional confirmed cases, i.e. x = 0 and y > 0.

Compute the best-case value of y:

In[]:=

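(* smallest y (with x = 0) that lowers Iran's ratio to the world's 75% quantile *)
bestcase = y /. Solve[(Quantile[dist, 0.75] == correction) /. x -> 0, y] // Round // First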
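This solve targets the 75% quantile for a specific reason. Adding confirmed cases lowers the ratio, and Iran's reported ratio sits above the band. The smallest y that brings it into the band is therefore the one that lowers it exactly to the upper edge Q_{0.75}. With x = 0, the equation has a closed form:

```latex
\frac{54}{978 + y} = Q_{0.75}
\;\Longrightarrow\;
y_{\text{best}} = \frac{54}{Q_{0.75}} - 978
```

Read in reverse, the roughly 850 extra cases mentioned in the conclusion correspond to Q_{0.75} ≈ 54/(978 + 850) = 54/1828 ≈ 0.030. That is an upper-quartile world ratio of about 3%.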

Conclusion and Updates

As of March 2, there were 978 confirmed cases and 54 deaths in Iran due to COVID-19. In the best-case scenario, my analysis suggests that there should have been about 850 more cases to bring Iran's case-fatality ratio within the 25%–75% quantile range of the world data. The next day, Iran's health ministry announced 523 more cases and 12 more deaths, bringing Iran's totals to 1501 confirmed cases and 66 deaths. Including these figures in the calculations above, the best-case scenario still implies more COVID-19 cases in Iran than had been reported.
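The same closed form applies to the updated figures. The sketch below keeps the world quantile Q_{0.75} fixed. That is an assumption, since the quantile changes if the world data window is extended. Under it, the best-case number of missing cases becomes:

```latex
y'_{\text{best}} = \frac{66}{Q_{0.75}} - 1501,
\qquad
\frac{66}{1501} \approx 0.0440
```

Iran's reported ratio fell from about 5.5% to about 4.4%. However, as long as it stays above Q_{0.75}, this formula still gives a positive number of missing cases.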

My hope has been that Iran's new initiative would bring much better data. I can easily check this by recreating the plot from above using the most recent data.

A plot of case-fatality ratios using the latest data:

Fortunately, the underdiagnosis problem in Iran seems to have been addressed in the interim. I welcome any further and more sophisticated statistical analysis.