Analysing the case-fatality ratio for Iran and comparing it with the rest of the world in order to estimate the level of underdiagnosis in Iran.
Countries with the largest number of deaths due to COVID-19 show different case-fatality ratios:
As of early March, Iran seemed to have the greatest disparity between reported deaths and confirmed cases, very likely implying an underdiagnosis problem. Iran announced a national door-to-door plan to tackle this problem. In this short essay, I analyse the case-fatality ratio for Iran and compare it with the rest of the world in order to estimate the number of underdiagnosed COVID-19 cases in Iran as of March 1.
Let's start by looking at the confirmed cases and deaths in the world data. For an overview, we can make plots of the deaths versus confirmed cases and the daily case-fatality ratio. (Note that I am using cumulative data, so each day's ratio is the total deaths to date divided by the total confirmed cases to date.)
Create plots of the deaths versus confirmed cases and the daily case-fatality ratio:
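As a minimal sketch of how the daily ratio can be computed from cumulative counts: the function name `caseFatalityRatio` is hypothetical, and the two data points are the Iran totals quoted later in this essay rather than the full series used for the plots.

```wolfram
(* Daily case-fatality ratio from cumulative counts.
   Days with zero confirmed cases give Missing[] instead of a division by zero. *)
caseFatalityRatio[deaths_List, confirmed_List] :=
  MapThread[
    If[#2 > 0, N[#1/#2], Missing["NoConfirmedCases"]] &,
    {deaths, confirmed}];

(* Iran: 54 deaths / 978 cases, then 66 deaths / 1501 cases *)
caseFatalityRatio[{54, 66}, {978, 1501}]
(* roughly {0.055, 0.044} *)
```

The resulting list can be passed to `ListLinePlot` to draw the ratio over time.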
Extracting the case-fatality ratio for each country and comparing it with the ratio for Iran is a good way to check whether the Iran data is anomalous. We'll do this for the five countries with the most deaths.
Take the top five countries with the largest number of COVID-19 deaths on March 1:
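One way this selection could be done, assuming the deaths are held in an association keyed by country. The name `deathsByCountry` and all of its values are placeholders for illustration, not real data.

```wolfram
(* Placeholder totals, not real data: cumulative deaths per country on one date *)
deathsByCountry = <|"A" -> 120, "B" -> 45, "C" -> 300, "D" -> 8,
   "E" -> 60, "F" -> 17, "G" -> 2|>;

(* TakeLargest on an association keeps the keys, ordered by descending value *)
topFive = Keys[TakeLargest[deathsByCountry, 5]]
(* {"C", "A", "E", "B", "F"} *)
```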
Note that the world data is almost the same as the China data, as shown in the overlaid plot above. Also, Iran's data starts with deaths only, with no confirmed cases reported beforehand (look at the initial values for Iran).
Distributions across Countries
A box-and-whisker chart can effectively show the distribution of case-fatality ratios for the different countries.
Make a box-and-whisker chart for the case-fatality ratios:
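A minimal sketch of such a chart, using synthetic ratios generated with `RandomReal` purely for illustration. The variable `toyRatios` and the country labels are hypothetical stand-ins for the real per-country series.

```wolfram
(* Synthetic ratios for illustration only; each list stands in for
   one country's daily case-fatality ratios *)
SeedRandom[1];
toyRatios = <|
   "Country A" -> RandomReal[{0.01, 0.03}, 30],
   "Country B" -> RandomReal[{0.02, 0.04}, 30],
   "Country C" -> RandomReal[{0.05, 0.20}, 30]|>;

(* One box per country, labeled with the association keys *)
BoxWhiskerChart[Values[toyRatios],
 ChartLabels -> Keys[toyRatios],
 PlotLabel -> "Case-fatality ratio by country"]
```

A country whose box sits well above the others, as "Country C" does here by construction, is the pattern to look for.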
As you may notice, the ratio for Iran is much higher than for the other countries, implying the possibility of undiagnosed cases of COVID-19 in Iran. To estimate the extent of underdiagnosis, let us find the empirical distribution of the world case-fatality ratio.
Find the empirical distribution based on the data values for the world:
In[]:=
(* worldDist is a placeholder name; the original left-hand symbol was lost *)
worldDist = EmpiricalDistribution[ratioDataMarch1[[1]]];
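To illustrate how an empirical distribution can be used to test whether a ratio is typical, here is a small sketch. The values in `toyWorldRatios` are placeholders, not the real world series, and `withinIQRQ` is a hypothetical helper name.

```wolfram
(* Placeholder ratios, not real data *)
toyWorldRatios = {0.010, 0.020, 0.030, 0.040};
toyDist = EmpiricalDistribution[toyWorldRatios];

(* Lower and upper quartiles of the empirical distribution *)
{q25, q75} = Quantile[toyDist, #] & /@ {0.25, 0.75};

(* True when a ratio falls inside the 25%-75% quantile range *)
withinIQRQ[r_] := q25 <= r <= q75;

(* Iran's ratio of 54 deaths to 978 cases lies above this toy range *)
withinIQRQ[54/978]
(* False *)
```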
Now I assume that the true number of COVID-19 deaths in Iran exceeds the reported number by x, and that the true number of confirmed cases exceeds the reported number by y.
Add the corrected numbers to the last day of Iran's data:
As of March 2, there were 978 confirmed cases and 54 deaths in Iran due to COVID-19. In the best-case scenario, my analysis shows that there should have been about 850 more cases in order to bring Iran's case-fatality ratio within the 25%–75% quantile range of the world data. The next day, Iran's health ministry announced 523 more cases and 12 more deaths, bringing Iran's data to 1501 confirmed cases and 66 deaths. Including that data in the calculations above, in the best-case scenario, there should still have been more COVID-19 cases in Iran than were reported.
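The arithmetic behind these figures can be sketched as follows. This assumes that "best case" means no unreported deaths (x = 0), which is my reading rather than something stated above. The function names are hypothetical, and the target ratio of 0.0295 is backed out of the "about 850" figure, not taken from the computed world quantiles.

```wolfram
(* Reported Iran totals quoted above *)
reportedDeaths = 54;
reportedCases = 978;

(* Case-fatality ratio after adding x unreported deaths and y undiagnosed cases *)
correctedRatio[x_, y_] := (reportedDeaths + x)/(reportedCases + y);

(* Smallest whole number of extra cases y that brings the ratio
   down to a target value, for a given x (default 0) *)
extraCasesNeeded[target_, x_ : 0] :=
  Ceiling[(reportedDeaths + x)/target - reportedCases];

N[correctedRatio[0, 850]]   (* about 0.0295 *)
extraCasesNeeded[0.0295]    (* 853 *)
```

Because the ratio is deaths over cases, any unreported deaths (x > 0) raise the number of extra cases needed, which is why x = 0 gives the lowest estimate.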
My hope has been that Iran's new initiative would bring much better data. I can easily check this by recreating the plot from above using the most recent data.
A plot of case-fatality ratios using the latest data:
Fortunately, the underdiagnosis problem in Iran seems to have been addressed in the interim. I welcome any further and more sophisticated statistical analysis.