Covid-19 vaccination data analysis using SQL functions​
​by Damian Calin
The objective is to analyse recently vaccination data made available @ https://github.com/owid/covid-19-data/tree/master/public/data/vaccinations .
​
The exercise is using SQL Functions described and defined in previous post (https://community.wolfram.com/groups/-/m/t/2195893)​
​
In order to use Sql function I’ve created a Wolfram Package that can by found @ https://www.wolframcloud.com/env/damcalrom/Published/SQLOperators.wl​
​

Load SQL functions

In[]:=
CloudImport["https://www.wolframcloud.com/env/damcalrom/Published/SQLOperators.wl"];

Load vaccination data

Source: https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/vaccinations.csv
The structure of file is described at https://github.com/owid/covid-19-data/tree/master/public/data/vaccinations
filedata="https://raw.githubusercontent.com/owid/covid-19-data/master/public/data/vaccinations/vaccinations.csv";​​(*definetablecolumns*)​​cols={"location","isocode","date","totalvaccinations","peoplevaccinated","peoplefullyvaccinated","dailyvaccinationsraw","dailyvaccinations","totalvaccinationsperhundred","peoplevaccinatedperhundred","peoplefullyvaccinatedperhundred","dailyvaccinationspermillion"};​​(*importintosqltablestructure*)​​tbvacc=importSQL[filedata,",",cols];

Preview data

tbvacc//​​(*filteronacountry*)​​whereSQL[#,isocode"FRA"]&//​​(*orderbymostrecentdate*)​​orderBySQL[#,{-AbsoluteTime[DateObject[date]]}]&//​​tableSQLAsDataset
Out[]=
location
isocode
date
totalvaccinations
peoplevaccinated
peoplefullyvaccinated
dailyvaccinationsraw
dailyvaccinations
totalvaccinationsperhundred
peoplevaccinatedperhundred
France
FRA
2021-07-19
65579149
37813475
28473629
698408
618857
97.06
55.97
France
FRA
2021-07-18
64880741
37540891
28035561
221919
610331
96.03
55.56
France
FRA
2021-07-17
64658822
37434100
27919232
572343
602705
95.7
55.41
France
FRA
2021-07-16
64086479
37182060
27589678
863450
590007
94.85
55.03
France
FRA
2021-07-15
63223029
36815314
27070927
836849
570447
93.57
54.49
France
FRA
2021-07-14
62386180
36481445
26550219
333017
543402
92.34
54.0
France
FRA
2021-07-13
62053163
36306450
26390202
806014
587443
91.84
53.74
France
FRA
2021-07-12
61247149
36028594
25848214
638725
566930
90.65
53.32
France
FRA
2021-07-11
60608424
35868419
25364153
168536
558111
89.7
53.09
France
FRA
2021-07-10
60439888
35810087
25253466
483456
562035
89.46
53.0
rows 1–10 of
205
columns 1–10 of
12

Prepare data for the reporting

- Aggregate data by country and transform measures into time series.
- Apply filter to exclude invalid values.
- Get the last percentage of vaccinated people
tbvaccagg=​​ tbvacc//​​groupBySQL[#,{location}]&//​​summarySQL[#,Association["tsdaily"(TimeSeries[dailyvaccinations,{date}]//Normal//Cases[#,{t_,v_}/;NumberQ[v]]&//TimeSeries),​​ "tsvaccperc"(TimeSeries[peoplevaccinatedperhundred,{date}]//Normal//Cases[#,{t_,v_}/;NumberQ[v]]&//TimeSeries)]]&//​​ (*eliminateemptytimeseries*)​​whereSQL[#,Length[tsdaily["DatePath"]]>30&&Length[tsvaccperc["DatePath"]]>30]&;//​​ AbsoluteTiming​​​​tbvaccagg//​​showTableSQL​​
Out[]=
{34.1966,Null}
Total Rows:137 Elapsed:0.
Out[]//TableForm=

Load cases/death Covid-19 data

Source: https://raw.githubusercontent.com/datasets/covid-19/main/data/countries-aggregated.csv
The measures (confirmed, recovered, deaths) are expressed in cumulative values
In[]:=
filecases="https://raw.githubusercontent.com/datasets/covid-19/main/data/countries-aggregated.csv";​​tbcases=importSQL[filecases,",",{"date","country","confirmed","recovered","deaths"}];

Preview data

tbcases//​​(*filteronacountry*)​​whereSQL[#,country"France"]&//​​(*orderbymostrecentdate*)​​orderBySQL[#,{-AbsoluteTime[DateObject[date]]}]&//​​showTableSQL
Total Rows:546 Elapsed:0.541103
Out[]//TableForm=
date
country
confirmed
recovered
deaths
2021-07-20
France
5952339
409403
111715
2021-07-19
France
5934122
408973
111682
2021-07-18
France
5929929
408739
111662
2021-07-17
France
5917397
408709
111657
2021-07-16
France
5906448
408567
111641
2021-07-15
France
5895453
408283
111619
2021-07-14
France
5884395
407965
111609
2021-07-13
France
5882945
407965
111597
2021-07-12
France
5875987
407685
111543
2021-07-11
France
5874719
407391
111515

Prepare data for analysis

- Aggregate data by country and transform measures into time series
- Exclude negative values and apply Differrences to have daily values instead of original cumulative values
- Exclude timeseries with less than 100 datapoints
- Apply a rolling 7 day average on measures.
tbcasesagg=tbcases//​​ groupBySQL[#,{country}]&//​​ summarySQL[#,Association["tsconfirmed"(TimeSeries[confirmed,{date}]//Differences//Normal//Cases[#,{t_,v_}/;v>0]&//TimeSeries),​​ "tsdeaths"(TimeSeries[deaths,{date}]//Differences//Normal//Cases[#,{t_,v_}/;v>0]&//TimeSeries)]]&//​​ whereSQL[#,Length[tsconfirmed["DatePath"]]>100&&Length[tsdeaths["DatePath"]]>100]&//​​ rowMapSQL[#,{tsconfirmed,tsdeaths},MovingMap[Mean,#,Quantity[7,"Events"],Automatic]&]&;//​​ AbsoluteTiming​​​​tbcasesagg//​​showTableSQL​​
Cloud
::timelimit
:This computation has exceeded the time limit for your plan.
​
Out[]=
$Aborted
Join the 2 tables on country
Now all timeseries of interest are in the same table
tbfinal=tbcasesagg//​​ joinSQL[#,tbvaccagg,locationcountry]&;​​​​tbfinal//​​showTableSQL​​
Total Rows:93 Elapsed:1.×
-7
10
Out[]//TableForm=

Comparing countries data

Plot all 3 timeseries on the same rescaled plot to have an idea of correlation.
Select a subset of countries for the analyse
Get the latest vaccination percentage for each country
countries={"France","Sweden","Germany","United Kingdom","Malta","Israel","Italy"};​​​​tbfinal//​​whereSQL[#,ContainsAny[{country},countries]]&//​​addColumnsSQL[#,Association["plot"tsdeaths//Rescale,​​ tsconfirmed//Rescale,​​ tsdaily//Rescale//​​ DateListPlot[#,PlotStyle{Automatic,Automatic,Directive[Red,Thick]},​​ PlotLegends{"deaths","cases","vaccine"}]&,​​ "vaccinationpercent"tsvaccperc["LastValue"]]]&// ​​ selectSQL[#,{country,plot,vaccinationpercent}]&//​​ showTableSQL​​​​​​
Total Rows:7 Elapsed:0.640882
Out[]//TableForm=
country
plot
vaccinationpercent
France
deaths
cases
vaccine
55.97
Germany
deaths
cases
vaccine
59.77
Israel
deaths
cases
vaccine
66.42
Italy
deaths
cases
vaccine
60.81
Malta
deaths
cases
vaccine
87.5
Sweden
deaths
cases
vaccine
59.85
United Kingdom
deaths
cases
vaccine
68.28
Define a new measure to calculate the growth rate of timeseries on last n datapoints
In[]:=
growrate[ts_,n_]:=ts//Normal//Take[#,-n][[{1,-1},2]]&//(Ratios[#]-1)*100&//Last//N;
Calculate infections growth rate for last 30 days and compare for each country vaccination percent
tbfinal//​​addColumnsSQL[#,Association["casesgrowratepercent"growrate[tsconfirmed,30],​​ "vaccinationpercent"tsvaccperc["LastValue"],​​ "plot"tsconfirmed//Rescale,​​ tsdaily//Rescale//​​ DateListPlot[#,PlotStyle{Automatic,Directive[Red,Thick]},​​ PlotLegends{"cases","vaccine"}]&//Quiet​​ ]]&//​​orderBySQL[#,{-casesgrowratepercent}]&//​​selectSQL[#,{country,vaccinationpercent,casesgrowratepercent,plot}]&//​​showTableSQL
Total Rows:93 Elapsed:6.01277
Out[]//TableForm=
country
vaccinationpercent
casesgrowratepercent
plot
Malta
87.5
8893.75
cases
vaccine
Israel
66.42
2177.33
cases
vaccine
Senegal
3.65
1520.77
cases
vaccine
Cyprus
56.89
1425.27
cases
vaccine
Malawi
2.01
1257.71
cases
vaccine
Australia
28.98
800.
cases
vaccine
Zimbabwe
7.97
615.014
cases
vaccine
Spain
63.52
579.754
cases
vaccine
Morocco
31.31
535.5
cases
vaccine
Kazakhstan
26.33
519.827
cases
vaccine
For Malte and Israel there is an significative infection growth while the vaccinations percentage are high but not for Senegal and Malawi.
There is no systematic relationship between higher vaccination percentage and new infections growth
Lets try to correlate the 2 measures
tbfinal//​​addColumnsSQL[#,Association["casesgrowratepercent"growrate[tsconfirmed,30],​​ "vaccinationpercent"tsvaccperc["LastValue"]​​ ]]&//​​summarySQL[#,Association["plot"With[dataplot=MapThread[Callout,{Transpose[{vaccinationpercent,casesgrowratepercent}],country}],​​ corr=Correlation[vaccinationpercent,casesgrowratepercent],​​ dataplot//​​ ListPlot[#,​​ PlotLabelColumn[{"Vaccination vs Cases Growth",Row[{"Correlation: ",ScientificForm[corr,3]}]}],​​ FrameLabel{"Vaccination percent","Recent Cases Growth percent"},​​ FrameTrue,​​ ImageSizeMedium]&]]]&//​​showTableSQL​​​​
Total Rows:1 Elapsed:0.349141
Out[]//TableForm=
plot
We do the same exercise for deaths growth rate vs vaccination rate
tbfinal//​​addColumnsSQL[#,Association["deathsgrowratepercent"(growrate[tsdeaths,30]//Quiet),​​ "vaccinationpercent"tsvaccperc["LastValue"]​​ ]]&//​​summarySQL[#,Association["plot"With[dataplot=MapThread[Callout,{Transpose[{vaccinationpercent,deathsgrowratepercent}],country}],​​ corr=Correlation[vaccinationpercent,deathsgrowratepercent],​​ dataplot//​​ ListPlot[#,​​ PlotLabelColumn[{"Vaccination vs Cases Growth",Row[{"Correlation: ",ScientificForm[corr,3]}]}],​​ FrameLabel{"Vaccination percent","Recent Deaths Growth percent"},​​ FrameTrue,​​ ImageSizeMedium]&]]]&//​​showTableSQL
Total Rows:1 Elapsed:0.299096
Out[]//TableForm=
plot
Voila .. Week correlation for given measures.
Any suggestions or different measures to evaluate the efficacy of vaccination campaigns are welcomed ...
​
​