With the Covid-19 pandemic, we are living in unprecedented and highly uncertain times. Scientists know the identity of the silent killer, SARS-CoV-2, very well, but much else about the pandemic is still maddeningly unclear. What are the health factors that make some people seriously ill and others not? Are the models too optimistic or too pessimistic? How long does immunity last? How many people have actually been infected, and are the reported numbers accurate enough for us to infer anything useful about the epidemic’s true nature? Because we have never faced an event like this before, the uncertainty makes the situation even more difficult.
In this context, within the Emergent Alliance, the Regional Risk Index Team set out to apply machine learning and statistical tools to better understand the dynamics of the disease and to translate this knowledge into a predictive risk assessment tool, so that we can build an improved understanding of systemic risk throughout the recovery pathway.
One major difficulty in today’s landscape is the lack of information about disease spread and risk level at the local and subnational level. The macro dynamics are certainly important because they show the global trend, but they are only one part of the story: a growing number of areas are reporting more granular transmission patterns, which could represent a significant turn in the battle against Covid-19. By leveraging large-scale, heterogeneous data sources, we propose a system that predicts the health risk level at regional granularity across different EU geographies.
We know that different local areas and health systems experience the crisis differently. Not only does the number of cases differ substantially from place to place, but some areas are also less equipped to deal with the crisis. Because the risk level is driven by multiple complex factors, our Risk Index should convey as much of this inherent complexity as possible.
Our Regional Risk Index is a composite index that takes into consideration the predicted infections, in addition to other features representative of the systemic vulnerabilities that existed in the area of interest before the Covid-19 surge. This way, we can account for both Covid-19 specificities and regional characteristics that either support the recovery or do not provide enough resilience.
Risk Index Formula and Rationale
The current version of the Risk Index is calculated using a Generalized Linear Model (GLM). With this approach, the risk is designed to represent the probability of an adverse event and is computed as a weighted linear combination of the input factors.
The weighted sum is then passed through a sigmoid function (the inverse of the logit link) to ensure the final outputs are between 0 and 1:
$\text{Risk Index} = f\left(\sum_i W_i X_i\right)$, where $f$ is the sigmoid function, $W_i$ is the weight of input factor $i$, and $X_i$ is its value.
The weights have been chosen so that the prevalence of Covid-19 in the area is amplified by the population density, pre-existing health issues, the proportion of vulnerable people, and the local risk topology, while being counterbalanced by the country-level Stringency Index.
The Risk Index is a relative measure of how critical a region is at a given point in time: if the area is predicted to experience an increase in Covid-19 cases and, in addition, has poor structural resilience, its Risk Index will be higher than that of another region where either the reported cases are stabilizing or the existing infrastructure is more robust.
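To make the formula concrete, here is a minimal sketch of a sigmoid-of-weighted-sum index. The weight values, the feature names, and the assumption that every input is scaled to the range 0 to 1 are all hypothetical and chosen only for illustration; they are not the values used in the actual Risk Index. The only properties taken from the description above are the signs: factors that amplify risk get positive weights, and the Stringency Index gets a negative one.

```python
import math
from typing import Mapping

# Hypothetical weights for illustration only (not the team's actual values).
# Positive weights amplify risk; the negative weight on the stringency index
# lets stricter measures counterbalance the other factors.
WEIGHTS = {
    "predicted_infections": 1.5,
    "population_density": 0.8,
    "pre_existing_health_issues": 0.6,
    "vulnerable_population": 0.7,
    "local_risk_topology": 0.5,
    "stringency_index": -1.0,
}


def sigmoid(z: float) -> float:
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def risk_index(features: Mapping[str, float]) -> float:
    """Sigmoid of the weighted sum of the input factors.

    Assumes each feature has already been scaled to [0, 1].
    """
    weighted_sum = sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return sigmoid(weighted_sum)


# Two made-up regions: A has rising cases and weaker resilience,
# B has fewer predicted cases and stricter measures in place.
region_a = {
    "predicted_infections": 0.9,
    "population_density": 0.7,
    "pre_existing_health_issues": 0.6,
    "vulnerable_population": 0.5,
    "local_risk_topology": 0.6,
    "stringency_index": 0.3,
}
region_b = {
    "predicted_infections": 0.3,
    "population_density": 0.7,
    "pre_existing_health_issues": 0.2,
    "vulnerable_population": 0.3,
    "local_risk_topology": 0.2,
    "stringency_index": 0.6,
}

if __name__ == "__main__":
    print(f"Region A: {risk_index(region_a):.3f}")
    print(f"Region B: {risk_index(region_b):.3f}")
```

With these made-up inputs, region A scores higher than region B, which matches the relative reading of the index: the scores are meaningful mainly in comparison with each other, not as absolute probabilities.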
All data sources are openly available. Some initial pre-processing and data cleaning step