Someone’s life expectancy is the expected number of years he or she will remain alive. It is an average that is computed for groups of people of varying specificity, such as the entire global population, newborns in Ghana, or 15-year-old women in Europe. It is a statistic used in many debates, especially those concerning a country’s (under)development. The statistic is usually presented with great confidence: hardly anyone doubts its accuracy or reliability. This is apparent in thousands of articles, but let’s pick one:
Or even more specific:
I find such statements truly remarkable, since it is not at all obvious that we can compute life expectancy statistics with great confidence and accuracy. A great deal of uncertainty enters the calculations in several ways, of which I would like to discuss a few: picking explanatory variables, large prediction horizons, and a lack of backtesting.
Picking explanatory variables
If one reads that, for instance, “men’s life expectancy increased by 4.7 years,” then this must mean that at least one underlying fundamental, or explanatory variable, has changed for men compared to the previous calculation. Picking these explanatory variables is not at all a clear business.
Looking at a few models, such as the Lee-Carter model, the Renshaw-Haberman model, and the CBD two-factor model, one may observe that researchers don’t pick such factors – which may be a good thing. Such models take historic matrices of mortality rates, estimate the trends in these matrices, and then generate matrices of future mortality rates. Once these have been computed, one can calculate life expectancies by multiplying these future matrices with a current matrix of population figures.
In layman’s terms, they look at the (recent) history, find out what is happening, and assume that “this” – whatever it is, was, or will be – continues to happen in the (near) future. What has surprised many is that the trends that were found seem to be linear: over time, people’s life expectancies increased almost perfectly linearly.
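To make the "look at history, extrapolate, then compute" procedure concrete, here is a minimal sketch. It is not the Lee-Carter model: as a much simpler stand-in, I assume each age's log mortality rate follows its own straight line in time. The function names and the toy data are hypothetical and invented purely for illustration.

```python
import math

def fit_line(ys):
    """Least-squares fit of ys against 0, 1, 2, ...; returns (intercept, slope)."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope

def project_rates(history, steps_ahead):
    """history[age] holds that age's yearly death rates, oldest year first.
    Each age's log-rate is extrapolated along its fitted straight line."""
    projected = []
    for rates in history:
        intercept, slope = fit_line([math.log(r) for r in rates])
        t = len(rates) - 1 + steps_ahead
        projected.append(math.exp(intercept + slope * t))
    return projected

def life_expectancy(rates):
    """Expected remaining years at age 0. Assumptions: each rate is treated as a
    yearly death probability, deaths happen mid-year on average, and nobody
    lives beyond the last age in the list."""
    alive = 1.0
    years = 0.0
    for q in rates:
        q = min(q, 1.0)
        years += alive * (1 - q / 2)  # survivors add a full year, the dying half a year
        alive *= 1 - q
    return years

def toy_history(ages=100, years=30):
    """Invented data: rates rise exponentially with age and fall 1% per year."""
    return [[0.0001 * math.exp(0.09 * age) * 0.99 ** t for t in range(years)]
            for age in range(ages)]

if __name__ == "__main__":
    history = toy_history()
    today = life_expectancy([rates[-1] for rates in history])
    in_50_years = life_expectancy(project_rates(history, 50))
    print(f"latest observed year: {today:.1f}, projected 50 years on: {in_50_years:.1f}")
```

Everything the projection "knows" comes from the fitted slopes. If the past trend stops holding, the projected life expectancy is wrong, and nothing in the calculation will signal it.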
An obvious objection is that future trends may be different from past ones, and need not be linear at all. Researchers have explained the variations in life expectancy figures by a host of factors, including cleaner drinking water, improved sanitation, improved nutrition (particularly during infancy and childhood), vaccinations, quicker and improved access to high-quality trauma care, improved drugs, more extensive H.I.V. testing and treatment, better care for newborns, etcetera.
Such trends may indeed be or have been quite linear, but this may be a case of spurious correlation (if you have a linearly increasing or decreasing time series and another increasing or decreasing one, regardless of what they are about, you will find a significant correlation – but this need not mean that causality is at work). Assume that somewhere around 2020 a cure is found for cancer. Or for H.I.V. Or that it becomes possible to “3D-print organs.” Or that a first nuclear war erupts. Such developments would have an immense impact on current life expectancies – yet they cannot be predicted by looking at historical patterns, and thus they cannot be incorporated in today’s calculations.
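The spurious-correlation point can be illustrated with a small simulation. The sketch below generates two series that share nothing except that both trend upwards, then compares the correlation of their levels with the correlation of their period-to-period changes. All numbers are made up, and the helper names are my own.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def trending_series(start, slope, noise_sd, n, rng):
    """A straight line plus independent Gaussian noise."""
    return [start + slope * t + rng.gauss(0, noise_sd) for t in range(n)]

def differences(xs):
    """Change from each observation to the next, which removes the linear trend."""
    return [b - a for a, b in zip(xs, xs[1:])]

if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is repeatable
    # Two invented, causally unrelated series that both happen to rise over time.
    series_a = trending_series(70.0, 0.02, 0.5, 400, rng)
    series_b = trending_series(100.0, 0.3, 5.0, 400, rng)
    print("correlation of levels: ", round(pearson(series_a, series_b), 3))
    print("correlation of changes:", round(pearson(differences(series_a),
                                                   differences(series_b)), 3))
```

The levels correlate strongly simply because both series move with time. Once the trend is removed by differencing, little or nothing of that correlation should remain.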
It is equally clear that on the one hand calculating the likelihood and impact of such events should ideally be part of determining life expectancies, but that on the other hand doing so is and will remain impossible.
Large prediction horizons
Difficulties are aggravated because the prediction horizons are huge. People already reach ages of over 100 years, meaning that future mortality matrices should be calculated at least 100 years into the future. Or, if some other method is used, judgements about future developments in the factors that explain and affect life expectancy figures should be made for at least 100 years into the future.
Banks have been criticised for being unable to correctly predict the number of defaulting clients in the next year. This forecasting task is much simpler and more straightforward, and involves a far shorter forecasting horizon (1 year versus, say, 100 years). Such models broke down when something systemic happened (such as the subprime mortgage crisis) that the model could not predict, because it had not featured in the historic data that was used to estimate the model. This is very relevant for life expectancy models, since developments such as finding a cure for cancer or H.I.V., or the occurrence of a nuclear war, have not yet featured in historic data.
The point? A simple model predicting the number of defaults in a bank’s client portfolio was completely unable to predict the not so “unpredictable” occurrence of a financial crisis, yet we totally and unreservedly trust models that should somehow take into account potential events that have never happened before, over time periods at least 100 times as long.
Large prediction horizons (2)
The critiques just mentioned concern the models as such – that is, they refer to the uncertainties that are implied by the choice of the model, because of developments that the model either fails to capture entirely or is unable to predict accurately. But even if we, for a moment, ignore these objections, and focus on developments that the models should be able to predict, we see that large prediction horizons, even within the model, imply that uncertainty is extremely large.
Prediction or forecasting is conventionally done using so-called ARIMA modelling, which looks at historic data to find patterns, and then extrapolates these patterns. The issue is that uncertainty multiplies (or propagates). That is, a certain prediction for 2020 may depend on a prediction for 2019, which may in turn depend on a prediction for 2018, etcetera. But if the 2014 prediction is a bit uncertain, the 2015 one may be a bit more uncertain, the 2020 one rather uncertain, the 2050 one an educated guess, and the 2080 one a shot in the dark.
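How quickly uncertainty accumulates can be seen in the simplest member of the ARIMA family, a random walk with drift, where each year's value is last year's value plus an average step plus noise. This is an assumption chosen for illustration, not the model behind any published life expectancy forecast, and the input series is invented. Under that assumption the forecast standard deviation grows with the square root of the horizon.

```python
import math
import statistics

def random_walk_forecast(history, horizon, z=1.96):
    """Point forecast and approximate 95% interval for a random walk with drift.
    Simplification: the extra uncertainty from estimating the drift is ignored,
    so real intervals would be wider still."""
    steps = [b - a for a, b in zip(history, history[1:])]
    drift = statistics.mean(steps)    # average yearly change
    sigma = statistics.stdev(steps)   # spread of the yearly changes
    point = history[-1] + drift * horizon
    half_width = z * sigma * math.sqrt(horizon)  # errors add up year after year
    return point - half_width, point, point + half_width

if __name__ == "__main__":
    # Hypothetical yearly life expectancy figures, made up for this example.
    history = [80.0, 80.3, 80.4, 80.8, 80.9, 81.3, 81.4, 81.8, 82.0, 82.1]
    for h in (1, 10, 30, 70):
        low, mid, high = random_walk_forecast(history, h)
        print(f"{h:>2} years ahead: {mid:.1f} (95% interval width {high - low:.1f})")
```

Quadrupling the horizon doubles the width of the interval, and that is before allowing for any of the events that the historic data never contained.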
This is best understood when considering one of the actual outputs of actual life expectancy predictions. I found the picture below, but much more volatile versions can be found:
It can be seen that at the time of writing (2006), the life expectancy of Finnish women was estimated to be somewhere between 81.3 and 82.8 years. In 2040, the 95% uncertainty (hence not the total uncertainty) would be captured by an interval of about 5 years. Imagine what this gap would be for, say, 2080.
I don’t wish to argue that we should not calculate life expectancies anymore, but I would urge journalists and policy-makers not to take life expectancy predictions for granted, and not to take them as some kind of perfect indicator of a country’s development.