Introduction

The COVID-19 pandemic has been attributed to 6.91 million mortalities globally, up to the 31st March 20231. This has had far reaching implications for public health policy worldwide and led to unprecedented interventions. The clinical severity observed in response to an infection with SARS-CoV-2 has evolved over time as a consequence of emerging variants of concern, vaccination campaigns, high infection attack rates, and changes to the clinical management of patients.

Emerging variants of concern have been the impetus behind resurgent waves of SARS-CoV-2 incidence and changes to the severity profile of infections. The Alpha variant was first sequenced in September 2020 and became the dominant variant in the UK. Relative to wild type, Alpha2 was estimated to have a 62% (Hazard Ratio (HR) – 1.62 (95% CI: 1.48, 1.78)) and 73% (HR – 1.73 (95% CI: 1.41, 2.13)) increased risk of hospitalisation and mortality, respectively. The discernible replacement of Alpha by Delta was detected in early 20213 and it became the dominant variant by June4. Delta was found to have a substantially increased risk of hospitalisation relative to Alpha with a HR of 1.85 (95% CI: 1.39, 2.47)5. Delta was replaced by the Omicron BA.1 in December 20216 with increased vaccine escape noted as a significant factor. Omicron BA.1 was estimated to have an almost threefold reduction in the risk of hospital admission relative to Delta7. In March 2022, Omicron BA.2 replaced Omicron BA.18, and there was found to be limited evidence of a difference in the severity of infection7. Omicron BA.5 replaced Omicron BA.2, in June 2022, as the dominant variant in the UK9. There was evidence that Omicron BA.510 infections may be associated with an increased risk of hospitalisation relative to Omicron BA.2. Following a summer epidemic wave of SARS-CoV-2 infections in 2022, Omicron further diversified. Several of these lineages convergently acquired mutations on the receptor binding domain that are associated with immune evasion11. These lineages include BF.7 (a BA.5.2 derivative), BA.5.3 sub lineages BQ.1 and BQ.1.1, as well as lineages derived from BA.2.75. Notably, a BA.2 recombinant carrying many of these mutations (XBB) drove a wave of incidence in Singapore and later became dominant in the UK12.

The vaccination campaign in the United Kingdom began on the 8th December 202013. The campaign was implemented in phases with the groups prioritised by clinical risk. Phase 1 included 9 high priority groups and began with care home residents, individuals over the age of 70, the clinically extremely vulnerable, frontline healthcare staff and social care workers14. The remaining phase 1 groups included those aged 50 to 69 years old. Subsequently phase 215 was implemented in April 2021 and began a further age stratified approach beginning with those aged 40–49 and concluding with the 18-29 age group. Phase 3, 4, and 5 focused on booster campaigns, the clinically vulnerable, and children over the age of 12.

The primary vaccinations administered in the UK were the AstraZeneca vaccine, Vaxzevria, and the Pfizer vaccine, Comirnaty. The impact of vaccination campaigns on the disease severity in the population has been influenced by the timing of the campaign and the variant specific response. An early study found that Vaxzevria had a vaccine efficacy of 66.7% (95% Confidence Interval (CI): 57.4, 74.0), 14 days after the second dose16 for wild type. The efficacy of the vaccine was estimated to be 81.3% (95% CI: 60.3, 91.2) for individuals that had a longer prime-boost interval. The vaccine efficacy of 2 doses of Vaxzevria against symptomatic infection was estimated to be 70.4% (95% CI: 43.6, 84.5)17 for Alpha. It was later estimated that the vaccine effectiveness for Delta was 67.0% (95% CI: 61.3, 71.8)18 and limited protection against symptomatic disease was found for Omicron BA.119. There was no significant difference found in the vaccine effectiveness for Omicron BA.1 and Omicron BA.220 and there was limited evidence for the effectiveness of two doses of Vaxzevria for Omicron BA.4/BA.5. Vaxzevria was the primary vaccine administered to those aged over 40 in the UK after concerns of haemostatic complications in younger ages. Comirnaty was administered at the start of the vaccination campaign and subsequently it was primarily administered to those aged under 40. The vaccine efficacy for two doses of Comirnaty was estimated to be 96.2% (95% CI: 93.3, 98.1)21 early in the pandemic. Later studies estimated the vaccine effectiveness to be 89.5% (95% CI: 85.9, 92.3) for Alpha22 and 88.0% (95% CI: 85.3, 90.1) for Delta18. Evidence of the two dose effectiveness of Comirnaty for Omicron subvariants23 was limited with wide uncertainty24. For the 3rd and 4th booster vaccinations, the Joint Committee on Vaccination and Immunisation stated25 a preference for Comirnaty or a half dose of Spikevax and where mRNA vaccines could not be used then individuals were offered Vaxzevria.

On the 15th August 2022, the Medicines and Healthcare products Regulatory Agency (MHRA) approved the use of a bivalent COVID-19 vaccine made by Moderna, which targeted both the 2020 SARS-CoV-2 viral strain and Omicron BA.126. The Pfizer/BioNTech bivalent vaccine was approved by the MHRA, less than a month later, on the 3rd September 202227, which also targeted the 2020 SARS-CoV-2 viral strain and Omicron BA.1. A second bivalent vaccine from Pfizer/BioNTech was approved by the MHRA in November 2022, which targeted Omicron BA.4/BA.5 and the 2020 viral strain. A study published near the end of 202328 found that the vaccine effectiveness of bivalent BA.1 boosters against hospitalisation peaked at 53.0% (95% CI: 47.9, 57.5) 2 to 4 weeks after a dose was administered and at 10 weeks this had reduced to 35.9% (95% CI: 31.4, 40.1). In September 2023 the MHRA approved Pfizer/BioNTech29 and Moderna’s30 bivalent vaccine to target Omicron XBB.1.5; with ongoing work to understand this vaccine’s effectiveness against emerging variants. Analysis of vaccine effectiveness through population-based studies in the UK has been impacted by the cessation of free testing in the UK. This limits the understanding of variant prevalence and impacts the means to adjust for past infection in statistical analyses, with limited information on the infection ascertainment rates.

Improvements in the medical management of patients infected with COVID-19 have reduced the hospitalisation and fatality risk for those infected with the virus. Research found the use of non-invasive continuous positive airways pressure and awake prone positioning to be associated with improved patient outcomes31. To reduce the risk of severe disease in the clinically extremely vulnerable, antiviral medicines and neutralising monoclonal antibodies have been made available in the community and within secondary care32. These have included nirmatrelvir and ritonavir (Paxlovid), sotrovimab (Xevudy), remdesivir (Veklury), and molnupiravir (Lagevrio)33.

This paper describes changes over time to the real-time infection hospitalisation risk (IHR) and infection fatality risk (IFR) using the Office for National Statistics Coronavirus Infection Survey (ONS CIS) and Real-time Assessment of Community Transmission (REACT) prevalence survey. We assess the impact by region and age over the length of the pandemic.

Results

Parameter estimation

The estimated PCR positivity length for every dominant variant across the epidemic in the UK can be seen in Supplementary Fig. 1 and Supplementary Table 1. The temporal changes in the time from symptom onset to hospitalisation and death by age groups can be seen in Supplementary Figs. 2 and 3, respectively. The PCR test sensitivity modelling for every dominant variant and each age group can be seen in Supplementary Fig. 4. The modelling estimates for the time from symptom onset to a first positive test by age and region can be seen in Supplementary Figs. 5–9.

Real-time infection hospitalisation risk – national

The IHR in England peaked at 3.39% (95% Credible Intervals (CrI): 2.79, 3.97) in January 2021, during the period when the Alpha variant was dominant and most of the population were unvaccinated (Fig. 1). After the rollout of the vaccination programme, the IHR started declining rapidly. Near the end of the Delta period, in November 2021, the IHR had reduced to 0.58% (95% CrI: 0.50, 0.67) and the lowest IHR was estimated to be 0.32% (95% CrI: 0.27, 0.39) in December 2022. Since this time, the IHR has fluctuated and it was estimated to be 0.47% (95% CrI: 0.39, 0.59) by February 2023. Overall, the IHR has declined by 86.03% (80.86, 89.35) since January 2021. The REACT and ONS CIS prevalence estimates and hospital admissions attributed to COVID-19 can be seen in Supplementary Figs. 10 and 11, respectively.
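As a rough arithmetic check, the relative decline can be recomputed from the two posterior medians quoted above. The sketch below uses illustrative helper names that are not taken from the study code. It gives a decline of about 86.1%, close to but not identical to the reported 86.03%. One possible explanation, stated here as an assumption, is that the reported figure summarises the decline across posterior draws rather than taking a ratio of two medians.

```python
def percent_decline(peak: float, latest: float) -> float:
    """Relative decline from `peak` to `latest`, expressed as a percentage."""
    return 100.0 * (peak - latest) / peak


def per_thousand(risk_percent: float) -> float:
    """Convert a risk given in percent into expected events per 1,000 infections."""
    return risk_percent * 10.0


# Posterior medians quoted in the text (percent).
ihr_peak_jan_2021 = 3.39
ihr_feb_2023 = 0.47

decline = percent_decline(ihr_peak_jan_2021, ihr_feb_2023)
print(f"Decline from ratio of medians: {decline:.2f}%")  # about 86.14%
print(f"Hospitalisations per 1,000 infections: {per_thousand(ihr_feb_2023):.1f}")
```

The same conversion explains why an IHR of 0.47% corresponds to roughly 4.7 hospitalisations per 1,000 infections.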

Fig. 1: The Infection Hospitalisation Risk for England.

A The posterior estimates of the median infection hospitalisation risk for England, based on the combined REACT and ONS sampling, with 95% credible intervals. B Posterior estimates of the median infection hospitalisation risk for England, based on REACT sampling, with 95% credible intervals. C Posterior estimates of the median infection hospitalisation risk for England, based on ONS sampling, with 95% credible intervals. Not all estimates derived from the ONS CIS study have been plotted. The data for the figure are provided as a Source Data file.

Real-time infection hospitalisation risk – age groups

The IHR peaked in January and February 2021 for the age groups over 44 (Table 1 and Fig. 2). The IHR peaked later for those aged 6 to 24 (March 2021) and 25 to 44 (April 2021). The IHR declined in every age group from May 2021 and reached the lowest estimated value in April 2022 for the age groups over 54 and in December 2022 for the age groups under 55. Since this time, we have seen fluctuations in the estimated IHR for every age group. Since the peak in early 2021 until February 2023 the IHR has decreased by 92.51% (88.84, 94.52) for the ≥ 75 age group; 92.25% (88.04, 94.63) for the 65 to 74 age group; 91.90% (87.86, 94.37) for the 55 to 64 age group; 91.51% (87.90, 94.00) for the 45 to 54 age group; 92.72% (89.50, 94.89) for the 25 to 44 age group; and 88.30% (80.48, 92.25) for the 6 to 24 age group. The REACT and ONS CIS prevalence estimates and hospital admissions attributed to COVID-19 for each age group can be seen in Supplementary Figs. 12 and 13, respectively. The full results for each prevalence study and age group can be seen in Supplementary Figs. 14–19.

Table 1 Key Estimates of the Infection Hospitalisation Risk by Age
Fig. 2: The Infection Hospitalisation Risk by Age.

The posterior estimates of the median infection hospitalisation risk by age group, based on the combined REACT and ONS sampling, with 95% credible intervals. The data for the figure are provided as a Source Data file.

Real-time infection hospitalisation risk – regions

Following the national estimates, we estimated the highest IHR in all NHS regions was in January and February 2021 (Fig. 3 and Table 2). After early 2021, the IHR rapidly declined in all NHS regions. The IHR reached the lowest estimated value in March and April 2022 in the East of England, London, South East, and the North East and Yorkshire when Omicron BA.2 was dominant. The IHR continued declining until October 2022 in the North West and until December 2022 in the Midlands and South West. All regions have since seen a fluctuating pattern in the IHR. The REACT and ONS CIS prevalence estimates and hospital admissions attributed to COVID-19 for each NHS region can be seen in Supplementary Figs. 20 and 21, respectively. The full results for each study and region can be seen in Supplementary Figs. 22 to 28.

Fig. 3: The Infection Hospitalisation Risk for the Regions of England.

The posterior estimates of the median infection hospitalisation risk for the regions of England, based on the combined REACT and ONS sampling, with 95% credible intervals. The data for the figure are provided as a Source Data file.

Table 2 Key Estimates of the Infection Hospitalisation Risk for the Regions of England

Real-time infection fatality risk – national

The infection fatality risk in England peaked at 0.97% (95% CrI: 0.62, 1.36) in January 2021, after which time the IFR began to rapidly decline (Fig. 4). In November 2021, at the end of the Delta period, the IFR had reduced to 0.11% (95% CrI: 0.08, 0.15). The IFR continued to decrease through the Omicron BA.1 and Omicron BA.2 period reaching 0.06% (95% CrI: 0.04, 0.08) in April 2022. Since this time, we have observed fluctuations in the IFR and at the end of February 2023 it was estimated to be 0.10% (95% CrI: 0.07, 0.16). Since the peak in January 2021 the IFR had declined, overall, by 89.67% (80.18, 93.93) up to February 2023. The national REACT and ONS CIS prevalence estimates and deaths attributed to COVID-19 can be seen in Supplementary Figs. 10 and 29.

Fig. 4: The Infection Fatality Risk for England.

A The posterior estimates of the median infection fatality risk for England, based on the combined REACT and ONS sampling, with 95% credible intervals. B Posterior estimates of the median infection fatality risk for England, based on REACT sampling, with 95% credible intervals. C Posterior estimates of the median infection fatality risk for England, based on ONS sampling, with 95% credible intervals. Not all estimates derived from the ONS CIS study have been plotted. The data for the figure are provided as a Source Data file.

Real-time infection fatality risk – age groups

In every age group we estimated the highest IFRs to be in January and February 2021 (Table 3 and Fig. 5). For most of the age groups the IFR reached the lowest estimated value in March and April 2022 with the exception of the 6 to 24 age group which reached its lowest estimated value in February 2023. From early 2021 until February 2023 we have seen a decline in the IFR of 95.95% (91.95, 97.70) for the ≥ 75 age group; 95.19% (91.00, 97.18) for the 65 to 74 age group; 94.19% (88.93, 96.71) for the 55 to 64 age group; 92.92% (87.74, 95.72) for the 45 to 54 age group; 90.71% (82.53, 95.05) for the 25 to 44 age group; and 92.74% (78.17, 98.35) for the 6 to 24 age group. The REACT and ONS CIS prevalence estimates and deaths attributed to COVID-19 for each age group can be seen in Supplementary Figs. 12 and 30, respectively. The full results for each study and age group can be seen in Supplementary Figs. 31–36.

Table 3 Key Estimates of the Infection Fatality Risk by Age
Fig. 5: The Infection Fatality Risk by Age.

The posterior estimates of the median infection fatality risk by age group, based on the combined REACT and ONS sampling, with 95% credible intervals. The data for the figure are provided as a Source Data file.

Real-time infection fatality risk – regions

Similar to the trends seen in the IHR, we found the highest estimated IFR for most regions to be in January 2021, with the exception of London and the North East, which peaked in November 2020 (Table 4 and Fig. 6). The IFR reached the lowest estimated value in March and April 2022 for every English region except the South West, South East, and East of England (estimated to be in July 2022). We have subsequently observed fluctuations in the estimated IFR for every region. The REACT and ONS CIS prevalence estimates and deaths attributed to COVID-19 for each region can be seen in Supplementary Figs. 20 and 37, respectively. The full results for each study and region can be seen in Supplementary Figs. 38 to 46. The regional age composition and index of multiple deprivation scores can be seen in Supplementary Fig. 47.

Table 4 Key Estimates of the Infection Fatality Risk for the Regions of England
Fig. 6: The Infection Fatality Risk for the Regions of England.

The posterior estimates of the median infection fatality risk for the regions of England, based on the combined REACT and ONS sampling, with 95% credible intervals. The data for the figure are provided as a Source Data file.

Discussion

Over the course of the pandemic in England, the severity from infection of SARS-CoV-2 has substantially decreased. Changes to the IHR and IFR have been driven by a combination of vaccination, immunity from infection, patient management, and the demographic distribution of infections. We observe that since the January 2021 peak until February 2023, there has been a decline of 86.03% (80.86, 89.35) and 89.67% (80.18, 93.93) in the IHR and IFR, respectively. The early decline, since January 2021, was likely a consequence of the rollout of the vaccination programme, which reached the oldest and most vulnerable individuals in December 2020 and January 2021. Consequently, we observe a later peak in March and April 2021 in the IHR for the age groups under 45. We observed considerable regional heterogeneity at the start of the pandemic, which has substantially reduced post vaccination and following high infection attack rates in the population. Nationally, the IHR and IFR continued to decline until December and April 2022, respectively, which followed Autumn and Winter booster campaigns in England. However, the period following early 2022 has been characterised by an undulatory pattern in the IFR and IHR in response to the timing of vaccination campaigns, resurgent epidemic waves, and emerging variants. We estimated by the end of the study that 4.73 (3.85, 5.93) individuals in 1,000 that are infected with SARS-CoV-2 will be hospitalised and that 1.00 (0.67, 1.56) individual in 1,000 that are infected will die.

Early point estimates of the IFR in 2020, calculated from antibody surveys, ranged from 1.15% (95% Prediction Interval Range (PI): 0.78%, 1.79%) in high income countries to 0.23% (95% PI: 0.14%, 0.42%) in low income countries34. Further IFR estimates retrospectively calculated, of the largely pre-vaccination period in the UK, have ranged from 1.57% (95% Uncertainty Interval (UI): 1.22, 2.47) in April 2020 to 1.20% (95% UI: 0.88%, 1.73%) in January 202135. The study period for this paper began on the 8th November 2020 and therefore does not cover the early pandemic; however, we did not find a reduction in the IFR until after January 2021. The wide uncertainty intervals of these earlier estimates overlap with the credible intervals of this study’s estimates in January 2021. Some of these early serological studies were not adequately powered, with regard to sample size, and drew on existing surveys34,35,36,37 that may not be representative of the general population.

To calculate incidence and the time to a clinical event we used temporally variable parameters. The time from symptom onset date (used here as a proxy for infection date) to hospitalisation and death evolves in response to epidemic phases38, changes to clinical management, prior immunity, and the pathogenesis of novel variants39. The length of the infectious period of a randomly sampled cohort changes in response to epidemic phases. PCR positivity was found to vary across the variants that became dominant in the UK. We found the Alpha variant to have the longest PCR positivity and a reduction in length was estimated for the Omicron variants.

The criteria for a hospital admission or mortality attributed to COVID-19 can be multifaceted, obfuscated by comorbidities, clinical policy, and extrinsic factors including hospital pressure. The absolute values of the IHR and IFR estimates are sensitive to the criteria used to define a hospitalisation or mortality from COVID-19. The definition commonly used in the pandemic within the UK has been 28-day deaths40, which was thought likely to underestimate true deaths from COVID-1941. However, the 60-day deaths definition could overestimate COVID-19 deaths in some subgroups, particularly in older individuals who have higher baseline mortality rates. Counts of death-certificate-confirmed COVID-19 deaths were affected by changes to death reporting practices across the pandemic40. Determining the cause of a hospital admission from surveillance data requires assumptions without further clinical information. Hospital surveillance data can include some nosocomial patients as well as patients admitted to hospital for other illnesses who tested positive for COVID-19 on admission. Although the absolute values of the IHR and IFR are sensitive to these criteria, the temporal trends are robust, provided the definitions remain consistent over time.

The method used in this paper calculates the proportion of infections that led to deaths and hospital admissions attributed to COVID-19. Since we are interested in the IFR/IHR across grouped time-periods, or rounds, we made an approximation to the method that relies on the assumption that the risk is constant within each round. This assumption simplifies the method, without substantially affecting the within round estimates. This method is slightly affected by the epidemic phase severity bias42, whereby severity is overestimated during phases of growth and underestimated during phases of decline. To adjust for this bias, we would need to adjust for different incubation period distributions conditional on patient outcomes. That is, we would need to construct two prevalence time-series, prevalence among individuals that will get admitted to hospital or die and prevalence among individuals that will not. These two time-series would need to be deconvolved using different incubation periods, corresponding to the outcome, to estimate the bias-corrected infection incidence time-series. To obtain these two mutually exclusive prevalence time-series, data linking prevalence to patient outcome and exposure date is needed. The REACT survey data, available to this study, did not include the personally identifiable information needed to link to patient outcomes and this was not possible with ONS CIS due to the conditions of the participant consent agreement.

In this paper we have assessed temporal changes to the real-time infection hospitalisation and fatality risk from COVID-19. There has been considerable regional heterogeneity, which is likely a consequence of differing infection attack rates, distinct age compositions, and the relative differences in deprivation across England. At the end of February 2023, the IHR and IFR in England are estimated to be 0.47% (95% CrI: 0.39%, 0.59%) and 0.10% (95% CrI: 0.067%, 0.16%), respectively.

Methods

This section describes the methods used to calculate the real-time IFR and IHR from the REACT and ONS CIS studies. We discuss the calculations for the viral parameter history, which includes calculating the length of PCR positivity by variant, PCR test sensitivity by variant, and the time delay modelling from symptom onset to a first positive test, hospital admission and death. For each study (Supplementary Fig. 48) we describe the modelling to calculate prevalence, incidence, and infection severity risk.

Epidemiological data

There were two large surveys in the United Kingdom that provided real-time estimates of SARS-CoV-2 positivity: the ONS Coronavirus Infection Survey43 and the REACT 1 antigen survey coordinated by Imperial College London in conjunction with Ipsos MORI44. ONS data was sourced through the Secure Research Service (SRS)45 where we extracted demographic and regional breakdowns of the Reverse Transcription Polymerase Chain Reaction (RT-PCR) test results. Non-identifiable aggregated REACT study data was provided through a data agreement between UKHSA and Imperial College London.

REACT 1 was a repeat cross-sectional study that estimated SARS-CoV-2 prevalence in England46 from May 2020 until March 2022. The study aimed to randomly sample between 95,000 and 175,000 individuals over the age of 5 for each survey round47, updated from the 100,000 to 150,000 individuals given in the original study protocol46. The study sent out recruitment letters on a 4–6-week basis to a randomised sample from the National Health Service patient register that aimed to be nationally representative. For children under the age of 18 the recruitment letters were sent to a parent or guardian. Individuals that chose to participate were sent throat and nasal swabs, which were returned for RT-PCR testing. The participants were then invited to complete an online questionnaire, which included demographics, infection history, and behavioural topics. Overall, the REACT 1 and REACT 2 studies found a response rate of 23.4% across the study period48. The REACT reports and study protocol have been published through Imperial College London and Wellcome Open Research46,49. Supplementary Figs. 49–51 describe the sample sizes for each age group and region over time.

The ONS CIS study was produced by the ONS in collaboration with the Wellcome Trust, University of Oxford, IQVIA, Lighthouse Laboratories, Joint Biosecurity Centre, UKHSA and the University of Manchester50. The study began in April 2020 as a pilot and invited 20,000 households from the ongoing Labour Force Survey51 and those that had agreed previously to participate in the Opinions and Lifestyle Survey52. Then in August 2020, the survey expanded to invite a randomised household sample from AddressBase53. The extended household study aimed to achieve around 150,000 swabs a fortnight in England between October 2020 and March 2023. The study requested that the entire household (over the age of 2) take a nose and throat swab, which was sent to the Lighthouse Laboratory for RT-PCR testing. Tests were conducted by home visits from a study worker, and in April 2022 a proportion of study participants were asked to post their samples. At this time the number of swabs required reduced by 25% with an aim to swab 227,300 individuals every 28 days in England. From the 1st August 2022, study collections were entirely remote. The ONS reported50 an attrition rate of 0.62% in December 2020 that fluctuated between a high of 1.37% in July 2021 and a low of 0.32% in December 2021. The ONS study paused at the end of March 2023 with the intention to restart later in the year. Supplementary Figs. 49–51 describe the sample sizes for each age group and region over time.

Mortality data, subset by age and geography, were sourced from the UKHSA COVID-19 death linelist. To limit capturing deaths that were less likely to be linked to a COVID-19 infection we only included deaths that had occurred within 60 days following a positive RT-PCR test. Hospitalisations attributed to an infection with COVID-19 were collected from the NHSE&I situational report data54. Mortality data were available for the 9 English regions55 and hospital data were reported for the 7 NHS English Health regions56.
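The 60-day inclusion rule for deaths can be sketched as a simple date filter. This is an illustration only: the function name, the example records, and the choice to treat both ends of the window as inclusive are assumptions, not details taken from the UKHSA linelist processing.

```python
from datetime import date

MAX_DAYS = 60  # window length stated in the text


def death_within_window(positive_test: date, death: date,
                        max_days: int = MAX_DAYS) -> bool:
    """True if the death falls 0..max_days days after the positive test.

    Inclusive boundaries are an assumption made for this sketch.
    """
    delay_days = (death - positive_test).days
    return 0 <= delay_days <= max_days


# Hypothetical records: (date of positive RT-PCR test, date of death).
records = [
    (date(2021, 1, 4), date(2021, 1, 20)),   # 16 days: included
    (date(2021, 1, 4), date(2021, 3, 5)),    # 60 days: included (boundary)
    (date(2021, 1, 4), date(2021, 4, 1)),    # 87 days: excluded
]

included = [r for r in records if death_within_window(*r)]
print(len(included), "of", len(records), "deaths retained")
```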

Population size data, stratified either by age group or region, were sourced from the ONS at a yearly resolution. This data excludes communal establishments, as these were not sampled during the REACT or ONS surveys. Linear interpolation was used to provide estimates of population sizes for the midpoint of each round for each of the prevalence studies.
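The interpolation step can be illustrated with a short sketch. The reference dates, population values, and round dates below are hypothetical placeholders rather than ONS figures, and the function names are chosen for illustration.

```python
from datetime import date, timedelta


def round_midpoint(start: date, end: date) -> date:
    """Calendar midpoint of a survey round (rounded down to a whole day)."""
    return start + timedelta(days=(end - start).days // 2)


def interpolate_population(d0: date, p0: float, d1: date, p1: float,
                           target: date) -> float:
    """Linearly interpolate the population between two yearly estimates."""
    fraction = (target - d0).days / (d1 - d0).days
    return p0 + fraction * (p1 - p0)


# Hypothetical yearly estimates for one age group.
d0, p0 = date(2021, 6, 30), 56_000_000
d1, p1 = date(2022, 6, 30), 56_500_000

# Hypothetical survey round.
mid = round_midpoint(date(2021, 9, 6), date(2021, 9, 16))
print(mid, round(interpolate_population(d0, p0, d1, p1, mid)))
```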

In order to provide context for the estimated temporal changes in IHR and IFR, we extracted data on sequenced cases from the Second Generation Surveillance System at the UKHSA, appending metadata on the age and residential region for the case. To complement the national, regional and age stratified analyses we calculated the proportion of sequenced cases associated with wild type, Alpha, Delta, and Omicron lineages.

Due to data protection regulations the model methodologies were developed to work across different data platforms including the ONS SRS, the UKHSA Halo system and Public Health England data infrastructure. The study period of this paper is from the 8th November 2020 until March 2023.

Methodology outline

Our methodology for calculating the infection severity risk features two key steps. Firstly, we must take the number of positive and negative tests for each survey round and estimate the number of new infections for that round, referred to as the incidence for that round. This requires us to adjust for several quantities: positivity duration, delay from infection to testing positive, test sensitivity and test specificity. Given an estimate of the round incidence, we then produce an estimate of how many clinical outcomes, such as deaths or hospitalisations, are attributed to a given round. This is achieved by estimating the delay from symptom onset to clinical outcome, and then temporally adjusting clinical outcomes. Furthermore, many of these key parameters are known to vary over time, variant type, and age group. We first detail the models used to estimate these key disease history parameters. Then, we will outline our final methodology for calculating the rate of severe outcomes, first using each prevalence study in isolation, then combining the results of the two studies.
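To make the two steps concrete, the following sketch works through a deliberately simplified point-estimate version. It is not the Bayesian model used in this paper. It swaps in the Rogan–Gladen correction for test sensitivity and specificity and a steady-state approximation (prevalence ≈ daily incidence × mean positivity duration) in place of the delay-adjusted models described below. All function names and input values are hypothetical.

```python
def adjusted_prevalence(positives: int, tests: int,
                        sensitivity: float, specificity: float) -> float:
    """Rogan-Gladen correction of apparent prevalence, clipped to [0, 1]."""
    apparent = positives / tests
    adjusted = (apparent + specificity - 1.0) / (sensitivity + specificity - 1.0)
    return min(max(adjusted, 0.0), 1.0)


def round_incidence(prevalence: float, population: float,
                    mean_positive_days: float, round_days: float) -> float:
    """New infections in a round under a steady-state approximation."""
    daily_infections = prevalence * population / mean_positive_days
    return daily_infections * round_days


def severity_risk(outcomes: float, infections: float) -> float:
    """Proportion of infections leading to the outcome (IHR or IFR)."""
    return outcomes / infections


# Hypothetical inputs for a single survey round.
prev = adjusted_prevalence(positives=150, tests=10_000,
                           sensitivity=0.80, specificity=0.999)
infections = round_incidence(prev, population=1_000_000,
                             mean_positive_days=10.0, round_days=20.0)
# Outcomes are assumed to be already shifted back to the round by the
# onset-to-outcome delay.
print(f"IHR = {100 * severity_risk(150, infections):.2f}%")
```

The sketch ignores uncertainty entirely, whereas the full method propagates posterior uncertainty between models.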

A Bayesian methodology is used throughout, with all statistical models implemented using CmdStanR (version 0.6.1)57 and posterior sampling performed using Hamiltonian Monte Carlo, a Markov chain Monte Carlo (MCMC) method. For each model we ran 4 chains, each with 1000 warmup iterations followed by 1000 sampling iterations. Convergence was assessed using the \(\hat{R}\) statistic58, with convergence declared if \(\hat{R} < 1.01\).
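The convergence check can be illustrated with a simplified split-\(\hat{R}\) calculation. This sketch omits the rank normalisation used in more recent definitions of the statistic and is not the routine provided by Stan. The function name and the simulated chains are assumptions for illustration.

```python
import random
from statistics import mean, variance


def split_rhat(chains: list[list[float]]) -> float:
    """Split R-hat without rank normalisation (a simplified sketch)."""
    halves = []
    for chain in chains:
        half = len(chain) // 2
        halves.append(chain[:half])           # first half of the chain
        halves.append(chain[half:2 * half])   # second half of the chain
    n = len(halves[0])
    within = mean(variance(h) for h in halves)           # W: mean within-chain variance
    between = n * variance([mean(h) for h in halves])    # B: between-chain variance
    var_plus = (n - 1) / n * within + between / n        # pooled variance estimate
    return (var_plus / within) ** 0.5


# Four simulated chains of 1000 post-warmup draws, matching the setup above.
rng = random.Random(2023)
good = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
bad = good[:3] + [[x + 3.0 for x in good[3]]]  # one chain stuck elsewhere

print("well mixed:", split_rhat(good) < 1.01)
print("one shifted chain:", split_rhat(bad) < 1.01)
```

In this example the well-mixed chains pass the 1.01 threshold, while the set containing a shifted chain does not.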

Propagation of uncertainty

The data used in this paper is stored in several different research environments, therefore it is not possible for this method to be implemented in a single Stan Bayesian program. Consequently, to propagate the uncertainty it is necessary to pass the posterior estimates from the models that estimate key disease parameters to later models that estimate the incidence or calculate the rate of severe outcomes. We have not made this explicit in the following methods and instead summarise our approach here.

If a model \(\mathcal{M}_{1}\) depends on a parameter \(\theta\) that is obtained from model \(\mathcal{M}_{0}\), then this is provided to model \(\mathcal{M}_{1}\) via the prior

$$\theta \sim \mathcal{D}\left(\hat{\mu}_{\theta},\hat{\sigma}_{\theta}\right),$$

where \(\mathcal{D}\) is an appropriate parametric distribution. For example, a beta distribution is an appropriate choice to describe the posterior distribution of parameters bounded between 0 and 1. The values of \(\hat{\mu}_{\theta},\hat{\sigma}_{\theta}\) were obtained by using maximum likelihood to fit \(\mathcal{D}\) to the posterior draws of \(\theta\) from model \(\mathcal{M}_{0}\). For the parameters that do need to be provided between models, we found that either normal or beta distributions were able to provide satisfactory fits for the parameters.
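A minimal sketch of this hand-off between models is given below. For a normal distribution the maximum likelihood estimates have a closed form (the sample mean and the population standard deviation of the draws). For the beta distribution the sketch substitutes moment matching, which is not maximum likelihood and is used here only to keep the example dependency-free. Function names and the simulated draws are illustrative assumptions.

```python
import random
from statistics import fmean, pvariance


def fit_normal_mle(draws: list[float]) -> tuple[float, float]:
    """Closed-form maximum likelihood fit of a normal distribution."""
    mu = fmean(draws)
    sigma = pvariance(draws, mu) ** 0.5  # MLE uses the 1/n variance
    return mu, sigma


def fit_beta_moments(draws: list[float]) -> tuple[float, float]:
    """Moment-matched beta parameters (a stand-in for an ML fit)."""
    m = fmean(draws)
    v = pvariance(draws, m)
    common = m * (1.0 - m) / v - 1.0
    return m * common, (1.0 - m) * common


# Simulated posterior draws of a parameter bounded between 0 and 1,
# e.g. a test sensitivity estimated by an upstream model.
rng = random.Random(7)
posterior_draws = [rng.betavariate(20.0, 80.0) for _ in range(4000)]

a_hat, b_hat = fit_beta_moments(posterior_draws)
print(f"prior for downstream model: Beta({a_hat:.1f}, {b_hat:.1f})")
```

The fitted parameters would then be passed as fixed hyperparameters of the prior in the downstream model.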

Disease history parameter estimation

Positivity duration

From the ONS COVID Infection Survey, we have longitudinal testing data for individuals. After initially entering the survey, individuals test weekly for the first four weeks, before testing monthly for the remainder of their time on the survey. We estimate the duration of positivity \(\tau_{\mathrm{pos}}\in \mathbb{R}_{+}\), defined as the delay from when a case first becomes positive to when a case ceases to be positive, by decomposing into two delays such that \(\tau_{\mathrm{pos}}=\tau_{\mathrm{fp}}+\tau_{\mathrm{EoP}}\), where \(\tau_{\mathrm{fp}}\in \mathbb{R}_{+}\) is the delay from symptom onset to first testing positive, and \(\tau_{\mathrm{EoP}}\in \mathbb{R}_{+}\) is the delay from when the case first tests positive to when the case ceases to test positive. As part of this, we make the assumption that time of symptom onset approximates the time at which the case becomes positive59. While it would be possible to not make this assumption and use an interval censoring model to estimate the delay from the infection becoming positive to the infection testing positive, due to the size of the intervals relative to the delay, the uncertainty on any estimates produced using this approach would be too large and would consequently degrade results (please see a schematic in Supplementary Fig. 52).

Delay from first positive test, to the end of positivity

For each case, data can be obtained on two delays: the delay from the first positive test to the last positive test followed by a negative test \({{{{{{\rm{\tau }}}}}}}_{{{{{\rm{lp}}}}}}\in {{\mathbb{R}}}_{+}\), and the delay from the first positive test to the first negative test \({{{{{{\rm{\tau }}}}}}}_{{{{{\rm{fn}}}}}}\in {{\mathbb{R}}}_{+}\). Therefore, we have that \({{{{{{\rm{\tau }}}}}}}_{{{{{\rm{EoP}}}}}}\in \left[{{{{{{\rm{\tau }}}}}}}_{{{{{\rm{lp}}}}}},{{{{{{\rm{\tau }}}}}}}_{{{{{\rm{fn}}}}}}\right]\).

It is likely that the test positivity duration has changed with the emergence of different variants; we therefore condition \(\tau_{\mathrm{EoP}}\) upon the dominant variant at the time of the case's infection. We also assume that the positive duration of each case follows a Weibull distribution, and for each variant we perform interval-censored regression to estimate the positive duration distribution. To do so, we define a date range for each variant, and individuals who first test positive within this range are assumed to be infected with that variant. The date ranges used are:

$$t \le \text{2020-11-07} \sim \mathrm{Wild\ type},$$
$$\text{2020-11-08} \le t \le \text{2021-05-08} \sim \mathrm{Alpha},$$
$$\text{2021-05-09} \le t \le \text{2021-12-18} \sim \mathrm{Delta},$$
$$\text{2021-12-19} \le t \le \text{2022-03-06} \sim \mathrm{BA.1},$$
$$\text{2022-03-07} \le t \le \text{2022-05-22} \sim \mathrm{BA.2},$$
$$\text{2022-05-23} \le t \le \text{2022-10-08} \sim \mathrm{BA.4/5},$$
$$\text{2022-10-09} \le t \le \text{2023-03-31} \sim \mathrm{BQ/CH/XBB}.$$
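As an illustration of this assignment rule, the following minimal Python sketch maps the date of a first positive test to the assumed variant. The names `VARIANT_PERIODS` and `assign_variant` are hypothetical and are not taken from the study code; only the date boundaries come from the ranges listed above.

```python
from datetime import date

# Inclusive upper bound of each date range, copied from the ranges above.
VARIANT_PERIODS = [
    (date(2020, 11, 7), "Wild type"),
    (date(2021, 5, 8), "Alpha"),
    (date(2021, 12, 18), "Delta"),
    (date(2022, 3, 6), "BA.1"),
    (date(2022, 5, 22), "BA.2"),
    (date(2022, 10, 8), "BA.4/5"),
    (date(2023, 3, 31), "BQ/CH/XBB"),
]


def assign_variant(first_positive: date) -> str:
    """Return the variant assumed dominant on the date of the first positive test."""
    for upper_bound, variant in VARIANT_PERIODS:
        if first_positive <= upper_bound:
            return variant
    raise ValueError("date falls after the final period (2023-03-31)")
```

Because the periods are contiguous, checking only the upper bounds in chronological order is sufficient to identify the period.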

Letting \(\tau_{\mathrm{EoP}}^{(i)}\) denote the delay for the \(i\)th case, we assume that \(\tau_{\mathrm{EoP}}^{(i)} \sim \mathrm{Weibull}(\alpha, \lambda_{v})\). Here, the shape parameter \(\alpha \in \mathbb{R}_{+}\) is shared across all variants, while the rate parameter \(\lambda_{v} \in \mathbb{R}_{+}\) depends on the variant \(v \in \{1, \ldots, 7\}\) to which the \(i\)th case is assigned.

For the \(i\)th case, we must compute the likelihood for the event

$${\tau }_{{{{{\rm{EoP}}}}}}\in \left[{\tau }_{{{{{\rm{lp}}}}}},{\tau }_{{{{{\rm{fn}}}}}}\right]$$
(1)

which has a likelihood given by

$${\mathbb{P}}\left({\tau }_{{{{{\rm{EoP}}}}}}\in \left[{\tau }_{{{{{\rm{lp}}}}}},{\tau }_{{{{{\rm{fn}}}}}}\right] |{{{{{\rm{\alpha }}}}}},{\lambda }_{{{{{{\rm{v}}}}}}}\right)={F}_{{{{{\rm{weibull}}}}}}\left({\tau }_{{{{{\rm{fn}}}}}};{{{{{\rm{\alpha }}}}}},{\lambda }_{{{{{{\rm{v}}}}}}}\right)-{F}_{{{{{\rm{weibull}}}}}}\left({\tau }_{{{{{\rm{lp}}}}}};{{{{{\rm{\alpha }}}}}},{\lambda }_{{{{{{\rm{v}}}}}}}\right).$$
(2)

For our priors, we let \(\alpha \sim \mathrm{Exponential}(0.1)\) and \(\lambda_{v} \sim \mathcal{N}(15, 10)\).

Our assumption that the shape parameter is shared across all variants is due to the presence of large censoring intervals that make inferring the shape of the distribution difficult. Consequently, we find it necessary to use data from multiple variants to infer the shape of the Weibull distribution.
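The interval-censored likelihood in equation (2) can be sketched in a few lines of Python. This is an illustrative sketch rather than the fitted Stan model: it assumes the shape/scale parameterisation of the Weibull distribution (so the second parameter is treated as a scale), and the function names and example cases are hypothetical.

```python
import math


def weibull_cdf(t: float, shape: float, scale: float) -> float:
    """CDF of a Weibull distribution (shape/scale parameterisation, assumed here)."""
    if t <= 0:
        return 0.0
    return 1.0 - math.exp(-((t / scale) ** shape))


def interval_log_lik(tau_lp: float, tau_fn: float, shape: float, scale: float) -> float:
    """Log of equation (2): probability that tau_EoP lies in [tau_lp, tau_fn]."""
    prob = weibull_cdf(tau_fn, shape, scale) - weibull_cdf(tau_lp, shape, scale)
    return math.log(prob)


# Hypothetical cases: (last positive, first negative), in days since first positive.
cases = [(7.0, 14.0), (0.0, 7.0), (14.0, 42.0)]
total_log_lik = sum(interval_log_lik(lp, fn, shape=1.5, scale=15.0) for lp, fn in cases)
```

Wide intervals such as the third case constrain the distribution only weakly, which is the motivation given above for sharing the shape parameter across variants.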

Onset to first positive test delay

For symptomatic individuals with a positive COVID-19 test, the ONS COVID Infection Survey reports symptom onset date. This allows data to be extracted on symptom onset and first positive test time for each patient. The mean time from symptom onset to a positive test is used as a proxy measure to estimate the temporal variation in PCR positivity by approximating the average time from becoming positive to testing positive.

This distribution is likely to differ significantly from that of community COVID-19 testing, because community testing is driven by healthcare seeking behaviour among the general population rather than randomised testing within the population. The delay is further affected by epidemic phase: during times of growth the observed delays are shorter, and during times of decay the observed delays are longer. Consequently, the parameters of this delay must be modelled as time-varying. After visualising the observed delays, we find that the skew-normal distribution is the only positive unbounded continuous distribution available in Stan that would be appropriate to model this distribution. Other distributions available in Stan were either symmetric, featured heavy tails, or had other undesirable properties that made them inappropriate for modelling the data.

Let \({{{{{{\rm{\tau }}}}}}}_{{{{{\rm{fp}}}}}}^{\left(i\right)}\in {\mathbb{R}}\) be the delay from the \(i\)th case developing symptoms to first testing positive. Under our assumptions we have that

$${\tau }_{{{{{\rm{fp}}}}}}^{\left(i\right)}\sim {{{{\rm{SkewNormal}}}}}\left({\xi }_{k},{\omega }_{k},{{{{{{\rm{\upsilon }}}}}}}_{k}\right),$$
(3)

where \(\underline{\xi }\in {\mathbb{R}}^{K},\,\underline{\omega }\in {\mathbb{R}}_{+}^{K}\,{{{{{\rm{and}}}}}}\,\underline{\upsilon }\in {\mathbb{R}}^{K}\) are the location, scale, and shape parameters of the skew normal distribution respectively, and \({{{{{\rm{k}}}}}}\in \{{{{{\mathrm{1,2}}}}},\ldots,{{{{{\rm{K}}}}}}\}\) denotes which survey round the \(i\)th observation belongs to. The parameters of the skew normal are modelled using first order random walk smoothing priors, which enforce smoothness by assuming that increments of the random walk are normally distributed, i.e. let \(\underline{x }=(x_{1},\ldots,x_{K})\in {{\mathbb{R}}}^{ K }\) be a first order random walk (\({RW}1\)), then we have that \({x}_{i+1}-{x}_{i}{{{{{\mathcal{ \sim }}}}}}{{{{{\mathcal{N}}}}}}\left(0,{{{{{{\rm{\sigma }}}}}}}^{2}\right)\) where \({{{{{\rm{\sigma }}}}}}\in {{\mathbb{R}}}_{+}\) is a hyperparameter to be estimated that controls the smoothness of the random walk. Hence, for modelling the parameters of the skew normal, we let

$$\underline{\xi } \sim RW1({\sigma }_{\xi })$$
$$\log (\underline{\omega }) \sim RW1({\sigma }_{\omega })$$
$$\underline{\upsilon } \sim R{{{{{\rm{W}}}}}}1({\sigma }_{\upsilon })$$

where \({{{{{{\rm{\sigma }}}}}}}_{{{{{{\rm{\xi }}}}}}},{{{{{{\rm{\sigma }}}}}}}_{{{{{{\rm{\omega }}}}}}},{{{{{{\rm{\sigma }}}}}}}_{{{{{{\rm{\upsilon }}}}}}}\sim {{{{{{\mathcal{N}}}}}}}_{+}\left(0,1\right)\).

A survey round is defined as the sampling period determined by the REACT study (typically 2 weeks) and the ONS CIS study.
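To make the first order random walk prior concrete, the sketch below evaluates its log density for a sequence of round-level parameters. It is a minimal illustration, not the Stan implementation; `rw1_log_density` and the example sequences are hypothetical, and \(\sigma\) is treated as the standard deviation of the increments.

```python
import math


def rw1_log_density(x: list[float], sigma: float) -> float:
    """Sum of Normal(0, sigma^2) log densities over the K-1 increments x[i+1] - x[i]."""
    log_norm = -0.5 * math.log(2.0 * math.pi) - math.log(sigma)
    return sum(
        log_norm - 0.5 * ((b - a) / sigma) ** 2
        for a, b in zip(x, x[1:])
    )


# Hypothetical location parameters over four survey rounds.
smooth = [1.0, 1.1, 1.2, 1.3]
rough = [1.0, 2.0, 0.5, 1.3]
```

For a fixed \(\sigma\), the smoothly varying sequence receives a higher prior density than the rough one, which is how the prior discourages abrupt changes between consecutive rounds.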

Onset to clinical event delay

To measure the distribution of delays from onset to clinical event, we use the Secondary Uses Service (SUS)60 data from the NHS and the UKHSA death line list data sets. These data are daily censored, so we consider this as doubly-interval censored data. In these data, we only observe patients conditional on the clinical event occurring, which introduces right-truncation, since data are only observed before the final day of data collection T, which in this case is 23rd April 2023.

To estimate the mean delay from onset (as reported by patients) to hospitalisation or death, we fit to the data using a Weibull and lognormal distribution respectively, with both models accounting for interval censoring and right truncation. These distributions were selected as the best performing distributions out of the gamma, Weibull, and lognormal distributions, according to the Pareto-smoothed importance sampling leave-one-out cross-validation scores61. We fit the models to data aggregated into three-month periods by symptom onset date, in order to obtain time-varying delay parameters. The method here adapts the methods from Ward & Johnsen38, Ward et al.62, and Vekaria, et al.63.

In this method, we assume that symptom onset time \(S\in {\mathbb{Z}}\) for each individual sits within an interval \(\left[{s}_{1},{s}_{2}\right]\), where \({s}_{1}\) is the reported symptom onset date and \({s}_{2}\) is the day after, i.e., \({s}_{2}={s}_{1}+1\). Similarly, the clinical event time \(E\in {\mathbb{Z}}\) sits within an interval \(\left[{e}_{1},{e}_{2}\right]\) where \({e}_{2}={e}_{1}+1\). The likelihood of observing a given clinical event time, conditional on the observed onset interval, is given by:

$${\mathbb{P}}\left(E\in \left[{e}_{1},{e}_{2}\right] | S\in [{s}_{1},{s}_{2}],E < T\right)=\frac{{\mathbb{P}}\left(E\in \left[{e}_{1},{e}_{2}\right],S\in [{s}_{1},{s}_{2}]\right)}{{\mathbb{P}}(E < T,S\in [{s}_{1},{s}_{2}])}.$$
(4)

This likelihood could be modelled by numerically integrating across the observation intervals. However, this would be very computationally expensive. Instead, we can include estimated event times for each patient as latent variables within our model62, which we assume to be uniformly distributed across the observation interval. Introducing these latent variables \({e}^{*}\) and \({s}^{*}\), our likelihood function simplifies to

$${\mathbb{P}}\left({E=e}^{*}{|S}={s}^{*},E < T\right) =\frac{{\mathbb{P}}\left(E={e}^{*},S={s}^{*}\right)}{{\mathbb{P}}\left(E < T,S={s}^{*}\right)}\\ =\frac{{\mathbb{P}}\left(E={e}^{*}{|S}={s}^{*}\right){\mathbb{P}}\left(S={s}^{*}\right)}{{\mathbb{P}}\left(E < {T|S}={s}^{*}\right){\mathbb{P}}\left(S={s}^{*}\right)}\\ =\frac{{\mathbb{P}}\left(E={e}^{*}{|S}={s}^{*}\right)}{{\mathbb{P}}\left(E < {T|S}={s}^{*}\right)}\\ =\frac{{f}_{\theta }\left({e}^{*}-{s}^{*}\right)}{{F}_{\theta }(T-{s}^{*})}$$
(5)

where \(f_{\theta}(\cdot)\) and \(F_{\theta}(\cdot)\) are the probability density function and cumulative distribution function, respectively, of the parametric distribution with parameters \(\theta_{1}\) and \(\theta_{2}\). We combine this likelihood with prior distributions for our latent variables given by

$${e}^{*} \sim {{{{{\rm{Uniform}}}}}}\left({e}_{1},{e}_{2}\right),$$
$${s}^{*} \sim {{{{\rm{Uniform}}}}}\left({s}_{1},{s}_{2}\right).$$

We assume \({\theta }_{1}\) represents the mean for admissions and the log mean for mortalities, and follows a weakly informative normal prior distribution. For the delay to hospitalisation, we assume

$$\theta_{1} \sim \mathcal{N}(10, 5).$$

For the delay to death, we assume

$${\theta }_{1}{{{{{\mathcal{ \sim }}}}}}{{{{{\mathcal{N}}}}}}\left(\log (27.5),0.5\right).$$

We assume \({\theta }_{2}\) represents the shape parameter for admissions, and has the prior

$${\theta }_{2} \sim {{{{{\rm{Exponential}}}}}}\left(0.0001\right).$$

For mortalities, we assume \({\theta }_{2}\) represents the log of the standard deviation, and has the prior

$$\theta_{2} \sim \mathcal{N}(0, 1).$$

This model is fit using MCMC implemented in Stan, with full model formula

$${e}^{*} \sim {{{{{\rm{Uniform}}}}}}\left({e}_{1},{e}_{2}\right),$$
$${s}^{*} \sim {{{{{\rm{Uniform}}}}}}\left({s}_{1},{s}_{2}\right),$$
$$\mathrm{loglikelihood} = \log\left(f_{\theta}\left(e^{*} - s^{*}\right)\right) - \log\left(F_{\theta}\left(T - s^{*}\right)\right).$$
(6)

We consider time varying onset to clinical outcome delays, fitting the delays independently to each three-month time period, starting from September 2020.
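The right-truncated log-likelihood contribution in equation (6) can be sketched as follows for the hospitalisation delay, where the Weibull distribution is described by its mean and shape. This is an illustrative sketch under stated assumptions: the latent times \(e^{*}\) and \(s^{*}\) are taken as given rather than sampled, the conversion from mean to scale uses the standard identity mean = scale \(\cdot\, \Gamma(1 + 1/\text{shape})\), and all function names are hypothetical.

```python
import math


def weibull_scale_from_mean(mean: float, shape: float) -> float:
    """Scale giving the requested mean: mean = scale * Gamma(1 + 1/shape)."""
    return mean / math.gamma(1.0 + 1.0 / shape)


def weibull_log_pdf(x: float, shape: float, scale: float) -> float:
    """Log density of the Weibull distribution for x > 0."""
    return (math.log(shape) - math.log(scale)
            + (shape - 1.0) * math.log(x / scale)
            - (x / scale) ** shape)


def weibull_log_cdf(x: float, shape: float, scale: float) -> float:
    """Log CDF, using expm1 for accuracy when the CDF is close to zero."""
    return math.log(-math.expm1(-((x / scale) ** shape)))


def truncated_log_lik(e_star: float, s_star: float, T: float,
                      mean: float, shape: float) -> float:
    """Equation (6): log f(e* - s*) - log F(T - s*)."""
    scale = weibull_scale_from_mean(mean, shape)
    return (weibull_log_pdf(e_star - s_star, shape, scale)
            - weibull_log_cdf(T - s_star, shape, scale))
```

The truncation term matters most for onsets close to the final day of data collection \(T\); for onsets far from \(T\), \(F_{\theta}(T - s^{*})\) approaches one and the correction vanishes.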

Sensitivity and specificity

The REACT and ONS studies both use RT-PCR tests that can have variable sensitivity and specificity, which were not adjusted for in the reported results of either study. These values are influenced by swabbing protocol, laboratory, specimen storage, days since symptom onset, site of swab, age, and variant mutations. Primers are adjusted if a drop in sensitivity is observed for a variant64,65. In addition, given that test sensitivity is conditional upon days since symptom onset, it is known that the average test sensitivity will be affected by epidemic phase bias66.

The average RT-PCR test sensitivity for individuals in each round is calculated from two key components: an estimate of the test sensitivity as a function of the time to symptom onset, and an estimated delay distribution of the time between symptom onset and first positive test. Given the existence of potential differences in viral dynamics between different variants, we estimate a different test sensitivity trajectory for each variant, and we estimate the time from symptom onset to testing positive for each round and each study.

To calculate RT-PCR test sensitivity, we have assessed repeat tests by age group from ONS CIS where an individual must have a symptom onset date, with at least one positive test up to 12 days prior and 30 days after symptom onset date.

To fit the RT-PCR test sensitivity, we adapt the method of Binny et al.67 who fit a piecewise linear logistic regression model, with the binary outcome of a positive or negative test, using days relative to symptom onset date, \({{{{{\rm{d}}}}}}\in {\mathbb{R}}\), as the primary explanatory variable.

$${p}_{{{{{\rm{sens}}}}}}\left({{{{{\rm{d}}}}}}\right)={{{{\rm{logi}}}}}{{{{{\rm{t}}}}}}^{-1}\left({{{{{{\rm{\beta }}}}}}}_{0}+\left({{{{{\rm{d}}}}}}-D\right)\left({{{{{{\rm{\beta }}}}}}}_{1}+\left({{{{{{\rm{\beta }}}}}}}_{2}-{{{{{{\rm{\beta }}}}}}}_{1}\right){{\mathbb{1}}}_{\{{{{{{\rm{d}}}}}} > D\}}\right)\right)$$
(7)

In practice, we found a piecewise linear logistic regression to be a poor fit for our data, and as such we instead modify this function by: removing the changepoint and replacing with a sigmoid, which results in a smoothed out changepoint that is more biologically plausible; and replacing the piecewise linear terms with piecewise polynomial terms, which allows for greater flexibility in fitting to the data. The derived function form is given by:

$${p}_{{{{{\rm{sens}}}}}}\left(d\right)= {{{{\rm{logi}}}}}{{{{{\rm{t}}}}}}^{-1}\left({{{{{{\rm{\beta }}}}}}}_{0}+{{{{{{\rm{\beta }}}}}}}_{1}{\left|d-D\right|}_{1}^{{{{{{{\rm{\lambda }}}}}}}_{1}}\left(1-\Phi \left(\left(d-D+2\right)/2\right)\right) \right. \\ \left.+{{{{{{\rm{\beta }}}}}}}_{2}{\left|d-D\right|}_{1}^{{{{{{{\rm{\lambda }}}}}}}_{2}}\Phi \left(\left(d-D-2\right)/2\right)\right),$$
(8)

where \({{{{{{\rm{\beta }}}}}}}_{1},{{{{{{\rm{\lambda }}}}}}}_{1},{{{{{{\rm{\lambda }}}}}}}_{2}\in {{\mathbb{R}}}_{+}\) and \({{{{{{\rm{\beta }}}}}}}_{0},{{{{{\rm{D}}}}}}\in {\mathbb{R}}\), \({{{{{{\rm{\beta }}}}}}}_{2}\in {{\mathbb{R}}}_{-}\). We use \(\Phi\) to denote the cumulative distribution function of the standard normal distribution, however for this purpose we are using it as a sigmoidal function rather than for its probabilistic interpretation. The following priors are used in fitting this function:

$$\begin{array}{c}\beta_{0} \sim \mathcal{N}(0, 5),\\ \beta_{1} \sim \mathcal{N}_{+}(0, 1),\\ \beta_{2} \sim \mathcal{N}_{-}(0, 1),\\ \lambda_{1}, \lambda_{2} \sim \mathcal{N}_{+}(0, 1),\\ D \sim \mathcal{N}(0, 5).\end{array}$$
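For readers who wish to evaluate the functional form in equation (8), a minimal Python sketch is given below. It assumes that \(\left|d-D\right|_{1}^{\lambda}\) denotes the absolute value of \(d-D\) raised to the power \(\lambda\); the function names and any parameter values passed to them are hypothetical and are not fitted estimates.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF, used here purely as a sigmoid."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def p_sens(d: float, beta0: float, beta1: float, beta2: float,
           lam1: float, lam2: float, D: float) -> float:
    """Equation (8): test sensitivity d days relative to symptom onset."""
    dist = abs(d - D)
    # Term weighted towards times before the smoothed changepoint D.
    before = beta1 * dist ** lam1 * (1.0 - std_normal_cdf((d - D + 2.0) / 2.0))
    # Term weighted towards times after the smoothed changepoint D.
    after = beta2 * dist ** lam2 * std_normal_cdf((d - D - 2.0) / 2.0)
    return inv_logit(beta0 + before + after)
```

At \(d = D\) both polynomial terms vanish, so the sensitivity at the changepoint is determined by \(\beta_{0}\) alone.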

Infection hospitalisation and fatality risk modelling

The ONS and REACT studies reported positive testing rates over time to help inform the public health response to the pandemic via calculations of the effective reproduction number and acting as inputs into government modelling. With results at both a national level as well as geographic and demographic subdivisions, it is possible to detect higher risk areas in need of greater intervention. By combining the estimated incidence with clinical outcome data in the form of hospital admissions and mortalities, the IHR and IFR can be calculated by the method set out below.

Estimating prevalence from positivity

For each round \(k\in \left[1,\ldots,K\right]\) and subgroup (i.e., region or age group) indexed according to \(s\in \left[1,\ldots,S\right]\), we calculate the expected test sensitivity \(p_{\mathrm{avg\_sens}}\) for that round using

$$p_{\mathrm{avg\_sens}}\left(k,s\right) = E_{t}\left[p_{\mathrm{sens}}\left(t,k,s\right)\right] \\ = \int_{-12}^{30} p_{\mathrm{sens}}\left(t,k,s\right) \cdot f_{SN}\left(t \mid \xi_{k,s}, \alpha_{k,s}, \omega_{k,s}\right) \, \mathrm{d}t,$$
(9)

where \({p}_{{{{{\rm{sens}}}}}}\left(t,k,s\right)\) is our estimate of the probability of testing positive \(t\) days after symptom onset in the \(k\)th round for the \(s\)th subgroup given in equation (8), \({f}_{{SN}}\) is the probability density function of the skew normal distribution that we use to model the delay from symptom onset to testing for positive tests, and \({{{{{{\rm{\xi }}}}}}}_{k,s},{{{{{{\rm{\alpha }}}}}}}_{k,s},{{{{{{\rm{\omega }}}}}}}_{k,s}\) are the estimated parameters of the skew normal distribution for the \(k\)th estimate of positivity and the \(s\)th subgroup.
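The integral in equation (9) has no closed form in general, but it can be approximated numerically. The sketch below uses the trapezoidal rule with the standard skew normal density in its location, scale, and shape form; it is an illustration only, and the function names, the grid size, and the convention that the sensitivity curve is passed as a callable are all assumptions.

```python
import math
from typing import Callable


def normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def skew_normal_pdf(t: float, location: float, scale: float, shape: float) -> float:
    """Skew normal density: (2 / scale) * phi(z) * Phi(shape * z)."""
    z = (t - location) / scale
    return 2.0 / scale * normal_pdf(z) * normal_cdf(shape * z)


def average_sensitivity(p_sens: Callable[[float], float],
                        location: float, scale: float, shape: float,
                        lower: float = -12.0, upper: float = 30.0,
                        n: int = 2000) -> float:
    """Trapezoidal approximation to the integral in equation (9)."""
    h = (upper - lower) / n
    total = 0.0
    for i in range(n + 1):
        t = lower + i * h
        weight = 0.5 if i in (0, n) else 1.0
        total += weight * p_sens(t) * skew_normal_pdf(t, location, scale, shape)
    return total * h
```

As a sanity check, a constant sensitivity curve should return approximately that constant whenever the delay distribution has nearly all of its mass between \(-12\) and \(30\) days.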

There are insufficient data available to estimate the test specificity \(p_{\mathrm{spec}}\in \left[0,1\right]\). We therefore apply a strong prior encoding the belief that RT-PCR tests are highly specific, with a false positive rate of approximately 1 in 10,000, i.e., \(p_{\mathrm{spec}}\sim \mathrm{Beta}\left(10000,1\right)\).

We assume that the prevalence is constant across each round \({p}_{{{{{\rm{prev}}}}}}\in \left[{{{{\mathrm{0,1}}}}}\right]\). Given \({p}_{{{{{\rm{prev}}}}}}\), \({p}_{{{{{\rm{spec}}}}}}\), and \({p}_{{{{{\rm{avg}}}}}\_{{{{\rm{sens}}}}}}\), the probability that a randomly tested individual will test positive \({p}_{{{{{\rm{pos}}}}}}\in \left[{{{{\mathrm{0,1}}}}}\right]\), is given by

$${p}_{{{{{\rm{pos}}}}}}={p}_{{{{{\rm{prev}}}}}}\cdot {p}_{{{{{\rm{avg}}}}}\_{{{{\rm{sens}}}}}}+\left(1-{p}_{{{{{\rm{prev}}}}}}\right)\left(1-{p}_{{{{{\rm{spec}}}}}}\right)$$
(10)

Let \({N}_{k,s}\in {{\mathbb{Z}}}_{+}\) be the number of tests performed for a given round and stratum, and \({P}_{k,s}\in \left[0,{N}_{k,s}\right]\) be the number of tests that were positive for that round and stratum. Then the likelihood is given by

$$P_{k,s}\sim \mathrm{Binomial}\left(N_{k,s},p_{\mathrm{pos}}\right)$$
(11)
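Equations (10) and (11) together define the observation model, which can be sketched as follows. The function names are hypothetical, and the example uses fixed values for sensitivity and specificity, whereas in the model these are uncertain quantities.

```python
import math


def p_pos(prev: float, avg_sens: float, spec: float) -> float:
    """Equation (10): true positives plus false positives."""
    return prev * avg_sens + (1.0 - prev) * (1.0 - spec)


def binomial_log_lik(positives: int, tests: int, p: float) -> float:
    """Equation (11): log probability of observing `positives` out of `tests`."""
    log_choose = (math.lgamma(tests + 1)
                  - math.lgamma(positives + 1)
                  - math.lgamma(tests - positives + 1))
    return (log_choose
            + positives * math.log(p)
            + (tests - positives) * math.log1p(-p))
```

When prevalence is zero, the positivity equals the false positive rate \(1 - p_{\mathrm{spec}}\), which shows why the specificity prior matters most in low-prevalence rounds.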

A hierarchical model structure with second order random walk smooths and random effects is used when estimating \(p_{\mathrm{prev}}\left(k,s\right)\). To maintain identifiability of the model in the presence of both smoothing and random effects, which are effectively two different smooths at the round-subgroup level, we use a special formulation adapted from a BYM2 framework68 given by

$${p}_{{{{{\rm{prev}}}}}}\left(k,s\right)={{{{\rm{logi}}}}}{{{{{\rm{t}}}}}}^{-1}\left({{{{{{\rm{\beta }}}}}}}_{{{{{{\rm{k}}}}}}}+{{{{{\rm{\gamma }}}}}}\left(\sqrt{{{{{{{\rm{\alpha }}}}}}}_{1}}{f}_{{{{{\rm{avg}}}}}\_{{{{\rm{prev}}}}}}\left(k\right)+\sqrt{{{{{{{\rm{\alpha }}}}}}}_{2}}{f}_{{{{{\rm{prev}}}}}}\left(k,s\right)+\sqrt{{{{{{{\rm{\alpha }}}}}}}_{3}}{{{{{\rm{\xi }}}}}}\left(k,s\right)\right)\right)$$
(12)

where \({f}_{{{{{\rm{avg}}}}}\_{{{{\rm{prev}}}}}}\) and \({f}_{{{{{\rm{prev}}}}}}\) are logit-scaled smooth functions, and \({{{{{\rm{\xi }}}}}}\sim {{{{{\mathcal{N}}}}}}\left({{{{\mathrm{0,1}}}}}\right)\) are the random effects, \({{{{{{\rm{\beta }}}}}}}_{{{{{{\rm{k}}}}}}}\in {\mathbb{R}}\) the intercept term, \({{{{{\rm{\gamma }}}}}}\in {{\mathbb{R}}}_{+}\) an overall scale term, and \({{{{{{\rm{\alpha }}}}}}}_{1},{{{{{{\rm{\alpha }}}}}}}_{2},{{{{{{\rm{\alpha }}}}}}}_{3}\in \left[{{{{\mathrm{0,1}}}}}\right]\) the elements of a 2-simplex, i.e. \({{{{{{\rm{\alpha }}}}}}}_{1}+{{{{{{\rm{\alpha }}}}}}}_{2}+{{{{{{\rm{\alpha }}}}}}}_{3}=1.\)

Both \({f}_{{{{{\rm{avg}}}}}\_{{{{\rm{prev}}}}}}\) and \({f}_{{{{{\rm{prev}}}}}}\left(\cdot,s\right)\) are constrained to have a mean of zero to maintain identifiability. In addition, we ensure that both \({f}_{{{{{\rm{avg}}}}}\_{{{{\rm{prev}}}}}}\) and \({f}_{{{{{\rm{prev}}}}}}\left(\cdot,s\right)\) are on approximately the same scale as the random effects by placing a standard normal distribution prior on them \({f}_{{{{{\rm{avg}}}}}\_{{{{\rm{prev}}}}}},{f}_{{{{{\rm{prev}}}}}}{{{{{\mathcal{\sim }}}}}}{{{{{\mathcal{N}}}}}}\left({{{{\mathrm{0,1}}}}}\right)\), in addition to their improper smoothing prior. Therefore, the overall scale is controlled by \({{{{{\rm{\gamma }}}}}}\) given that

$${{{{\rm{Var}}}}}\left[\sqrt{{{{{{{\rm{\alpha }}}}}}}_{1}}{f}_{{{{{\rm{avg}}}}}\_{{{{\rm{prev}}}}}}\left(k\right)+\sqrt{{{{{{{\rm{\alpha }}}}}}}_{2}}{f}_{{{{{\rm{prev}}}}}}\left(k,s\right)+\sqrt{{{{{{{\rm{\alpha }}}}}}}_{3}}{{{{{\rm{\xi }}}}}}\left(k,s\right)\right]={{{{{{\rm{\alpha }}}}}}}_{1}+{{{{{{\rm{\alpha }}}}}}}_{2}+{{{{{{\rm{\alpha }}}}}}}_{3}=1,$$
(13)

which results in a well-identified model structure. For the random walk smoothing priors we let

$${f}_{{{{{\rm{avg}}}}}\_{{{{\rm{prev}}}}}} \sim {{{{{\rm{RW}}}}}}2\left({{{{{{\rm{\sigma }}}}}}}_{{{{{\rm{avg}}}}}\_{{{{\rm{prev}}}}}}\right),$$
$${{f}_{{{{{\rm{prev}}}}}}\left(\cdot,s\right)\sim R{{{{{\rm{W}}}}}}2\left({{{{{{\rm{\sigma }}}}}}}_{{{{{\rm{prev}}}}}}\right),{{{{{\rm{ for }}}}}}\, s\in 1:S,}$$
$${{{{{{\rm{\sigma }}}}}}}_{{{{{\rm{avg}}}}}\_{{{{\rm{prev}}}}}},{{{{{{\rm{\sigma }}}}}}}_{{{{{\rm{prev}}}}}}\sim {{{{\rm{Exponential}}}}}\left(100\right)$$

where \({RW}2\left({{{{{\rm{\sigma }}}}}}\right)\) implies a penalty on the second order derivative in the form of

$$\frac{{d}^{2}}{d{t}^{2}}{{{{{\rm{f}}}}}}\left({{{{{\rm{k}}}}}}\right)\approx {{{{{\rm{f}}}}}}\left(k+1\right)-2{{{{{\rm{f}}}}}}\left(k\right)+{{{{{\rm{f}}}}}}\left(k-1\right) \sim {{{{{\rm{N}}}}}}\left(0,{{{{{\rm{\sigma }}}}}}\right),\forall {{{{{\rm{k}}}}}}.$$
(14)

The \({{{{{\rm{\alpha }}}}}}\) terms control the relative contribution to the variance from each of the components, and we let \({{{{{\rm{\alpha }}}}}} \sim {{{{\rm{Dirichlet}}}}}\left({{{{\mathrm{2,2,2}}}}}\right)\).
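The second order random walk penalty in equation (14) acts on second differences, as the following sketch illustrates. It is not the Stan implementation; the function names are hypothetical, and \(\sigma\) is treated as a standard deviation.

```python
import math


def second_differences(f: list[float]) -> list[float]:
    """f(k+1) - 2 f(k) + f(k-1) for every interior index k."""
    return [f[k + 1] - 2.0 * f[k] + f[k - 1] for k in range(1, len(f) - 1)]


def rw2_log_density(f: list[float], sigma: float) -> float:
    """Sum of Normal(0, sigma) log densities over the second differences."""
    log_norm = -0.5 * math.log(2.0 * math.pi) - math.log(sigma)
    return sum(log_norm - 0.5 * (d / sigma) ** 2 for d in second_differences(f))


linear = [0.0, 1.0, 2.0, 3.0]
quadratic = [0.0, 1.0, 4.0, 9.0]
```

A linear trend has zero second differences and is therefore unpenalised, which is why this prior is improper and is paired with the zero-mean constraint and the standard normal prior described above.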

In addition to the prevalence for each age group, we also estimated the national prevalence by multilevel regression and poststratification, which allows us to adjust statistically for demographics that are over- or under-represented in the sample. The above method for calculating the prevalence in each age group uses a multilevel regression approach, and it remains to perform a poststratification step to estimate the national prevalence by reweighting the prevalence for each age group. Let \(p_{\mathrm{prev}}^{\mathrm{nat}}\left(s\right)\) be the poststratified estimate of national prevalence, calculated as

$${p}_{{{{{\rm{prev}}}}}}^{{{{{\rm{nat}}}}}}\left(s\right)=\frac{{\sum }_{k=1}^{K}{p}_{{{{{\rm{prev}}}}}}\left(k,s\right)\cdot {N}_{k}}{{\sum }_{k=1}^{K}{N}_{k}}$$
(15)

where \(N_{k}\) is the population of the \(k\)th stratum. We poststratified our results according to the age breakdown of our sample, on the basis that age is the most important variable to account for when producing nationally representative estimates of the IHR.
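The poststratification step in equation (15) is a population-weighted average of the stratum-level prevalences. A minimal sketch, with a hypothetical function name and made-up example values, is:

```python
def poststratify(prevalence_by_stratum: list[float],
                 population_by_stratum: list[float]) -> float:
    """Population-weighted average of stratum prevalences, as in equation (15)."""
    total_population = sum(population_by_stratum)
    weighted = sum(p * n for p, n in zip(prevalence_by_stratum, population_by_stratum))
    return weighted / total_population


# Hypothetical example: two age groups with different prevalence and population.
national = poststratify([0.01, 0.03], [100.0, 300.0])
```

In practice this would be applied to each posterior draw separately, so that uncertainty in the stratum-level estimates carries through to the national estimate.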

Calculating incidence attributed to round

For each survey (REACT and ONS), we converted the estimated prevalence rates \(p_{\mathrm{prev}}\left(k,s\right)\) for population stratum \(s\) and round \(k\) into an incidence time series \(I_{s}\left(k\right)\) using this expression:

$${{{{{{\rm{I}}}}}}}_{{{{{{\rm{s}}}}}}}\left(k\right)=\frac{{p}_{{{{{\rm{prev}}}}}}\left(k,s\right)\Omega \left({{{{{\rm{k}}}}}},s\right)l\left(k\right)}{{{{{{{\rm{\tau }}}}}}}_{{{{{\rm{pos}}}}}}\left(s\right)}.$$
(16)

Here, \({{{{{{\rm{\tau }}}}}}}_{{{{{\rm{pos}}}}}}\left(s\right)\) is the expected duration for which an individual tests positive, \(\Omega \left(k,s\right)\) is the population size of stratum \(s\) during round \(k\), and \(l\left(k\right)\in {\mathbb{N}}\) is the length of the round in days. We note that \({{{{{{\rm{\tau }}}}}}}_{{{{{\rm{pos}}}}}}\), and other parameters used in calculating it, are derived from the ONS study, since REACT surveying did not provide adequate data to estimate these values.

One way to think of this equation is that initial positive test frequencies are shifted back in time to more accurately reflect when individuals were infected. In this paper, we shift the testing dates to symptom onset date rather than infection date, since we have more reliable data on the delay distributions post symptom onset date. Finally, we multiply by the population \(\Omega\) to scale up our sample to population wide numbers, but also divide by the time for which someone tests positive \({{{{{{\rm{\tau }}}}}}}_{{{{{\rm{pos}}}}}}\).
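As a worked illustration of equation (16), the sketch below converts a prevalence into the number of new infections attributed to a round. The function name and the numerical values are hypothetical.

```python
def incidence(prevalence: float, population: float,
              round_length_days: float, positivity_duration_days: float) -> float:
    """Equation (16): new infections attributed to a round."""
    return prevalence * population * round_length_days / positivity_duration_days


# Hypothetical values: 2% prevalence in a stratum of one million people,
# a 14-day round, and a mean positivity duration of 10 days.
new_infections = incidence(0.02, 1_000_000, 14, 10.0)
```

With these values, roughly 20,000 people are positive at any one time; since each person remains positive for about 10 days, around 2,000 new infections per day are needed to sustain that level, giving about 28,000 over the 14-day round.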

Calculating outcome counts attributed to round

Given an estimate of the number of new infections that occurred during a round, it remains to estimate the number of clinical outcomes attributed to individuals infected during that round, which then finally allows us to calculate the rate of severe outcomes.

There is a time delay between symptom onset and clinical outcome38, which must be accounted for in the relationship between incidence and hospitalisation or death69,70,71. For a given stratum \(s\), we must establish a time series \({d}_{s}\left({{{{{\rm{t}}}}}}\right)\) that models clinical outcomes \({c}_{s}\left({{{{{\rm{t}}}}}}\right)\) in that stratum by date of symptom onset rather than by date of outcome. We model \({d}_{s}\left({{{{{\rm{t}}}}}}\right)\) as

$${d}_{s}\left(t\right)={\sum }_{{t}^{{\prime} }=0}^{\infty }{c}_{s}\left(t+{t}^{{\prime} }\right)\cdot {p}_{s}\left({t}^{{\prime} }{|t}\right)$$
(17)

where \(p_{s}\left(t^{\prime} \mid t\right)\) is the probability that the time from symptom onset to outcome is \(t^{\prime}\) for someone in stratum \(s\), given that they were infected on day \(t\). This approximates the method for mapping outcomes to date of symptom onset69, under the assumption that each round has a constant risk.

From the daily-level time series \({d}_{s}\left(t\right)\), it remains to estimate the number of clinical events attributed to the \({k}^{{{{{\rm{th}}}}}}\) round in stratum \(s\), denoted by \({D}_{s}\left(k\right)\), using

$${D}_{s}\left(k\right)={\sum }_{{{{{{\rm{i}}}}}}\in {{{{{\mathscr{I}}}}}}\left({{{{{\rm{k}}}}}}\right)}{{{{{{\rm{d}}}}}}}_{{{{{{\rm{s}}}}}}}\left({{{{{\rm{i}}}}}}\right)$$
(18)

where \({{{{{\mathscr{I}}}}}}\left(k\right)\) is the set of timepoints that correspond to the \(k\)th round.
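Equations (17) and (18) can be sketched as follows. For simplicity, this illustration assumes a delay distribution that does not vary with onset day \(t\) and truncates the infinite sum at the end of the observed series; the function names and example values are hypothetical.

```python
def outcomes_by_onset(outcomes: list[float], delay_pmf: list[float]) -> list[float]:
    """Equation (17): d(t) = sum over t' of c(t + t') * p(t'), within the observed series."""
    n = len(outcomes)
    return [
        sum(outcomes[t + lag] * p for lag, p in enumerate(delay_pmf) if t + lag < n)
        for t in range(n)
    ]


def outcomes_in_round(d: list[float], round_days: range) -> float:
    """Equation (18): total outcomes attributed to the days in one round."""
    return sum(d[i] for i in round_days)


# Hypothetical example: 10 outcomes on day 2, with a delay of 0 or 1 days
# from onset to outcome, each with probability 0.5.
d = outcomes_by_onset([0, 0, 10, 0], [0.5, 0.5])
```

In this example the 10 outcomes recorded on day 2 are split evenly between onset days 1 and 2.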

Outcome risk modelling

Given posterior draws of \(I_{s}\left(k\right)\) and \(D_{s}\left(k\right)\), we compute the clinical outcome rate. However, the resulting estimates are noisy, so some smoothing is required.

We first fit a normal distribution to the posterior draws of \({{{{{{\rm{D}}}}}}}_{{{{{{\rm{s}}}}}}}\left({{{{{\rm{k}}}}}}\right),{{{{{{\rm{I}}}}}}}_{{{{{{\rm{s}}}}}}}\left({{{{{\rm{k}}}}}}\right)\) for each round and subgroup, as we are able to easily provide this parametric summary of the posterior as an input to the model. Letting \({{{{{{\rm{\mu }}}}}}}_{s}\left(k\right),{{{{{{\rm{\sigma }}}}}}}_{s}\left(k\right)\) be the parameters of the normal distributions for each round and stratum, we have that

$${D}_{s}\left(k\right){{{{{\mathcal{\sim }}}}}}{{{{{\mathcal{N}}}}}}\left({{{{{{\rm{\mu }}}}}}}_{s}^{\left(D\right)}\left(k\right),{{{{{{\rm{\sigma }}}}}}}_{s}^{\left(D\right)}\left(k\right)\right),$$
$${I}_{s}\left(k\right){{{{{\mathcal{\sim }}}}}}{{{{{\mathcal{N}}}}}}\left({{{{{{\rm{\mu }}}}}}}_{s}^{\left(I\right)}\left(k\right),{{{{{{\rm{\sigma }}}}}}}_{s}^{\left(I\right)}\left(k\right)\right)$$
(19)

Employing the Jeffreys prior for the clinical outcome rate, the posterior distribution for the infection risk \(R_{s}\left(k\right)\in \left[0,1\right]\) is given by

$${R}_{s}\left(k\right){{{{{\mathcal{\sim }}}}}}{{{{{\mathcal{B}}}}}}{{{{\rm{eta}}}}}\left({D}_{s}\left(k\right)+0.5,{I}_{s}\left(k\right)-{D}_{s}\left(k\right)+0.5\right)$$
(20)
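The Beta posterior in equation (20) can be illustrated with a short sketch that returns its parameters and mean for given outcome and infection counts. The function names and counts are hypothetical, and the random walk smoothing described next is not included.

```python
def jeffreys_beta_parameters(outcomes: float, infections: float) -> tuple[float, float]:
    """Parameters of the Beta posterior in equation (20)."""
    return outcomes + 0.5, infections - outcomes + 0.5


def beta_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Hypothetical round: 10 hospitalisations among 1,000 infections.
a, b = jeffreys_beta_parameters(10, 1000)
risk = beta_mean(a, b)
```

The added 0.5 in each parameter has little effect when counts are large, but keeps the posterior proper when a round contains very few outcomes.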

In addition, we place a second order random walk smoothing prior, with random effects, on \(R_{k}\left(t\right)\). To maintain identifiability in the presence of both a random walk smooth and a random effect smooth, we use a structure similar to the hierarchical model employed when estimating prevalence:

$${R}_{k}\left(t\right)={{{{\rm{logi}}}}}{{{{{\rm{t}}}}}}^{-1}\left({{{{{\rm{\beta }}}}}}+{{{{{\rm{\gamma }}}}}}\left(\sqrt{{{{{{\rm{\alpha }}}}}}}{f}_{R}\left(t\right)+\sqrt{1-{{{{{\rm{\alpha }}}}}}}{{{{{\rm{\xi }}}}}}\left(t\right)\right)\right),$$
$${{{{{\rm{\beta }}}}}}\sim {{{{{\mathcal{N}}}}}}\left(-4,1\right),$$
$${{{{{\rm{\gamma }}}}}} \sim {{{{{\mathcal{N}}}}}}\left(0,2\right),$$
$${{{{{\rm{\alpha }}}}}} \sim {{{{\rm{Beta}}}}}\left(2,2\right),$$
$${f}_{R} \sim {RW}2\left({{{{{{\rm{\sigma }}}}}}}_{R}\right),{f}_{R}{{{{{\mathcal{ \sim }}}}}}{{{{{\mathcal{N}}}}}}\left(0,1\right),{{{{\rm{mean}}}}}\left({f}_{R}\right){{{{{\mathcal{ \sim }}}}}}{{{{{\mathcal{N}}}}}}\left(0,0.001\right),$$
$$\sigma_{R}\sim \mathrm{Exponential}\left(100\right),$$
$${{{{{\rm{\xi }}}}}}\sim {{{{{\mathcal{N}}}}}}\left(0,1\right)$$
(21)

Infection study model combination

To combine the prevalence studies, each study was matched temporally over the same sampling periods, using test results from the ONS study matched to REACT round sampling dates. We weight the model so that the two samples are assigned weights by adjusting for the relative sample sizes. This weighting method follows the approach of Balcome, et al.72 adapting the work of Haddad et al.73.

Here we let \({D}_{O}\) and \({I}_{O}\) denote \({D}_{s}\left(k\right)\) and \({I}_{s}\left({{{{{\rm{k}}}}}}\right)\), respectively, for the ONS infection survey study. In this vein, let \({D}_{R}\) and \({I}_{R}\) denote \({D}_{s}\left(k\right)\) and \({I}_{s}\left({{{{{\rm{k}}}}}}\right)\) for the REACT study. Letting \(R\) denote \({R}_{s}\left(k\right)\) then the posterior distribution for this event probability can be modelled using a Beta distribution, i.e.

$$R{{{{{\mathcal{\sim }}}}}}{{{{{\mathcal{B}}}}}}{{{{{\rm{eta}}}}}}\left(D,I-D\right),$$
(22)

where \(I\) is the number of infections, and \(D\) is the corresponding number of events. The aim of this combination method is to obtain a weighting factor \(\hat{\alpha}\) such that \(D = D_{O} + \hat{\alpha} D_{R}\) and \(I = I_{O} + \hat{\alpha} I_{R}\).

We weight the two studies based on their relative sample sizes, so that when the sample sizes are equal, both studies are assigned equal weight, and otherwise the larger study is assigned greater weight. That is, we set \(\hat{\alpha }=\frac{N_{R}}{N_{O}}\), where \(N_{O}\) is the sample size of the ONS study and \(N_{R}\) is the sample size of the REACT study. Including \(\hat{\alpha}\) in the posterior distribution for the event probability, we obtain

$$R{{{{{\mathcal{\sim }}}}}}{{{{{\mathcal{B}}}}}}{{{{{\rm{eta}}}}}}\left({D}_{O}+\hat{\alpha }{D}_{R},{I}_{O}-{D}_{O}+\hat{\alpha }\left({I}_{R}-{D}_{R}\right)\right)$$
(23)

As in the previous section, when calculating the clinical outcome rate for a single study, we provided parametric summaries of the posteriors of \(I,D\) terms to the model as inputs. We also place the same smoothing prior on \(R\) as in the previous section.
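The weighting in equation (23) can be sketched as below. This is an illustration of the Beta parameters only, with hypothetical function and argument names, and it omits the parametric posterior summaries and smoothing prior used in the full model.

```python
def combined_beta_parameters(d_ons: float, i_ons: float,
                             d_react: float, i_react: float,
                             n_ons: float, n_react: float) -> tuple[float, float]:
    """Parameters of the Beta distribution in equation (23)."""
    alpha_hat = n_react / n_ons  # weight given to REACT relative to ONS
    a = d_ons + alpha_hat * d_react
    b = (i_ons - d_ons) + alpha_hat * (i_react - d_react)
    return a, b
```

When the two sample sizes are equal, \(\hat{\alpha} = 1\) and the combined parameters reduce to simple sums of the outcome and non-outcome counts from the two studies.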

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.