Background

In settings affected by crises due to armed conflict, community violence, displacement and/or food insecurity, acute malnutrition is a prominent public health threat that, at the individual level, presents a short-term mortality risk, exacerbates endemic and epidemic infectious diseases and worsens long-term developmental outcomes. Acute malnutrition prevalence among children is also a key summative indicator of crisis severity, as it reflects the wider situation of food security, livelihoods and the public health and social environment [1]. For the purpose of this paper, and in accordance with current UNICEF guidance, we refer to acute malnutrition (also commonly known as wasting) as the occurrence of two partially overlapping presentations: marasmus, characterised by recent, severe weight loss, and the rarer but more lethal oedematous form (kwashiorkor). Anthropometric indices including weight-for-height or -length, mid-upper arm circumference (MUAC) and the presence of bilateral pitting oedema may be summarised as continuous indicators (e.g. the weight-for-height/length z-score, WHZ, which expresses a child's deviation from a well-nourished reference population) or dichotomised at thresholds to classify children as severely or moderately acutely malnourished (SAM, MAM) and, at the population level, to compute prevalence estimates [2]. Such information helps to assess progress towards national and global targets, identify an appropriate package of food security and nutritional services, estimate the resources needed (e.g. treatment caseload), monitor the performance of services and detect changes in crisis severity as part of early warning systems such as the Integrated Food Security Phase Classification (IPC) [3,4,5].
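As an illustration of how a continuous index such as WHZ is derived and then dichotomised, the sketch below applies the Box-Cox (LMS) transformation used by the WHO 2006 growth standards, z = ((y/M)^L − 1)/(L·S), where L, M and S are the reference power, median and coefficient of variation at a given height or length. The L, M and S values in the example are hypothetical placeholders, not actual WHO reference values, and the sketch omits the additional adjustment WHO applies to weight-based z-scores beyond ±3 SD; in practice, dedicated software such as the anthro R package should be used.

```python
import math
from typing import Optional


def lms_zscore(y: float, L: float, M: float, S: float) -> float:
    """Z-score of measurement y given reference L (power), M (median), S (coefficient of variation)."""
    if L == 0:
        # Limiting case of the Box-Cox transform
        return math.log(y / M) / S
    return ((y / M) ** L - 1.0) / (L * S)


def wasting_category(whz: Optional[float], oedema: bool) -> str:
    """Dichotomise with the usual cut-offs: SAM if oedema or WHZ < -3; MAM if -3 <= WHZ < -2."""
    if oedema:
        return "SAM"
    if whz is None:
        return "unclassified"
    if whz < -3:
        return "SAM"
    if whz < -2:
        return "MAM"
    return "not wasted"


# HYPOTHETICAL reference parameters for a single height: NOT actual WHO values
L_HYP, M_HYP, S_HYP = -0.35, 10.0, 0.085
z = lms_zscore(8.0, L_HYP, M_HYP, S_HYP)  # a child weighing 8.0 kg at that height
print(round(z, 2), wasting_category(z, oedema=False))
```

The same reference-based z-score, once computed for every child, can be averaged (mean WHZ) or thresholded (SAM/MAM prevalence), which are the two kinds of outcome modelled in this paper.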

Cross-sectional anthropometric surveys among children 6 to 59 months old are an important component of nutritional surveillance in crisis settings, along with facility-based and programmatic data [6]. Over the past decade, considerable progress has been made to standardise the methods and analysis of these surveys. In particular, the Standardised Monitoring and Assessment of Relief and Transitions (SMART) project [7] provides generic study protocols and aids for survey design, training and quality control, as well as the bespoke Emergency Nutrition Assessment (ENA) software for sample selection, data entry and analysis. SMART surveys, usually implemented at a small geographic scale (e.g. districts or individual camps), are the most common population-based method to measure malnutrition burden in humanitarian response. However, SMART surveys are somewhat burdensome in terms of human and financial resources, require several weeks to plan, implement and report on, and may have limited geographic reach due to insecurity or other access constraints; the resulting information may therefore be biased, untimely and/or insufficiently granular. In other words, surveys alone may not adequately support early detection of deteriorating situations and efficient resource allocation [8]. More recently, COVID-19 related restrictions temporarily curtailed SMART survey implementation, just as the pandemic was expected to contribute to a projected doubling in the global population facing food insecurity crisis conditions and, consequently, a substantial increase in acute malnutrition burden [9].

To complement small-scale nutrition surveys and other surveillance data, reduce the burden of repeated surveys and generate timely information on a more regular basis at an operationally useful geographical resolution, we explored the performance of predictive statistical models of acute malnutrition burden in Somalia and South Sudan, two crisis-affected countries where service access constraints, food insecurity and malnutrition are all prominent.

Methods

Study design

We used a combination of existing datasets collected for programmatic purposes by humanitarian and government actors (see below) to develop and evaluate country-specific models to predict various anthropometric indicators at the resolution of one month and a single administrative level 2 unit (district in Somalia, county in South Sudan), hereafter referred to as a ‘stratum’.

Drawing from an a priori causal framework of factors leading to acute malnutrition (Additional file 1, Figure S5), we identified potential predictor variables collected at the desired resolution and merged these with individual child-level data from SMART surveys designed to be representative of single strata. We fitted various candidate models to a training data subset and evaluated their predictive accuracy on a validation data subset and through cross-validation.

Study population and timeframe

For Somalia (including Somaliland and Puntland), we sourced predictor and anthropometric survey data from January 2014 to December 2018 inclusive. During this period, Somalia’s population rose from about 12.8 M to 14.5 M [10]. Surveys were done in 22 (29%) of Somalia’s 75 districts. For South Sudan, the analysis spanned January 2015 to April 2018, and featured surveys from 63 (80%) of the country’s 79 counties, as per 2013 administrative borders. South Sudan’s population declined from 10.2 M to 9.7 M during the period, reflecting refugee movements to neighbouring countries [11].

Data sources

Anthropometric surveys

We accessed reports and raw datasets of 177 SMART surveys from South Sudan (two were excluded due to very unusual values, leaving 175 analysis-eligible), and 167 from Somalia (82 were excluded: 76, mainly done before 2016, were representative of livelihood zones rather than districts, and thus could not be coupled with predictor data; five appeared to have followed a non-representative sampling design; and one had no available dataset, leaving 85 analysis-eligible). For each survey, we inspected the report to identify any possible sources of bias and, in particular, any reported restriction of the effective sampling frame due to insecurity or inaccessibility (e.g. if a report stated that two out of 12 boma, South Sudan’s administrative level 3 unit, could not be included in the sample, we approximated the sampling coverage as 10/12 ≈ 83%). We also rescaled the ENA software-reported quality score for the survey (a composite of several indicators including the proportion of outlier values, digit preference and properties of the distribution of observed values, ranging from 0% = best to 50% = worst [12]) to a 0–100% range, where 100% = best. We reanalysed all surveys by converting the raw anthropometric readings (weight, height or length, age, MUAC) into z-score indices as per the World Health Organization 2006 child growth standards using the anthro package in R, flagging and excluding all observations with missing values, z-scores below −5 or above +5, and/or ages outside the eligible range (6–59 months). Lastly, we classified all children as having severe acute malnutrition (SAM) or global acute malnutrition (GAM) according to two alternative definitions: (i) bilateral oedema and/or weight-for-height z-score (WHZ) < −3 (SAM) or < −2 (GAM); (ii) bilateral oedema and/or MUAC < 115 mm (SAM) or < 125 mm (GAM) [13]. 
We fitted generalised linear models (binomial for SAM and GAM, Gaussian otherwise) with standard errors adjusted for the cluster design to verify concordance with the point estimates and 95% confidence intervals (CI) contained in the survey reports.
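The flagging and classification rules above, and a cluster-adjusted prevalence estimate, can be sketched as follows. This is a minimal illustration under stated assumptions, not the code we used: the `Child` record and all function names are hypothetical; the flags assume fixed WHO-style limits on the z-score itself; oedematous children are classified as SAM even when their weight-based index is missing (such z-scores are often left undefined for oedematous children); and the interval is a simple Wald interval on the prevalence scale, using the linearisation (sandwich) variance that an intercept-only GLM with cluster-robust standard errors would give, with a G/(G − 1) small-sample factor. Intervals built on the logit scale, or other small-sample corrections, would give slightly different bounds.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Child:
    age_months: float          # age in months
    whz: Optional[float]       # weight-for-height/length z-score (None if missing)
    muac_mm: Optional[float]   # mid-upper arm circumference in mm (None if missing)
    oedema: bool               # bilateral pitting oedema


def eligible(c: Child, index: str) -> bool:
    """Apply age-range and plausibility flags for one definition ('whz' or 'muac')."""
    if not 6 <= c.age_months < 60:      # 6 to 59 completed months
        return False
    if c.oedema:                        # assumption: oedema alone classifies the child
        return True
    if index == "whz":
        return c.whz is not None and -5 <= c.whz <= 5
    return c.muac_mm is not None


def classify(c: Child, index: str) -> tuple[bool, bool]:
    """Return (SAM, GAM) for an eligible child under the chosen definition."""
    if c.oedema:
        return True, True
    if index == "whz":
        return c.whz < -3, c.whz < -2
    return c.muac_mm < 115, c.muac_mm < 125


def prevalence(children: list[Child], index: str) -> tuple[float, float]:
    """SAM and GAM prevalence among non-flagged children."""
    kept = [c for c in children if eligible(c, index)]
    if not kept:
        raise ValueError("no eligible children")
    flags = [classify(c, index) for c in kept]
    sam = sum(s for s, _ in flags) / len(kept)
    gam = sum(g for _, g in flags) / len(kept)
    return sam, gam


def cluster_ci(cases: list[int], sizes: list[int], z: float = 1.96) -> tuple[float, float, float]:
    """Prevalence with a cluster-robust Wald CI; cases[c] of sizes[c] children in cluster c."""
    g, n = len(sizes), sum(sizes)
    if g < 2:
        raise ValueError("need at least two clusters")
    p = sum(cases) / n
    # Squared cluster totals of residuals (y_i - p): the 'meat' of the sandwich estimator
    meat = sum((a - p * m) ** 2 for a, m in zip(cases, sizes))
    se = math.sqrt(g / (g - 1) * meat) / n
    return p, max(0.0, p - z * se), min(1.0, p + z * se)
```

When clusters differ markedly in prevalence, the cluster-adjusted interval widens relative to one that treats children as independent, which is why the survey reports' CIs cannot be reproduced without accounting for the design.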

Predictors

We developed a causal framework of acute malnutrition (Additional file 1, Figure S5) based on existing evidence and plausibility reasoning, and used it to identify factors potentially predicting the outcomes of interest. We searched for candidate predictor data representing these factors online and through contacts with humanitarian actors in both Somalia and South Sudan. The main desired characteristics were that datasets be disaggregated by stratum and month, and that they be generated routinely for programmatic purposes, i.e. realistically available without further primary data collection. Most datasets had already been sourced as part of similar projects to retrospectively estimate mortality in both countries [10, 11]. Candidate predictors for Somalia and South Sudan are detailed in Tables 1 and 2, respectively. Each predictor dataset was cleaned to remove obvious errors. We excluded predictors that were missing for ≥ 30% of strata or ≥ 30% of months. Remaining completeness problems were resolved through interpolation (humanitarian presence), manual imputation (a missing market data point was attributed a weighted average of the value from the geographically nearest market and the mean of all other non-missing markets, with weights of 0.7 and 0.3 respectively) and automatic imputation using the mice R package [14] (water price, SAM and MAM treatment quality). To reduce stochastic noise in the time series, we computed three-month rolling means for all time-varying predictors and applied moderate local spline smoothing to terms of trade and market price variables. Where appropriate, we computed per-population rates using stratum-month population figures previously estimated as part of mortality estimation projects for each country. 
Briefly, these combine available base estimates (census projections in South Sudan; quality-weighted averages of four alternative sources in Somalia), natural growth assumptions and data on refugee as well as internal displacement to and from each stratum, by month.
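A minimal sketch of the manual market-price imputation and the three-month smoothing described above. Two interpretive assumptions are ours: ‘all other non-missing markets’ is taken to exclude the nearest market already used, and the rolling mean is trailing (current and two preceding months), shortened at the start of a series. Function names and the distance input are hypothetical.

```python
from statistics import mean


def impute_missing_prices(prices, distances, w_nearest=0.7, w_others=0.3):
    """Fill missing prices for one commodity in one month.

    prices: market -> price, or None if missing
    distances: missing market -> {other market -> distance} (hypothetical input)
    """
    observed = {m: p for m, p in prices.items() if p is not None}
    filled = dict(prices)
    for market, price in prices.items():
        if price is not None:
            continue
        if not observed:
            raise ValueError("no observed markets to impute from")
        # Geographically nearest market with a non-missing value
        nearest = min(observed, key=lambda o: distances[market][o])
        # Assumption: 'all other' markets exclude the nearest one
        others = [v for o, v in observed.items() if o != nearest]
        if others:
            filled[market] = w_nearest * observed[nearest] + w_others * mean(others)
        else:
            filled[market] = observed[nearest]
    return filled


def trailing_mean(series, window=3):
    """Mean of the current and preceding (window - 1) months; shorter at the series start."""
    out = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out
```

For example, a missing price at market A, whose nearest reporting market B costs 10 while markets C and D cost 20 and 40, would be imputed as 0.7 × 10 + 0.3 × 30 = 16.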

Table 1 Candidate predictor datasets, Somalia
Table 2 Candidate predictor datasets, South Sudan

While for both countries data on food security and nutritional therapeutic services were available (Tables 1 and 2) and moderately predictive (data not shown), we ultimately decided to exclude them as candidate predictors for two reasons: (i) we considered that improved prediction could plausibly result in better targeting of these humanitarian services, which in turn would result in improved nutrition, a reverse-causal effect whose future size the model might fail to predict; and (ii) we assumed that end-users would benefit from a model that could be used to predict malnutrition burden even where none of these services were available, e.g. due to access constraints.

Predictive models

We explored two prediction approaches, as follows.

Generalised linear modelling

We first split the data by period into a training set (approximately the chronologically first 70% of the data) and a ‘holdout’ (i.e. validation) set (the most recent 30%). For each anthropometric indicator, we fitted generalised linear models (GLM) to individual child observations in the training dataset, with robust standard errors to account for the cluster sampling design of most surveys, a quasi-binomial distribution for binary outcomes (SAM, GAM) and a Gaussian distribution for continuous outcomes (WHZ, MUAC), which we did not transform as they were approximately normally distributed. We specified model weights as the product of survey quality score and survey sample coverage.
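The temporal split and survey weights could be sketched as below. We assume here that the 70% share is counted over child observations with whole months kept together, that the ENA quality score is clipped to its nominal 0–50% range before rescaling, and that both weight components are expressed as fractions; these choices and the function names are illustrative, not taken from our code.

```python
from collections import Counter


def temporal_cutoff(months, train_frac=0.7):
    """Last month of the training period.

    months: one 'YYYY-MM' string per observation; observations up to and including
    the returned month form the training set, later ones the holdout set.
    """
    if not months:
        raise ValueError("no observations")
    counts = Counter(months)
    cumulative = 0
    for month in sorted(counts):
        cumulative += counts[month]
        if cumulative / len(months) >= train_frac:
            return month


def rescaled_quality(ena_penalty_pct):
    """Map the ENA score (0% = best, 50% = worst) to 0-100% (100% = best)."""
    clipped = min(max(ena_penalty_pct, 0.0), 50.0)
    return 100.0 * (1.0 - clipped / 50.0)


def survey_weight(ena_penalty_pct, units_sampled, units_total):
    """Model weight: rescaled quality (as a fraction) x approximate sampling coverage."""
    return rescaled_quality(ena_penalty_pct) / 100.0 * units_sampled / units_total
```

For instance, a survey with an ENA score of 10% that could reach 10 of 12 boma would receive a weight of 0.8 × 10/12 ≈ 0.67.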

After visual inspection, we categorised continuous predictors, and selected categorical versus continuous versions of these based on linearity of the association and on the smallest possible Chi-square (binary outcomes) or F-test (continuous outcomes) p-value testing whether the univariate model provided a better fit than a null model. We also used this p-value to select among candidate lags for each predictor; however, we modelled climate variables (rainfall, Normalised Difference Vegetation Index or NDVI) either as the means of each of the two trimesters (three-month periods) or as the mean over the semester (six months) prior to each survey observation. We then fitted models consisting of all possible combinations of predictors, and shortlisted the best 10% based on predictive accuracy (lowest mean square error, MSE) relative to observations in the holdout dataset. To compare predictions with observations, we first aggregated all individual-child predictions yielded by the models to the stratum-month level (as a mean SAM or GAM prevalence, or the mean of continuous anthropometric outcomes, in that stratum-month).
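The aggregation and all-subsets screening steps can be sketched as follows. The `score` argument is a hypothetical stand-in for fitting a GLM on the training data and returning its stratum-month holdout MSE; the toy penalty in the usage example only demonstrates the ranking logic. The number of subsets grows as 2^k − 1 with k candidate predictors, so exhaustive search is only practical for modest k.

```python
import math
from collections import defaultdict
from itertools import combinations
from statistics import mean


def aggregate(keys, values):
    """Mean of individual-child predictions (or outcomes) per stratum-month key."""
    groups = defaultdict(list)
    for k, v in zip(keys, values):
        groups[k].append(v)
    return {k: mean(v) for k, v in groups.items()}


def mse(pred, obs):
    """Mean square error over stratum-months present in both dictionaries."""
    common = sorted(set(pred) & set(obs))
    return mean((pred[k] - obs[k]) ** 2 for k in common)


def all_subsets(predictors):
    """Every non-empty combination of candidate predictors."""
    for r in range(1, len(predictors) + 1):
        yield from combinations(predictors, r)


def shortlist(predictors, score, top_frac=0.1):
    """Keep the best top_frac of predictor subsets by score (lower is better)."""
    scored = sorted(((score(s), s) for s in all_subsets(predictors)), key=lambda t: t[0])
    n_keep = max(1, math.ceil(top_frac * len(scored)))
    return scored[:n_keep]


# Toy usage: a made-up additive penalty instead of a real holdout MSE
penalty = {"a": 1, "b": 2, "c": 3, "d": 4}
top = shortlist(list(penalty), lambda s: sum(penalty[p] for p in s))
```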

We manually selected the best fixed effects model among these based on relative accuracy on holdout data, accuracy on external data simulated through leave-one-out cross-validation (LOOCV) [18], the plausibility of observed associations, and model parsimony (while the latter characteristic is relatively unimportant for prediction, in practice we wished to avoid users of the model having to collect a large amount of predictor data). Lastly, we explored plausible two-way interactions.
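Leave-one-out cross-validation refits the model n times, each time omitting one unit and predicting it from the remaining data. The sketch below uses a one-predictor least-squares line purely for illustration and assumes the unit left out is a single stratum-month; it is not our GLM code, and all names are hypothetical. For ordinary least squares, LOOCV error can never fall below in-sample error, which the usage check exploits.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns a prediction function."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def loocv_predictions(xs, ys, fit=fit_line):
    """Predict each unit from a model fitted to all other units."""
    preds = []
    for i in range(len(xs)):
        model = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(model(xs[i]))
    return preds


def mse(preds, obs):
    return mean((p - o) ** 2 for p, o in zip(preds, obs))
```

Comparing `mse(loocv_predictions(xs, ys), ys)` with the in-sample error gives a rough sense of how much a model overfits its training data, which is the purpose LOOCV served in our model selection.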

We also fitted mixed models (with stratum as a random effect, given that in both countries surveys were repeated in many districts / counties). These, however, offered inconsistent accuracy advantages over fixed effects models on either cross-validation or holdout datasets. Furthermore, we assumed that end users would be most interested in predicting malnutrition prevalence in hard-to-survey districts / counties, i.e. where no random effect could be estimated a priori. For these reasons, we discarded mixed models altogether.

Machine learning

After splitting the data as above, we used the ranger package [19] to grow random forest (RF) regression models on the training dataset, aggregated at stratum-month level. This approach makes minimal assumptions about data structure. Briefly, it grows a large ensemble of decision ‘trees’, each fitted to a bootstrap sample of the data: every node splits the data at a particular value of one predictor variable, with branches representing the resulting subsets, and the ‘depth’ of a tree is the number of successive splits from its root to its terminal nodes. Randomness enters through the bootstrap sample drawn for each tree and through the random subset of predictors considered as split candidates at each node. Each tree predicts the mean outcome of the terminal node into which an observation falls, and RF averages these predictions across the ensemble; observations left out of a tree's bootstrap sample (‘out-of-bag’) can be used to estimate its accuracy. We grew RFs with 1000 trees, using all candidate predictors as above, and computed prediction CIs using a jack-knife estimator [20].
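To make these mechanics concrete, below is a toy pure-Python random forest regressor: each tree is grown on a bootstrap sample, each node considers a random subset of `mtry` predictors and chooses the split that minimises the sum of squared errors, and the forest averages tree predictions. It is an illustrative re-implementation, not ranger: trees are depth-limited for brevity, there is no out-of-bag error or variable importance, and the jack-knife CIs are omitted. All names are hypothetical.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(X, y, features):
    """Best (sse, feature, threshold) among candidate features, or None if no split exists."""
    best = None
    for f in features:
        levels = sorted({row[f] for row in X})
        for lo, hi in zip(levels, levels[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def grow_tree(X, y, depth, mtry, rng):
    """Return a leaf value (mean outcome) or a node tuple (feature, threshold, left, right)."""
    if depth == 0 or len(y) < 2 or len(set(y)) == 1:
        return mean(y)
    features = rng.sample(range(len(X[0])), mtry)   # random predictor subset at this node
    split = best_split(X, y, features)
    if split is None:
        return mean(y)
    _, f, t = split
    go_left = [row[f] <= t for row in X]
    left = ([r for r, g in zip(X, go_left) if g], [v for v, g in zip(y, go_left) if g])
    right = ([r for r, g in zip(X, go_left) if not g], [v for v, g in zip(y, go_left) if not g])
    return (f, t, grow_tree(*left, depth - 1, mtry, rng), grow_tree(*right, depth - 1, mtry, rng))


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def grow_forest(X, y, n_trees=100, depth=3, mtry=1, seed=0):
    rng = random.Random(seed)
    n = len(y)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # bootstrap sample of rows
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], depth, mtry, rng))
    return forest


def predict_forest(forest, row):
    """Ensemble prediction: average over trees."""
    return mean(predict_tree(tree, row) for tree in forest)
```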

Performance evaluation

For both the GLM and RF approaches, we present various metrics of predictive accuracy. For estimation: (i) effective coverage, defined here as the proportion of stratum-months for which the predicted point estimate fell within the 95% or 80% CIs of the observed data; (ii) relative bias, defined as \(\frac{1}{n}\sum_{i=1}^{n}\frac{\widehat{y}_{i}-y_{i}}{y_{i}}\), where \(n\) is the number of stratum-months, \(\widehat{y}_{i}\) the prediction and \(y_{i}\) the observation for stratum-month \(i\); and (iii) relative precision, namely the mean ratio of the one-sided width of the predicted stratum-month 95% CI (i.e. the distance between the point estimate and a CI bound) to the point estimate. For classification: (iv) sensitivity and (v) specificity of predictions against SAM or GAM prevalence thresholds commonly used in humanitarian response, adopting observed point estimates as the gold standard. While it is recommended to avoid over-reliance on thresholds and instead examine changes in malnutrition burden over time in light of contextual factors [6], in practice these arbitrary thresholds, introduced about two decades ago [21], are considered when the baseline is unclear to make initial decisions on the most appropriate package of nutritional and food security interventions (e.g. management of SAM only versus of both SAM and MAM; targeted versus ‘blanket’ (generalised) food distributions or cash transfers).
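The accuracy metrics above could be computed as in this sketch. We assume that a prevalence at or above the threshold counts as ‘above threshold’; relative bias is undefined for stratum-months with an observed value of zero, which would need separate handling in practice. Function names are hypothetical.

```python
def coverage(preds, obs_lo, obs_hi):
    """Share of stratum-months whose predicted point estimate lies within the observed CI."""
    inside = sum(1 for p, lo, hi in zip(preds, obs_lo, obs_hi) if lo <= p <= hi)
    return inside / len(preds)


def relative_bias(preds, obs):
    """(1/n) * sum((yhat_i - y_i) / y_i); undefined if any observed value is zero."""
    return sum((p - o) / o for p, o in zip(preds, obs)) / len(preds)


def relative_precision(preds, pred_hi):
    """Mean ratio of the one-sided predicted CI width (upper bound minus estimate) to the estimate."""
    return sum((u - p) / p for p, u in zip(preds, pred_hi)) / len(preds)


def sens_spec(preds, obs, threshold):
    """Sensitivity and specificity against a prevalence threshold, observed values as gold standard."""
    pairs = [(p >= threshold, o >= threshold) for p, o in zip(preds, obs)]
    tp = sum(1 for p, o in pairs if p and o)
    fn = sum(1 for p, o in pairs if not p and o)
    tn = sum(1 for p, o in pairs if not p and not o)
    fp = sum(1 for p, o in pairs if p and not o)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec
```

With very few stratum-months above a threshold, as in our holdout data, sensitivity rests on small denominators and should be interpreted cautiously.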

For brevity we present only the best models for ‘now-casting’ (i.e. prediction of malnutrition based on data collected up to the present). We also explored models for forecasting malnutrition 3 months into the future (i.e. prediction based on data collected up to 3 months previously), but found that these had low performance (data not shown). All analysis was done using R software [22] through the RStudio [23] platform.

Results

Anthropometric survey patterns

Details of eligible surveys from Somalia are reported in Table 3 and Fig. 1. Most surveys were done in 2016 and 2018 and the majority relied on multi-stage cluster sampling, with a fairly constant sample size range over time. The highest SAM and GAM prevalence, but also the lowest quality scores, were noted in 2017, during a drought-triggered food insecurity crisis. In South Sudan, all surveys relied on cluster sampling, and there was minimal change in average SAM and GAM prevalence over time; quality scores and the proportion of flagged observations suggested higher survey quality in South Sudan than in Somalia (Table 4, Fig. 2).

Table 3 Characteristics of analysis-eligible anthropometric surveys from Somalia. Medians are reported unless noted. Numbers in parentheses indicate the interquartile range
Fig. 1

Trends in key survey indicators, Somalia. Each dot represents the point estimate of a single survey. Box plots indicate the median and inter-quartile range, and whiskers the 95% percentile interval

Table 4 Characteristics of analysis-eligible anthropometric surveys from South Sudan. Medians are reported unless noted. Numbers in parentheses indicate the interquartile range
Fig. 2

Trends in key survey indicators, South Sudan. Each dot represents the point estimate of a single survey. Box plots indicate the median and inter-quartile range, and whiskers the 95% percentile interval

Performance of Somalia models

GLM coefficients and performance metrics for Somalia are shown in Table 5: odds ratios (OR) < 1 and linear coefficients > 0 indicate a protective effect, and vice versa. One predictor (livelihood) consistently featured in the most predictive models (displaced and pastoralist livelihoods were generally associated with better anthropometric status than agriculturalist livelihoods). Armed conflict intensity, measles occurrence over the previous trimester, terms of trade, NDVI over the previous semester and average market price of water were useful predictors for some but not all anthropometric outcomes. Generally, predictive performance was low: models yielded mostly upward-biased predictions that fell within the observed survey 95% CIs for only 17% to 80% of stratum-months, depending on the outcome; while denominators were very small, only the model for GAM (WFH + oedema) reached a moderate combination of sensitivity and specificity to classify prevalence as per the 15% threshold. Graphs of predictions versus observations support this pattern; Fig. 3 shows results for SAM (WFH + oedema), while the remaining graphs are in Additional file 1.

Table 5 Performance of predictive generalised linear models in Somalia for real-time estimation, by acute malnutrition outcome
Fig. 3

GLM-predicted versus observed SAM (WFH + oedema) prevalence, Somalia, by district-month, on training data, LOOCV and holdout data. Shaded channels indicate an absolute deviance of predictions of up to ±1% (darkest shade), ±2% and ±3% (lightest shade). Vertical dotted lines denote commonly used SAM prevalence thresholds

RF models had similar performance to the GLM approach. For GAM (WFH + oedema: binary outcome), relative bias, relative precision and 95% CI coverage were +10.1% and +31.6%, ±23.0% and ±17.7%, and 59.6% and 56.7% on LOOCV and holdout data, respectively, with a sensitivity and specificity on LOOCV of 72.0% and 59.1% for the 15% prevalence threshold. The most important variables for prediction were measles incidence, NDVI, terms of trade and water price (Additional file 1). For WHZ (continuous outcome), relative bias, relative precision and 95% CI coverage were +7.1% and +29.5%, ±19.1% and ±13.1%, and 57.4% and 30.0% on LOOCV and holdout data, respectively (Additional file 1).

Performance of South Sudan models

Table 6 shows GLM predictions for South Sudan. Here, the most significant associations were with livelihood type, total rainfall and terms of trade. Predictive performance was also low (Fig. 4), with coverage no better than 82% across all outcomes and no instance of high sensitivity and specificity for classification.

Table 6 Performance of predictive generalised linear models in South Sudan, by acute malnutrition outcome
Fig. 4

GLM-predicted versus observed SAM (WFH + oedema) prevalence, South Sudan, by county-month, on training data, LOOCV and holdout data. Shaded channels indicate an absolute deviance of predictions of up to ±1% (darkest shade), ±2% and ±3% (lightest shade). Vertical dotted lines denote commonly used SAM prevalence thresholds

RF models had far better fit to the training data than GLMs, but performed similarly on cross-validation and holdout data. The most important variables were livelihood, terms of trade, uptake of measles vaccination and total rainfall (Additional file 1).

Discussion

In this study we combined previously collected anthropometric household survey data with a range of population-level predictor datasets quantifying factors hypothesised to be causally associated with acute malnutrition burden in crisis settings, to explore whether key quantities such as SAM or GAM prevalence could be estimated through prediction, as a complement to ground surveys. The resulting predictive models, based on either GLM or machine learning approaches, had disappointing performance in both Somalia and South Sudan across several anthropometric outcomes. Generally, predictive accuracy was better for outcomes based on WFH than on MUAC, but even for the former our models would not, in our opinion, provide actionable information.

Models to predict acute malnutrition risk at the individual or household level exist [24, 25]. While we did not search the literature systematically due to insufficient resources, we are aware of only two other population-level predictive studies. Osgood-Zimmerman et al. [26] produced gridded maps of various anthropometric indicators for all of Sub-Saharan Africa based on periodic countrywide surveys (e.g. Demographic and Health Surveys) and > 20 geospatial remotely sensed or previously estimated predictors; Mude et al. [27] predicted MUAC across time and space in northern Kenya with reasonable accuracy, based on village-level data collected for food security surveillance by the Arid Lands Resource Management Project, with predictors including the characteristics of the observed MUAC data themselves, cattle herd dynamics, extent of food aid, climate and season. At least one further research project is ongoing (https://www.actionagainsthunger.org/meriam). Bosco et al. [28] have used geospatial and remotely sensed covariates to map stunting prevalence, while Lentz et al. [29] have demonstrated the potential of a GLM-based approach for predicting food insecurity in Malawi. We have previously used the same datasets as in this study to develop reasonably predictive models of population-level death rate (a farther-downstream and thus potentially even more multifactorial outcome), albeit only for retrospective estimation [10, 11].

Given the above, we expected better predictive performance. It is plausible that additional data on factors causally associated with acute malnutrition, including infant and young child feeding practices, use of food security coping strategies, dietary diversity, access to water, sanitation and hygiene services, and health service utilisation, would have improved prediction: these data are sometimes generated in crisis settings through cross-sectional surveys, but to our knowledge are not typically available at the granular level required for our predictive problem. It is also likely that problems with available data quality constrained model accuracy. Non-differential error or misclassification arising from measurement problems (e.g. imprecise child anthropometric measurements) and data entry errors would generally reduce model goodness-of-fit and bias estimated associations towards the null: observed-versus-predicted graphs generally suggest ‘regression dilution’ [30], a phenomenon whereby predictions align around an underestimated linear slope, consistent with high noise in predictor variables. Differential error may also have affected model accuracy in various ways. For example, the predictive value of certain variables would have been dampened if anthropometric surveys had systematically underestimated acute malnutrition in the very locations where those predictors exhibited their most extreme values, as might be plausible for surveys done in very remote, insecure locations and thus constrained by time, local staff competency or the need to exclude unreachable communities from the effective sampling frame. We attempted to mitigate such bias by down-weighting lower-quality surveys and those with evidence of a restricted sampling frame, but models without these weights were not substantively different (data not shown). Pragmatically, these data quality limitations illustrate the challenges of prediction based on data not collected for research.
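Regression dilution can be illustrated by simulation. With classical measurement error u added to a predictor x, the least-squares slope is attenuated by the reliability ratio var(x)/(var(x) + var(u)). In this sketch, with made-up parameters var(x) = var(u) = 1 and a true slope of 2, the slope fitted on the noisy predictor should be close to 1; the function names and parameter values are illustrative only.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope of ys on xs."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def dilution_demo(n=5000, true_slope=2.0, noise_sd=1.0, seed=1):
    """Return slopes fitted on the true predictor and on a noisily measured version of it."""
    rng = random.Random(seed)
    x_true = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [true_slope * x + rng.gauss(0.0, 0.5) for x in x_true]
    x_noisy = [x + rng.gauss(0.0, noise_sd) for x in x_true]   # classical measurement error
    return ols_slope(x_true, y), ols_slope(x_noisy, y)
```

Because the attenuation depends only on the noise-to-signal ratio of the predictor, larger samples do not remove it; they merely estimate the attenuated slope more precisely.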

Our study aim was not to explore associations: as such, we focussed on accuracy and, for example, ignored significant effect modifications that did not improve prediction. Observed GLM associations and variable importance metrics for RF are nonetheless informative. Measles incidence and rainfall or NDVI had plausible associations with most outcomes in both countries, while water price had a very strong association in Somalia. Terms of trade, however, were important in South Sudan but marginal in Somalia. We saw inconsistent associations with forced displacement or armed conflict intensity, though these have been documented elsewhere [31], and, critically, rainfall abnormalities (as opposed to total precipitation) were not an important predictor in any model. A recent review of 90 studies concludes that acute malnutrition is understudied relative to chronic malnutrition (stunting); the review also finds that, while adequate rainfall during the growing season has been associated with less acute malnutrition, relationships with drought and armed conflict are inconclusive [32]. Indeed, the interplay of unusual climate events and armed conflict has proved challenging for food security prediction [33]. More generally, our and others’ findings underscore the context-specific complexity of causal pathways leading to acute malnutrition. They may also reflect differences in the noisiness, i.e. the accuracy, of different datasets.

Aside from data limitations, our analysis does not thoroughly explore available predictive methods. Among GLM-based approaches, it is possible that different transformations of outcomes or predictors, as well as methods to identify the most informative variables, such as lasso regression, could have yielded improved performance. Among machine learning methods, boosted regression trees could have reduced bias. We note however that these methods would need to yield very considerable improvements over those we used in order to produce useful predictions.

Conclusions

This analysis suggests that predictive modelling of acute malnutrition burden in crisis settings may not be an immediately viable alternative to ground surveys, at least in the countries studied. Given the potential benefit of such an approach [5], we nonetheless recommend further study, possibly in other settings, using larger datasets and more advanced machine learning methods (boosted regression trees, support vector machines, neural networks) and/or Bayesian frameworks. To facilitate such research, as well as other publicly beneficial analyses, humanitarian actors should systematically make key datasets publicly available in curated, accessible form [34]. Beyond anthropometric surveys, these include service data from different sectors (e.g. outpatient consultations; vaccination coverage; anthropometric screening data among outpatient children and pregnant women; admissions and exit outcomes for management of acute malnutrition; water availability and quality; coverage of excreta disposal; food security service beneficiaries and Kcal equivalents); market data (e.g. staple prices); morbidity and mortality surveillance data; cross-sectional surveys measuring food security, dietary diversity and infant and young child feeding practices; protection assessments; surveys of perceptions of affected populations; humanitarian presence and activity who-does-what-where matrices; and alternative data on insecurity (e.g. incidents monitored by the UN country team) or humanitarian access (e.g. road safety). A simple principle could be to publish all data except those whose public availability could place humanitarian actors or affected people at unacceptable risk; aggregation and anonymisation may mitigate such risks. Lastly, studies conducted to date to predict population-level nutrition burden should be synthesised to identify actionable evidence and guide further analysis.