Abstract
Basic aspects of the handling of fatty acid-data have remained largely underexposed. Of these, we aimed to address three statistical methodological issues, by quantitatively exemplifying their potential confounding impact on analytical outcomes: (1) presenting results as relative percentages or absolute concentrations, (2) handling of missing/non-detectable values, and (3) using structural indices for data-reduction. To this end, we reanalyzed an example dataset containing erythrocyte fatty acid-concentrations of 137 recurrently depressed patients and 73 controls. First, correlations between data presented as percentages and concentrations varied for different fatty acids, depending on their correlation with the total fatty acid-concentration. Second, multiple imputation of non-detects resulted in differences in significance compared to zero-substitution or omission of non-detects. Third, patients’ chain length-, unsaturation-, and peroxidation-indices were significantly lower compared to controls, which corresponded with patterns interpreted from individual fatty acid tests. In conclusion, results from our example dataset show that statistical methodological choices can have a significant influence on outcomes of fatty acid analysis, which emphasizes the relevance of: (1) hypothesis-based fatty acid-presentation (percentages or concentrations), (2) multiple imputation, preventing bias introduced by non-detects; and (3) the possibility of using (structural) indices to delineate fatty acid-patterns, thereby preventing multiple testing.
Introduction
Clinical fatty acid (FA)-research is increasingly performed, but basic statistical methodological issues have so far remained largely underexposed in the scientific literature. We aim to address three of these issues in the handling of FA-data, and provide quantitative examples of their potential confounding impact on results of FA-analyses, which may confuse the understanding of the roles FA play in (patho)physiology.
First, FA are reported in two ways: as absolute concentrations, or as percentages of the total FA-concentration. The implications of these different presentations have been scarcely addressed. Importantly, the few studies that have investigated this question showed significant differences between both approaches [1–3]. This is conceivable, because an increase in the percentage of one FA automatically results in the decrease in the relative percentage of another FA, even when its absolute concentration remains unchanged [4, 5]. Nevertheless, recent research still seems to opt rather randomly for either presentational method.
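The dependency between percentages can be made concrete with a small numerical sketch. The three FA and their concentrations below are hypothetical values chosen for illustration only; they are not taken from the example dataset.

```python
def to_percentages(concentrations):
    """Express each FA as a percentage of the total FA-concentration."""
    total = sum(concentrations.values())
    return {fa: 100.0 * c / total for fa, c in concentrations.items()}


# Hypothetical concentrations (arbitrary units), for illustration only.
before = {"16:0": 40.0, "18:0": 30.0, "20:4n-6": 30.0}
# Only 16:0 increases; the other two concentrations stay exactly the same.
after = {**before, "16:0": 70.0}

pct_before = to_percentages(before)
pct_after = to_percentages(after)

for fa in before:
    print(fa, before[fa], after[fa],
          round(pct_before[fa], 1), round(pct_after[fa], 1))
```

In this sketch the percentage of 18:0 falls from 30 % to about 23 % although its absolute concentration is unchanged, which is the effect described above.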
A second methodological issue is how to handle non-detectable FA-concentrations. In contrast to other research fields [6], FA-research thus far has not addressed this problem. Therefore, possible important analytical consequences remain uninvestigated, which may potentially cause biases in the interpretation of FA-data.
Third, because of the great number and variety of FA, the risk exists that multiple testing induces type-I errors or necessitates strict correction [7, 8]. A solution to this problem could be meaningful data-reduction, decreasing the number of tests needed. One possible way to achieve data-reduction might be the use of indices, delineating distinct FA-patterns by incorporating several FA-concentrations into one variable [9]. Examples of important patterns in FA-research are chain length, unsaturation, and peroxidizability of FA, because these characteristics modulate membrane fluidity and susceptibility to radical attack and are thereby conceivably involved in the pathogenesis of e.g. recurrent depression [10, 11]. Using indices, e.g. the unsaturation index (UI), chain length index (CLI) or peroxidation index (PI) [9], would allow testing of these more complex hypotheses on FA-patterns involving multiple FA, thereby obviating the need to interpret analyses of every individual FA to test a hypothesis. Thus far, most FA-research has not corrected for multiple testing [7], and has tested indices only in addition to the individual FA. This may have resulted in type-I errors, and thereby bias in the interpretation of FA-analyses.
In this paper, the conceivable confounding effects of these three statistical methodological issues are examined, by providing quantitative examples in a practical research setting, using an example dataset of FA-concentrations of recurrently depressed patients and healthy controls, described previously [10]. This was done on the basis of the following research questions: (1) what is the influence of presentation of results in percentages or concentrations, and how does this differ for different FA, (2) what is the influence of the approach used for missing/non-detectable FA-concentrations on the significance of outcome differences; and (3) what is the influence of the use of indices for data-reduction on outcome differences?
Materials and Methods
To investigate our research questions, we reanalyzed an example dataset consisting of washed erythrocyte FA-concentrations (pmol/10^6 erythrocytes) from 137 recurrently depressed patients and 73 age- and sex-matched controls, determined by capillary gas chromatography, described in more detail previously [10, 12, 13].
Percentages or Concentrations
To investigate the effects of presentational method, we expressed FA-concentrations both in concentrations (pmol/10^6 erythrocytes) and molecular percentages (individual FA’s concentrations divided by the total FA-concentration). Subsequently, to quantify the difference between the two presentational methods for each FA, we calculated the correlation between its presentation as a percentage or as a concentration using Pearson’s r (r absolute–percentual; Table 1). An r absolute–percentual of 1.00 (perfect correlation) indicates no difference between the two types of FA presentation, while an r absolute–percentual closer to zero indicates larger differences.
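A minimal sketch of this calculation is given below, assuming each subject is represented as a mapping from FA name to concentration. The function names and the toy subjects are hypothetical and only serve to illustrate how r absolute–percentual behaves; the analysis itself was run in the statistical software described later.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_absolute_percentual(subjects, fa):
    """Correlate one FA's concentration with its percentage across subjects."""
    conc = [s[fa] for s in subjects]
    pct = [100.0 * s[fa] / sum(s.values()) for s in subjects]
    return pearson_r(conc, pct)


# Hypothetical subjects whose total FA-concentration is constant:
# percentages are then a rescaling of concentrations, so r is 1.
constant_total = [{"A": 10.0, "B": 90.0},
                  {"A": 20.0, "B": 80.0},
                  {"A": 30.0, "B": 70.0}]

# Hypothetical subjects in which FA "A" drives the total:
# its percentage saturates, so r drops below 1.
a_drives_total = [{"A": 10.0, "B": 1.0},
                  {"A": 20.0, "B": 1.0},
                  {"A": 30.0, "B": 1.0}]

r_constant = r_absolute_percentual(constant_total, "A")
r_driving = r_absolute_percentual(a_drives_total, "A")
print(r_constant, r_driving)
```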
To learn how, for individual FA, presentation as concentrations or percentages results in differential biases, we investigated whether the difference between the two types of presentation (expressed as r absolute–percentual) depended on individual FA-characteristics. Therefore, we performed a second-level analysis exploring the relation between characteristics of the different individual FA and their observed r absolute–percentual. We first calculated each FA’s mean concentration (meanFA(i); Table 1). Subsequently, for each FA, we calculated the absolute value (non-negative) of the correlation between the specific FA-concentration and the total FA-concentration for an individual subject (|r|FA(i)-concentration–FA-total; Table 1). Finally, we determined the influence of these individual FA-characteristics (meanFA(i) and |r|FA(i)-concentration–FA-total) by entering these in a stepwise linear regression model as predicting variables with r(i)absolute–percentual (after Fisher r-to-Z transformation [14]) as dependent variable.
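For reference, the Fisher r-to-Z transformation maps a correlation r to z = 0.5 × ln((1 + r)/(1 − r)), which equals the inverse hyperbolic tangent of r. The sketch below only illustrates this transformation; the function name is hypothetical, and the stepwise regression itself is not reproduced here.

```python
import math


def fisher_z(r):
    """Fisher r-to-Z transformation: z = atanh(r) = 0.5 * ln((1 + r) / (1 - r)).

    Defined only for -1 < r < 1; math.atanh raises ValueError at r = +/-1,
    so correlations of exactly 1 would need separate handling (how such
    values were treated is not stated in the text).
    """
    return math.atanh(r)


# The transformation stretches values near +/-1, where the sampling
# distribution of r is most skewed.
for r in (0.30, 0.50, 0.90, 0.99):
    print(r, round(fisher_z(r), 3))
```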
Handling of Non-detectable Values
To examine the influence of the handling of non-detectable/missing values, we compared: (1) substituting non-detectable values with zero, and omitting missing values; (2) omitting both non-detectable and missing values; and (3) using multiple imputation (MI) to estimate both non-detectable and missing values, using the software package Amelia II [15]. Simulation research previously demonstrated that MI was able to provide highly valid estimations of non-measured values, while incorporating the uncertainty involved [6, 16]. MI has been used on missing FA-concentrations before [17, 18], but not on non-detectable FA-concentrations.
To impute non-detectable/missing values, we used information on sex, age, marital status, educational level, social class, Hamilton Depression Rating Scale score, weight, height, waist and hip circumference, smoking, and salivary cortisol and dehydroepiandrosterone sulphate, folic acid, vitamin B6 and B12, homocysteine, and all other measured FA-concentrations. In addition, for non-detectable values, we assigned range priors in Amelia II indicating that a non-detectable FA concentration must lie between 0.001 and the detection limit of that FA (99 % confidence).
We used differences in erythrocyte FA-concentrations between patients and controls as example outcomes, calculated with independent Student's t tests. We compared the results of these different approaches to handle non-detectable/missing values to demonstrate their impact.
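The two simpler approaches can be sketched as follows. This is an illustration under stated assumptions: non-detects are marked with None, the measured values are hypothetical, and multiple imputation with Amelia II is not reproduced here.

```python
from statistics import mean

ND = None  # assumed marker for a non-detectable value


def zero_substitute(values):
    """Approach 1: replace every non-detect with zero."""
    return [0.0 if v is ND else v for v in values]


def omit(values):
    """Approach 2: drop non-detects from the analysis."""
    return [v for v in values if v is not ND]


# Hypothetical concentrations of one FA in five subjects.
measured = [2.0, ND, 3.0, ND, 4.0]

mean_zero = mean(zero_substitute(measured))
mean_omit = mean(omit(measured))
n_omit = len(omit(measured))

print(mean_zero, mean_omit, n_omit)
```

Because a non-detect lies somewhere between zero and the detection limit, zero-substitution pulls the group mean down, whereas omission discards the lowest values, pushing the mean up and shrinking the sample.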
Calculation of Indices
To investigate the influence of the use of indices on outcome differences we compared two methods. First, we compared the 29 individual FA concentrations in our example dataset between patients and controls using Student's t tests and a Bonferroni correction. We interpreted the outcome differences to detect patterns of differences in chain length, unsaturation or peroxidizability between patients and controls.
As an alternative to the interpretation of these multiple individual FA-tests, we applied data-reduction using indices, which we compared between patients and controls using Student's t tests. We selected three indices specifically designed to delineate patterns in chain length, unsaturation or peroxidizability.
1. The chain length index (CLI), providing information about FA-chain length. We calculated the CLI by summing the products of each FA’s concentration and the number of carbon atoms in its carbon chain, and dividing this sum by the total FA-concentration;
2. The unsaturation index (UI), indicating the number of double bonds per FA. Calculated as follows: (1 × monoenoics + 2 × dienoics + 3 × trienoics + 4 × tetraenoics + 5 × pentaenoics + 6 × hexaenoics)/total FA-concentration;
3. The peroxidation index (PI), showing FA’s susceptibility to peroxidation. Calculated as follows: (0.025 × monoenoics + 1 × dienoics + 2 × trienoics + 4 × tetraenoics + 6 × pentaenoics + 8 × hexaenoics)/total FA-concentration.
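The three index definitions above can be sketched in a few lines. The sketch assumes FA names in the "carbons:double bonds" shorthand used in this paper (e.g. 22:6n-3); the function names and the three-FA example profile are hypothetical.

```python
# PI weights per number of double bonds, taken from the formula in the text.
PI_WEIGHTS = {0: 0.0, 1: 0.025, 2: 1.0, 3: 2.0, 4: 4.0, 5: 6.0, 6: 8.0}


def parse_fa(name):
    """Split e.g. '20:4n-6' into (carbon atoms, double bonds) = (20, 4)."""
    carbons, rest = name.split(":")
    double_bonds = rest.split("n")[0]
    return int(carbons), int(double_bonds)


def fa_indices(conc):
    """Return (CLI, UI, PI) for one subject's FA-concentrations."""
    total = sum(conc.values())
    cli = sum(c * parse_fa(fa)[0] for fa, c in conc.items()) / total
    ui = sum(c * parse_fa(fa)[1] for fa, c in conc.items()) / total
    pi = sum(c * PI_WEIGHTS[parse_fa(fa)[1]] for fa, c in conc.items()) / total
    return cli, ui, pi


# Hypothetical profile (arbitrary units), for illustration only.
profile = {"16:0": 50.0, "18:1n-9": 30.0, "22:6n-3": 20.0}
cli, ui, pi = fa_indices(profile)
print(cli, ui, pi)
```

For this profile the CLI is (50 × 16 + 30 × 18 + 20 × 22)/100 = 17.8, the UI is (30 × 1 + 20 × 6)/100 = 1.5, and the PI is (30 × 0.025 + 20 × 8)/100 = 1.6075.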
Subsequently, we compared the results of these index tests to the patterns that emerged from the interpretation of the differences between patients and controls in the individual FA. For this, we compared the index test results to the individual FA-tests on multiply imputed data, and also constructed the indices from imputed data. In this way, we prevented missing values in the original dataset causing many missing values among the indices, which would have reduced statistical power.
Statistical Software
We used PASW statistics 18.0 (SPSS, Inc., 2009, Chicago, IL, USA). MI was performed using Amelia II [15], available via the R software package [19].
Results
Correlation between Percentages and Concentrations
Table 1 shows the difference between percentages and concentrations (expressed as r absolute–percentual) for each FA. Correlations ranged from 0.30 for 18:0 to 1.00 for 16:1n-9.
In the second-level analysis, linear regression showed that meanFA(i) was associated with r(i)absolute–percentual (β = −0.685; t (207) = −4.882; P < 0.001). This indicates that results presented in percentages or concentrations differed more for FA with higher concentrations.
Furthermore, when |r|FA-concentration–FA-total was also included in the regression model, it had an independent negative influence on r absolute–percentual (β = −0.824; t (207) = −5.486; P < 0.001; Fig. 1). The influence of meanFA(i) on r absolute–percentual was no longer significant. This indicates that differences between results presented in percentages and concentrations were significantly greater for those FA that have a stronger correlation with the total FA-concentration, and that this influence explained the effect of high FA concentrations on differences between results presented in percentages or concentrations.
Handling of Non-detectable Values
In our example dataset, 21 patients and 8 controls had missing FA-results due to technical reasons. The non-detectable percentage ranged from 0 % for 16:0-24:0, 22:5n-3, 22:6n-3, 18:2n-6, 20:3n-6, 22:4n-6, 22:5n-6, 18:1n-7, 18:1n-9 and 24:1n-9, to 60.5 % for 22:2n-6. The mean non-detectable percentage was 11.1 %.
The impact of different methods to handle missing/non-detectable values on example outcomes is shown in Table 1. Compared to results obtained after MI, substitution of non-detectable values with zero resulted in different significance-levels for comparisons between patients and controls. Using zero substitution, the difference between patients and controls in 20:5n-3 was not significant, and differences in 20:3n-6 and 16:1n-9 were less significant. Other FA results were comparable, with lower concentrations for FA with non-detectable values, reflecting the expected bias toward zero.
When non-detectable values were omitted and not used in the analyses, the differences between patients and controls in 18:4n-3, 20:5n-3, 18:3n-6, 20:3n-6 were less or no longer significant, while significant differences in 20:1n-7 and 22:1n-9 emerged and differences in 22:2n-6 could not be tested, all compared to results obtained after MI.
Using Indices to Describe FA Patterns
The tests on the 29 individual FA after multiple imputation are listed in the right columns of Table 1. First, a Bonferroni correction for multiple comparisons was performed resulting in a corrected α of 0.05/29 = 0.0017. After this correction, differences between patients and controls for 20:5n-3, 20:3n-6, 16:1n-7 and 16:0 were no longer significant. Other differences remained significant, with lower concentrations in patients for 22:5n-3, 22:6n-3, 20:3n-6, 20:4n-6, 22:4n-6, 22:5n-6, 14:1n-5, 16:1n-9, 24:1n-9, 20:0, 22:0 and 24:0. Concentrations of 18:4n-3, 18:3n-6, 22:2n-6, 16:1n-7, and 16:0 were higher in patients compared to controls. In analogy to our previous interpretations [10], these results fitted with patterns of reduced chain length, unsaturation and peroxidation for FA of the patients.
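The Bonferroni threshold used here is simply the nominal α divided by the number of tests. A minimal sketch follows; the function names and the three p-values in the usage example are hypothetical and are not results from Table 1.

```python
def bonferroni_alpha(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def significant_after_bonferroni(p_values, alpha=0.05):
    """Flag which tests stay significant at the corrected threshold."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return {name: p < threshold for name, p in p_values.items()}


# Threshold for the 29 individual FA tests described in the text.
corrected = bonferroni_alpha(0.05, 29)
print(round(corrected, 4))  # 0.0017

# Hypothetical p-values for three tests (threshold 0.05 / 3).
flags = significant_after_bonferroni({"a": 0.001, "b": 0.02, "c": 0.04})
print(flags)
```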
After data-reduction using the CLI, UI, and PI, differences between patients and controls were calculated (Table 2). The mean FA values for patients were less unsaturated (P ∼ 1.2 × 10^−18; Cohen’s d = 2.35), shorter (P ∼ 7.1 × 10^−19; Cohen’s d = 1.46), and less peroxidizable (P ∼ 4.0 × 10^−15; Cohen’s d = 1.83).
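As a reminder of how such effect sizes are commonly obtained, the sketch below computes Cohen's d as the difference in group means divided by the pooled standard deviation. The text does not state which variant of d was used, so the pooled-SD form is an assumption, and the two groups of index values are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical index values for two small groups, for illustration only.
patients = [1.0, 2.0, 3.0]
controls = [2.0, 3.0, 4.0]

d = cohens_d(patients, controls)
print(d)
```

A negative d in this sketch means the first group (patients) has the lower mean; the values reported above are given as magnitudes.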
When comparing the index results to the results of the multiple individual FA-tests, pattern outcomes were similar, with reduced chain length, unsaturation and peroxidation for FA of patients. Using indices resulted in fewer tests, but provided no information on differences in individual FA concentrations between patients and controls.
Discussion
Our results indicate that: (1) presentation of FA in either percentages or concentrations yields different results, particularly for those FA with a stronger correlation with the total FA-concentration, (2) differences in the approach used for non-detectable/missing values influence significance-levels of outcomes of FA-analysis, and (3) the use of the CLI, UI and PI showed differences between patients and controls in FA-patterns, in agreement with interpretations from individual FA-tests.
Differences between data presented in concentrations and percentages imply that these methods are not simply interchangeable. Moreover, differences between percentages and concentrations depended on individual FA-characteristics (|r|FA-concentration to FA-total). This dependency could inflict differential biases in individual FA results. Therefore, our findings emphasize the importance of a hypothesis driven choice of which method to use. Percentages could be used as a measure of the relative importance of a FA set against the total FA concentration; while absolute concentrations could be used for the measurement of a FA itself, independent of the concentration of other FA [3–5, 20].
The appropriate method of presentation could theoretically differ for each research question [21]. For example, concentrations could be most useful to distinguish depressed patients from controls, while percentages might predict disease progression. Therefore, the appropriate presentation method may depend upon which presentation is more (patho)physiologically relevant to the research question under investigation. However, the dearth of research comparing both approaches so far may, at present, hamper the formation of a hypothesis about which method would be more (patho)physiologically relevant. If so, comparison of both methods of presentation could provide a guideline for future research.
Our results show that the way non-detectable/missing values are handled could potentially bias results, because significance levels of differences in example outcomes differed depending on which method was used. However, it should be noted that not only the significance of differences, but also their magnitude determines the bias introduced. Nevertheless, if non-detects occur, knowledge of the way they were handled, and discussion of any bias that may thereby be introduced, could prevent interpretation errors. Because other research fields already showed superiority of MI compared to other ways of handling non-detectable/missing values [6], this may soon be adopted as the preferred method to handle missing/non-detectable FA-concentrations as well.
By applying data reduction using indices—the CLI, UI and PI—we tested differences between patients and controls in FA-patterns. Index results were similar to the interpretation of the multiple tests on individual FA [10]. This suggests that indices could provide meaningful data-reduction in FA research. Furthermore, from a statistical viewpoint, the use of indices enabled us to test pattern hypotheses more efficiently by using only one outcome variable (CLI, UI or PI), instead of tests of many individual FA. This precluded the need for correction for multiple tests. In our example dataset this was not of specific analytical concern, because differences in individual FA-tests were large and mostly survived the Bonferroni correction. Nevertheless, this advantage may be beneficial in smaller samples or in diseased populations with smaller differences compared with controls. In addition, the indices facilitated quantitative testing of pattern hypotheses, in contrast to the qualitative interpretation of the individual FA tests. The disadvantage of integrating information on multiple FA-concentrations in one index, could be that it might undesirably simplify the underlying complexity of FA-metabolism. In such situations the relevance of an individual FA could be obscured, because differences in individual FA are not tested.
Whether indices should be used in FA-research seems to depend on the hypothesis under investigation. If FA are analyzed in order to test a pattern [e.g. membrane fluidity; unsaturation or peroxidizability; estimated enzyme activity; (inflammation regulating) FA ratios], indices could be used to first test this general pattern hypothesis. Subsequently, based on the index results, new specific hypotheses concerning selected individual FA could be tested. This would reduce the risk for type-I errors, or the need for strict correction for multiple testing [8]. A recent example of the possible usefulness of applying indices is the observation of bimodal distributions of FA unsaturation and chain length patterns in recurrently depressed patients [22]. However, if FA are analyzed to test a hypothesis concerning a specific FA (e.g. EPA), indices have no use, and should not be tested additionally since this would only increase the problem of multiple testing. Future studies are needed to further clarify the applicability of indices in FA-research.
Some additional limitations should be noted. The examples of the possible influences of the presented statistical methodological issues have been presented on the basis of only one dataset. However, although the size of the biasing effects may differ between different datasets, the basic principles of the issues addressed concern analysis of FA data in general. Second, our example dataset has a moderate sample size when compared to epidemiological studies. This could have influenced the stability of correlation coefficients, and therefore the results. Third, the data presented only concern these three statistical methodological issues, and do not investigate other important factors that may also influence results, e.g. chemical analytical methods, and the nature of the sample (tissue, cell type, lipid fraction, e.g. cholesteryl esters, triacylglycerol, phospholipids) [4]. Finally, because differences in outcome measures were large in our example dataset, the disadvantages of multiple testing—and thereby the advantages of data reduction—could not be clearly exemplified and remain to be further explored in different datasets.
Nevertheless, our study addresses recurrent basic issues in practical FA research. Using a second-level analysis we were able to quantitatively demonstrate the consequences of the various methods of presentation. In addition, we suggested a novel way to handle non-detectable FA-values, using MI. Finally, we showed, to our knowledge for the first time, that indices could be used to delineate differences in FA patterns between depressed patients and controls.
In conclusion, a hypothesis-based choice of the method of FA-presentation (percentages or concentrations) could prevent bias in future FA-research. If it is not clear which method is preferable a priori, comparison of both methods could guide subsequent investigations. Furthermore, MI might prevent bias potentially inducible by missing/non-detectable values. Finally, indices could assist theory based data-reduction, thereby preventing type-I errors associated with multiple testing. Awareness and cautious handling of these statistical methodological issues in future FA-research may further improve interpretation of FA-analyses, and thereby deepen the understanding of the roles FA play in health and disease.
References
Tuna N, Frankhauser S, Goetz FC (1968) Total serum fatty acids in diabetes: relative and absolute concentrations of individual fatty acids. Am J Med Sci 225:120–130
Schwertner HA, Mosser EL (1993) Comparison of lipid fatty acids on a concentration basis vs weight percentage basis in patients with and without coronary artery disease or diabetes. Clin Chem 39:659–663
Bradbury KE, Skeaff CM, Green TJ, Gray AR, Crowe FL (2010) The serum fatty acids myristic acid and linoleic acid are better predictors of serum cholesterol concentrations when measured as molecular percentages rather than as absolute concentrations. Am J Clin Nutr 91:398–405
Hodson L, Skeaff CM, Fielding BA (2008) Fatty acid composition of adipose tissue and blood in humans and its use as a biomarker of dietary intake. Prog Lipid Res 47:348–380
Chow CK (2009) Fatty acid composition of plasma phospholipids and risk of prostate cancer. Am J Clin Nutr 89:1946
Baccarelli A, Pfeiffer R, Consonni D, Pesatori AC, Bonzini M, Patterson DG, Bertazzi PA, Landi MT (2005) Handling of dioxin measurement data in the presence of non-detectable values: overview of available methods and their application in the Seveso chloracne study. Chemosphere 60:898–906
Appleton KM, Rogers PJ, Ness AR (2008) Is there a role for n-3 long-chain polyunsaturated fatty acids in the regulation of mood and behaviour? A review of the evidence to date from epidemiological studies, clinical studies and intervention trials. Nutr Res Rev 21:13–41
Feise RJ (2002) Do multiple outcome measures require p-value adjustment? BMC Med Res Methodol 2:8
Hulbert AJ, Pamplona R, Buffenstein R, Buttemer WA (2007) Life and death: metabolic rate, membrane composition, and life span of animals. Physiol Rev 87:1175–1213
Assies J, Pouwer F, Lok A, Mocking RJT, Bockting CL, Visser I, Abeling NG, Duran N, Schene AH (2010) Plasma and erythrocyte fatty acid patterns in patients with recurrent depression: a matched case-control study. PLoS one 14:(5)e10635
Piomelli D, Astarita G, Rapaka R (2007) A neuroscientist’s guide to lipidomics. Nat Rev Neurosci 8:743–754
Assies J, Lok A, Weverling GJ, Lieverse R, Visser I, Abeling NGGM, Duran M, Schene AH (2004) Fatty acids and homocysteine levels in patients with recurrent depression: an explorative pilot study. Prostaglandins Leukot Essent Fatty Acids 70:349–356
Dacremont G, Cocquyt G, Vincent G (1995) Measurements of very long-chain fatty acids, phytanic and pristanic acid in plasma and cultured fibroblasts by gas chromatography. J Inherit Metab Dis 18:77–83
Fisher RA (1915) Frequency distribution of the values of the correlation coefficient in samples of an indefinitely large population. Biometrika 10:507–521
Honaker J, Joseph A, King G, Scheve K, Singh N (2010) AMELIA II: a program for missing data. [Software] Retrieved from http://gking.harvard.edu/amelia/
Donders AR, van der Heijden GJMG, Stijnen T, Moons KGM (2006) Review: a gentle introduction to imputation of missing values. J Clin Epidemiol 59:1087–1091
Block RC, Harris WS, Reid KJ, Spertus JA (2008) Omega-6 and trans fatty acids in blood cell membranes: a risk factor for acute coronary syndromes? Am Heart J 156:1117–1123
Krabbendam L, Bakker E, Hornstra G, van Os J (2007) Relationship between DHA status at birth and child problem behaviour at 7 years of age. Prostaglandins Leukot Essent Fatty Acids 76:29–34
R Foundation for Statistical Computing (2010) R: a language and environment for statistical computing. Vienna, Austria
Schwertner HA (1994) Lipid fatty acids calculated on a concentration vs percentage basis. Am J Clin Nutr 59:1093
Crowe FL (2009) Reply to CK Chow. Am J Clin Nutr 89:1946–1947
Mocking RJ, Assies J, Koeter MW, Ruhé HG, Lok A, Schene AH (2012) Bimodal distribution of fatty acids in recurrent major depressive disorder. Biol Psychiatry 71:e3–e5
Acknowledgments
The authors would like to thank Dr. T.A. Eggelte for his stimulating critique.
Conflict of interest
All authors report no biomedical financial interests or potential conflicts of interest.
Open Access
This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
Cite this article
Mocking, R.J.T., Assies, J., Lok, A. et al. Statistical Methodological Issues in Handling of Fatty Acid Data: Percentage or Concentration, Imputation and Indices. Lipids 47, 541–547 (2012). https://doi.org/10.1007/s11745-012-3665-2
DOI: https://doi.org/10.1007/s11745-012-3665-2