Investigating sources of variability in metabolomic data in the EPIC study: the Principal Component Partial R-square (PC-PR2) method
Fages A., Ferrari P., Monni S., Dossus L., Floegel A., Mode N., Johansson M., Travis RC., Bamia C., Sánchez-Pérez MJ., Chiodini P., Boshuizen HC., Chadeau-Hyam M., Riboli E., Jenab M., Elena-Herrmann B.
© 2014, Springer Science+Business Media New York. The key goal of metabolomic studies is to identify relevant individual biomarkers or composite metabolic patterns associated with a particular disease status or pathophysiological condition. There are currently very few approaches for evaluating how much of the variability in metabolomic data is attributable to characteristics of individuals or to aspects of technical processing. To address this issue, a method was developed to identify and quantify the contribution of relevant sources of variation in metabolomic data prior to investigation of etiological hypotheses. The Principal Component Partial R-square (PC-PR2) method combines features of principal component analysis and of multivariable linear regression analysis. Within the European Prospective Investigation into Cancer and Nutrition (EPIC), metabolic profiles were determined by 1H NMR analysis on 807 serum samples originating from a nested liver cancer case–control study. PC-PR2 was used to quantify the variability of metabolomic profiles in terms of study subjects' age, sex, body mass index, country of origin, smoking status, diabetes and fasting status, as well as factors related to sample processing. PC-PR2 enables the evaluation of important sources of variation in metabolomic studies within large-scale epidemiological investigations.
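The combination of principal component analysis and multivariable linear regression described above can be illustrated with a minimal Python sketch. The abstract does not spell out the computational details, so the following are assumptions rather than the authors' exact procedure: components are retained until a cumulative variance threshold (here 80%) is reached, each retained component's scores are regressed on all covariates, a partial R2 is computed per covariate from the residual sums of squares of the full and reduced models, and the partial R2 values are averaged across components with weights proportional to the component eigenvalues. The function names `partial_r2` and `pc_pr2`, the parameter `var_threshold`, and the simulated data are hypothetical. Each column of `Z` is treated as a single numeric covariate; categorical factors such as country would need dummy coding and a joint test, which this sketch omits.

```python
import numpy as np

def partial_r2(y, Z, j):
    """Partial R2 of covariate column j for response y, adjusted for the other columns."""
    def rss(design):
        # Ordinary least squares with an intercept; returns the residual sum of squares.
        design = np.column_stack([np.ones(len(y)), design])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        resid = y - design @ beta
        return float(resid @ resid)
    rss_full = rss(Z)
    rss_reduced = rss(np.delete(Z, j, axis=1))
    return (rss_reduced - rss_full) / rss_reduced

def pc_pr2(X, Z, var_threshold=0.8):
    """Eigenvalue-weighted partial R2 of each covariate over the retained components.

    X: (n_samples, n_features) data matrix; Z: (n_samples, n_covariates) numeric covariates.
    The threshold and the weighting scheme are assumptions of this sketch.
    """
    Xc = X - X.mean(axis=0)                       # centre each feature
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    eigvals = s ** 2                              # proportional to component variances
    explained = np.cumsum(eigvals) / eigvals.sum()
    k = min(int(np.searchsorted(explained, var_threshold)) + 1, len(s))
    scores = U[:, :k] * s[:k]                     # component scores
    weights = eigvals[:k] / eigvals[:k].sum()
    r2 = np.array([[partial_r2(scores[:, c], Z, j) for j in range(Z.shape[1])]
                   for c in range(k)])
    return weights @ r2                           # one value per covariate

# Simulated example: covariate 0 drives the profiles, covariate 1 is unrelated.
rng = np.random.default_rng(0)
n = 200
Z = rng.normal(size=(n, 2))
X = Z[:, [0]] * rng.normal(size=(1, 10)) + 0.1 * rng.normal(size=(n, 10))
result = pc_pr2(X, Z)
print(result)
```

In this simulated setting the first covariate should receive a weighted partial R2 close to 1 and the second a value close to 0, which is the kind of ranking of sources of variation the method is meant to provide.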