BACKGROUND: Metabolomics is a valuable tool for characterising biological mechanisms involved in cancer development, but produces complex datasets with intricate interdependencies. While linear dimension reduction techniques, such as principal component analysis (PCA), have proven useful for summarising informative hidden patterns, biological evidence suggests that metabolic relationships extend beyond linearity. Non-linear dimension reduction techniques, such as autoencoders (AEs), may identify more meaningful components. METHODS: We applied AEs and PCA to metabolomic data available for 5828 matched case-control pairs from 8 cancer-specific case-control studies nested within the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort, and compared their performance. We evaluated the associations of the components identified by AEs and PCA with cancer risk, and explored the biological interpretation of the components through their associations with genetic factors and selected biomarkers. FINDINGS: PCA and AEs showed similar reconstruction performance. PCA's first component (PCA.1) captured phosphatidylcholines (PCs) as the primary source of variability and was associated with cancer risk. Conversely, AEs decomposed PC metabolism into two components, one of which exhibited a stronger association with cancer risk than PCA.1. Unlike PCA.1, this component was strongly associated with genetic variants mapping to the TMEM258 and FADS genes, which are key to polyunsaturated fatty acid (PUFA) biosynthesis and regulation. Consistently, the AE component demonstrated stronger associations with circulating omega-3 and omega-6 PUFA levels than PCA.1. INTERPRETATION: Linear methods remain adequate for general dimension reduction. However, AEs better captured specific pathways, identifying a component reflecting perturbations in PUFA metabolism associated with cancer risk. FUNDING: World Cancer Research Fund (IIG_FULL_2022_013).
Journal article
2026-02-03T00:00:00+00:00
124
Autoencoder, Cancer, Dimension reduction, Fatty acids, Metabolomics, Neural networks