Please use this identifier to cite or link to this item: http://hdl.handle.net/2445/180519
Title: A New Pipeline for the Normalization and Pooling of Metabolomics Data
Author: Viallon, Vivian
His, Mathilde
Rinaldi, Sabina
Breeur, Marie
Gicquiau, Audrey
Hemon, Bertrand
Overvad, Kim
Tjønneland, Anne
Rostgaard-Hansen, Agnetha Linn
Rothwell, Joseph A.
Lecuyer, Lucie
Severi, Gianluca
Kaaks, Rudolf
Johnson, Theron
Schulze, Matthias B.
Palli, Domenico
Agnoli, Claudia
Panico, Salvatore
Tumino, Rosario
Ricceri, Fulvio
Verschuren, W. M. Monique
Engelfriet, Peter
Onland-Moret, N. Charlotte
Vermeulen, Roel
Nøst, Therese Haugdahl
Urbarova, Ilona
Zamora-Ros, Raul
Rodriguez Barranco, Miguel
Amiano, Pilar
Huerta, José María
Ardanaz, Eva
Melander, Olle
Ottoson, Filip
Vidman, Linda
Rentoft, Matilda
Schmidt, Julie A.
Travis, Ruth C.
Weiderpass, Elisabete
Johansson, Mattias
Dossus, Laure
Jenab, Mazda
Gunter, Marc J.
Lorenzo Bermejo, Justo
Scherer, Dominique
Salek, Reza M.
Keski-Rahkonen, Pekka
Ferrari, Pietro
Keywords: Metabolomics
Cancer
Metabolites
Issue Date: 17-Sep-2021
Publisher: MDPI AG
Abstract: Pooling metabolomics data across studies is often desirable to increase the statistical power of the analysis. However, this can raise methodological challenges as several preanalytical and analytical factors could introduce differences in measured concentrations and variability between datasets. Specifically, different studies may use variable sample types (e.g., serum versus plasma) collected, treated, and stored according to different protocols, and assayed in different laboratories using different instruments. To address these issues, a new pipeline was developed to normalize and pool metabolomics data through a set of sequential steps: (i) exclusions of the least informative observations and metabolites and removal of outliers; imputation of missing data; (ii) identification of the main sources of variability through principal component partial R-square (PC-PR2) analysis; (iii) application of linear mixed models to remove unwanted variability, including samples' originating study and batch, and preserve biological variations while accounting for potential differences in the residual variances across studies. This pipeline was applied to targeted metabolomics data acquired using Biocrates AbsoluteIDQ kits in eight case-control studies nested within the European Prospective Investigation into Cancer and Nutrition (EPIC) cohort. Comprehensive examination of metabolomics measurements indicated that the pipeline improved the comparability of data across the studies. Our pipeline can be adapted to normalize other molecular data, including biomarkers as well as proteomics data, and could be used for pooling molecular datasets, for example in international consortia, to limit biases introduced by inter-study variability. This versatility of the pipeline makes our work of potential interest to molecular epidemiologists.
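Step (iii) of the abstract can be illustrated with a deliberately simplified sketch. It is not the authors' implementation. It assumes a single metabolite, a random intercept for study only, a balanced design, and one common residual variance, whereas the published pipeline also handles batch effects, preserves biological covariates, and allows residual variances to differ across studies. The function name `remove_study_effect` and its input layout are hypothetical.

```python
from statistics import mean


def remove_study_effect(values_by_study):
    """Subtract shrunken study intercepts from one metabolite's log-values.

    Hypothetical helper: fits y_ij = mu + u_i + e_ij by method of moments
    and removes the predicted random intercept u_i of each study.
    values_by_study maps a study label to a list of log-concentrations.
    Assumes a balanced design (same number of samples in every study).
    """
    studies = list(values_by_study)
    k = len(studies)
    n = len(values_by_study[studies[0]]) if k else 0
    if k < 2 or n < 2 or any(len(v) != n for v in values_by_study.values()):
        raise ValueError("need at least 2 studies, each with the same number (>= 2) of samples")

    study_means = {s: mean(v) for s, v in values_by_study.items()}
    grand_mean = mean(list(study_means.values()))

    # One-way ANOVA mean squares, used as moment estimators of the variances.
    msb = n * sum((m - grand_mean) ** 2 for m in study_means.values()) / (k - 1)
    msw = sum(
        (y - study_means[s]) ** 2 for s, v in values_by_study.items() for y in v
    ) / (k * (n - 1))
    var_study = max((msb - msw) / n, 0.0)  # between-study variance, floored at 0
    var_resid = msw                        # residual (within-study) variance

    # Shrinkage factor of the predicted random intercept (BLUP).
    denom = n * var_study + var_resid
    shrink = n * var_study / denom if denom > 0 else 0.0

    return {
        s: [y - shrink * (study_means[s] - grand_mean) for y in v]
        for s, v in values_by_study.items()
    }


if __name__ == "__main__":
    # Made-up log-concentrations for two studies with a large offset.
    pooled = remove_study_effect({"A": [1.0, 2.0, 3.0], "B": [11.0, 12.0, 13.0]})
    print(pooled)
```

Because the study intercept is shrunk rather than fully subtracted, studies whose means differ only by noise are left almost unchanged, while large systematic offsets are removed almost entirely. Within-study differences between samples are untouched.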
Note: Reproduction of the document published at: https://doi.org/10.3390/metabo11090631
It is part of: Metabolites, 2021, vol. 11, num. 9, p. 631
URI: http://hdl.handle.net/2445/180519
Related resource: https://doi.org/10.3390/metabo11090631
ISSN: 2218-1989
Appears in Collections: Publications from research projects funded by the EU
Articles published in journals (Institut d'Investigació Biomèdica de Bellvitge (IDIBELL))

Files in This Item:
File: metabolites-11-00631.pdf
Size: 3.13 MB
Format: Adobe PDF


This item is licensed under a Creative Commons License.