Abstract

Background: Large-scale proteomic studies have to deal with unwanted variability, especially when samples originate from different centers and multiple analytical batches are needed. Such variability is typically introduced at every step of a clinical research study, from biological sample collection and storage, through sample preparation and spectral data acquisition, to peptide and protein quantification. To remove this diverse, unwanted variability, the protein data are normalized. Several reviews comparing normalization methods in the omics field have already been published, but reports focusing on proteomic data generated with mass spectrometry (MS) are much scarcer, and most of them have dealt only with small datasets.

Results: As a case study, we focused here on the normalization of a large MS-based proteomic dataset obtained from an overweight and obese pan-European cohort, evaluating the following normalization methods: center standardize, quantile protein, quantile sample, global standardization, ComBat, median centering, mean centering, single standard, and removal of unwanted variation (RUV). Some of these are generic normalization methods, while others were specifically created for genomic or metabolomic data. We assessed how relationships between proteins and clinical variables (e.g., gender, triglyceride or cholesterol levels) improved after normalizing the data with each method.

Conclusions: Some normalization methods were better suited to this particular large-scale shotgun proteomic dataset of human plasma samples labeled with isobaric tags and analyzed by liquid chromatography-tandem MS. In particular, quantile sample normalization, RUV, and mean and median centering performed very well, while quantile protein normalization gave worse results than unnormalized data.
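To illustrate two of the simpler methods named above, the following is a minimal NumPy sketch of median centering and quantile sample normalization. This is a generic textbook formulation, not the authors' actual pipeline; the toy matrix and function names are assumptions for illustration only (ties in the rank step are broken arbitrarily rather than averaged).

```python
import numpy as np

def median_center(X):
    """Median centering: subtract each sample's (column's) median,
    so every sample ends up with a median of zero."""
    return X - np.median(X, axis=0, keepdims=True)

def quantile_normalize(X):
    """Quantile sample normalization: force every sample (column) to
    share one reference distribution, the mean of the per-sample
    sorted values. Ties are broken arbitrarily (no averaging)."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)  # within-column ranks
    ref = np.mean(np.sort(X, axis=0), axis=1)          # reference distribution
    return ref[ranks]

# Toy intensity matrix: rows = proteins, columns = samples.
X = np.array([[5.0, 4.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.0, 6.0],
              [4.0, 2.0, 8.0]])

Xm = median_center(X)      # every column now has median 0
Xq = quantile_normalize(X) # every column now has identical distribution
```

In practice, such per-sample corrections are applied to log-transformed intensities; batch-aware methods such as ComBat or RUV additionally model known or latent sources of variation rather than only equalizing per-sample distributions.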

Details