Poster Presentation 28th Annual Lorne Proteomics Symposium 2023

Minimising intra- and inter-batch variation in large clinical proteomics cohort studies (#188)

Jumana M Yousef 1 2 , Terry P Speed 3 4 , Samantha J Emery-Corbin 1 2 , Megan Penno 5 , Helena Oakey 5 , Laura Dagley 1 2
  1. Advanced Technology and Biology Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia
  2. Department of Medical Biology, University of Melbourne, Melbourne, VIC, Australia
  3. Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research, Melbourne, VIC, Australia
  4. School of Mathematics and Statistics, University of Melbourne, Melbourne, VIC, Australia
  5. Faculty of Health and Medical Sciences, The University of Adelaide, Adelaide, SA, Australia

Mass spectrometry-based proteomics studies conducted on sizable population cohorts have immense power for biomarker discovery. Because of the cohort size and the extended periods of MS acquisition required, however, intra- and inter-batch variation is unavoidable; left uncorrected, it can obscure real biological signals and lead to false discoveries.

These large cohort studies therefore present unique challenges within the framework of existing data analysis pipelines, which normally cater to moderately sized cohorts. Critical evaluation of the most appropriate experimental design and selection of a proper normalisation method are crucial to achieving reliable MS quantification with a low false discovery rate. Even with suitable controls in place, data analysis must reduce variability due to batch effects while preserving true biological variation.

We acquired a large plasma cohort dataset (n > 1000) spanning 7 batches of sample preparation, acquired over a two-month period using the diaPASEF acquisition method on a timsTOF Pro instrument. We modified an existing normalisation approach [2], taking advantage of the spatial experimental design, spiked-in protein standards, and pooled quality control (QC) samples.
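To illustrate the general idea of anchoring batches to pooled QC samples (a minimal sketch, not the authors' actual pipeline; the function name and median-centring strategy are illustrative assumptions):

```python
import numpy as np

def qc_anchor_normalise(intensities, batch, is_qc):
    """Centre each batch on its QC-pool median so batches share a common scale.

    intensities : (n_samples, n_proteins) log-transformed intensity matrix
    batch       : (n_samples,) integer batch labels
    is_qc       : (n_samples,) boolean mask marking QC pool injections

    Illustrative sketch only: a real pipeline (e.g. hRUV-style) models
    unwanted variation far more carefully than a per-batch median shift.
    """
    out = intensities.copy()
    # Global QC median per protein serves as the common reference level
    global_ref = np.median(intensities[is_qc], axis=0)
    for b in np.unique(batch):
        in_batch = batch == b
        # The QC median within this batch captures its systematic offset
        batch_ref = np.median(intensities[in_batch & is_qc], axis=0)
        out[in_batch] += global_ref - batch_ref
    return out
```

Because the same QC pool is injected in every batch, any between-batch difference in its signal is purely technical and can be subtracted out, as above.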

To evaluate our approach for eliminating unwanted variation, our method was compared to a standard Loess normalisation used for small-cohort proteomics and to SERRF (systematic error removal using random forest) [3], a normalisation approach used for large-scale metabolomics data. Our methodology significantly reduces unwanted variation, thereby maximising the statistical power and clinical potential of large-cohort clinical studies.
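One common way such comparisons are scored (a hedged sketch, not the specific metric used in this work) is the relative standard deviation of the QC pool injections: since QC pools are technical replicates, any spread among them is unwanted variation, and the method yielding the lowest QC RSD has removed the most technical noise.

```python
import numpy as np

def qc_rsd(intensities, is_qc):
    """Median coefficient of variation across proteins in the QC pool injections.

    intensities : (n_samples, n_proteins) intensity matrix (linear scale)
    is_qc       : (n_samples,) boolean mask marking QC pool injections

    Lower values indicate less residual technical variation.
    """
    qc = intensities[is_qc]                       # (n_qc, n_proteins)
    rsd = qc.std(axis=0, ddof=1) / qc.mean(axis=0)
    return float(np.median(rsd))
```

Applying `qc_rsd` to the raw matrix and to each normalised matrix gives a single number per method, making it straightforward to rank, say, Loess, SERRF, and a QC-anchored approach on the same dataset.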


References:

  1. Webb-Robertson B M, et al. Statistically Driven Metabolite and Lipid Profiling of Patients from the Undiagnosed Diseases Network. Anal. Chem. 2020, 92, 2, 1796–1803
  2. Kim T, et al. hRUV: Hierarchical approach to removal of unwanted variation for large-scale metabolomics data (2020) https://doi.org/10.1101/2020.12.21.423723
  3. Fan S, et al. Systematic Error Removal Using Random Forest for Normalizing Large-Scale Untargeted Lipidomics Data. Anal. Chem. 2019, 91, 5, 3590–359