Oral Presentation 28th Annual Lorne Proteomics Symposium 2023

Scaling proteomics to population size cohorts: Early insights and opportunities from ASPREE proteomics (#29)

Ahmed Mohamed 1,2, Dolores ArenasCavero 1,3, Julian Kelabora 1, Samah Issa 1, Steve Binos 1, Sukhdeep Spall 1, Nishika Kapuruge 2, Belinda Phipson 2, Amanda Au 3, Paul Ainsworth 3, Kym Lowes 3, Joel Smith 4, Robyn Woods 5, John McNeil 5, Nadja Bertleff-Zieschang 1, Melissa Davis 2, Frank Bowling 1, Andrew Webb 1
  1. Colonial Foundation Healthy Ageing Centre, WEHI, Parkville, Vic, Australia
  2. Bioinformatics division, WEHI, Parkville, Vic, Australia
  3. Screening Laboratory & National Drug Discovery Centre, WEHI, Parkville, Vic, Australia
  4. Department of Pathology, Royal Melbourne Hospital, Parkville, Vic, Australia
  5. School of Public Health and Preventive Medicine, Monash University, Melbourne, Vic, Australia

Proteomic analysis of biofluids has demonstrated potential as a rich source of disease biomarkers. Despite numerous case-control studies in biofluid proteomics, clinical translation from bench to bedside remains poor. While large-scale population proteomics offers greater potential for clinical translation, technical challenges and the inherent variability of LC-MS systems have hindered further progress in the field. Here, we present our journey to develop a robust pipeline for scalable proteomics using highly controlled sample and data processing workflows. Using a novel data-driven experimental design, we are in the process of producing the largest proteomics dataset ever generated, profiling two timepoints of >12,000 urine samples from the ASPREE cohort with an average of over 10,000 peptides per sample. We present an exploratory analysis of a data subset of more than 8,000 MS runs across 20 batches, focusing on consistency, coverage, and within- and across-batch variability. We demonstrate the potential of the resource, when combined with over 3,000 longitudinal clinical variables curated by ASPREE, for predictive biomarker discovery in elderly health and disease. Finally, we describe opportunities for statistical and machine-learning model development that harness the sophisticated experimental design and this extensive clinical metadata.