Comprehensive identification and accurate quantitation of peptides is the key goal of mass spectrometry-based proteomics. To obtain data that are as complete as possible, researchers need to configure a large number of settings and parameters, both for the experimental analysis of samples and for the bioinformatic processing of the raw data. Importantly, these instrumental and bioinformatic parameters do not operate in isolation, and a favourable change in one parameter frequently has a detrimental effect elsewhere in the experiment. Optimising an experimental and bioinformatic analysis pipeline therefore involves finding a balance between parameters that gives acceptable results, and a substantial literature has been produced in support of this endeavour.
When optimising experimental methods, researchers typically analyse real experimental samples under a variety of conditions and then process the raw data with a range of search software and parameters. The key metrics generated by these analyses are the qualitative list of proteins and peptides identified and the relative abundance values for each species. However, when different instrumental analysis methods and/or different computational analysis software give conflicting results, the ‘best’ option can be difficult to select. In part, the challenge is that the ‘true’ complement of peptides and proteins present in a complex sample is never completely known, so researchers do not have a ground truth against which the results of optimisation can be compared.
Here, we present Synthedia, a computational platform that generates DIA-LC-MS data files in silico, in mzML format, with an exactly known complement of peptide ions and fragments. A wide range of operating parameters can be simulated, such as gradient lengths, chromatographic peak widths, scan speeds, mass spectral resolutions and isolation windowing schemes, and simulated experiments can contain multiple ‘treatment groups’ and ‘replicates’. To demonstrate the use of this software, we conduct extensive simulations of data with different acquisition speeds and show how quantitative accuracy declines with decreasing points per chromatographic peak. Lastly, we demonstrate how rates of peptide identification, false positives and false negatives vary between different DIA data analysis software as a function of both chromatographic peak width and chromatographic gradient length.
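The relationship between points per chromatographic peak and quantitative accuracy can be illustrated with a minimal sketch. This is not Synthedia's implementation: the Gaussian peak shape, the trapezoidal integration, and all function names (`gaussian_peak`, `true_area`, `sampled_area`, `worst_case_error`) are assumptions made for illustration only. The sketch samples an ideal peak at a fixed cycle time, integrates the sampled points, and reports the largest relative area error over a range of sampling offsets.

```python
import math

# Conversion factor between full width at half maximum (FWHM) and sigma.
FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def gaussian_peak(t, apex, fwhm, height=1.0):
    """Intensity of an idealised Gaussian elution profile at time t (assumed shape)."""
    sigma = fwhm * FWHM_TO_SIGMA
    return height * math.exp(-0.5 * ((t - apex) / sigma) ** 2)


def true_area(fwhm, height=1.0):
    """Analytical area under the Gaussian peak: the known ground truth."""
    sigma = fwhm * FWHM_TO_SIGMA
    return height * sigma * math.sqrt(2.0 * math.pi)


def sampled_area(apex, fwhm, cycle_time, offset=0.0, height=1.0):
    """Trapezoidal area from scans spaced cycle_time apart.

    Scans cover a window of 6 x FWHM that starts at apex - 3 x FWHM + offset.
    """
    start = apex - 3.0 * fwhm + offset
    n_scans = int(6.0 * fwhm / cycle_time) + 1
    times = [start + i * cycle_time for i in range(n_scans)]
    intensities = [gaussian_peak(t, apex, fwhm, height) for t in times]
    area = 0.0
    for left, right in zip(intensities, intensities[1:]):
        area += 0.5 * (left + right) * cycle_time
    return area


def worst_case_error(fwhm, cycle_time, n_offsets=20):
    """Largest relative area error over evenly spaced sampling offsets."""
    expected = true_area(fwhm)
    errors = []
    for k in range(n_offsets):
        offset = cycle_time * k / n_offsets
        observed = sampled_area(0.0, fwhm, cycle_time, offset)
        errors.append(abs(observed - expected) / expected)
    return max(errors)


if __name__ == "__main__":
    fwhm = 6.0  # seconds, hypothetical peak width
    for points_per_fwhm in (8, 4, 2, 1, 0.67):
        cycle_time = fwhm / points_per_fwhm
        error = worst_case_error(fwhm, cycle_time)
        print(f"{points_per_fwhm:>5} points per FWHM: worst-case error {error:.2%}")
```

For a noise-free Gaussian the trapezoidal estimate stays accurate until sampling becomes quite sparse, so this sketch understates the effect expected in real data, where noise, peak asymmetry and co-eluting interference also contribute.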