PGMiner: Complete proteogenomics workflow; from data acquisition to result visualization

Article ID	Journal	Published Year	Pages	File Type
4944627	Information Sciences	2017	9 Pages	PDF

Abstract

In parallel with the development of nucleotide sequencing, an equally important interest arose in describing sequences in terms of their function, and this functional description is now the bottleneck in the overall research question. Sequencing the transcriptome determines which nucleotide sequences are expressed, and mass spectrometry allows sequencing at the protein level. Each approach can only cover a subset of the existing transcripts. Moreover, some events, such as post-translational modifications, can only be determined at the proteomics level. It is therefore essential to combine proteomics and genomics. For that purpose, proteogenomics data analysis pipelines have been described. Here, we describe a novel proteogenomics workflow that covers every step from data acquisition to result visualization within the Konstanz Information Miner (KNIME), a state-of-the-art workflow management and data analytics platform. We extended KNIME with a number of processes, including peptide consensus prediction, peptide mapping, database equalizing, and result visualization. This enabled the construction of our new workflow, PGMiner, which not only includes all data analysis steps but is also highly customizable, something that is rather cumbersome with most existing pipelines. Furthermore, no burdensome installation is required, which makes PGMiner the most user-friendly tool available.
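
The abstract names peptide mapping as one of the added processes but does not say how it is carried out. As a rough illustration of the general idea only, the following Python sketch locates a peptide in a nucleotide sequence by six-frame translation. The approach, the function names (`translate`, `map_peptide`), and the toy sequence are assumptions made for this example and are not taken from PGMiner or KNIME.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate DNA codon by codon; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def map_peptide(peptide, genome):
    """Hypothetical helper: find a peptide in all six reading frames.

    Returns (strand, frame, start, end) tuples, where start and end are
    0-based, end-exclusive coordinates on the forward strand.
    """
    genome = genome.upper()
    n = len(genome)
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = translate(seq[frame:])
            pos = protein.find(peptide)
            while pos != -1:
                start = frame + 3 * pos
                end = start + 3 * len(peptide)
                if strand == "-":
                    # Convert reverse-strand offsets to forward coordinates.
                    start, end = n - end, n - start
                hits.append((strand, frame, start, end))
                pos = protein.find(peptide, pos + 1)
    return hits


if __name__ == "__main__":
    toy_genome = "ATGGCCAAGTAA"  # illustrative sequence, not real data
    print(map_peptide("MAK", toy_genome))
```

A real workflow would also need to handle spliced peptides, ambiguous residues such as I/L, and large genomes, none of which this sketch attempts.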

Keywords

Proteogenomics; Bioinformatics; Mass spectrometry; Workflow management; Computational proteomics