Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
1178290 | Biochimica et Biophysica Acta (BBA) - Proteins and Proteomics | 2013 | 8 Pages |
Conopeptides, classified into 16 superfamilies, are the main component of cone snail venoms and attract growing interest in pharmacology and drug discovery. The conventional approach to assigning a conopeptide to a superfamily is based on a consensus signal peptide of the precursor sequence. While this information is available at the genomic or transcriptomic level, it is not present in the amino acid sequences of mature bioactive peptides generated by proteomic studies. As the number of conopeptide sequences is increasing exponentially with improvements in sequencing techniques, there is a growing need to automate superfamily elucidation. To meet this challenge, we have defined distinct models of the signal sequence, propeptide region and mature peptide for each of the superfamilies containing more than 5 members (14 out of 16). These models rely on two robust techniques, namely position-specific scoring matrices (PSSMs, also called generalized profiles) and hidden Markov models (HMMs). A total of 50 PSSMs and 47 HMM profiles were generated. We confirm that the propeptide and mature regions can be used to efficiently classify conopeptides lacking a signal sequence. Furthermore, combining the models for all three regions improved classification rates, and the results emphasise how the PSSM and HMM approaches complement each other for superfamily determination. The 97 models were validated and offer a straightforward method applicable to large sequence datasets.
► Successful conopeptide superfamily classification based on pro- and mature peptides.
► Combination of HMM and PSSM approaches improved conopeptide superfamily prediction.
► Our method is more efficient and more easily amenable to large data sets than BLAST.
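
To make the PSSM idea in the abstract concrete, the following is a minimal sketch of profile-based classification, not the authors' implementation. It assumes gap-free, equal-length alignments, a uniform background distribution and simple additive pseudocounts. The family names (`family_X`, `family_Y`) and all sequences are invented toy data, not real conopeptides, and the HMM side of the method is not covered.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumption: uniform background frequency for every residue.
BACKGROUND = 1.0 / len(AMINO_ACIDS)


def build_pssm(aligned, pseudocount=1.0):
    """Build a log-odds PSSM from equal-length, gap-free aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same length")
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for col in range(length):
        counts = Counter(seq[col] for seq in aligned)
        # Log-odds of the smoothed column frequency against the background.
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / BACKGROUND)
            for aa in AMINO_ACIDS
        })
    return pssm


def best_window_score(pssm, query):
    """Return the best ungapped score over all query windows of PSSM width."""
    width = len(pssm)
    if len(query) < width:
        return float("-inf")
    return max(
        # Residues outside the 20 standard amino acids contribute 0.
        sum(pssm[i].get(query[start + i], 0.0) for i in range(width))
        for start in range(len(query) - width + 1)
    )


def classify(models, query):
    """Assign the query to the model with the highest score."""
    scores = {name: best_window_score(p, query) for name, p in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical toy alignments, one per made-up family.
TOY_MODELS = {
    "family_X": build_pssm(["CCKGA", "CCKGS", "CCRGA"]),
    "family_Y": build_pssm(["WLDPE", "WLDPQ", "WIDPE"]),
}

if __name__ == "__main__":
    label, scores = classify(TOY_MODELS, "GACCKGAT")
    print(label, scores)
```

In a setting like the one described above, one such model would exist per region (signal, propeptide, mature) and per superfamily, and the per-region scores could then be combined. How the scores are combined and thresholded here is an assumption of the sketch.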