Article ID: 506087
Journal: Computers in Biology and Medicine
Published Year: 2006
Pages: 12 Pages
File Type: PDF
Abstract

Human papillomaviruses (HPVs) are small DNA tumor viruses that infect epithelial tissues and induce hyperproliferative lesions. Infection by high-risk genital HPVs is associated with the development of anogenital cancers. Classifying risk types is important for understanding the mechanisms of infection and for developing novel instruments for medical examination, such as DNA microarrays. Sequence-based classification methods are useful because they classify risk types by considering residues in conserved positions. In this paper, we present a machine learning approach to classifying HPV risk types from protein sequences. Our approach is based on the hidden Markov model and the kernel method: the former searches for informative subsequence positions, and the latter classifies protein sequences efficiently. In the experiments, the classifier correctly predicted the risk types of four unknown HPV types. An additional result shows that kernel-based classifiers trained on the more informative subsequences outperform classifiers trained on the whole sequence or on random subsequences.
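
The two-stage idea in the abstract (select informative positions, then classify with a kernel) can be illustrated with a minimal sketch. This is not the authors' implementation. The position list is hand-picked here as a stand-in for the hidden Markov model step, and the k-spectrum kernel and kernel perceptron are illustrative substitutes for whatever kernel and classifier the paper uses. The sequences are hypothetical toy strings, not real HPV proteins, and all function names are assumptions made for this example.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k=2):
    """Count overlapping k-mers in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=2):
    """Normalized k-spectrum kernel: cosine similarity of k-mer counts."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    dot = sum(ca[m] * cb[m] for m in ca)
    norm = sqrt(sum(v * v for v in ca.values()) *
                sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def extract(seq, positions):
    """Keep only the residues at the chosen 0-based positions."""
    return "".join(seq[p] for p in positions if p < len(seq))


def decision_score(x, seqs, labels, alphas, kernel):
    """Kernel expansion: sum of alpha_i * y_i * K(x_i, x)."""
    return sum(a * y * kernel(s, x) for a, y, s in zip(alphas, labels, seqs))


def train_kernel_perceptron(seqs, labels, kernel, epochs=10):
    """Kernel perceptron: bump alpha_i whenever example i is misclassified."""
    alphas = [0] * len(seqs)
    for _ in range(epochs):
        mistakes = 0
        for i, (x, y) in enumerate(zip(seqs, labels)):
            if y * decision_score(x, seqs, labels, alphas, kernel) <= 0:
                alphas[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alphas


def predict(x, seqs, labels, alphas, kernel):
    """Return +1 (e.g. high risk) or -1 (e.g. low risk)."""
    return 1 if decision_score(x, seqs, labels, alphas, kernel) > 0 else -1


# Hypothetical toy data: label +1 / -1 stands for two risk classes.
train = [("MKKLLAAKKLL", 1), ("MKKLLGAKKLL", 1),
         ("MDDEEAADDEE", -1), ("MDDEEGADDEE", -1)]

# Assumed "informative" positions; in the paper these come from an HMM.
positions = range(1, 5)

subseqs = [extract(s, positions) for s, _ in train]
labels = [y for _, y in train]
alphas = train_kernel_perceptron(subseqs, labels, spectrum_kernel)

# The tail of this query resembles the other class, but the selected
# positions do not, so restricting the kernel to them changes the input.
query = extract("MKKLLTTDDEE", positions)
print(predict(query, subseqs, labels, alphas, spectrum_kernel))
```

Restricting the kernel to selected positions means the classifier never sees residues outside them, which is one plausible reading of why informative subsequences could outperform whole sequences. The toy data is far too small to show that effect empirically.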
