کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
2820577 1160865 2015 9 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
Characterization and machine learning prediction of allele-specific DNA methylation
موضوعات مرتبط
علوم زیستی و بیوفناوری بیوشیمی، ژنتیک و زیست شناسی مولکولی ژنتیک
پیش نمایش صفحه اول مقاله
Characterization and machine learning prediction of allele-specific DNA methylation
چکیده انگلیسی


• Three kinds of allele-specific DNA methylation events, P-ASM, sS-ASM and cS-ASM, have distinct characteristics.
• The combinations of genomic and methylation related features are powerful in distinguishing different types of ASM.
• cS-ASM can be successfully predicted with logistic regression classifier.
• cS-ASM associated genes in the brain are frequently the transcripts with alternative splicing forms.

A large collection of Single Nucleotide Polymorphisms (SNPs) has been identified in the human genome. Currently, the epigenetic influences of SNPs on their neighboring CpG sites remain elusive. A growing body of evidence suggests that locus-specific information, including genomic features and local epigenetic state, may play important roles in the epigenetic readout of SNPs. In this study, we made use of mouse methylomes with known SNPs to develop statistical models for the prediction of SNP associated allele-specific DNA methylation (ASM). ASM has been classified into parent-of-origin dependent ASM (P-ASM) and sequence-dependent ASM (S-ASM), which comprises scattered-S-ASM (sS-ASM) and clustered-S-ASM (cS-ASM). We found that P-ASM and cS-ASM CpG sites are both enriched in CpG rich regions, promoters and exons, while sS-ASM CpG sites are enriched in simple repeat and regions with high frequent SNP occurrence. Using Lasso-grouped Logistic Regression (LGLR), we selected 21 out of 282 genomic and methylation related features that are powerful in distinguishing cS-ASM CpG sites and trained the classifiers with machine learning techniques. Based on 5-fold cross-validation, the logistic regression classifier was found to be the best for cS-ASM prediction with an ACC of 0.77, an AUC of 0.84 and an MCC of 0.54. Lastly, we applied the logistic regression classifier on human brain methylome and predicted 608 genes associated with cS-ASM. Gene ontology term enrichment analysis indicated that these cS-ASM associated genes are significantly enriched in the category coding for transcripts with alternative splicing forms. In summary, this study provided an analytical procedure for cS-ASM prediction and shed new light on the understanding of different types of ASM events.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Genomics - Volume 106, Issue 6, December 2015, Pages 331–339
نویسندگان
, , , , , ,