Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
4943183 | Expert Systems with Applications | 2017 | 9 Pages |
Abstract
Bioinformatics is the use of computer technology to solve, manage and analyze biological problems. With rapid advances in computing, the volume of biological data has increased significantly, and with it the need to analyze these data in reasonable space and time. DNA sequences contain the basic information of species, and pattern matching between the sequences of different species is an important and challenging problem. The literature offers both generalized string matching algorithms and some specialized DNA pattern matching algorithms. There is still a need for fast and space-efficient pattern matching algorithms that take advantage of new hardware developments. In this paper, we present a novel DNA sequence pattern matching algorithm called EPMA. The proposed algorithm uses fixed-length 2-bit binary encoding, segmentation and multi-threading. The idea is to search for the pattern with multiple searcher agents concurrently. The proposed algorithm is validated with comparative experimental results, which show that it is a good candidate for DNA sequence pattern matching applications. The algorithm makes effective use of modern hardware and will help researchers in sequence alignment, short read error correction, phylogenetic inference, etc. Furthermore, the proposed method can be extended to generalized string matching and its applications.
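The three ingredients named in the abstract (2-bit encoding, segmentation, and concurrent searchers) can be illustrated with a minimal Python sketch. This is not the EPMA implementation, whose details the abstract does not give. The bit mapping, the function names `encode`, `search_segment` and `find_all`, the `workers` parameter, and the equal-size segmentation with an overlap of pattern length minus one are all assumptions made for illustration. The sketch also assumes the input contains only the bases A, C, G and T.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed fixed-length 2-bit code per nucleotide (the paper's mapping may differ).
CODE = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def encode(seq):
    """Pack a DNA string into an integer, 2 bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | CODE[base]
    return value


def search_segment(text, pattern_bits, m, start, end):
    """Return positions of matches whose start index lies in [start, end)."""
    hits = []
    # Read m - 1 bases past the segment so boundary-spanning matches are found.
    last = min(end + m - 1, len(text))
    if last - start < m:
        return hits
    mask = (1 << (2 * m)) - 1
    window = encode(text[start:start + m - 1])
    for i in range(start + m - 1, last):
        # Slide the 2-bit window by one base and compare as a single integer.
        window = ((window << 2) | CODE[text[i]]) & mask
        if window == pattern_bits:
            hits.append(i - m + 1)
    return hits


def find_all(text, pattern, workers=4):
    """Split the text into segments and search them with concurrent workers."""
    m = len(pattern)
    if m == 0 or m > len(text):
        return []
    pattern_bits = encode(pattern)
    size = -(-len(text) // workers)  # ceiling division
    bounds = [(s, min(s + size, len(text))) for s in range(0, len(text), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(
            lambda b: search_segment(text, pattern_bits, m, b[0], b[1]), bounds))
    # Segments are ordered and disjoint, so the result is already sorted.
    return [pos for part in parts for pos in part]
```

For example, `find_all("AAAACGTAAA", "ACGT", workers=2)` returns `[3]`, a match that spans the boundary between the two segments. In CPython the global interpreter lock prevents these threads from running CPU-bound work in parallel, so the sketch shows the structure of the approach rather than its speed; a compiled language or process-based workers would be needed to see a multi-core benefit.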
Related Topics
Physical Sciences and Engineering
Computer Science
Artificial Intelligence
Authors
Muhammad Tahir, Muhammad Sardaraz, Ataul Aziz Ikram