| Article ID | Journal | Published Year | Pages | File Type |
|---|---|---|---|---|
| 453804 | Computers & Electrical Engineering | 2011 | 15 Pages | |
Fast search algorithms (FSA) used for variable block size motion estimation follow irregular search (data access) patterns. This irregularity is the main challenge in designing hardware architectures for them. In this study, we build a baseline architecture for fast search algorithms using state-of-the-art components available in academia. We improve its performance by introducing: (1) a super 2-dimensional (2-D) random access memory architecture that reads two regular or interleaved rows or columns at a time, as opposed to the one-row or one-column accessibility of the state of the art; (2) a 2-D processing element array with a tuned interconnect to support the neighborhood connections required by conventional fast search algorithms and to exploit on-chip data reuse. Results show that our design increases system throughput by up to 85.47% and reduces power by up to 13.83%, with a worst-case area increase of up to 65.53% compared to the baseline architecture.
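As a rough illustration of why fast search algorithms produce irregular search range accesses, the sketch below models a small diamond search, one conventional fast search algorithm, in software. It is not the paper's hardware design; the frame size, block size, and synthetic test data are illustrative assumptions.

```c
/* Minimal software sketch (assumed parameters, not the paper's hardware):
 * a small diamond search whose candidate positions depend on the data,
 * giving the irregular search-range access pattern the abstract refers to. */
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>

#define W 64   /* frame width  (assumed) */
#define H 64   /* frame height (assumed) */
#define B 8    /* block size   (assumed) */

/* Sum of absolute differences between the BxB current block at (cx, cy)
 * and the reference candidate block at (rx, ry). */
static long sad(int cur[H][W], int ref[H][W], int cx, int cy, int rx, int ry)
{
    long cost = 0;
    for (int y = 0; y < B; ++y)
        for (int x = 0; x < B; ++x)
            cost += labs((long)cur[cy + y][cx + x] - ref[ry + y][rx + x]);
    return cost;
}

/* Small diamond search: evaluate the centre and its four neighbours,
 * re-centre on the best candidate, and repeat until the centre wins.
 * The sequence of visited positions is data dependent. */
static void diamond_search(int cur[H][W], int ref[H][W],
                           int cx, int cy, int *mvx, int *mvy)
{
    static const int dx[5] = { 0, 1, -1, 0,  0 };
    static const int dy[5] = { 0, 0,  0, 1, -1 };
    int bx = cx, by = cy;

    for (;;) {
        long best = LONG_MAX;
        int best_i = 0;
        for (int i = 0; i < 5; ++i) {
            int rx = bx + dx[i], ry = by + dy[i];
            if (rx < 0 || ry < 0 || rx + B > W || ry + B > H)
                continue;   /* candidate falls outside the frame */
            long c = sad(cur, ref, cx, cy, rx, ry);
            if (c < best) { best = c; best_i = i; }
        }
        if (best_i == 0)    /* centre is the best candidate: converged */
            break;
        bx += dx[best_i];
        by += dy[best_i];
    }
    *mvx = bx - cx;
    *mvy = by - cy;
}

int main(void)
{
    static int cur[H][W], ref[H][W];

    /* Synthetic data: a smooth bowl-shaped reference field, and a current
     * frame holding the same field displaced by (+2, +1) pixels. */
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x) {
            ref[y][x] = (x - 32) * (x - 32) + (y - 32) * (y - 32);
            cur[y][x] = (x - 34) * (x - 34) + (y - 33) * (y - 33);
        }

    int mvx, mvy;
    diamond_search(cur, ref, 16, 16, &mvx, &mvy);
    printf("estimated motion vector: (%d, %d)\n", mvx, mvy);  /* (-2, -1) */
    return 0;
}
```

Note that each re-centring step changes which rows and columns of the search range buffer are touched next, which is why the memory described in the abstract benefits from reading two regular or interleaved rows or columns per access.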
Graphical abstract

Research highlights

► We design a parallel architecture for full and fast search motion estimation.
► A 2-dimensional (2-D) RAM architecture is designed for the search range buffer.
► A 2-D processing element array with a tuned interconnect is used for data reuse.
► Results show that our design increases system throughput by up to 85.47%.
► We achieve a 13.83% power reduction at the cost of a 65.53% worst-case area overhead.