Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
10343577 | Microprocessors and Microsystems | 2013 | 10 Pages |
Abstract
Reducing the effects of off-chip memory access latency is a key factor in exploiting efficiently embedded multi-core platforms. We consider architectures that admit a multi-core computation fabric, having its own fast and small memory to which the data blocks to be processed are fetched from external memory using a DMA (direct memory access) engine, employing a double- or multiple-buffering scheme to avoid processor idling. In this paper we focus on application programs that process two-dimensional data arrays and we determine automatically the size and shape of the portions of the data array which are subject to a single DMA call, based on hardware and applications parameters. When the computation on different array elements are completely independent, the asymmetry of memory structure leads always to prefer one-dimensional horizontal pieces of memory, while when the computation of a data element shares some data with its neighbors, there is a pressure for more “square” shapes to reduce the amount of redundant data transfers. We provide an analytic model for this optimization problem and validate our results by running a mean filter application on the Cell simulator.
Keywords
Related Topics
Physical Sciences and Engineering
Computer Science
Computer Networks and Communications
Authors
Selma Saidi, Pranav Tendulkar, Thierry Lepley, Oded Maler,