Article ID: 429523
Journal: Journal of Computational Science
Published Year: 2014
Pages: 11 Pages
File Type: PDF
Abstract

Highlights
• Multiprocessor occupancy is an important factor in GPU performance.
• Communication between thread blocks requires a kernel invocation, which is time-consuming.
• The new matrix representation of membrane systems increases GPU occupancy.
• The new matrix representation reduces communication between thread blocks.
• The new algorithm, based on the matrix representation, outperforms other algorithms.

In previous studies, the objects of each membrane were assigned to the threads of a single thread block of the graphics processing unit (GPU), so the number of active threads was low whenever a membrane contained few objects. This study represents the objects of the membranes as entries of a matrix. A sub-matrix then holds the number of objects assigned to the threads of each thread block, chosen to balance the load and keep occupancy high even when the number of objects per membrane is low. The size of the sub-matrix, and hence the number of active threads, is determined automatically. This approach also makes it possible to assign more than one membrane to a thread block, so that communication between membranes in the same block is carried out without time-consuming inter-block communication. For example, with two objects per membrane the previous algorithm achieves a speedup of 0.6×, whereas the proposed algorithm achieves 32.4×.
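The effect of packing several membranes into one thread block can be illustrated with some simple arithmetic. The sketch below is not the paper's algorithm: the function names, the 256-thread block size, and the rule for choosing the number of membranes per block are all assumptions made for illustration, and the "active fraction" it reports is a simplified stand-in for GPU occupancy (the share of threads in a full block that have an object to work on).

```python
import math


def one_membrane_per_block(objects_per_membrane, num_membranes, threads_per_block=256):
    """Baseline layout (hypothetical helper): one thread block per membrane.

    Returns (number of blocks, fraction of threads in a block that are active).
    """
    if not 1 <= objects_per_membrane <= threads_per_block:
        raise ValueError("sketch assumes 1 <= objects_per_membrane <= threads_per_block")
    blocks = num_membranes
    active_fraction = objects_per_membrane / threads_per_block
    return blocks, active_fraction


def packed_membranes_per_block(objects_per_membrane, num_membranes, threads_per_block=256):
    """Packed layout (hypothetical helper): rows of the object matrix (membranes)
    are grouped into sub-matrices, and each sub-matrix is given to one block.

    Returns (number of blocks, fraction of threads active in a full block).
    """
    if not 1 <= objects_per_membrane <= threads_per_block:
        raise ValueError("sketch assumes 1 <= objects_per_membrane <= threads_per_block")
    # Assumed sizing rule: fit as many whole membranes as the block has threads for.
    membranes_per_block = threads_per_block // objects_per_membrane
    blocks = math.ceil(num_membranes / membranes_per_block)
    active_fraction = membranes_per_block * objects_per_membrane / threads_per_block
    return blocks, active_fraction


if __name__ == "__main__":
    print(one_membrane_per_block(2, 1024))
    print(packed_membranes_per_block(2, 1024))
```

Under these assumptions, 1,024 membranes with two objects each need 1,024 blocks with 2 of 256 threads active in the baseline layout, but only 8 fully populated blocks in the packed layout. Membranes that share a block can also exchange objects within that block, which is what avoids the inter-block communication described above. The sketch does not model the automatic sub-matrix sizing or reproduce the reported speedups.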
