Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
859947 | Procedia Engineering | 2013 | 7 Pages | |
An efficient parallel implicit computing method for large-scale structural and solid mechanics problems, based on the GPU and the Compute Unified Device Architecture (CUDA), is presented. To avoid assembling the global matrix and to reduce GPU global memory usage, the system of equations is solved in parallel by a matrix-free version of the conjugate gradient method using a node-by-node scheme. A preprocessing step that arranges the model data is introduced to support node-based parallel computing, which is implemented with one GPU thread per node. Other operations, such as computing the element stiffness matrices, are also performed in parallel on the GPU using node- or element-based schemes. Multiple strategies for efficient use of GPU memory, a method to achieve memory coalescing, and an optimal choice of parameters are introduced. Several high-level APIs offered by CUDA are adopted to reduce the difficulty of parallel programming. The numerical examples illustrate the scalability and effectiveness of the present parallel approach.
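To make the matrix-free, node-by-node idea concrete, here is a minimal CPU sketch in Python. It is not the authors' CUDA implementation. The model (a 1D bar of 2-node linear elements, one DOF per node, node 0 fixed) and all function names (`build_node_adjacency`, `node_by_node_matvec`, `solve_bar`) are hypothetical choices for illustration. The preprocessing step gives each node a list of (element, local index) pairs. The product K·x is then formed one node at a time from element stiffness matrices, so no global matrix is ever stored. The outer loop in `node_by_node_matvec` stands in for "one GPU thread per node". Each iteration writes only its own `y[i]`, which is one reason a node-based gather can suit GPUs: threads never write to the same entry.

```python
# Hypothetical sketch: matrix-free CG with a node-by-node K*x product.
# Assumed model (not from the paper): 1D bar, 2-node linear elements,
# one scalar DOF per node, homogeneous Dirichlet condition at node 0.

def element_stiffness(ea, length):
    # Linear bar element: (EA/L) * [[1, -1], [-1, 1]]
    k = ea / length
    return [[k, -k], [-k, k]]

def build_node_adjacency(num_nodes, elements):
    # Preprocessing: for each node, record (element id, local index) pairs
    # so a node can gather its own row without touching other nodes' output.
    adjacency = [[] for _ in range(num_nodes)]
    for e, conn in enumerate(elements):
        for local, node in enumerate(conn):
            adjacency[node].append((e, local))
    return adjacency

def node_by_node_matvec(x, elements, ke, adjacency, fixed):
    # y = K x without assembling K; iteration i writes only y[i].
    y = [0.0] * len(x)
    for i in range(len(x)):              # stands in for one GPU thread per node
        if i in fixed:
            y[i] = x[i]                  # identity row for a constrained DOF
            continue
        acc = 0.0
        for e, a in adjacency[i]:
            for b, j in enumerate(elements[e]):
                if j not in fixed:       # drop columns of constrained DOFs
                    acc += ke[e][a][b] * x[j]
        y[i] = acc
    return y

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def conjugate_gradient(matvec, b, tol=1e-12, max_iter=1000):
    # Standard CG; the matrix is only accessed through matvec.
    x = [0.0] * len(b)
    r = list(b)                          # r = b - A x with x = 0
    p = list(r)
    rr = dot(r, r)
    for _ in range(max_iter):
        if rr ** 0.5 < tol:
            break
        ap = matvec(p)
        alpha = rr / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

def solve_bar(num_elems, ea=1.0, length=1.0, tip_load=1.0):
    # Bar fixed at node 0 with a point load on the last node.
    num_nodes = num_elems + 1
    elements = [(e, e + 1) for e in range(num_elems)]
    ke = [element_stiffness(ea, length) for _ in elements]
    adjacency = build_node_adjacency(num_nodes, elements)
    fixed = {0}
    b = [0.0] * num_nodes
    b[-1] = tip_load
    return conjugate_gradient(
        lambda v: node_by_node_matvec(v, elements, ke, adjacency, fixed), b)
```

For example, `solve_bar(4)` returns displacements close to `[0, 1, 2, 3, 4]`, matching the exact bar solution u_i = P·i·L/(EA) for unit EA, L, and P. In a CUDA version, the per-node loop body would become the kernel and the adjacency lists would be flattened into arrays laid out for coalesced reads. The abstract points to this kind of data arrangement but does not spell out the exact layout.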