Article ID | Journal | Published Year | Pages | File Type |
---|---|---|---|---|
4947006 | Neurocomputing | 2017 | 10 | |
Abstract
Deep neural networks require high-performance computing and highly efficient implementations to keep running time within a reasonable range. We propose a novel node-level parallelization, conditional independent parallelization, of the forward and backward propagations to improve the level of concurrency. The propagations exploit a conditional independent graph (CIG), built in O(N) time, which consists of conditional independent sets of nodes. Each set in the CIG is visited sequentially, while the nodes within a set are computed concurrently. In addition, we analyze the properties of the CIG, prove the correctness of the propagations based on the CIG, and study the theoretical speedup ratios of the parallelization. Moreover, this parallelism can be applied to arbitrary neural network structures without affecting convergence, since it only requires a conditional independent graph. It can be further integrated into frameworks with batch-level and data-level parallelism to improve the level of concurrency. Since modern GPUs support concurrent kernels, the parallelization can also be implemented directly on GPUs. To verify the parallelization experimentally, we implement an autoencoder, a dependency parser, and an image recognizer with the parallelization and test them on a 4-core Intel i7-4790K CPU with 32 GB of memory. The results demonstrate maximum speedups of 3.965× for the autoencoder, 3.106× for the parser, and 2.966× for the recognizer.
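The sketch below illustrates the general idea described in the abstract: nodes of a network DAG are grouped into conditional independent sets (one level-synchronous grouping), the sets are visited sequentially, and the nodes within a set are evaluated concurrently. This is only a minimal illustration under assumed data structures (`preds`, `weights`, `bias` and a thread pool); it is not the authors' implementation, and the helper names are hypothetical.

```python
# Hypothetical sketch of CIG-based node-level parallel forward propagation.
# Graph layout, node model, and the thread-pool strategy are assumptions.
from concurrent.futures import ThreadPoolExecutor
import math

def build_cig(preds, num_nodes):
    """Group nodes into conditional independent sets (levels of the DAG).

    preds[i] is the list of predecessor node ids feeding node i.
    Kahn-style level traversal, linear in the number of nodes and edges.
    """
    indeg = [len(preds[i]) for i in range(num_nodes)]
    succs = [[] for _ in range(num_nodes)]
    for i, ps in enumerate(preds):
        for p in ps:
            succs[p].append(i)
    level = [i for i in range(num_nodes) if indeg[i] == 0]
    cig = []
    while level:
        cig.append(level)
        nxt = []
        for u in level:
            for v in succs[u]:
                indeg[v] -= 1
                if indeg[v] == 0:
                    nxt.append(v)
        level = nxt
    return cig

def forward(cig, preds, weights, bias, inputs):
    """Visit CIG sets sequentially; compute the nodes inside a set concurrently."""
    act = dict(inputs)  # node id -> activation, seeded with the input nodes

    def compute(node):
        # Nodes in the same set depend only on earlier sets, so reads are safe.
        z = bias[node] + sum(weights[(p, node)] * act[p] for p in preds[node])
        act[node] = 1.0 / (1.0 + math.exp(-z))  # sigmoid activation

    with ThreadPoolExecutor() as pool:
        for level in cig:
            runnable = [n for n in level if n not in act]  # skip input nodes
            list(pool.map(compute, runnable))
    return act
```

Because every node in a set only reads activations produced by earlier sets, the per-set computations are independent and can be mapped onto worker threads (or concurrent GPU kernels) without synchronization inside a set; a barrier between sets is the only ordering constraint.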
Related Topics
Physical Sciences and Engineering
Computer Science
Artificial Intelligence
Authors
Fugen Zhou, Fuxiang Wu, Zhengchen Zhang, Minghui Dong