کد مقاله کد نشریه سال انتشار مقاله انگلیسی نسخه تمام متن
432646 689003 2016 18 صفحه PDF دانلود رایگان
عنوان انگلیسی مقاله ISI
New distributed algorithms for fast sign detection in residue number systems (RNS)
موضوعات مرتبط
مهندسی و علوم پایه مهندسی کامپیوتر نظریه محاسباتی و ریاضیات
پیش نمایش صفحه اول مقاله
New distributed algorithms for fast sign detection in residue number systems (RNS)
چکیده انگلیسی


• Most operations hitherto deemed hard to realize in RNS share the same bottleneck.
• Finding the LSB of an unknown in the Chinese Remainder Theorem is that bottleneck.
• Show 2 new fast methods to solve the bottleneck, one meets analytic speed bound.
• Moduli selection enables exhaustive pre-computation even at very long word lengths.

We identify a canonical parameter in the Chinese Remainder Theorem (CRT) and call it the “Reconstruction Coefficient”, (denoted by “RCRC”); and introduce the notions of “Partial” and “Full” Reconstruction. If the RCRC can be determined efficiently, then arithmetic operations that are (relatively) harder to realize in RNS; including Sign Detection, Base change/extension and Scaling or division by a constant can also be implemented efficiently. This paper therefore focuses on and presents two distinct methods to efficiently evaluate the RCRC at long wordlengths. A straightforward application of these methods leads to ultra-fast sign-detection.An independent contribution of this paper is to illustrate non-trivial trade-offs between run-time computation vs. pre-computation and look-up. We show a simple method to select the moduli which leads to both the (i) number of RNS channels  nn;  as well as  (ii) the largest channel modulus  mnmn satisfying {O(n),O(mn)}⪅N≡ the full-precision bit-length. The net result is that for many canonical operations; exhaustive look-up covering all possible input values is feasible even at long cryptographic bit-lengths NN. Under fairly general and realistic assumptions about the capabilities of current hardware, the memory needed for exhaustive look-up tables is shown to be bounded by a low degree polynomial of  nn. Moreover, both methods to compute RCRC can achieve a delay of O(lgn) in a RN system with nn channels. To the best of our knowledge, no other method published to date has shown a path to achieve that lower bound on the execution delay. Further, small values of channel moduli make it ideal to implement each individual RNS channel on a simple core in a many-core processor or as a distributed node, and our algorithms require a limited number of inter-channel communications, averaging O(n)O(n). Results from a multi-core GPU implementation corroborate the theory.

ناشر
Database: Elsevier - ScienceDirect (ساینس دایرکت)
Journal: Journal of Parallel and Distributed Computing - Volume 97, November 2016, Pages 78–95
نویسندگان
, ,