English title of the ISI article
Copernicus, a hybrid dataflow and peer-to-peer scientific computing platform for efficient large-scale ensemble sampling
Related topics
Engineering and Basic Sciences; Computer Engineering; Computational Theory and Mathematics
English abstract


- Hybrid dataflow and peer-to-peer computing to fully automate ensemble sampling.
- The platform automatically distributes workloads and manages them resiliently.
- Problems are defined as workflows by reusing existing software and scripts (a minimal sketch follows this list).
- Portability in networks where parts are behind firewalls.
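
The highlight about defining problems as workflows from existing software can be pictured with a small, hypothetical sketch of dataflow-style dependency tracking. The `Task` and `Workflow` classes and the `echo` placeholder commands below are illustrative assumptions, not the Copernicus API.

```python
# Minimal sketch of a dataflow-style workflow that reuses existing programs.
# Task, Workflow and the echo commands are illustrative only, not Copernicus code.
import subprocess
from collections import deque


class Task:
    def __init__(self, name, command, deps=()):
        self.name = name        # unique task name
        self.command = command  # an existing program plus its arguments
        self.deps = set(deps)   # names of tasks whose output this task needs


class Workflow:
    def __init__(self):
        self.tasks = {}

    def add(self, name, command, deps=()):
        self.tasks[name] = Task(name, command, deps)

    def run(self):
        # Dependency-ordered execution: a task starts only once all of its
        # dependencies are done (assumes the dependency graph is acyclic).
        done = set()
        pending = deque(self.tasks.values())
        while pending:
            task = pending.popleft()
            if task.deps <= done:
                subprocess.run(task.command, check=True)
                done.add(task.name)
            else:
                pending.append(task)  # not ready yet, try again later


# Two independent preparation steps feeding one ensemble-sampling step.
wf = Workflow()
wf.add("prep_a", ["echo", "prepare replica A"])
wf.add("prep_b", ["echo", "prepare replica B"])
wf.add("sample", ["echo", "sample the ensemble"], deps=["prep_a", "prep_b"])
wf.run()
```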

Compute-intensive applications have gradually changed focus from massively parallel supercomputers to capacity as a resource obtained on-demand. This is particularly true for the large-scale adoption of cloud computing and MapReduce in industry, while it has been difficult for traditional high-performance computing (HPC) usage in scientific and engineering computing to exploit this type of resource. However, with the strong trend of increasing parallelism rather than faster processors, a growing number of applications target parallelism already on the algorithm level with loosely coupled approaches based on sampling and ensembles. While these cannot trivially be formulated as MapReduce, they are highly amenable to throughput computing. There are many general and powerful frameworks, but in particular for sampling-based algorithms in scientific computing there are some clear advantages from having a platform and scheduler that are highly aware of the underlying physical problem. Here, we present how these challenges are addressed with combinations of dataflow programming, peer-to-peer techniques and peer-to-peer networks in the Copernicus platform. This allows automation of sampling-focused workflows, task generation, dependency tracking, and not least distributing these to a diverse set of compute resources ranging from supercomputers to clouds and distributed computing (across firewalls and fragile networks). Workflows are defined from modules using existing programs, which makes them reusable without programming requirements. The system achieves resiliency by handling node failures transparently with minimal loss of computing time due to checkpointing, and a single server can manage hundreds of thousands of cores, e.g. for computational chemistry applications.
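
As a rough illustration of the resiliency described above (failed tasks re-queued transparently, progress preserved across restarts via checkpointing), the sketch below simulates a driver that checkpoints ensemble state to disk. The file name, state layout and simulated worker failures are assumptions for illustration, not details of Copernicus itself.

```python
# Hypothetical sketch of checkpointed, fault-tolerant ensemble execution.
# The state file, its layout and the simulated worker failures are illustrative only.
import json
import os
import random

CHECKPOINT = "ensemble_state.json"  # assumed checkpoint location


def load_state(n_members):
    # Resume from an existing checkpoint so a restarted driver loses at most
    # the work done since the last write.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {str(i): "pending" for i in range(n_members)}


def save_state(state):
    with open(CHECKPOINT, "w") as f:
        json.dump(state, f)


def run_member(member_id):
    # Stand-in for handing one ensemble member to a remote worker;
    # a ~20% failure rate mimics lost or crashed nodes.
    return random.random() > 0.2


def drive(n_members=8):
    state = load_state(n_members)
    while any(status != "done" for status in state.values()):
        for member, status in state.items():
            if status == "done":
                continue
            state[member] = "done" if run_member(member) else "pending"
            save_state(state)  # checkpoint after every status change


drive()
print("all ensemble members sampled")
```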

Publisher
Database: Elsevier - ScienceDirect
Journal: Future Generation Computer Systems - Volume 71, June 2017, Pages 18-31
Authors