J. Parallel Distrib. Comput. 75 (2015) 168-183

Contents lists available at ScienceDirect

J. Parallel Distrib. Comput.

journal homepage: www.elsevier.com/locate/jpdc

# Pars network: A multistage interconnection network with fault-tolerance capability

## Fathollah Bistouni<sup>a</sup>, Mohsen Jahanshahi<sup>b,\*</sup>

<sup>a</sup> Department of Information Technology, Qazvin Branch, Islamic Azad University, Qazvin, Iran

<sup>b</sup> Young Researchers and Elite club, Central Tehran Branch, Islamic Azad University, Tehran, Iran

### HIGHLIGHTS

- A new fault-tolerant multistage interconnection network called Pars is proposed.
- A novel design idea, *perfect connection*, is implemented on Pars.
- Pars network can meet both performance and cost requirements.
- Pars network outperforms popular networks ABN, ASEN, and EGN.

### ARTICLE INFO

Article history: Received 12 November 2013 Received in revised form 12 August 2014 Accepted 17 August 2014 Available online 27 August 2014

Keywords: Multi-processor systems Multistage interconnection network Fault-tolerance Pars network Reliability Self-routing Cost-effectiveness

# 1. Introduction

High performance computers are increasingly in demand in the areas of structural analysis, weather forecasting, artificial intelligence, expert systems, industrial automation, remote sensing, military defense, and genetic engineering, among many other scientific and engineering applications. Without super power computers, many of these challenges to advance human civilization cannot be made within a reasonable time period. Achieving high performance depends not only on using faster and more reliable hardware devices, but also on major improvements in computer architecture and processing techniques. Parallel processing, which was once the sole preserve of academia and industrial research, is

\* Corresponding author. E-mail addresses: f\_bistouni@qiau.ac.ir (F. Bistouni), mohsenjahanshahi@gmail.com, mjahanshahi@iauctb.ac.ir (M. Jahanshahi).

### ABSTRACT

Interconnection networks are used for communication between nodes in multi-processor systems as well as super-systems. These systems require effective communication between the processor and memory blocks and therefore, interconnection networks are considered as the heart of the parallel processing and multi-processor systems. Multistage interconnection networks (MINs) are the main option for use in supercomputer environments that consist of thousands of processing elements. In this paper, a regular class of fault-tolerant MINs named Pars network along with its routing algorithm are presented. Analytical results demonstrate that the Pars network outperforms known regular networks, namely ABN, ASEN, EGN, and IEGN in terms of cost, fault-tolerance, terminal reliability, mean time to failure, and permutation capability.

© 2014 Elsevier Inc. All rights reserved.

now opening up to a much wider audience who are slowly awakening the advantages and potential of parallelism. As traditionally conservative areas, such as commerce and finance, embrace parallel processing as a means to obtain competitive advantage by costeffective means, perhaps it will be market forces which decide the future of parallel processing. During the past ten years, the trends indicate by even faster networks distributed systems, and multiprocessor computer architecture suggests the parallelism as the future of computing [32].

Parallel processors are computer systems consisting of multiple processing units connected via some interconnection network plus the software needed to make the processing units work together [23]. A generic high-end parallel architecture is shown in Fig. 1. Several processor nodes are connected through an interconnection network, which enables the nodes to transmit data between themselves. Each node has a (possibly multi-core) processor (P), a share of the main memory (M), and a cache hierarchy (C). The processor nodes are connected to the global interconnection









Fig. 1. Generic multiprocessor system with distributed memory.

through a network interface (NI). Another important component of a computer system is I/O; I/O devices (such as disks) are often connected to an I/O bus, which is interfaced to the memory in each processor node through the interconnection. Therefore, processor, memory hierarchy, and interconnection network are critical components of a parallel system [21].

An interconnection network (IN) connects various modules to each other and therefore, these modules have the proficiency of communication [20,27,18,6]. These modules can be of any type like memory, I/O component, processor, and so on [6,26,48,11].

There are several ways to interconnect a large number of processors: bus, crossbar, and MIN. The bus network suffers from a performance bottleneck due to bus connection while to connect *N* input to *N* output with a crossbar  $N^2$  switches are needed. Consequently, crossbar networks cannot be scaled to any arbitrary size [11,2,3]. MINs are better suited for various parallel computing and telecommunication applications because they provide efficient performance using a reasonable number of switches. Generally, An  $N \times N$  MIN is a communication network with *N* input terminals (sources) and *N* output terminals (destinations) composed of ( $\log_k N$ ) stages of  $k \times k$  switches with a total number of switching elements equal to ( $\frac{N}{k} \log_k N$ ) [48,3,50,29,28].

Fault tolerance is meant to continue operating despite some possible failures in the system, although at a degraded performance. An IN is very important for its continuous operation over a relatively long period of time. The basic idea for the fault-tolerance of the MIN is to create redundancy in the number of paths between each source–destination pair so that alternative paths can been used in case of faults [6,26,39,48,11,2,3].

MINs can be divided into two main categories: (1) single path MINs and (2) multiple path MINs [50,9,15,1]. The modest cost of single path MINs such as the Baseline [60], Delta [52], and Generalized cube [57] makes them attractive for large multiprocessor systems, but the problem with this topology is that there is only one path from a given network input to a given output. Thus, if there is a fault on that path, no communication is possible.

In this paper, a new class of regular fault-tolerant MIN named Pars network is proposed. The Pars network provides more paths between any source-destination pair in comparison to popular regular MINs such as ABN [4], ASEN [43,44], and EGN [59].

As we will see, Pars network can achieve general goals for design of fault-tolerant MINs (i.e. good reliability, cost effective, and simple control).

The rest of this paper is organized as follows: Section 2 provides a view of the related works. Section 3 describes the structure and design of Pars network. Section 4 focuses on the routing procedure. Section 5 presents the performance analysis of Pars network. A brief discussion will be given in Section 6. Finally, some concluding remarks are presented in Section 7.

### 2. Related works

Discussion of reliability and fault tolerance on MINs has long been a concern of researchers in the field of MINs which has led to the design of different topologies. With subtle look at the reviewed works, we can derive two general approaches to improve fault tolerance and reliability of MINs:

#### 1. Increasing the number of stages

In this case, it is possible to provide more paths between each source–destination pair by increasing the number of switching stages. This approach is one of the first ideas to increase fault tolerance and reliability of the network is used [1,10]. Benefits of this method can be listed as follows: (1) It can increase reliability and fault-tolerance to some extent. (2) This approach increases costs exactly as a single switching stage does, therefore it slightly raises the cost and thus, is cost effective. (3) It is easy to be implemented.

Despite the above advantages, this approach also has defects that make it inefficient. These imperfections can be listed as follows: (1) Its use is limited, which means that it is not efficient to intermittent use on a given network and cannot always increase reliability [31,8]. As a result, this method enhances the reliability of the network to a small extent. (2) It can increase the length of the path between any source–destination pair.

Therefore, the above method would not be an advanced solution to resolve the issue of reliability and fault tolerance.

2. Using several improved MIN in parallel

Alternative approach has been noticeable after the first method [44,5,4,60] and the analysis has shown that this type can provide better reliability compared to the first method [11]. In this approach, the advantages can be listed as follows: (1) It can increase reliability and fault tolerance at a higher level than the first method. (2) In this approach, there is the possibility of reducing the number of switching stages, thus it can reduce the length of the path between any source–destination pair.

Contrary to the above merits, this approach also has problems that can be listed as follows: (1) Typically, the network cost increases due to the increased size of switching elements. (2) On the other hand, the increasing size of the switch causes the switch failure rate to be increased [11,4]. (3) In this approach, auxiliary links are commonly used to deal with some fault conditions that this can cause an increase in path length between each source-destination pair.

In total, according to the above discussions, we will be able to solve many of the problems associated with the latter approach, if we achieve a structure that provides the following features: (1) It must be able to provide multiple paths between each source-destination pair. (2) In addition, it should be composed of small-sized switching elements to decrease the network costs. Moreover, it has been proven that in practice, small size switches are less prone to failure [11,2,4,5,8]. Therefore, networks that are built from small-size switches can potentially provide high reliability. (3) As much as possible, it should preserve performance metrics in real-time applications even under fault and conflict situations.

Therefore, in this paper, our motivation is to design a network that meets the above requirements. To achieve this goal, we used a method that has not been viewed in none of the previous works and that is, increasing the reliability and fault tolerance. Also it leads to real-time network performance even on faulty conditions. We have proposed a network in which both output switches on the penultimate stage can lead to the desired destination which its structure will be explained in details in Section 3. Further, we will compare our proposed structure with four most popular representative MINs namely ABN, ASEN, EGN, and IEGN [9] that were built according to the second approach described above. ABN, ASEN, and IEGN have good reliability, but impose high costs on network because they have to use switches of size  $3 \times 3$ . While EGN has a low cost due to the use of switches with a small size of 2  $\times$  2, but it provides less number of paths between each source-destination pair compared with ABN and ASEN. In addition, it should be noted that in [9], is demonstrated that the IEGN has a significant advantage in contrast to the networks of ABN, ASEN, and EGN. However, in this reference similar to [30,25], it was assumed that all switching components of different sizes have the same Download English Version:

# https://daneshyari.com/en/article/432692

Download Persian Version:

https://daneshyari.com/article/432692

Daneshyari.com