


### Nuclear Instruments and Methods in Physics Research A




# Design and implementation of a nanosecond time-stamping readout system-on-chip for photo-detectors



Shebli Anvar<sup>a</sup>, Frédéric Château<sup>a</sup>, Hervé Le Provost<sup>a</sup>, Frédéric Louis<sup>a</sup>, Konstantinos Manolopoulos<sup>b</sup>, Yassir Moudden<sup>a,\*</sup>, Bertrand Vallage<sup>c</sup>, Eric Zonca<sup>a</sup>

<sup>a</sup> CEA/Irfu/SEDI Gif-sur-Yvette, France

<sup>b</sup> Physics Department, University of Athens, Greece

<sup>c</sup> CEA/Irfu/SPP Gif-sur-Yvette, France

#### ARTICLE INFO

Article history: Received 2 July 2013 Received in revised form 4 October 2013 Accepted 4 October 2013 Available online 17 October 2013

Keywords: Digital electronic circuits Data acquisition concepts Data acquisition circuits Computing Large detector-systems performance Software architectures

#### ABSTRACT

A readout system suitable for a large number of synchronized photo-detection units has been designed. Each unit embeds a purpose-built, fully integrated communicating system based on Xilinx FPGA SoC technology. It runs the VxWorks real-time OS and custom data acquisition software designed within the Ice middleware framework, resulting in a highly flexible, controllable and scalable distributed application. Clock distribution and delay calibration over customized fixed-latency gigabit Ethernet links enable synchronous time-stamping of events with nanosecond precision. The implementation of this readout system on several data-collecting units and its performance are described.

© 2013 Published by Elsevier B.V.

#### 1. Introduction

Many detectors used in particle physics and astroparticle physics gather a large number of similar detection units. This is the case for cosmic ray laboratories as well as large gamma [1] or neutrino telescopes [2,3]. The physical distribution of these detectors, which can extend over a few kilometers, calls for a highly distributed, embedded data acquisition system. Such 2D or 3D detector arrays can be likened to a generic "camera" whose pixels consist of a single photo-detector or a collection of nearby photo-detectors. Various constraints must be taken into account when designing the readout system, including power consumption, synchronization of the pixel readout units, and bandwidth for data transmission to the centralized "processing farm", which is in charge of triggering the overall readout and storing the data for events that fulfill the criteria derived from the physics goals.

The design study presented below was initiated as an R&D effort on the readout of a prototype photo-detection node for a large-scale underwater neutrino telescope. For such a detector, a physics event spreads over a very large part of the array, so that global triggering on the information from the whole array is preferred to local triggering. For this purpose, all data from single photo-detectors are transmitted to the processing farm, which builds the trigger decision. In addition, large fluctuations in the photo-detector rates due to environmental conditions require a de-randomization of the data flux to make the best use of the available bandwidth. We addressed this question by using a processor in the detecting node, coupled to a medium-size external memory in which data are stored locally in buffers, each corresponding to a macroscopic time slice. The data, consisting of the numerical encoding of the photo-detector signals, are transmitted asynchronously from the nodes to the processing farm, with feedback mechanisms that avoid network congestion.
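The local buffering by time slice described above can be illustrated with a minimal sketch. It only shows the grouping idea, assuming hits are available as `(timestamp_ns, channel)` pairs; the function name `bin_hits_by_timeslice` and the 100 ms slice duration are hypothetical and are not taken from the design.

```python
from collections import defaultdict

# Hypothetical slice duration (100 ms); the text does not give the actual value.
TIMESLICE_NS = 100_000_000


def bin_hits_by_timeslice(hits, timeslice_ns=TIMESLICE_NS):
    """Group (timestamp_ns, channel) hits into buffers keyed by time-slice index."""
    buffers = defaultdict(list)
    for timestamp_ns, channel in hits:
        # Integer division maps every hit to the slice that contains it.
        buffers[timestamp_ns // timeslice_ns].append((timestamp_ns, channel))
    return dict(buffers)


# Illustrative hits only, not measured data.
hits = [(5, 0), (99_999_999, 3), (100_000_000, 1), (250_000_000, 7)]
slices = bin_hits_by_timeslice(hits)
```

Each buffer can then be shipped to the farm as a unit once its time slice has closed, which decouples the bursty hit rate from the rate at which frames are sent.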

The paper is organized as follows: we first describe the overall design in Section 2. In Section 3 we detail the data encoding in the photo-detection node, and present in Section 4 the synchronization of the collection of pixel nodes performed by a specifically designed clock system implemented over the Gigabit Ethernet network. Finally, Section 5 deals with the software implementation both in the detecting nodes and the processing farm.

#### 2. Description of the design

An underwater neutrino telescope is a 3D array of photo-detectors anchored on the seabed at a typical depth of 3000 m below sea level. In the detector envisaged, the pixel node would consist of a cluster of 31 photo-detectors housed in a high-pressure-resistant glass sphere and equipped with a full readout unit. As a result, the characteristics and form factor of the electronic board have to comply with tight constraints on power dissipation and with the lack of accessibility for maintenance after deployment. In the implementation described here, the photo-detector signals were simulated by LVDS pulses a few nanoseconds wide. A sequence of pulses from the generator is sent simultaneously to all 31 input channels.

<sup>\*</sup> Corresponding author. Tel.: +33 (0)1 69 08 17 81.

E-mail address: yassir.moudden@cea.fr (Y. Moudden).

0168-9002/\$ - see front matter © 2013 Published by Elsevier B.V. http://dx.doi.org/10.1016/j.nima.2013.10.019

Inside the node, the configuration of the photo-detector front-ends is carried out over local standard I2C and SPI links. Each node is remotely connected to the farm via a gigabit Ethernet optical link. The processor node, shown in Fig. 1, collects and time-stamps the input signals and encapsulates the acquired data in Ethernet frames sent to the farm. It is also in charge of configuring the node acquisition under remote control from the farm. A Readout System on Chip (RSOC) design based on Commercial Off-The-Shelf (COTS) hardware and software components was selected to fulfill the requirements. The RSOC is implemented in the high-density Xilinx Virtex5-FX70T FPGA. It integrates a processor, the nanosecond-accuracy input data time-stamping and readout firmware, I2C and SPI IPs and a customized gigabit Ethernet link. The processor runs the VxWorks Real Time Operating System (RTOS) from Wind River. The design makes use of the embedded PowerPC 440 core and the Tri-mode Ethernet Media Access Controller (TEMAC) core, together with the ISERDES and GTX RocketIO primitives [4]. Dynamic and non-volatile memories were included as separate components. A flash memory holds both the Xilinx configuration and the first-stage processor boot code in charge of downloading the RTOS image over Ethernet. The synchronization of the distributed detection nodes is achieved by distributing a clock signal from the farm to the nodes over distances of the order of 100 km. The distributed clock signal is recovered in the node from the 8b/10b-encoded 1.25 Gbps serial link received as part of the bidirectional 1000BASE-X Ethernet communication with the farm. The synchronization of the detecting nodes further requires the ability to send specific commands from the farm, typically a "start counter" at the beginning of a data-taking period, which are received in the remote node on a deterministic edge of the recovered 62.5 MHz Ethernet clock. This design manages to send and receive such commands in the physical layer without interfering with the standard Ethernet protocol used for higher-level communications (e.g. UDP, TCP/IP).
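The link figures quoted above are mutually consistent, as the following arithmetic suggests. Reading the 62.5 MHz recovered clock as one cycle per two 10-bit code groups is an interpretation of the numbers, not something stated in the text.

$$
R_{\text{payload}} = 1.25\ \text{Gbps} \times \frac{8}{10} = 1\ \text{Gbps},
\qquad
\frac{1.25 \times 10^{9}\ \text{bit/s}}{62.5 \times 10^{6}\ \text{Hz}} = 20\ \text{line bits per recovered clock cycle}.
$$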

The asynchronous LVDS signal delivered by the pulse generator to an input channel is sampled into an 8-bit DDR shift register clocked at 500 MHz, synchronous to the recovered 62.5 MHz Ethernet clock. This design makes use of the ISERDES deserializer primitive available in nearly all IO blocks of the Virtex5 devices. Those primitives are used to implement 31 Time to Digital Converters (TDC) that measure the rising and falling edge times of each pulse with nanosecond precision. Finally, the developed software takes advantage of the IP stack and multi-tasking tools provided by the RTOS to transparently mix the slow control and data flows. It is tightly coupled to the RSOC configured as an embedded communicating node. The TDC design and the custom node synchronization logic are described first in the following sections, before the presentation of the data handling software framework.

#### 3. Time to digital conversion

The LVDS signal of each input channel is processed by a separate TDC channel, which is implemented and coupled to the data acquisition as shown in Fig. 2. In Xilinx FPGAs, two IO blocks pair to form a differential input and provide access to two cascadable deserializers that can be configured in several ways. In the present design, the blocks are configured as an 8-bit-wide serial-to-parallel converter in Double Data Rate (DDR) mode. This setup is used to sample the LVDS input at 1 GHz. Two copies of the same 500 MHz clock, one inverted with respect to the other, are needed for this design, as well as a slower synchronous 125 MHz clock used to read the TDC output. All of these clock signals are derived from the globally distributed 62.5 MHz clock recovered on the Ethernet physical layer, as discussed in Section 4. Two bytes are written to a FIFO along with a time stamp whenever the first byte is different from zero. The 24-bit time stamp provides a coarse timing with 8 ns resolution, while the bit position within each byte provides the 1 ns fine timing.
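The combination of coarse and fine timing described above can be sketched as follows. This is a minimal illustration, not the firmware's actual data format: the bit ordering (most significant bit taken as the earliest sample), the function names, and the assumption that a pulse is a single contiguous run of high samples contained in the two bytes are all hypothetical.

```python
COARSE_LSB_NS = 8    # one 125 MHz read-clock period (from the text)
FINE_LSB_NS = 1      # one 1 GHz sample (from the text)
COARSE_BITS = 24     # width of the coarse time stamp (from the text)

# Time after which the 24-bit coarse counter wraps around.
WRAP_NS = (1 << COARSE_BITS) * COARSE_LSB_NS


def first_high_bit(byte):
    """Index (0..7) of the earliest high sample, ASSUMING the MSB is the earliest."""
    for i in range(8):
        if byte & (0x80 >> i):
            return i
    return None


def rising_edge_time_ns(coarse, first_byte):
    """Rising-edge time: coarse count times 8 ns plus the fine bit position."""
    idx = first_high_bit(first_byte)
    if idx is None:
        raise ValueError("first byte is zero: no pulse recorded")
    return (coarse % (1 << COARSE_BITS)) * COARSE_LSB_NS + idx * FINE_LSB_NS


def pulse_width_ns(first_byte, second_byte):
    """Width as the number of high samples, ASSUMING one contiguous pulse."""
    return (bin(first_byte).count("1") + bin(second_byte).count("1")) * FINE_LSB_NS


# Illustrative 10 ns pulse starting on the fourth sample of coarse tick 3.
t_rise = rising_edge_time_ns(3, 0b00011111)
width = pulse_width_ns(0b00011111, 0b11111000)
```

With these widths, the coarse counter wraps after `WRAP_NS` nanoseconds, roughly 134 ms, so any longer time base has to be reconstructed downstream.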

The proposed TDC design was tested within a full data acquisition chain. An Agilent 81110A 330 MHz pulse generator provided 10 ns LVDS input pulses, including 2 ns rising and falling edges, at 200 kHz. The standard deviation of the pulse period was measured at $\approx 220$ ps using a LeCroy 64Xi 600 MHz oscilloscope. Pulse width and period measurements with a single TDC channel are reported in Fig. 3. This configuration also allows for a density test of the proposed TDC: since the pulse generator and the TDC are asynchronous, the first high bit in the TDC output byte is expected to fall uniformly in any of the eight bits. The results in Fig. 3 show a slight timing asymmetry between odd and even bits. This differential non-linearity is probably due to a very slight extra phase shift between the two 500 MHz TDC clocks.
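The density test described above amounts to comparing the occupancy of each of the eight bit positions with a uniform distribution. The sketch below shows one common way to express this as a differential non-linearity per bin; the function names are hypothetical and the counts are synthetic values chosen for illustration, not the measurements of Fig. 3.

```python
def dnl_from_counts(counts):
    """Differential non-linearity per bin: observed / expected occupancy - 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no entries in the histogram")
    expected = total / len(counts)  # uniform occupancy for asynchronous pulses
    return [c / expected - 1.0 for c in counts]


def odd_even_asymmetry(counts):
    """Fraction of entries in even bins minus fraction in odd bins."""
    total = sum(counts)
    if total == 0:
        raise ValueError("no entries in the histogram")
    even = sum(counts[0::2])
    odd = sum(counts[1::2])
    return (even - odd) / total


# SYNTHETIC counts of the first-high-bit position (bins 0..7), illustration only.
counts = [1050, 950, 1050, 950, 1050, 950, 1050, 950]
dnl = dnl_from_counts(counts)
asym = odd_even_asymmetry(counts)
```

An alternating pattern in `dnl`, as in this synthetic example, is the signature one would expect from a small phase offset between the two sampling clocks, since odd and even samples are then taken on different clock copies.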

For further validation of the data acquisition chain, the pulse generator was connected to a fan-out board in order to simultaneously source the 31 TDC channels implemented in a single detector node Virtex5 FPGA. The rising edge of each pulse is thus time-stamped by all 31 TDCs. In this configuration, as shown in the top histogram of Fig. 4, for each recorded pulse, the maximum difference between two time stamps never exceeds the 1 ns sampling resolution of the TDC, and the maximum dispersion of the fan-out is $\approx 0.41$ ns. The second plot shows how often each channel time-stamped the incoming pulse 1 ns later than the earliest time stamp. This plot reveals an inhomogeneity in the fan-out, channels 13, 14, 25 and 31



Fig. 1. Functional block diagram of the pixel node prototype processor board.



Fig. 2. Schematic view of the implemented TDC.
