

Received: November 7, 2016 Accepted: December 9, 2016 Published: December 19, 2016

TOPICAL WORKSHOP ON ELECTRONICS FOR PARTICLE PHYSICS, 26–30 September 2016, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany

# MATRIX: a 15 ps resistive interpolation TDC ASIC based on a novel regular structure

### J. Mauricio,<sup>1</sup> D. Gascón, D. Ciaglia, S. Gómez, G. Fernández and A. Sanuy

FQA, ICC, Universitat de Barcelona, Avinguda Diagonal 647, Barcelona, Spain

*E-mail:* jmauricio@icc.ub.edu

ABSTRACT: This paper presents a 4-channel TDC ASIC with the following features: 15-ps LSB (9.34 ps after calibration), 10-ps jitter, < 4-ps time resolution, up to 10 MHz of sustained input rate per channel, 45 mW of power consumption and very low area (910 × 215  $\mu$ m<sup>2</sup>) in a commercial 180 nm technology. The main contribution of this work is the novel design of the clock interpolation circuitry based on a resistive interpolation mesh circuit (patented), a two-dimensional regular structure with very good properties in terms of power consumption, area and low process variability.

KEYWORDS: Digital electronic circuits; Instrumentation and methods for time-of-flight (TOF) spectroscopy; Timing detectors; VLSI circuits

<sup>1</sup>Corresponding author.

# Contents

- 1 Introduction
- 2 Matrix TDC design overview
  - 2.1 Building blocks
  - 2.2 RIMC
  - 2.3 PLL
  - 2.4 Front-End Readout
  - 2.5 Back-End Readout
  - 2.6 Serializer
  - 2.7 SPI slave
- 3 Chip measurements
  - 3.1 Code density test
  - 3.2 Linearity
  - 3.3 Jitter
  - 3.4 Power consumption
- 4 Conclusions

## 1 Introduction

Time-of-Flight (ToF) measurement is one of the major challenges in high energy physics experiments, medical imaging and even in Laser Imaging Detection and Ranging (LiDAR). Precise timing measurements make it possible to compute the distance that a particle has traveled and thus to identify tracks, perform coincidence measurements or determine the distance to objects.

Our research group has been working for years on fast-timing ASIC designs for Positron Emission Tomography (PET) applications [1]. The FlexToTv2 ASIC provides very good timing performance: 100-ps Single Photon Time Resolution (SPTR) and 128-ps full width at half maximum (FWHM) of Coincidence Time Resolution (CTR). The outputs of this chip are discrete in amplitude but continuous in time, so external equipment is required to perform fine timing measurements. Our mid-term goal is to integrate a Time-to-Digital Converter (TDC) into a System-on-Chip (SoC) to provide timing measurements in the digital domain with low power consumption (10 mW per channel). Moreover, the timing resolution of the TDC should be good enough to avoid degrading the timing performance of the analog readout; in other words, the Least Significant Bit (LSB) should be in the range of 10 to 20 ps. Thus, the combination of jitter and non-linearity should not exceed 1 LSB.

The TDC architecture is heavily constrained by the target LSB and the available manufacturing technology, but power consumption also plays an important role in the choice of the architecture. Several works [2, 3] exhibit very high performance in terms of timing resolution (< 10 ps) but in all these cases power consumption would not fulfill our application requirements when implemented in 180 nm technology.

Figure 1. MATRIX block diagram.

In this work, we present a 4-channel TDC ASIC prototype called Multichannel Architecture Tdc with Resistive Interpolation matriX (MATRIX). The main contribution of this work is the Resistive Interpolation Mesh Circuit (RIMC). The chip features a 15-ps LSB (typical LSB of 9.3 ps after calibration), 10-ps jitter, < 4-ps time resolution, 1280-ns dynamic range, dead time < 20 ns, up to 10 MHz of sustained input rate per channel, 45 mW of power consumption and very low area ($910 \times 215\ \mu m^2$) in a commercial 180 nm technology.

This paper is organized as follows: in section 2 the building blocks of the chip are described, section 3 shows the preliminary chip measurement results, and finally in section 4 the conclusions are drawn.

## 2 Matrix TDC design overview

### 2.1 Building blocks

MATRIX is a multilevel TDC which consists of a fine clock interpolator and a coarse counter block (see figure 1). Sub-clock resolution is achieved by producing copies of an 800 MHz internal clock spaced in steps of 15 picoseconds. The synthesis of these clocks is performed by the RIMC. The clock signals are captured by the front-end readout block on the rising edge of the input *TIME* signal, and thus the sub-clock phase is measured. The coarse counter block counts entire clock periods (natural binary) to extend the TDC dynamic range to 1280 ns. The back-end readout acquires synchronized data coming from both TDC levels (coarse counter and fine interpolator), builds the event and stores it in a 4-word FIFO. Finally, the serializer block transmits data at 200 Mbps in a serial protocol.
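As a rough illustration of how the two levels combine, the sketch below rebuilds a timestamp from a coarse count and a fine offset. It assumes the 1250 ps period of the 800 MHz clock and the 10-bit coarse counter mentioned in section 2.6. The function and constant names are hypothetical and are not part of the chip interface.

```python
CLOCK_PERIOD_PS = 1250   # period of the 800 MHz internal clock
COARSE_BITS = 10         # coarse counter width taken from the serializer frame


def dynamic_range_ns() -> float:
    """Time span covered before the coarse counter wraps around."""
    return (2 ** COARSE_BITS) * CLOCK_PERIOD_PS / 1000.0


def timestamp_ps(coarse: int, fine_ps: float) -> float:
    """Combine whole clock periods with the sub-clock offset (illustrative only)."""
    if not 0 <= coarse < 2 ** COARSE_BITS:
        raise ValueError("coarse count out of range")
    if not 0 <= fine_ps < CLOCK_PERIOD_PS:
        raise ValueError("fine offset must lie within one clock period")
    return coarse * CLOCK_PERIOD_PS + fine_ps


print(dynamic_range_ns())    # 1024 periods of 1.25 ns
print(timestamp_ps(2, 15))   # two full periods plus one 15 ps step
```

With these assumptions, 1024 periods of 1.25 ns give the 1280 ns dynamic range quoted above.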

### 2.2 RIMC

The circuit shown in figure 2 is a novel clock synthesizer composed of an array of ring oscillators coupled by means of resistors, which provides 56 clock phases of the 800 MHz reference clock. These phases are organized in 7 rows by 8 columns of Delay Elements (DEs). Note that oscillation is achieved by using an odd number of rows and connecting the outputs of the last DEs to the inputs of the first DEs. The DE (see figure 2.b) contains a current-starved inverter which fixes the row width to 89 ps (1/14 of the clock period) by means of the Phase-Locked Loop (PLL) control voltage (*VCTL*), while the resistor introduces a 15-ps sub-gate delay between adjacent columns (from left to right). The typical end-to-end delay between the first and the last column nodes of a given row is 120 ps, since the number of columns is 8 (the first column on the left is used as a dummy).



Figure 2. (a) RIMC schematic. (b) DE schematic. (c) Starved inverter schematic.

Figure 3 shows a chronogram of the nominal clock phases (plotted per row) from all the nodes in the RIMC. It can be seen that clock edges alternate between adjacent rows, since a starved inverter is used instead of a buffer. This has benefits in terms of area and power consumption, since each stage needs only three transistors (two for the inverter and one for biasing). The phase information of the clocks is used by the readout system to halve the number of DEs. Thus, only half a clock period has to be covered by sub-gate delays, since the clock edge on which the transition occurs determines whether the conversion belongs to the first or to the second half of the fine TDC counter.

Figure 3. Chronogram of the RIMC nodes sorted by rows.

Observe that there is a 35% overlap between transitions in adjacent rows since, as already stated, the end-to-end delay from the first to the last column is 120 ps while the row-to-row delay is 89 ps. This overlap is used to accommodate both local and global process and temperature variations which, according to simulations, never exceed this 35%.
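The overlap figure can be reproduced approximately from the nominal numbers given for the RIMC. The short calculation below is only a sanity check. It assumes that the overlap is defined as the excess of the row span over the row-to-row delay, relative to that delay, which the text does not state explicitly.

```python
CLOCK_PERIOD_PS = 1250.0   # 800 MHz clock
ROWS, COLS = 7, 8          # RIMC dimensions
COLUMN_STEP_PS = 15.0      # nominal resistive sub-gate delay

# Each row is delayed by 1/14 of the period (7 rows x 2 clock edges).
row_delay_ps = CLOCK_PERIOD_PS / (2 * ROWS)
# End-to-end delay across the 8 columns of one row.
row_span_ps = COLS * COLUMN_STEP_PS
# Assumed definition: excess span relative to the row-to-row delay.
overlap = (row_span_ps - row_delay_ps) / row_delay_ps

print(f"row delay = {row_delay_ps:.1f} ps")
print(f"row span  = {row_span_ps:.0f} ps")
print(f"overlap   = {overlap:.1%}")
```

Under this definition the result comes out slightly below 35%, in line with the rounded figure used in the text.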

### 2.3 PLL

The internal clock is generated on-chip by the RIMC described above, which also outputs an 800 MHz clock sample. The PLL divides this clock sample by M, where M can be set to 4, 8 or 16. The divided clock is then compared with an external reference clock (50, 100 or 200 MHz), and the result of this comparison controls a Charge Pump which, in turn, generates the control voltage of the synthesizer (*VCTL*).
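The relation between the divider setting and the external reference follows directly from the 800 MHz internal clock. The snippet below is a minimal sketch of that arithmetic; the function name is hypothetical and does not correspond to any register or software interface of the chip.

```python
INTERNAL_CLOCK_MHZ = 800.0
ALLOWED_DIVIDERS = (4, 8, 16)


def reference_clock_mhz(m: int) -> float:
    """External reference frequency required for a given divider M."""
    if m not in ALLOWED_DIVIDERS:
        raise ValueError(f"M must be one of {ALLOWED_DIVIDERS}")
    return INTERNAL_CLOCK_MHZ / m


for m in ALLOWED_DIVIDERS:
    print(m, reference_clock_mhz(m))
```

The three dividers map to the 200, 100 and 50 MHz reference options listed in the text.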

### 2.4 Front-End Readout

This TDC has four channels. An event occurs when a rising edge is produced on any of the *TIME*<3:0> inputs. This rising edge triggers the Time Capture Registers (TCRs), which are the first stage of the front-end readout (see a row example in figure 4). The group of 7 TCRs (one per row) is called the Time Capture Matrix (TCM). This first stage of the readout captures the logic levels of the 7 by 8 clock matrix coming from the RIMC using D-type Flip-Flops (FFs). These full-custom FFs are optimized to reduce mismatch variability without substantially increasing power consumption. One nanosecond after the TCM captures the clock phases, the data is transferred to a second stage of FFs, as long as no events are being processed by the Back-End Readout block (*!BUSY* disabled). Once the data is in the storage Flip-Flops, combinational logic computes the row and column where the transition occurred, and the edge type (either 0/1 or 1/0, see figure 4 top right). Note that two consecutive rows may detect the same transition due to the 35% overlap between contiguous rows, as shown in figure 3.

### 2.5 Back-End Readout

Figure 4. Schematic of Row #0 of the Front-End Readout block.

The Front-End block computes the fine interpolation value when a new event occurs. The Back-End Readout block stores these values in the 4-word FIFO (one per channel), allowing peak event rates of up to 50 MHz (20-ns dead time), and synchronizes the data coming from the fine and coarse counters. Synchronization works as follows: the coarse counter block generates an auxiliary signal that indicates whether the captured counter value has recently changed. When the coarse counter reading has recently changed and the fine interpolator has not yet overflowed, the fine counter is corrected by subtracting 1 LSB.

The fine data encoding algorithm works as follows: the node identifier is computed as $8 \cdot N_{Row} + N_{Col}$ for each row that has detected a transition (two at most). The identifier is a 6-bit value whose 3 MSBs correspond to the row number and whose 3 LSBs correspond to the column number. If the event is detected by two consecutive rows, both node identifiers are averaged, which improves the resolution through the arithmetic expansion (soft bins). Otherwise, the resolution does not improve. Finally, if the captured clock phase is high, the fine counter is increased by 110 LSBs (half full scale). In summary, the average LSB is smaller than the typical LSB (15 ps) thanks to the row overlapping that produces soft bins.
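The identifier and averaging steps can be sketched in a few lines. This is an illustrative model only, not the on-chip logic: the function names are hypothetical, the result is kept as a floating-point value so that half-integer soft bins remain visible, and the half-scale offset for the clock edge and the wrap-around between the last and first rows are deliberately left out.

```python
from typing import List, Sequence, Tuple

N_ROWS, N_COLS = 7, 8


def node_id(row: int, col: int) -> int:
    """6-bit identifier: row in the 3 MSBs, column in the 3 LSBs."""
    if not (0 <= row < N_ROWS and 0 <= col < N_COLS):
        raise ValueError("row or column out of range")
    return (row << 3) | col   # identical to 8 * row + col


def fine_code(hits: Sequence[Tuple[int, int]]) -> float:
    """Average the identifiers of the one or two rows that saw the transition."""
    if len(hits) not in (1, 2):
        raise ValueError("a transition is detected by one or two rows")
    ids: List[int] = [node_id(row, col) for row, col in hits]
    return sum(ids) / len(ids)


# Transition seen only in row 2, column 3: no resolution gain.
print(fine_code([(2, 3)]))
# Transition seen at the end of row 2 and at the start of row 3: soft bin.
print(fine_code([(2, 7), (3, 0)]))
```

The second call lands halfway between identifiers 23 and 24, which is how the overlap region yields bins narrower than the nominal 15 ps.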

### 2.6 Serializer

Once events are built and buffered, each TDC channel has its own transmitter which serializes data at 200 Mbps. The frame width is 18 bits, 10 for the coarse counter and 8 for the fine counter. Thus, the maximum sustained event rate allowed by this block is 10 MHz per channel.
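The sustained rate can be bounded from the link speed and the frame width. The calculation below is a back-of-the-envelope check that counts only the 18 payload bits; any framing or idle bits in the serial protocol are not described in the text, so the result is an upper bound rather than the exact figure.

```python
BIT_RATE_BPS = 200e6   # serializer speed
COARSE_BITS = 10
FINE_BITS = 8
FRAME_BITS = COARSE_BITS + FINE_BITS

# Upper bound on events per second if frames are sent back to back.
max_event_rate_hz = BIT_RATE_BPS / FRAME_BITS

print(f"{max_event_rate_hz / 1e6:.1f} MHz")
```

The bound is slightly above 11 MHz, which is consistent with the 10 MHz sustained rate quoted for this block.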

Serialized data can be sent in either single-ended or Low-Voltage Differential Signaling (LVDS) mode. The LVDS driver offers four differential voltage modes to optimize power consumption. When the distance between the chip outputs and the external receiver is short and power consumption is critical, the LVDS transmitter can be switched off and data transmitted in single-ended mode.

### 2.7 SPI slave

This block allows the chip to be configured via software. One of the main functions of the SPI control is to change the power consumption profile of the chip by disabling the unused timing IOs or by modifying the output current of the LVDS drivers. The SPI slave also allows the PLL multiplier selector to be modified and provides PLL debugging functions (*VCTL* reset and monitoring).

## 3 Chip measurements

This section shows the preliminary linearity, jitter and power consumption measurements of the first MATRIX chip version. Calibration of the ASIC is required to achieve the best performance, since time bins at the interface between two consecutive rows have finer resolution than those coming from a single row, as explained in section 2.5.

### 3.1 Code density test

Calibration is performed by means of a code density test. This test consists of producing M random pulse shots (100 k in the current test) following a uniform distribution at the TDC input channels. The binary codes corresponding to wider TDC bins will appear more often than those corresponding to narrower ones. Figure 5 shows the code density test results for one of the MATRIX TDC channels.
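The principle of the code density test can be sketched as follows: with uniformly distributed hits, the fraction of hits falling in a code is proportional to the width of its bin. The code below is a generic illustration of that technique, not the calibration software used for MATRIX. The function names are hypothetical, and mapping each code to the centre of its bin is an assumed convention that the text does not specify.

```python
from collections import Counter

CLOCK_PERIOD_PS = 1250.0   # one period of the 800 MHz clock


def bin_widths_ps(codes, period_ps=CLOCK_PERIOD_PS):
    """Estimate each code's width from its share of uniformly distributed hits.

    Codes that never occur are absent from the result (zero width).
    """
    counts = Counter(codes)
    total = sum(counts.values())
    return {code: period_ps * n / total for code, n in sorted(counts.items())}


def bin_centres_ps(widths):
    """Calibrated time per code, taken as the centre of its bin (assumption)."""
    centres, edge = {}, 0.0
    for code, width in widths.items():
        centres[code] = edge + width / 2.0
        edge += width
    return centres


# Toy data set: three codes hit 5, 3 and 2 times out of 10 shots.
toy_codes = [0] * 5 + [1] * 3 + [2] * 2
widths = bin_widths_ps(toy_codes)
centres = bin_centres_ps(widths)
print(widths)
print(centres)
```

In a real measurement the list of codes would hold the fine counter readings of all the random shots, and the resulting table would be applied to each raw reading.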

TDC channel calibration may vary dynamically due to the thermal dependence of the interpolation resistors. These variations reach up to $\pm 7\%$ over the full operating temperature range ($-20^{\circ}$C to $+80^{\circ}$C), which corresponds to a variation of $\pm 1$ ps.

It can be seen that most of the fine counter bins (97.1%) are in the range between 0 and 25 ps, although occasional outliers are observed in some TDC channels. Note that there are several bins with no sensitivity (zero bin width). These bins correspond to those RIMC regions where row overlapping never occurs, i.e. without node identifier averaging. The physical number of bins is 112 (7 rows × 8 columns × 2 clock edge types), whereas only 83 bins of 15 ps (1250/15) would be required to cover the clock period (1250 ps) without row overlapping; the excess corresponds to the 35% overlap described in section 2.2. It is important to highlight that the typical number of sensitive bins (width > 0 ps) achieved by the averaging algorithm is 134 (see figure 5), which corresponds to a typical bin width (LSB size) of 9.34 ps, with a standard deviation of 7 ps (before calibration).



Figure 5. Density code test results for one MATRIX TDC channel.

### 3.2 Linearity

Linearity is measured by injecting N pulse shots (40 K in the current test) synchronized with the external reference clock and by sweeping the relative phase between 0 and 1250 ps in steps of 5 ps. Linearity is then measured by averaging the N calibrated measurements for each step and then by subtracting the step size of 5 ps. Further details of the testbench setup can be found in figure 6.
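One possible reading of this procedure is sketched below: the mean calibrated measurement is computed for each phase step, the differential error is the difference between consecutive means minus the 5 ps step, and the integral error is the running sum of the differential errors. This is an interpretation of the description rather than the analysis code actually used, and the function name and toy values are hypothetical.

```python
STEP_PS = 5.0   # phase sweep step used in the test


def dnl_inl(step_means_ps, step_ps=STEP_PS):
    """Return (DNL, INL) lists in picoseconds from per-step mean measurements."""
    # Differential error: measured increment minus the ideal 5 ps increment.
    dnl = [(b - a) - step_ps for a, b in zip(step_means_ps, step_means_ps[1:])]
    # Integral error: accumulated differential error.
    inl, acc = [], 0.0
    for d in dnl:
        acc += d
        inl.append(acc)
    return dnl, inl


# Toy example with four phase steps (means in ps).
dnl, inl = dnl_inl([0.0, 5.0, 11.0, 15.0])
print(dnl)
print(inl)
```

In the toy data the third step is 1 ps too long and the fourth 1 ps too short, so the integral error returns to zero at the end.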

Figure 6. Setup for linearity and jitter measurements.

Figure 7 shows the Differential Non-Linearity (DNL) and Integral Non-Linearity (INL) measurements of a single MATRIX TDC channel (after calibration). The measured DNL in the 12 available MATRIX prototype samples (48 channels) is always within $\pm 4.7$ ps, with an RMS lower than 1.1 ps. The INL error is always within $\pm 10.2$ ps and its RMS is lower than 3.7 ps. Hence, the total precision due to non-linearities of TDCs is estimated by $\sqrt{\sigma_{DNL}^2 + \sigma_{INL}^2}$, which is less than 4 ps for MATRIX TDC.

Figure 7. DNL (top) and INL (bottom) measurements in picoseconds for a given MATRIX channel sample after calibration.
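As a check on the quoted bound, inserting the RMS limits reported for the DNL (1.1 ps) and the INL (3.7 ps) into the quadrature sum gives:

$$\sqrt{\sigma_{DNL}^2 + \sigma_{INL}^2} < \sqrt{1.1^2 + 3.7^2}\ \text{ps} = \sqrt{14.90}\ \text{ps} \approx 3.86\ \text{ps}$$

which is below 4 ps and is dominated by the INL term.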

### 3.3 Jitter

Jitter is measured by computing the standard deviation at each of the 5-ps steps of the linearity test and taking the typical value over these 250 steps. Table 1 shows that the MATRIX TDC jitter increases with $\sqrt{PLL_M}$, where $PLL_M$ is the PLL multiplication factor. This indicates that the jitter produced by the PLL is the dominant contribution. The PLL natural frequency (bandwidth) decreases with large multiplication factors (square root dependence), which in turn affects the phase error, settling time and jitter [4]. Thus, jitter can be improved in future MATRIX versions by increasing the bandwidth of the internal PLL.

| $PLL_M$ | Uncalibrated TDC jitter (ps) | Calibrated TDC jitter (ps) |
|---------|------------------------------|----------------------------|
| 4       | 9.7                          | 9.3                        |
| 8       | 13.4                         | 12.9                       |
| 16      | 21.2                         | 20.6                       |

**Table 1**. TDC jitter measurements for different PLL multiplier configurations.
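The square-root trend can be checked against the calibrated values in Table 1. The sketch below scales the jitter measured at $PLL_M = 4$ by $\sqrt{PLL_M / 4}$ and compares the result with the other two measurements. Using the lowest multiplier as the reference point is an arbitrary choice made here for illustration.

```python
import math

# Calibrated jitter from Table 1, in ps, keyed by PLL multiplier.
measured_ps = {4: 9.3, 8: 12.9, 16: 20.6}
REFERENCE_M = 4


def sqrt_scaled_jitter_ps(m: int) -> float:
    """Jitter expected at multiplier m if it scales with sqrt(m)."""
    return measured_ps[REFERENCE_M] * math.sqrt(m / REFERENCE_M)


predicted_ps = {m: sqrt_scaled_jitter_ps(m) for m in measured_ps}
for m, value in measured_ps.items():
    deviation = (value - predicted_ps[m]) / predicted_ps[m]
    print(f"M={m:2d}  measured={value:5.1f} ps  "
          f"sqrt model={predicted_ps[m]:5.1f} ps  deviation={deviation:+.1%}")
```

The measurements follow the square-root model to within roughly 11%, with the largest deviation at $PLL_M = 16$.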

### 3.4 Power consumption

MATRIX has several configuration modes that allow power consumption to be optimized. When the chip is in standby mode, the power consumption is 0.76 mW with the reference input clock disabled and 30.1 mW with it enabled. Note that most of the power consumption is due to the continuous oscillation of the RIMC and the continuous switching of the D-inputs of the FFs in the TCM block. When the chip is fully operative, the power consumption ranges from 45.2 mW (LVDS differential voltage of 90 mV) to 67.7 mW (LVDS differential voltage of 600 mV). Data transmission in low power mode has been tested successfully in our testbench setup.

Table 2 shows a comparison between several proposals with performance similar to that of MATRIX. Although it is difficult to compare different technology nodes, it can be seen that the power consumption per channel of MATRIX is clearly lower than that of any other proposal, even when compared with smaller technology nodes.

Table 2. A comparison between state-of-the-art proposals and MATRIX.

| Design | Technology | Bin size | Linearity<sup>1</sup> | Power/channel |
|--------|------------|----------|-----------------------|---------------|
| [5]    | 180 nm     | 41 ps    | 35 ps                 | 25 mW         |
| [2]    | 130 nm     | 5 ps     | 3 ps                  | 43 mW         |
| [3]    | 350 nm     | 0.6 ps   | 5 ps                  | 80 mW         |
| MATRIX | 180 nm     | 9.3 ps   | 4 ps                  | 11.3 mW       |

<sup>1</sup>Linearity estimated by  $\sqrt{\sigma_{\text{DNL}}^2 + \sigma_{\text{INL}}^2}$ .

## 4 Conclusions

A novel TDC concept has been designed, prototyped and tested. The most attractive feature of MATRIX is the low power consumption required to achieve an average time bin of 9.3 ps. The power consumption (11.3 mW per channel) of this new generation of 2-D resistive interpolation TDCs makes it suitable for applications with tight power consumption constraints. The linearity error is very low (4 ps) thanks to the low variability that the RIMC offers. This linearity could be improved further by using low-Vth transistors in the TCM. However, one of the weak points of this first chip version is jitter, which varies from 9.3 ps to 20.6 ps depending on the frequency of the input reference clock.

A second MATRIX version with improved jitter has already been submitted and will be available in early 2017. In the near future we expect to design a System-on-Chip (SoC) called High Resolution FlexToT (HR-FlexToT), which will integrate the MATRIX TDC and a new FlexToT [1] version with improved energy and timing resolution.

## References

- [1] A. Comerma et al., *FlexToT-Current mode ASIC for readout of common cathode SiPM arrays*, *IEEE Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC)*, 2013, pp. 1–2.
- [2] L. Perktold and J. Christiansen, *A multichannel time-to-digital converter ASIC with better than 3 ps RMS time resolution*, 2014 *JINST* **9** C01060.
- [3] P. Keränen and J. Kostamovaara, *A Wide Range, 4.2 ps(rms) Precision CMOS TDC With Cyclic Interpolators Based on Switched-Frequency Ring Oscillators*, *IEEE Trans. Circuits Syst. I, Reg. Papers* **62** (2015) 2795.
- [4] D. Fischette, *Practical Phase-Locked Loop Design*, talk given at *International Solid-State Circuits Conference*, 2004.
- [5] S. Russo et al., *A 41 ps ASIC time-to-digital converter for physics experiments*, *Nucl. Instrum. Meth.* **A 659** (2011) 422.