# HIGH DATA RATE LINKS FOR HIGH ENERGY PHYSICS

Paulo Moreira CERN, Switzerland

ECOLE IN2P3 DE MICROÉLECTRONIQUE 23 – 27 JUNE 2013, PORQUEROLLES FRANCE

# Credits

These slides were prepared on the basis of the work of several people. The most important contributions are from:

- Ozgur Cobanoglu ISKO, Turkey
- Rui Francisco CERN, Switzerland
- Ping Gui SMU, USA
- Gianni Mazza INFN, Italy
- Mohsine Menouni CPPM, France
- Lukas Perktold CERN, Switzerland
- David Porret CERN, Switzerland
- Filip Tavernier CERN, Switzerland

# Outline

- A quick "flight" over digital communications
- Multi-Gb/s CMOS Design
- Power Dissipation In CMOS Circuits
- Optoelectronics Data Transmission Links
- All that you don't (should) care about:
  - ESD Protection Circuits
    - Introduction
    - Input Matching using Coupled T-Coils
    - Input Matching using Inductive Compensation
  - Connectors, PCB & PACKAGE
- Laser Driver Design
- PIN Receiver Design
- Assembling ASICS
- "Analogue-Verilog"
- Handling SEUs

# A QUICK "FLIGHT" OVER DIGITAL COMMUNICATIONS

# **Digital Communications**

- Digital communication systems are widely used for data acquisition, trigger and control links in HEP:
  - A few exceptions: e.g. the CMS tracker data links
- Advantages:
  - Digital communications can be done virtually error free
  - When appropriate codes are used, transmission errors can be:
    - Detected
    - Corrected
  - They are "natural" for digital systems
  - Easy handling of multiple data sources and destinations
  - Easy re-routing between multiple sources and destinations

# **Typical Architecture**



# BW – Limited Channel (1/2)



# BW – Limited Channel (2/2)



- Due to the limited channel bandwidth the transmitted pulses are broadened in time.
- If the broadening is significant, the signal corresponding to one symbol (bit) will overlap in time the signal of the next symbol!
  - This is called Inter Symbol Interference (ISI)
  - ISI is seen in an eye-diagram as:
    - Vertical eye-closure:
      - Reduction of the vertical eye-opening
    - Horizontal eye-closure:
      - Random positions of the threshold level crossings (Jitter or Phase Noise)
  - Equalization (high-pass filtering in this case) can be used to "restore" the signal high frequency content:
    - Reduces ISI
    - Improves Jitter
    - Can never fully compensate the channel
  - Two additional steps are needed to restore the signal to its "original" shape:
    - Threshold detection restores the symbol levels by comparing the signal with a "predefined" level
    - The **signal is retimed** using the recovered clock, eliminating the jitter
- Two additional phenomena impair error free data transmission:
  - Attenuation
  - Noise


# Noise





- The average bit error probability:

$$P_e = P(0|1) \cdot P(1) + P(1|0) \cdot P(0)$$

- It is a strong function of the signal to noise ratio:

$$SNR_1 = \frac{V_1 - D}{\sigma_1} \qquad SNR_0 = \frac{D - V_0}{\sigma_0}$$

- In the limit the measured Bit Error Rate BER is equal to P<sub>e</sub>
  - SNR =  $6 \rightarrow BER = 10^{-9}$ 
    - Testing time to "count" 100 errors: 20 s @ 5 Gb/s
  - SNR =  $7.9 \rightarrow BER = 10^{-15}$ 
    - Testing time to "count" 100 errors: 231 days @ 5 Gb/s
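The numbers above can be checked with a short sketch. It assumes Gaussian noise, equiprobable symbols and the same SNR on both levels, so that $P_e = \frac{1}{2}\,\mathrm{erfc}(SNR/\sqrt{2})$; that formula and the function names are assumptions of this sketch, not taken from the slides.

```python
import math

def ber_from_snr(snr: float) -> float:
    """Bit error probability for Gaussian noise with equal SNR on both levels (assumed model)."""
    return 0.5 * math.erfc(snr / math.sqrt(2.0))

def time_to_count_errors(ber: float, bit_rate: float, n_errors: int = 100) -> float:
    """Average time in seconds needed to observe n_errors at a given BER and bit rate."""
    return n_errors / (ber * bit_rate)

for snr in (6.0, 7.9):
    print(f"SNR = {snr}: BER ~ {ber_from_snr(snr):.2e}")

print(time_to_count_errors(1e-9, 5e9), "s")              # BER = 1e-9 at 5 Gb/s
print(time_to_count_errors(1e-15, 5e9) / 86400, "days")  # BER = 1e-15 at 5 Gb/s
```

The testing time grows inversely with the BER, which is why very low error rates are impractical to measure directly.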



Paulo.Moreira@cern.ch

# Jitter



- Moving away from the optimum sampling instant drastically increases the BER because the SNR decreases due to:
  - Signal jitter
  - Signal finite rise and fall times
    - The signal magnitude |V–D| decreases because of the finite rise/fall times of the signal
- Two main causes for non-optimum sampling:
  - Retiming clock static phase error
    - E.g. due to an imbalance in the CDR PLL charge-pump
  - Retiming clock jitter
    - E.g. due to the CDR tracking behaviour or VCO noise

# **Error Control Coding**

- Due to Noise and ISI the received message might differ from the transmitted
  - Some of the transmitted bits will be wrongly detected
- Error control coding introduces extra bits in the transmitted message
- These allow the receiver to:
  - Detect the presence of errors
  - Correct detected errors
- Error control is done at the expense of bandwidth

Even parity: parity bits are computed such that the number of "1s" in each row and column is even.

Transmitted message (the 7-bit ASCII codes of "Higgs"):

| Character     | [6] | [5] | [4] | [3] | [2] | [1] | [0] | Row parity        |
|---------------|-----|-----|-----|-----|-----|-----|-----|-------------------|
| H             | 1   | 0   | 0   | 1   | 0   | 0   | 0   | 0                 |
| i             | 1   | 1   | 0   | 1   | 0   | 0   | 1   | 0                 |
| g             | 1   | 1   | 0   | 0   | 1   | 1   | 1   | 1                 |
| g             | 1   | 1   | 0   | 0   | 1   | 1   | 1   | 1                 |
| s             | 1   | 1   | 1   | 0   | 0   | 1   | 1   | 1                 |
| Column parity | 1   | 0   | 1   | 0   | 0   | 1   | 0   | 1 (matrix parity) |

Received message (bit [0] of the second "g" is corrupted, so the character reads "f"):

| Character     | [6] | [5] | [4] | [3] | [2] | [1] | [0] | Row parity |
|---------------|-----|-----|-----|-----|-----|-----|-----|------------|
| H             | 1   | 0   | 0   | 1   | 0   | 0   | 0   | 0          |
| i             | 1   | 1   | 0   | 1   | 0   | 0   | 1   | 0          |
| g             | 1   | 1   | 0   | 0   | 1   | 1   | 1   | 1          |
| f             | 1   | 1   | 0   | 0   | 1   | 1   | 0   | 1          |
| s             | 1   | 1   | 1   | 0   | 0   | 1   | 1   | 1          |
| Column parity | 1   | 0   | 1   | 0   | 0   | 1   | 0   | 1          |

Discrepancy between the received row/column parities and the parities computed by the receiver! The failing row (fourth character) and the failing column (bit [0]) intersect at the corrupted bit.
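The row/column parity scheme can be sketched in a few lines of Python. The function names are illustrative only; the sketch assumes 7-bit characters and even parity, as in the tables, and it only corrects a single flipped bit.

```python
def encode(message: str) -> list[list[int]]:
    """Return one row per character (7 data bits + row parity), then the column-parity row."""
    rows = []
    for ch in message:
        bits = [(ord(ch) >> i) & 1 for i in range(6, -1, -1)]  # bit [6] first
        rows.append(bits + [sum(bits) % 2])                     # even row parity
    # Column parity over all rows; the last entry is the matrix parity.
    rows.append([sum(col) % 2 for col in zip(*rows)])
    return rows

def locate_single_error(block):
    """Return (row, column) of a single flipped bit, or None if all parities agree."""
    bad_rows = [r for r, row in enumerate(block) if sum(row) % 2]
    bad_cols = [c for c, col in enumerate(zip(*block)) if sum(col) % 2]
    if not bad_rows and not bad_cols:
        return None
    if len(bad_rows) == 1 and len(bad_cols) == 1:
        return bad_rows[0], bad_cols[0]
    raise ValueError("more than one bit in error: detected but not correctable")

block = encode("Higgs")
block[3][6] ^= 1                  # corrupt bit [0] of the second 'g' (it becomes 'f')
row, col = locate_single_error(block)
block[row][col] ^= 1              # flip the located bit back
print(row, col)
```

The extra row and column are the bandwidth price paid for being able to locate, and therefore correct, one error per block.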

# Line Coding

- Apart from detecting and correcting errors, coding is also applied to condition the signal to the transmission medium and/or the transmitter/receiver architecture. This is called line coding.
- The most commonly addressed problems are:
  - DC wander caused by AC coupling ("high-pass" type response), see figure below:
    - DC blocking between circuits
    - Laser-driver mean power control
    - Offset cancelation circuits in pin-receivers
  - Lack of signal transitions to keep the RX clock locked on the data
- Line codes typically have the following properties:
  - Limit the low frequency content in the signal spectrum
  - Guarantee a high density of symbol transitions
- Some popular examples:
  - 8B/10B
  - Scrambling



| Input | Disparity + | Disparity 0 | Disparity - |
|-------|-------------|-------------|-------------|
| 000   | 1101        |             | 0010        |
| 001   |             | 1001        |             |
| 010   |             | 1010        |             |
| 011   |             | 0011        |             |
| 100   |             | 1100        |             |
| 101   |             | 0101        |             |
| 110   |             | 0110        |             |
| 111   | 1011        |             | 0100        |

3B/4B
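A minimal encoder sketch for the 3B/4B table follows. The table only lists the code words; the selection rule used here (pick the code word that pulls the running disparity back towards zero) is an assumption of this sketch, as are the names.

```python
# Code words transcribed from the 3B/4B table.
BALANCED = {0b001: "1001", 0b010: "1010", 0b011: "0011",
            0b100: "1100", 0b101: "0101", 0b110: "0110"}
UNBALANCED = {0b000: ("1101", "0010"),   # (disparity +, disparity -)
              0b111: ("1011", "0100")}

def encode_3b4b(words):
    """Encode 3-bit words while tracking the running disparity (ones minus zeros)."""
    running, out = 0, []
    for w in words:
        if w in BALANCED:
            code = BALANCED[w]
        else:
            plus, minus = UNBALANCED[w]
            # Assumed rule: choose the word that brings the running disparity back to zero.
            code = minus if running > 0 else plus
        running += code.count("1") - code.count("0")
        out.append(code)
    return out, running

codes, disparity = encode_3b4b([0, 0, 7, 5, 0, 7, 7])
print(codes, disparity)
```

With this rule the running disparity never leaves the set {0, +2}, which bounds the low frequency content of the encoded stream.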

# Example: The GBT Frame (1/3)



- Link bandwidth: 4.8 Gb/s
- Payload: 3.36 Gb/s
- DC balance:
  - Scrambler
  - No bandwidth penalty
- Forward Error Correction:
  - Two encoders
    - Each encoder receives 44 bits and computes a 16 bit FEC code
  - Code: Reed-Solomon double error correction
  - 4-bit symbols (RS(15,11))
  - Interleaving: 2
  - Error correction capability:
    - $2_{\text{Interleaving}} \times 2_{\text{RS}} = 4 \text{ symbols} = 16 \text{ bits}$
- GBT frame efficiency: 70%
  - A line code is always required for DC balance and synchronization
  - For comparison, the Gigabit Ethernet frame efficiency is 80% (8B/10B coding at the physical level)
  - At a small penalty (10%, when compared with the Gigabit Ethernet) the GBT protocol will offer the benefits of Error Detection and Correction
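The frame numbers above are mutually consistent, as the following arithmetic sketch shows. The variable names are illustrative, and the total frame length is inferred here from the two interleaved code words rather than stated in the list.

```python
LINK_RATE = 4.8e9      # b/s, from the list above
PAYLOAD_RATE = 3.36e9  # b/s, from the list above

# RS(15,11) with 4-bit symbols, two interleaved encoders
N_SYM, K_SYM, SYMBOL_BITS, INTERLEAVING = 15, 11, 4, 2

data_bits_per_encoder = K_SYM * SYMBOL_BITS              # bits entering each encoder
fec_bits_per_encoder = (N_SYM - K_SYM) * SYMBOL_BITS     # FEC bits computed by each encoder
correctable_symbols = (N_SYM - K_SYM) // 2               # symbols corrected per code word
burst_bits = INTERLEAVING * correctable_symbols * SYMBOL_BITS
frame_bits = INTERLEAVING * N_SYM * SYMBOL_BITS          # inferred total frame length
efficiency = PAYLOAD_RATE / LINK_RATE

print(data_bits_per_encoder, fec_bits_per_encoder, burst_bits, frame_bits)
print(f"efficiency = {efficiency:.0%}")
```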

# Example: GBT Coding Architecture (2/3)



# Example: GBT Forward Error Correction (FEC) (3/3)

- Radiation levels at 20 cm radius from the beam:
  - 2 ×10<sup>15</sup> neutrons/cm<sup>2</sup>
  - 1 × 10<sup>15</sup> hadrons/cm<sup>2</sup>
  - 500 kGy total dose
- High rates of Single Event Upsets (SEU) are expected for SLHC links:
  - Particle "detection" by Photodiodes used in optical receivers
  - SEUs on PIN-receivers
  - SEUs on Laser-drivers
  - SEUs on SERDES circuits
- Experimental results confirmed that:
  - Error correction is mandatory to achieve error rates $\leq 10^{-12}$
  - Upsets lasting for multiple bit periods will occur on PIN detectors!
  - Upsets lasting for multiple frames can occur in commercial TIAs



# **MULTI-GB/S CMOS DESIGN**

# **Basic CS-Amplifier**



#### Frequency Response: CS Amplifier




#### Performance (Gain & BW) Depends on I<sub>ds</sub>



# Biasing for Maximum g<sub>m</sub>




#### "Available" Bandwidth: CS Amplifier




#### Invariance of Characteristic Current Densities (1/6)

- As a result of constant-field scaling, the optimum current densities remain largely invariant over technology nodes:
  - $J_{opt}(f_T) \approx 0.3 \text{ mA/}\mu\text{m}$
  - $J_{opt}(f_{MAX}) \approx 0.2 \text{ mA/}\mu\text{m}$
  - $J_{opt}(NF_{MIN}) \approx 0.15 \text{ mA/}\mu\text{m}$
- The optimum current densities are basically independent of:
  - Technology node
  - Foundry
  - Gate length
  - Threshold voltage
- This holds for NMOS as well as for PMOS:
  - PMOS optimum current densities are 40% to 45% of the NMOS values

$$J = \frac{I_D}{W} \quad [\text{in mA/}\mu\text{m}]$$

(T. O. Dickson et al. - JSSC vol. 41, no. 8, p. 1830, August 2006)



#### Transistors with various gate lengths in the 90-nm node.



Peak f<sub>T</sub> current density:

- $J_{opt}(f_T) \approx 0.3 \text{ mA/}\mu\text{m}$
- Independent of the gate length within a technology node
- Identical for Bulk and SOI







• Threshold voltages

# Invariance of Characteristic Current Densities (6/6)

#### **Rule of thumb for high-frequency design:**

- All rules of thumb must be <u>critically assessed</u> for the case at hand!
- Employ constant current density biasing techniques
- Bias the NMOS transistors:
  - Between the peak $f_T$ and peak $f_{MAX}$ current densities for high bandwidth:
    - $J_{opt} \in$ [0.2 mA/µm, 0.3 mA/µm]
  - Or at the optimum NF<sub>MIN</sub> current density for low noise:
    - $J_{opt} \approx 0.15 \text{ mA/}\mu\text{m}$
- For PMOS transistors use current densities that are 40% to 45% of the NMOS values
- This should result in a design that is robust against process variations (1<sup>st</sup> order assumption)
- Signal headroom might dictate that lower current densities need to be used:
  - The optima for $f_T$, $f_{MAX}$ and NF<sub>MIN</sub> are relatively broad
  - Using $0.5 \times J_{opt} < J < J_{opt}$ has little influence on performance and can significantly reduce power consumption and/or increase signal handling ability
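As a sketch of how the rule of thumb translates into a bias current, the helper below applies $I_D = J \times W$ with the current densities quoted above. The function name, the `derate` parameter and the 20 µm example width are hypothetical and only serve as illustration.

```python
# Characteristic NMOS current densities quoted above, in mA per µm of gate width.
J_OPT_NMOS = {"fT": 0.3, "fMAX": 0.2, "NFmin": 0.15}
PMOS_FACTOR = 0.4  # lower end of the 40% to 45% range quoted above

def bias_current_ma(width_um, target="fT", pmos=False, derate=1.0):
    """Drain current in mA for a device of the given total gate width in µm."""
    if not 0.5 <= derate <= 1.0:
        raise ValueError("derate outside the 0.5 to 1.0 range discussed above")
    j = J_OPT_NMOS[target] * (PMOS_FACTOR if pmos else 1.0)
    return j * derate * width_um

print(bias_current_ma(20))                         # NMOS biased at peak fT
print(bias_current_ma(20, "NFmin"))                # NMOS biased for minimum noise figure
print(bias_current_ma(20, pmos=True, derate=0.5))  # PMOS at half the optimum density
```

Because the densities are per unit width, the same table can be reused when a design is ported to a different node; only the widths change.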

### Application to NMOS – CML Gates




#### **Transistor Layout: Capacitances**

- The gate capacitance is typically dominated by the gate-to-channel (oxide) capacitance (even for relatively small devices):  $C_g = C_{ox} \cdot W \cdot L + C_{overlap} \cong C_{ox} \cdot W \cdot L \quad (since C_{ox} \cdot W \cdot L \gg C_{overlap})$
- Source and drain (C<sub>sb</sub> and C<sub>db</sub>) capacitances are diffusion capacitances composed of:
  - Bottom-plate capacitance: $C_{bottom} = C_j \cdot W \cdot L_s$
  - Sidewall capacitance: $C_{sw} = C_{jsw} \cdot \left(2 L_s + W\right)$
- In small devices the sidewall capacitance dominates
- Increasing W decreases the relative importance of the sidewall capacitance
- Large devices have thus relatively smaller source and drain parasitic capacitances
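The dependence on W follows directly from the ratio of the two expressions given for the diffusion capacitances:

$$\frac{C_{sw}}{C_{bottom}} = \frac{C_{jsw} \cdot (2 L_s + W)}{C_j \cdot W \cdot L_s} = \frac{C_{jsw}}{C_j} \left( \frac{1}{L_s} + \frac{2}{W} \right)$$

The $2/W$ term shrinks as the device gets wider, so the ratio falls towards the constant $C_{jsw} / (C_j L_s)$.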



# Transistor Layout: Multi-finger Devices



Multi-finger devices have:

- Smaller drain diffusion capacitance:
  - Basically "no" sidewall drain capacitance:
    - Channel on both sides of the drain diffusion
    - A small fraction still remains at the transistor edges:
      - Mainly when an odd number of fingers is used
      - So, no odd number of fingers in the designs!
- Less than half the bottom plate capacitance:
  - Design rules "impose" larger diffusion on the periphery than that in between gates
- Smaller gate resistance:
  - In the example:
    - Two 1/2 width gates
    - Drive 1/2 the gate capacitance each
    - Leading to "1/4 " RC delay
  - Gate resistance can be further decreased by contacting on both sides of the gate
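The "1/4" figure generalises to any number of fingers. As a first-order sketch, assume a single-finger device with gate resistance $R_g$ and gate capacitance $C_g$ is split into $N$ equal fingers, each contacted on one side. Each finger then has $1/N$ of the resistance and drives $1/N$ of the capacitance:

$$\tau_{finger} = \frac{R_g}{N} \cdot \frac{C_g}{N} = \frac{R_g C_g}{N^2}$$

For $N = 2$ this gives the factor 1/4 quoted above.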

#### "Available" Bandwidth: Multi-Finger Devices



### BW Broadening: Resistive Load (1/2)



Add a load resistor R<sub>ld</sub> in parallel with "R<sub>ds</sub>":

- The output pole frequency will increase: $f_{3dB} = 1/[2 \pi (R_{ld} || R_{ds}) C_{gs}]$
- The gain will decrease: $G = g_m (R_{ld} || R_{ds})$
- To a first approximation the gain bandwidth shouldn't change:
  - GBW =  $g_m / [2 \pi C_{gs}]$
  - (44.8 GHz in the example considered)
- <u>However</u>, the above neglects the Miller capacitance:
  - GBW =  $g_m / [2 \pi (C_{gs} + C_{Miller})]$
  - GBW = 13.7 GHz
- Since the gain decreases as the load resistance decreases the Miller capacitance (due to C<sub>gd</sub>) also decreases
- Due to the lower gain, the Miller capacitance is smaller than that of the basic CS stage and the GBW is thus higher than that of the basic CS stage
- In practice the current source (represented in the figure) is avoided altogether and the transistor biased through the load resistor. This avoids the output capacitance  $C_{db}$  of an additional current source
  - The parasitic capacitance of the load resistor must be taken into account
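The first-order invariance of the gain-bandwidth product follows from multiplying the two expressions above; the load resistance cancels:

$$GBW = G \cdot f_{3dB} = g_m (R_{ld} || R_{ds}) \cdot \frac{1}{2 \pi (R_{ld} || R_{ds}) C_{gs}} = \frac{g_m}{2 \pi C_{gs}}$$

For the Miller term, the usual textbook approximation (an assumption here, not given on the slide) is $C_{Miller} \approx (1 + |G|) \, C_{gd}$. This makes explicit why a lower gain $G$ reduces the Miller capacitance and raises the GBW.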

# BW Broadening: Resistive Load (2/2)



- The Gain-Bandwidth is:
  - CS: 13.7 GHz
- A zero exists in the transfer function due to the frequency dependence of the Miller capacitance.
  - The next stage transfer function has at minimum a single pole
- Both corner frequencies depend on the badly controlled C<sub>gd</sub> and on the open-loop gain of the next stage.

# BW Broadening: Impedance "transforming"



To achieve a high Gain-Bandwidth product:

- Isolate the 1<sup>st</sup> gain stage as much as possible from the 2<sup>nd</sup>:
  - That is, lower the capacitive loading of the 1<sup>st</sup> stage
- Drive the second stage with as low an impedance as possible:
  - That is, move the pole introduced by C<sub>gs</sub> and the Miller capacitance of the 2<sup>nd</sup> stage as high as possible
- In other words, transform the impedance that stage 1 presents to stage 2 and vice versa.
- A common-drain (CD) stage can do that job. It has a:
  - High input impedance: $Z_{in} \cong Z[C_{gs}, C_{gd}]$
    - No Miller multiplied capacitance
    - Bootstrapped C<sub>gs</sub>
  - Low output impedance  $Z_{out} \cong 1/g_m$

# BW Broadening: CS-CD Gain "Stage" (1/2)





- The Gain-Bandwidth has considerably increased:
  - CS: 13.7 GHz
  - CS-CD: 63.1 GHz
- The frequency response of the CS-CD "stage" is almost "single pole":
  - The Miller multiplied capacitance C<sub>gd</sub> is driven by the low impedance of the common drain stage (1/g<sub>m</sub>)
  - The second stage pole is pushed to a very high frequency
  - The dominant pole is now:
    - $f_{3dB} = 1/[2 \pi (R_{ld} || R_{ds}) C_{n1}]$
    - Where C<sub>n1</sub> is the "bootstrapped" gate capacitance of the CD stage plus parasitics
- The CD buffer stage has some drawbacks:
  - Additional power consumption
  - Harder to bias for low supply voltages:
    - Additional V<sub>gs</sub> drop.
    - $V_{dd} \le 1.5 \text{ V}$  for 0.13  $\mu \text{m}$  CMOS

### BW Broadening: CS-CD Gain "Stage" (2/2)



#### BW Broadening: Cherry & Hooper Gain "Stage" (1/3)



- In 1963 Cherry and Hooper introduced a technique that achieves Gain-Bandwidth products of the order of 90% of f<sub>T</sub> for bipolar transistor circuits. The principle can be applied to MOS circuits
- The basic ideas are:
  - All the nodes in the signal path are low impedance
  - Voltage amplification is achieved by a cascade of two stages: a G<sub>m</sub> stage followed by a transimpedance (T<sub>z</sub>) stage
  - For maximum signal transfer the stages must alternate in a succession of high-impedance / low-impedance stages
  - For bipolar transistors, due to the relatively low impedance of the base, this was implemented as alternating series-feedback / shunt-feedback stages
  - MOS stages inherently have a high input impedance, so a series-feedback stage is, strictly speaking, not necessary
  - In MOS circuits series-feedback can be used to set the gain accurately. However, this will cost power and will increase noise (reduction of the first stage effective G<sub>m</sub>).

#### BW Broadening: Cherry & Hooper Gain "Stage" (2/3)



- The Gain-Bandwidth has increased again:
  - CS: 13.7 GHz
  - CS-CD: 63.1 GHz
  - CH: 102.3 GHz
- The frequency response is now dominated by two poles not too distant $(f_1 > f_2/10)$:
  - The transfer function displays a second order roll-off
- Like the CS-CD stage, the CH stage has some drawbacks:
  - Additional power consumption
  - Not easy to bias either

#### BW Broadening: Cherry & Hooper Gain "Stage" (3/3)



#### BW Broadening: Inductive Peaking (1/2)

- Add an inductor in series with the load resistor
- The bandwidth can be extended up to 1.85 times
- For optimum group delay response the BW gain is 1.6 times
- The circuit displays a second order transfer function
  - The frequency response is characterized by the ratio "m"

$$L = m \cdot R^2 \cdot C$$

| Factor<br>m | Normalized f <sub>3dB</sub> | Response               |
|-------------|-----------------------------|------------------------|
| 0           | 1.00                        | No shunt peaking       |
| 0.32        | 1.60                        | Optimum group<br>delay |
| 0.41        | 1.72                        | Maximally flat         |
| 0.71        | 1.85                        | Maximum<br>bandwidth   |
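The table can be reproduced approximately from a simple model. The sketch below assumes an ideal inductor L in series with the load resistor R, the pair in parallel with the load capacitance C, so that $Z(s) = (R + sL)/(1 + sRC + s^2 LC)$ with $L = m R^2 C$. This model and the function name are assumptions of the sketch.

```python
import math

def normalized_f3db(m: float) -> float:
    """-3 dB frequency of the shunt-peaked load, relative to 1/(2*pi*R*C)."""
    if m == 0:
        return 1.0  # plain RC load, no peaking
    # With x = omega*R*C:  |Z/R|^2 = (1 + m^2 x^2) / ((1 - m x^2)^2 + x^2).
    # Setting |Z/R|^2 = 1/2 gives a quadratic in y = x^2:
    #   m^2 y^2 + (1 - 2m - 2m^2) y - 1 = 0
    b = 1.0 - 2.0 * m - 2.0 * m * m
    y = (-b + math.sqrt(b * b + 4.0 * m * m)) / (2.0 * m * m)  # the positive root
    return math.sqrt(y)

for m in (0.0, 0.32, 0.41, 0.71):
    print(f"m = {m:4.2f}  f3dB/f0 = {normalized_f3db(m):.2f}")
```

Under this idealised model the computed bandwidths agree with the tabulated values to within a few percent; parasitic inductor resistance and capacitance are not modelled.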





#### BW Broadening: Inductive Peaking (2/2)



#### **Power optimization:**

- For a CML stage the bandwidth can be enhanced by increasing the tail current:
  - By reducing  $R_L$  at fixed  $\Delta V$
- Or by using inductive peaking for a given current and R<sub>L</sub>
- It should then be clear that a specified bandwidth can be achieved with lower current if inductive peaking is used.
  - See "active inductive" peaking example later.

#### BW Broadening: Active "Inductive" Peaking (1/3)

- The main drawback of inductive peaking is the large area required by the peaking inductors
- The impedance "looking" at the source of a Common Gate stage can be "made inductive"
- That is, it has a zero in the transfer function if the gate is "driven" by a resistor (R<sub>p</sub>)
- By choosing the value of R<sub>p</sub> it is possible to place the zero so that it compensates for the pole in the gain stage
- Bandwidth improvements of up to 34% are possible
- It has however some disadvantages:
  - The voltage headroom to process signals is decreased
  - The noise performance is also slightly degraded



#### BW Broadening: Active "Inductive" Peaking (2/3)




#### BW Broadening: Active "Inductive" Peaking (3/3)







For the same delay "inductive" peaking gives a power consumption advantage




# POWER DISSIPATION IN CMOS CIRCUITS

#### **Power Dissipation and CMOS Circuits**

- No matter the complexity of the circuit (logic gate, Flip-Flop, ...), a CMOS circuit can be "seen" as:
  - A current source and a current sink driving a capacitance (other circuits and interconnects)
- The beauty of CMOS circuits is that:
  - Energy is only dissipated during the "0"  $\rightarrow$  "1" and "1"  $\rightarrow$  "0" transitions.
  - A "standing still circuit" consumes no power!
    - If we ignore leakage currents, of course!



### **Reducing CMOS Power Dissipation**



- High frequency = High power consumption
  - Power consumption is proportional to <u>f</u>
  - Always run the circuits at the lowest useful frequency!
- When a circuit is not "playing a useful role" stop the clock for that circuit or sub-circuit:
  - No transitions "no" power consumption!
- If compatible with the operation speed and noise margins use low supply voltages!
  - Power consumption is proportional to  $\underline{V}^2$
- Design circuits with "minimum" transistor sizes whenever possible!
  - Power consumption is proportional to <u>C</u>
  - However, make sure the rise/fall times are short to avoid the "short-circuit current"
  - Optimize the circuit for low gate count.
- Minimize the interconnect capacitance.
- Optimize the circuit/gate layout to reduce parasitic capacitances.
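The three proportionalities above combine into the usual switching power estimate $P = C \times V^2 \times f$. The sketch below adds an `activity` factor for nodes that do not toggle every cycle; that factor, the function name and the 1 pF / 1.2 V / 40 MHz example values are assumptions made for illustration.

```python
def dynamic_power(c_farad, vdd_volt, f_hz, activity=1.0):
    """Switching power in watts: activity * C * Vdd^2 * f (leakage ignored)."""
    return activity * c_farad * vdd_volt ** 2 * f_hz

base = dynamic_power(1e-12, 1.2, 40e6)             # hypothetical 1 pF node at 40 MHz
print(dynamic_power(1e-12, 1.2, 20e6) / base)      # half the frequency
print(dynamic_power(1e-12, 0.6, 40e6) / base)      # half the supply voltage
print(dynamic_power(0.5e-12, 1.2, 40e6) / base)    # half the capacitance
print(dynamic_power(1e-12, 1.2, 40e6, 0.0))        # clock stopped: no transitions
```

The quadratic dependence on the supply voltage is what makes supply reduction the most effective of the three knobs.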

### Can Technology Scaling Help to Keep it Cool?

- Technology scaling has a threefold objective:
  - Increase the transistor density
  - Reduce the gate delay
  - Reduce the power consumption
- Between two technology generations, the objectives are:
  - Doubling of the transistor density
  - Reduction of the gate delay by 30% (43% increase in frequency)
  - Reduction of the power by 50% (at 43% increase in frequency)

- Ideally, CMOS technologies evolution (scaling):
  - All the device dimensions (lateral and vertical) are reduced by  $1/\alpha$
  - Concentration densities are increased by  $\alpha$
  - Device voltages reduced by  $1/\alpha$
- Typically:
  - $1/\alpha \approx 0.7$ (30% reduction in the dimensions)
  - $\alpha = \sqrt{2} \approx 1.41$
- In practice a lot more than that is going on!



#### A "Practical" Case

 $\beta = \alpha \times \alpha$ 

- Two non-consecutive generations: 130 nm $\rightarrow$ [90 nm] $\rightarrow$ 65 nm
  - One generation: $\alpha \approx 1.41$
  - Two generations: $\beta = \alpha^2 = 2$
- The actual scaling between the two technology nodes considered is:
  - $\beta = \beta_L = \beta_W = \alpha^2 = 130 \text{ nm} / 65 \text{ nm} = 2$
  - $\beta_V = 1.5 \text{ V} / 1.2 \text{ V} = 1.25$
  - $\beta_{OX}$ = 3.03 nm / 2.69 nm = 1.13 (Low Power flavour)
  - $\beta_{OX}$ = 3.03 nm / 2.00 nm = 1.52 (General Purpose flavour)
  - Deviation from the "ideal" scaling "law": $\beta_V$ and $\beta_{OX}$ are both smaller than $\beta$
- Let's try to understand how much gain in power dissipation can be obtained if a circuit is ported over two generations under the following conditions:
  - Geometries are scaled according to the technology scaling:
    - That is, W and L are reduced according to the scaling $\alpha \approx 1.41$ per generation
  - Supply reduced according to the practical scaling
  - Frequency:
    - 1<sup>st</sup> Case: Take advantage of the scaling to increase the frequency by a factor of 2
    - 2<sup>nd</sup> Case: In HEP (LHC) 40 MHz is a kind of "God given" frequency, so let's keep the operation frequency constant!

#### How Much Can We Get From Technology Scaling? (1/2)

#### **CMOS Logic**

- Capacitance scaling:
  - $C = \varepsilon_{ox} / t_{ox} \times W \times L$
  - $1 \rightarrow \beta_{ox}/\beta^2 = 0.28$  (LP)
  - $1 \rightarrow \beta_{ox}/\beta^2$  = 0.38 (GP)
- 1<sup>st</sup> Case: f scales as  $1 \rightarrow 2$  (between two nonconsecutive generations,  $\beta_f = 0.5 (1/\beta_f = 2)$ )
  - Ideal
    - Power:  $C \times V^2 \times f$
    - One generation scaling:  $1 \rightarrow (1/\alpha)^2 = 0.5$
    - Two generations scaling  $1 \rightarrow (1/\alpha)^4 = 0.25$
  - Actual two generations
    - 1  $\rightarrow \beta_{OX} / (\beta_f \times \beta^2 \times \beta_V^2)$  = 0.36 (LP)
    - 1  $\rightarrow \beta_{OX}$  / ( $\beta_f \times \beta^2 \times \beta_V^2$ ) = 0.49 (GP)
- 2<sup>nd</sup> Case: f is kept constant ($\beta_f = 1$)
  - Ideal
    - Power:  $C \times V^2 \times 1$
    - One generation scaling:  $1 \rightarrow (1/\alpha)^3 = 0.35$
    - Two generations scaling:  $1 \rightarrow (1/\alpha)^6 = 0.12$
  - Actual
    - 1  $\rightarrow \beta_{OX} / (\beta^2 \times \beta_V^2) = 0.18 \text{ (LP)}$
    - 1  $\rightarrow \beta_{OX} / (\beta^2 \times \beta_V^2) = 0.24 \text{ (GP)}$

- Interconnect capacitances will prevent such good results!
- Leakage currents as well (not taken into account here!)

Even though this is a "first order" analysis, it clearly shows that moving up in the technology node gives a clear advantage in terms of power consumption! Within a technology node the choice of "flavour" also makes a difference.
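The "actual" CMOS logic scaling factors can be reproduced with the short sketch below. It uses the rounded scaling factors quoted for the 130 nm to 65 nm case; the function names are illustrative only.

```python
BETA = 2.0                           # lateral scaling, 130 nm -> 65 nm
BETA_V = 1.25                        # supply scaling, 1.5 V -> 1.2 V
BETA_OX = {"LP": 1.13, "GP": 1.52}   # oxide thickness scaling per flavour

def capacitance_scale(flavour):
    """C ~ W * L / t_ox, so C scales by beta_ox / beta^2."""
    return BETA_OX[flavour] / BETA ** 2

def power_scale(flavour, beta_f):
    """P ~ C * V^2 * f; beta_f = 0.5 doubles f, beta_f = 1 keeps f constant."""
    return capacitance_scale(flavour) / (beta_f * BETA_V ** 2)

for flavour in ("LP", "GP"):
    print(flavour,
          round(capacitance_scale(flavour), 2),   # capacitance
          round(power_scale(flavour, 0.5), 2),    # 1st case: frequency doubled
          round(power_scale(flavour, 1.0), 2))    # 2nd case: frequency constant
```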

### How Much Can We Get From Technology Scaling? (2/2)

#### **Current Mode Logic**

- "Fixed" current buffers (e.g. LVDS, SLVS,...):
  - Current and voltage signal levels stay the same:
    - Specified current and load impedance
  - Only supply voltage scales!
  - Power scaling:
    - Power:  $I \times V \rightarrow 1/\beta_V = 0.8 (LP/GP)$
- Fast Current Mode (and high bandwidth "Analogue"):
  - Ideal:
    - Power:  $V_{DD} \times W \times J_{opt}$
    - Keeping the current density constant
    - One generation scaling:  $1 \rightarrow 1/\alpha^2 = 0.49$
    - Two generations scaling:  $1 \rightarrow 1/\alpha^4 = 0.24$
  - Actual:
    - $1 \rightarrow 1/\beta_V \times 1/\beta_W \times 1 = 0.4$  (LP/GP)

# OPTOELECTRONICS DATA TRANSMISSION LINKS

#### HEP Data Link



#### "Typical" Signal Path for Data Transmission Systems (1/2)



#### "Typical" Signal Path for Data Transmission Systems (2/2)




#### High Speed Data Link Design

- The design of high speed data links requires careful design and modelling of all the elements in the signal path!
- Don't overlook "innocent" components:
  - Laser bias circuit
  - PIN bias circuit
  - ESD protection
  - Package
  - Connectors
  - PCB
- It is not enough to have high performance optoelectronics and ASICs!
- Equally important are:
  - Package selection / design
  - PCB design
  - Connectors
- In your simulation test benches make sure to use realistic signal sources and loads:
  - At high frequencies there is no such thing as a 'zero' impedance voltage source or an 'infinite' impedance current source.

#### Your Simulation Test Bench is Important!



 Once an input matching network was included in the ASIC the circuit revealed excellent performance!

- Source impedance inaccurately modelled:
  - ESD impact unrevealed by simulations!
- Jitter independent of the pre-emphasis settings:
  - Bandwidth limitation not in the output node
  - Either in an internal or input node
- Once the modelling problem was detected:
  - Simulations accurately reproduce the response
  - Input identified as the BW limiting node
- "RC" time constant of the source impedance plus ESD protection capacitance dominated the response!
  - An input matching network was clearly needed!




# ESD PROTECTION CIRCUITS

## Introduction

#### What is ESD Protection?

- Electrostatic Discharge (ESD) event!
- ESD events occur during:
  - Handling by humans:
    - Human Body Model (HBM)
  - Handling by machines:
    - Machine Model (MM)
    - Charged Device Model (CDM)
- All pins must have ESD protection!
- Including:
  - High speed
  - Sensitive analog
- Failing to protect against ESD will most likely result in poor yield
- ESD protection must be included early in the design phase:
  - Its impact on circuit performance must be considered

#### ESD Models

| Characteristics                   | HBM                                                                                | MM                                                            | CDM                                                  |
|-----------------------------------|------------------------------------------------------------------------------------|---------------------------------------------------------------|------------------------------------------------------|
| Equivalent circuit                | Series 1.5 k $\Omega$ + 100 pF                                                     | Series 0.5-1.0 μH + 200 pF                                    | Field plate to chip<br>capacitance only              |
| Test voltage                      | 2000 V                                                                             | 200 V                                                         | 500 V                                                |
| Discharge Path                    | Between ANY two pins                                                               | Between ANY two pins                                          | One pin only<br>(Discharge Pin)                      |
| Simulates                         | Human discharging<br>through chip                                                  | Metal tool discharging through the chip                       | Charged chip<br>discharging to ground                |
| Discharge<br>Waveform             | Exponential Decay<br>Time Constant = 150 ns                                        | 11-16 MHz damped<br>Oscillation                               | ~1 GHz damped<br>Oscillation                         |
| On-Chip stress<br>characteristics | Lowest Current and<br>Voltage, longest<br>duration<br>I(HBM)=V(HBM)/1.5 kΩ         | Intermediate Current,<br>Voltage and duration                 | Highest Current,<br>Voltage and shortest<br>duration |
| Failure<br>Mechanisms             | Thermal failures:<br>MOSFET snapback,<br>Interconnect fusing,<br>gate oxide damage | Junction damage,<br>Interconnect fusing,<br>gate oxide damage | Gate oxide damage and interconnect damage            |

#### **ESD** Protection Circuits



- ESD voltages can exceed the breakdown voltage of any structure on the ASIC
- The current will flow through the path(s) of least resistance
- ESD protection provides a preferred discharge current path designed to carry high currents
- The discharge path must carry a high current while developing a low voltage drop or the current will be re-routed through sensitive circuits
- To be effective (low impedance) diodes have to have relatively large areas and thus relatively large capacitances:
  - Example of HBM protection diodes:
    - 2 x 50 µm long, 0.72 µm wide diodes to ground
    - 4 x 50 µm long, 0.72 µm wide diodes to supply
    - Equivalent parasitic capacitance: ~400 fF

#### Why Does it Matter?



- For multi Gb/s systems the wave propagation nature of electrical signals needs to be considered!
- Waves travel along transmission lines and are partially reflected at points where the characteristic impedance changes:
  - Transmission lines are intrinsically bidirectional
  - The important quantity is power (and not voltage and current independently)!
- Waves are fully absorbed (no reflection) at the termination if the termination has the same impedance as the line!
- Otherwise, waves are partially reflected:
  - $\Phi = 0^{\circ}$  if  $R_{\text{TERM}} > Z_0$
  - $\Phi$  = 180° if R<sub>TERM</sub> < Z<sub>0</sub>

#### Why Does it Matter?

- For purposes of signal propagation, the ESD circuit can be seen as a capacitor shorting the termination impedance
- The transmission line termination is thus frequency dependent and can't match the line impedance at all frequencies
- The incoming wave will be partially reflected over a range of frequencies
- The ratio between the reflected and incident wave amplitudes is the reflection coefficient:

$$\Gamma = \frac{V_r}{V_i} \tag{1}$$

- The reflection coefficient is related to the impedances of the line and of the termination:

$$\Gamma = \frac{Z_L - Z_0}{Z_L + Z_0} \tag{2}$$

- Or to the return loss (the ratio between the incident and reflected powers):

$$RL(dB) = -20\log|\Gamma| = 10\log\frac{P_i}{P_r} \tag{3}$$

- The higher the return loss the better the load is matched to the source!
  - The name choice was unfortunate!

#### Why Does it Matter?

- Example:
  - Signal frequency: 5 GHz
  - 50  $\Omega$  transmission line
  - Terminated by a 50  $\Omega$  resistor "shorted" by 1 pF parasitic capacitance
  - Capacitor impedance at 5 GHz:
    - 31.8 Ω
  - Termination impedance at 5 GHz (parallel of R and C):
    - 19.4 Ω
  - Reflection coefficient at 5 GHz:
    - −0.44



(we are ignoring here that these are all complex numbers)

- The reflected wave has 44% of the amplitude of the incident wave and a 180° phase shift!
- Return loss at 5 GHz:
  - 7.13 dB
  - 19.3% of the incident power is reflected back towards the source.
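
The numbers of this example can be reproduced in a few lines. The following Python sketch is an illustration added here, not part of the original slides, and all function and variable names are made up for the purpose. It evaluates the example twice: once with the magnitude-only shortcut used above, and once treating the capacitor impedance as the complex quantity $1/(j\omega C)$.

```python
import math

def reflection(z_load, z0=50.0):
    """Reflection coefficient of a load z_load on a line of impedance z0."""
    return (z_load - z0) / (z_load + z0)

def return_loss_db(gamma):
    """Return loss in dB for a real or complex reflection coefficient."""
    return -20.0 * math.log10(abs(gamma))

f = 5e9          # signal frequency (Hz)
r_term = 50.0    # termination resistor (ohm)
c_par = 1e-12    # parasitic capacitance (F)
omega = 2 * math.pi * f

# Magnitude-only shortcut: treat |Z_C| as if it were a resistance.
zc_mag = 1.0 / (omega * c_par)
z_scalar = r_term * zc_mag / (r_term + zc_mag)
gamma_scalar = reflection(z_scalar)

# Complex treatment: Z_C = 1/(j*omega*C) in parallel with R.
zc = 1.0 / (1j * omega * c_par)
z_complex = r_term * zc / (r_term + zc)
gamma_complex = reflection(z_complex)

print(f"|Z_C| = {zc_mag:.1f} ohm")
print(f"shortcut: Z = {z_scalar:.1f} ohm, Gamma = {gamma_scalar:.2f}, "
      f"RL = {return_loss_db(gamma_scalar):.2f} dB")
print(f"complex:  |Gamma| = {abs(gamma_complex):.2f}, "
      f"RL = {return_loss_db(gamma_complex):.2f} dB")
```

With the complex impedance the magnitude of the reflection coefficient comes out larger (about 0.62, i.e. roughly 4.2 dB of return loss), so the shortcut is on the optimistic side.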



#### A Real Example



- The on-chip termination resistor is in parallel with:
  - A relatively large capacitance originating from the ESD devices
  - The input stage capacitance
- The ESD capacitance becomes a short at high frequencies
- The return loss does not fulfil the specification (> 16 dB)
- Add a matching network to improve the return loss
- Two methods will be discussed:
  - 1. Coupled T-Coils
  - 2. "Inductive compensation"

# ESD PROTECTION CIRCUITS

## Input Matching: Coupled T-Coils

#### Load C<sub>ESD</sub> "Compensation"

S. Galal and B. Razavi - JSSC vol. 38, no. 12, p. 2334, Dec. 2003



Paulo.Moreira@cern.ch

### The T-coil Concept



- At very low frequencies:
  - $Z_{in} = R_T$  because the inductors act as a short and the capacitors are "open"
- At very high frequencies:
  - $Z_{in} = R_T$  because  $C_B$  acts as a short and the inductors are "open"
- Z<sub>in</sub> = R<sub>T</sub> can be guaranteed for all intermediate frequencies by properly choosing L<sub>1</sub>, L<sub>2</sub>, k and C<sub>B</sub>.
- In this case, C<sub>in</sub> (including the ESD capacitance) never influences the overall input impedance so that the return loss ideally is infinite.

#### T-Coil Input Impedance (1/2)



#### T-Coil Input Impedance (2/2)



$$Z_{in} = R_T \Leftrightarrow \begin{cases} L_1 = L_2 \\ L_1 = \frac{R_T^2 C_{in}}{2(1+k)} & \rightarrow \text{ choose } L_1 \text{ or } k \\ C_B = \frac{C_{in}}{4} \frac{1-k^2}{(1+k)^2} \end{cases}$$
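
As a quick numerical check of these design equations, the Python sketch below (an added illustration; the function name `tcoil_design` is hypothetical) evaluates them for the values quoted in the design-parameter table of the T-coil implementation: $R_T$ = 50 Ω, $C_{in}$ = 621 fF and k = 0.44. It assumes the dimensionally consistent form $L_1 = R_T^2 C_{in} / (2(1+k))$.

```python
def tcoil_design(r_t, c_in, k):
    """Return (L1, L2, C_B) giving Z_in = R_T for a bridged T-coil.

    Assumes L1 = L2 = R_T**2 * C_in / (2 * (1 + k)) and
    C_B = (C_in / 4) * (1 - k**2) / (1 + k)**2.
    """
    l1 = r_t ** 2 * c_in / (2.0 * (1.0 + k))
    c_b = (c_in / 4.0) * (1.0 - k ** 2) / (1.0 + k) ** 2
    return l1, l1, c_b

l1, l2, c_b = tcoil_design(r_t=50.0, c_in=621e-15, k=0.44)
print(f"L1 = L2 = {l1 * 1e12:.0f} pH, C_B = {c_b * 1e15:.1f} fF")
```

The results land within a few percent of the tabulated 545 pH and 63.2 fF; the residual difference is presumably due to rounding of the quoted coupling factor.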


#### **T-Coil Transfer Function**









- $\zeta = 0.7$ for maximum bandwidth
- $\zeta = \frac{\sqrt{3}}{2} \approx 0.866$ for uniform group delay

#### **T-Coil Implementation**

- Coupled inductors are implemented by means of 'symmetrical inductors' provided by the design kit
- Coupling factor is adjusted by choosing an appropriate spacing between the turns
- Modeling:
  - Design kit parameterized models
  - RLCK extraction over the inductor region
- Symmetrical inductor:
  - outer diameter: 173 μm
  - coil width: 8.5 µm
  - 3 turns
  - spacing: 5 µm
- Minimal HBM ESD protection is used to keep the input bandwidth high
- Input:
  - Differential 100  $\Omega$  terminated
  - Termination implemented by two 50 Ω resistors in series with the mid point to the common-mode voltage



| Design parameter | Value   |
|------------------|---------|
| R<sub>T</sub>    | 50 Ω    |
| C<sub>IN</sub>   | 621 fF  |
| L<sub>1</sub>    | 545 pH  |
| L<sub>2</sub>    | 545 pH  |
| k                | 0.44    |
| C<sub>B</sub>    | 63.2 fF |

#### Input Network Layout



#### Return Loss Calculation (1/2)

**Specification:** 

- RL > 16 dB
- 0 < f < 5 GHz

RL = 16 dB:

- 2.5% of the power is reflected
- Reflected wave has an amplitude which is 15.8% of the incident wave.
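
The conversion from a return loss figure to reflected amplitude and power fractions follows directly from the definition of the return loss. A minimal Python sketch (added for illustration; the function name is hypothetical):

```python
def reflected_fractions(rl_db):
    """Return (amplitude fraction, power fraction) reflected for a return loss in dB."""
    amplitude = 10 ** (-rl_db / 20.0)   # |Gamma|
    power = 10 ** (-rl_db / 10.0)       # |Gamma|**2
    return amplitude, power

amp, pwr = reflected_fractions(16.0)
print(f"RL = 16 dB: amplitude {amp * 100:.1f}%, power {pwr * 100:.1f}%")
```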





- The return loss specifications can be met with "insertion" of a T-coil.
- The actual value of the coupling factor is not really important as long as the other parameters are changed accordingly.

#### **Return Loss Simulation**

"Locally" RLCK extraction



#### Input Transfer Function Simulation



- The input bandwidth (of the transfer function from the source to the input gates) is lower with the T-coil: 7 GHz, versus 12 GHz without it
- However, for the 5 Gbit/s intended data rate, 7 GHz is more than enough.

#### Eye Diagrams with an Ideal Source



Ideal source and ideal T-line  $\rightarrow$  No reflection from the source  $\rightarrow$  No advantage in having a T-coil at the input

#### Eye Diagrams with Non-Ideal Source



100 $\Omega$ source and ideal T-line $\rightarrow$ significant reflection from the source $\rightarrow$ the T-coil reduces the effect of these reflections at the input gates

#### **Measurement Results**



#### **Measurement Results**

# Compared with a Commercial Device on the VTRx



# **ESD PROTECTION CIRCUITS**

## Input Matching: Inductive Compensation

### 130 nm Process "Flavours"



#### Inductor Model



- For the same inductance:
  - $C_{ox1}$ and $C_{ox2}$ are about 3 times larger in LM than in DM
  - Lower self-resonant frequency
- BUT:
  - High-resistivity substrate (about 1.5 $\Omega$·cm)
  - Large $R_{si1}/R_{si2}$ and small $C_{si1}/C_{si2}$
  - So, the self-resonant frequency is not much lower

### LM versus DM Inductors



- Higher DC resistance for the LM inductor:
  - Can be mitigated by reducing the load resistance slightly
- Comparable behavior in the range 1 GHz to 8 GHz
  - The region of interest
- Higher self-resonant frequency for DM inductors (above 10 GHz)

#### Input Stage Input Impedance Modeling

- Because of the inductive loading and the Miller effect, the input impedance of the input stage is not purely capacitive.
- Between DC and 10 GHz, it can be reliably modeled by a C-R-C network:



C-R-C model values: 35.8 fF, 25.6 fF and 1.51 kΩ.

#### Return Loss Improvement with a Single Inductor



| Parameter        | Value   | Description                                                              |
|------------------|---------|--------------------------------------------------------------------------|
| C<sub>p</sub>    | 343 fF  | Parasitic capacitance of the package                                     |
| L<sub>p</sub>    | 750 pH  | Bondwire inductance                                                      |
| C<sub>pad</sub>  | 75 fF   | Capacitance of the bondpad                                               |
| R<sub>t</sub>    | 50 Ω    | On-chip termination resistance                                           |
| C<sub>esd</sub>  | 378 fF  | Capacitance of the ESD devices                                           |
| C<sub>in1</sub>  | 35.8 fF | Ad-hoc model for the input impedance of the input stage of the modulator |
| C<sub>in2</sub>  | 25.6 fF | Same ad-hoc model                                                        |
| R<sub>in</sub>   | 1.51 kΩ | Same ad-hoc model                                                        |
| L<sub>c</sub>    | ?       | Compensation inductor to improve the matching                            |

### **Determining the Compensation Inductance**

Calculations



- By "tuning" the inductor the return loss can be optimized over the frequency range of interest
- The return loss specification is met with 0.6 nH  $\leq L_c \leq 1$  nH

#### Accounting for the Bondwire Inductance Spread



- The bondwire inductance "is also part" of the compensation network
- The value of the bondwire inductance is not well defined!
  - It can be obtained from the package model
  - But a large variation has to be assumed to account for the difference between the package cavity size and the chip size
  - Additionally, variations from the bonding process are expected
  - A range of 0.5 to 1.0 nH was assumed (nominal 0.75 nH)
- A 0.8 nH compensation inductance is OK for 'all' bondwire inductances considered
- But ...

#### Accounting for the Termination Resistor Spread



- Poly resistors (used for  $R_T$ ) have a 3 $\sigma$  spread of 15-20%
  - Return loss at DC can get as low as 19 dB
- In **extreme cases**, this resistor variation could push the return loss below 16 dB for frequencies above 5.5 GHz, thus violating the return loss specification
- In order to comply with the return loss specification, the compensation inductance has been decreased slightly to **0.62 nH** so that return loss at lower frequencies is traded for return loss at higher frequencies, giving a flatter overall behaviour.
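
The DC figure quoted above can be checked with the reflection coefficient formula applied to a purely resistive termination, since at DC the capacitors and inductors play no role. The Python sketch below is an added illustration (function name hypothetical) and assumes a 50 Ω line and a termination resistor that deviates from nominal by the quoted 15 to 20%.

```python
import math

def dc_return_loss_db(r_term, z0=50.0):
    """Return loss in dB of a purely resistive termination r_term on a z0 line."""
    gamma = (r_term - z0) / (r_term + z0)
    if gamma == 0:
        return math.inf          # perfect match: nothing is reflected
    return -20.0 * math.log10(abs(gamma))

for spread in (-0.20, -0.15, 0.15, 0.20):
    r = 50.0 * (1.0 + spread)
    print(f"R_T = {r:.1f} ohm ({spread:+.0%}): RL = {dc_return_loss_db(r):.1f} dB")
```

A termination 20% below nominal (40 Ω) gives about 19 dB, consistent with the worst case quoted above.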

#### Input Bandwidth



Input bandwidth always above 9 GHz (7 GHz in GBLD v4)

## Eye Diagrams

#### **RLCK** extracted



- Measured at the input stage inputs
- 50 Ω T-line of 10 cm
- 40  $\Omega$  source impedance

#### **Input Network Schematic**



#### Input Network Layout



# CONNECTORS, PCB & PACKAGE

#### Reaching the ASIC



#### "Build" Models



#### Fully Model the Signal Path



#### **Unpopulated Board**



Measurement (yellow) and simulation (blue) with 100 Ω termination at the BGA pads

#### Fully Populated Board



Probe is at the receiver termination, $\approx$ 3.5 mm from the input buffer. Measurement (yellow) and simulation (brown).

#### Checking the Models



#### Schrödinger In High Frequency Electronics!?



- Can signals be observed without being disturbed?
- Manufacturers provide equivalent electrical models for their oscilloscope probes
- Simulate to evaluate how much the system is being disturbed
- In our case the loading effect of the probe was small!

#### With Probe (gray)

#### No Probe (blue)

#### "Virtual Probe" - Simulation



#### Improving the Package



# LASER DRIVER DESIGN

#### LASER-Diodes in HEP



### GBLD – Giga Bit Laser Driver

#### Main specs:

- Bit rate 5 Gb/s (min)
- Modulation:
  - Current sink
  - Single-ended/differential
- Laser modulation current:
  - 2 to 12 mA
- Laser bias:
  - 2 to 43 mA
- "Equalization"
  - Pre-emphasis/de-emphasis
  - Independently programmable for rising/falling edges
- Input return loss
  - > 16 dB for f < 5 GHz
- Supply voltage: 2.5 V
- Die size: 2 mm × 2 mm
- I2C programming interface



#### **GBLD** Architecture


## Pre- De-Emphasis (1/5)

- To transmit NRZ data with little ISI the transmission channel must have a bandwidth of at least (rule of thumb):
  - BW =  $0.7 \times Bit Rate$
- For example, for transmission at 5 Gb/s the bandwidth required is:
  - BW = 0.7 × 5 Gb/s = 3.5 GHz
- For illustration purposes let's suppose that:
  - The laser driver is modelled by an ideal current source
  - And the circuit driven by the laser driver is modelled by an RC network with 3.5 GHz bandwidth:
    - R = 50 Ω
    - C = 0.91 pF
- Simulated eye-diagram:
  - There is very little ISI:
    - The eye is wide open vertically and horizontally
    - The jitter is very low
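
The component values follow from the single-pole relation $BW = 1/(2\pi RC)$. A small Python sketch (added for illustration; function names are hypothetical) shows the arithmetic and also what happens when the capacitance is made four times larger:

```python
import math

def rc_bandwidth(r, c):
    """-3 dB bandwidth (Hz) of a single-pole RC network."""
    return 1.0 / (2.0 * math.pi * r * c)

def capacitance_for_bandwidth(r, bw):
    """Capacitance (F) that gives bandwidth bw (Hz) with resistance r (ohm)."""
    return 1.0 / (2.0 * math.pi * r * bw)

bit_rate = 5e9                       # 5 Gb/s NRZ
bw_needed = 0.7 * bit_rate           # rule of thumb: 0.7 x bit rate
c = capacitance_for_bandwidth(50.0, bw_needed)
print(f"BW needed = {bw_needed / 1e9:.1f} GHz -> C = {c * 1e12:.2f} pF")

# Four times the capacitance gives a quarter of the bandwidth.
bw_slow = rc_bandwidth(50.0, 4 * c)
print(f"With 4 x C: BW = {bw_slow / 1e6:.0f} MHz")
```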





#### Pre- De-Emphasis (2/5)

- Let's suppose now that the bandwidth of the circuit being driven is four times lower:
  - BW = 3.5 GHz / 4 = 875 MHz
    - R = 50 Ω
    - C = 3.64 pF
- Simulated eye-diagram
  - There are significant amounts of ISI:
    - A "bit" extends for much longer than a bit period
  - The eye-diagram is almost closed vertically and horizontally
  - Jitter is high
  - The BER would be "prohibitive" for such a system!





# Pre- De-Emphasis (3/5)

- The problem with the low bandwidth is that:
  - For fast successions of "0s" and "1s" the signal has no time to reach the final value before a new "0" or a new "1" is transmitted.
- In an RC type of circuit this can be "easily" overcome:
  - At every "0"-to-"1" or "1"-to-"0" transition drive the circuit to a higher (or lower) voltage than the final one:
    - In our case this is accomplished by using a larger current than the final one
  - Once the voltage reaches the desired amplitude switch to the final current.
- In practice, no level crossing detection is made:
  - Immediately after a transition and for a "short" time a current pulse is added to the "normal" modulation current
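
The effect of the added current pulse can be illustrated with a very small time-domain simulation. The Python sketch below is an added illustration, not the GBLD implementation: it integrates the RC network of the low-bandwidth example (R = 50 Ω, C = 3.64 pF) with a forward-Euler step and compares the voltage reached at the end of one 200 ps bit period with and without emphasis. The 1 mA modulation current, the 50% boost and the 100 ps pulse width are arbitrary assumptions chosen only to show the trend.

```python
def simulate_rc(current, r, c, t_end, dt=1e-12):
    """Forward-Euler solution of C*dv/dt = i(t) - v/R, starting from v = 0."""
    v = 0.0
    for n in range(int(round(t_end / dt))):
        t = n * dt
        v += (current(t) - v / r) * dt / c
    return v

R = 50.0          # load resistance (ohm)
C = 3.64e-12      # load capacitance (F)
T_BIT = 200e-12   # bit period at 5 Gb/s (s)
I_MOD = 1e-3      # final modulation current (A), arbitrary value

def plain(t):
    """Modulation current switched on at t = 0, no emphasis."""
    return I_MOD

def emphasized(t, boost=0.5, width=100e-12):
    """Same step, with an extra current pulse right after the transition."""
    return I_MOD * (1.0 + boost) if t < width else I_MOD

v_final = I_MOD * R
v_plain = simulate_rc(plain, R, C, T_BIT)
v_emph = simulate_rc(emphasized, R, C, T_BIT)
print(f"after one bit, no emphasis:   {v_plain / v_final:.0%} of final value")
print(f"after one bit, with emphasis: {v_emph / v_final:.0%} of final value")
```

With these assumed values the emphasized drive gets noticeably closer to the final level within one bit period, which is the mechanism that reopens the eye.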



#### Pre- De-Emphasis (4/5)




#### Pre- De-Emphasis (5/5)



# **Delay Line**



#### AND Gate



out+ > out− only if (inA+ > inA−) and (inB+ > inB−)


#### **Pre-Driver & Driver**



**Emphasis / De-Emphasis Drivers** 



Avoid Parasitic Capacitances in the Signal Path



# PIN – RECEIVER DESIGN

#### **PIN-Diodes In HEP**



# GBTIA – Giga Bit Transimpedance Amplifier

#### Main specs:

- Bit rate:
  - 5 Gb/s (min)
- Sensitivity:
  - 20 μA P-P (10<sup>-12</sup> BER)
- Total jitter:
  - < 40 ps P-P
- Input overload:
  - 1.6 mA (max)
- Dark current:
  - 0 to 1 mA
- Supply voltage:
  - 2.5 V
- Power consumption:
  - 250 mW
- Die size:
  - 0.75 mm  $\times$  1.25 mm



# **GBTIA** Design Overview

- Integrating the TIA with the LA is potentially risky:
  - Single ASIC
  - Crosstalk noise through the supply and substrate
- Fully differential architecture:
  - Good tolerance to the supply noise
  - Input noise larger than for single ended
  - The power consumption is higher
- The photodiode is AC coupled to the TIA
  - Dark current is diverted away from the TIA
  - Low value of the low cut-off frequency
  - Parasitics of the integrated coupling capacitances limit the bandwidth
- Transimpedance amplifier (TIA)
  - Wide bandwidth
  - Moderate noise
  - Stability
- Limiting Amplifier and output buffer (LA)
  - High gain
  - Wide bandwidth
  - Offset compensation
- Additional features :
  - Internal voltage regulator (with enable/disable control)
  - Average power indicator

#### **GBTIA** Architecture



#### **Transimpedance Amplifier**

- Shunt feedback amplifier is widely used for high speed receiver designs
- To increase the bandwidth :
  - Decrease the feedback resistor
  - Increase the amplifier open loop gain
  - Decrease the input node capacitance
- To minimize the thermal noise :
  - Increase the feedback resistor
  - Decrease the input node capacitance
  - Increase the amplifier transconductance



$$Z_T = \frac{v_{out}}{i_{in}} = \frac{-A_V}{A_V + 1} \frac{R_F}{1 + j\omega C_T \frac{R_F}{1 + A_V}}$$
$$BW = \frac{1 + |A_V|}{2\pi R_F C_T}$$

$$i_{n,in}^{2} = \frac{4kT}{R_{F}} + \frac{4kT\Gamma}{R_{F}^{2}} \left(\frac{1}{g_{m}} + \frac{(2\pi f \cdot R_{F}C_{T})^{2}}{g_{m}}\right)$$
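
The bandwidth expression can be turned around to estimate the open-loop gain needed for a given bandwidth. The Python sketch below is an added illustration with hypothetical function names. It borrows $R_F$ = 380 Ω and the 3.5 GHz target from the GBTIA input stage, and it assumes, purely for the sake of the example, that the total input-node capacitance $C_T$ equals the 700 fF quoted there.

```python
import math

def tia_bandwidth(a_v, r_f, c_t):
    """Closed-loop bandwidth (Hz) of a shunt-feedback TIA: (1 + |A_V|) / (2*pi*R_F*C_T)."""
    return (1.0 + abs(a_v)) / (2.0 * math.pi * r_f * c_t)

def gain_for_bandwidth(bw, r_f, c_t):
    """Open-loop gain magnitude needed to reach bandwidth bw (Hz)."""
    return 2.0 * math.pi * r_f * c_t * bw - 1.0

r_f = 380.0       # feedback resistor (ohm)
c_t = 700e-15     # assumed total input-node capacitance (F)
a_needed = gain_for_bandwidth(3.5e9, r_f, c_t)
print(f"|A_V| needed for 3.5 GHz: {a_needed:.2f}")
print(f"check: BW = {tia_bandwidth(a_needed, r_f, c_t) / 1e9:.2f} GHz")
```

The trade-off listed above is visible in the formula: raising $R_F$ to lower the noise requires a proportionally higher open-loop gain to keep the same bandwidth.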

#### Bandwidth Enhancement by Inductive Peaking

- In order to maintain a low level of noise while keeping a large bandwidth, shunt peaking is used to boost the open loop gain at high frequencies
- Shunt peaking
  - Inductance in series with the load resistance
  - Enhances the bandwidth
  - The frequency response is characterized by the ratio "m"

$$L = m \cdot R^2 \cdot C$$

| Factor m | Normalized f<sub>3dB</sub> | Response            |
|----------|----------------------------|---------------------|
| 0        | 1.00                       | No shunt peaking    |
| 0.32     | 1.60                       | Optimum group delay |
| 0.41     | 1.72                       | Maximally flat      |
| 0.71     | 1.85                       | Maximum bandwidth   |
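
The normalized bandwidths in the table can be recomputed from a simple model. The Python sketch below is an added illustration (function name hypothetical). It assumes the load is a resistor R in series with L, in parallel with a capacitor C, so that with $x = \omega RC$ and $L = mR^2C$ the squared magnitude is $|Z/R|^2 = (1 + m^2x^2) / ((1 - mx^2)^2 + x^2)$. Setting this to 1/2 gives a quadratic in $x^2$ that is solved in closed form.

```python
import math

def shunt_peaking_f3db(m):
    """-3 dB frequency, normalized to 1/(2*pi*R*C), for L = m * R**2 * C."""
    if m == 0:
        return 1.0                      # plain RC load
    # |Z/R|^2 = 1/2  ->  m^2*y^2 + (1 - 2m - 2m^2)*y - 1 = 0, with y = x^2
    b = 1.0 - 2.0 * m - 2.0 * m * m
    y = (-b + math.sqrt(b * b + 4.0 * m * m)) / (2.0 * m * m)
    return math.sqrt(y)

for m in (0.0, 0.32, 0.41, 0.71):
    print(f"m = {m:.2f}: normalized f3dB = {shunt_peaking_f3db(m):.2f}")
```

Under this model the computed values come out close to the tabulated ones.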





# **GBTIA Input Stage**

- Differential
- Inductive peaking
  - The target bandwidth of 3.5 GHz is achieved for the worst case PVT (simulations including parasitics)
  - High transimpedance gain ( $R_F$ =380  $\Omega$ )
  - Low level of input referred noise
- Cascode
  - Reduces the Miller effect
- Current density is optimized
  - High current density needed to achieve high cut off frequency for the input transistor
  - Input transistor size optimized for an input capacitance of 700 fF
- 2 V supply required



# Limiting Amplifier Requirements



- Considering the sensitivity and the gain of the TIA preamplifier:
  - In the sensitivity limit the TIA output voltage is 12 mV
  - The next ASIC requires a 400 mV pp input
  - The LA must provide a gain of 40 dB (28 dB in worst-case scenario)
  - The minimum overall bandwidth is 3.5 GHz
  - The noise contribution must be negligible:
    - The input referred noise must be lower than 850 µV RMS (12 mV/14) for a BER of 10<sup>-12</sup>
- The input capacitance of the LA must be sufficiently low so that it does not reduce the TIA bandwidth
- The number of stages is set to 5 (4 LA + a buffer)
- Offset cancellation is incorporated in the LA block to prevent the mismatch in the differential amplifiers from saturating the amplifier and masking the small input signals
- In order to maintain a wide bandwidth while delivering a large current to the load, the amplifier stages in the LA are designed to have increasingly larger size and current:
  - Minimize the load capacitance seen by the previous stage
  - Allow bandwidth extension
- The gain of the first stage (LA1) is higher than the following stages to reduce its noise contribution.
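
The factor of 14 in the noise budget is twice the Q factor needed for the target BER. The Python sketch below is an added illustration (function names hypothetical) and assumes the usual Gaussian-noise model, $BER = \tfrac{1}{2}\,\mathrm{erfc}(Q/\sqrt{2})$ with $Q = V_{pp} / (2\sigma)$. It solves for Q by bisection and derives the RMS noise limit for a 12 mV peak-to-peak signal.

```python
import math

def ber_from_q(q):
    """Bit error rate for a given Q factor, assuming Gaussian noise."""
    return 0.5 * math.erfc(q / math.sqrt(2.0))

def q_for_ber(target, lo=0.0, hi=10.0):
    """Q factor giving the target BER, found by bisection (BER falls as Q rises)."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if ber_from_q(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

q = q_for_ber(1e-12)
v_pp = 12e-3                      # TIA output at the sensitivity limit (V)
noise_rms_max = v_pp / (2.0 * q)  # allowed RMS noise referred to the LA input (V)
print(f"Q = {q:.2f}, allowed noise = {noise_rms_max * 1e6:.0f} uV RMS")
```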

# Limiting Amplifier Stage

High bandwidth topology for each stage

- Cherry and Hooper structure
- A g<sub>m</sub> stage followed by a shunt-feedback stage
- Second stage uses active "inductors"
- By active inductive peaking, the bandwidth is increased by 34% over a resistively loaded topology.



# Pin-Diode Biasing (1/2)

- TID effects result in large DC leakage currents across the PDs:
  - Can be as high as 1 mA!
- A large leakage current causes the PD bias voltage to decrease due to voltage drops in the biasing circuit!
- For high sensitivity and high bandwidth operation it is necessary to keep the reverse biasing voltage across the PD above a minimum:
  - 0.7 V for the devices used
- High receiver sensitivity also requires the impedance of the biasing circuit to be high, so that the signal current flows through the TIA and not the bias circuit
- The last two requirements are incompatible if simple bias resistors are used!
- To overcome the problem two "adaptive" current sources were used to bias the PD
- In the presence of relatively high leakage currents the circuit:
  - Maintains a "high" bias voltage
  - Represents a "high" impedance to the signal



# Pin-Diode Biasing (2/2)

- The bias circuit is capable of maintaining a "high" bias voltage:
  - across a large range of leakage currents (up to 1 mA)
- While being high impedance
- And having a low cut-off frequency:
  - Below 3 MHz in the worst case.
- A low cut-off frequency is required to avoid DC wander, which would close the eye-diagram at the input of the receiver and degrade the receiver sensitivity.
- For the codes used (8B/10B and 21-bit linear shift register scrambling) there is no penalty at 5 Gb/s:
  - the low cut-off frequency of 3 MHz is a good compromise




#### **GBTIA**



## Measured Eye-Diagram

- Photodiode:
  - Responsivity: 0.9 A/W @ $\lambda$ = 1310 nm
  - Active area diameter: 60 μm
  - Illuminated on the top
  - Typical equivalent capacitance of 240 fF.
- The chip is wire-bonded to the photodiode:
  - To minimize the wire bond inductance and the input parasitic capacitance, the connection between the TIA and the photodiode is made very short and does not exceed 200 µm
- The eye diagram was measured at 5 Gb/s using a PRBS sequence of length 2<sup>7</sup>-1
- For a –6 dBm input, the rise time is 30 ps and the total jitter is below 0.15 UI (30 ps) for a Bit Error Rate (BER) of 10<sup>-12</sup>
- For –18 dBm input, the jitter is less than 0.55 UI (110 ps) and the rise time is 60 ps
- The output amplitude is virtually independent of the input power:
  - 800 mV differential
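
For orientation, the quoted optical powers and jitter figures can be converted to photocurrents and times. The Python sketch below is an added illustration with hypothetical function names. It assumes the dBm figures are average optical powers and uses the 0.9 A/W responsivity and the 5 Gb/s bit rate given above.

```python
def dbm_to_watts(p_dbm):
    """Convert a power in dBm to watts."""
    return 1e-3 * 10 ** (p_dbm / 10.0)

def photocurrent(p_dbm, responsivity=0.9):
    """Average photocurrent (A) for an optical power in dBm and a responsivity in A/W."""
    return responsivity * dbm_to_watts(p_dbm)

def ui_to_seconds(ui, bit_rate=5e9):
    """Convert a time expressed in unit intervals to seconds."""
    return ui / bit_rate

for p in (-6.0, -18.0):
    print(f"{p:+.0f} dBm -> {photocurrent(p) * 1e6:.1f} uA")
for ui in (0.15, 0.55):
    print(f"{ui:.2f} UI -> {ui_to_seconds(ui) * 1e12:.0f} ps")
```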





#### Influence of the optical DC level on the BER



- The PD dark current can increase to 1 mA after 200 Mrad TID irradiation.
- To measure the influence of the dark current, the PD was additionally illuminated by a "DC" laser source.
- The integrated biasing circuit ensures a sufficient voltage across the PD.
- No noticeable degradation of the BER resulting from a higher low cut-off frequency was observed.
- However, the sensitivity degrades due to shot noise generated by the PD DC current.
  - The power penalty introduced by the shot noise of the DC level is ≈4 dB as expected

#### **BER versus Total Dose**



- Only the optical receiver chip (without the PD) was irradiated
- The PD was replaced by a passive network with input capacitance 500 fF.
- The chip was irradiated to a dose of up to 200 Mrad
- Only a marginal variation of the BER is observed

#### **Eye-Diagram versus Total Dose**



### **GBTIA: SEUs**

- The Bit Error Rate was measured to be independent of the bit rate
- Test flux:
  - 2 × 10<sup>8</sup> p/cm<sup>2</sup>/s
- Error rates < 10<sup>-9</sup> can't be achieved even at high input optical power
- Forward error correction is thus mandatory

- SEU induced error "bursts" in the TIA
  - The large majority are single- or double-bit errors
  - Longest burst observed: 5 bits (SEU dominated region)





# GBTIA vs Commercial PIN-Receiver (1/2)

Same PIN type in the three cases



### GBTIA vs Commercial PIN-Receiver (2/2)



# **ASSEMBLING ASICS**

## The GBTX in Numbers

- <sup>1</sup>/<sub>2</sub> million gates
- Approximately:
  - 300 8-bit programmable registers (all TMR)
  - 300 8-bit e-Fuse memory
- Clock tree (chip wide):
  - 9 clock trees (all TMR)
  - Frequencies: 40/80/160/320 MHz
- 7 PLLs:
  - RX: CDR PLL + Reference PLL (2.4 GHz)
  - Serializer PLL (4.8 GHz)
  - Phase-Shifter PLL (1.28 GHz)
  - xPLL (VCXO based PLL, 80 MHz)
  - (2x) ePLL (320 MHz)
- 17 master DLLs:
  - 8 for phase alignment of the e-links
  - 8 for clock de-skewing
- 40 replica delay lines:
  - For phase alignment of the e-links
- 7 power domains:
  - Serializer (1.5V)
  - DESerializer (1.5V)
  - Clock Manager (1.5V)
  - Phase shifter (1.5V)
  - Core digital (1.5V)
  - I/O (1.5V)
  - Fuses (3.3V)





### **GBTX** – Floor Plan and Modelling



#### DESerializer



# "ANALOGUE-VERILOG"

# PLL Modeling with (Plain) Verilog


### Phase Detector and VCO

**Phase Frequency Detector**

<pre>
`timescale 1 fs / 1 fs
module ThreeStatePD (down, up, r, v);
output  down,           // Early signal
        up;             // Late signal
input   r,              // Reference input
        v;              // VCO input
wire    r, v, reset;
reg     up, down;

initial begin
    up = 0;
    down = 0;
end

always @ (posedge r)
    up <= #1 1'b1;

always @ (posedge v)
    down <= #1 1'b1;

always @ (posedge reset)
begin
    up <= #1 1'b0;
    down <= #1 1'b0;
end

assign #1               // the rest of this statement is missing in the source
endmodule
</pre>

**VCO**

<pre>
`timescale 1 fs / 1 fs
module VCO( delay_control, vco_output);
input[31:0] delay_control;
output vco_output;
reg reset;

initial begin
    reset=1;
    #10000;
    reset=0;
end

// The cell type name preceding this instance list is missing in the source
delay1(.in(~vco_output & ~reset),.delay_control(delay_control),.out(tap1)),
delay2(.in(tap1),.delay_control(delay_control),.out(tap2)),
delay3(.in(tap2),.delay_control(delay_control),.out(tap3)),
delay4(.in(tap3),.delay_control(delay_control),.out(vco_output));
endmodule
</pre>

**VCO delay cell**

<pre>
`timescale 1 fs / 1 fs
// The module declaration line of the delay cell is missing in the source
input        in;
input[31:0]  delay_control;
output       out;
reg          out;

initial
    out = 1'b0;

always @(in)
    out <= #( delay_control/8 ) in; // delay = delay_control/(2*number_of_delay_cells);
endmodule
</pre>


#### Loop Filter Simulation: R-C filter

- Simulation step:  $t_n = t_{n-1} + \Delta t$
- The current I(t):
  - It is "imposed" on the circuit (it is the independent variable)
  - It is controlled by the phase-detector output
- For accuracy the time advances in small increments  $\Delta t$ :
  - Capacitor voltages change very little during a simulation step
  - The time integral of a function in the interval  $t_{n-1}$  to  $t_n$  can be approximated by:

$$\int_{t_{n-1}}^{t_n} f(t)\, dt \approx f(t_{n-1}) \cdot \Delta t \tag{1}$$

For the simple R-C filter:

$$V_{cap}(t_n) = V_{cap}(t_{n-1}) + \frac{1}{C} \int_{t_{n-1}}^{t_n} I(t)\, dt \tag{2}$$

$$V_{control}(t_n) = R \cdot I(t_n) + V_{cap}(t_n) \tag{3}$$

Simulation equations:

$$V_{cap}(t_n) \cong V_{cap}(t_{n-1}) + \frac{1}{C} \cdot I(t_{n-1}) \cdot \Delta t \tag{4}$$

$$V_{control}(t_n) = R \cdot I(t_n) + V_{cap}(t_n) \tag{5}$$



Notice that the time increments don't need to be constant (they have to be small). In that case, replace in the equation  $\Delta t$  by:  $\Delta t_n = t_n - t_{n-1}$ 

Typically:

$$\frac{T_{vco}}{100} \le \Delta t \le \frac{T_{vco}}{10}$$
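
The two simulation equations translate almost literally into code. The Python sketch below is an added illustration, independent of the Verilog model used in the slides; the function name and the component values (R = 1 kΩ, C = 10 pF, 100 µA current pulse, 10 ps time step) are arbitrary assumptions.

```python
def rc_loop_filter(currents, dt, r, c, v_cap0=0.0):
    """Step the series R-C loop filter through the current samples I(t_0), I(t_1), ...

    Returns the control voltages V_control(t_n) for n = 1, 2, ...
    """
    v_cap = v_cap0
    v_control = []
    for n in range(1, len(currents)):
        # Capacitor update: forward-Euler integral of the previous current sample.
        v_cap += currents[n - 1] * dt / c
        # Control voltage: resistive (proportional) drop plus capacitor voltage.
        v_control.append(r * currents[n] + v_cap)
    return v_control

R = 1e3        # filter resistor (ohm), assumed
C = 10e-12     # filter capacitor (F), assumed
DT = 10e-12    # simulation time step (s), assumed
I_CP = 100e-6  # charge-pump current while the pulse is active (A), assumed

# A current pulse lasting 11 samples, followed by 5 samples with no current.
currents = [I_CP] * 11 + [0.0] * 5
v = rc_loop_filter(currents, DT, R, C)
print(f"during the pulse: {v[0] * 1e3:.2f} mV, after the pulse: {v[-1] * 1e3:.2f} mV")
```

During the pulse the control voltage is dominated by the proportional term $R \cdot I$; once the current stops only the integrated capacitor voltage remains.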


# Charge Pump & Loop Filter

Charge-Pump (includes the loop filter)

<pre>
`timescale 1 fs / 1 fs
module ChargePump(up, down, delay_control);

`define DeltaVProportional (`Icp * `Rfilt)/1.0E-3           // PLL proportional term (in mV)
`define DeltaVIntegral     (`Tref * `Icp / `Cfilt)/1.0E-3   // PLL integral term (in mV)

// Variables used in 'analogue' computations declared as real
real    dv_capacitor,       // Differential capacitor voltage (in mV)
        control_voltage,    // Integral plus proportional control voltage (in mV)
        frequency,          // VCO frequency (in GHz)
        period,             // VCO period (in ns)
        integral_term,      // loop control integral term (in mV)
        direct_term;        // loop control proportional term (in mV)

initial begin
    integral_term = `DeltaVIntegral/`IntegrationPoints;
    direct_term = `DeltaVProportional;
    integral_evaluation_time = 25000000/`IntegrationPoints;
end
...
</pre>

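The two macros in the listing above reduce to simple arithmetic, which the following Python sketch reproduces. The function name and all component values are hypothetical and chosen only for illustration; they are not taken from the original design.

```python
def charge_pump_steps_mv(icp, rfilt, cfilt, tref):
    """Return (proportional, integral) control-voltage steps in mV.

    Mirrors the DeltaVProportional and DeltaVIntegral macros:
    proportional = Icp * Rfilt, integral = Tref * Icp / Cfilt.
    """
    proportional_mv = (icp * rfilt) / 1.0e-3
    integral_mv = (tref * icp / cfilt) / 1.0e-3
    return proportional_mv, integral_mv


# Hypothetical values (assumptions, not from the slides)
icp = 10e-6      # charge-pump current: 10 uA
rfilt = 1e3      # loop-filter resistor: 1 kOhm
cfilt = 100e-12  # loop-filter capacitor: 100 pF
tref = 25e-9     # reference clock period: 25 ns

prop_mv, int_mv = charge_pump_steps_mv(icp, rfilt, cfilt, tref)
print(f"proportional step: {prop_mv:.3f} mV, integral step: {int_mv:.3f} mV")
```

With these assumed values the proportional step is larger than the integral step accumulated over one reference period.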


#### Simulation Example



# HANDLING SEUs

## SEUs on Analogue

- In principle, functional interrupts in analogue circuits, provided they are self-recovering, have little overall consequence for a front-end ASIC or a data acquisition system:
  - What do they mean, for example, for an analogue signal representing the charge deposition of a particle?
  - What are the consequences for the overall system of an occasional wrong reading?
- However, some SEUs in critical analogue circuits can be important:
  - For example, if a (global) bias circuit is disturbed and returning to the quiescent point takes a significant amount of time, the result may be a functional interrupt that lasts long enough to affect the overall system performance!

- There is no universal answer:
  - 1<sup>st</sup> Estimate how much charge radiation will typically deposit
  - 2<sup>nd</sup> Evaluate (realistically) the consequences for the full system (not just the circuit in question)
  - 3<sup>rd</sup> If it is important to make the circuit SEU robust:
    - Choose an appropriate solution
    - Evaluate the consequences for the system and the circuit
- Two generic approaches are often used in critical analogue circuits:
  - For slow circuits such as bias circuits, "artificially" increase the capacitance of the sensitive nodes:
    - This reduces the voltage disturbance due to the injected charge
    - Recovery might be slow, so it is important that the voltage disturbance is small
  - For fast circuits, "artificially" increase the bias currents (and thus the transistor sizes) so that the injected charge represents a small fraction of the charge already "stored" in the circuit:
    - This results in additional power consumption

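The capacitance argument for slow circuits follows from ΔV = Q/C. The sketch below is a minimal illustration: the 300 fC charge is the value used in the VCO example of these slides, while the node capacitances and the function name are assumptions.

```python
def voltage_disturbance(q_injected, c_node):
    """Voltage step (V) when q_injected (C) lands on a node of capacitance c_node (F)."""
    return q_injected / c_node


q_seu = 300e-15  # 300 fC, as in the VCO example of these slides

# Hypothetical node capacitances: 1 pF, 10 pF, 100 pF
for c_node in (1e-12, 10e-12, 100e-12):
    dv = voltage_disturbance(q_seu, c_node)
    print(f"C = {c_node * 1e12:6.1f} pF -> dV = {dv * 1e3:6.1f} mV")
```

Each tenfold increase in node capacitance reduces the disturbance tenfold, at the cost of area and a slower node.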
# Example: SEUs in VCOs

- In data transmission systems, SEUs in VCOs are critical:
  - After recovery, resynchronization is required at the transmitter and/or receiver, easily leading to system dead times of the order of milliseconds!
- Depending on the injected charge, an SEU can:
  - Induce a phase jump
  - Stop the VCO oscillation for a few cycles!
- Example:
  - Technology: 130 nm CMOS
  - Oscillation frequency: 5 GHz
  - Differential delay-cell ring oscillator
  - SEUs modelled by current pulses injecting 300 fC in 10 ps
- Design criteria:
  - The VCO tail current is set so that an SEU causes a phase jump smaller than 10% of the bit period
  - To the PLL this "looks" like a bit of excess jitter and can be easily handled
- Penalty:
  - High power consumption

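A rough feel for the power penalty can be obtained from a first-order estimate. The sketch below assumes that a delay-cell node slews at I/C, so an injected charge Q shifts an edge by roughly Δt ≈ Q/I. This model, the function name, and the assumption of one bit per VCO cycle are illustrative only; they are not the design procedure used for the VCO in the slides.

```python
def min_tail_current(q_injected, f_osc, max_phase_fraction, bits_per_cycle=1):
    """First-order tail current (A) that keeps an SEU-induced edge shift
    below max_phase_fraction of the bit period, using dt ~ Q / I."""
    bit_period = 1.0 / (f_osc * bits_per_cycle)
    max_time_shift = max_phase_fraction * bit_period
    return q_injected / max_time_shift


# Numbers from the example: 300 fC, 5 GHz, 10% of the bit period.
# One bit per VCO cycle is an assumption.
i_tail = min_tail_current(q_injected=300e-15, f_osc=5e9, max_phase_fraction=0.10)
print(f"estimated minimum tail current: {i_tail * 1e3:.1f} mA per delay cell")
```

Under these assumptions the estimate lands in the tens-of-milliamps range per cell, which is consistent with the high power consumption listed as the penalty.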




# SEUs in Digital Circuits

- Over the years, an extensive set of techniques offering different levels of robustness has been developed to protect digital circuits against SEUs:
  - One-hot encoded state machine:
    - Monitored by a "watchdog" circuit
  - DICE-cell flip-flops:
    - Depending on the protection level required, they might also need to be monitored
  - Hamming encoding
  - Triple Modular Redundancy (TMR):
    - Triple voted registers
    - Triple voted registers with triplicated combinatorial logic
    - Triple voted registers with triplicated combinatorial logic and triplicated clock tree
  - ...
  - Plus whatever your imagination can come up with...
- Depending on how critical SEU-induced errors are, systems might require protection of only the state machines and/or the data path
- As with analogue circuits, not all digital circuits need the same level of protection:
  - Evaluate which circuits need protection and to what level
  - In digital, SEU robustness is paid for in silicon area, power consumption, simulation and testing complexity!

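As an illustration of the Hamming encoding option listed above, the following sketch implements a standard Hamming(7,4) code, which corrects any single flipped bit in a 7-bit word. The function names and the bit ordering are choices made for this example, not something specified in the slides.

```python
def hamming74_encode(data):
    """Encode [d1, d2, d3, d4] as [p1, p2, d1, p3, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_correct(code):
    """Return (data bits, error position); position is 1..7, or 0 if no error."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s3  # the syndrome points at the flipped bit
    if position:
        c[position - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], position


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1  # emulate an SEU flipping the bit at position 5
data, position = hamming74_correct(word)
print(data, position)
```

The three parity bits cost 75% storage overhead for a 4-bit word, which is still cheaper than triplication but only protects against one upset per word.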
# Fast TMR Flip-Flops

- Clock dividers in high data rate SERDES circuits typically run at the highest possible speeds in a given technology:
  - They are based on fast flip-flops
- Dynamic flip-flops are intrinsically faster than static flip-flops enabling very high data rates
- However, they are even more sensitive to SEUs than static flip-flops
- It is then mandatory to use TMR techniques to protect the dynamic flip-flops from being upset:
  - E.g. a wrong count sequence can unlock a PLL resulting in a long dead time
- Triple voting adds delay to a circuit, reducing the operation speed
- However, in dynamic FFs it is possible to vote an internal node:
  - This results in a much smaller speed penalty than if the voting took place before or at the input stage of the flip-flop

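The voting principle itself can be sketched behaviourally. The model below is a generic triple-voted register with a bitwise 2-out-of-3 majority voter; it does not represent the internal-node voting of the dynamic flip-flop, which is a circuit-level technique. The class and method names are hypothetical.

```python
def majority(a, b, c):
    """Bitwise 2-out-of-3 majority vote."""
    return (a & b) | (b & c) | (a & c)


class TmrRegister:
    """Three copies of a register; every read returns the voted value."""

    def __init__(self, value=0):
        self.copies = [value, value, value]

    def read(self):
        return majority(*self.copies)

    def upset(self, copy, bit):
        """Emulate an SEU flipping one bit in one of the three copies."""
        self.copies[copy] ^= 1 << bit

    def refresh(self):
        """Rewrite all copies with the voted value so errors do not accumulate."""
        voted = self.read()
        self.copies = [voted, voted, voted]


counter = TmrRegister(0b101)
counter.upset(copy=1, bit=2)     # one copy now holds 0b001
value_after_upset = counter.read()
counter.refresh()
print(bin(value_after_upset), counter.copies)
```

A single upset is masked by the voter. The refresh step matters because a second upset in another copy, before the first one is repaired, would defeat the majority.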


| Type    | Unprotected | Single voter TMR | Triple voter TMR | Proposed |
|---------|-------------|------------------|------------------|----------|
| Static  | 1.0         | 0.9              | 0.8              | -        |
| Dynamic | 1.9         | 1.4              | 1.1              | 1.6      |

Only a small speed penalty: the proposed TMR dynamic FF is still 1.6× faster than a static FF

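The size of the penalty can be read off the table with a little arithmetic. The sketch below assumes the table entries are maximum operating speeds normalised to the unprotected static flip-flop; the variable and function names are made up for the example.

```python
# Relative speeds copied from the table above
static = {"unprotected": 1.0, "single_voter_tmr": 0.9, "triple_voter_tmr": 0.8}
dynamic = {"unprotected": 1.9, "single_voter_tmr": 1.4,
           "triple_voter_tmr": 1.1, "proposed": 1.6}


def relative_penalty(protected, reference):
    """Fraction of speed lost by the protected cell relative to the reference."""
    return 1.0 - protected / reference


penalty_proposed = relative_penalty(dynamic["proposed"], dynamic["unprotected"])
penalty_triple = relative_penalty(dynamic["triple_voter_tmr"], dynamic["unprotected"])
gain_vs_static = dynamic["proposed"] / static["unprotected"]

print(f"proposed: {penalty_proposed:.0%} slower than the unprotected dynamic FF")
print(f"triple voter TMR: {penalty_triple:.0%} slower than the unprotected dynamic FF")
print(f"proposed vs unprotected static FF: {gain_vs_static:.1f}x")
```

Under that reading, voting an internal node costs roughly 16% of the unprotected dynamic speed, against roughly 42% for conventional triple-voter TMR.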


Paulo.Moreira@cern.ch

### SEUs: Don't Forget the Clock Tree



# SEU Recovery in Clock Gated Registers


