# GEORG-AUGUST-UNIVERSITÄT GÖTTINGEN

# II. Physikalisches Institut

Implementation of an Electrical Read-Out System for Multi-Module Laboratory Tests of the ATLAS Pixel Detector

#### by

## Matthias George

The innermost component of the ATLAS inner tracker is the pixel detector. It is made up of three barrel layers and three disk layers. The innermost barrel layer is only about 5 cm away from the particle interaction point. This layer is therefore exposed to more radiation than any other detector layer and will not remain usable until the main upgrade of ATLAS after ten years of operation. A new pixel detector layer will be inserted after four years. This thesis aims to provide an electrical data read-out chain for testing the new pixel modules.



Post address: Friedrich-Hund-Platz 1, 37077 Göttingen, Germany

II. Physikalisches Institut, Georg-August-Universität Göttingen

II.Physik-UniGö-Dipl-2009/03

September 2009

This research report was accepted as a diploma thesis by the Faculty of Physics of the Georg-August-Universität Göttingen.

Accepted on: 1 September 2009

Referee: Prof. Dr. A. Quadt

Co-referee: PD Dr. Jörn Grosse-Knetter

# Contents

- 1. Introduction
- 2. Physics
  - 2.1. Introduction
  - 2.2. The Standard Model
  - 2.3. Supersymmetry
- 3. The ATLAS Pixel Detector
  - 3.1. Experimental Setup - The LHC
  - 3.2. The ATLAS Detector
    - 3.2.1. The Muon Spectrometer
    - 3.2.2. The Magnet System
    - 3.2.3. Calorimeters
    - 3.2.4. Inner Detector
  - 3.3. Pixel Detector – Overview and Introduction
  - 3.4. Pixel Modules
    - 3.4.1. Sensor
    - 3.4.2. Front End-Chip
    - 3.4.3. Bump-Bonding
    - 3.4.4. Module Control Chip
  - 3.5. Off Detector Read-Out
    - 3.5.1. Optoboard
    - 3.5.2. Back of Crate card
    - 3.5.3. Read Out Driver
  - 3.6. ATLAS Upgrade Plans – The Insertable B-Layer
- 4. Field Programmable Gate Arrays
  - 4.1. Introduction
  - 4.2. …
    - 4.2.1. History
    - 4.2.2. …
    - 4.2.3. The Altera Flex FPGA Family
  - 4.3. Very High Speed Integrated Circuit Hardware Description Language
    - 4.3.1. Introduction
    - 4.3.2. Language Characteristics
- 5. Detector Read Out for Multi-Module Testsystems
  - 5.1. …
  - 5.2. …
    - 5.2.1. …
    - 5.2.2. Functional Sections of the eBOC and "step 1"
    - 5.2.4. Operation at 80 MBit/s – "step 2"
  - 5.3. Measu…
    - 5.3.2. Trigger
- 6. Summary and Outlook
- Overview of FPGA behaviours on the eBOC and PP0-2 depending on the mode state
- Bibliography
- Acknowledgements

# 1. Introduction

Scientists have been on a quest for the fundamental structure of matter for many centuries. Goethe, for example, expressed the question in the famous words "So that I may perceive whatever holds the world together in its innermost folds". Since those days, knowledge has advanced in many large steps.

At the beginning of the 20th century, the sub-structure of atoms was discovered, and protons, neutrons and electrons were considered the smallest existing particles. Around the middle of the last century a large number of new particles were discovered in experiments at particle accelerators and in studies of cosmic radiation. These new particles were arranged in symmetry groups, which led to the conclusion that protons as well as neutrons consist of smaller particles, the quarks.

From today's point of view the elementary particles of our universe are quarks, leptons and bosons, the latter being responsible for mediating the forces between particles. All these particles and their interactions are described very successfully by the *standard model of particle physics*. This model makes a large number of predictions, which have been confirmed by several experiments. One last piece is missing to complete the standard model, the *Higgs particle*. The theory predicts that this particle emerges from self-interaction of the *Higgs field*, which is thought to be responsible for the mass generation of all particles. The search for the *Higgs particle* is one of the main tasks of modern particle physics.

The standard model has been quite successful so far, but many scientists are convinced that it needs extensions at higher energies. Among the proposed extensions are a supersymmetric partner for every particle and additional space dimensions.

To investigate known particles further and to discover new ones, protons or electrons or their anti-particles are accelerated to high energies and brought to collision. The most recent particle accelerator is the *Large Hadron Collider* (LHC), located at the European Organization for Nuclear Research, CERN. This accelerator is designed to collide protons at a centre-of-mass energy of 14 TeV and a luminosity of  $10^{34}$  cm<sup>-2</sup>s<sup>-1</sup>. Thus the LHC should open the door to an energy scale at which the *Higgs particle* can be generated and physics beyond the standard model might show up.

Tools to detect the generated particles are as important as the accelerator itself. The particle detector systems reconstruct the known particles generated in the proton collisions, so that hints of unknown particles can be found. The different detector systems are typically arranged around the interaction point in an onion-like structure. The innermost detectors are used for particle tracking, followed by calorimeters for energy measurements and finally muon detectors for triggering and muon identification.

Since the rate of particles that have to be detected is extremely high, the detectors and the data read-out have to meet special requirements. On the one hand they have to be very fast, on the other hand they have to be very insensitive to radiation damage. The most stringent requirements in both respects apply to the innermost detectors.

In the case of the ATLAS detector the innermost part is the *pixel detector*. It is a hybrid silicon detector, composed of single modules. One of the main tasks during the development of the pixel modules was to make them as radiation hard as possible while keeping the amount of material, measured in radiation lengths, low.

Although this development was very successful, a new innermost detector layer is required after half of the ATLAS lifetime, that is, after about four years of operation.

The new innermost layer will be placed even closer to the interaction point than any other detector layer of ATLAS. This leads to even harder requirements concerning radiation hardness and additionally some properties of the data read-out will change. Thus a testing environment for the development of the new detector modules is needed. Since neither the existing data read-out chain nor the old testing environment can be used for testing of the new modules, a new data read-out chain is needed for this purpose.

Chapter 2 gives an overview of the physics that will be studied at ATLAS. Chapter 3 gives an overview of the ATLAS detector and a detailed description of the ATLAS pixel detector. Additionally, some details concerning the upgrade of the pixel detector are presented. An introduction to the structure and the configuration of FPGAs<sup>1</sup> is given in chapter 4. This is of special interest, since the upgrade of the existing electrical data read-out chain required substantial work on an FPGA. Chapter 5 describes the electrical read-out chain in detail as well as the changes that were made to upgrade the system for the new pixel modules.

<sup>1</sup>Field Programmable Gate Arrays

# 2. Physics

## 2.1. Introduction

This chapter summarizes the standard model of particle physics. Subsequently, supersymmetry is introduced as one possible extension to the standard model. The search for the latter is one of the many motivations to build the LHC.

## 2.2. The Standard Model

The standard model of particle physics [1] describes the properties and interactions of elementary particles very precisely at small distances, where gravitational effects can be ignored. The predictions of the standard model have been confirmed with extremely high accuracy in precision measurements.

The standard model was developed in the early 1970s. It is a quantum field theory whose Lagrangian is invariant under  $SU(3)_C \times SU(2)_L \times U(1)_Y$  gauge transformations and which is consistent with quantum mechanics as well as special relativity.

In this model the elementary particles are divided into three groups: the leptons, the quarks (both spin-$\frac{1}{2}$  particles) and the bosons, comprising the spin-1 gauge bosons and the spin-0 Higgs boson. The quarks and leptons are ordered in three families, which are listed in table 2.1. The electroweak and the strong interactions act via gauge bosons that are exchanged between the particles. An overview of these bosons is given in figure 2.1.

| Family 1 |  | Family 2 |  | Family 3 |  |  |
|----------|---|----------|---|----------|---|---|
| symbol | mass [MeV/$c^2$] | symbol | mass [GeV/$c^2$] | symbol | mass [GeV/$c^2$] | charge [$e$] |
| **leptons** |  |  |  |  |  |  |
| $\nu_e$ | $< 3 \times 10^{-6}$ | $\nu_{\mu}$ | $< 0.19 \times 10^{-3}$ | $\nu_{\tau}$ | $< 0.0182$ | 0 |
| $e$ | 0.511 | $\mu$ | 0.1057 | $\tau$ | 1.777 | $-1$ |
| **quarks** |  |  |  |  |  |  |
| $u$ | 1.5 - 3 | $c$ | $1.25 \pm 0.09$ | $t$ | $174.2 \pm 3.3$ | $+\frac{2}{3}$ |
| $d$ | 3 - 7 | $s$ | $0.095 \pm 0.025$ | $b$ | 4.2 - 4.7 | $-\frac{1}{3}$ |

Table 2.1.: The most important properties of the three lepton and quark generations.



Figure 2.1.: Overview of the different bosons and their interactions between the particles of the standard model.

The standard model without extension leads to a problem. The requirement of local gauge invariance implies that the gauge bosons are massless. However, it was found experimentally that the  $W^{\pm}$  and the Z bosons are massive. In 1964 Peter Higgs [2] predicted a scalar background field that would solve this problem via spontaneous symmetry breaking. Through self-interaction of this field a new particle would be generated, the *Higgs particle*. All of its parameters are predicted by the model, except its mass. The *Higgs particle* has not been found yet. Only a lower limit on its mass of 114.4 GeV [3] has been set, and an upper limit of around 1 TeV follows from theoretical considerations. The LHC design allows Higgs searches over the whole possible mass range.

Altogether the model contains some properties and questions that motivate a search beyond the standard model. For example:

- What determines the values of the 19 free parameters?
- Is the *Higgs boson* of standard model type?
- Does a unification of the fundamental forces exist?
- Astro-particle physics has delivered a remarkable number of observations that cannot be explained by the standard model, e.g. why does an asymmetry between particles and anti-particles exist in the universe?
- What is dark matter?
- How can the hierarchy problem be solved?

## 2.3. Supersymmetry

Since the standard model leaves some important and interesting questions unanswered, new theories are being developed and summarized as "Physics beyond the Standard Model". One of the most promising models is supersymmetry. This theory would solve several problems. Firstly, it would provide an answer to the hierarchy problem. Secondly, a unification of the electromagnetic force, the weak force and the strong force would be achieved at high energy scales. Thirdly, string theory seems to require supersymmetry for its consistency, and string theory could be essential to add gravity to the existing models. For details see [4].

Supersymmetry predicts that every elementary particle has a superpartner. These superpartners have the same mass, couplings and quantum numbers as the standard model particles. Only their spin differs by  $\frac{1}{2}$ . Since the charges of the standard model fermions and bosons differ, none of the known particles can be the superpartner of another one. Thus a superpartner must exist for every standard model particle in the supersymmetric model. With the same mass as their partners, these particles would already have been observed, but none of them has been discovered so far. This means that supersymmetry must somehow be broken. An overview of the supersymmetric particle zoo is given in figure 2.2.



Figure 2.2.: On the left side, the known particle zoo of the standard model plus the Higgs particle is shown; the right side shows the particles resulting from an extension by supersymmetry.

One of the most promising theories to be tested at the LHC is a minimal supersymmetric extension of the standard model. This theory is called the *Minimal Supersymmetric Standard Model* (MSSM) and was originally proposed in 1981 to solve the hierarchy problem. If hints for the MSSM are found at the LHC, it could provide a good candidate for cold dark matter particles.

This chapter has given only a short overview of some of the research areas investigated by the LHC experiments. The number of open questions that could be answered, and of new hints that could be found, is huge, and their scale ranges from the sub-atomic to that of galaxies. Scientists all over the world are therefore looking forward to the launch of this experiment.

# 3. The ATLAS Pixel Detector

## 3.1. Experimental Setup - The LHC

The Large Hadron Collider<sup>1</sup> (short: LHC) is the biggest machine humans have ever built. It is a circular hadron accelerator, located at CERN<sup>2</sup>, Geneva. It has a circumference of about 27 km and is installed in a tunnel 100 m below the surface that was previously used by the Large Electron Positron Collider (see figure 3.1). Protons are accelerated by several pre-accelerators and finally by the LHC itself, so that collisions take place with a centre-of-mass energy of up to 14 TeV. Thus the LHC exceeds the collision energy of all other accelerators by approximately one order of magnitude.



Figure 3.1.: Overview of the LHC and positions of the experiments [5].

The run schedule foresees 2808 proton packages in the LHC ring, each consisting of  $1.1 \times 10^{11}$  particles. Two consecutive packages will be separated by 25 ns, and the luminosity is planned to be ramped up to  $10^{34}$  cm<sup>-2</sup>s<sup>-1</sup>. Collisions are planned to take place at four points with a rate of 40 MHz. At each collision about 20 inelastic interactions are expected on average, producing a mean number of around 1000 charged particles. An experiment is placed at every collision point. Two of them cover the whole spectrum of physics questions addressed at the LHC: ATLAS<sup>3</sup> [6] and CMS<sup>4</sup> [7]. The other two experiments are designed for more specific questions: LHCb [8] concentrates on b-physics, while ALICE<sup>5</sup> is designed to investigate the lead-lead collisions that take place alternately with the proton-proton collisions. The design values of the LHC for proton runs are shown in table 3.1.

<sup>1</sup>www.cern.ch/lhc

<sup>2</sup>Conseil Européen pour la Recherche Nucléaire

| Parameter                 | Design value                            |
|---------------------------|-----------------------------------------|
| circumference             | 26659 m                                 |
| magnetic field strength   | 8.4 T                                   |
| centre of mass energy     | 14 TeV                                  |
| number of proton packages | 2808                                    |
| protons per package       | $1.1 \times 10^{11}$                    |
| distance of packages      | 25 ns                                   |
| collision rate            | 40 MHz                                  |
| luminosity                | $10^{34} \text{ cm}^{-2} \text{s}^{-1}$ |

Table 3.1.: LHC design values for proton runs [5].
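The timing values in table 3.1 can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, assuming that the protons travel at essentially the speed of light; the variable names are chosen for illustration only. It suggests that the 40 MHz collision rate is simply the inverse of the 25 ns package spacing, and that 2808 packages fill only part of the roughly 3560 slots of 25 ns that fit into one turn, so the crossing rate averaged over a full turn comes out somewhat below 40 MHz.

```python
# Consistency check of the LHC design values (table 3.1).
# Assumption: protons move at practically the speed of light.
C_LIGHT = 299_792_458.0      # speed of light in m/s
CIRCUMFERENCE_M = 26_659.0   # ring circumference from table 3.1
PACKAGE_SPACING_S = 25e-9    # distance of packages from table 3.1
N_PACKAGES = 2808            # number of proton packages from table 3.1

# Peak collision rate: one crossing every 25 ns.
collision_rate_hz = 1.0 / PACKAGE_SPACING_S

# Time and frequency of one revolution around the ring.
revolution_time_s = CIRCUMFERENCE_M / C_LIGHT
revolution_freq_hz = 1.0 / revolution_time_s

# Number of 25 ns slots per turn and the fraction that is filled.
slots_per_turn = revolution_time_s / PACKAGE_SPACING_S
filled_fraction = N_PACKAGES / slots_per_turn

# Crossing rate averaged over a full turn.
average_crossing_rate_hz = N_PACKAGES * revolution_freq_hz

print(f"peak collision rate   : {collision_rate_hz / 1e6:.1f} MHz")
print(f"revolution frequency  : {revolution_freq_hz / 1e3:.2f} kHz")
print(f"25 ns slots per turn  : {slots_per_turn:.0f}")
print(f"filled fraction       : {filled_fraction:.2f}")
print(f"average crossing rate : {average_crossing_rate_hz / 1e6:.1f} MHz")
```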

## 3.2. The ATLAS Detector

Figure 3.2 gives an overview of the layout of ATLAS.



Figure 3.2.: Layout of the ATLAS Detector and positions of the sub-detectors [6].

The detector has been built in an onion-like structure [9]. This means that ATLAS consists of several subdetectors, each designed for a special purpose, e.g. precise tracking or energy measurement. All these detectors are nested such that (almost) every direction in space is covered.

<sup>3</sup>A Toroidal LHC Apparatus

<sup>4</sup>Compact Muon Solenoid

<sup>5</sup>A Large Ion Collider Experiment

As the ATLAS layout overview shows, the outermost parts of the experiment are the muon chambers. To measure the muon momentum this subdetector is placed in a  $0.5~\rm T$  toroidal magnetic field. Moreover the muon chambers are used for triggering. The toroidal magnets are one of the most characteristic features of the shape of ATLAS. Further inside are the hadronic and the electromagnetic calorimeters, which measure the energy of particles and jets. Finally there is the "Inner Detector", consisting of a transition radiation tracker, a silicon strip detector and a silicon pixel detector. The Inner Detector is used for precise tracking of charged particles. It is immersed in a  $2~\rm T$  solenoidal magnetic field to allow a momentum measurement of the particles. Together, these subdetectors and magnets result in a size of about  $25~\rm m \times 44~m$  and a mass of around 7000 tons.

For the detector design several aspects had to be taken into account. The aims with the highest priority were the following. Firstly, a  $4\pi$  coverage of the interaction point had to be realized, especially by the calorimeters, since this is important for the measurement of missing transverse energy caused by unknown particles. For ATLAS as a whole, full coverage in  $\Phi$  and almost full coverage in  $\eta$<sup>6</sup> were the aim, so that events are recorded as completely as possible. Secondly, a precise measurement of the muon momentum was desired. Thirdly, efficient and extremely precise particle tracking at high as well as at low luminosities was required to provide full event reconstruction.
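To give a feeling for what a coverage in pseudorapidity means geometrically, the following minimal sketch converts between the polar angle $\theta$ and $\eta$. It assumes the standard definition $\eta = -\ln \tan \frac{\theta}{2}$; the function names are chosen for illustration only. Large values of $|\eta|$ correspond to directions close to the beam line, while $\eta = 0$ is perpendicular to it.

```python
import math


def pseudorapidity(theta_rad: float) -> float:
    """Return eta = -ln(tan(theta / 2)) for a polar angle theta in radians."""
    return -math.log(math.tan(theta_rad / 2.0))


def polar_angle(eta: float) -> float:
    """Invert the definition: theta = 2 * atan(exp(-eta)), in radians."""
    return 2.0 * math.atan(math.exp(-eta))


# Print the polar angle for a few pseudorapidity values that appear
# in the detector description (e.g. |eta| = 2.5 and |eta| = 2.7).
for eta in (0.0, 1.0, 2.0, 2.5, 2.7):
    theta_deg = math.degrees(polar_angle(eta))
    print(f"eta = {eta:3.1f}  ->  theta = {theta_deg:6.2f} deg")
```

With this definition a pseudorapidity of 2.5 corresponds to a polar angle of a little more than 9 degrees relative to the beam line.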

#### 3.2.1. The Muon Spectrometer

The muon system uses different detectors for triggering and tracking. For triggering, Resistive Plate Chambers (RPCs), which are operated in avalanche mode, are used in the barrel region, and Thin Gap Chambers (TGCs), operated in saturated mode, are used in the end-caps. For high resolution tracking, Monitored Drift Tubes (MDTs) are used for  $|\eta| < 2$ . The MDTs are drift chambers made of aluminium and filled with a gas mixture that survives high doses of radiation without ageing. In addition, Cathode Strip Chambers (CSCs), which are multi-wire proportional chambers, are used for  $2 < |\eta| < 2.7$ . The first two detector types are mainly used for event triggering since they are faster than MDTs and CSCs, with response times of the order of 2 ns and 4 ns, respectively. However, they are not as precise in terms of spatial resolution as the latter. MDTs and CSCs have a spatial resolution of the order of 50  $\mu m$  but are too slow for trigger tasks.

#### 3.2.2. The Magnet System

The magnetic field requested by the different ATLAS subdetector groups is realized using four superconducting magnets [10]: the Central Solenoid, the Barrel Toroid and two End-Cap Toroids.

The Central Solenoid (CS) provides an axial magnetic field with a peak field strength of about 2.6 T for the central detector. This requires a current of 7600 A in the magnet, which has a size of 5.3 m  $\times$  2.4 m. It is integrated into the liquid argon calorimeter cryostat and has been tested up to 8400 A in the factory and up to 8100 A after assembly at the detector site. Figure 3.3 illustrates the dimensions of the Central Solenoid.

<sup>6</sup>The pseudorapidity  $\eta$  is defined as  $\eta = -\ln \tan \frac{\theta}{2}$ , where  $\theta$  is the polar angle of the particle relative to the beam line



Figure 3.3.: ATLAS Central Solenoid ready for transport to CERN. On the outside, the supporting rings that keep the magnet's shape are visible [10].

The two End-Cap Toroids (ECT) consist of 5 m  $\times$  5 m coil modules. The coil modules are linked by eight so-called keystone boxes. Because the magnetic windings of the toroids overlap, each ECT is pulled into the Barrel Toroid with a force of up to 300 tons. The peak field provided is about 4.1 T.

Finally the Barrel Toroid (BT) is made up of eight racetrack coils, each with a size of 5 m  $\times$  25 m. An inner and an outer set of huge rings keeps the coils in their toroidal shape. The coils have been tested up to a current of 22 kA. The peak field provided by the Barrel Toroid coils is about 3.9 T. The complete toroid rests on 18 feet on the ground.

Figure 3.4 gives an overview of the toroid magnet system.



Figure 3.4.: Schematic overview of the Barrel Toroid, consisting of eight coils, and the End-Cap Toroids, where one is inserted and one retracted [10].

#### 3.2.3. Calorimeters

The calorimeter system is subdivided into the hadronic and the electromagnetic calorimeter. The hadronic calorimeter is a sampling calorimeter which in the barrel part is made of iron absorbers and plastic scintillators. In the regions of higher pseudorapidity copper or tungsten and liquid argon are used because of the higher radiation dose in this area. The electromagnetic calorimeter is also a sampling calorimeter, made of lead absorbers in liquid argon. Both calorimeters are designed with a thickness of the order of 20 interaction lengths to ensure absorption of the particle species they are designed for. Using the shower depth and the cone size of the emerging particle shower, the energy of the particles can be calculated precisely.

#### 3.2.4. Inner Detector

Compared to the size of ATLAS the Inner Detector is quite small with dimensions of 2.3 m  $\times$  7 m, although this volume houses three different detector types. One of them is the Transition Radiation Tracker (TRT). The TRT consists of a central barrel and two end-caps, one on each side [11]. Altogether it provides 420,000 read-out channels and is used for tracking of charged particles as well as electron identification. The detector is made up of straws filled with a gas mixture, with one sense wire per straw. On average a particle crosses 36 straws for  $|\eta| < 2.1$ . Every time the particle traverses a transition between two materials, photons are emitted. This transition radiation is used to distinguish between electrons and heavier particles, e.g. pions. In addition, the gas mixture in the straws is ionized, which causes the main tracking signal that is detected. This delivers a resolution of about 130  $\mu m$ .

On the way from the outer detector types to the interaction point the next sub-detector is the Semi Conductor Tracker (SCT). The SCT is assembled in four barrel layers and nine disks at each of the two end-caps. The barrels and disks are composed of modules on which two pairs of single-sided p-in-n microstrip sensors are glued back to back. The silicon sensors are rotated against each other by a 40 mrad stereo angle, which provides two-dimensional hit information. The barrel modules are all identical, whereas the disk modules have four different geometries, depending on their position. All in all the SCT provides 6.2 million read-out channels and a spatial resolution of 16  $\mu$m in  $R\Phi$-direction and 580  $\mu$m in z-direction.

Because the pixel detector – and especially its communication with the outside world – is the main topic of this thesis, it is described in detail in the next section.

## 3.3. Pixel Detector – Overview and Introduction

The pixel detector is the innermost part of the ATLAS detector. It is placed close to the beam line, which leads to special requirements concerning spatial resolution, radiation hardness, two-particle separation and occupancy tolerance. With a pixel size of 50  $\mu$m × 400  $\mu$m the single-channel occupancy is kept low enough to ensure an efficient read-out. The concept of pixel detectors is to deliver truly two-dimensional track measurements. In contrast to other detector systems, where the two-dimensional information is obtained by combining one-dimensional measurements, there is no risk of "ghost hits" in case of multiple hits per detector unit. Since the pixel detector has an excellent single-point resolution and provides the features mentioned above, it makes b-tagging feasible, which is crucial for the detection of SUSY and Higgs signatures.
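The origin of "ghost hits" can be illustrated with a small toy model. The sketch below is a simplification under stated assumptions: the strip detector is idealised as two orthogonal layers of strips (real strip detectors such as the SCT use a small stereo angle instead), hit positions are integer channel numbers, and all function names are hypothetical. With several simultaneous hits, every combination of a fired x-strip and a fired y-strip is a possible hit position, whereas a pixel detector reports each hit cell directly.

```python
from itertools import product


def strip_candidates(true_hits):
    """Combine fired x-strips and y-strips into all possible 2D candidates.

    Each strip layer only knows which of its strips fired, so every
    (x, y) combination of fired strips is a candidate position.
    """
    xs = sorted({x for x, _ in true_hits})
    ys = sorted({y for _, y in true_hits})
    return set(product(xs, ys))


def pixel_candidates(true_hits):
    """A pixel detector measures x and y of each hit cell together."""
    return set(true_hits)


# Three particles traversing the same detector unit at once (toy values).
true_hits = [(3, 10), (7, 2), (12, 5)]

strip = strip_candidates(true_hits)
ghosts = strip - set(true_hits)

print(f"strip candidates : {len(strip)}")
print(f"of which ghosts  : {len(ghosts)}")
print(f"pixel candidates : {len(pixel_candidates(true_hits))}")
```

In this toy model n simultaneous hits on distinct strips give n² candidates, of which n² − n are ghosts, while the pixel read-out reports exactly the n true hits.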

The detector is arranged in three barrel layers in the central part and three disk layers in each forward direction, as can be seen in figure 3.5. This set-up delivers three space points per charged particle for a pseudorapidity of up to  $|\eta| = 2.5$ . All barrel layers are composed of several staves, where the number of staves is different for every layer, since they all have different radii (see table 3.2). The staves themselves are all mechanically identical. They are arranged such that there is a slight overlap in z-direction to ensure full coverage in this direction. Every stave consists of 13 active parts, the *pixel modules* (see next section).

Figure 3.5.: Schematic view of the ATLAS Pixel Detector.

In contrast, the disk layers are all set up in the same way. They are subdivided into eight sectors, each with three modules on the front side and three modules on the back side. Again the modules on the front and back sides overlap to avoid gaps in the active area. Table 3.2 gives an overview of the configurations of the barrel layers and the disk layers, each listed outwards from the interaction point.

|         | Radius (mm)     | No. of Staves | No. of Modules |
|---------|-----------------|---------------|----------------|
| B-Layer | 50.5            | 22            | 286            |
| Layer 1 | 88.5            | 38            | 494            |
| Layer 2 | 122.5           | 52            | 676            |
|         | z-Position (mm) | Sectors       | Modules        |
| Disk 1  | $\pm 495$       | 8+8           | 48+48          |
| Disk 2  | $\pm 580$       | 8+8           | 48 + 48        |
| Disk 3  | $\pm 650$       | 8+8           | 48+48          |
| Total   |                 |               | 1744           |

Table 3.2.: Set-up and dimensions of the ATLAS Pixel Detector [12].
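The module counts in table 3.2 follow directly from the geometry described above: 13 modules per stave in the barrel, and eight sectors with three modules on each side per disk. A minimal sketch of this bookkeeping, with variable names chosen for illustration only:

```python
# Reproduce the module counts of table 3.2 from the layout description.
MODULES_PER_STAVE = 13
STAVES = {"B-Layer": 22, "Layer 1": 38, "Layer 2": 52}

SECTORS_PER_DISK = 8
MODULES_PER_SECTOR = 3 + 3   # three on the front side, three on the back side
DISKS_PER_SIDE = 3
SIDES = 2                    # both forward directions

# Barrel: number of staves times modules per stave, per layer.
barrel_modules = {name: n * MODULES_PER_STAVE for name, n in STAVES.items()}

# Disks: every disk carries the same number of modules.
modules_per_disk = SECTORS_PER_DISK * MODULES_PER_SECTOR
disk_modules = DISKS_PER_SIDE * SIDES * modules_per_disk

total_modules = sum(barrel_modules.values()) + disk_modules

for name, count in barrel_modules.items():
    print(f"{name:8s}: {count} modules")
print(f"per disk: {modules_per_disk} modules")
print(f"total   : {total_modules} modules")
```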

## 3.4. Pixel Modules

The smallest active unit of the pixel detector is called a "pixel module". A pixel module consists of a  $2 \times 6$  cm<sup>2</sup> silicon sensor, the read-out electronics and a kapton-flex PCB<sup>7</sup> (see figure 3.6).



Figure 3.6.: Schematic view of the pixel module components.

Each sensor is subdivided into 47,232 pixels with a size of 50  $\mu$m × 400  $\mu$m or, for geometrical reasons, 50  $\mu$m × 600  $\mu$m. Every pixel is connected to a single electronic cell via bump bonds (see subsection 3.4.3). The read-out electronics is arranged in 16 ASICs<sup>8</sup>, the Front-End chips (short: FE-chips). Arriving signals are amplified and digitized by the FE-chips and forwarded to the MCC<sup>9</sup>. The major task of the MCC is the communication with the off-detector read-out electronics (section 3.5). The FE-chips and the MCC are connected by wire bonds. These parts are described in more detail in the following sections. Furthermore, every pixel module is equipped with a Negative Temperature Coefficient Thermistor (NTC) for temperature monitoring.
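Taking the quoted numbers at face value, the total number of sensor pixels in the detector and a rough lower bound for the active area of one sensor can be estimated. The sketch below is only an order-of-magnitude check: it treats every pixel as a standard 50 µm × 400 µm pixel, so the longer 600 µm pixels are undercounted, and it uses the total module count from table 3.2. The variable names are chosen for illustration.

```python
# Rough estimate of pixel count and active sensor area.
N_MODULES = 1744             # total number of modules, table 3.2
PIXELS_PER_MODULE = 47_232   # pixels per sensor as quoted in the text

# Total number of sensor pixels in the whole pixel detector.
total_pixels = N_MODULES * PIXELS_PER_MODULE

# Lower bound on the active area of one sensor, assuming that all
# pixels have the standard size (some are 600 um long in reality).
STANDARD_PIXEL_AREA_UM2 = 50 * 400
UM2_TO_CM2 = 1e-8            # 1 um^2 = 1e-8 cm^2
min_active_area_cm2 = PIXELS_PER_MODULE * STANDARD_PIXEL_AREA_UM2 * UM2_TO_CM2

print(f"total pixels          : {total_pixels:.3e}")
print(f"active area per sensor: at least {min_active_area_cm2:.2f} cm^2")
```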

#### **3.4.1.** Sensor

#### **Basic Principles of Semiconductor Detectors**

The chosen semiconductor sensor material for the ATLAS Pixel Detector is diffusion-oxygenated float zone silicon. When a charged particle traverses a semiconductor, electron-hole pairs are generated along its path due to the particle's energy loss in the material. For silicon, the energy gap in the band structure, which is the energy needed to ionize an atom of the particular material, is 1.12 eV. However, the mean energy needed to produce an electron-hole pair is 3.62 eV. The difference is taken up by phonons produced during this process, i.e. it ends up as thermal energy.

<sup>7</sup>Printed Circuit Board

<sup>8</sup>Application Specific Integrated Circuits

<sup>9</sup>Module Control Chip, see subsection 3.4.4

The energy loss per unit track length is described by the Bethe-Bloch formula for particles with $m \gg m_e$:

$$-\frac{dE}{dx} = 2\pi N_a r_e^2 m_e c^2 \rho \frac{Z}{A} \frac{z^2}{\beta^2} \cdot \left[ \ln \left( \frac{2m_e \gamma^2 v^2 W_{max}}{I^2} \right) - 2\beta^2 - \delta - 2\frac{C}{Z} \right]$$
(3.1)

with (in case of silicon):

 $\frac{dE}{dx}$ : mean energy loss per track length

 $N_a$ : Avogadro's number:  $6.022 \times 10^{23} \text{ mol}^{-1}$ 

 $r_e$ : classical electron radius:  $2.817 \times 10^{-13}$  cm

 $m_e$ : electron mass: 511 keV/$c^2$

 $\rho$ : density of absorbing material: 2.33  $\frac{g}{cm^3}$ 

I: average effective ionization potential:  $\approx 173 \text{ eV}$ 

Z: atomic number of absorbing material: 14

A: atomic weight of absorbing material: 28

z: charge of traversing particle in units of e

 $\beta$ :  $\frac{v}{c}$  speed of traversing particle in terms of c

 $\gamma$ :  $\frac{1}{\sqrt{1-\beta^2}}$ 

 $\delta$ : density correction

C: shell correction

 $W_{max}$ : maximum energy transfer in a single head-on collision: $2m_ec^2\beta^2\gamma^2$ for $m \gg m_e$

In formula 3.1 the density effect correction  $\delta$  – which is especially important for higher energies – and the shell correction C – important for low energies – are included. The formula is valid for  $\beta$ -values down to  $\beta \approx 0.1$ .

The density effect arises because the electric field of the traversing particle polarizes the atoms along its path. Electrons that are far away from this path are thereby shielded from the full electric field. Since this polarization depends on the density of the absorbing material, so does the density effect.

The shell correction accounts for the fact that the assumption of a stationary electron in the material is not valid at low particle energies; it reduces the energy loss there.

The Bethe-Bloch formula has a minimum at around $v \approx 0.96\,c$, which corresponds to $\beta\gamma \approx 3.5$. Particles with an energy corresponding to that point are called minimum ionizing particles, m.i.p. for short. For the example of 250 $\mu$m thick silicon, figure 3.7 shows that a minimum ionizing pion with a mass of $m_{pion} = 139.57$ MeV has an energy loss of about 97.5 keV, which corresponds to the generation of around 27,000 electron-hole pairs. The most probable value for the energy loss is $\approx 69.9$ keV, which corresponds to about 19,300 electron-hole pairs. The difference is explained by the asymmetry of the energy loss distribution: it is not a Gaussian but, due to possible interactions with higher energy transfer, has a tail towards higher energies.
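The numbers quoted above can be cross-checked with a short calculation. The following is only a sketch: it evaluates formula 3.1 with the constants listed for silicon, but sets the density correction $\delta$ and the shell correction $C$ to zero (an assumption made here for simplicity, not the treatment behind figure 3.7), so the mean energy loss it returns should only be expected to agree with the quoted 97.5 keV to within a few percent. The function names are chosen for illustration.

```python
import math

# Constants as listed below formula 3.1 (silicon absorber).
N_A = 6.022e23      # Avogadro's number, 1/mol
R_E = 2.817e-13     # classical electron radius, cm
ME_C2 = 0.511       # electron rest energy, MeV
RHO = 2.33          # density of silicon, g/cm^3
Z, A = 14, 28       # atomic number and atomic weight of silicon
I_EV = 173.0        # average effective ionization potential, eV
E_PAIR_EV = 3.62    # mean energy per electron-hole pair, eV


def bethe_bloch_mev_per_cm(beta_gamma, z=1):
    """Mean energy loss from formula 3.1, assuming delta = 0 and C = 0."""
    bg2 = beta_gamma ** 2
    beta2 = bg2 / (1.0 + bg2)
    two_me_g2_v2_ev = 2.0 * ME_C2 * 1e6 * bg2   # 2 m_e gamma^2 v^2 in eV
    w_max_ev = 2.0 * ME_C2 * 1e6 * bg2          # W_max for a heavy particle, eV
    prefactor = 2.0 * math.pi * N_A * R_E**2 * ME_C2 * RHO * Z / A
    log_term = math.log(two_me_g2_v2_ev * w_max_ev / I_EV**2)
    return prefactor * z**2 / beta2 * (log_term - 2.0 * beta2)


def electron_hole_pairs(energy_kev):
    """Number of electron-hole pairs created by a given energy deposit."""
    return energy_kev * 1e3 / E_PAIR_EV


# 250 um of silicon correspond to 0.025 cm.
loss_kev = bethe_bloch_mev_per_cm(3.5) * 0.025 * 1e3
print(f"mean loss of a m.i.p. in 250 um Si: {loss_kev:.1f} keV")
print(f"pairs for 97.5 keV: {electron_hole_pairs(97.5):.0f}")
print(f"pairs for 69.9 keV: {electron_hole_pairs(69.9):.0f}")
```

Dividing the two quoted energy losses by 3.62 eV reproduces the rounded pair counts of about 27,000 and 19,300 given in the text.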

#### The ATLAS Pixel sensor

The silicon sensor of the ATLAS Pixel Detector [14] has dimensions of  $63 \times 18.6 \times 0.25 \text{ mm}^3$  with an active area of  $60.8 \times 16.4 \text{ mm}^2$ . As the name suggests the sensor is subdivided into



Figure 3.7.: Correlation between different particles' energies and their energy loss in 250  $\mu m$  thick silicon [13].

pixels to yield true 2-dimensional spatial information for every hit. The chosen sensor type is $n^+$-in-n, which on the one hand makes processing of both sensor sides necessary, but on the other hand offers several advantages. Irradiation changes the effective doping concentration of the sensor, and with increasing irradiation the n-bulk undergoes a type inversion into a "p"-bulk, see figure 3.8. The change of the effective doping concentration entails a change of the depletion voltage that is needed for detector operation. Depletion of the sensor is essential for tracking, because without it the electron-hole pairs created by a traversing particle would immediately recombine and the signal would be lost. In the initial state the depletion zone grows from the sensor's back side, whereas it grows from the pixel side after type inversion. In addition to the p-spray technique, this isolates the pixels from each other even if the sensor is not completely depleted. In the initial (unirradiated) state the depletion voltage is of the order of $|V_{depl}| = 70$ V and the sensor is operated at a bias voltage of $|V_{bias}| = 150$ V; with increasing irradiation the depletion voltage can rise up to $|V_{depl}| = 700$ V. Because the sensor's edges are electrically conductive, guard rings are used to avoid the destruction of the sensor.

Since the Pixel Detector is placed extremely close to the interaction point, the silicon used had to meet demanding requirements. After a long period of research, oxygen-doped silicon turned out to be the best solution in terms of radiation hardness [15]. The reason for this choice is shown in figure 3.9, which compares the depletion voltages needed for standard silicon and for oxygenated silicon, based on the original LHC run schedule for the first 10 years. The depletion voltage increases with time because the continuous irradiation changes the doping concentration and thus the ohmic resistance. Several tests showed that the magnitude of these sensor effects is quite temperature sensitive. It turned out that the optimal operating temperature for the sensor is $-10^{\circ}$C. At higher temperatures, during breaks in operation, the silicon can partially repair itself, which is called annealing. This effect



Figure 3.8.: Development of the depleted region before and after type-inversion [12].

explains the yearly drops of the needed depletion voltage in figure 3.9. However, this positive effect turns into the opposite if the sensor is kept at a higher temperature for a long time. Because of this the sensors are warmed up for 20 days per year and afterwards cooled down to operating temperature again. The plot shows that Layer 1, and thus Layer 2 as well, can be operated fully depleted for the whole run-time thanks to temperature cycling and oxygen doping. For the B-Layer, the limit is expected to be reached after about 7 years. After this time the depletion voltage needed for the sensor is expected to exceed the limit of $|V_{depl}| = 700 \text{ V}$. The electrons and holes can then no longer be separated by the electric field and the signal is lost. That is why a new B-Layer will be inserted after about 5 years (see section 3.6).

Most of the sensor pixels, as already mentioned, have a size of $50~\mu\text{m} \times 400~\mu\text{m}$, which results in a spatial resolution of $15~\mu\text{m}$ in $R\Phi$-direction and $115~\mu\text{m}$ in z-direction. Because every single sensor pixel is connected to an electronics circuit, the pixels had to be arranged in a way that matches the design of the 16 Front End-chips per sensor. The circuits on the FE-chips are arranged in 160 rows and 18 columns. Per sensor eight (long side) times two (short side) FE-chips are used. If all pixels had the same size, this would result in dead regions at every gap between two FE-chips. This problem has been solved by an extended pixel size of $50~\mu\text{m} \times 600~\mu\text{m}$ at the long sides of the chips, called long pixels. On the short side of each chip eight pixels are not covered by an electronics cell. Every uncovered pixel is therefore connected to a covered pixel. These pixels are called ganged pixels, since two of them share one electronics cell. This results in four electronics cells per column that are each connected to two pixel cells, thus four ganged pixels per column. As an overview, figure 3.10 shows the sensor's top metallisation. It also shows that the four ganged pixels are located in the outer area of the sensor. Thereby



Figure 3.9.: Depletion Voltages for pure silicon (dashed lines) and oxygenated silicon (solid lines) for 10 years of operation. The upper two lines are for the B-Layer, the lower two lines are for Layer 1 [16].

every second pixel of the outer eight pixels is connected to one pixel of the following block of four adjacent pixels. This makes it possible to distinguish which of the two ganged pixels was hit, because at least two neighbouring cells show a signal in case of a hit. By investigating the hit pattern it can be determined which of the two ganged cells has been traversed by the particle. Altogether, figure 3.10 shows the normal pixels (50 $\mu$m × 400 $\mu$m in size), long pixels (50 $\mu$m × 600 $\mu$m), ganged pixels and long ganged pixels. The different types are clearly visible e.g. in threshold scans (see chapter 5), where different pixel types show a different noise behaviour.
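The pixel numbers given so far can be checked against each other. The sketch below counts the read-out cells of 16 FE-chips with 160 rows and 18 columns each and adds one extra sensor pixel for every ganged cell, which reproduces the 47,232 pixels per sensor quoted in section 3.4. It also evaluates the textbook estimate pitch/$\sqrt{12}$ for the resolution of a single-pixel hit; this estimate is an assumption introduced here for comparison, and the resolutions quoted in the text may be based on more than this simple formula. All names are illustrative.

```python
import math

FE_CHIPS = 16            # 8 x 2 Front End-chips per sensor
ROWS, COLUMNS = 160, 18  # read-out cells per chip
GANGED_PER_COLUMN = 4    # cells per column that serve two sensor pixels


def readout_cells_per_module():
    """Number of electronics cells on one module."""
    return FE_CHIPS * ROWS * COLUMNS


def sensor_pixels_per_module():
    """Each ganged cell reads out one additional sensor pixel."""
    extra = FE_CHIPS * COLUMNS * GANGED_PER_COLUMN
    return readout_cells_per_module() + extra


def binary_resolution_um(pitch_um):
    """Single-pixel resolution without charge sharing: pitch / sqrt(12)."""
    return pitch_um / math.sqrt(12.0)


print(readout_cells_per_module(), sensor_pixels_per_module())
print(f"{binary_resolution_um(50):.1f} um, {binary_resolution_um(400):.1f} um")
```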

#### 3.4.2. Front End-Chip

Sixteen Front End-chips per pixel module are needed to digitize the signals of one sensor. The chips have been designed in a radiation-hard deep-submicron technology and were produced in a 250 nm process at IBM [17]. Every FE-chip is organized into 9 column pairs and 160 rows (see figure 3.11), which means 2880 read-out cells per chip, each with a size of 400 $\mu$m × 50 $\mu$m. This results in a chip size of 7.2 mm × 10.8 mm. The chip design contains a digital part and an analogue part, each with its own design supply voltage: $V_{DD} = 2.0 \text{ V}$ for the digital part and $V_{DDA} = 1.6 \text{ V}$ for the analogue part.

Every read-out cell contains two cascaded amplifiers, which are followed by a very fast differential discriminator. The discriminator supports adjustable threshold values from 0  $e^-$  up to 12,000  $e^-$  [18]. The benchmark is 4000  $e^-$ , which permits efficient tracking, especially compared to the expected charge for a m.i.p. of around 19,300  $e^-$  at the beginning and still about 10,000  $e^-$  at the end of the irradiation time. These numbers will fluctuate on a small scale for the individual pixels due to noise in the order of 200  $e^-$ . The adjustment can be done during



Figure 3.10.: Schematic view of the sensor showing the region where four FE-chips meet. Pixel cells are painted as horizontal cuboids, whereas ganged pixels have junctions drawn as vertical lines [12].



Figure 3.11.: Photomicrograph of a FE-chip, showing the alignment of the column pairs [17].

operation and is realized using two DACs<sup>10</sup>. The threshold of the whole chip can be changed using a coarse 5-bit global DAC (short: GDAC). On top of that, every cell can be fine-tuned individually using a 7-bit trim DAC (short: TDAC). This tuning capability is important, since reliable tracking requires a homogeneous response of the electronics to a fixed injected charge.

The analogue part of the chip has two amplifier stages, where the first stage is a charge sensitive (pre-)amplifier. It is equipped with a 10 fF feedback capacitor for charge collection, which is discharged by a constant current source. The discharge current can be adjusted using two different DACs. On the one hand there is the global 8-bit IF-DAC and on the other hand the individually settable 3-bit feedback DAC. These DACs are used to achieve a homogeneous amplifier recovery time over the chips.

To test the behaviour of these parts a charge injection circuit has been implemented (see chapter 5.3). A schematic view of the analogue part is given in figure 3.12. In case of a charge



Figure 3.12.: Pixel unit cell in a schematic drawing [12].

injection into the electronics circuit the amplified signal rises and after a short time it exceeds the threshold. At this point a time stamp is set (leading edge time, short: LE). The falling edge of the signal is linear, so that altogether the signal shape is almost triangular. From the leading edge time and the recorded time at which the signal drops below the threshold again, the charge deposited in the sensor can be calculated. The time difference between these two events is called time-over-threshold (short: ToT). Thus the energy loss of a particle that traversed the sensor can be measured with the FE-chips, using the recorded 8-bit ToT information [19]. The relation between the amplifier output and the ToT-value is visualized in figure 3.13. One of the pre-amplifier's characteristics is that its output signal always peaks at the same time, for high charges as well as for low charges. This leads to the so-called timewalk effect, which is also visible in figure 3.13. For a smaller charge the rising edge of the amplifier signal is not as steep as for a high charge. Thus the threshold is crossed later and, since a typical signal is about 30 bunch-crossings long, the delay may exceed one 25 ns cycle of the 40 MHz clock. In order not to assign a hit to the wrong bunch-crossing, the FE-chip has a mechanism implemented to recover

<sup>&</sup>lt;sup>10</sup>Digital Analogue Converters



Figure 3.13.: Amplifier and discriminator outputs for a high charge (green) and a low charge (red) and the resulting ToT-values. On the left side the timewalk-effect is shown, on the right side the different outcomes in case of strongly differing discharge currents for almost the same charge [19].

hits that are recorded too late because of the timewalk. This mechanism uses the recorded time stamp of each hit, so that hits can later be reassigned to the correct bunch-crossing. Hits with too small a ToT-value can be deleted directly, since they are very likely to be noise hits.

For every detected rising edge the Gray coded<sup>11</sup> time stamp of the leading edge is stored in the 8-bit LE-RAM, which belongs to the FE's digital part. There are two more RAMs per cell: an 8-bit RAM to store the pixel address and an 8-bit RAM to store the time stamp of the falling edge of a detected hit (FE-RAM).

When a hit is detected, a hit flag is set for the related pixel, which causes the read-out logic to check all the pixels of the column pair. If multiple pixels are hit, the topmost one is read out first. The pixel address, leading edge (LE) time stamp and falling edge (FE) time stamp are sent to the end-of-column logic, where the ToT value is calculated; afterwards all the hit information is stored in one of the 64 available buffer cells. Then the hit flag for this pixel is reset and the next pixel in the chain is read out. While the chain is being read out a freeze signal is sent, which prevents pixels from setting their hit flags. If new hits occur while the freeze signal is active, the information is stored in the pixels until the read-out is released again.
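The time stamp handling described above can be illustrated with a few lines of code. The sketch shows the standard conversion between binary numbers and Gray code and derives a ToT value from two Gray-coded 8-bit time stamps. How the end-of-column logic actually performs the subtraction is not specified here; decoding both stamps to binary and taking the difference modulo 256 (so that a counter wrap-around between the two edges is handled) is an assumption, and the function names are hypothetical.

```python
def gray_encode(value):
    """Convert a binary number into its Gray code representation."""
    return value ^ (value >> 1)


def gray_decode(code):
    """Convert a Gray code word back into the binary number."""
    value = 0
    while code:
        value ^= code
        code >>= 1
    return value


def time_over_threshold(le_gray, fe_gray, bits=8):
    """ToT in clock cycles from Gray-coded leading/falling edge stamps.

    Assumption: the time stamp counter wraps around after 2**bits cycles.
    """
    leading = gray_decode(le_gray)
    falling = gray_decode(fe_gray)
    return (falling - leading) % (1 << bits)


# A hit whose leading edge is stamped at 250 and falling edge at 24
# (after the 8-bit counter wrapped around) has a ToT of 30 cycles.
print(time_over_threshold(gray_encode(250), gray_encode(24)))
```

Because successive Gray code words differ in exactly one bit, only a single RAM bit toggles per clock cycle while the counter is running, which is the reduced digital activity mentioned in the footnote.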

The next step for hits stored in the end-of-column buffers is a comparison of their LE time stamps with a read-out time stamp counter. Hits with matching time stamps for which a Level1 trigger signal is present are passed on to the serializer. From there they are forwarded to the MCC (see subsection 3.4.4) together with the internally calculated bunch-crossing ID. This ID is needed to assign every hit to its event and hence for event building in the MCC. Hits that fulfil the time stamp requirement but have no active Level1 signal are deleted.

#### 3.4.3. Bump-Bonding

As shown in figure 3.14 the pixel sensor and the FE-chips are connected using a special technique. They are connected via small bumps, made out of either indium or lead-tin [20]. The production of the bumps has been realized in four steps.

<sup>&</sup>lt;sup>11</sup>Gray code is a binary numeral system where two successive values differ in only one bit, which results in a lower digital chip activity

1. For the modules connected with indium bumps, the sensor wafers as well as the FE-chip wafers were equipped with bumps. In contrast, the lead-tin connected modules only got bumps on the FE-chip wafers and a special "Under Bump Metallization" on the sensor wafers.
2. The FE-wafers are thinned down to 210 $\mu$m.
3. Both wafer types are cut and after a quality check only the sensors and FE-chips marked as good are chosen for further processing.
4. Using the flip-chip technique, the FE-chips are placed and aligned on the sensors and afterwards connected with a defined pressure and temperature.

After assembly the indium bumps have a cylindrical shape with a radius of about 20 $\mu$m and a height of around 8 $\mu$m. The lead-tin bumps have a spherical shape with about 20 $\mu$m diameter. Another difference between the two materials is that lead-tin bumps have a resistance lower than 1 $\Omega$, whilst indium bumps have a resistance of several tens of ohms [21]. In figure 3.15, the important steps in the production of both bump types are shown.



Figure 3.14.: Schematic side-view of a pixel module [19].

#### 3.4.4. Module Control Chip

Each pixel module is equipped with a Module Control Chip (MCC), which has tasks in both directions [22]. On the one hand it receives configuration data, triggers and the clock signal and has to distribute these signals to all 16 FE-chips on the module. On the other hand it is responsible for data read-out and event fragment building. All data that has to be passed to the FE-chips is received via the on-detector (opto-)electrical interface, and the read-out event data is sent to the off-detector electronics via the same interface, see sections 3.5.1 and 5.2.1.

The received clock signal is transformed by the MCC into the 40 MHz clock that is needed for the module read-out (XCK) and into the 5 MHz clock used for module configuration (CCK)<sup>12</sup>. Furthermore, the received configuration data has to be transformed into another data format before it can be transmitted to the FE-chips. For testing purposes the configuration data can be read out from the FE-chips.

The MCC not only distributes received timing and reset signals to all FE-chips, but also generates various reset signals itself. In case of overflowing buffers these reset signals are needed to avoid a wrong hit assignment due to timing errors.

 $<sup>^{12}</sup>$ see figure 3.16



Figure 3.15.: Schematic drawing of the production steps for indium-bumps (left) and lead-tin-bump (right) [20].

After a Level1 trigger has been received, the data from the FE-chips is read out into 16 FIFOs<sup>13</sup>. This whole data set is then combined into one event and furnished with a time stamp. Subsequently, the resulting data package is forwarded to the further read-out electronics as a serial data stream. Figure 3.16 shows a block diagram of the MCC, where the communication paths are highlighted in grey. The communication with the ATLAS data-taking system is realized via the "module port", which has three LVDS<sup>14</sup> inputs and two outputs. The input lines are one data line (DCI) and two timing lines (CK and XCKIN); the module port is synchronized using the CK timing line. The MCC has two data-out lines, DTO and DTO2. The default speed per data line is 40 MBit/s. This speed can be increased to 80 MBit/s per data line by sending data not only on the rising edge of the 40 MHz clock, but also on the falling edge. On top of that, both data lines can be used in parallel, which offers read-out speeds of 40 MBit/s, 80 MBit/s (either $1 \times 80$ MBit/s or $2 \times 40$ MBit/s) and 160 MBit/s ($2 \times 80$ MBit/s). An illustration is given in figure 3.17.
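The available read-out speeds follow from two independent choices: the number of data-out lines in use and whether data is sent on one or on both clock edges. A minimal sketch of this arithmetic, with illustrative names that are not taken from the MCC documentation:

```python
BASE_RATE_MBIT = 40  # one bit per 40 MHz clock cycle on one data line


def readout_speed_mbit(data_lines, both_edges):
    """Total read-out speed for a given line count and edge mode."""
    if data_lines not in (1, 2):
        raise ValueError("the MCC has two data-out lines, DTO and DTO2")
    edges = 2 if both_edges else 1
    return BASE_RATE_MBIT * edges * data_lines


# Enumerate all four combinations of line count and edge mode.
for lines in (1, 2):
    for both_edges in (False, True):
        rate = readout_speed_mbit(lines, both_edges)
        print(f"{lines} line(s), both edges: {both_edges} -> {rate} MBit/s")
```

The two ways of reaching 80 MBit/s mentioned in the text appear here as the combinations (1 line, both edges) and (2 lines, rising edge only).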

All data received on the DCI port is sent to the "command decoder", which forwards configuration data directly to the "Front End port". The Front End port has seven lines on which data is sent to the FE-chips and 16 differential ports for data read-out from the FE-chips (DTO0...DTO15). Of the seven lines to the chips, three simple lines are needed for module configuration (DAO, LD and CCK) and the other four (SYNC, LV1, STRB and XCK) are differential lines used for time-critical operations such as trigger and reset signals.

Every time a Level1 command is sent to the MCC, a Level1 trigger is transmitted by the

<sup>&</sup>lt;sup>13</sup>First In, First Out; used as buffer

<sup>&</sup>lt;sup>14</sup>Low Voltage Differential Signalling



Figure 3.16.: Block diagram of the Module Control Chip [23].

command decoder to the TTC<sup>15</sup>. The TTC generates the L1 trigger for the Front End-chips upon reception of the Level1 command. It can also generate various reset signals for the MCC and the FE-chips, as well as calibration signals for the FEs. In case the number of events in the "PendingEvCnt" waiting for transmission to the off-detector read-out has reached 16, the generation of trigger signals is stopped and an error flag is set. If the number is lower than 16, a trigger is released and the number of bunch-crossings (BcoCnt) as well as the number of Level1 triggers (Lv1Cnt) in the "PendingLv1FIFO" is increased by 1. This information is later used for the event-building tasks of the MCC.
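The trigger gating described above can be summarized in a small model. This is a sketch only: the class and attribute names are hypothetical, and the assumption that every released trigger adds exactly one pending event, which is removed again once the event has been transmitted, is made here for illustration.

```python
MAX_PENDING_EVENTS = 16  # limit of pending events stated in the text


class TriggerGate:
    """Toy model of the trigger release logic in the TTC block."""

    def __init__(self):
        self.pending_events = 0   # events waiting for transmission
        self.lv1_count = 0        # number of released Level1 triggers
        self.error_flag = False

    def level1_command(self):
        """Return True if a trigger is released to the FE-chips."""
        if self.pending_events >= MAX_PENDING_EVENTS:
            self.error_flag = True   # trigger generation is stopped
            return False
        self.pending_events += 1     # assumption: one event per trigger
        self.lv1_count += 1
        return True

    def event_transmitted(self):
        """Called when an event has left for the off-detector read-out."""
        if self.pending_events > 0:
            self.pending_events -= 1


gate = TriggerGate()
released = [gate.level1_command() for _ in range(17)]
print(released.count(True), gate.error_flag)
```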

If a trigger has been transmitted to the FE-chips, all data belonging to this trigger is sent to the MCC in parallel, after which every chip sends an "End-Of-Fragment" word to signal that the data set is complete. To store the data sets the MCC has 16 "Receiver Channels", one for every FE-chip, each equipped with a 128-word-deep FIFO. After a Receiver Channel detects

<sup>&</sup>lt;sup>15</sup>Trigger, Timing and Control



Figure 3.17.: Possible read-out speeds supported by the MCC and the consequences for data encoding [22].

the End-Of-Fragment word, a signal is sent to the "EventScoreBoard" in the "Event Builder". As soon as all 16 channels show a detected End-Of-Fragment word in the EventScoreBoard the EventBuilder reads the data of the 16 Receiver FIFOs and the whole set is transmitted, together with the information stored in the PendingLv1FIFOs, to the data-out lines. Afterwards the FIFOs are reset.
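The interplay of Receiver Channels and EventScoreBoard can be sketched as follows. The model is deliberately simplified: the marker string for the End-Of-Fragment word, the class and method names, and the handling of a full FIFO (further words are simply dropped) are assumptions made for this illustration and do not describe the real chip.

```python
from collections import deque

N_CHANNELS = 16            # one Receiver Channel per FE-chip
FIFO_DEPTH = 128           # words per Receiver FIFO
END_OF_FRAGMENT = "EoF"    # placeholder for the End-Of-Fragment word


class EventBuilder:
    """Toy model of Receiver FIFOs plus EventScoreBoard."""

    def __init__(self):
        self.fifos = [deque() for _ in range(N_CHANNELS)]
        self.scoreboard = [False] * N_CHANNELS

    def receive(self, channel, word):
        """Store a data word or mark the channel as complete."""
        if word == END_OF_FRAGMENT:
            self.scoreboard[channel] = True
        elif len(self.fifos[channel]) < FIFO_DEPTH:
            self.fifos[channel].append(word)
        # Words arriving at a full FIFO are dropped in this sketch.

    def build_event(self):
        """Return the event once all channels are complete, else None."""
        if not all(self.scoreboard):
            return None
        event = [list(fifo) for fifo in self.fifos]
        for fifo in self.fifos:
            fifo.clear()                       # FIFOs are reset afterwards
        self.scoreboard = [False] * N_CHANNELS
        return event


builder = EventBuilder()
builder.receive(0, "hit")
for channel in range(N_CHANNELS - 1):
    builder.receive(channel, END_OF_FRAGMENT)
incomplete = builder.build_event()             # channel 15 still missing
builder.receive(N_CHANNELS - 1, END_OF_FRAGMENT)
event = builder.build_event()
print(incomplete, event[0])
```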

The "RegisterBank" stores the internal configuration and status registers that are needed for the MCC operation.

The MCC also offers the possibility of sending artificial events together with a trigger from the outside world (for instance by an operator running a test) to the module port, from where they are written into the FIFOs. This causes the read-out chain to proceed as described above, and the event is sent back. One of the measurements described in section 5.3 is based on exactly this feature, because it makes it possible to test the whole data-taking chain from beginning to end.

#### 3.5. Off Detector Read-Out

The long distance from the detector to the counting room, combined with the high data rate, requires optical data transmission between the detector modules and the off-detector electronics. Consequently opto-electrical interfaces are needed on the on-detector side, the so-called optoboards, as well as on the off-detector side, the Back of Crate card (BOC). They are connected via fibres of about 80 m length. One fibre per module is needed for the data transfer from



Figure 3.18.: Schematic overview of the Pixel detector read-out chain [?].

the off-detector electronics to the module, in particular for the clock and command signals. For the data transfer from the module to the off-detector electronics there is one fibre, or for the B-layer two fibres, needed per module. On the counting room side 1, 2 or 4 optoboards are connected to one BOC, depending on the chosen bandwidth. Every BOC is connected one-to-one to a Read Out Driver (ROD). All in all 132 BOC - ROD pairs are needed to read out the whole ATLAS pixel detector.

The ROD - BOC pairs are arranged in nine 9U VME crates. Each crate houses up to 16 of these card pairs and is controlled by a Single Board Computer (SBC). Via ethernet, the SBCs establish the connection to several computers that steer and control the data acquisition. On top of that every crate houses an interface card for distribution of (global) timing and trigger commands, sent by LHC control. This interface is called Timing and Control Interface Module (TIM). Figure 3.18 shows a schematic overview of the whole data read-out chain. As can be seen in this figure every BOC card has an optical data output to the Read Out Buffer (ROB), which is the data path during data-taking mode. In the following sections all the parts mentioned above are described in more detail.

### 3.5.1. Optoboard

One half-stave or one sector of the pixel detector is always connected to one optoboard [12]. This means that 6 or 7 modules are connected to one optoboard, which is a compromise between two aims: on the one hand, to keep the amount of material within the detector volume low; on the other hand, not to lose too many modules in case of a damaged optoboard.

The optoboards are beryllium-oxide printed circuit boards. Every optoboard is mounted on the Patch Panel 0 (PP0), which establishes the connection to the modules. Each optoboard is equipped with one 8-way PIN diode array and, for the outer layers, with one 8-way VCSEL<sup>16</sup> array. For the B-layer two 8-way VCSEL arrays are mounted on every optoboard to achieve a read-out speed of 160 MBit/s<sup>17</sup>. The two outer layers are read out at 80 MBit/s or 40 MBit/s, so that only one VCSEL array per optoboard is needed.

<sup>&</sup>lt;sup>16</sup>Vertical Cavity Surface Emitting Laser

<sup>&</sup>lt;sup>17</sup>running two data lines, each at 80 MBit/s



Figure 3.19.: The BeO based optoboard with one VCSEL array and one PIN diode array, as it is used for the two outer detector layers. The size of the optoboards is 2 cm  $\times 6.5$  cm.

All in all 272 optoboards are needed to read out the whole detector. The connection between optoboards and counting room is realized using optical fibres, which are grouped into 8-way ribbons. One ribbon is used to send the trigger, timing and command signals to the detector. For Layer 1 and Layer 2 one ribbon carries the data from the detector to the counting room, whereas the B-Layer requires two ribbons for its data. In this way each module has its own communication channels. The clock and command signals are encoded by the TX plug-ins into one BPM<sup>18</sup> signal, which is sent to the optoboard. This provides the combined information on one instead of two fibres, so that additional material within the detector volume is avoided. The combined signal is received by one PIN diode per channel and afterwards decoded in a special ASIC, the DORIC<sup>19</sup>, which is explained in the following subsection. The DORIC provides separate electrical clock and command signals to the module.
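The number of 272 optoboards is consistent with the detector layout in table 3.2 if one optoboard serves each half-stave and each disk sector. The following sketch performs this count; it assumes that every stave is split into exactly two half-staves, which is suggested by the 13 modules per stave and the 6 or 7 modules per optoboard but is not stated explicitly. The dictionary and function names are illustrative.

```python
# Layout numbers taken from table 3.2.
STAVES = {"B-Layer": 22, "Layer 1": 38, "Layer 2": 52}
MODULES = {"B-Layer": 286, "Layer 1": 494, "Layer 2": 676}
DISK_SECTORS = 3 * (8 + 8)   # three disks on each side, eight sectors each


def modules_per_stave(layer):
    """Modules mounted on one stave of the given barrel layer."""
    return MODULES[layer] // STAVES[layer]


def optoboard_count():
    """One optoboard per half-stave (assumed two per stave) and per sector."""
    half_staves = sum(2 * staves for staves in STAVES.values())
    return half_staves + DISK_SECTORS


print(optoboard_count())
print({layer: modules_per_stave(layer) for layer in STAVES})
```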

In the opposite direction the MCC sends data electrically to the optoboard, where another ASIC, the VDC<sup>20</sup>, drives the VCSEL channel by channel. The data from one module is always sent on the same ribbon to the counting room. The data transfer is realized using a Non-Return-to-Zero (NRZ) signal.

Figure 3.20 shows an example of a BPM-coded signal. Encoding data streams into BPM signals allows a better clock recovery than sending them without an encoding scheme. If the binary data stream is sent without modification, it can contain a long series of logical zeros or ones, which leads to problems concerning clock recovery and synchronization. BPM encoding ensures at least one transition per data bit, because the clock and the data signal are combined as shown in figure 3.20. Thus synchronization and clock recovery are much easier.
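The principle of BPM encoding can be written down in a few lines. The sketch uses the common biphase mark convention: the signal level changes at the start of every bit period, and a logical one causes an additional transition in the middle of the period. Representing the signal as two half-bit samples per data bit, as well as the function names, are choices made for this illustration.

```python
def bpm_encode(bits, level=0):
    """Encode data bits into BPM, returning two half-bit samples per bit."""
    samples = []
    for bit in bits:
        level ^= 1            # transition at the start of every bit period
        samples.append(level)
        if bit:
            level ^= 1        # extra mid-bit transition marks a logical one
        samples.append(level)
    return samples


def bpm_decode(samples):
    """A bit is one if the two halves of its bit period differ."""
    return [int(first != second)
            for first, second in zip(samples[0::2], samples[1::2])]


print(bpm_encode([0, 0, 0]))   # level toggles once per bit period
print(bpm_encode([1, 1]))      # level toggles twice per bit period
print(bpm_decode(bpm_encode([1, 0, 0, 1, 1, 0])))
```

With a 40 MHz bit clock, a stream of zeros toggles once per 25 ns and therefore looks like a 20 MHz square wave, whereas a stream of ones toggles every 12.5 ns and looks like a 40 MHz square wave.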

<sup>&</sup>lt;sup>18</sup>BiPhase Mark

<sup>&</sup>lt;sup>19</sup>Digital Opto-Receiver Integrated Circuit

<sup>&</sup>lt;sup>20</sup>VCSEL Driver Chip



Figure 3.20.: Principle of encoding a 40 MHz clock signal and a command signal into a BPM-signal.

#### **DORIC**

The PIN diode receives optical signals and converts them into electrical signals. These signals are forwarded to the DORIC, which is a four-channel chip. Its task is to decode the received BPM signals into separate clock and command signals. The input current has to be between 40 $\mu A$ and 1000 $\mu A$. The allowed duty cycle for the recovered clock is $(50 \pm 4)$% with a timing error of less than 1 ns. After receiving a radiation dose of 50 Mrad the bit error rate of the DORIC has to be less than $10^{-11}$.

Since the signal to be decoded is a BPM signal, the clock looks like a 20 MHz clock as long as only zeros are transmitted. This is always the case after the system has been powered up. Thus the DORIC has to be reset after power-up to lock reliably onto the 40 MHz frequency. Once the correct frequency is achieved, the 40 MHz clock is used as an input to a delay-locked loop, in which the internal delays are adjusted until the desired 50% duty cycle is reached. In this operating mode the data recovery circuit is sensitive to the edges of the input signal. While the recovered clock is "high" a data signal is recovered; while it is "low" no data is recovered. The recovered data signal is synchronized to the recovered clock and then sent as an LVDS signal to the connected module.

#### VDC

The VDC converts the LVDS signals sent by the modules into signals driving the VCSEL channels. Like the DORIC, the VDC is a four-channel chip. The VDC's output current can be adjusted by an external control voltage. This voltage is converted into a control current, which can be varied between 0 mA and 20 mA. During operation the VCSEL is always kept at a minimum current of about 1 mA to increase the switching speed.

#### 3.5.2. Back of Crate card

On the off-detector side the Back of Crate Card (BOC) is the opto-electrical interface [12]. The BOC has been designed for the SCT as well as the pixel detector. For both detectors the layout is the same, just the number of used channels differs. While every BOC that is used in the SCT read-out chain always handles 48 modules, a BOC in the pixel detector read-out is connected

to 6 or 7 B-Layer modules, 12 disk modules, 13 Layer 1 modules or 26 Layer 2 modules. The different numbers of connected modules are due to the different bandwidths used.

The BOC provides the complete timing functionality for the pixel detector. The Back of Crate card is – as the name indicates – located on the back side of the read-out crates used for the ATLAS pixel detector. As previously mentioned, all in all 132 BOC cards are needed



Figure 3.21.: Back of Crate card in pixel configuration [12].

for the pixel detector. Each BOC is connected to a ROD<sup>21</sup>. The BOC provides the interface between the ROD and the on-detector electronics, as well as to the Read Out Buffers (ROB). Since the module data received on the BOC is forwarded to the ROD, the number of connectable modules depends on the bandwidth of the ROD. This leads to a maximum number of 32 connectable pixel modules; owing to the detector modularity, however, not all available channels are used and at most 26 modules are connected. The following list contains the main tasks of the BOC:

- Reception and distribution of the clock signal provided by the TIM.
- Reception of the electrical control signals for the modules that are sent by the ROD. These signals are converted into BPM-encoded optical signals and fed into the laser plug-ins.
- For the transmitted signals several adjustment functions, such as masking, timing and laser current adjustment, are provided.
- For the received data a timing and threshold adjustment for the PIN diode and a clock synchronization are provided.
- The data streams of Layer 1 modules and B-Layer modules are demultiplexed from 80 MBit/s per data line to 2 × 40 MBit/s.
- The optical signals containing the modules' data packages that are sent by the optoboard are received, converted into electrical signals and forwarded to the ROD.
- The parallel event data stream is received from the ROD and sent to the Read Out Buffer (ROB) via S-link.
- An off-detector laser safety interlock is provided.

<sup>21</sup>see section 3.5.3

The BOC card can be divided into four functional blocks: the clock section, the transmission section, the data receive section and the S-link section. The main functions on the BOC are performed by seven CPLDs<sup>22</sup>. One CPLD provides the communication with the ROD and the main control of the BOC, two are used for the communication with the transmitting (TX) and receiving (RX) plug-ins, and four control the receiving section, i.e. the splitting of data streams for high bandwidth modes and the registration of data.

#### **Clock Section**

All BOCs in a crate receive the 40 MHz system clock from one Timing, Trigger and Control Interface Module (TIM). On every BOC, this clock signal is split into five clocks, one of which is transmitted directly to the ROD. Another one, the BPM- or P-clock, is passed to the transmission section and thus to the modules. A third clock, the A-clock, is used for the BOC-internal chip functionality. The remaining two clocks, the B-clock and the V-clock, are used for data recovery. As previously mentioned, some of the clocks are adjustable. The P-clock can be delayed by 0 ns up to 24 ns in 1 ns steps to adjust the timing of the modules. The same range is available for the B-clock, which is the clock for normal data recovery. For data recovery in 80 MBit/s mode the V-clock is used. This clock can be delayed coarsely from 0 ns up to 49 ns in 1 ns steps and additionally finely from 0 ns up to 10.2 ns in 40 ps steps. A summary of the timing capabilities is given in table 3.3.

| clock signal | purpose                                                              | delay capability                                           |
|--------------|----------------------------------------------------------------------|------------------------------------------------------------|
| system clock | received from the TIM                                                | none                                                       |
| A-clock      | for BOC-internal chip functionality                                  | none                                                       |
| P-clock      | encoded together with command signals for the modules into BPM signal | 0 ns to 24 ns, 1 ns steps                                  |
| B-clock      | normal data recovery clock                                           | 0 ns to 24 ns, 1 ns steps                                  |
| V-clock      | data recovery clock for 80 MBit/s mode                               | 0 ns to 49 ns, 1 ns steps; 0 ns to 10.2 ns, 40 ps steps    |
| ROD-clock    | clock transmitted to ROD, copy of system clock                       | none                                                       |

Table 3.3.: Clock signals and their adjustment ranges used on the BOC.

#### **Transmission Section**

The commands for the modules are sent from the ROD to the transmission section of the BOC. In the transmission section four TX plug-ins are located. Every TX plug-in houses an 8-way VCSEL array and a BPM chip where data and clock signals are encoded into the BPM signal. Each plug-in has an 8-way fibre ribbon connected, which establishes the connection to one optoboard. Figure 3.22 shows a TX plug-in.

The transmission section provides several tuning capabilities. For every TX plug-in the delayable P-clock is multiplexed for all module channels. The BPM chip offers an additional tunable delay for each channel individually. This delay is set using a fine delay from 0 ns up to 35.56 ns in 280 ps steps as well as a coarse delay from 0 ns up to 775 ns in 25 ns steps. Thus every module timing can be tuned individually with respect to the bunch crossing timer. Additionally the TX laser output power, the mark-to-space ratio of the BPM signal and an inhibit signal to block the BPM encoding of the data can be adjusted. An overview of the parameters is given in table 3.4.

<sup>22</sup>Complex Programmable Logic Device

Figure 3.22.: TX plug-in with a size of 1.5 cm × 3.5 cm, which encodes the BPM signal and transmits it to the optoboard [12].

| parameter            | range                          | purpose                                                   |
|----------------------|--------------------------------|-----------------------------------------------------------|
| BPM inhibit          | on/off                         | prohibits BPM encoding of data                            |
| BPM fine delay       | 0 ns to 35.56 ns, 280 ps steps | channelwise settable delay for data stream to module      |
| BPM coarse delay     | 0 ns to 775 ns, 25 ns steps    | channelwise settable delay for data stream to module      |
| BPM mark-space ratio | settings 0 to 31               | adjustment of signal mark-space ratio from 30:70 to 70:30 |
| laser current        | settings 0 to 255              | sets the laser current to values between 0 mA and 18 mA   |

Table 3.4.: Tunable parameters of the BOC's transmission section and their ranges.
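The interplay of the coarse and the fine BPM delay can be illustrated with a short Python sketch. It assumes that both delays simply add up and that the settings are plain integer step counts (0 to 31 for the coarse delay and 0 to 127 for the fine delay, as follows from the ranges and step sizes quoted in the text). The function name `bpm_delay_settings` and the use of integer picoseconds are illustrative choices and not taken from the BOC firmware.

```python
# Step sizes and ranges of the per-channel BPM delay (values from the text).
COARSE_STEP_PS = 25_000   # 25 ns per coarse step
COARSE_MAX = 31           # 31 * 25 ns = 775 ns
FINE_STEP_PS = 280        # 280 ps per fine step
FINE_MAX = 127            # 127 * 280 ps = 35.56 ns


def bpm_delay_settings(target_ps):
    """Return (coarse, fine, achieved_ps) closest to the requested delay.

    Assumption: the total delay is the plain sum of coarse and fine delay.
    """
    best = None
    for coarse in range(COARSE_MAX + 1):
        remainder = target_ps - coarse * COARSE_STEP_PS
        # Clamp the fine setting to its allowed range.
        fine = min(max(round(remainder / FINE_STEP_PS), 0), FINE_MAX)
        achieved = coarse * COARSE_STEP_PS + fine * FINE_STEP_PS
        candidate = (abs(achieved - target_ps), coarse, fine, achieved)
        if best is None or candidate < best:
            best = candidate
    _, coarse, fine, achieved = best
    return coarse, fine, achieved


print(bpm_delay_settings(60_000))  # request a delay of 60 ns
```

Because the fine range (35.56 ns) is larger than one coarse step (25 ns), neighbouring coarse settings overlap, so every delay inside the covered range can be approached to within half a fine step under the stated assumption.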

#### Receiver Section

The receiver section of the BOC card is treated in more detail, since the same data management that is used on the optical BOC's FPGA had to be implemented on the eBOC.

In the receiver section the module data, sent by the optoboard, arrives at the RX plug-ins. Every pixel BOC card can serve up to four RX plug-ins, each of which provides eight channels. Thus up to 32 modules can be connected to each BOC. The main parts of an RX plug-in (see figure 3.23) are an 8-way PIN diode array, which converts the optical signals into electrical signals, and an amplifier chip. The amplifier chip has twelve channels, all of which are used for SCT read-out, but only eight of them are used in the pixel configuration. For every RX channel the threshold can be controlled to ensure a reliable differentiation between "0"s and "1"s. The received electrical signals are amplified and sent differentially to the BOC card. There the signals are synchronized to the ROD clock and transmitted as 40 MBit/s streams. The delay needed for synchronization can be varied between 0 ns and 24 ns in steps of 1 ns for every stream.

The receiver section houses four CPLDs, one for every RX plug-in. In the CPLDs the received data is registered and clocked out with the B-clock, which provides a stable phase to the ROD clock.



Figure 3.23.: RX plug-in with a size of 1.5 cm × 3.5 cm, which transforms the received optical signals into electrical signals and passes them to the BOC [12].

Since the ROD only accepts data streams at 40 MBit/s, the CPLDs provide an additional function for data received at 80 MBit/s (Layer 1) or 160 MBit/s (B-Layer). Every 80 MBit/s stream is demultiplexed into two streams, each at 40 MBit/s. The 160 MBit/s mode is treated identically, since it corresponds to 2 × 80 MBit/s.

Because the number of used streams doubles when switching from 40 MBit/s to 80 MBit/s and quadruples when switching from 40 MBit/s to 160 MBit/s, the number of connectable modules decreases. Thus at 80 MBit/s two TX plug-ins and two RX plug-ins are used on one BOC, and at 160 MBit/s one of each is used.

For the demultiplexing of 80 MBit/s streams into 40 MBit/s streams both recovery clocks – B-clock and V-clock – are used. One 40 MBit/s stream is generated reading every second bit of the faster stream and another 40 MBit/s stream is generated reading the other bits. For generation of the first stream the B-clock is used, for the second stream the V-clock is used, which is opposite to the B-clock. This principle is illustrated in figure 3.24.



Figure 3.24.: Demultiplexing an 80 MBit/s stream into two 40 MBit/s streams with respect to the two recovery clocks [24].
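The demultiplexing principle can be sketched in a few lines of Python. The sketch models only the bit ordering, not the clock phases: it assumes that the stream sampled with the B-clock receives the first bit of the fast stream and the stream sampled with the V-clock receives the bits in between. The function names are illustrative.

```python
def demux_80_to_40(bits):
    """Split one 80 MBit/s bit sequence into two 40 MBit/s sequences."""
    b_clock_stream = bits[0::2]  # every second bit, starting with the first
    v_clock_stream = bits[1::2]  # the remaining bits (opposite clock phase)
    return b_clock_stream, v_clock_stream


def remux(b_clock_stream, v_clock_stream):
    """Interleave the two slow streams again, e.g. to check the split."""
    merged = []
    for b_bit, v_bit in zip(b_clock_stream, v_clock_stream):
        merged.extend((b_bit, v_bit))
    return merged


fast_stream = [1, 0, 1, 1, 0, 0, 1, 0]
b_stream, v_stream = demux_80_to_40(fast_stream)
print(b_stream, v_stream)
```

Interleaving the two slow streams restores the original sequence, which shows that no information is lost by the split.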

A small routing board is located between the four CPLDs processing the received data to have a balanced work load even in case of less than four used RX plug-ins. Thus the complete data line sorting can be done on the BOC card. All available tuning parameters of the receiving section are shown in table 3.5. In summary, the data flow through the BOC at 40 MBit/s is shown in figure 3.25 and at 80 MBit/s or 160 MBit/s in figure 3.26.

| parameter          | range                        | purpose                                                    |
|--------------------|------------------------------|------------------------------------------------------------|
| RX threshold       | settings 0 to 255            | channelwise threshold, range from 0 µA to 250 µA           |
| RX data delay      | 0 ns to 25 ns, 1 ns steps    | channelwise adjustment of the data phase                   |
| B-clock            | 0 ns to 25 ns, 1 ns steps    | global clock for data sampling to the ROD                  |
| V-clock            | 0 ns to 25 ns, 1 ns steps    | additional demultiplexing clock                            |
| V-clock fine phase | 0 ns to 10.2 ns, 40 ps steps | V-clock fine adjustment                                    |

Table 3.5.: Receive section parameters of the BOC card [12].



Figure 3.25.: Data routing through the BOC in 40 MBit/s mode [12].

Altogether, a rather long tuning procedure is required to attune all optical components to each other and reach a reliable state. One of these steps is the BOC scan, which determines the best working point for the optical data link. A predefined bit pattern is sent to the modules and passed back to the ROD. Afterwards the number of bit flips is colour coded in a 2-dimensional histogram; an example is shown in figure 3.27. The position of the error-free region is determined and thus the optimal combination of receiver delay and receiver threshold can be set. These values depend on the temperature and the laser power of the optoboard.
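How a working point could be extracted from such a scan is sketched below in Python. The selection criterion is an assumption made for illustration: among all error-free points, the one with the largest distance (in scan steps) to the nearest point with bit flips or to the edge of the scanned range is chosen. The criterion actually used by the ATLAS calibration software is not described here, and the function name `best_working_point` is hypothetical.

```python
def best_working_point(bit_flips):
    """Pick an error-free (threshold, delay) index pair with maximal margin.

    bit_flips[t][d] is the number of bit flips observed at threshold index t
    and delay index d. Returns None if no error-free point exists.
    """
    rows, cols = len(bit_flips), len(bit_flips[0])
    bad_points = [(t, d) for t in range(rows) for d in range(cols)
                  if bit_flips[t][d] > 0]
    best, best_margin = None, -1
    for t in range(rows):
        for d in range(cols):
            if bit_flips[t][d] > 0:
                continue
            # Distance to the edge of the scanned range ...
            margin = min(t + 1, rows - t, d + 1, cols - d)
            # ... and to the nearest point with bit flips.
            for bad_t, bad_d in bad_points:
                margin = min(margin, max(abs(bad_t - t), abs(bad_d - d)))
            if margin > best_margin:
                best, best_margin = (t, d), margin
    return best


scan = [
    [9, 9, 9, 9, 9],
    [9, 0, 0, 0, 9],
    [9, 0, 0, 0, 9],
    [9, 0, 0, 0, 9],
    [9, 9, 9, 9, 9],
]
print(best_working_point(scan))  # centre of the error-free region
```

Choosing the point with the largest margin keeps the link as far as possible from the error region, which is useful because the optimal values drift with temperature and laser power.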

#### S-Link Section and Further Functionalities

Every BOC card carries the S-Link card, or HOLA<sup>23</sup>, for sending out the data to the Read Out Buffers (ROB). It connects BOC and ROB using a parallel 32 bit connection at 40 MHz.

<sup>23</sup>High speed Optical Link for ATLAS



Figure 3.26.: Data routing through the BOC in 80 MBit/s or 160 MBit/s mode [12].

The BOC card provides some monitoring functions for its hardware. The supply voltages and currents of the PIN diodes on the RX plug-ins can be read out, and the temperatures of the TX and RX plug-ins can be measured using NTCs<sup>24</sup>.

For safety reasons the BOC card has to turn off the TX lasers when work is going on close to them. Therefore two interlock lines are implemented in the system. These interlock lines are "AND"ed together, so the lasers only work if both lines are switched to the "laser on" state. The reason for two lines is that one interlock is located at the crates, where the TX plug-ins are located, and the other one is connected to the on-detector laser interlock system. Thus safety is provided for work at the crates as well as for work at the on-detector side of the optical fibres, near the optoboards.

#### 3.5.3. Read Out Driver

The Read Out Driver (ROD) is, like the BOC, common for SCT and pixel configuration [19]. All main functions of the ROD are performed by FPGAs<sup>25</sup> and DSPs<sup>26</sup>. The firmware running on the RODs differs for SCT and pixel mode.

The main functions of detector control and read-out are provided by the ROD. It transmits the triggers received from the TIM to the modules, generates the commands for the modules and transmits them to the BOC, and is responsible for the formatting of event data received from the modules. Additionally, it provides several monitoring capabilities. Thus the ROD is not only essential for the detector in data-taking mode, but also for detector calibration and testing.

In data-taking mode trigger signals are received from the TIM and directly transmitted to the modules. The event data is processed in the ROD data-path and transmitted via S-Link to the Read Out Subsystem (ROS). In calibration mode the triggers are generated by the ROD, and histogramming and data processing are done on the four Slave DSPs located on the ROD.

<sup>24</sup>Negative Temperature Coefficient resistors

<sup>25</sup>Field Programmable Gate Arrays, see section 4.1

<sup>26</sup>Digital Signal Processors

Figure 3.27.: The BOC scan determines the best working point for the optical data link. The plot shows receiver delay vs. receiver threshold; the number of bit flips is colour coded.

The ROD can be separated into several functional regions. They are explained in the following subsections.

#### Controller FPGA and Master DSP

The ROD data path, which is needed to use the BOC and for processing of module configuration and trigger propagation, is set up by the Controller FPGA and the Master DSP.

The Master DSP sets all BOC and ROD registers and has two serial outputs, which are used to send the pixel module configurations. Additionally the serial ports are used to send triggers in calibration mode. In data-taking mode the trigger, Level1 ID, bunch-crossing ID and trigger type are received from the TIM. This information is processed by the Controller FPGA and the trigger signals that are required by the pixel modules are created.

#### Formatter FPGAs

Every ROD is equipped with eight Formatter FPGAs, which receive the 40 MHz event data streams from the BOC. In the pixel configuration the firmware provides four inlinks per Formatter FPGA (which will be of special interest for the eBOC, see section 5.2), out of the 12 available inlinks that are used in SCT mode. Since in 160 MBit/s mode, as used for B-Layer modules, the BOC produces four streams at 40 MBit/s per module, one Formatter can process the event data of exactly one module. This processing capability limits the number of connectable pixel modules depending on the used bandwidth.

Figure 3.28.: The Read Out Driver (ROD) as it is used for SCT and pixel detector.
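The resulting upper limits can be written down as a small Python calculation. It only uses the numbers given in the text (eight Formatters with four inlinks each, and one, two or four 40 MBit/s streams per module); the names are illustrative. The limits are upper bounds: as stated in section 3.5.2, fewer modules are actually connected (26, 13 and 6 or 7).

```python
FORMATTERS_PER_ROD = 8
INLINKS_PER_FORMATTER = 4  # pixel firmware; 12 inlinks are used in SCT mode

# Number of 40 MBit/s streams the BOC produces per module.
STREAMS_PER_MODULE = {40: 1, 80: 2, 160: 4}


def max_modules(bandwidth_mbit):
    """Upper limit of modules per ROD for a given module bandwidth."""
    inlinks = FORMATTERS_PER_ROD * INLINKS_PER_FORMATTER
    return inlinks // STREAMS_PER_MODULE[bandwidth_mbit]


for bandwidth in (40, 80, 160):
    print(bandwidth, "MBit/s:", max_modules(bandwidth), "modules")
```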

The conversion from serial to parallel format of the event data as well as the derandomizing of event fragments is also done by the Formatters. Additionally checks for data errors are provided and the "event complete" signal is transmitted to the Controller FPGA.

#### Event Fragment Builder FPGA

The Formatters' output is collected by the Event Fragment Builder FPGA (EFB). The Level1 ID and bunch crossing ID are checked and event header and trailer are generated. For the generation of header and trailer the Level1 ID and bunch crossing ID information is received from the Controller FPGA and compared to the information received from the Formatters. In case of errors, such as FE chip error flags or unphysical data, the EFB sets flags in the event header and trailer. Once an event has been processed and header and trailer have been generated, it is transmitted to the Router FPGA.

#### Router FPGA and Slave DSPs

All formatted event data is sent to the ROS via the S-Link and/or to the Slave DSPs by the Router FPGA. The received data is checked by four Slave DSPs and error counting, event capturing and histogramming are done. In calibration mode the Slave DSPs are used for histogramming and further data processing, e.g. fitting.

Since an intervention of the Master and Slave DSPs is not needed during data-taking mode after the data path in the ROD is set up, they can be used for event capturing and online monitoring. In the unlikely case that the ROD data output to the S-Link is too fast and buffers overflow, a "ROD BUSY" signal is generated and the transmission of triggers from the TIM is stopped. A more probable reason for the generation of a "ROD BUSY" signal is backpressure from the ROS, which can be caused e.g. by noisy pixel modules.

#### 3.6. ATLAS Upgrade Plans – The Insertable B-Layer

It is expected that the integrated radiation dose received by the pixel detector, and especially by the B-Layer, damages the sensor material. This leads to weak signals and eventually to a useless detector layer [25] after about four years of operation. Since the pixel detector is particularly important for b-tagging, it is desired to have the detector layers as close to the beampipe as possible. On top of that, a reliable event reconstruction is hard to realize with only two space points per particle. Thus it has been decided to insert a new B-Layer, the *Insertable B-Layer* (IBL), after around four years of operation. This new B-Layer will have a radius of about 37 mm. Since this is closer to the beam than the current B-Layer, which has a radius of 50 mm, the IBL project comes with new requirements for sensor and electronics concerning radiation hardness (see figure 3.29). The IBL has to withstand a fluence of about $3$ to $5 \times 10^{15}\,\mathrm{n_{eq}/cm^2}$, which also means that the read-out has to cope with higher hit rates.



Figure 3.29.: Expected particle fluence as a function of the detector radius, calculated for an integrated luminosity of  $2500 \text{ fb}^{-1}$  [18].

These requirements necessitate a redesign of the sensor and the Front End chip. The expected hit rates led to the decision to run the new FE chip using a 160 MHz clock (and thus 160 MBit/s read-out speed) for the uplink, instead of the current 40 MHz clock. Additionally, the use of 8b/10b encoding has been considered to provide a bit transmission error check combined with clock recovery. The downlink (TTC) stays at 40 MBit/s. Altogether this means that a new BOC as well as a new optoboard are needed for the IBL, especially to keep the changes on the ROD as small as possible. The development of the new optical data read-out chain will need some time and work. The full pixel module testing procedure requires data handling and processing by the ROD. Since this would mean that the main operation tests could only take place quite close to the installation date, the idea of finishing and upgrading a testing assembly came up. This assembly provides electrical communication between the ROD and the pixel modules, instead of optical communication. Thus it is easier to handle and the development time for an adjustment to the IBL requirements is shorter. The work that has been done on this electrical read-out chain is described in section 5.

## 4. Field Programmable Gate Arrays

#### 4.1. Introduction

Field Programmable Gate Arrays (FPGAs) are modern microelectronic devices which are (re-)programmable after assembly and (even) after integration into a digital circuit ("field-programmable"). The flexibility of FPGAs makes it unnecessary to engineer expensive single semiconductor devices for prototype or user-specific set-ups, especially during a development phase. The first FPGAs were developed by Xilinx about 20 years ago. In contrast to other digital devices they allow for parallel processing, which increases flexibility and speed for the solution of complex problems<sup>1</sup>.

#### 4.2. Programmable Logic Architecture

#### **4.2.1.** History

The first device designed specifically for the implementation of logic circuits was the (F)PLA<sup>2</sup> [26]. The first PLA was invented by Philips in the early 1970s. PLAs have two levels of logic gates, namely programmable "wired" AND-planes, followed by programmable "wired" OR-planes. In the AND-plane any of the inputs can be AND'ed together, so any output of an AND-plane can correspond to any logical product of the inputs. Likewise the logical sum of any AND-plane outputs can be produced using the OR-planes. These two features together made PLAs quite useful for implementing logical functions in sum-of-products form.
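The two-level structure of a PLA can be modelled in a few lines of Python. The sketch is purely behavioural: the AND-plane is represented as a list of product terms (each term maps an input index to the required logic level), and the OR-plane as a list of outputs that name the product terms they sum. This data layout and the function name `pla` are illustrative assumptions. As an example the planes are "programmed" as a half adder.

```python
from itertools import product


def pla(inputs, and_plane, or_plane):
    """Evaluate a sum-of-products circuit in the style of a PLA."""
    # AND-plane: a product term is 1 only if every connected input
    # has the required level.
    products = [
        int(all(inputs[index] == level for index, level in term.items()))
        for term in and_plane
    ]
    # OR-plane: an output is 1 if any of its product terms is 1.
    return [int(any(products[p] for p in terms)) for terms in or_plane]


AND_PLANE = [
    {0: 1, 1: 0},  # a AND (NOT b)
    {0: 0, 1: 1},  # (NOT a) AND b
    {0: 1, 1: 1},  # a AND b
]
OR_PLANE = [
    [0, 1],  # sum   = a XOR b
    [2],     # carry = a AND b
]

for a, b in product((0, 1), repeat=2):
    print(a, b, pla((a, b), AND_PLANE, OR_PLANE))
```

In a PAL, by contrast, only `AND_PLANE` would be programmable while the assignment of product terms to outputs in `OR_PLANE` would be fixed by the device.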

Programmable Logic Arrays suffered from two disadvantages: they were quite expensive and, due to the two-level logic design, too slow. PALs<sup>3</sup> were introduced to counter these problems. PALs have only one level of programmability, namely wired AND-planes that feed fixed OR-gates, as can be seen in figure 4.1. Many different PAL types were produced to compensate for the lack of generality caused by the fixed OR-planes. Many PALs have flip-flops connected to the OR-gate outputs to offer the possibility of designing sequential circuits. This invention was a big step towards modern digital hardware design, since costs decreased and pin-to-pin speed performance increased. PLAs, PALs and related devices are called SPLDs<sup>4</sup>.

Devices with higher capacity were developed as technology and knowledge advanced. With the higher capacity and the growing number of inputs, the problem of fast-growing logic-plane sizes arises for strict SPLD architectures. One solution for this problem was the integration of multiple SPLDs on one chip, together with programmable interconnects between the blocks. Several modern field programmable devices are based on this technique and are grouped under the name CPLDs<sup>5</sup>. These devices provide the equivalent logic capacity of up to 50 SPLDs. But at this point the realizable complexity of CPLDs reaches a limit. Since higher logic capacity was desired, another approach was needed.

<sup>1</sup>Remark: the biggest current FPGA-based task is the recently solved "26-queens problem" of chess players, see: http://queens.inf.tu-dresden.de

<sup>2</sup>(Field) Programmable Logic Array

<sup>3</sup>Programmable Array Logic devices

<sup>4</sup>Simple Programmable Logic Devices

<sup>5</sup>Complex Programmable Logic Devices

Figure 4.1.: Structure of a six-stage Programmable Array Logic device [26].

#### 4.2.2. Modern FPGAs

The next step towards current FPGAs was the invention of MPGAs<sup>6</sup>. As the name says, these devices consist of an array of transistors that are connected with custom wires. The disadvantages of MPGAs are a long production time and the fact that the device's design can (almost) not be changed after the first assembly. This led to the invention of FPGAs, which consist of an array of unconnected logic blocks, where every block contains a LUT<sup>7</sup> and a flip-flop. These blocks can be connected via interconnection resources that are set by the user. This is mainly realized using either SRAM or antifuse technology<sup>8</sup>. The architecture of a typical modern FPGA can be seen in figure 4.2.

An overview of the logic capabilities of the three presented FPD<sup>9</sup>-types is given in figure 4.3, where *equivalent gates* corresponds to the number of 2-input NAND gates. This chart is to be seen as an indicator for the steps in logic device development. Since some devices are optimized for special requirements, a device with a higher number of *equivalent gates* in this chart is not necessarily a better choice for a design than a device with fewer *equivalent gates*.

#### 4.2.3. The Altera Flex FPGA Family

As an example of modern FPGAs a more detailed description of the Altera Flex8000 FPGA is given. This device is similar to the Flex6000 FPGAs used on eBOC and PP0-2, which are part of the electrical data read-out chain of the ATLAS pixel detector (see chapter 5) and are therefore of special interest. Due to its higher number of logic gates, however, the label "modern" is more appropriate for the Flex8000.

<sup>6</sup>Mask Programmable Gate Arrays

<sup>7</sup>Look Up Tables

<sup>8</sup>for details, see [26]

<sup>9</sup>Field Programmable Device



Figure 4.2.: Example for the modern FPGA structure, which offers a two-dimensional connection potential of the logic blocks and thus much more flexibility and speed than the one-dimensional PALs [26].



Figure 4.3.: Logic capacities of different FPD-types, where the x-axis can be seen as a time axis that is not to scale [26].

The Flex8000 series is set up in a three-level hierarchy. The lowest level consists of a set of LUTs. A LUT is a one bit wide memory array. The inputs of the logic block are the address lines for the memory and the LUT output is the one bit output from the memory. This means a LUT with K inputs corresponds to a $2^K \times 1$ bit memory. It can realize any logic function of the K inputs by programming the logic function's truth table directly into the memory. A classical example for the usage of LUTs is the trigonometric table, where e.g. several sine values are calculated at the beginning of a process. Because this calculation may take a lot of time and the whole function would otherwise have to be evaluated for every sine calculation, the argument is instead rounded to the nearest value already available in the LUT. This procedure shortens operations considerably, since values just have to be looked up instead of being calculated.
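The equivalence between a K-input LUT and a $2^K \times 1$ bit memory can be made concrete with a short Python model. The class name `LUT` and the convention that the first input is the most significant address bit are illustrative assumptions; a real device fills the memory from the configuration bitstream rather than from a function.

```python
from itertools import product


class LUT:
    """Behavioural model of a K-input look-up table."""

    def __init__(self, k, func):
        self.k = k
        # "Program" the truth table: one memory bit per input combination,
        # stored in ascending address order (2**k entries in total).
        self.memory = [int(bool(func(*bits)))
                       for bits in product((0, 1), repeat=k)]

    def read(self, *inputs):
        if len(inputs) != self.k:
            raise ValueError("wrong number of inputs")
        # The inputs act as address lines of the memory.
        address = 0
        for bit in inputs:
            address = (address << 1) | bit
        return self.memory[address]


# A four-input LUT, as in the Flex8000, programmed as a parity function.
parity = LUT(4, lambda a, b, c, d: a ^ b ^ c ^ d)
print(len(parity.memory))       # 16 memory bits for K = 4
print(parity.read(1, 0, 1, 1))  # 1, since three inputs are set
```

Replacing the lambda by any other function of four bits changes only the memory content, not the structure, which is why a LUT can realize any logic function of its inputs.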

The basic logic block of the Flex8000 is a four-input LUT. One LUT, a flip-flop and special-purpose carry circuitry for arithmetic circuits are grouped together into a so-called Logic Element (LE). To support the implementation of wide<sup>10</sup> AND functions, cascade circuitry is also included in every LE. A logic element is illustrated in detail in figure 4.4.



Figure 4.4.: The basic block of an Altera Flex8000 FPGA, a Logic Element [27].

In the Flex8000 design eight LEs are grouped into a LAB<sup>11</sup>. A LAB contains local interconnect, where every local wire can connect any LE to another LE in the same LAB, see figure 4.5. In addition, every local interconnect is connected to the so-called FastTrack, which is the global interconnect of the Altera Flex series. A FastTrack wire stretches across the full width or height of the device. The full architecture of an Altera Flex8000 FPGA, containing the previously described elements, can be seen in figure 4.6.

Most available modern FPGAs can be reprogrammed several times. Their configuration is saved in an integrated memory, which can be SRAM<sup>12</sup>, EEPROM<sup>13</sup>, PROM<sup>14</sup> or Flash memory.

Usually the device's functionality can be changed during run-time by loading new configurations to the FPGA's memory. Some state-of-the-art FPGAs, such as the Xilinx Virtex-family, even offer the possibility to reprogram them partially. In this case the developer can change a part of the configuration, while the rest of the FPGA is still working. The configuration file itself is a binary file, produced by a synthesis tool, such as Altera's Quartus II [27]. These synthesis tools translate a human readable language like VHDL<sup>15</sup> or Verilog into binary files and give the opportunity to verify the design's behaviour.

<sup>10</sup>large number of inputs

<sup>11</sup>Logic Array Block

<sup>12</sup>Static Random Access Memory

<sup>13</sup>Electrically Erasable Programmable Read-Only Memory

<sup>14</sup>Programmable Read-Only Memory

<sup>15</sup>Very High Speed Integrated Circuit Hardware Description Language, see next section



Figure 4.5.: The next bigger unit of the Altera Flex8000 FPGA, the Logic Array Block. A LAB contains several Logic Elements [26].

## 4.3. Very High Speed Integrated Circuit Hardware Description Language

#### 4.3.1. Introduction

Design tools that were used in the past to configure PLDs<sup>16</sup> no longer fulfil today's requirements, for example the support for the two-dimensional logic block connection or predefined standard functions. Furthermore, powerful design tools and a standardization of hardware description languages for modern devices, such as CPLDs<sup>17</sup> and FPGAs, are needed. The US "Department of Defense" initiated the first version of VHDL in 1983. As a result of long discussions the Very High Speed Integrated Circuit Hardware Description Language was established in 1987 as industry standard IEEE-1076 [28] and updated in 1993 and 2002. In recent years there has been a large increase in design complexity and a decrease in available design time. As an example, one could think of mobile phones, which are typically operated by FPGAs. Thus, old-fashioned approaches, like creating schematics based on single logic cells, are out of date, especially since more than one million logic gates are now available on a single FPGA. Modern synthesis tools translate the coded circuit behaviour into schematics and binary files and offer the possibility of timing and behavioural simulations. Since these tools are provided by the FPGA vendors, they can be customized for the used device family and therefore provide an almost complete simulation coverage of the design's behaviour.

<sup>16</sup>Programmable Logic Devices

<sup>17</sup>Complex Programmable Logic Devices



Figure 4.6.: Zoom-in to the Altera Flex8000 Architecture, which shows the same basic structure as shown in figure 4.2. Every LAB is connected to the horizontal and vertical FastTrack interconnects [27].

#### 4.3.2. Language Characteristics

The language VHDL is used for description and simulation of digital systems and their environment [29]. The language covers all descriptions of the circuitry that are needed during the design phase, and a simulation of the design is possible at any time. The main concepts VHDL is based on are:

#### • Hierarchy:

The design can be split into several (sub-)components, whereupon the whole project is made up by the synthesis tool in hierarchical order out of these components. A component can be everything, from a simple logic gate up to a micro controller or processor core. This concept has been used for the eBOC FPGA code (see chapter 5), to have different data-flow directions separated.

#### • Models:

There are several ways to describe design units (called entities). One way is to specify the behaviour of an entity. This means an algorithmic description using the resources of a high-level programming language, with which both sequential and parallel processes can be modelled. Another approach is to depict the structure, where a connection scheme is directly implemented and components are connected in a hierarchy. A third way is the description of the data flow. Data flow modelling is a mixture of behavioural and structural description, in which the construction of data paths is depicted. The operations on the data exist as elementary functions.

#### • Data conservation:

The library concept of VHDL is similar to the concept of many other programming languages. It supports recycling of existing designs, embedding of manufacturer-specific libraries and access of working groups to common databases. Data conservation is useful for nearly every design, because standard functions are available in several open access libraries.

#### • Alternatives:

Almost every application can be realized by a huge number of different possible designs. Usually they mainly differ in the level of abstraction and, since synthesis tools support a mixture here, even in the used Hardware Description Language (e.g. VHDL, Verilog HDL). Part of the FPGA design that was done in the context of this thesis has been realized in graphics mode, while the main part is coded in text mode<sup>18</sup>.

New designs in VHDL are realized in a top-down process, which means that first a behavioural characterization of the design is coded and its functionality is simulated. The provided synthesis tools support the top-down method (see figure 4.7), since they offer a huge library of commands just for coding the behaviour of a design. These commands compile only for timing and behavioural simulations and have to be replaced during the top-down process by elements of synthesizable cell libraries or manufacturer-specific libraries. Synthesizable code, however, can always be used for simulation. Because the shifting of data signals by a constant time value is essential for the eBOC described later, a D-FlipFlop has been chosen as code example (see figure 4.8). After verification of the chosen design it is split into several functional blocks to gain a structural description.



Figure 4.7.: Design flow of a top-down process. Beginning with the desired structure (here: XOR), continuing with the behaviour of the structure and finally the coded design.

The code example in figure 4.8 shows an important quality of VHDL. The language offers explicit timing commands, which are especially needed because all processes on FPGAs are executed in parallel. Timing commands have to be used to avoid asynchronous processes or to achieve sequential processes. For simulation purposes absolute timing commands as in code

<sup>18</sup>see figure 4.9

#### (1) D-FlipFlop, behavioral, not synthesizable

```
entity DFlipFlop is
    port(D, clk: in bit;
         Q: out bit);
end DFlipFlop;

architecture Behavior of DFlipFlop is
    constant T_clk_Q: time := 3.54 ns;  -- absolute clock-to-Q delay, simulation only
begin
    process
    begin
        wait until clk'event and clk'last_value='0' and clk='1';  -- rising clock edge
        Q<=D after T_clk_Q;  -- D appears at Q after the fixed delay
    end process;
end Behavior;
```

#### (2) D-FlipFlop, behavioral, synthesizable

```
entity DFlipFlop is
    port(D, clk, nResetAsync: in bit;
         Q: out bit);
end DFlipFlop;

architecture Behavior of DFlipFlop is
begin
    process(clk, nResetAsync)
    begin
        if nResetAsync='0' then            -- asynchronous, active-low reset
            Q<='0';
        elsif clk'event and clk='1' then   -- rising clock edge
            Q<=D;
        end if;
    end process;
end Behavior;
```

#### (3) D-FlipFlop, manufacturer's architecture

```
library VendorLib;
architecture Vendor of DFlipFlop is
    component Dff port(D, clk: in bit; Qout: out bit); end component;
    -- configuration specifications belong in the declarative part
    for all: Dff use entity VendorLib.Component.Dff;
begin
    ff1: Dff port map (D=>D, clk=>clk, Qout=>Q);
end Vendor;
```

Figure 4.8.: A D-Flip-Flop in a top-down process, where in the first attempt (1) the Flip-Flop is coded directly as a behavioural description, in step (2) as a behavioural description that is synthesizable and in step (3) as the final completed design.

example (1), line 7 are allowed ("3.54 ns"). Simulation is always the first step in VHDL programming, because only the circuit's behaviour is coded. Neither FPGA-specific commands nor consideration of the available logic capacity is needed. For the final synthesizable code absolute timing commands are not allowed. It can only contain timing commands that are relative to the FPGA clock, as in code example (2), line 12.

The first two sections (number (1) and (2)) of the code example in figure 4.8 each show two blocks of code. The first is the *entity*, which stands for the black box or the outside view of a design, since it only contains the ports and the information on whether they are inputs, outputs or both. This step of "outside view programming" can also be done in a graphical way, which is illustrated in figure 4.9. Everything declared in the entity section can be used in the whole design project. The second block, called *architecture*, contains the design implementation. As the name suggests, the architecture comprises the plan of what happens inside the "black box". The third section of this code example shows that for standard functions like a flip-flop every hardware manufacturer offers commands that are optimized in code length and speed.



Figure 4.9.: The design's outside view can also be coded in a graphics mode, here, as an example, part of the eBOC's FPGA code (see chapter 5).

# 5. Detector Read Out for Multi-Module Testsystems

#### 5.1. Introduction

As seen in the previous section, the optical data transmission needs a lot of tuning to work reliably. Even during operation one has to monitor, and if necessary re-tune, several voltages that are needed to run the components for the data transmission. Since optical data transmission is not needed for testing purposes, the idea arose to develop a system that replaces all optical components and transmits the signals electrically. In testing mode the (optical) Back of Crate Card (see chapter 3.5.2) can be replaced by the electrical Back of Crate Card (eBOC). Like the optical BOC, every eBOC is connected one-to-one to a ROD¹. On the one hand it is responsible for sending clock and command signals to the connected pixel modules; on the other hand it receives the modules' data and transmits these signals to the ROD. Since the signals are transferred electrically instead of optically, a replacement for the optoboard is used, the PP0-2 card (section 5.2.1).

An overview of the electrical read-out chain is given in figure 5.1.



Figure 5.1.: The electrical read-out chain as it can be used for testing of pixel modules.

#### 5.2. The Electrical Back of Crate Card

Depending on the modules' read-out speed, one eBOC can by design handle up to 7 modules at 160 MBit/s, up to 14 modules at 80 MBit/s or up to 28 modules at 40 MBit/s. The main components of the eBOC are differential signal drivers, a clock section and an Altera Flex6000 FPGA (see figure 5.2). On the right side of the upper half the signal drivers responsible

<sup>1</sup>Read Out Driver, see previous chapter

for direct transmission of clock and command signals to the pixel modules are placed. In the lower half the modules' data streams are received. Depending on the read-out speed one or two data lines per module are used. These data streams are routed to the FPGA, where they can be split, re-routed and sent to the ROD for further processing. The first eBOC design was realized at Lawrence Berkeley National Laboratory, California.
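The module limits quoted above follow from simple link arithmetic, which the following minimal Python sketch illustrates. It assumes that the eBOC offers 28 links towards the ROD (inferred here from the 28 modules supported at 40 MBit/s) and that every link carries 40 MBit/s, as required by the ROD (see section 5.2.3). The names `max_modules`, `ROD_LINK_RATE_MBIT` and `EBOC_LINKS_TO_ROD` are purely illustrative and do not appear in the eBOC firmware.

```python
ROD_LINK_RATE_MBIT = 40  # the ROD accepts 40 MBit/s per data line
EBOC_LINKS_TO_ROD = 28   # assumption: inferred from 28 modules at 40 MBit/s


def max_modules(module_rate_mbit: int) -> int:
    """Return how many modules one eBOC can serve at the given read-out speed."""
    if module_rate_mbit <= 0 or module_rate_mbit % ROD_LINK_RATE_MBIT:
        raise ValueError("rate must be a positive multiple of 40 MBit/s")
    # Each module occupies one 40 MBit/s link per 40 MBit/s of bandwidth.
    links_per_module = module_rate_mbit // ROD_LINK_RATE_MBIT
    return EBOC_LINKS_TO_ROD // links_per_module


for rate in (40, 80, 160):
    print(rate, "MBit/s ->", max_modules(rate), "modules")
```

Under these assumptions the sketch yields 28, 14 and 7 modules for the three speed modes.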



Figure 5.2.: Functional sections of the electrical BOC.

For pixel module testing purposes the optical BOC is too sensitive to changes of its environment, and optical data transmission is not required because the distances between modules and crate are only of the order of meters. The electrical read-out chain is more robust against e.g. changes of the ambient temperature and easier to set up than the optical system. Since a "revision 0" eBOC design had been realized for testing of the current pixel modules, the idea was to improve the eBOC design to provide a module testing assembly for the new IBL modules (section 3.6). The following three steps were therefore carried out during this thesis. Firstly, an eBOC had to be assembled and a "revision 1" eBOC operating stably at 40 MBit/s had to be set up (see chapter 5.2.2). Secondly, the operation at 80 MBit/s and a possibility to switch between the two read-out speeds were implemented. Finally, the operation at 160 MBit/s, which corresponds to $2 \times 80$ MBit/s, had to be provided by the eBOC.

#### 5.2.1. The PP0-2 Card

In this section, the last missing part of the electrical read-out chain is introduced, the PP0-2 card (see figure 5.3). The PP0-2 card is only used together with the eBOC and replaces the optoboard. Thus the same PP0 card that is used for the optical read-out chain is connected to every PP0-2. As on the eBOC, one half of the PP0-2 card is used to pass clock and command signals through to the modules and mainly consists of LVDS receivers and drivers. In figure 5.3 it is the lower half of the card. The clock and command signals are sent from the eBOC to the PP0-2 via ribbon cables. The upper half of the PP0-2 card is used to send the module data to the eBOC. Its main parts are also LVDS drivers and receivers and additionally one FPGA for data routing.

On the backside of the card four D-sub connectors are mounted to connect the PP0-2 via ribbon cables to the eBOC. The four connectors for connecting PP0 cards are also mounted on the backside. As on the eBOC, an Altera Flex6000 FPGA is mounted in the centre of this card. The difference is that the FPGA only handles the data signals coming from the modules, but not the signals to the modules. In $1 \times 40$ MBit/s mode and $1 \times 80$ MBit/s mode the FPGA just forwards the modules' data signals to the eBOC. This was the PP0-2 FPGA configuration before the upgrade was realized.

Since a second data line per module is used for $2 \times 40$ MBit/s and $2 \times 80$ MBit/s, the PP0-2 has to switch its mode. Because only 28 data-out lines are available on the PP0-2, the data lines are now rerouted by the PP0-2's FPGA as shown in figure 5.4, i.e. the FPGA switches between the forwarded data lines. In single line mode the FPGA input lines "DataIn\_0" to "DataIn\_27" are transmitted to the eBOC; in double line mode the FPGA inputs "DataIn\_0" to "DataIn\_6" and "DataIn\_2\_0" to "DataIn\_2\_6" are sent to the eBOC. Thus the maximum number of seven modules that the ROD can handle in $2 \times 80$ MBit/s mode is supported. Since the PP0-2 card has 28 output lines and the ROD could in principle handle twice the number of streams at $2 \times 40$ MBit/s, the FPGA configuration in double line mode supports a routing of the inputs "DataIn\_0" to "DataIn\_13" and "DataIn\_2\_0" to "DataIn\_2\_13". The desired operation mode of the PP0-2 card is set by using the switches in the mode selection section of the eBOC. The corresponding signals are transmitted via the ribbon cables to the PP0-2.
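The line selection described above can be modelled in a few lines of Python. This is only a sketch of which FPGA inputs are forwarded in each mode; the function name `forwarded_inputs` is hypothetical, and the order in which the inputs are placed on the 28 output lines is an assumption made for illustration (the actual assignment is the one drawn in figure 5.4).

```python
from typing import List

PP02_OUTPUT_LINES = 28  # data-out lines available on the PP0-2


def forwarded_inputs(double_line_mode: bool) -> List[str]:
    """Return the FPGA inputs that are forwarded to the eBOC."""
    if not double_line_mode:
        # Single line mode: one data line for each of up to 28 modules.
        return [f"DataIn_{i}" for i in range(PP02_OUTPUT_LINES)]
    # Double line mode: first and second data line of up to 14 modules.
    first_lines = [f"DataIn_{i}" for i in range(14)]
    second_lines = [f"DataIn_2_{i}" for i in range(14)]
    return first_lines + second_lines  # output order is an assumption


single = forwarded_inputs(double_line_mode=False)
double = forwarded_inputs(double_line_mode=True)
print(len(single), len(double))
```

In both modes exactly 28 inputs are selected, so the fixed number of output lines is never exceeded.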

In case the double line mode is selected, the PP0-2 card in its current design only supports one PP0, since only for one PP0-slot the second data lines, that are required for this mode, are connected. Because on this slot the data lines are twisted, a specific card ("PP0\*") that is just usable for this slot has been designed. It supports two pixel modules at all speed modes and is shown in figure 5.5. On the left side the supply voltages are connected, the high voltage for depletion of the sensor can be applied via Lemo connectors, the module temperatures can be read out via the NTCs and on the right side a ribbon cable is used as connection between this card and the PP0-2. The connectors for the modules are on the backside.

In principle the PP0-2's functionalities could be implemented in the eBOC design. The big advantage of having two separate boards, however, is that in test beam operation only the pixel modules and the PP0-2 are exposed to radiation, but not the other parts of the read-out chain. In particular, the eBOC and the ROD can be kept away from the radiation.

#### 5.2.2. Functional Sections of the eBOC and "step 1"

Similar to the optical BOC, the eBOC has a clock section, transmission section and receiver section. Since the eBOC is only used for testing and calibration, it has no S-Link. Figure 5.2 shows the functional sections of the eBOC.



Figure 5.3.: The PP0-2 card, which replaces the optoboard in an electrical read-out chain.

In the clock section a 40 MHz clock is generated by a crystal oscillator. This clock signal is transmitted to the ROD on the one hand and to the modules on the other hand. In contrast to the optical BOC, the clock signal is additionally sent on a clock data line to the PP0-2 and passed back from there to the eBOC. Hence the clock signal travels through almost the same cable length as the data signals, and therefore both signals are always synchronized, independent of the cable length between eBOC and PP0-2. This returned clock is sent to the FPGA that handles all data on the eBOC. Thus the received module data can be decoded (if necessary) by using the return clock. Altogether, the advantage of using the return clock is that the data processing is not disturbed by signal delays, because the clock is always as "late" as the received data.

The mode selection offers easy switching of speed modes and signal phase settings. In the current setting, only switching between two speed modes (e.g. 40 MBit/s and 80 MBit/s) is supported. The number of switches would allow the selection between all three desired speed modes, but since the 160 MBit/s mode is not realizable with the current eBOC the two possible settings suffice. The selected modes are shown by LEDs<sup>2</sup> in the mode indicator section.

The LVDS<sup>3</sup> drivers receive the module configuration data, trigger signals and the clock from

<sup>2</sup>Light Emitting Diode

<sup>3</sup>Low Voltage Differential Signalling



Figure 5.4.: PP0-2 FPGA data routing, left for  $1 \times 40$  MBit/s and  $1 \times 80$  MBit/s mode and right for  $2 \times 40$  MBit/s and  $2 \times 80$  MBit/s mode.

the FPGA. Since the FPGA requires CMOS<sup>4</sup> signals, the LVDS drivers convert the CMOS signals into LVDS signals. This is done to reduce the noise sensitivity of the signals and to provide a cable length between crate and modules in the order of some meters. The LVDS receivers convert the received data and clock signals into CMOS signals for the FPGA.

In the central region of the eBOC its "heart" is placed, the FPGA. It handles all data streams from the ROD to the modules, from the modules to the ROD, and all clock signals. The module configuration signals can be routed to every single command channel out of the 28 available channels. On the other hand the received data streams can be demultiplexed, delayed, compared or routed to every available output channel. The routing of the output channels is a direct mapping of the signals to different formatters and formatter channels of the ROD. The speed mode selection signal, that can be set in the mode selection section, is also directly fed into the FPGA. By switching the speed mode, the user switches between two different configurations of the FPGA (see subsection 5.2.3).

Since the used FPGA has no memory to store a configuration file, a PROM<sup>5</sup> is placed next

<sup>4</sup>Complementary Metal-Oxide-Semiconductor

<sup>5</sup>Programmable Read-Only Memory



Figure 5.5.: PP0-like card, supporting two pixel modules at all speed modes from  $1\times40~\mathrm{MBit/s}$  to  $2\times80~\mathrm{MBit/s}$ , named "PP0\*".

to it. On every start-up of the system, the FPGA is configured by the PROM and keeps this configuration until a reset signal is sent to the FPGA or the system is powered down.

Step 1: The eBOC in its original state showed some connection problems with the ROD. These problems were caused by a too weak clock signal ("ROD clock") that is sent back to the ROD for synchronization. Without receiving a "ROD clock" signal, no data processing can be done on the ROD and thus the whole read-out chain does not work. This problem can be solved by using a PECL<sup>6</sup> signal driver for the clock signal instead of the originally intended LVDS<sup>7</sup> driver. PECL is also a differential signalling standard, but has a higher differential output voltage of about 1 V. This signal strength is required by the ROD and is about a factor of three higher than the LVDS output voltage. The additionally required PECL driver is the reason for the yellow wires crossing the eBOC.

Another required fix concerns the clock distributor. The operating frequency of the current read-out system is 40 MHz, which is provided by the mounted crystal. This clock has to be multiplexed (like on the optical BOC) for the modules and the ROD. In its default state the clock distributor used here divides the input clock frequency of 40 MHz by a factor of two, which leads to synchronization problems between eBOC and ROD. This problem can be solved by changing the configuration of the clock distributor.

After fixing these (and some smaller) problems the eBOC could be operated reliably at 40 MBit/s. In this mode, all signals are just forwarded by the FPGA without modifications or data line twists.

#### 5.2.3. Operation at 80 MBit/s – "step 2"

The main work of the eBOC FPGA begins when data signals are sent at 80 MBit/s per data line. Since the ROD only accepts data at 40 MBit/s per data line, the eBOC has to demultiplex the signals. The FPGA has to decode the 80 MBit/s signal into two data streams, each at 40

<sup>6</sup>Pseudo Emitter-Coupled Logic

<sup>7</sup>Low Voltage Differential Signalling

MBit/s, with respect to the ROD's requirement that signals have to arrive on the formatter FPGAs in 25 ns intervals. Additionally the signals have to be clocked by the ROD clock, which has to be synchronized to the return clock. Otherwise the received data could be mismatched to a wrong event or the ROD simply might be unable to process the data.

Since the signal encoding takes place every 12.5 ns, the "second" signal has to be delayed by another 12.5 ns to arrive on time. For 160 MBit/s read-out speed this procedure has to take place twice simultaneously per module. This feature has been implemented by upgrading and extending the existing FPGA configuration. If the speed mode switch of the eBOC is set to 40 MBit/s, the FPGA forwards the module configuration without any modification and also transmits the received module data unmodified. The module data is sent to the VME connector such that always four out of the twelve inlinks of each ROD formatter receive data. In the 80 MBit/s configuration the FPGA processes data of up to 14 modules, where each stream has to be demultiplexed into two streams at 40 MBit/s each. Since the same procedure is needed for the optical BOC at 80 MBit/s read-out speed, the same decoding scheme as shown in figure 3.24 is used.

The FPGA of the eBOC receives one clock signal from the clock section, which is synchronized to the return clock, and creates the inverted clock ("Clock-B"). At every rising edge of one of the two clocks the received 80 MBit/s signal is decoded and written alternately into the two resulting data streams that are sent to the ROD. To ensure that both data streams are synchronized to a 40 MHz clock upon arrival at the ROD, one of the two streams is delayed by 12.5 ns using a latch that is implemented in the FPGA configuration. The delayed stream is clocked out with the clock of the first stream ("Clock-V"). These two streams have to arrive on two neighbouring inlinks of the ROD formatters and have to be synchronized to a 40 MHz clock, otherwise the ROD cannot process the data. Thus the outlink routing of the FPGA is completely different compared to the 40 MBit/s mode.
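The demultiplexing scheme can be illustrated with a small software model. The following Python sketch is not the FPGA configuration itself but a simplified timing model: bits of the 80 MBit/s stream are assumed to be sampled every 12.5 ns, alternately on the rising edges of the clock and of the inverted clock, and the stream sampled on the inverted clock is assumed to be the one that is delayed by 12.5 ns. The function name `demultiplex` and the representation of a stream as a list of (time, bit) pairs are illustrative choices.

```python
from typing import List, Tuple

BIT_LENGTH_80_NS = 12.5  # length of one bit at 80 MBit/s
BIT_LENGTH_40_NS = 25.0  # length of one bit at 40 MBit/s

Stream = List[Tuple[float, int]]  # (output time in ns, bit value)


def demultiplex(bits: List[int]) -> Tuple[Stream, Stream]:
    """Split an 80 MBit/s bit sequence into two 40 MBit/s streams."""
    first: Stream = []
    second: Stream = []
    for index, bit in enumerate(bits):
        sample_time = index * BIT_LENGTH_80_NS
        if index % 2 == 0:
            # Sampled on the rising edge of the clock, passed on directly.
            first.append((sample_time, bit))
        else:
            # Sampled on the rising edge of the inverted clock and delayed
            # by 12.5 ns, so that it leaves on the edge of the first clock.
            second.append((sample_time + BIT_LENGTH_80_NS, bit))
    return first, second


first, second = demultiplex([1, 1, 1, 0, 1, 0])
print(first)   # bits 0, 2, 4 of the input
print(second)  # bits 1, 3, 5 of the input, delayed
```

In this model every output bit of both streams changes on a multiple of 25 ns, i.e. both streams are aligned to the same 40 MHz clock, which is the condition the ROD formatters impose.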

During testing it turned out that a mismatch of the outlink mapping between optical BOC and eBOC exists. This results in a maximum number of eight modules that can be handled at 80 MBit/s. More details are explained in the next subsection.

#### 5.2.4. Operation at 160 MBit/s – "step 3"

To serve as a testing and development environment for IBL modules, the read-out chain has to support a read-out speed of 160 MBit/s (see chapter 3.6). The FPGA configuration that has been realized for operation at 80 MBit/s already contains all decoding capabilities that are needed for 160 MBit/s operation. The difference on the PP0-2 card side is that it has to transmit the data that is sent on the second data line per module to the eBOC. On the eBOC itself the demultiplexing procedure for each of the two 80 MBit/s streams per module is exactly the same as in the previous step. The difference is that another link mapping for the signals passed to the ROD is required. The ROD expects all four data streams that are received per module on one formatter. Otherwise the four data streams cannot be combined by the ROD as data from only one module.

At this point it turned out that without hardware modifications the current set-up of ROD and eBOC cannot be used for module testing at 160 MBit/s. As previously mentioned, every formatter of the ROD has twelve inlinks. The optical BOC sends its data to the first four inlinks of each formatter. By default the eBOC sends the data to the formatter inlinks three to six, see figure 5.6.

This mismatch is the hardware reason why the 160 MBit/s mode cannot be realized with the current set-up, since the ROD by default reads formatter inlinks 1 to 4. In the ROD design



Figure 5.6.: Different link assignments of the optical BOC and the eBOC. Per default the optical BOC sends the data to inlinks 1 to 4, while the eBOC sends it to inlinks 3 to 6.

a multiplexer for the formatter inlinks has been implemented, but during the system tests for the current pixel detector it turned out that the multiplexer does not work as expected [30]. It stays in its default state and thus only the first four inlinks can be used. This means that the overlap of formatter inlinks used by the eBOC and formatter inlinks supported by the ROD is only two per formatter. Because the 160 MBit/s mode requires four inlinks on one formatter and a replacement of the multiplexers on all existing RODs is almost impossible, the highest bandwidth cannot be realized without a redesign of the eBOC.
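The consequence of the link mismatch can be checked with a short calculation. The Python sketch below uses the inlink numbers given in the text (1 to 4 read by the ROD, 3 to 6 driven by the eBOC) and the ROD's rate of 40 MBit/s per inlink; the names `usable_inlinks` and `mode_is_usable` are hypothetical helpers introduced only for this illustration.

```python
ROD_READ_INLINKS = {1, 2, 3, 4}  # inlinks read by the ROD by default
EBOC_INLINKS = {3, 4, 5, 6}      # inlinks driven by the eBOC by default
LINK_RATE_MBIT = 40              # rate accepted by the ROD per inlink

# Inlinks of one formatter that are both driven by the eBOC and read by the ROD.
usable_inlinks = sorted(ROD_READ_INLINKS & EBOC_INLINKS)


def mode_is_usable(module_rate_mbit: int) -> bool:
    """Check whether one module at this rate fits on the shared inlinks."""
    links_needed = module_rate_mbit // LINK_RATE_MBIT
    return links_needed <= len(usable_inlinks)


print("usable inlinks per formatter:", usable_inlinks)
for rate in (40, 80, 160):
    print(rate, "MBit/s usable:", mode_is_usable(rate))
```

With only two shared inlinks per formatter, one module at 80 MBit/s fits on a formatter, whereas the four inlinks needed for 160 MBit/s are not available.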

In summary, the PP0-2 and the eBOC on the one side and the ROD on the other side support the desired read-out speed. The reason why no bit error measurement (see 5.3.3) has been done at 160 MBit/s is a communication problem between the two cards. To solve this problem a redesign of the eBOC should include an outlink mapping that is the same as on the optical BOC. Since tests at 160 MBit/s could not be performed, the following measurements to prove the functionality of the eBOC are all done at 80 MBit/s. For the proof of reliability at the faster speed mode this is no problem, because (as explained above) the decoding routine of the FPGA is the same at 80 MBit/s and 160 MBit/s.<sup>8</sup>

<sup>8</sup>The communication problems between the eBOC and ROD have been solved after completion of this thesis. The read-out is now also possible at $2 \times 80$ MBit/s.

#### 5.3. Measurements

As explained in the previous section, the measurements to prove the new functionalities of the eBOC have been done at 80 MBit/s read-out speed. Some of the measurements have also been done at 40 MBit/s to be sure that the switching between the two speed modes is working reliably and to illustrate the previously described demultiplexing procedure using real data. Therefore the read-out chain was set up as shown in figure 5.1 with one pixel module connected.

#### 5.3.1. Trigger

One of the most important signals needed to realize full scans is the trigger signal. It can be initiated using existing software ("STcontrol"). The expected answer of the module is a bitstream that begins with the standard event header "11101", followed by a bandwidth-specific bitstream containing e.g. an event-ID number. Since the eBOC has test points on its data lines, the bitstreams that are sent to the FPGA and the bitstreams that are sent from the FPGA to the ROD can be visualized with an oscilloscope. Figures 5.7, 5.8 and 5.9 show screenshots of the bitstreams that are sent to the eBOC after a trigger has been issued, first at 40 MBit/s and then at 80 MBit/s.



Figure 5.7.: Trigger-response sent by the module to the eBOC at 40 MBit/s. The purple line is the bitstream input to the FPGA, the blue line and the yellow line are the two neighbouring output lines that are connected to one ROD formatter. Both of them are used at 80 MBit/s and one is used at 40 MBit/s. Here the length of one bit is 25 ns.

The screenshots show exactly the desired shape. At 40 MBit/s the FPGA input can be found on one data out line, just shifted by the time the signal needs to pass through the FPGA. The bitstream that is sent at 80 MBit/s to the FPGA is demultiplexed alternately to



Figure 5.8.: Trigger-response sent by the module to the eBOC at 80 MBit/s. The colour code of the lines is the same as above, but the length of one bit at the FPGA input line is 12.5 ns, at the two FPGA outputs it is 25 ns.



Figure 5.9.: Trigger-response sent by the module to the eBOC at 80 MBit/s - zoom-in.

the two neighbouring data out streams, where the length of one bit is extended from 12.5 ns at the input to 25 ns at the output. This measurement has been repeated 30 times and always showed the desired result. The results of a bit error test are shown in table 5.1.

#### 5.3.2. Digital Scan

After a successful measurement of trigger signals, the functionality of the read-out chain has been tested by processing a digital scan (see below) of the pixel module. The bitstream resulting from this scan is many orders of magnitude longer than the bitstream resulting from a trigger signal. Thus it is a good indicator for the reliability of the eBOC system.

A digital scan (or "digital test") tests the functionality of the digital read-out part of the pixel module. The strobe signal is sent directly to the output of the discriminator with which every pixel cell on the FE-chip is equipped (see chapter 3.4.2). Thus, as the name indicates, only the digital part of the pixel cell is tested, as well as the MCC. In the scan configuration used here the digital injection is repeated 200 times for every pixel cell of the entire module. If all FE-chips work perfectly, every pixel cell should report 200 hits. Thus a much larger amount of data, compared to a trigger signal, has to be processed by the read-out chain.
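The pass criterion of the digital scan, 200 reported hits for each of the 200 injections per pixel, can be expressed as a simple check on the resulting hit map. The following Python sketch is a hypothetical offline check and not part of the scan software; the function name `deviating_pixels` and the representation of the hit map as a list of rows are assumptions made for illustration.

```python
from typing import List, Tuple

INJECTIONS_PER_PIXEL = 200  # digital injections per pixel cell in this scan


def deviating_pixels(hit_map: List[List[int]],
                     expected: int = INJECTIONS_PER_PIXEL
                     ) -> List[Tuple[int, int, int]]:
    """Return (row, column, hits) for every pixel that deviates from expected."""
    return [
        (row, column, hits)
        for row, line in enumerate(hit_map)
        for column, hits in enumerate(line)
        if hits != expected
    ]


# Tiny made-up example map: one pixel lost three hits.
example_map = [
    [200, 200, 200],
    [200, 197, 200],
]
print(deviating_pixels(example_map))
```

An empty result corresponds to a module in which every pixel cell reported all injections.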

The same measurement procedure as for the trigger signals has been done for the digital scan signals, first a scan at 40 MBit/s, afterwards at 80 MBit/s, see figures 5.10 and 5.11.



Figure 5.10.: Part of the data stream caused by a digital scan at 40 MBit/s read-out speed. The colour code of the lines is the same as it has been used at the trigger response screenshots. The bitlength is again 25 ns.

Since a digital scan requires histogramming of the results to make them human readable, the ROD receives the scan results and – in case everything is working as expected – processes them. The resulting map of the pixel module is shown in figure 5.12.



Figure 5.11.: Part of the data stream caused by a digital scan at 80 MBit/s read-out speed. The bitlength is 12.5 ns for incoming signals and 25 ns for outgoing signals.

The digital scan at 80 MBit/s worked as expected. This test has been repeated several times and worked without any problems. The resulting map shows some pixels with fewer than 200 hits. This behaviour is a result of the rising module temperature, and especially the MCC temperature, during the test runs. If the MCC temperature exceeds about 45°C, the MCC no longer operates stably. This effect had already been observed during the system tests for the current pixel detector.

To solve this problem, a cooling system for the modules was built. After the modules were cooled down to 15°C, the digital tests no longer showed these errors.

#### 5.3.3. Bit Error Rate Measurement

To finally check the quality of the data transmission, a bit error rate measurement has been performed. This measurement is based on a debugging tool ("Single Events Generator") that has been used during qualification tests of the optical BOC. The tool first initializes the whole read-out chain and then always generates the same event on the MCC. Thus a known bit pattern is sent to the MCC, which generates a full event including header, trailer, error flags and trigger ID. This event is sent to the eBOC and from there transmitted to the ROD, where it is stored in an on-board memory.

The measurement routine, which runs on a computer connected via Ethernet to the crate, reads the full event from the memory and stores it on the hard disk. Afterwards the bit pattern is translated to hex format to make it more human readable and the known string is searched. In hex format the string has 216 characters, where every hex character corresponds to four bits. Three of the characters are used for the trigger ID and one character is the corresponding error flag. Since the trigger ID is not submitted by the Single Events Generator, these four bits are ignored.



Figure 5.12.: Map of a pixel module, showing the results of a digital scan. During the scan procedure 200 injections are sent to the digital part of each pixel cell. The number of counts received by the read-out electronics is colour coded. The read-out speed has been set to 80 MBit/s.

The bit error rate can be defined as:

$$BER = \frac{number\ of\ errors}{number\ of\ transmitted\ bits} \tag{5.1}$$

where number of transmitted bits means the bitstring without the four ignored bits.
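The comparison of a received event with the known pattern and the evaluation of equation (5.1) can be sketched as follows. This Python example is an illustration and not the measurement routine used for this thesis: the function names, the short hex strings in the usage example and the way ignored bits are specified (as a set of bit positions counted from the first bit of the string) are all assumptions.

```python
from typing import FrozenSet, Iterable, Tuple


def count_bit_errors(received_hex: str, expected_hex: str,
                     ignored_bits: FrozenSet[int] = frozenset()
                     ) -> Tuple[int, int]:
    """Compare two hex strings bit by bit; return (errors, compared bits)."""
    if len(received_hex) != len(expected_hex):
        raise ValueError("strings must have the same length")
    n_bits = 4 * len(expected_hex)  # one hex character corresponds to four bits
    difference = int(received_hex, 16) ^ int(expected_hex, 16)
    errors = 0
    compared = 0
    for position in range(n_bits):  # position 0 is the first (leftmost) bit
        if position in ignored_bits:
            continue
        compared += 1
        errors += (difference >> (n_bits - 1 - position)) & 1
    return errors, compared


def bit_error_rate(runs: Iterable[Tuple[int, int]]) -> float:
    """Equation (5.1), combining several (errors, compared bits) results."""
    results = list(runs)
    total_errors = sum(errors for errors, _ in results)
    total_bits = sum(bits for _, bits in results)
    return total_errors / total_bits


# Made-up four-character patterns; the real string has 216 characters.
runs = [count_bit_errors("E8F0", "E8F0"), count_bit_errors("E8F1", "E8F0")]
print(runs, bit_error_rate(runs))
```

Combining the results of all runs before dividing corresponds to the way the runs at each bandwidth are combined in table 5.1.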

The bit error rate measurement has been performed mainly at 80 MBit/s read-out speed and only a few times at 40 MBit/s to cross-check that the lower bandwidth mode still works reliably. The results are shown in table 5.1.

| Bandwidth | Number of Error Counts | Number of Bits Sent | Bit Error Rate Upper Limit (90% CL) |
|-----------|------------------------|---------------------|-------------------------------------|
| 40 MBit/s | 0                      | $4.7 \cdot 10^6$    | $2.26 \cdot 10^{-8}$                |
| 80 MBit/s | 0                      | $2.15 \cdot 10^7$   | $4.90 \cdot 10^{-9}$                |

Table 5.1.: Results of the bit error rate measurement, where the results of all runs at the particular bandwidths have been combined. The limits are calculated at 90% confidence level.

The Bit Error Rate Upper Limit has been calculated using the Bayesian method. With  $\epsilon$  being the probability of a correct data transmission, n the number of checked bits and k the number of correctly transmitted bits, one has:

$$\begin{aligned}
p(\epsilon|n,k) \propto p(k|\epsilon,n) &= \binom{n}{k} \cdot \epsilon^k \cdot (1-\epsilon)^{n-k} \\
&= \frac{n!}{k! (n-k)!} \cdot \epsilon^k \cdot (1-\epsilon)^{n-k}
\end{aligned}$$

Since the number of bit errors has been 0, in this case n = k. Thus:

$$\begin{aligned}
p(\epsilon|n, k = n) \propto p(k = n|\epsilon, n) &= \frac{n!}{n! \cdot 0!} \cdot \epsilon^n \cdot (1 - \epsilon)^0 \\
&= \epsilon^n
\end{aligned}$$

With a flat prior $p_0(\epsilon) = 1$, the probability density function with full normalization is:

$$p\left(\epsilon|n, k=n\right) = \frac{p\left(k=n|\epsilon, n\right) \cdot p_0\left(\epsilon\right)}{\int_0^1 d\epsilon \cdot p\left(k=n|\epsilon, n\right) \cdot p_0\left(\epsilon\right)} = (n+1) \cdot \epsilon^n$$

The 90% lower limit on the parameter $\epsilon$ can be calculated by integration. With $x_{q_0}$ being the quantile:

$$\int_0^{x_{q_0}} d\epsilon \cdot p(\epsilon|n, k=n) = \frac{n+1}{n+1} \cdot \epsilon^{n+1} \Big|_0^{x_{q_0}} = x_{q_0}^{n+1} \stackrel{!}{=} 0.9 .$$

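Thus, since $x_{q_0} = 0.9^{\frac{1}{n+1}}$ is the quantile of the probability of a correct transmission, the bit error rate upper limit can be calculated using:

$$BER_{\mathrm{limit}} = 1 - x_{q_0} = 1 - 0.9^{\frac{1}{n+1}} \tag{5.2}$$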
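Equation (5.2) can be evaluated numerically for the bit counts of table 5.1. The following Python sketch evaluates the expression $1 - 0.9^{1/(n+1)}$ exactly as written above; the function name `ber_upper_limit` and its `quantile` parameter are illustrative. Because the result is many orders of magnitude smaller than one, the sketch uses `math.expm1` instead of a direct subtraction to avoid loss of floating point precision.

```python
import math


def ber_upper_limit(n_bits: float, quantile: float = 0.9) -> float:
    """Evaluate 1 - quantile**(1 / (n_bits + 1)) in a numerically stable way."""
    if n_bits < 0 or not 0.0 < quantile < 1.0:
        raise ValueError("need n_bits >= 0 and 0 < quantile < 1")
    # 1 - q**(1/(n+1)) = -(exp(ln(q) / (n+1)) - 1) = -expm1(ln(q) / (n+1))
    return -math.expm1(math.log(quantile) / (n_bits + 1))


for label, n_bits in (("40 MBit/s", 4.7e6), ("80 MBit/s", 2.15e7)):
    print(f"{label}: {ber_upper_limit(n_bits):.2e}")
```

For $2.15 \cdot 10^7$ bits this gives about $4.90 \cdot 10^{-9}$, in agreement with table 5.1. For $4.7 \cdot 10^6$ bits it gives about $2.24 \cdot 10^{-8}$, slightly below the tabulated value, presumably because the bit count listed in the table is rounded.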

The result that no transmission errors appeared, neither at 40 MBit/s nor at 80 MBit/s, and the achieved bit error limit, of the order of $10^{-8}$ for 40 MBit/s and $10^{-9}$ for 80 MBit/s, qualify the electrical read-out chain as a reliable test system for ATLAS pixel modules.

The FPGA configuration already contains the code for 160 MBit/s operation, which is basically the same as for 80 MBit/s. For both modes the same decoding routine is used; only the FPGA-internal link mapping changes. Thus the eBOC could also be used for module testing at the highest bandwidth if the hardware output link mapping of the eBOC were redesigned.

# 6. Summary and Outlook

The Insertable B-Layer (IBL) project requires a testing environment for the new Front End-chips (FE-chips) in the near future. Since the pixel modules for this project will be a completely new design, the module data handling will change in many ways, e.g. the used read-out bandwidth will be fixed at 160 MBit/s. The adjustment of the optical read-out chain to the IBL requirements will need a lot of time. Many of the optical components will need a redesign or at least an upgrade.

Many of the institutes that are contributing to the IBL-project will need a full data read-out chain, for example to run scans of the new FE-chips. Because on the one hand the development of the new pixel modules might be handicapped by a late availability of the optical read-out chain and on the other hand the optical components are difficult to handle, an electrical read-out system would be helpful for testing purposes.

The development of such an electrical data chain already started during the development phase of the current pixel detector. In principle the hardware was designed for all three bandwidths used in the current detector: 40 MBit/s, 80 MBit/s and 160 MBit/s = $2 \times 80$ MBit/s. In the end, however, the read-out chain was only configured and used for Layer 2 modules, i.e. for a read-out speed of 40 MBit/s.

In the context of this thesis, the electrical read-out chain has been adapted to the higher bandwidth modes. The configurations of the FPGAs on the PP0-2 card and the eBOC have been extended; in particular, the demultiplexing of 80 MBit/s data streams into two 40 MBit/s data streams has been implemented. It has been demonstrated that the upgraded system works at 40 MBit/s and 80 MBit/s and that it can be switched between these two modes during operation. Digital scans as well as a bit error test have been run several times without any errors. Detailed results can be found in chapter 5.3.

The desired 160 MBit/s mode, which is required for IBL modules, has been implemented in the FPGA configurations. However, this bandwidth could not actually be tested due to a link mapping mismatch between eBOC and ROD, as mentioned in chapter 5.2.3. In the current design, eBOC and ROD only have two data lines per ROD-formatter in common. To process module data sent at 160 MBit/s, four data lines per formatter are needed, because the ROD only accepts streams at 40 MBit/s<sup>1</sup>.
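The line count follows from simple arithmetic, which can be illustrated with a short sketch. It assumes that a fast stream is split bit by bit in round-robin order onto the 40 MBit/s lines; this interleaving order is an assumption made for illustration and is not necessarily the mapping implemented in the FPGA configuration. The function names `lines_needed`, `demultiplex` and `multiplex` are hypothetical.

```python
def lines_needed(link_mbit: int, line_mbit: int = 40) -> int:
    """Number of line_mbit data lines required to carry a link_mbit stream."""
    if link_mbit <= 0 or line_mbit <= 0 or link_mbit % line_mbit != 0:
        raise ValueError("link rate must be a positive multiple of the line rate")
    return link_mbit // line_mbit


def demultiplex(bits: list[int], n_lines: int) -> list[list[int]]:
    """Distribute a serial bit stream over n_lines slower lines.

    Assumption: bit i goes to line i % n_lines (round-robin interleaving).
    """
    return [bits[i::n_lines] for i in range(n_lines)]


def multiplex(lines: list[list[int]]) -> list[int]:
    """Inverse of demultiplex for lines of equal length."""
    return [bit for group in zip(*lines) for bit in group]


if __name__ == "__main__":
    shared_lines = 2  # data lines per ROD-formatter common to eBOC and ROD
    for rate in (40, 80, 160):
        n = lines_needed(rate)
        status = "fits" if n <= shared_lines else "does not fit"
        print(f"{rate} MBit/s needs {n} line(s): {status}")

    stream = [1, 0, 1, 1, 0, 0, 1, 0]
    print(demultiplex(stream, lines_needed(80)))
```

With two shared lines per formatter, the 40 MBit/s and 80 MBit/s modes fit, while 160 MBit/s would need four lines, which is the mismatch described above.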

Since the ATLAS pixel community has shown interest in an electrical read-out chain, a redesign of the eBOC and the PP0-2 card should be aimed for. The most important change compared to the current eBOC design must be the correct link mapping between eBOC and ROD. Additionally, a general upgrade of the eBOC is recommended. The currently used components are outdated, and a more robust buffering between PP0-2 card and eBOC would be preferable. If the cable length between the two cards exceeds about two meters, communication problems appear, caused by losses of the LVDS signals over the cables. Furthermore, a more flexible connectivity of the PP0-2 card would be important, such that not only pixel modules can be connected, but also e.g. the module emulator currently under development for testing the read-out with the data formats of the new FE-chips. Since the FPGA on the PP0-2 card is only used to switch between some input channels, a new PP0-2 card should eventually be designed without an FPGA mounted.

<sup>1</sup>See comment in chapter 5.2.4.

Once the final characteristics of the FE-chips are known, especially the data format and whether 8b/10b data encoding will be implemented, the redesign of the mentioned components can in principle be started. The suggested changes are provisional, since the decisions concerning changes to the ROD are still pending.

All these changes in the design of the eBOC and the PP0-2 card should be realizable on a much shorter timescale than the redesign of the optical data chain. Additionally, an electrical chain is easier to handle, requires no tuning of the components and provides all capabilities needed for module testing in the lab or in a test-beam assembly.

## A. Appendix

# A.1. Overview of FPGA behaviours on the eBOC and PP0-2 depending on the mode select state



Figure A.1.: On the left side, the FPGA behaviour on eBOC and PP0-2 level is shown for the 40 MBit/s mode (mode select bit = 1); on the right side, the behaviour on the same levels is shown for the 80 MBit/s mode (mode select bit = 0).

## **Bibliography**

- [1] D. Griffiths, Einführung in die Elementarteilchenphysik, vol. 1, Akademie Verlag, (1996).
- [2] P. Higgs, Broken Symmetries and the Masses of Gauge Bosons, Physical Review Letters 13 (1964), doi:10.1103/PhysRevLett.13.508.
- [3] The LEP Higgs Working Group, Search for the Standard Model Higgs Boson at LEP, arXiv:hep-ex/0306033v1.
- [4] J. Ellis, Limits of the Standard Model, arXiv:hep-ph/0211168v1.
- [5] LHC Design Report, CERN-2004-003-V-2 (2004).
- [6] The ATLAS Experiment, www.atlas.ch.
- [7] The CMS Experiment, cms.cern.ch.
- [8] The LargeHadronCollider Beauty Experiment, lhcb.cern.ch.
- [9] The ATLAS Collaboration, Technical Design Report, CERN/LHCC 99-14 (1999).
- [10] H. ten Kate, ATLAS Superconducting Toroids and Solenoid, IEEE Transactions on applied Superconductivity 15(2) 1267-1270, doi:10.1109/TASC.2005.849560.
- [11] The ATLAS TRT collaboration, *The ATLAS TRT end-cap detectors*, 2008 JINST 3 P10003.
- [12] T. Flick, Studies on the Optical Readout for the ATLAS Pixel Detector, PhD Thesis, Bergische Universitaet Wuppertal (2006), URN: urn:nbn:de:hbz:468-20060600.
- [13] C. Grah, Development of the MCM-Technique for Pixel Detector Modules, PhD Thesis, Bergische Universitaet Wuppertal (2005), URN: urn:nbn:de:hbz:468-20050296.
- [14] F. Huegging, *Der ATLAS Pixel Sensor*, PhD Thesis, University of Dortmund (2001), http://hdl.handle.net/2003/2350.
- [15] M. Moll, Radiation Damage in Silicon Particle Detectors microscopic defects and macroscopic properties, PhD Thesis, DESY Hamburg (1999), DESY-THESIS-1999-040.
- [16] O. Krasel, Charge Collection in irradiated Silicon-Sensors, PhD Thesis, University of Dortmund (2004), http://hdl.handle.net/2003/2354.
- [17] K. Einsweiler, ATLAS On-detector Electronics Architecture, Draft V3.0 (2003).
- [18] J. Grosse-Knetter, Vertex Measurement at a Hadron Collider The ATLAS Pixel Detector, Habilitationsschrift, University of Bonn (2008), BONN-IR-2008-04.
- [19] J. Weingarten, System Test and Noise Performance Studies at The ATLAS Pixel Detector, PhD Thesis, University of Bonn (2007), CERN-THESIS-2008-033.

- [20] T. Stockmanns, Multi-Chip-Modul-Entwicklung für den ATLAS Pixeldetektor, PhD Thesis, University of Bonn (2004), BONN-IR-2004-12.
- [21] C. Gemme, Study of indium bumps for the ATLAS pixel detector, Nuclear Instruments and Methods in Physics Research **465** (2001) 200-203, doi:10.1016/S0168-9002(01)00390-4.
- [22] R. Beccherle and G. Darbo, MCC-I2.1 Specifications (2004), http://www.ge.infn.it/ATLAS/Electronics/MCC-I2/Specs-I2.1/Receiver.pdf.
- [23] R. Beccherle, The Module Controller Chip (MCC) for the ATLAS Pixel Detector, NIM-A 492 (2002) 117, doi:10.1016/S0168-9002(02)01279-2.
- [24] A. Korn, How to read out a 80 million channel Silicon Pixel Detector, Summer Lecture (2006), private communications.
- [25] A. G. Clark and G. Mornacchi, *B-Layer Task Force: Final Report*, (2009) ATU-SYS-MR-0001.
- [26] S. Brown and J. Rose, Architecture of FPGAs and CPLDs: A Tutorial, University of Toronto, Department of Electical and Computer Engineering (2000), http://www.eecg.toronto.edu/jayar/pubs/brown/survey.pdf.
- [27] Altera Corporation, www.altera.com.
- [28] Institute of Electrical and Electronics Engineers, www.ieee.org.
- [29] A. Mäder, VHDL Kompakt, University of Hamburg, Department Computer Sciences (2008), http://tams-www.informatik.uni-hamburg.de/vhdl/doc/kurzanleitung/vhdl.pdf.
- [30] A. Korn, private communications.

### Acknowledgements

This thesis could not have been realized without the amazing support of many people. Since I can only mention some of them explicitly, my acknowledgement includes all those who are not named here.

First and foremost I would like to thank my family and especially my parents and sisters. Without your support and patience during the last years, I could not have done all this.

I also want to thank Prof. Dr. Arnulf Quadt and PD Dr. Jörn Grosse-Knetter for offering me this thesis and supervising me from the beginning of my studies. During this thesis they provided me with many helpful suggestions and good advice. Furthermore I am grateful that they introduced me to the ATLAS pixel collaboration, where I met a lot of interesting people and really had a nice time. And of course I had the chance to be "live dabei!"<sup>1</sup> in the recent developments and discussions relating to the ATLAS pixel upgrade.

Special thanks go to the people who supported me by proofreading this thesis, in particular Dr. Jens Weingarten, Dr. Kevin Kröninger, Nina Krieger and my sister Julia. I also want to thank Andreas Korn for the support, the patience and the discussions during the last months. Moreover I want to thank my office mate Steffen Klemer for the productive discussions and the fun during my studies.

Furthermore I would like to thank all the members of the  $2^{nd}$  Institute of Physics for the enjoyable and friendly atmosphere and for giving me the chance to make new friends.

Last but not least I want to thank all of my friends for the support and fun over the last years and especially the last months.

<sup>1</sup>cannot be translated