













## **Pipeline Performance Measures**

- Cycle time: t<sub>c</sub>
  - is determined by the worst-case processing time of the longest stage
- Repetition Rate: R
  - the shortest possible time interval between subsequent independent instructions in the pipeline
- Performance potential of a pipeline: P

$$P = 1/(R * t_c)$$

- PowerPC 603 FP double-precision multiply, e.g. R = 2, $t_c = 12$ ns: $P = 1/(R * t_c) = 1/(2 * 12\ \text{ns}) \approx 41.7$ MFLOPS
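The formula can be checked with a few lines of Python. This is a minimal sketch; the function name `performance_potential` and the choice of nanoseconds as the input unit are assumptions made for illustration only.

```python
def performance_potential(repetition_rate: int, cycle_time_ns: float) -> float:
    """Return P = 1 / (R * t_c), in millions of operations per second."""
    seconds_per_result = repetition_rate * cycle_time_ns * 1e-9
    return 1.0 / seconds_per_result / 1e6


# Values from the PowerPC 603 example: R = 2, t_c = 12 ns.
p = performance_potential(2, 12.0)
print(f"{p:.1f} MFLOPS")  # 41.7 MFLOPS
```

With R = 2 and a 12 ns cycle, one result can be delivered every 24 ns, which is roughly 41.7 million results per second.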

## Performance: RAW-dependent

- Latency:
  - → specifies the amount of time that the result of a particular instruction takes to become available in the pipeline for a subsequent dependent instruction.
- Define-use latency (10 to 100 cycles)
  - → mul r1, r2, r3
  - → add r5, r1, r4
- Load-use latency (1 to 3 cycles)
  - → load r1, x
  - → add r5, r1, r2
- Stalled: if the latency is n cycles, the immediately following RAW-dependent instruction has to be stalled in the pipeline for n-1 cycles
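The stall rule can be illustrated with a small issue-time model. This is only a sketch under simplifying assumptions that are not in the notes: one instruction is issued per cycle, each instruction depends at most on its immediate predecessor, and the helper `issue_times` is a hypothetical name.

```python
def issue_times(program):
    """Return the issue cycle of each instruction.

    program: list of (latency_cycles, depends_on_previous) tuples.
    Assumes in-order issue of at most one instruction per cycle.
    """
    times = []
    next_free = 0    # earliest cycle in which the next instruction may issue
    prev_ready = 0   # cycle in which the previous result becomes available
    for latency, depends_on_previous in program:
        issue = max(next_free, prev_ready) if depends_on_previous else next_free
        times.append(issue)
        next_free = issue + 1
        prev_ready = issue + latency
    return times


# mul r1, r2, r3 with an assumed latency of 3 cycles, then add r5, r1, r4.
times = issue_times([(3, False), (1, True)])
stall = times[1] - times[0] - 1
print(times, stall)  # [0, 3] 2
```

With a latency of n = 3 the dependent `add` issues in cycle 3 instead of cycle 1, so it is stalled for n-1 = 2 cycles. With a latency of 1 there is no stall.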

# **Improve Performance**

- Multiple-operation instructions
- HP PA 7100
  - → FMPYADD RM1, RM2, RM3, RA1, RA2
  - → RM3 ← RM1\*RM2, RA2 ← RA1+RA2
- PowerPC
  - → FMA for performing (A\*C) + B
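The register-level effect of these instructions can be sketched as follows. The dictionary-based register file and the function names are assumptions for illustration; the operand roles follow the notation used in the list above, not a verified reading of the processor manuals.

```python
def fmpyadd(regs, rm1, rm2, rm3, ra1, ra2):
    """One instruction, two independent operations:
    RM3 <- RM1 * RM2 and RA2 <- RA1 + RA2."""
    product = regs[rm1] * regs[rm2]
    total = regs[ra1] + regs[ra2]
    regs[rm3] = product
    regs[ra2] = total


def fma(a, c, b):
    """(A * C) + B as a single operation.

    Note: a hardware FMA rounds once; this Python expression rounds twice.
    """
    return a * c + b


regs = {"RM1": 2.0, "RM2": 3.0, "RM3": 0.0, "RA1": 1.0, "RA2": 4.0}
fmpyadd(regs, "RM1", "RM2", "RM3", "RA1", "RA2")
print(regs["RM3"], regs["RA2"])  # 6.0 5.0
print(fma(2.0, 3.0, 1.0))        # 7.0
```

In both cases a single issued instruction does the work that would otherwise need two, which is where the performance gain comes from.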











- data and control dependencies occur more frequently
- instructions are stalled and wait for data
- the pipe must be reloaded in case of a branch
- subtasks become less balanced (in execution time)
- cycle time is determined by the worst-case processing time of the longest stage
- in most cases: 5-10 stages
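The effect of unbalanced stages on cycle time can be shown with a short calculation. This is a sketch with made-up stage delays; it ignores latch overhead and stalls, and the function names are hypothetical.

```python
def cycle_time(stage_delays_ns):
    """The cycle time is set by the slowest stage."""
    return max(stage_delays_ns)


def ideal_speedup(stage_delays_ns):
    """Speedup over an unpipelined unit whose delay is the sum of all stages."""
    return sum(stage_delays_ns) / cycle_time(stage_delays_ns)


balanced = [4.0, 4.0, 4.0, 4.0, 4.0]    # 20 ns of logic, evenly split
unbalanced = [2.0, 3.0, 4.0, 5.0, 6.0]  # same 20 ns, unevenly split

print(cycle_time(balanced), ideal_speedup(balanced))  # 4.0 5.0
print(cycle_time(unbalanced))                         # 6.0
print(round(ideal_speedup(unbalanced), 2))            # 3.33
```

Both pipelines have five stages and the same total logic delay, but the unbalanced one runs at the pace of its 6 ns stage and reaches only about two thirds of the ideal speedup.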





# Bypasses (data forwarding in RAW)

- Unless special arrangements are made, the result of an operate instruction is written into the register file, or into memory, and is then fetched from there as a source operand.
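The cost of going through the register file can be estimated with a small model. Everything specific here is an assumption for illustration and not taken from the notes: a classic five-stage pipeline (IF, ID, EX, MEM, WB), operands read in ID, results written in WB, and a register file that cannot be read in the same cycle in which it is written.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # assumed pipeline layout


def raw_stalls(bypass: bool) -> int:
    """Stall cycles for an ALU instruction immediately followed by a
    RAW-dependent instruction, with or without a bypass path."""
    producer_start = 0
    consumer_start = 1  # issued in the very next cycle
    if bypass:
        # The result is forwarded from the EX output straight to the EX input.
        value_ready = producer_start + STAGES.index("EX") + 1
        consumer_needs = consumer_start + STAGES.index("EX")
    else:
        # The result must be written in WB and read from the register file in ID.
        value_ready = producer_start + STAGES.index("WB") + 1
        consumer_needs = consumer_start + STAGES.index("ID")
    return max(0, value_ready - consumer_needs)


print(raw_stalls(bypass=False))  # 3
print(raw_stalls(bypass=True))   # 0
```

Under these assumptions the dependent instruction waits three cycles without a bypass and none with one. Real designs differ in the details, for example when the register file is written in the first half of a cycle and read in the second.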































































