## E\&CE 327 Final Solution <br> 2011t1 (Winter)

## Q1 (20 Marks) Short Answer

## Q1a (10 Marks) Hardware Description Languages

One of the annoying features of VHDL and Verilog is that it can be difficult to predict what hardware will be synthesized for a VHDL or Verilog program, because the semantics of these languages defines the behaviour of a digital system, but says nothing about the structure of the hardware.

Would a hardware description language whose semantics defined both the structure and the behaviour of hardware be an improvement over VHDL and Verilog? For full marks, you must justify your answer.


#### Abstract

Answer: A hardware description language whose semantics define the structure of hardware will not be an improvement over VHDL and Verilog.

The disadvantages of the new language include: - Prevent synthesis tools from doing any optimizations beyond those defined in the original semantics of the language. - Restrict or prevent the use of new implementation technologies (e.g. FPGA cells) - Each use of an operation such as " + " will have to have a syntactic mechanism to define how it will be implemented. For example, we could have the default adder be ripple carry, and then use attributes to select a different type of adder (e.g. a +'carry_select b). - Similarly to operators, enumerated datatypes and constants would need to have their encoding specified, preventing optimizations by synthesis tools.


The only advantage is that the new language would clearly define what is synthesizable. This could be the entire language, which would be a disadvantage compared to VHDL and Verilog. If there are parts of the language that are not synthesizable, this could complicate the semantics, because for some parts of the language, the structural semantics would be something similar to NULL or "unsynthesizable".

## Q1b (10 Marks) Functional Verification

Your task is to design a small ( $16 \times 16$ pixel $)$ test image for functional verification of Kirsch edge detectors. The goal is to have an image that is small enough that timing simulation can be done quickly, but yet still have high confidence that an edge detector that computes the edges and directions correctly for the test image will work correctly for all images.

Using words and/or pictures, describe an image that meets these goals.

## Q2 (20 Marks) Dataflow Diagram

In this question, you will design a dataflow diagram for the following equation:

$$
(a+b) \times(b+c) \times(c+d) \times(d+e)
$$

## NOTES:

1. There is a maximum of two inputs ports
2. The only algebraic optimizations you may use are commutativity and associativity:

|  | Commutativity | Associativity |
| :--- | :---: | :---: |
| Addition | $\alpha+\beta=\beta+\alpha$ | $(\alpha+\beta)+\gamma=\alpha+(\beta+\gamma)$ |
| Multiplication | $\alpha \times \beta=\beta \times \alpha$ | $(\alpha \times \beta) \times \gamma=\alpha \times(\beta \times \gamma)$ |

3. Your goals are, in order of priority, from highest priority to lowest:
(a) Maximize throughput
(b) Maximize clock speed
(c) Minimize area
(d) Minimize latency
4. All signals are declared as signed (15 downto 0)
5. Do not worry about overflow
6. Inputs shall be registered. Outputs may be either combinational or registered.
7. The relative areas of the components are:

16-bit register $\quad 2 \quad 16$-bit input port 5
16-bit 2:1 multiplexer $1 \quad 16$-bit output port 5
16-bit adder 2
16-bit multiplier 5
8. The multiplier has a latency of two clock cycles, throughput of $1 / 2$, and a minimum clock period of $M$. You may draw a multiplier as shown below. The multiplier has an internal register that is used to hold temporary results for the second clock cycle.

9. Maximum clock period $=$ flop $+\max ($ add, $M)$
10. The dataflow diagram is to be drawn on the next page.

## Q2a (14 Marks) Dataflow Diagram

Draw a dataflow diagram that satisfies the requirements and goals given above.

## Answer:

1. Data dependency graph

2. Dataflow diagram


To read in 5 input values with a maximum of 2 input ports requires three clock cycles.
Therefore, maximum throughput will be 3 .
Design all stages to support a throughput of 3 , then do wiring to reduce registers and muxes.

## Marking:

## Baselines

14 marks 2-input, throughput=1/3
10 marks 3 -inputs, throughput $=1 / 2$, clock period $=$ flop $+\max (M, a d d)$
8 marks 3 -inputs, throughput $=1 / 2$, clock period $=$ flop $+\max (M, a d d)+M$
8 marks 3 -inputs, throughput=1, clock period $=$ flop $+\max (M, a d d)$
8 marks 2-inputs, unpipelined, throughput=1/7

## Deductions

-1 mark registered outputs (increases latency)
-1 mark non-overlapping pipe stages (if overlapping would have reduced latency)
-1 mark for each sub-optimal number of multipliers, registers, adders, multiplexers in comparison to baseline

## Q2b (6 Marks) Analysis

Based upon your diagram, fill in the table below:

## Answer:

| Number of stages | 3 | Number of 16-bit registers (not in- | 8 |
| :--- | :---: | :--- | :--- |
| Latency | 7 | cluding registers inside the multipli- |  |
| Minimum clock period | flop $+\max ($ add, $M)$ | ers) |  |
| Maximum throughput | $1 / 3$ | Number of 16-bit 2:1 multiplexers | 0 |
|  |  | Number of 16-bit adders | 2 |
|  |  | Number of 16-bit multipliers | 3 |

Marking:
Marking is done with respect to the dataflow diagram drawn for part a.
6 all correct
51 mistake
4 2-3 mistakes
3 4-5 mistakes
2 6-7 mistakes
1 some correct information

## Q3 (20 Marks) Timing Analysis

In this question, you will analyze two circuits to determine if a path is a false path.

| Gate | Delay |
| :---: | :---: |
| $D \infty$ | 2 |
| $D D D$ | 4 |
| $D$ | 6 |

## Q3a (10 Marks) First circuit

For the circuit below, find the longest path and determine if the longest path is a false path.
If the longest path is a false path, explain why it is false.
If the longest path is not a false path, give a set of values on the primary inputs that will exercise the path.


## Marking:

10 marks correct answer
8 marks excitation uses ' 1 ' rather than rising edge
7 marks false path: explain that have contradiction in finding non-controlling values for side inputs
5 marks false path: give definition false path
-4 marks incorrect longest path

## Q3b (10 Marks) Second circuit

For the circuit below, the longest path $\langle c, e, g, i, k, m, n\rangle$ is a false path. Find the second longest path and determine if it is a false path.
If the second longest path is a false path, explain why.
If the second longest path is not a false path, give a set of values on the primary inputs that will exercise the path.

## Answer:

When looking at constraint for non-controlling values on all side inputs, find that there is a contradiction between $k[i]=1$ leading to $g=1$ and $h[j]=1$ leading to $g=0$.

However, $k[i]$ is a late-arriving side input. The path for this late-arriving side input is c,e,g,i.

The viability condition for the late-arriving side path is: $i[d]=1$.
The condition for $k[h]$ to have a controlling value is: $h=0$, which leads to $\bar{a} b$ on the inputs.
Putting together the different parts of the viability condition for the candidate path, we have:

The final viability condition is: $\bar{a} b \bar{c} d$, which give an excitation of:


## Marking:

10 marks correct answer
9 marks excitation uses static values rather than edges for $b$ or $c$
7 marks false path: explain that have contradiction on $g$ coming from $k[i]$ and $h[j]$
5 marks false path: give definition false path
-4 marks incorrect longest path

## Q4 (20 Marks) Power

Your manager has asked you to design a clock gating scheme for the circuit described below (the information describes the circuit without any clock gating):

| Latency | 10 clock cycles |
| :--- | :---: |
| Throughput | 1 |
| Clock speed | 800 MHz |
| Percentage of input parcels that are valid | $40 \%$ |
| Average number of consecutive valid parcels | 20 |
| Short circuiting power | 1 mW |
| Leakage power | 3 mW |
| Switching power | 5 mW |

You are evaluating two clock gating schemes:
Option A A single "cool", or gated, clock is used for the entire pipeline.
Option B Each stage of the pipeline has its own gated clock.

## Q4a (14 Marks) Clock gating scheme

Your manager has asked you to choose which clock-gating scheme to implement based upon the information above and reasonable assumptions. She has suggested two assumptions:

1. Option A and option B will both have an effectiveness of $100 \%$.
2. The clock-enable circuitry for option A and option B will both have negligible power consumption compared to the main circuit.
If you need to make any additional assumptions for Question Q4a, list them here:

## Answer:

- Assume that the pipeline is 10 stages and that a parcel spends exactly one clock cycle in each stage.

Which clock gating scheme (option A or B) will result in the lowest power consumption, and what will be the total power consumption if you use this scheme?

## Answer:

$40 \%$ of the parcels are valid
20 consecutive valid parcels
window $=20 / 0.40=50$ clock cycles
The latency through a single stage is 1 clock cycle, so each stage needs its clock-enable turned on for 21 clock cycles.

Percentage of time that clock is turned on $=21 / 50=42 \%$
Clock gating affects switching and short-circuiting power.
Therefore total power is $3+0.42(1+5)=5.52 \mathrm{~mW}$

## Marking:

8 marks calculation of percentage of time that clock is turned on
4 marks window length
4 marks number of clock cycles that clock turned on
-4 marks ignore latency and use only the percentage of clock cycles that have valid input data

3 marks correct use of different types of power with respect to activity factor (1 mark for each type of power)
2 marks justification for choice of clock gating scheme
1 mark putting the pieces together for the final answer

## Q4b (3 Marks) Burst length

Late on a Friday afternoon, your manager drops by your desk and says that the original estimate for the average number of consecutive valid parcels was too low. Assuming everything else stays the same, can you determine whether increasing the number of consecutive valid parcels will decrease, have no effect on, or increase the total power consumption? For full marks, you must justify your answer.

## Answer:

If answering for option B:
Total power consumption will not change significantly.
Increasing the number of consecutive valid parcels without increasing the percentage of valid parcels will lead to longer windows. If the latency was significant compared to the window length, then increasing the window length will lead to a significant increase in the percentage of clock cycles in which the clock can be turned off. However, because the latency through a single stage is just one clock cycle, increasing the number of consecutive valid parcels has a negligible effect on the total power consumption. This can be seen with a simple example of doubling the number of consecutive valid parcels:
window length $=40 / 0.4=100$
PctClk $=41 / 100=41 \%$, in comparison to $42 \%$ for the original situation.
If answering for option $A$ :
Total power will decrease.
Using the scenario from above, if the number of valid parcels increses to 40, the number of clock cycles that the circuit needs to be powered will go from 30 to 40, which will cause PctClk to go from $30 / 50=60 \%$ to $40 / 100 \%=40 \%$.

In general, if we add $\Delta$ clock cycles of valid input parcels:

## Marking:

3 marks correct, complete, and clear
2 marks some analysis of effect on window length
1 mark some correct information

## Q4c (3 Marks) Clock speed

Another Friday afternoon, just as you are packing up to head home, your manager drops by again and says that when the first chips come back from fabrication, the maximum clock speed at which they worked correctly was 750 MHz , rather than the 800 MHz of the original circuit on the same fabrication line. Can you make any hypothesis as to what could have caused the drop in clock speed? For full marks, you must justify your answer.

## Answer:

The drop in clock speed is most likely caused by clock skew, which would be caused by the and gate on the clock signal. By clock-gating each pipe stage individually, each stage will have a slightly different clock skew than its neighbors due to manufacturing variances in gate sizing, etc.

Another reasonable answer is to say that the clock gating circuit introduces a new critical path that limits the clock speed to 750 MHz .

## Marking:

3 marks correct, complete, and clear
2 marks discussion of clock-gating circuit affecting clock signal or delay
1 mark some correct information

## Q5 (20 Marks) Faults and Testing

The hot topic at last years Manufacturing Faults Conference was the "Single-stuck-at-1" fault model. In keeping with the rapid pace of research in digital hardware, this year, the focus changed to the "Single-stuck-at-0" (SS0) fault model.

## Q5a (15 Marks) Test Vectors

Find the minimum set of test vectors to catch all SS0 faults in the circuit below, and list the test vectors in the order to run them, from first to last.

## NOTES:

1. If you do not know how to answer the question using the SS0 fault-model, then for partmarks, you may answer the question using the fault model that we use in ECE-327. If you do so, write a $\sqrt{ }$ in the box: $\square$

2. Write an " $X$ " in the box for any test vector that is not needed.
3. A colleague who really enjoys Karnaugh maps started to work on this problem, but fell asleep at his desk because he had been up all night working on a project for his son's nursery school. His partially completed work is on the next page. The work that he completed is correct.
4. There are copies of the circuit and Karnaughmap templates on the pages following this question.

## Answer:



Gate collapsing


L1@0=L5@0=L7@0

L3@0=L4@0=L9@0

L9@0=L11@0

L2@0=L10@0
L6@0=L10@0


Required test vectors: abcd, abcd

Remaining faults:

L8@0


Choose either $\overline{\mathrm{a}} \overline{\mathrm{b}} \mathrm{cd}$ or $\mathrm{a} \overline{\mathrm{c}} \mathrm{c} d$

Order to run: abcd $\bar{a} b c d \quad \bar{a} \bar{b} c d$
111101110011

## Marking:

2 marks equation for non-faulty circuit
4 marks fault locations and model

4 marks difference $k$-maps
3 marks calculation and selection of test vectors
2 marks ordering of test vectors
-2 marks inclusion of superfluous test-vectors
-2 marks Boolean-algebra error

## Q5b (5 Marks) Detected Faults

For each of the two faulty circuits below, answer whether the set of test vectors that you chose in Question Q5a will detect that the circuit is faulty. For full marks, you must justify your answer.

|  | Detected |  |
| :---: | :---: | :---: |
| Faults present in circuit | Yes | No |
| a shorted to 0 and e shorted to 0 | $\mathbf{X}$ | $\square$ |


#### Abstract

Answer: Detected, because the two faults are equivalent and $\mathbf{e}$ shorted to 0 was used to generate the test vector $\bar{b} c d$.


## Detected

Fault present in circuit


[^0]
[^0]:    Answer:
    Not detected. The equation for the circuit with $\mathbf{f}$ shorted to 1 is $a+\bar{b}$, which has the same behaviour as the correct circuit for the three test vectors.

