# 3. Method of Logical Effort¶

In this chapter, we introduce the *method of logical effort* to design
fast digital circuits. This method is based on the *Model of Logical Effort*
for the delay of digital circuits. The method of logical effort is
sufficiently accurate and simple enough to facilitate paper-and-pencil
design of fast circuits. Experiment 3.1 offers a first glance
at the problem of designing a fast inverter chain.

The digital circuit below is a chain of four inverters and a load capacitance, which represents the input capacitance of one or more logic gates. The delay of the circuit is the time it takes the inverter chain to charge or discharge the load capacitance after the input changes. This delay is proportional to the transistor gate capacitances and, therefore, the size of the inverters and the load. You can vary the scale factors of the inverters and the size of the load with the sliders below the schematic. The stage-1 inverter is a reference inverter. Try varying the scale factors of the stage-2, stage-3, and stage-4 inverters to minimize the delay for a given load.

Inverter 2: 1 Inverter 3: 1 Inverter 4: 1

Load: 400 Delay: 407

For a capacitive load of 1200 units, you should be able to reduce the delay from 407 time units when using reference inverters with scale factor 1 by a significant factor (close to 20) when choosing the inverter sizes properly. In fact, gate scaling enables us to reduce the delay to its square root.

If Experiment 3.1 makes you wonder whether there is a guiding
principle to design a fast circuit, this chapter should satisfy your
curiosity. We study the delay of digital circuits, beginning with a
single gate. Then, we consider circuit topologies with gradually
increasing complexity, and introduce techniques for analyzing their
delay. As a whole, these techniques constitute the **method of
logical effort**, perhaps best perceived as a way of thinking about
the design of fast digital circuits. The hallmark of the method is
not merely its use for timing analysis but the insight it provides on
how to minimize the delay of a circuit. Ivan Sutherland and Bob
Sproull, two contemporary computer scientists, pioneered the method of
logical effort [SSH99]. In this chapter, we introduce the method
of logical effort in a refined form.

## 3.1. Delay of Logic Gates¶

The *Model of Logical Effort* defines the dimensionless delay of a logic gate as

where \(g\) is the logical effort, \(h\) is the electrical effort, and \(p\) is the parasitic delay.

Logical effort and parasitic delay are gate specific, and are constant for a particular gate. For example, a reference inverter has logical effort \(g_{inv} = 1\) and parasitic delay \(p_{inv} = 1.\) These values do not change if we scale the reference inverter in size by increasing its transistor widths. An inverter that is twice as large as the reference inverter still has logical effort \(g_{inv} = 1\) and parasitic delay \(p_{inv} = 1.\)

In contrast, the electrical effort of a logic gate depends on its size. If a gate with input capacitance \(C_{in}\) drives capacitive load \(C_L,\) then the electrical effort of the gate is

Because input capacitance \(C_{in}\) is proportional to the size of the logic gate, the larger the gate, the smaller electrical effort \(h.\) Consider the reference inverter with normalized input capacitance \(C_{inv} = 3.\) If we scale the reference inverter with scale factor \(\gamma \ge 1,\) then the scaled inverter has input capacitance \(C_{in} = \gamma C_{inv} = 3 \gamma.\) The electrical effort of the inverter is

which is by factor \(\gamma\) smaller than the electrical effort of the reference inverter with the same load. In a digital circuit consisting of an inverter driving capacitive load \(C_L,\) the delay of the inverter is

Experiment 3.2 enables you to choose load \(C_L,\) and vary the size of the inverter by means of scale factor \(\gamma\) in range \(1 \le \gamma \le 10.\) Observe that for a given load, increasing the size of the inverter reduces the delay according to Equation (1).

Inverter: 1 Load: 15 Delay: 6

The delay of the inverter in Experiment 3.2 is a linear function of the electrical effort. When designing circuits, we may vary the electrical effort of a logic gate by gate sizing or by changing its load. Figure 1.45 shows the delay of an inverter as a function of electrical effort \(h.\) Also shown in Figure 1.45 is the delay of a NAND gate. Since the NAND gate has a logical effort of \(g_{nand}=4/3,\) its delay grows faster with increasing \(h\) than for the inverter. The delay of a NOR gate with \(g_{nor} = 5/3\) increases even faster with increasing \(h\) than the delay of the NAND gate.

Compare the delay of the reference inverter with load \(C_L = 18\) capacitive units to that of a scaled inverter with \(\gamma = 4.\)

- Compute the delay of the reference inverter (\(\gamma = 1\)).
- Compute the delay of the scaled inverter for \(\gamma = 4.\)
- Use Experiment 3.2 to verify the delays in (a) and (b).

The delay of a reference inverter is \(d = g_{inv} h + p_{inv} = h + 1,\) because \(g_{inv} = 1\) and \(p_{inv} = 1\) by definition. Electrical effort \(h = C_L / C_{inv} = 18 / 3 = 6,\) because the reference inverter has input capacitance \(C_{in}(inv) = C_{inv} = 3.\) We conclude that the delay of the reference inverter driving load \(C_L = 18\) is

time units.

If we scale the inverter with scale factor \(\gamma = 4,\) the pMOS transistor has normalized width \(W_p = \gamma \cdot 2 = 8\) and the nMOS transistor \(W_n = \gamma \cdot 1 = 4\) units. Thus, the input capacitance is scaled by \(\gamma\) to become \(C_{in} = \gamma \cdot C_{inv} = 4 \cdot 3 = 12.\) As a consequence, the electrical effort to drive load \(C_L = 18\) decreases to \(h = 18 / 12 = 3/2,\) and the delay of the scaled inverter is

time units. Increasing the size of the inverter by a factor of four decreases its delay by a factor of \(14/5,\) or almost by a factor of three.

We can verify both delays in Experiment 3.2 by adjusting the load to 18 capacitive units, and then observing the delays by adjusting the inverter scale factor to \(\gamma = 1\) and \(\gamma = 4.\)

Draw the delay points at the boundaries of the slider ranges of inverter scale factor and load in Experiment 3.2 in a \(d(h)\)-diagram as shown in Figure 1.45.

Use Experiment 3.2 to determine the delays for scale factors \(\min \gamma = 1\) and \(\max \gamma = 10\) and loads \(\min C_L = 1\) and \(\max C_L = 30.\) The corresponding pairs of electrical efforts \(h = C_L / \gamma C_{inv} = C_L / 3 \gamma\) and delays are:

\(\gamma\) | \(C_L\) | \(h\) | \(d\) | |
---|---|---|---|---|

1 | 1 | 1 | 0.33 | 1.33 |

2 | 1 | 30 | 10 | 11 |

3 | 10 | 1 | 0.03 | 1.03 |

4 | 10 | 30 | 1 | 2 |

The \(d(h)\)-diagram below shows the four delay points on the inverter line.

Compare the delay of the matched NAND gate with load \(C_L = 18\) capacitive units to that of a scaled NAND gate with \(\gamma = 6.\)

- Compute the delay of the matched NAND gate (\(\gamma = 1\)).
- Compute the delay of the scaled NAND gate for \(\gamma = 6.\)
- Draw the delay points of (a) and (b) in a \(d(h)\)-diagram as shown in Figure 1.45.

The delay of a matched 2-input NAND gate is \(d = g_{nand2} h + p_{nand2},\) where \(g_{nand2} = 4/3\) and \(p_{nand2} = 2.\) The electrical effort of the NAND gate with load \(C_L =18\) is \(h = C_L / C_{in}(nand2).\) We deduce the input capacitance of the matched NAND gate given that we recall logical effort \(g_{nand2} = C_{in}(nand2)/C_{inv} = 4/3.\) Rearranging the definition of \(g_{nand2},\) we find \(C_{in}(nand2) = g_{nand2} C_{inv} = 4/3 \cdot 3 = 4\) capacitive units. Now, for given load \(C_L = 18\) we obtain \(h = 18/4 = 9/2,\) and the delay of the matched 2-input NAND gate is

time units.

Scaling the NAND gate with scale factor \(\gamma = 6\) yields a normalized input capacitance of \(C_{in}(nand2) = \gamma \cdot 4 = 24\) capacitive units. Correspondingly, the electrical effort decreases by a factor of 6 to \(h = C_L / C_{in}(nand2) = 18 / 24 = 3/4.\) Since scaling affects neither the logical effort nor the parasitic delay of a gate, the delay of the scaled NAND gate driving load \(C_L = 18\) is

time units, which is almost three times faster than the matched NAND gate.

The \(d(h)\)-diagram below shows delay point 1, \((d,h) = (9/2, 8),\) and delay point 2, \((3/4, 3).\) Both points lie on the delay line of the 2-input NAND gate.

## 3.2. Delay of Two-Stage Paths¶

In this section we study the delay of a two-stage path with two gates in series. Experiment 3.3 examines an inverter pair.

Consider the path below with back-to-back inverters driving a load capacitance. You can vary the size of the inverters and the load with the sliders below the schematics, and inspect the associated circuit delay.

- Vary the size of the stage-2 inverter. Notice that the delay increases towards both small and large sizes. The minimum delay of 8.33 units occurs at scale factor \(\gamma_2 = 3\) for inverter 2, assuming inverter 1 is sized like a reference inverter, \(\gamma_1 = 1,\) and the load is \(C_L = 30\) capacitive units.
- Double the size of inverter 1, \(\gamma_1 = 2,\) and vary the size of inverter 2. The qualitative behavior is the same as in experiment 1: The minimum delay of 6.5 units occurs at scale factors \(\gamma_2 = 4\) and \(\gamma_2 = 5\) for inverter 2.
- Increase the load to \(C_L = 60\) capacitive units, and vary the size of inverter 2. The qualitative behavior is the same as in experiment 1. The minimum delay is 8.33 units, if inverter 1 is scaled by \(\gamma_2 = 2.\)
- Increase the size of inverter 1 to scale factor \(\gamma_1 = 10.\) Now, the delay decreases monotonically with increasing size of inverter 2. The minimum delay with a load of size 60 is 5 time units.

We observe that the delay is sensitive to the size of inverter 2. There exists a minimum delay, if the size of inverter 2 is chosen carefully, neither too small nor too large.

Inverter 1: 1 Inverter 2: 3

Load: 30 Delay: 8.33

In the following we study the delay of the generic path in Figure 3.1 with two gates in series, gate 1 and gate 2, and gate 2 drives capacitive load \(C_L.\) Gate 1 has logical effort \(g_1,\) parasitic delay \(p_1,\) and is scaled such that its input capacitance is \(C_1.\) Analogously, gate 2 has logical effort \(g_2,\) parasitic delay \(p_2,\) and input capacitance \(C_2.\)

The delay of the path is the sum of the delays of each gate. The input capacitance of gate 2 is the load capacitance of gate 1. Thus, gate 1 has electrical effort \(h_1 = C_2/C_1\) and delay

Gate 2 has electrical effort \(h_2 = C_L/C_2\) and delay

The **path delay** is the sum of the gate delays:

Writing path delay \(D\) in terms of stage efforts \(f_1\) and \(f_2\) yields

which has the same form as the delay of a single gate, i.e. the path
delay is the sum of the **path effort F** and **path parasitic delay
P**.

We wish to minimize path delay \(D,\) assuming that we can resize gate 2. If the scale factor for gate 2 is our only design parameter, then input capacitance \(C_2\) is the only free variable in \(D.\) In particular, since parasitic delay \(P\) remains constant under sizing, minimizing \(D\) requires minimizing path effort \(F = f_1 + f_2.\)

Notice that the product of the stage efforts

is independent of \(C_2.\) Therefore, the geometric mean of the
stage efforts \(\sqrt{f_1\,f_2}\) remains constant under changes
of \(C_2.\) Since the logical efforts and capacitances of a
circuit are positive numbers, we can apply the *Theorem of
Arithmetic and Geometric Means* to minimize the
path effort as follows. The geometric mean of the stage efforts
cannot be greater than their arithmetic mean

and equality holds if and only if the stage efforts are equal,
\(f_1 = f_2.\) Furthermore, according to the *corollary of
the minimium arithmetic mean*, the arithmetic mean
assumes its minimum for equal stage efforts, if the geometric mean is
constant. Therefore, we conclude that **the path effort is minimal if
the stage efforts are equal**:

This is the crucial insight that enables us to minimize the path delay by sizing gate 2. Gate 2 must have an input capacitance \(C_2\) that yields equal stage efforts:

With this choice of \(C_2,\) let \(\hat{f}\) be the **stage
effort that minimizes the path effort**, i.e.

Then, we can write the minimal path effort as:

We introduce the **path logical effort G** \(= g_1 g_2\) and the **path
electrical effort H** \(= h_1 h_2 = C_L/C_1,\) such that \(\hat{f} =
\sqrt{G H}.\) Then, we can write the minimum path delay \(\hat{D}
= 2 \hat{f} + P\) of the two-stage path as:

Path logical effort \(G\) and path electrical effort \(H\) are easily identified in a given circuit. Path logical effort \(G = g_1 g_2\) is the product of the logical efforts of the gates on the path. Path electrical effort \(H = C_L/C_1\) is the fanout of the path, i.e. the ratio of the load capacitance and the input capacitance of the path. The path electrical effort is also the product of the stage electrical efforts

Capacitance \(C_2\) serves as both input and load capacitance and cancels out. Thus, to minimize the path delay of a two-stage path, all we need to do is identify \(G\) and \(H,\) because \(\hat{f} = \sqrt{GH}.\)

In summary, given a two-stage circuit with load capacitance \(C_L\) and gate 1 with input capacitance \(C_1,\) we may size gate 2 to minimize the path delay. Stage effort \(\hat{f} = \sqrt{GH}\) determines input capacitance \(C_2\) of gate 2.

If we wish to know the scale factor for gate 2, we recall two facts
from the *Model of Logical Effort*: (1) the logical
effort of a gate is the ratio of the input capacitance of the
*matched gate* and the reference inverter with
input capacitance \(C_{inv} = 3\):

and (2) the input capacitance of a sized gate is scale factor \(\gamma\) times the input capacitance of the matched gate:

These basic facts determine scale factor \(\gamma\):

In case of gate 2 of the two-stage path, given \(C_2\) we obtain scale factor \(\gamma_2\):

The following examples illustrate the design of two-stage paths.

We analyze the path delay of the inverter pair in Experiment 3.3. In terms of the generic two-stage path in Figure 3.1 both gates are inverters with identical logical efforts, \(g_1 = g_2 = g_{inv} = 1,\) and identical parasitic delays, \(p_1 = p_2 = p_{inv} = 1.\) The input capacitance of a sized inverter is \(C_{in} = \gamma C_{inv} = 3 \gamma.\) The scale factors of the inverters, \(\gamma_1\) and \(\gamma_2,\) are determined by the position of the sliders in Experiment 3.3. Therefore, the input capacitances are \(C_1 = 3 \gamma_1\) and \(C_2 = 3 \gamma_2.\) Furthermore, load capacitance \(C_L\) can be adjusted with the slider in Experiment 3.3.

Given input capacitance \(C_1\) and load capacitance \(C_L,\) we wish to minimize the delay of the path. Path delay \(D\) is minimal if the stage efforts are equal to \(\hat{f} = \sqrt{G H}.\) Path logical effort \(G\) is the product of the logical efforts of the inverters:

Path electrical effort \(H\) is the ratio of load capacitance \(C_L\) and path input capacitance \(C_1\):

We conclude that

Thus, we minimize the path delay by choosing \(C_2\) such that both stage efforts are equal to \(\hat{f}.\) We have several equations available to calculate \(C_2.\) First, there is

Second, we may use the fact that stage effort \(f_2\) must be equal to \(\hat{f}\):

and, third, stage effort \(f_1\) must be equal to \(\hat{f}.\) All three equations yield the same result for \(C_2.\) With this choice of \(C_2,\) and path parasitic delay \(P = p_1 + p_2 = 2,\) the minimum path delay is

Hence, given \(\gamma_1\) and \(C_L,\) we minimize the path delay by adjusting scale factor \(\gamma_2\) such that

This knowledge takes the guesswork out of Experiment 3.3. Instead, we can predict the delay and design the circuit for minimum delay. For example, given \(\gamma_1 = 1\) and \(C_L = 48,\) we choose \(\gamma_2 = 4\) to obtain minimum path delay \(\hat{D} = 10.\) As another example, consider \(\gamma_1 = 10\) and \(C_L = 60.\) We would obtain minimum path delay \(\hat{D} \approx 4.83\) with \(\gamma_2 = \sqrt{200} \approx 14.14.\) Thus, the minimum path delay lies outside the slider range for inverter 2.

Consider Experiment 3.3 with fixed load \(C_L = 21.\) Then, the path delay is:

If both inverters are reference inverters, i.e. \(\gamma_1 = \gamma_2 = 1,\) then the stage efforts of the inverters are

and the path delay is

We find that the stage efforts are unbalanced, with the second inverter bearing seven times the effort of the first inverter. If we increase the size of the second inverter to bear stage effort \(f_2 = 1,\) then scale factor \(\gamma_2\) must be \(\gamma_2 = 7 / f_2 = 7.\) Since \(C_2\) is the load capacitance of the first inverter, changing \(\gamma_2\) affects stage effort \(f_1.\) Thus, resizing the second inverter to have \(\gamma_2 = 7\) results in stage efforts

and path delay

The delay is the same, but the burden of the larger stage effort has shifted from stage 2 to stage 1. If we keep \(\gamma_1 = 1\) constant, then we can express path delay \(D\) as a function of scale factor \(\gamma_2\):

The plot \(D(\gamma_2)\) in Figure 3.2 shows that the path delay assumes its minimum if the stage efforts are equal, i.e. for \(\gamma_2 = \sqrt{7}.\)

Minimize the delay of the 2-stage path using calculus.

- Derive the stage delay for each stage.
- Express path delay \(D\) as a function of scale factor
\(\gamma\) and minimize \(D(\gamma)\) using
*calculus*. - Compute minimum path delay \(\hat{D}\) and the associated scale factor \(\gamma\) for the NOR gate.

The 2-stage path has a 2-input NAND gate in stage 1 and a 2-input
NOR gate in stage 2. According to the *Model of Logical Effort*, the
NAND gate incurs delay

and the NOR gate has delay

The normalized input capacitances depend on the scale factors. The NAND gate is scaled by factor 2, such that \(C_{in}(nand2) = 2 \cdot 4 = 8,\) because the normalized input capacitance of the matched 2-input NAND gate is 4. Scale factor \(\gamma\) of the NOR gate is variable. Since the matched 2-input NOR gate has a normalized input capacitance of 5 units, we have \(C_{in}(nor2) = 5 \gamma.\) Given load capacitance \(C_L = 10,\) the stage delays are

and

The path delay is the sum of the stage delays:

and is a function of \(\gamma.\) *Calculus* tells
us that function \(D(\gamma)\) has a minimum if the derivative is zero. The
derivative of path delay \(D(\gamma)\) is

Setting the derivative to zero yields \(\gamma\):

Since the scale factor for a CMOS gate must be positive, we find \(\gamma = 2.\) The minimum path delay is

time units. Strictly speaking, we should also inspect the second derivative to be sure that \(\gamma = 2\) yields a minimum rather than a maximum.

As circuit designers with a choice to size the NOR gate, we minimize the delay of the 2-stage path by scaling the NOR gate with factor \(\gamma = 2,\) and obtain a minimum path delay of \(\hat{D} = 22/3\) time units.

Use the method of logical effort to minimize the delay of the 2-stage path in Exercise 3.4.

- Determine path logical effort \(G,\) path electrical effort \(H,\) and path parasitic delay \(P.\)
- Calculate best stage effort \(\hat{f} = \sqrt{G H}.\)
- Calculate minimum path delay \(\hat{D} = 2 \hat{f} + P\) and the associated scale factor \(\gamma\) of the NOR gate.

We begin by identifying the characteristic metrics of the 2-stage path. Path logical effort \(G\) is the product of the gate logical efforts:

Path electrical effort \(H\) is the ratio of the load capacitance of the path and the path input capacitance:

Path parasitic delay \(P\) is the sum of the gate parasitic delays:

The best stage effort that minimizes the path delay is

When each stage of the 2-stage path bears stage effort \(\hat{f}\) the minimum path delay is

To determine scale factor \(\gamma\) of the NOR gate we exploit the fact that the normalized input capacitance of the scaled NOR gate is \(C_2 = \gamma C_{in}(nor2),\) where \(C_{in}(nor2) = 5\) is the input capacitance of the matched NOR gate, and that best stage effort \(\hat{f} = g_2 h_2\):

Alternatively, we could exploit the fact that stage 1 of the circuit, the NAND gate, must bear the best stage effort as well, i.e. \(\hat{f} = g_1 h_1\) to minimize the path delay.

## 3.3. Multistage Paths¶

An \(n\)-stage path provides more than just a straightforward generalization of the two-stage path to \(n \ge 1\) stages. The number of stages of a path can serve as additional design parameter for a circuit, because the delay of a path is not only a function of the gate sizes but is also a function of the number of stages. In this section we study the path sizing problem of how to choose the number of stages together with the gate sizing problem in order to minimize its delay.

### 3.3.1. Gate Sizing¶

The **gate sizing problem** determines the scale factors for the gates
on an \(n\)-stage path in order to minimize its delay.
Figure 3.3 shows a generic path with \(n\) gates in
series and load capacitance \(C_L.\) In stage \(i,\) \(1
\le i \le n,\) logic gate \(i\) has input capacitance \(C_i,\)
logical effort \(g_i,\) electrical effort \(h_i,\) and
parasitic delay \(p_i.\)

The path delay of the \(n\)-stage path is the sum of the stage delays \(d_i\):

The path logical effort of the \(n\)-stage path is

the path electrical effort is

and the path parasitic delay is

Our goal is to minimize path delay \(D = \sum_{i=1}^n (g_i h_i) +
P\) under the constraint of a constant path electrical effort \(H
= C_L/C_1.\) Observe that constant \(H\) implies that path effort
\(F = G H\) is constant, because all logical efforts \(g_i\)
and, hence, \(G\) are constant. We invoke the *Theorem of
Arithmetic and Geometric Means* to minimize path
effort \(F = \sum_{i=1}^n f_i = \sum_{i=1}^n g_i h_i.\) Given a
constant geometric mean \(\sqrt[n]{G H} = \sqrt[n]{f_1 f_2 \cdots
f_n},\) the arithmetic mean of the stage efforts is minimized, and is
equal to the geometric mean if all stage efforts are equal:

Therefore, the minimum path effort is

Since path parasitic delay \(P\) is constant for a given path, we conclude that the path delay of an \(n\)-stage path assumes its minimum

if and only if all stages bear the same effort \(\hat{f},\) equal to the \(n^{th}\) root of path effort \(F\):

We perform gate sizing to minimize the delay of the inverter chain in Experiment 3.1. The path consists of \(n=4\) inverters, and the stage-1 inverter is a reference inverter, that is \(C_1 = C_{inv} = 3.\)

Given load \(C_L,\) the path electrical effort of the inverter chain is \(H = C_L/C_1 = C_L/3.\) The path logical effort is \(G = 1,\) because the logical effort of an inverter is \(g_{inv} = 1\) and \(G = g_{inv}^4.\) Furthermore, since the parasitic delay of an inverter is \(p_{inv}=1,\) the path parasitic delay is \(P=4 p_{inv} = 4.\) Thus, the path delay of the 4-stage path is:

To minimize the path delay, all stage efforts must be equal. Since all gates on the path are inverters with \(g_{inv} =1,\) the stage effort equals the electrical effort \(f_i = g_i h_i = h_i.\) Thus, all stage electrical efforts must be equal, and must be equal to

This criterion enables us to determine the scale factors for each stage either by working from front to tail of the chain or vice versa. Let’s work backwards starting with stage 4:

The scale factor for the stage 4 inverter derives from \(C_4 = \gamma_4 C_{inv}\) as

Knowing \(C_4,\) we can size stage 3:

and the corresponding scale factor is:

Analogously, we obtain for stage 2:

with scale factor

The stage-1 inverter is a reference inverter with scale factor is \(\gamma_1 = 1.\)

Since \(C_L/3 = H,\) we can list the scale factors emphasizing their common form:

and see that the scale factor for stage \(i\) for \(i > 0\) is

The scale factors form a geometric progression. This observation
generalizes to arbitrary \(n\)-stage inverter chains. We
conclude that **minimizing the path delay of an inverter chain
requires that all inverters must bear equal stage effort, which we
accomplish by growing the inverter sizes geometrically**.

With the inverter sizes determined above, the minimum path delay of the 4-stage path is according to Equation (2):

For example, given \(C_L = 1200,\) we obtain minimum delay \(\hat{D} = 21.9\) by scaling the inverters with factors \(\gamma_2 = \sqrt[4]{400}^1 = 4.47,\) \(\gamma_3 = \sqrt[4]{400}^2 = 20,\) and \(\gamma_4 = \sqrt[4]{400}^3 = 89.4.\) The sliders in Experiment 3.1 do not support fractions. Thus, we may round the scale factors to \(\gamma_2 = 4,\) \(\gamma_3 = 20,\) and \(\gamma_4 = 89.\) To calculate the resulting path delay, we resort to the sum of the stage delays:

We note that rounding causes unequal stage efforts in range \(4 \le f_i \le 5.\) However, the differences are small enough to slow down the circuit by \(0.18\,\%\) only.

Consider a circuit for \(Y = A B + D.\) The AND gate is implemented with a NAND gate followed by an inverter, and the OR gate is implemented with a NOR gate followed by an inverter. The NAND gate shall be sized like a matched NAND gate and the load driven by output \(Y\) shall be \(C_L = 600\) capacitive units. Perform gate sizing to minimize the delay of the path.

We observe that the paths \(A \rightarrow Y\) and \(B
\rightarrow Y\) are equal, whereas path \(D \rightarrow Y\)
passes through the NOR gate and the NOR inverter only. Thus, we
assume that the **path of interest** is \(A \rightarrow Y,\)
because it incurs the largest delay. Path \(B \rightarrow Y\)
has the same delay, and the delay of \(D \rightarrow Y\) is
smaller.

Our plan is to determine the minimum path delay, and then derive the gate sizes from the associated stage effort. To simplify the notation, we enumerate the gates on the path: the NAND gate is gate 1, the inverter gate 2, the NOR gate is gate 3, and the second inverter is gate 4. The path consists of \(n=4\) gates. Therefore, the minimum path delay is

To determine path effort \(F = G H,\) we need to determine path logical effort \(G\) and path electrical effort \(H.\) The path electrical effort is the ratio of load capacitance \(C_L\) and input capacitance \(C_{in}(A).\) The input capacitance of the matched NAND gate is \(C_1 = C_{in}(A) = 4\) units, cf. Figure 1.48. Thus, we find

The path logical effort is the product of the gate logical efforts:

Thus, the path effort is \(F = G H = 1000/3 = 333.3.\) Together with path parasitic delay

we obtain a minimum path delay of \(\hat{D} = 4 \cdot 333.3^{1/4} + 6 = 23.1\) units.

To achieve minimum path delay, all four stages must bear the same stage effort \(\hat{f} = F^{1/4} = 4.27.\) This constraint enables us to determine the gate sizes. The NAND gate in the first stage is a matched gate with scale factor \(\gamma_1 = 1,\) as required by the problem formulation. The stage effort of the NAND gate is

because the matched NAND gate has input capacitance \(C_1 = 4.\) For minimum path delay, the stage effort must be \(f_1 = \hat{f},\) such that \(C_2 = 3\,\hat{f}.\) Thus, the scale factor for the stage-2 inverter must be

because the reference inverter has input capacitance \(C_{inv} = 3.\) The stage effort of the inverter is

Since minimum path delay requires \(f_2 = \hat{f},\) the input capacitance of the NOR gate is \(C_3 = 3 \hat{f}^2.\) Because the matched NOR gate has input capacitance \(C_{in}(nor) = 3 g_{nor},\) the scale factor of the NOR gate in stage 3 must be

The stage effort of the NOR gate is

Because the NOR gate must bear stage effort \(f_3 = \hat{f},\) we obtain the input capacitance of the stage-4 inverter, \(C_4 = (9/5) \hat{f}^3.\) The scale factor of the stage-4 inverter must be

Like all other gates, the stage-4 inverter must bear stage effort \(\hat{f}.\) Because \(f_4 = g_{inv} h_4 = C_L/C_4,\) we can check our arithmetic:

which is equal to 600 within the accuracy of the arithmetic precision. Had we used additional fractional digits for the value of \(\hat{f},\) e.g. \(\hat{f} = 4.273,\) we would have obtained a more accurate arithmetic result.

Note that the scale factors do not form a geometric progression as for inverter chains with minimum path delay, see Example 3.3, because the logic path includes logical efforts. However, we may recognize the sequence as a geometric progression, weighted with logical efforts:

We would have arrived at the same scale factors by working backwards through the path, starting with the stage-4 inverter. We leave it as an exercise to double check that these scale factors yield the minimum path delay by first computing the individual stage delays \(d_i = g_i h_i + p_i,\) and then taking their sum \(\hat{D} = \sum_{i=1}^4 d_i.\)

We wish to drive a load capacitance of 1200 units, either with (a) a single buffer or (b) two back-to-back buffers. In both circuits the stage-1 inverter shall be a reference inverter. The remaining inverters may be sized for minimum delay.

Intuitively, we may prefer the shorter 2-stage path of option (a), expecting that two inverters have a smaller path delay than four. This is not always the case, however.

In Example 3.3 we minimize the delay of the 4-stage path in option (b). The smallest possible delay of the 4-inverter chain is \(\hat{D}_4 = 21.9\) delay units. The inverter pair of option (a) constitutes a 2-stage path. According to Equation (2), the minimum path delay with load \(C_L = 1200\) and input capacitance \(C_1 = 3,\) i.e. \(F = H = C_L/C_1 = 400,\) is

delay units. We find that the minimum delay of the 2-stage path \(\hat{D}_2 = 42\) is almost two times larger than the minimum delay of the 4-stage path \(\hat{D}_4 = 21.9.\) The next section on path sizing offers the background that puts this result into perspective.

Minimize the delay of the 3-stage path using the method of logical effort.

- Compute minimum path delay \(\hat{D}.\)
- Compute the scale factors \(\gamma_{nand}\) and \(\gamma_{nor}.\)

The 3-stage path has path logical effort

path electrical effort

and path parasitic delay

Therefore, the minimum path delay is

We obtain the minimum path delay by scaling the NAND and NOR gates such that each stage bears stage effort \(\hat{f} = \sqrt[3]{G H} = 5.\) Working backwards, we start with the scale factor for the NOR gate in stage 3. Since \(f_3 = g_3 h_3\) must be equal to \(\hat{f}\) and the matched 4-input NOR gate has input capacitance \(C_{in}(nor4) = 3 g_{nor4},\) we find:

Next, the 3-input NAND gate in stage 2 must also bear stage effort \(\hat{f} = f_2 = g_2 h_2.\) Since the matched 3-input NAND gate has input capacitance \(C_{in}(nand3) = 5,\) we obtain:

We find that scale factors \(\gamma_{nand} = 3\) and \(\gamma_{nor} = 5\) minimize the delay of the path.

If in doubt, there are two ways to double check our gate sizing solution: (1) stage 1 must also bear stage effort \(\hat{f},\) and (2) the sum of the stage delays must be equal to \(\hat{D}.\) Let’s check the stage effort \(f_1 = g_1 h_1\) of the reference inverter in stage 1 first:

We find that \(f_1 = 5 = \hat{f},\) indeed, which verifies our arithmetic.

The second way of double checking our solution is to compute the delay of each gate on the path given the scale factors. We start with the inverter in stage 1:

time units. The NAND gate in stage 2 with scale factor \(\gamma_{nand} = 3\) has a delay of

time units, and the NOR gate in stage 3 with scale factor \(\gamma_{nor} = 5\) has a delay of

time units. The path delay is sum of the gate delays

and is equal to the minimum path delay \(\hat{D}\) calculated above, which confirms our solution.

### 3.3.2. Path Sizing¶

The **path sizing problem** determines the best number of stages for a
path in order to minimize its delay. As a concrete example, consider
the inverter chain with \(n\) inverters and load \(C_L\)
shown in Figure 3.4.

Given the number of stages \(n\) and path electrical effort \(H=C_L/C_1,\) we can minimize the path delay through gate sizing. According to Equation (2), the minimum path delay of a chain with \(n\) inverters is

Figure 3.5 plots the minimum path delay as a function of
the number of stages \(n\) for fixed path electrical effort
\(H = 400.\) Note that the minimum path delay \(\hat{D}(n)\)
assumes the minimum \(\hat{D}(n^\star) = 21.57\) for a number of
stages \(n^\star = 5.\) Whereas gate sizing enables us to find a
*local minimum* of the path delay for a given number of stages
\(n,\) finding the *global minimum* of the path delay requires
optimizing the number of stages \(n.\)

Given path electrical effort \(H,\) we determine the best number of stages \(n^\star\) by minimizing \(\hat{D}(n).\) We determine the minimum using calculus by deriving \(\hat{D}\) w.r.t. \(n\) for fixed \(H\):

Function \(\hat{D}(n)\) has a minimum where its derivative is
zero. Call the **best stage effort** \(f^\star = H^{1/n^\star}\)
the stage effort with path electrical effort \(H\) when using the
best number of stages \(n^\star.\) This is the stage effort where
\(\hat{D}(n)\) is minimal, i.e. where derivative \(d
\hat{D}/d n\) equals zero:

We have arrived at an equation for best stage effort \(f^\star\) that has no closed-form solution. However, we can compute \(f^\star\) numerically, for example with Newton’s method for finding zeros. In case of Equation (3), we want to find \(x\) such that function \(g(x)= x (\ln x - 1) - 1\) is zero. With derivative \(g'(x) = \ln x,\) we obtain the Newton iteration with iteration index \(i\):

Iterating with a suitable starting value, e.g. \(x_0 = 4,\) produces result \(x \approx 3.59.\) Thus, Equation (3) holds for best stage effort

which enables us to derive the best number of stages

because \(H = {f^\star}^{n^\star}\) by definition and \(\ln H = n^\star \ln f^\star.\)

This result is very useful. First, since each inverter on the path must bear best stage effort \(f^\star = 3.59,\) we know how to derive the scale factors for sizing each of the inverters in geometric progression. Second, given a path with electrical effort \(H,\) we find the best number of stages by taking the logarithm of \(H\) to base 3.59. Table 3.1 lists path efforts and the corresponding best number of stages. The numbers are rounded for ease of reading. For more accurate numerical results, use the path sizing calculator below the table. Table 3.1 is suited to look up the best number of stages for a given path effort \(F,\) which equals the path electrical effort \(H\) in an inverter chain with \(G=1.\) For example, using one stage is best for path efforts \(F\) up to 5.8. For efforts in range \(5.8 \le F \le 22.3,\) a 2-stage design is best, and so on. Path effort \(F=400\) of Example 3.5 lies between 300 and 1090, where \(n^\star = 5\) stages yield the fastest inverter chain.

path effort \(F\) | stages \(n^\star\) | delay \(\hat{D}(n^\star)\) | stage effort \(\hat{f}\) |
---|---|---|---|

0 | 1.0 | ||

1 | 0-5.8 | ||

5.8 | 6.8 | ||

2 | 2.4-4.7 | ||

22.3 | 11.4 | ||

3 | 2.8-4.4 | ||

82.2 | 16.0 | ||

4 | 3.0-4.2 | ||

300 | 20.6 | ||

5 | 3.1-4.1 | ||

1090 | 25.3 | ||

6 | 3.2-4.0 | ||

3920 | 29.8 | ||

7 | 3.3-3.9 | ||

14200 | 34.4 | ||

8 | 3.3-3.9 | ||

51000 | 39.0 | ||

9 | 3.3-3.9 | ||

184000 | 43.6 | ||

10 | 3.4-3.8 | ||

661000 | 48.2 |

Table 3.1 also includes the minimum path delay \(\hat{D}(n^\star)\) for the given path efforts. For example, given \(F=300,\) a 4-stage path yields \(\hat{D}(4) = 4 \sqrt[4]{300} + 4 = 20.6,\) as does a 5-stage path, \(\hat{D}(5) = 5 \sqrt[5]{300}+ 5 = 20.6.\) Furthermore, the right-most column lists the corresponding stage efforts \(\hat{f} = F^{1/n^\star}.\) For example, consider 5-stage inverter chains, which yield the smallest delay for path efforts in range \(300 \le F \le 1090.\) At the lower end, for \(F=300,\) the stage effort with minimum delay is \(\hat{f} = \sqrt[5]{300} = 3.1\) delay units, and at the upper end for \(F=1090,\) the stage effort is \(\hat{f} = \sqrt[5]{1090} = 4.1\) delay units. The best stage effort \(f^\star = 3.59\) of a 5-stage inverter chain produces the minimum delay for path effort \(F = {f^\star}^5 = 596.\)

We design an inverter chain with input capacitance \(C_{in} = 3\) and load capacitance \(C_L = 500\) with minimum delay.

The path electrical effort of the chain is

Since the path logical effort of an inverter chain is \(G = 1,\) the path effort is \(F = G H = 166.7.\) The best number of stages is

Stage effort

happens to be equal to best stage effort \(f^\star.\) The delay of the 4-stage path is

if the inverters are sized in geometric progression. The stage-1 inverter must be a reference inverter, because the path input capacitance \(C_{in} = 3\) is given. Thus, the scale factor of the inverter in stage \(i\) is \(\hat{f}^{i-1},\) such that:

The path sizing calculator and Experiment 3.1 enable you to verify this result.

Determine the number of stages for an inverter chain with path electrical effort \(H = 24\) in order to minimize the delay.

According to Equation (5), the best number of stages is

which is a real number rather than an integer. Since the number of inverters must be an integer, we might decide to round 2.4864 to the nearest integer 2. However, Table 3.1 tells us that we should use 3 stages. The reason is that the delay of a 3-stage path is less than the delay of a 2-stage path for \(H=24\):

Although such a small difference between the delays rarely matters in practice, the lesson to be learned is that rounding \(n^\star\) may not yield the best solution. Instead, to find the minimum delay you may want to check the delay for both numbers of stages the floor and the ceiling of \(n^\star.\)

Consider the circuit in Example 3.4 with a larger capacitive load of \(C_L = 3240\) units. We can increase the path length of a circuit by inserting buffers (inverter pairs) between its output and the load. Perform path sizing and determine the minimum path delay of the circuit.

Path \(A \rightarrow Y\) of the circuit in Example 3.4 has a matched NAND gate with input capacitance \(C_1 = 4\) in stage 1. Therefore, the path must bear electrical effort \(H = C_L/C_1 = 810.\) Since an inverter has logical effort \(g_{inv}=1,\) the circuit has path electrical effort \(G = g_{nand} \cdot g_{nor} = 20/9,\) no matter how many inverters we insert on the path. Since the path electrical effort is also independent of the number of additional inverter stages, the extended path must bear path effort \(F = G H = 1800.\) The path sizing calculator prescribes using 6 stages, or inserting two inverters at output \(Y.\)

The minimum delay of path \(A \rightarrow Y\) is \(\hat{D} = 6 F^{1/6} + P = 6 \cdot 1800^{1/6} + 8 = 28.927,\) which exceeds the minimum delay of a 6-stage inverter chain, given by the path sizing calculator, by 2 parasitic delay units due to the NAND and NOR gates.

We investigate the contribution of parasitic delay to the total path delay of inverter chains.

Consider \(n\)-stage inverter chains with path efforts that are powers of the best stage effort:

The corresponding minimum path delay is

The second summand represents path parasitic delay \(P = n p_{inv} = n\) because \(p_{inv} = 1.\) Thus, the percentage of parasitic delay of an \(n\)-stage inverter chain is

independent of the number of stages.

If the path effort differs from the powers of the best stage effort, yet we use the best number of stages according to Table 3.1, then the percentage of the path parasitic delay is largest at the lower end of the path effort range and smallest at the upper end. Table 3.2 lists the percentages of path parasitic delay relative to the total delay for the \(1 \le n \le 5\) for the path efforts at the boundaries of the ranges for which the number of stages is best. For example, \(n=3\) stages are best for \(22.3 \le F \le 82.2.\) At the lower end of the range, where \(F=22.3,\) the path parasitic delay of 3 units contributes \(26.3\,\%\) to the minimum total delay of 11.4 delay units. At the upper end, for \(F=82.2,\) the parasitics amount to only \(18.7\,\%\) of the the minimum total delay of 16.0 units.

n\F | 0 | 5.8 | 22.3 | 82.2 | 300 | 1090 |
---|---|---|---|---|---|---|

1 | 100 | 14.7 | ||||

2 | 29.3 | 17.5 | ||||

3 | 26.2 | 18.7 | ||||

4 | 24.9 | 19.4 | ||||

5 | 24.2 | 19.8 |

Minimize the delay of the inverter chain by path sizing.

- Compute \(\hat{f}\) of the 3-stage inverter chain.
- Determine the best number of stages \(n^\star.\)
- Compute the minimum path delay \(\hat{D}\) of the path-sized inverter chain.

The stage effort of the 3-stage inverter chain with a reference inverter in stage 1 and path electrical effort

is

We observe that this stage effort exceeds the best stage effort \(f^\star = 3.59\) by more than a factor of 2. This is a strong indication that more than three stages are needed to minimize the path delay. In fact, the best number of stages is

Table 3.1 yields the same result. Using \(n^\star = 5\) stages, the minimum path delay is

time units. For comparison, the minimum path delay with \(n=3\) stages would be \(\hat{D}(3) = 3\,H^{1/3} + 3\,p_{inv} = 28.3\) time units.

## 3.4. Branching Effort¶

So far, we have learned how to minimize the delay of a circuit path
that drives a given load. We have assumed that the load is the
equivalent load capacitance of one or more gates. Even if the load
consists of a single logic gate only, the last stage of the path fans
out to drive the transistor gate terminals of the logic gate, such
as the nMOS and pMOS transistor gates in case of an inverter. In many
circuits, fanout occurs not only at the end of a path, but anywhere
along a path. When a gate in stage \(k\) on a path drives more
than one gate in stage \(k+1,\) we say that the path **branches**.
At the first glance, branching complicates the delay minimization of a
circuit, as Experiment 3.4 illustrates. However, the method of
logical effort extends seamlessly to branching circuits.

Consider the 3-stage inverter chain below with branches after stage 1 and stage 2. The off-path inverters above and below the inverter chain are assumed to drive additional loads that are not shown, because they do not affect the delay of the chain.

The stage-1 inverter is a reference inverter. Choose the size of the off-path inverters and the load, and try sizing the on-path inverters with the goal to minimize the path delay of the 3-stage inverter chain.

Inverter 2: 1 Inverter 4: 1

Upper off-path inverter 3: 1 Lower off-path inverter 5: 1

Load: 300 Delay: 107

To understand the effect of branching on the delay of a circuit path, consider the annotated circuit in Figure 3.6. Our path of interest is \(A \rightarrow Y.\) We examine the branch at the output of stage-1 inverter 1 to off-path inverter 3.

Assuming off-path inverter 3 were absent, then inverter 1 would invest
its entire output current \(i_1\) to drive inverter 2 on the path
of interest. In contrast, in the presence of off-path inverter 3,
current \(i_3\) is diverted off the path of interest. Now,
current \(i_2\) drives on-path inverter 2, which, by *KCL*, is:

Drive current \(i_2\) is smaller than output current \(i_1\) of inverter 1. Since input capacitances \(C_2\) of on-path inverter 2 and \(C_3\) of off-path inverter 3 are composed in parallel, the equivalent capacitive load of inverter 1 is their sum \(C_2 + C_3.\) The capacitors form a current divider. If \(i_1\) drives load \(C_2 + C_3,\) then \(i_2\) must be

The larger off-path capacitance \(C_3\) is, the less drive current is available for the path of interest, and the larger is its delay.

In the method of logical effort, we account for the reduced drive
current on the path of interest by introducing the **branching effort b**
of a logic stage:

Note that \(b \ge 1,\) and \(b = 1\) if \(C_\text{off-path} = 0,\) i.e. when there is no branching. In case of the stage-1 branch in Figure 3.6, we have \(C_\text{on-path} = C_2\) and \(C_\text{off-path} = C_3,\) so that the branching effort at the output of stage 1 of the path is:

Branching effort \(b_1\) enables us to rewrite Equation (6) as \(i_2 = i_1 / b_1.\) The current dividing branch directs only one \(b_1^{th}\) of output current \(i_1\) of inverter 1 to drive inverter 2 on the path of interest. Less drive current on the path of interest increases the effort delay of the stage. Since a stage with branching drives both on-path and off-path load capacitances, the stage effort with a branch is

rather than \(f = g h,\) because

accounts for the actual capacitive load of the stage including off-path load, and assuming that \(h\) captures the on-path electrical effort only. \(C_\text{in}\) is the input capacitance of the logic gate that drives the branch.

For \(n\)-stage paths, we define the **path branching effort B**
as the product of the stage branching efforts:

where \(b_i\) is the branching effort of stage \(i.\)
Furthermore, we generalize our definition of **path effort F** to
include off-path branching:

where \(G\) is the path logical effort and \(H\) the path electrical effort on the path of interest, as before. On a path without branching, we have \(B = 1,\) and \(F\) retains the meaning of our original definition \(F = G H.\) The minimum path delay of an \(n\)-stage path with branching retains its form \(\hat{D} = n F^{1/n} + P = n (G B H)^{1/n} + P,\) but includes the branching effort in the generalized path effort term.

The concept of branching effort enables us to determine the minimum delay of a path with branching almost as easily as for a path without branching, provided the on-path and off-path capacitances can be scaled in a correlated fashion. Consider 3-stage path \(A \rightarrow Y\) in Figure 3.6 again. Without the off-path inverters, the path effort would be \(F = G H,\) where \(G = 1,\) since the path consists of inverters with \(g_{inv}=1\) only, and the path electrical effort is \(H = C_L / C_1.\) Including the off-path inverters requires considering the path branching effort. Stage 1 has branching effort \(b_1 = (C_2 + C_3)/C_2,\) and the inverter in stage 2 drives on-path load \(C_4\) and off-path load \(C_5,\) resulting in branching effort \(b_2 = (C_4 + C_5)/C_4.\) The path branching effort is the product

With path parasitic delay \(P = 3 p_{inv} = 3,\) the minimum path delay is

Gate sizing for minimum delay becomes feasible if the on-path and off-path load capacitances of a stage are related. For example, if we are given design constraints \(C_3 = C_2\) and \(C_5 = 2 C_4,\) then \(F\) reduces to

For a given path electrical effort \(H,\) we can now apply the method of logical effort to calculate the stage effort \(\hat{f} = \sqrt[3]{6 H}\) that each stage must bear, and deduce the scale factors to minimize the path delay.

We analyze the delay of path \(A \rightarrow Y\) of the circuit in Figure 3.6 and Experiment 3.4. Stage-1 inverter 1 is a reference inverter with input capacitance \(C_1 = 3\) and scale factor \(\gamma_1 = 1.\) Assuming the other inverters have variable sizes, the delay of path \(A \rightarrow Y\) is the sum of the stage delays:

You can vary the scale factors \(\gamma_2, \ldots, \gamma_5\) and load capacitance \(C_L\) in Experiment 3.4 to verify this path delay expression.

We can determine the minimum delay of path \(A \rightarrow Y\) for a given load capacitance \(C_L\) only, if we also specify the off-path capacitances \(C_3\) and \(C_5.\) Fixing these capacitances by specifying constant scale factors for the off-path inverters rarely makes sense, and may even complicate the minimization problem. However, if the design permits scaling both on-path and off-path capacitances of a branch in a correlated fashion, we can apply the method of logical effort directly. Consider the constraints \(\gamma_3 = \gamma_2\) and \(\gamma_5 = 2 \gamma_4,\) that we introduced above already. Then, the stage effort that minimizes the path effort is \(\hat{f} = \sqrt[3]{6 H} = \sqrt[3]{2 C_L}.\) Given \(C_L = 300,\) for instance, we find \(\hat{f} = 8.43\) and minimum path delay \(\hat{D}_{A \rightarrow Y} = 3 \hat{f} + 3 = 28.3\) delay units. We derive the scale factors \(\gamma_2\) and \(\gamma_4\) starting with \(\gamma_4.\) Stage 4 of the path bears stage effort

which must be equal to \(\hat{f},\) so that

Due to constraint \(\gamma_5 = 2 \gamma_4,\) scale factor \(\gamma_5 = 23.7.\) Analogously, stage 2 bears stage effort

The stage effort must be equal to \(\hat{f}.\) Rearranging the equation for \(C_2,\) we find:

According to constraint \(\gamma_2 = \gamma_3,\) we also have \(\gamma_3 = 4.2.\) You can verify the minimum path delay in Experiment 3.4 using rounded scale factors. Furthermore, the summation of the stage delays yields \(D_{A \rightarrow Y} = 2 \cdot 4.2 + (11.9 + 23.7) / 4.2 + 300/(3 \cdot 11.9) + 3 = 28.3\) in agreement with \(\hat{D}_{A \rightarrow Y}.\)

Consider the 2-stage path \(A \rightarrow Y\) below with off-path capacitance \(C_3.\) We minimize the delay of the path without using the branching effort first, and then demonstrate how much simpler the analysis becomes when using the branching effort.

Stage 1 of the path consists of an inverter with load capacitance \(C_2 + C_3.\) Therefore, the stage effort of stage 1 is

because \(g_1 = g_{inv} = 1.\) Here, electrical effort \(h_1\) denotes the ratio of the total load capacitance to input capacitance rather than referring to the on-path load capacitance only. The delay of stage 1 is

because \(p_1 = p_{inv} = 1.\) Stage 2 of the path consists of an inverter with load capacitance \(C_L.\) Therefore, stage 2 has stage effort

and stage delay

The path delay from input \(A\) to output \(Y\) is the sum of the stage delays

To minimize \(D,\) each stage must bear the same effort. According to the method of logical effort, this occurs under the condition \(f_1 + f_2 = 2 \sqrt{f_1 \cdot f_2}.\) To determine the minimum path effort

we need a constraint that relates on-path capacitance \(C_2\) and off-path capacitance \(C_3.\) As a concrete example, assume that \(C_3 = 3 C_2.\) Then, we obtain

Given load capacitance \(C_L\) and input capacitance \(C_1\) of the path, we can determine \(C_2\) by asserting the minimzation condition and rearranging it into a polynomial in \(C_2\):

The positive root of this quadratic equation is

and the minimum path effort is

Therefore, the minimum path delay is

Recall that \(C_L/C_1\) is the path electrical effort \(H\) of the branching path.

Note that the derivation of \(\hat{D}\) requires quite a bit of algebra which increases the chances of introducing errors in the calculation. For comparison, we redo the minimization with the alternative method based on branching efforts. The method of logical effort states that the minimum path delay of a 2-stage path with branching is

In our example, under constraint \(C_3 = 3 C_2\) the path effort is

Therefore, the minimum path delay is

Note that we arrive at the result by following the method of logical effort like a cookbook recipe, without actually minimizing the delay and laboring through the associated algebra to determine \(C_2.\) Using the branching effort simplifies the delay minimization significantly. This is where the method of logical effort begins to shine.

When analyzing a branching path with the method of logic effort we account for the branch at the output of a logic stage, because a branch increases the electrical effort and, thus, the delay of the stage. In this example we analyze the branching circuit below, which is the circuit of Example 3.11 with the stage-1 inverter removed.

Path \(A \rightarrow Y\) contains one inverter. Thus, the path has a single stage only. According to the method of logical effort, the minimum delay of the 1-stage path is

where branching effort \(B = (C_1+C_2)/C_1\) accounts for the branch before the inverter, and \(C_{in}\) is the path input capacitance \(C_{in} = C_1 + C_2,\) because \(C_1\) and \(C_2\) are composed in parallel. Thus, we can simplify \(\hat{D}\) to obtain

We find that the minimum path delay is independent of off-path capacitance \(C_2.\) This result may be not be what we might have expected when comparing to Example 3.11, where all off-path capacitances affect the minimum path delay. Thus, we may develop doubts about the correctness of our derivation of \(\hat{D}.\) Did we do anything wrong in our analysis?

It turns out that the method of logical effort happens to yield the correct result. We just need to interpret \(\hat{D}\) carefully. Unlike Example 3.11 our circuit does not include a logic stage that drives input \(A.\) Thus, in our application of the method of logical effort, we included the branching effort perhaps naively. If we interpret the branch as a current divider, then we might conclude that the branch reduces the current that drives the inverter and, hence, slows down the inverter stage. However, this slow down depends on the driver stage, which the circuit does not specify. Hence, we cannot quantify the slow-down due to the branch as part of the inverter stage after the branch. This argument becomes obvious if we interpret the parallel capacitances of the branch as a current divider, where the important quantity is the voltage of node \(A\) rather than the currents into the capacitors. The currents divide such that the voltage across the parallel capacitances remains the same. The voltage of input \(A\) changes with a delay proportional to \(C_1 + C_2.\) Therefore, the branch does not slow down the inverter stage but the speed by which the input voltage of the inverter, and \(C_2,\) transitions.

In our analysis with the method of logical effort, the branching effort cancels out input capacitance \(C_{in},\) and the remaining input capacitance \(C_1\) corresponds to the path after the branch. Imagine removing \(C_2\) in the circuit diagram, then the minimum delay of the path is exactly \(\hat{D}\) as determined above. The lesson to be learned from this example is that branches do not affect the delay of the path after the branch. Instead, branches effect the delay of their driver stage. If we do not specify the driver stage, we cannot quantify the effect of the branch on the path delay. In the method of logical effort the branch disappears almost magically, just as it should.

Minimize the delay of branching path \(A \rightarrow Y\) using calculus.

- For each stage, derive the stage delay.
- Express path delay \(D\) as a function of scale factor
\(\gamma,\) and minimize \(D(\gamma)\) using
*calculus*. - Compute \(\hat{D}\) and the associated \(\gamma.\)

Path of interest \(A\rightarrow Y\) consists of two stages, inverter 1 in stage-1 and inverter 2 in stage-2. Inverter 3 branches off the output of stage 1. Hence, off-path inverter 3 diverts drive current from inverter 1 to inverter 2. We begin with the analysis of stage 1. The load of inverter 1 consists of parallel input capacitances of inverters 2 and 3, that we call \(C_2\) and \(C_3.\) Therefore, the electrical effort of inverter 1 with input capacitance \(C_1,\) counting both on-path and off-path loads, is

and the delay of stage 1 is

Since the scale factors of the inverters are given in the schematic, we know that inverter 1 is a reference inverter with \(C_1 = C_{inv} = 3\) capacitive units, inverter 2 has input capacitance \(C_2 = \gamma C_{inv} = 3 \gamma,\) and inverter 3 has input capacitance \(C_3 = 9 \gamma.\) Substituting the capacitances in \(d_1,\) we obtain

Since stage 2 does not branch, the delay of inverter 2 is

The delay of path \(A\rightarrow Y\) is the sum of the stage delays:

The path delay is a function of scale factor \(\gamma.\) Therefore, we can minimize the path delay by setting the derivative to zero:

Thus, the minimum path delay from \(A\) to \(Y\) is

time units.

Minimize the delay of branching path \(A\rightarrow Y\) in Exercise 3.8 using the method of logical effort.

- Determine path logical effort \(G,\) path electrical effort \(H,\) path branching effort \(B,\) and path parasitic delay \(P.\)
- Compute best stage effort \(\hat{f} = \sqrt{GBH}.\)
- Compute minimum path delay \(\hat{D} = 2 \hat{f} + P\) and associated scale factor \(\gamma.\)

We begin by determining the branching efforts of the stages on the path of interest. Inverter 1 in stage 1 drives on-path inverter 2 and off-path inverter 3. The input capacitances of these inverters are \(C_1 = C_{inv} = 3,\) \(C_2 = \gamma C_{inv} = 3 \gamma,\) and \(C_3 = 9 \gamma.\) Branching effort \(b_1\) of stage 1 is the ratio of the on-path and off-path load capacitances:

Stage 2 on the path of interest consists of inverter 2. It’s output is connected to load capacitance \(C_L\) without branching. Therefore, we account for branching effort \(b_2\) of stage 2 as

The path branching effort of path \(A\rightarrow Y\) is the product of the stage branching efforts:

The path logical effort of path \(A\rightarrow Y\) is the product of the stage logical efforts:

The path electrical effort of path \(A\rightarrow Y\) is the product of the stage electrical efforts:

Note that we use the on-path electrical efforts for the stage efforts \(h_1\) and \(h_2\) as if there were no off-path branches. Branches are accounted for separately with branching effort \(B.\)

The path parasitic delay of path \(A\rightarrow Y\) is the sum of the stage parasitic delays:

Path effort \(F\) of a path with branching is \(F = G B H,\) and best stage effort \(\hat{f} = \sqrt[n]{F}.\) For our 2-stage path \(A\rightarrow Y,\) we find

We obtain minimum path delay \(\hat{D}\) if each stage of the path bears best stage effort \(\hat{f}.\) Then, we have for our 2-stage path:

This is the same result that we found through the minimization by calculus in Exercise 3.8, however, with straightforward algebra due to the method of logical effort.

To determine scale factor \(\gamma,\) we notice that each stage on path \(A\rightarrow Y\) must bear best stage effort \(\hat{f} = 6.\) In particular, the stage effort of inverter 2 must be equal to \(\hat{f},\) which yields an equation for \(\gamma\):

Alternatively, we could use the stage effort of inverter 1, \(g_1 b_1 h_1 \stackrel{!}{=} \hat{f},\) to deduce \(\gamma = 3/2.\)

Use the method of logical effort to minimize the delay of branching path \(A \rightarrow Y\) assuming \(C_3 = 4 C_2.\)

- Determine path logical effort \(G,\) path electrical effort \(H,\) path branching effort \(B,\) and path parasitic delay \(P.\)
- Compute best stage effort \(\hat{f}.\)
- Compute minimum path delay \(\hat{D}.\)

Path of interest \(A\rightarrow Y\) consists of two stages, i.e. \(n = 2.\) Stage 1 consists of a 2-input NAND gate with input capacitance \(C_1,\) and stage 2 features a 3-input NAND gate with input capacitance \(C_2.\) The 2-input NOR gate with input capacitance \(C_3\) branches of the output of stage 1. We analyze the circuit, starting with the path logical effort of path \(A\rightarrow Y\):

The path electrical effort of path \(A\rightarrow Y\) is the ratio of load capacitance and path input capacitance:

Path branching effort \(B\) is the product of the stage branching efforts. Given that \(C_3 = 4 C_2,\) we find:

The path parasitic delay is the sum of the stage parasitic delays:

Since the path has two stages, the best stage effort is

and the minimum path delay

We note that path \(A\rightarrow Y\) is overburdened, because \(\hat{f} = 10 > f^\star = 3.59.\) Thus, we should be able to reduce the path delay below \(\hat{D}\) by path sizing.

## 3.5. Forks¶

Branching circuits occur in countless applications, for example to
drive the complemented and uncomplemented inputs of an *XOR gate*. If we have a circuit that produces signal \(A,\) we
need to generate \(\overline{A}\) to drive both the \(A\) and
\(\overline{A}\) inputs of the XOR gate. A branching circuit is a
**fork**, if it branches to output both the complemented and
uncomplemented input. More specifically, we define an **n-fork** for
\(n \ge 1\) as a fork with two legs, one with \(n\) inverters
and the other with \(n-1\) inverters.

Depending on the choice of \(n,\) one of the legs, the *even leg*,
features an even number of inverters while the *odd leg* has an odd
number of inverters. Since the number of stages in the even and odd
legs differ, the question is how to minimize the delay of the
\(n\)-fork. At the first glance, we may identify the longer
\(n\)-stage leg as the path of interest, and minimize its path
delay. Since the shorter leg branches off the longer leg, we are in
need of a constraint to correlate the sizes of the stage-1 inverters
of the legs. In terms of branching effort, the stage-1 inverter of
the leg with \(n\) inverters presents the on-path capacitance, and
the stage-1 inverter of the leg with \(n-1\) stages the off-path
capacitance. In the following, we derive a perhaps less obvious
constraint for the on-path and off-path capacitances that minimizes
the delay of an \(n\)-fork by equalizing the delays of both legs.

### 3.5.1. Fork Delay¶

We study \(n\)-forks with an \(n\)-stage leg and an \((n-1)\)-stage leg as shown in Figure 3.8, strictly speaking for \(n \ge 2.\) We assume that the on-path electrical efforts for the legs are given as \(H_1\) for leg 1 and \(H_2\) for leg 2. We express the load capacitances of the legs as a function of fork input capacitance \(C_{in}.\) Leg 1 is the \(n\)-stage leg. The stage-1 inverter of leg 1 has input capacitance \(C_1\) and load capacitance \(C_{L,1} = H_1 C_{in}.\) Leg 2 is the \(n-1\)-stage leg. Its stage-1 inverter has input capacitance \(C_2,\) and the load capacitance is \(C_{L,2} = H_2 C_{in}.\) From the perspective of the fork input, the legs present a parallel composition of input capacitances, so that

Since a fork branches into two legs, the path effort of leg 1 is

and of leg 2

If we replace the fork with an equivalent circuit, the equivalent circuit has input capacitance \(C_{in}\) and output capacitance \(C_L = C_{L,1} + C_{L,2} = (H_1 + H_2) C_{in}.\) Thus, the electrical effort of the entire fork or its equivalent circuit is \(H = C_L/C_{in} = H_1 + H_2.\)

To keep things simple, we begin our study of forks by assuming that the capacitive loads of both legs are equal, i.e. \(C_{L,1} = C_{L,2} = H_l C_{in},\) where leg electrical effort \(H_l = H_1 = H_2.\) Thus, the total load of the fork is \(C_L = 2 H_l C_{in}.\) The path efforts of the legs, \(F_1 = H_l C_{in}/C_1\) and \(F_2 = H_l C_{in}/C_2,\) may differ, depending on the choice of leg input capacitances \(C_1\) and \(C_2.\) In fact, since the lengths of the legs differ by one stage, the longer leg 1 can support a larger path effort than the shorter leg 2. Given equal load capacitances, we may increase the path effort of leg 1 by reducing \(C_1.\) If we require a constant input capacitance \(C_{in},\) decreasing \(C_1\) must be balanced by increasing \(C_2.\) A shift of input capacitance between the two legs corresponds to a proportional shift in the division of the drive current between the legs. Experiment 3.5 allows you to vary the input capacitances \(C_1\) and \(C_2\) of the legs, and observe the effect on the fork delay.

Assume that input capacitance \(C_{in}\) is fixed, for example by other circuit design constraints. Then, the leg input capacitances \(C_1\) and \(C_2\) are correlated through the parallel composition: \(C_1 + C_2 = C_{in}.\) Choose \(C_{in}\) and total load \(C_L = 2 H_l C_{in}\) by tuning leg effort \(H_l.\) Minimize the fork delay by sizing the inverters.

Inverter 1: 1 Inverter 3: 1 Inverter 5: 1

Inverter 2: 1 Inverter 4: 1

Leg effort: 25 Total Load: 300 Delay: 124.5

The key observation for minimizing the fork delay in Experiment 3.5 is that the allocation of input capacitance \(C_{in}\) to leg input capacitances \(C_1\) and \(C_2\) matters. Of course, this fact applies to any branching circuit, not only to forks. In case of a fork with equal loads, the correlation between the input capacitances when keeping \(C_{in} = C_1 + C_2\) constant enables us gain additional insight into the the cause of the fork delay. The delay of an \(n\)-stage inverter chain is minimal, if each stage bears the same effort \(\hat{f} = \sqrt[n]{H}.\) Each leg of a fork is an inverter chain. If we size the inverters to minimize the leg delays, we obtain minimum delay \(\hat{D}_1(n) = n \sqrt[n]{F_1} + n\) for leg 1, and minimum delay \(\hat{D}_2(n-1) = (n-1) \sqrt[n-1]{F_2} + (n-1)\) for leg 2. We may be satisfied with a delay of the fork that is the maximum of the independently minimized leg delays, \(\max(\hat{D}_1, \hat{D}_2).\) However, we can do better by recognizing that the minimum fork delay occurs if the leg delays are equal. If they are not, we can reduce the larger leg delay at the expense of increasing the smaller delay of the other leg.

Figure 3.9 illustrates the idea for a 3-fork with leg electrical effort \(H_l = H_1 = H_2 = 25\) and input capacitance \(C_{in} = 12.\) In this case, the load capacitances are \(C_{L,1} = C_{L,2} = 300.\) First, assume that the legs present equal loads of \(C_1 = C_2 = C_{in}/2 = 6\) capacitive units each. Then, both legs bear equal path efforts \(F_1 = F_2 = 50.\) The plot shows the minimum path delays \(\hat{D}_1(3)\) of the 3-stage leg and \(\hat{D}_2(2)\) of the 2-stage leg as a function of path effort \(F.\) For \(F = 50,\) 3-stage leg 1 has minimum delay \(\hat{D}_1(3) = 14.05,\) and is faster than 2-stage leg 2 with minimum delay \(\hat{D}_2(2) = 16.14.\) The fork delay is dominated by leg 2. We can minimize the fork delay by reducing path effort \(F_2\) of leg 2 at the expense of increasing path effort \(F_1\) of leg 1 to honor the constraint \(C_{in} = C_1 + C_2.\) When reducing \(F_2 = C_{L,2}/C_2\) by increasing input capacitance \(C_2\) of leg 2, we reduce \(C_1\) by the same amount. In turn, the reduction of \(C_1\) increases \(F_1 = C_{L,1}/C_1\) and, consequently, leg delay \(\hat{D}_1(3).\) The fork delay is minimal when the leg delays are equal, \(\hat{D}_1(3) = \hat{D}_2(2).\) In Figure 3.9 this occurs when \(F_1 = 62.5\) and \(F_2 = 41.6.\) The corresponding input capacitances are \(C_1 = 4.8\) for leg 1 and \(C_2 = 7.2\) for leg 2. As a result the minimum fork delay amounts to \(14.9\) delay units.

We can determine the leg input capacitances for minimum fork delay by
formalizing the idea of shifting input capacitance between the legs.
To that end, we introduce **skew factor** \(\boldsymbol{\alpha}\) in range \(0 <
\alpha < 1,\) and express the leg input capacitances as fractions of
the fork input capacitance:

such that \(C_1 + C_2 = C_{in}.\) Now, the path efforts are
correlated through skew factor \(\alpha.\) For leg 1 we have
\(F_1 = H_1/\alpha,\) and for leg 2 \(F_2 = H_2/(1-\alpha).\)
The key insight for minimizing the fork delay is to **equalize the leg
delays** rather than minimizing the leg delays independently:

Given leg electrical efforts \(H_1\) and \(H_2\) for an
\(n\)-fork, even in the special case with equal loads, \(H_1
= H_2 = H_l,\) this constraint presents a nonlinear equation in
\(\alpha.\) Solving this equation for \(\alpha\) analytically
is difficult for arbitrary \(n.\) We could determine
\(\alpha\) numerically, for example by means of a Newton
iteration. Alternatively, we can reduce the problem of computing
\(\alpha\) to finding the *roots of a polynomial*, for which solutions exist as part of mathematical
software packages like Octave
or Mathematica with online portal
WolframAlpha.

### 3.5.2. 1d-Analysis Method¶

The method of logical effort does not require explicit use of skew factor \(\alpha\) to derive the minimum fork delay. We demonstrate this derivation by means of the 3-fork in Figure 3.10 below. First, we apply the key insight of the method of logical effort, i.e. a path achieves minimum delay if all stages bear the same effort. This insight enables us to allocate stage efforts to each inverter as follows. Consider the 3-stage leg of the 3-fork, and think of the stage effort as effort delay in units of time. If the effort delay of the whole leg is \(d_1,\) then each of the 3 stages must have effort delay \(d_1/3\) to minimize the path delay. Analogously, if the 2-stage leg has effort delay \(d_2,\) then each stage must have effort delay \(d_2/2\) to minimize the path delay. Then, the leg delays are

Second, we apply our key insight about minimizing fork delay, i.e. both legs must have equal path delay. In case of the 3-fork, we demand

This equality is satisfied by an effort delay \(d,\) if \(d_1 = d\) and \(d_2 = d + 1,\) because

Figure 3.10 annotates the inverters with the effort delays that yield minimum fork delay as a function of delay \(d.\)

Two aspects of the **effort delay allocation** in Figure 3.10 deserve a closer look. First, the effort delays minimize the
fork delay by equalizing the leg delays, which determines skew factor
\(\alpha\) implicitly, and by minimizing the path delay of each
leg, because each gate on a leg bears the same stage effort. To solve
the delay minimization problem, we merely need to determine the value
of \(d.\) Second, our algebraic parameterization of leg effort
delays \(d_1(d) = d\) and \(d_2(d) = d+1\) may be viewed as
compensating for one unit of parasitic delay on leg 1 with one unit of
effort delay on leg 2. The path parasitic delay of leg 1 is
\(P_1 = 3 p_{inv} = 3,\) whereas leg 2 has path parasitic delay
\(P_2 = 2 p_{inv} = 2.\) We call the difference \(\Delta p =
P_1 - P_2 = 1\) in \(d_2(d) = d + \Delta p\) the **parasitic delay
compensation**, because it compensates the excess parasitic delay of
leg 1 with effort delay in leg 2.

Next, we analyze the delay of the fork legs as a function of \(d.\) The effort delay of an inverter, \(f = g_{inv} h,\) is equal to the electrical effort, because the logical effort is \(g_{inv} = 1.\) Thus, for the inverters of leg 1, we find for the stage-1 inverter effort delay

for the stage-2 inverter effort delay

and for the stage-3 inverter effort delay

Note that the product of the right-hand sides is a telescoping product. Therefore, multiplying the equations yields

Analogously, for the inverters of leg 2, we find the effort delay of the stage-1 inverter

the effort delay of the stage-2 inverter

and the product of the equations

The input capacitance of the fork is the sum of the leg input capacitances. Substituting the expressions in Equation (7) and Equation (8) for \(C_1\) and \(C_2,\) we find

Rearranging the equality we obtain a polynomial in variable \(d\):

The solutions to this equation are the *roots of the polynomial*, i.e. those values of \(d\) where the polynomial
is zero. A polynomial of degree 5 has 5 roots. Given \(H_1\) and
\(H_2,\) we can determine the roots with a mathematical software
package like WolframAlpha. For
\(H_1 = H_2 = 25,\) copy or type this equation:

```
d^5 + 2 d^4 + (1 - 4 * 25) d^3 - 27 * 25 d^2 - 54 * 25 d - 27 * 25 = 0
```

into the WolframAlpha input form and click the equal sign.

WolframAlpha returns the roots:

Root \(d_3\) is the only feasible solution for effort delay \(d,\) which must be a positive real number to qualify as a time. Knowing that \(d = 11.91,\) we can deduce the design parameters of our fork. The 3-fork has a minimum delay of \(\hat{D}_1(3) = \hat{D}_2(2) = d + 3 = 14.91\) delay units, if we scale the stage-1 inverters of the legs to present input capacitances \(C_1 = 4.8\) according to Equation (7), \(C_2 = 7.2\) according to Equation (8), and all other inverters for the input capacitances derived above. Note that skew factor \(\alpha = C_1/C_{in} = 0.4,\) although we do not need to know \(\alpha\) to minimize the fork delay. Experiment 3.5 enables you to verify our fork design.

The methodology for deriving the parameters of a fork with minimum
delay generalizes to branching circuits with two legs. We call it the
*1d-analysis method*, because it is based on finding the roots of a
univariate polynomial with effort delay \(d\) as the only
variable.

**1d-Analysis Method**

Given a branching circuit with an \(n\)-stage leg 1 and an \(m\)-stage leg 2, input capacitance \(C_{in} = C_1 + C_2,\) and load capacitances \(H_1 C_{in}\) and \(H_2 C_{in}.\) Follow these steps to minimize the circuit delay:

Determine the leg parasitic delays \(P_1\) and \(P_2.\)

Without loss of generality, assume that \(P_1 \ge P_2,\) and

*parasitic delay compensation*\(\Delta p = P_1 - P_2.\) Allocate effort delay \(d/n\) to each stage of leg 1 and \((d+\Delta p)/m\) to each stage of leg 2.Form the telescoping products of the leg effort delays, including logical efforts:

\[\Bigl(\frac{d}{n}\Bigr)^n = \frac{G_1 H_1 C_{in}}{C_1}\,,\qquad \Bigl(\frac{d+\Delta p}{m}\Bigr)^m = \frac{G_2 H_2 C_{in}}{C_2}\,.\]Substitute leg input capacitances \(C_1\) and \(C_2\) in the branch constraint \(C_{in} = C_1 + C_2\) to derive a univariate polynomial and determine its roots:

\[d^n (d + \Delta p)^m - m^m G_2 H_2 d^n - (d + \Delta p)^m n^n G_1 H_1 = 0\,.\]Identify the positive real root \(d\) that solves our problem, and determine the scale factors of all gates. The minimum circuit delay is \(d + P_1 = (d + \Delta p) + P_2.\)

The 1d-analysis method enables us to analyze the 1-fork with a 0-stage leg and equal loads in Figure 3.11. The unassuming circuit designer might be tempted to use this degraded fork to generate the complemented and uncomplemented input. However, a 1-fork is rarely a good circuit to use.

According to the 1d-analysis method, we assign effort delay \(d\) to the inverter in leg 1. Then, leg 1 has delay \(D_1 = d + 1\) and leg 2 has delay \(D_2 = 0.\) Note that negative effort delay \(d = -1\) would be required to equalize the leg delays. Furthermore, the effort delay of leg 1 is \(d = H_l C_{in}/C_1,\) and the input capacitance of leg 2 is the load capacitance, \(C_2 = H_l C_{in}.\) Thus, the branch constraint yields

Since CMOS inverters have a positive effort delay, \(d > 0,\) we conclude that \(1 - H_l > 0,\) and obtain the constraint on the leg electrical effort:

This constraint determines the input capacitance of the inverter in leg 1. For example, leg electrical effort \(H_l = 3/4\) yields

In terms of \(C_1,\) we can express leg load \(H_l C_{in} = (3/4) C_{in} = 3 C_1\) and input capacitance \(C_{in} = 4 C_1.\) Thus, all capacitances of the 1-fork are determined by choosing \(H_l.\) As a consequence, effort delay \(d\) of the inverter in leg 1 is \(d = 3 C_1 / C_1 = 3,\) and the delay of leg 1 is \(D_1 = d + p_{inv} = 4\) time units. This style of 1d-analysis shows that the 1-fork is effective for \(H_l \approx 1/2.\) However, if \(H_l\) approaches value 1, the inverter has to stem an infinitely large effort. The underlying problem is that the 1-fork has only one inverter to equalize the leg delays. Therefore, as a rule of thumb, it is rarely a good idea to use a 1-fork. Use a 2-fork instead.

Use the *1d-analysis method* to minimize the
delay of the 2-fork assuming \(H_1 = 10\) and \(H_2 = 15.\)

- Determine the leg parasitic delays and allocate effort delays using one parameter \(d.\)
- For each leg, form the products of the effort delays, and derive a polynomial in \(d.\)
- Determine the root of the polynomial and the minimum leg delays \(\hat{D}_1\) and \(\hat{D}_2.\)

The 2-fork has two inverters on leg 1 and one inverter on leg 2. The parasitic delay of leg 1 is \(P_1 = 2 p_{inv} = 2\) and of leg 2 \(P_2 = p_{inv} = 1.\) The difference

is the parasitic delay compensation that we allocate in the stage effort assignments to balance the leg delays.

Assuming that leg 1 has a total effort delay of \(d,\) we allocate stage effort delay \(d/2\) to each of the two inverters, because path delay is minimized if all stages bear the same effort. Since leg 2 has a single stage only, the inverter has to bear all of the path effort plus the parasitic delay compensation, which amounts to stage effort \(d+1\) for the inverter in leg 2.

Given the allocation of effort delays, we form the products of the stage effort delays, which are simple expressions because the product of the stage electrical efforts telescopes. For leg 1, we obtain

Leg 2 has one stage only, and the stage effort equals the electrical effort because \(g_{inv} = 1\):

We rearrange these equations to obtain expressions for \(C_1\) and \(C_2\):

Next, we substitute \(C_1\) and \(C_2\) in the branch constraint, \(C_{in} = C_1 + C_2,\) and rearrange the equation into a polynomial:

Effort delay \(d\) is a root of the polynomial

given \(H_1 = 10\) and \(H_2 = 15.\) This polynomial has two complex roots and one real root:

Therefore, leg 1 has a delay of \(\hat{D}_1 = d + P_1 = 16.56 + 2 = 18.56\) time units and leg 2 has the same delay \(\hat{D}_2 = (d+1) + P_2 = 17.56 + 1 = 18.56\) time units.

## 3.6. Branching Circuits¶

The 1d-analysis method enables us to design not only forks but also more general instances of branching circuits with minimum delay. In this section, we discuss design issues of branches with logic gates, how to determine the best number of stages of branches, how to handle more than two branches, and branches that reconverge again.

### 3.6.1. Path Sizing Branches¶

In Section *Forks*, we minimize the delay of a 3-fork. We did
not ask whether the 3-fork has appropriate leg lengths, or whether a
2-fork or 4-fork would result in an even smaller delay for the given
load. In the following, we examine the path sizing problem for a
branching circuit if both legs are inverter chains of equal length,
see Figure 3.12.

If the electrical efforts of the legs were equal, \(H_l = H_1 = H_2,\) the circuit is symmetric. We would choose equal leg input capacitances, \(C_1 = C_2,\) to supply both legs with equal drive currents. According to the branch constraint, \(C_{in} = C_1 + C_2,\) and given input capacitance \(C_{in},\) we scale the stage-1 inverters of the legs such that \(C_1 = C_2 = C_{in}/2.\) Since both legs bear the same electrical effort, we can determine the best number of stages \(n\) using the path sizing calculator with path effort \(F = 2 H_l.\)

In general, the leg electrical efforts may differ such that \(H_1
\ne H_2.\) According to the method of logical effort, we minimize the
delay of each leg, independently, if each inverter stage of a leg
bears the same effort. For leg 1, given path effort \(F_1 = G_1
B_1 H_1 = C_{L,1}/C_1,\) the stage effort that minimizes the leg delay
is \(\hat{f}_1 = \sqrt[n]{C_{L,1}/C_1}.\) Analogously, the stage
effort for leg 2 is \(\hat{f}_2 = \sqrt[n]{C_{L,2}/C_2}.\) To
minimize the delay of the entire branching circuit we equalize the leg
delays, a lesson we have learned from the *design of forks*.

Since both legs have the same number of stages \(n,\) this constraint simplifies to

We find that we can minimize the branch delay by choosing the leg input capacitances proportional to the leg electrical efforts. Given the branch constraint with input capacitance \(C_{in} = C_1 + C_2,\) we obtain the leg input capacitances:

Furthermore, Equation (9) yields

i.e. we minimize the delay of the branching circuit, if the stage efforts in the two legs are equal.

We solve the path sizing problem of the branching circuit in Figure 3.12 for input capacitance \(C_{in} = 12\) and load capacitances \(C_{L,1} = 600\) and \(C_{L,2} = 300.\)

First, we deduce the leg electrical efforts \(H_1 = C_{L,1}/C_{in} = 50\) and \(H_2 = C_{L,2}/C_{in} = 25.\) Second, due to Equation (9), we notice that input capacitance \(C_1\) of leg 1 should be twice as large as \(C_2.\) Furthermore, since we are given \(C_{in} = 12,\) we can calculate in leg input capacitances:

Third, the path efforts of the legs are equal, and amount to \(F_1 = F_2 = H_1 + H_2 = 75.\) With the aid of the path sizing calculator, we find that we should use \(n = 3\) stages in each leg. Then, we minimize the delay of the circuit by sizing the six inverters such that each inverter bears stage effort \(\hat{f} = \sqrt[3]{75} = 4.2.\) The minimum delay is \(\hat{D} = 3 \hat{f} + 3 = 15.65\) delay units.

Let us contemplate the solution to the path sizing problem for the
branching circuit in Example 3.13: We
scale the input capacitances of the stage-1 inverters in proportion to
the path electrical efforts. As a result, we equalize the path
efforts of the legs, which enables us to determine the path lengths of
both legs simultaneously, as if we were sizing a single path. This
observation motivates us to introduce the concept of an **equivalent
inverter path**, in analogy to equivalent *resistive* and *capacitive*
networks. Figure 3.13 shows the equivalent inverter path
of the branching circuit in Figure 3.12 with input
capacitance \(C_{in}\) and the sum of the leg electrical efforts
\(H_1 + H_2\) as path electrical effort.

The equivalent inverter path enables us to reduce the path sizing
problem for a branching circuit to that of a multistage path as
discussed in Section *Path Sizing*. Once we have determined the
best number of stages, we proceed by employing Equation (9)
to scale the stage-1 inverters of the legs, and perform gate sizing
according to the method of logical effort to minimize the path delay
of the branching circuit. This method generalizes to branching
circuits with arbitrary logic gates, if we replace the leg electrical
efforts with the leg path efforts to account for the leg logical
efforts.

For branching circuits with logic gates on the legs, the path efforts of the legs are:

Assume that leg 1 has \(n\) stages and leg 2 has \(m\) stages. Then, equalizing the leg delays to minimize the delay of the branching circuit yields the constraint:

In the special case for legs with equal lengths, \(n = m,\) and equal path parasitic delays, \(P_1 = P_2,\) this constraint simplifies to

Then, the branch constraint, \(C_{in} = C_1 + C_2,\) enables us to size the stage-1 gates of the legs such that the input capacitances are

The equivalent inverter path of this branching circuit has load capacitance \(C_L = (G_1 H_1 + G_2 H_2) C_{in}.\)

In case where both legs have the same number of stages but different parasitic delays, we may assume that Equation (10) holds approximately:

If the number of stages differ, we may append inverter pairs to the shorter leg until the path lengths differ by at most one stage. Then, Approximation (11) may still be accurate enough to permit path sizing of the equivalent inverter path with load \(C_L = (G_1 H_1 + G_2 H_2) C_{in}.\) If the best number of stages requires increasing the leg lengths, we can accomplish the task by appending inverter pairs. Otherwise, if the best path length is smaller, we may redesign the logic on the legs to reduce the number of stages or settle with a suboptimal design.

We wish to minimize the delay of the the branching circuit in Figure 3.14 with path electrical efforts \(H_1 = 9\) and \(H_2 = 15.\)

We begin by deriving the path logical efforts and path parasitic delays of the legs. Figure 3.14 annotates each gate with its logical effort and parasitic delay. The path logical efforts of the legs are the products of their gate logical efforts:

The path parasitic delays of the legs are the sums of their gate parasitic delays:

We notice that both legs have 3 stages but different path parasitic delays. The equivalent inverter path enables us to determine the best number of stages approximately. The equivalent inverter path has input capacitance \(C_{in}\) and load capacitance

Thus, the equivalent inverter path has path effort \(F = C_L/C_{in} = 53,\) for which the path sizing calculator prescribes a best number of stages of \(n = 3.\) Since both legs of the branching circuit have 3 stages already, we retain the logic without modifications. To minimize the delay, Approximation (11) permits estimating the input capacitances of the stage-1 gates of the legs as:

Furthermore, if each of the gates bears stage effort \(\hat{f} \approx \sqrt[n]{F} = \sqrt[3]{53} = 3.765,\) then the path effort delay of the branching circuit is approximately \(3 \hat{f} = 11.3.\) Including the path parasitic delays, we estimate the delay of leg 1 to be \(\hat{D}_1(3) \approx 3 \hat{f} + P_1 = 17.3\) and the delay of leg 2 to be \(\hat{D}_2(3) \approx 3 \hat{f} + P_2 = 15.3.\) This approximation is easy to obtain on the back of an envelope.

Alternatively, with a little extra work, we apply the exact 1d-analysis method after determining path size \(n=3\) with the approximating equivalent inverter path. We find parasitic delay compensation \(\Delta p = 2,\) and assign effort delay \(d/3\) to each gate of leg 1 and \((d+2)/3\) to each gate of leg 2. We obtain the polynomial \(d^3 (d+2)^3 - 27 \cdot 28 (d+2)^3 - 25 \cdot 27 d^3,\) which we do not need to expand further for WolframAlpha to find the roots. The only positive real root is \(d = 10.495,\) and the resulting minimal circuit delay \(\hat{D} = 16.5\) delay units. Note that \(\hat{D} = 16.5\) is reasonably close to our approximate leg delays of \(\hat{D}_1(3) = 17.3\) and \(\hat{D}_2(3) = 15.3,\) validating the accuracy of the approximation. The 1d-analysis rewards the extra design effort with exact rather than approximate information for gate sizing, by prescribing different stage efforts \(\hat{f}_1 = d/3 = 3.5\) for leg 1 and \(\hat{f}_2 = (d+2)/3 = 4.2\) for leg 2. With a little practice and the aid of mathematical software like WolframAlpha for root finding, the 1d-analysis is as easy as the back-of-the-envelope approximation.

We wish to minimize the delay of the branching circuit in Figure 3.15 with path electrical efforts \(H_1 = H_2 = 15.\)

Analogous to Example 3.14 we first derive the approximating equivalent inverter path to determine the best path size. The leg logical effort of leg 1 is the logical effort of the NAND gate, \(G_1 = 4/3,\) and for leg 2, we obtain \(G_2 = 1 \cdot 5/3 = 5/3.\) The load capacitance of the approximating equivalent inverter path is

For path effort \(F = C_L / C_{in} = 45,\) the path sizing calculator prescribes a path length of \(n = 3.\) Since leg 1 of our branch has only one stage, we append two inverters. Leg 2 we leave unmodified, because two stages may be sufficiently fast, and appending an inverter pair would overshoot the best number of stages by one. The modified branching circuit is shown in Figure 3.16.

Next, we apply the *1d-analysis method* to
minimize the branch delay. The path parasitic delays are
\(P_1 = 2 + 1 + 1 = 4\) for leg 1 and \(P_2 = 1 + 2 = 3\)
for leg 2. We find a parasitic delay compensation of \(\Delta
p = 1,\) and assign stage delay \(d/3\) to each gate of leg 1
and \((d+1)/2\) to each gate of leg 2. Multiplying the effort
delays of each path, we obtain \((d/3)^3 = G_1 H_1 C_{in}/C_1\)
and \(((d+1)/2)^2 = G_2 H_2 C_{in}/C_2.\) Substituting
\(C_1\) and \(C_2\) in the branch constraint, we obtain the
polynomial \(d^3 (d+1)^2 - 4\cdot 25 d^3 - 20 \cdot 27
(d+1)^2.\) WolframAlpha finds the
positive real root \(d = 11.5.\) If each gate in leg 1 bears
stage effort \(\hat{f}_1 = d/3 = 3.8\) and each gate in leg 2
stage effort \(\hat{f}_2 = (d+1)/2 = 6.2,\) then the branching
circuit has a minimum delay of \(\hat{D} = 15.5\) delay units.
Note that the stage effort \(\hat{f}_2\) exceeds the best stage
effort \(f^\star = 3.59\) by more than \(70\,\%.\) To
reduce stage effort \(\hat{f}_2,\) we may want to try extending
leg 2 with an inverter pair after all.

### 3.6.2. Branches with Driver Path¶

When we design a circuit with a branch, the input capacitance of the branch presents a load to another logic circuit. In the following, we discuss the design of circuits with branches that are driven by a logic path. Figure 3.17 shows an example, where the driver path consists of a single stage only, a NOR gate.

If we wish to minimize the delay of the branching circuit in
Figure 3.17, we are faced with the problem of
how much effort the driver path and how much effort the legs of the
branch should bear. In general, the legs of the branch may bear
different path efforts, e.g. \(d_2\) and \(d_2+1\) in
Figure 3.17, and, furthermore, the path effort
of the driver path may differ from path efforts of the legs. We know
how to use the 1d-analysis method to assign effort delays to the
gates of the legs. Assume the driver path bears effort delay
\(d_1,\) and the branch bears effort delay \(d_2,\) which we
allocate to the legs according to the 1d-analysis method.
Figure 3.17 includes the allocation of the
effort delays to the gates of the circuit. Now, the design task
involves deriving effort delays \(d_1\) and \(d_2\) such that
the circuit delay is minimized. The *2d-analysis method* enables us
to find these two delays:

**2d-Analysis Method**

Given a branching circuit and a driver path with input capacitance \(C_{in}\) and load capacitances \(H_1 C_{in}\) and \(H_2 C_{in}.\) Follow these steps to minimize the circuit delay:

- Given an \(n\)-stage driver path, assign effort delay \(d_1/n\) to each gate of the driver path, and use the 1d-analysis method to allocate delay \(d_2\) to the gates of the legs.
- Form the products of the leg efforts and the driver path efforts, and use the branch constraint to express \(d_1\) as a function of \(d_2\) or vice versa.
- Express path effort delay \(D = d_1 + d_2\) as a function of \(d_2\) (or \(d_1\)), and minimize \(D,\) e.g. by means of calculus.

We introduce the 2d-analysis method with two examples.

We minimize the delay of the circuit in Figure 3.17 using the 2d-analysis method. Step 1 of the 2d-analysis method, allocating effort delay \(d_1\) to the driver path and \(d_2\) to the gates of the branch, is already complete. Note that the 1d-analysis method assigns path effort delay \(d_2 + 1\) to leg 1, which is by one delay unit larger than path effort delay \(d_2/2 + d_2/2 = d_2\) of leg 2 in order to compensate for the larger parasitic delay of leg 2. Next, in step 2 of the 2d-analysis, we derive the telescoping products of the leg efforts:

Rearranging these equations yields expressions for leg input capacitances \(C_1\) and \(C_2\):

The effort delay of the driver stage with load capacitance \(C_1 + C_2\) is:

Here, we have applied the branch constraint that the branch input capacitance, i.e. the load of the driver path, is the sum of leg input capacitances. Substituting \(C_1\) and \(C_2\) in this equation, yields \(d_1\) as a function of \(d_2\):

In step 3 of the 2d-analysis we express path effort delay \(D\) of the whole circuit as a function of \(d_2\):

To find the minimum of path effort delay \(D,\) with take derivative w.r.t \(d_2,\) and set the result to zero:

Rearranging this equation into polynomial form reduces our minimization problem into yet another application of root finding:

In principle, we also need to check whether the \(2^{nd}\) derivative is positive for \(D(d_2)\) to attain a minimum. Alternatively, we can inspect the plot of \(D(d_2)\) in WolframAlpha to verify a minimum at the relevant root. For instance, given path electrical efforts \(H_1 = H_2 = 15,\) we invoke WolframAlpha to find the roots of \(d_2^3 (d_2 + 1)^2 - (100/3) d_2^3 - (800/3) (d_2 + 1)^2.\) The only positive real root is \(d_2 = 7.8.\) Thus, we find \(d_1 = 6,\) and the minimum path effort delay of the branching circuit is \(D = 13.8.\) Including path parasitic delay \(P_2 = 5,\) the minimum path delay of the entire circuit is \(\hat{D} = D + P_2 = 18.8\) delay units.

In Section 1d-Analysis Method, we minimize the delay of the 3-fork in Figure 3.10. For leg electrical efforts \(H_1 = H_2 = 25,\) we obtain a minimum delay of \(\hat{D} = 14.91\) units. In this example, we investigate whether it is beneficial to replace two inverters of the 3-fork with one inverter on the drive path of a 2-fork. Figure 3.18 shows the circuit, which is logically equivalent to a 3-fork, but saves one inverter.

We use the 2d-analysis method to minimize the delay of the 2-fork with a 1-stage driver path. In step 1 we allocate the effort delays as annotated in Figure 3.18 already. The inverter in lower leg 2 shoulders parasitic delay compensation \(\Delta p = P_1 - P_2 = 2 - 1 = 1\) w.r.t. upper leg 1. Next, we derive the telescoping products of the effort delays for the legs of the 2-fork:

and for the driver path with load capacitance \(C_1 + C_2\):

Substituting \(C_1\) and \(C_2\) yields \(d_1\) as a function of \(d_2\):

Step 3 minimizes path effort delay \(D = d_1 + d_2,\) by taking the derivative of \(D\) w.r.t. \(d_2,\) and finding the relevant root of the resulting polynomial:

For \(H_1 = H_2 = 25,\) WolframAlpha finds the positive real root \(d_2 = 6.9,\) from which we deduce \(d_1 = 5.2.\) Including parasitic delay \(P_1 = 3,\) the minimum path delay is \(\hat{D} = d_1 + d_2 + P_1 = 15.1\) units, which is marginally larger than the minimum delay of the 3-fork.

### 3.6.3. Reconvergent Branches¶

We conclude our discussion of branching circuits by applying the 1d-analysis and 2d-analysis methods to branches whose legs reconverge at the inputs of a gate. Figure 3.19 shows such a circuit. The NAND gate in stage 1 drives a branch with legs in stage 2 that reconverge at the NAND gate in stage 3. Furthermore, each of the inputs of the circuit branches. One of the two legs drives the stage-1 NAND gate and the other the stage-2 NAND gate. The design challenge is to minimize the delay of this circuit.

First, we recognize that the longest path through the circuit traverses three NAND gates, one in each stage. The circuit is symmetric if we assume that the NAND gates in stage 2 have equal size. Then, the path from the upper input through the stage-1 NAND gate, the upper stage-2 NAND gate, and the NAND gate in stage 3 is representative for the four longest paths from either input to the output. This is our path of interest.

To determine the delay of the circuit, we allocate effort delays \(d_1,\) \(d_2,\) and \(d_3\) to the corresponding stages, as shown in Figure 3.19. Analysis of the circuit reveals the stage effort delays:

Here, the load of the stage-1 NAND gate is the sum of the leg input capacitances, both of which are \(C_2\) assuming that the stage-2 NAND gates have equal size. The branch constraint for the circuit input is

enabling us to capture the relation between the effort delays in a single equation:

To solve Equation (12), we need to constrain variables \(d_1,\) \(d_2,\) and \(d_3.\) A simplistic constraint assumes that all stage efforts are equal, \(d_1 = d_2 = d_3,\) as motivated by the key insight from the analysis of multistage paths. Although our circuit is not a simple path, this constraint yields a univariate polynomial of degree three. For example, given \(H = 6,\) we obtain the positive real root \(d_1 = 4.18\) and path delay \(D_{1d} = 3 d_1 + 3 p_{nand} = 18.54.\)

We might expect a better design from a 2d-analysis, assuming that the NAND gates in stages 2 and 3 are the legs of the branch, with the additional constraint that the stage-3 NAND gate belongs to both legs. To equalize the leg delays, we allocate effort delay \(d_2 = d_3\) to stages 2 and 3. However, we assume a potentially different delay \(d_1\) for the driver path in stage 1. Now we wish to minimize path delay \(D = d_1 + 2 d_2.\) We rearrange Equation (12) to express \(d_1\) as a function of \(d_2,\) assuming \(d_3 = d_2\):

such that

We minimize \(D\) by setting the derivative of \(D\) w.r.t. \(d_2\) to zero, and find the root \(d_2 = 4.7\) for \(H = 6.\) As a result, we obtain \(d_1 = 2.5\) and \(D_{2d} = d_1 + 2 d_2 + 3 p_{nand} = 17.89.\) We conclude that the 2d-analysis produces a slightly faster circuit than the 1d-analysis.

To reduce the delay of the circuit beyond \(D_{2d},\) we observe that the branches at the circuit inputs have the topology of reconvergent 1-forks that drive the stage-2 gates. One leg includes the stage-1 NAND gate instead of an inverter, and the other leg contains no logic. In Section 1d-Analysis Method, we identified a 1-fork as an undesirable circuit. As a case in point, we improve the circuit in Figure 3.19 by replacing the 1-fork with 2-fork topologies. To that end we insert back-to-back inverters as shown in Figure 3.20. The back-to-back inverters in the upper leg and the NAND gate in the lower leg resemble a 2-fork, except that the 1-stage leg contains the NAND gate rather than an inverter.

Since the path parasitic delay of two inverters equals the parasitic delay of a NAND gate, the delays of the two legs of the 2-forks are equal if we allocate effort delay \(d\) to the NAND gate and \(d/2\) to each of the inverters. Using a 1d-analysis, we also allocate effort delay \(d\) to the remaining NAND gates, as shown in Figure 3.20. The stage efforts of the inverter pair and the NAND gates are:

Applying branch constraint \(C_{in} = C_1 + C_2\) yields:

Given \(H = 6,\) we find the positive real root \(d = 3.44.\) The minimum delay of the circuit in Figure 3.20 is \(\hat{D}_{1d} = 3 d + 3 p_{nand} = 16.3\) units, which is faster than the original circuit in Figure 3.19.

Use the *1d-analysis method* to minimize the
delay of the branching circuit assuming \(H_1 = H_2 = 12.\)

- Determine the leg parasitic delays and allocate effort delays using one parameter \(d.\)
- For each leg, form the products of the effort delays, and derive a polynomial in \(d.\)
- Determine the root of the polynomial and the minimum leg delays \(\hat{D}_1\) and \(\hat{D}_2.\)

The branching circuit has a 3-input NAND gate followed by an inverter on leg 1 and a 2-input NOR gate followed by a 2-input NAND gate on leg 2. The path parasitic delay of leg 1 is

and of leg 2

Since \(\Delta p = P_1 - P_2 = 0,\) we do not need a parasitic delay compensation. Therefore, we assume that each leg has effort delay \(d,\) and allocate stage effort delay \(d/2\) to each of the four gates.

The product of the effort delays of leg 1 equals its path effort including the path logical effort:

and for leg 2:

The path logical effort of leg 1 is \(G_1 = g_{nand3} g_{inv} = 5/3.\) Given \(H_1 = 12,\) we find \(G_1 H_1 = 20.\) The product equation yields an expression for

Analogously, for leg 2 we find path logical effort \(G_2 = g_{nor2} g_{nand2} = 20/9,\) and with \(H_2 = 12,\) \(G_2 H_2 = 80/3.\) Rearranging the product equation results in

Next, we substitute \(C_1\) and \(C_2\) in the branch constraint to obtain a polynomial in \(d\):

This equation represents a degraded polynomial of degree 2, which is easy to solve. Since we are interested in positive delays only, we find effort delay \(d\) by taking the positive square root:

Therefore, both legs have a minimum path delay of \(\hat{D}_1 = d + P_1 = \hat{D}_2 = d + P_2 = 17.66\) time units.

## 3.7. Summary¶

The method of logical effort enables us to design digital circuits with minimum delay. It offers insight, recipies for systematic delay analysis, and methods for gate and path sizing. The 1d-analysis and 2d-analysis methods facilitate the design of forks and branching circuits, that rely on the delay minimization of a basic multistage path. The following steps summarize the path design procedure:

- Calculate the path effort \(F = G B H\) of the path of interest. Path logical effort \(G\) is the product of the gate logical efforts, and path electrical effort \(H\) is the telescoping product of the gate fanouts. For paths without branches, path branching effort \(B\) equals 1.
- Use the path sizing calculator to determine the best number of stages \(n^\star\) for the path of interest. If necessary, adapt the circuit to have \(n^\star\) stages.
- Size the gates on the path of interest such that each gate bears stage effort \(\hat{f} = F^{1/n^\star}.\)
- Calculate path parasitic delay \(P\) as the sum of the gate parasitic delays. The minimum path delay is \(\hat{D} = n^\star \hat{f} + P.\)

The key insight due to the path design procedure is the fact that each stage must bear the same effort to minimize the path delay. Together with the insight that the legs of a branch must have equal delay to minimize the delay of the whole branching circuit, the method of logical effort leads to the 1d and 2d-analysis methods. These methods permit the design of complex circuit topologies with nothing but paper and pencil, and the aid of a root finding package.