# 5. Combinational Logic¶

A **combinational circuit** is a digital circuit whose outputs depend
on the input values only.[1] In particular,
combinational circuits do not contain *memory elements* and are commonly acyclic. Example 5.1
illustrates the type of problem that a combinational circuit may
solve, and how we approach the solution.

Perhaps the most famous number sequence of all time is the sequence
of **Fibonacci numbers**, named after Leonardo da Pisa (1170-1241),
called Fibonacci because he was the son of Bonaccio, *filius
Bonaccii* in Latin. The Fibonacci sequence

models the growth of rabbit populations, and can be defined by a recurrence such that \(Fib(n)\) is the \(n^{th}\) Fibonacci number in the sequence:

We wish to design a combinational circuit that recognizes the Fibonacci numbers among 4-bit binary numbers. More specifically, the circuit shall have as input a 4-bit binary number \(A = A_3 A_2 A_1 A_0,\) and output \(Y\) such that \(Y(A)=1\) if \(A\) is a Fibonacci number, and 0 otherwise.

Our first step is to translate the problem statement into a truth
table. For each 4-bit *binary number* in
range \([0,15],\) we specify whether \(Y\) is 0 or 1.

\(A\) | \(A_3\) | \(A_2\) | \(A_1\) | \(A_0\) | \(Y\) |
---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 1 |
1 | 0 | 0 | 0 | 1 | 1 |
2 | 0 | 0 | 1 | 0 | 1 |
3 | 0 | 0 | 1 | 1 | 1 |
4 | 0 | 1 | 0 | 0 | 0 |
5 | 0 | 1 | 0 | 1 | 1 |
6 | 0 | 1 | 1 | 0 | 0 |
7 | 0 | 1 | 1 | 1 | 0 |
8 | 1 | 0 | 0 | 0 | 1 |
9 | 1 | 0 | 0 | 1 | 0 |
10 | 1 | 0 | 1 | 0 | 0 |
11 | 1 | 0 | 1 | 1 | 0 |
12 | 1 | 1 | 0 | 0 | 0 |
13 | 1 | 1 | 0 | 1 | 1 |
14 | 1 | 1 | 1 | 0 | 0 |
15 | 1 | 1 | 1 | 1 | 0 |
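The truth table can be generated mechanically. The following Python sketch (the helper names `fib_set` and `Y` are ours, not part of the text) enumerates all sixteen 4-bit inputs and marks the Fibonacci numbers:

```python
def fib_set(limit):
    """Collect the Fibonacci numbers up to `limit` (inclusive)."""
    fibs, a, b = set(), 0, 1
    while a <= limit:
        fibs.add(a)
        a, b = b, a + b
    return fibs

FIBS = fib_set(15)  # {0, 1, 2, 3, 5, 8, 13}

def Y(a3, a2, a1, a0):
    """Fibonacci recognizer: 1 if A3 A2 A1 A0 encodes a Fibonacci number."""
    value = 8 * a3 + 4 * a2 + 2 * a1 + a0
    return 1 if value in FIBS else 0

# Print one truth-table row per 4-bit input.
for n in range(16):
    bits = [(n >> i) & 1 for i in (3, 2, 1, 0)]
    print(n, *bits, Y(*bits))
```

Running the loop reproduces the rows of the table above, with \(Y = 1\) exactly for the inputs 0, 1, 2, 3, 5, 8, and 13.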

As the second step, we transform the truth table into a Boolean
expression for \(Y(A).\) A simple, systematic transformation
is based on the *SOP normal form*, which is
the disjunction of minterms in those rows of the truth table where
\(Y = 1\):

The third step is the synthesis of a digital circuit. Here, we transform the Boolean expression into the corresponding AND-OR gate circuit. For comparison, we also show a series-parallel switch network, where each series composition corresponds to an AND gate and the parallel composition to the OR gate.

The black-box symbol of the Fibonacci circuit is shown on the left,
the gate-level circuit with NOT, AND, and OR gates in the middle,
and the switch network on the right. The next step towards
realizing the Fibonacci number recognizer could be the design of a
fast CMOS circuit with the *method of logical effort*.

We build large combinational circuits from smaller ones. To facilitate reuse of a circuit as part of a hierarchical design methodology, we characterize each circuit module as a black box with:

- one or more binary input terminals,
- one or more discrete (binary or *Z*) output terminals,
- a functional specification that relates the outputs logically to the inputs,
- a timing specification of the delay before the outputs respond to a change of the inputs.

A **black box** module does not expose the internal implementation,
whereas a **glass box** module does. *Logic gates*
are often considered elementary circuits in combinational logic
design, and the *basic circuits* tend to
be used as black box modules, assuming every designer knows about
their specifications.

The primary challenges of combinational circuit design are (1) finding
the minimal logic expression for each output function, (2) mapping the
function to a target technology for physical implementation, and (3)
minimizing the circuit delay. These tasks are not independent of each
other, and may require several design iterations before arriving at a
satisfying solution. The tools for the design of a combinational
circuit are *Boolean algebra* and the
*method of logical effort*. Boolean algebra is
useful to derive minimal logic functions for each circuit output, and
to provide the inspiration for inventing functionally equivalent
circuits with different topologies that serve as candidates for a
high-speed CMOS design by means of the method of logical effort.
Logic minimization rarely yields the fastest CMOS circuit but the
problem is too hard to crack in general without logic minimization as
a yardstick.

## 5.1. Logic Minimization¶

The Fibonacci number recognizer in Example 5.1 implements the desired functionality but requires more logic gates than necessary. For instance, the Boolean expression \(Y'(A_0, A_1, A_2, A_3) = \overline{A}_2\,\overline{A}_3 + A_0\,\overline{A}_1\,A_2 + \overline{A}_0\,\overline{A}_1\,\overline{A}_2\) is logically equivalent to the SOP normal form \(Y(A)\) derived in Example 5.1, yet smaller. Therefore, we could have designed a gate-level circuit with one 2-input AND gate, two 3-input AND gates, and a 3-input OR gate to realize expression \(Y'\) rather than seven 4-input AND gates and a 7-input OR gate for expression \(Y.\)

In general, it pays off to find a smaller Boolean expression representing a given Boolean function, because the resulting circuit is smaller and likely to be faster. The question is whether a smaller expression, once we have found it, is the “smallest” expression or whether there exists an even smaller one. If we had a criterion to identify the smallest expression, we could terminate our search once we have found an expression that fulfills the minimization criterion. Even better are constructive methods that enable us to derive the smallest expression systematically rather than finding it by sheer luck.
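The claimed equivalence of \(Y'\) and \(Y\) is easy to confirm by exhaustive enumeration of all 16 input combinations, for example with a short Python sketch (function names are ours):

```python
# Brute-force equivalence check: compare Y' against the specification
# "A is a Fibonacci number" for all 16 input combinations.

FIBS = {0, 1, 2, 3, 5, 8, 13}  # the 4-bit Fibonacci numbers

def Y_spec(a3, a2, a1, a0):
    return (8 * a3 + 4 * a2 + 2 * a1 + a0) in FIBS

def Y_prime(a3, a2, a1, a0):
    # Y' = ~A2 ~A3 + A0 ~A1 A2 + ~A0 ~A1 ~A2
    return ((not a2 and not a3)
            or (a0 and not a1 and a2)
            or (not a0 and not a1 and not a2))

assert all(Y_spec(a3, a2, a1, a0) == Y_prime(a3, a2, a1, a0)
           for a3 in (0, 1) for a2 in (0, 1)
           for a1 in (0, 1) for a0 in (0, 1))
```

Since the assertion passes for all input combinations, \(Y'\) and \(Y\) realize the same Boolean function.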

Logic minimization encompasses the most satisfactory methods for
finding a smallest Boolean expression known to date. However, the
**logic minimization problem** that we know how to solve
systematically is formulated in a rather narrow technical sense which
limits the expressions to a *sum-of-products*
structure reminiscent of the AND-OR topology of *compound gates*. Because these circuits have two levels of gates
from input to output, the logic minimization problem is also called
**two-level minimization problem**:

Given a Boolean function \(f(x_0, x_1, \ldots, x_{n-1})\) find an equivalent sum-of-products expression \(g(x_0, x_1, \ldots, x_{n-1})\) that minimizes the number of products and the number of inputs of all products.

The logic minimization problem may be viewed as a cost minimization problem. The cost of a Boolean expression can be defined in various ways. Throughout this chapter, we define the cost to reflect the area requirements of a CMOS circuit in terms of number of wires and transistors:

The **cost** \(\mathcal{C}(g)\) of Boolean expression \(g\) equals the number of all gate inputs in the corresponding combinational circuit, not counting inverters.

For example, the cost \(\mathcal{C}(g)\) of sum-of-products
expression \(g\) equals its number of products plus the number of
inputs of all product terms. The inputs of a product term are a
subset of the variables \(x_0, x_1, \ldots, x_{n-1}\) either in
complemented or uncomplemented form, called **literals**. We count
literals rather than variables and, thereby, ignore the cost of the
inverters required to generate the complemented variables. This is
sensible, because the polarity of the inputs is independent of the
intrinsic cost of an SOP expression. SOP expressions do not have
negations except for complemented variables. Renaming variable
\(x_i\) to \(y_i = \overline{x}_i\) does not change the SOP,
but the variable name only. In Example 5.1, the literals
of function \(Y\) are \(A_0,\) \(\overline{A}_0,\)
\(A_1,\) \(\overline{A}_1,\) \(A_2,\)
\(\overline{A}_2,\) \(A_3,\) \(\overline{A}_3,\) and the
cost of SOP normal form \(Y(A)\) is \(\mathcal{C}(Y) = 7 + (7
\cdot 4) = 35.\) The cost of SOP expression \(Y' =
\overline{A}_2\,\overline{A}_3 + A_0\,\overline{A}_1\,A_2 +
\overline{A}_0\,\overline{A}_1\,\overline{A}_2\) is significantly
smaller, \(\mathcal{C}(Y') = 3 + (2 + 3 + 3) = 11,\) and is indeed
the minimum cost. Note that counting literals and products of an SOP
expression to determine its cost is equivalent to counting in the
two-level combinational circuit the number of inputs of the AND gates
plus the number of AND gates, which equals the number of inputs of the
OR gate.
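One way to make the cost measure concrete is to model an SOP expression as a list of products, each a tuple of literals. The sketch below is our own representation, not part of the text; following the gate-counting argument above, a single-literal product is taken to feed the OR gate directly and therefore contributes no AND gate:

```python
def cost(sop):
    """Cost of an SOP expression: AND-gate inputs plus OR-gate inputs.

    A product with more than one literal needs an AND gate whose output
    feeds the OR gate; a single-literal product feeds the OR gate
    directly and contributes no AND gate."""
    literal_inputs = sum(len(p) for p in sop)
    and_gates = sum(1 for p in sop if len(p) > 1)
    return literal_inputs + and_gates

# SOP normal form Y: the 7 minterms of the Fibonacci numbers 0,1,2,3,5,8,13.
Y_minterms = [tuple(("" if (n >> i) & 1 else "~") + "A%d" % i
                    for i in (3, 2, 1, 0))
              for n in (0, 1, 2, 3, 5, 8, 13)]
print(cost(Y_minterms))  # 7 + 7*4 = 35

Y_prime = [("~A2", "~A3"), ("A0", "~A1", "A2"), ("~A0", "~A1", "~A2")]
print(cost(Y_prime))     # 3 + (2+3+3) = 11
```

The two results match the costs \(\mathcal{C}(Y) = 35\) and \(\mathcal{C}(Y') = 11\) computed above.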

Derive the cost of the SOP normal forms and POS normal forms of the XNOR function \(f_9\) and implication \(f_{11}.\)

\(x\) | \(y\) | \(f_9\) | \(f_{11}\) |
---|---|---|---|
0 | 0 | 1 | 1 |
0 | 1 | 0 | 1 |
1 | 0 | 0 | 0 |
1 | 1 | 1 | 1 |

The normal forms of \(f_9\) are

We could derive the cost of each normal form by inspecting their associated two-level circuits, see Exercise 2.5. However, it is as easy to deduce the cost from the Boolean expressions directly. The cost of an SOP equals its number of products plus the number of literals of all products. The SOP normal form of \(f_9\) has two products each with two literals, product \(\overline{x}\,\overline{y}\) with literals \(\overline{x}\) and \(\overline{y}\) and product \(x\,y\) with literals \(x\) and \(y\):

Analogously, the cost of a POS equals its number of sums plus the number of literals of all sums. The POS normal form of \(f_9\) has two sums each with two literals, sum \(x + \overline{y}\) with literals \(x\) and \(\overline{y}\) and sum \(\overline{x} + y\) with literals \(\overline{x}\) and \(y\):

We find that the SOP and POS normal forms of the XNOR function have equal cost. This is not the common case, though.

The normal forms of \(f_{11}\) are

The SOP normal form has three products, each with two literals, so its cost is

The POS normal form has no product, because it consists of just one sum with two literals. Its cost is

We conclude that the POS normal form of the implication has a much lower cost than the SOP normal form, indicating that the POS yields a smaller and probably faster implementation of this function.
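The cost calculations rest on the logical equivalence of the normal forms with the functions in the truth table, which can be confirmed exhaustively. A minimal Python sketch (function names are ours):

```python
# Exhaustive check that the SOP and POS normal forms realize
# XNOR (f9) and implication (f11) as given in the truth table.

def f9_sop(x, y):   # ~x~y + x y
    return int((not x and not y) or (x and y))

def f9_pos(x, y):   # (x + ~y)(~x + y)
    return int((x or not y) and (not x or y))

def f11_sop(x, y):  # ~x~y + ~x y + x y
    return int((not x and not y) or (not x and y) or (x and y))

def f11_pos(x, y):  # single maxterm: ~x + y
    return int(not x or y)

for x in (0, 1):
    for y in (0, 1):
        assert f9_sop(x, y) == f9_pos(x, y) == int(x == y)
        assert f11_sop(x, y) == f11_pos(x, y) == int(x <= y)
```

Both pairs of normal forms agree on all four input combinations, so the cost comparison indeed compares equivalent expressions.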

### 5.1.1. Geometry of Boolean Functions¶

To understand the logic minimization problem, it is helpful to
interpret a Boolean function geometrically. The Boolean points
\(\mathcal{B}^n\) occupy an \(n\)-dimensional space. Point
\(P = (x_0, x_1, \ldots, x_{n-1})\) is uniquely specified in an
\(n\)-dimensional Cartesian coordinate system, where each
coordinate \(x_i\) of point \(P\) is confined to the Boolean
values 0 and 1, i.e. \(x_i \in \{0, 1\}.\) Figure 5.1 shows the points of \(\mathcal{B}^1,\)
\(\mathcal{B}^2,\) and \(\mathcal{B}^3.\) In three
dimensions, the eight points form the corners of a cube. Therefore,
an \(n\)-dimensional Boolean space is also called **n-cube** or
**hypercube**. A 0-cube is a single point representing a constant
Boolean value.

An \(n\)-cube can be constructed from two \((n-1)\)-cubes, and a set of new edges connecting the vertices with equal coordinates in the \((n-1)\)-dimensional space. For example, the 3-cube in Figure 5.1 consists of two 2-cubes, the front square and the rear square in the \((x,y)\)-space. Four new edges connect the corresponding corners of the squares in the \(z\)-dimension. Thus, an \(n\)-cube contains subcubes of smaller dimensions. The 3-cube in Figure 5.1 has six 2-dimensional subcubes, which are the squares on its faces. Furthermore, it has twelve 1-dimensional subcubes, each represented by an edge and its incident vertices.
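The subcube counts just stated follow from a simple combinatorial formula: a \(k\)-dimensional subcube is determined by choosing which \(k\) coordinates vary and fixing each of the remaining \(n-k\) coordinates to 0 or 1. A quick sketch (the helper name is ours) confirms the counts for the 3-cube:

```python
from math import comb

def subcubes(n, k):
    """Number of k-dimensional subcubes of an n-cube.

    Choose the k varying coordinates (C(n, k) ways) and fix each of the
    remaining n - k coordinates to 0 or 1 (2**(n-k) ways)."""
    return comb(n, k) * 2 ** (n - k)

print(subcubes(3, 2))  # 6  two-dimensional faces of the 3-cube
print(subcubes(3, 1))  # 12 edges of the 3-cube
print(subcubes(3, 0))  # 8  vertices of the 3-cube
```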

Since each point of an \(n\)-cube represents a minterm, we can
characterize a Boolean function \(f(x_0, x_1, \ldots, x_{n-1})\)
by means of a partitioning of the points into the **on-set** and the
**off-set**. The on-set contains all **1-points** for which
\(f=1,\) i.e. all minterms of the *SOP normal form* of \(f.\) The off-set contains all
**0-points** where \(f = 0,\) i.e. all maxterms of the *POS
normal form* of \(f.\)

Function \(f(x, y, z)\) in three variables with SOP normal form

and truth table

\(x\) | \(y\) | \(z\) | \(f\) | minterm |
---|---|---|---|---|
0 | 0 | 0 | 1 | \(\overline{x}\,\overline{y}\,\overline{z}\) |
0 | 0 | 1 | 1 | \(\overline{x}\,\overline{y}\,z\) |
0 | 1 | 0 | 0 | \(\overline{x}\,y\,\overline{z}\) |
0 | 1 | 1 | 0 | \(\overline{x}\,y\,z\) |
1 | 0 | 0 | 0 | \(x\,\overline{y}\,\overline{z}\) |
1 | 0 | 1 | 1 | \(x\,\overline{y}\,z\) |
1 | 1 | 0 | 0 | \(x\,y\,\overline{z}\) |
1 | 1 | 1 | 1 | \(x\,y\,z\) |

has

on-set \(= \{\overline{x}\,\overline{y}\,\overline{z},\ \overline{x}\,\overline{y}\,z,\ x\,\overline{y}\,z,\ x\,y\,z\}\)

and

off-set \(= \{\overline{x}\,y\,\overline{z},\ \overline{x}\,y\,z,\ x\,\overline{y}\,\overline{z},\ x\,y\,\overline{z}\}\,.\)

The 3-cube representation of \(f\) is shown on the right. The 1-points of the on-set are drawn black and the 0-points of the off-set as circles.

Maurice Karnaugh discovered a projection of \(n\)-cubes into the
two dimensions of a sheet of paper. Figure 5.2 illustrates
the projection of the 3-cube into two dimensions. The four vertices
of the front square are mapped into the top row, and the four vertices
of the rear square into the bottom row. The edges of the squares form
a torus in the 2-dimensional projection. The **Karnaugh map**, or
**K-map** for short, draws the vertices of the cube as cells and omits
the edges. The crucial feature of the K-map is the arrangement of the
cells such that neighboring cells in the K-map correspond to adjacent
vertices in the cube, keeping in mind that the rows of the K-map wrap
around the torus. The coordinates of the cells of the K-map are
annotated as point coordinates. For example, at the intersection of
column \(x\,y = 01\) and row \(z=1\) we find the cell of point
\((x,y,z) = (0,1,1)\) corresponding to minterm
\(\overline{x}\,y\,z.\) Note that the column coordinates are not
in the order of binary numbers but form a *Gray code*, where exactly
one bit flips between neighboring cells. This property holds also for
the wrap-around between \(x\,y = 00\) and \(x\,y = 10,\) where
the \(x\)-coordinate changes and the \(y\) coordinate remains
constant. In fact, every Hamiltonian cycle [2] through the
3-cube and the corresponding K-map visits the vertices in an order
such that exactly one bit of the point coordinate flips on each step.
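The Gray-code ordering of the K-map coordinates can be generated and checked programmatically. A small sketch (names are ours) builds the \(n\)-bit reflected Gray code and verifies the single-bit-flip property, including the wrap-around from the last code back to the first:

```python
def gray_code(n):
    """n-bit reflected Gray code as a list of bit strings."""
    codes = [""]
    for _ in range(n):
        # Prefix the current list with 0, its reversal with 1.
        codes = ["0" + c for c in codes] + ["1" + c for c in reversed(codes)]
    return codes

def flips(a, b):
    """Number of bit positions in which codes a and b differ."""
    return sum(x != y for x, y in zip(a, b))

cols = gray_code(2)
print(cols)  # the column order of the 3- and 4-variable K-maps
assert all(flips(cols[i], cols[(i + 1) % len(cols)]) == 1
           for i in range(len(cols)))
```

For \(n = 2\) the code yields the column order 00, 01, 11, 10 used in the K-maps, and the assertion confirms that exactly one bit flips between neighbors, wrap-around included.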

The K-maps for two and four variables are shown in Figure 5.3. They correspond to a 2-cube, i.e. a 2-dimensional square, and a 4-cube, which is easier to draw as a K-map than as a 4-dimensional cube.

We represent a Boolean function \(f\) in a K-map by marking each cell of the on-set of \(f\) with a 1 and each cell of the off-set of \(f\) with a 0. Figure 5.4 shows the K-maps for the Boolean functions of Example 5.1 and Example 5.2.

The 4-variable K-map has implicit wrap-arounds in both the horizontal and the vertical direction. K-maps for more than four variables exist, but identifying all neighboring cells that correspond to adjacent vertices in the \(n\)-cube becomes less intuitive as the number of variables grows. Therefore, we will use K-maps for Boolean functions with up to four variables only.

### 5.1.2. K-Map Minimization¶

In this section, we solve the *two-level minimization problem* graphically in a K-map. The key is
the geometric interpretation of the *combining theorem*.

On the right, we show the 3-cube of function \(f(x,y,z)=\overline{x}\,\overline{y}\,\overline{z} + \overline{x}\,\overline{y}\,z + x\,\overline{y}\,z + x\,y\,z\) in Example 5.2, with black vertices marking the 1-points of the on-set. Two adjacent 1-points constitute a 1-dimensional subcube of the 3-cube, for example the encircled 1-points \(\overline{x}\,\overline{y}\,z\) and \(x\,\overline{y}\,z.\) Observe that we can apply the combining theorem to the disjunction of the minterms of the 1-cube:

such that literal \(x\) vanishes on the *rhs*. If we substitute
\(\overline{y}\,z\) for the two minterms in \(f\):

then \(f\) remains in SOP form, although it is not an SOP normal
form any longer. More importantly, the SOP form is *smaller* than the
SOP normal form, because the substitution replaces two minterms with
one product, which has one literal less than each of the minterms.
Merging two minterms into one product term removes the variable that
appears in both complemented and uncomplemented form and retains the
common literals in the product term of the merged subcube. In the
example, we merge the minterms in the \(x\)-dimension, so that the
\(x\)-literals disappear and product \(\overline{y}\,z\) with
the common literals represents the merged subcube. This example
generalizes to higher dimensional subcubes. From the geometric point
of view, the key observation is that merging two adjacent
\((n-1)\)-cubes into an \(n\)-cube removes the literals of the
merged dimension. The larger the dimensionality of the cube, the
fewer literals the associated product has.

#### K-map Minimization Method¶

We assume that a Boolean function \(f\) is given in SOP normal
form. Then, the **K-map minimization method** consists of three
steps:

- Mark the on-set of \(f\) in the K-map with 1’s. Leave the cells of the off-set unmarked.
- Identify the maximal subcubes of the on-set.
- Identify the minimal cover of the on-set as a subset of the maximal subcubes.

We explain the K-map minimization method by means of examples. In Experiment 5.1, you can practice minimization with an interactive K-map.

Consider function

in two variables, given in SOP normal form. The corresponding 2-cube below marks each 1-point associated with a minterm as a black vertex. In the K-map, we mark the cells of the minterms with value 1. This is step 1 of the K-map minimization method.

In a 2-variable K-map no wrap-around is required to identify neighboring cells. We mark each application of the combining theorem in the K-map by encircling the adjacent cells corresponding to a 1-cube, as shown in both the 2-cube representation and the K-map:

The combining theorem yields:

The first application merges the 1-cube in the \(x\)-dimension, resulting in a degenerate product \(y\) that consists of a single literal only. The second application merges the \(y\)-dimension, resulting in the degenerate product \(x.\) Note that we use minterm \(x\,y\) in both applications of the combining theorem. In terms of Boolean algebra, we may express the minimization procedure as the transformation:

This algebraic transformation has a simple graphical equivalent that is easy to remember by itself. Merge neighboring 1-cells in the K-map, such that the resulting encircled cells form a subcube. A subcube in the K-map is a rectangle, perhaps a square of cells, and the number of encircled cells is a power of 2. The subcube represents a product term that we read off the K-map by including those coordinates that are unchanged. For example, the 1-cube in column \(x=1\) covers both rows of the K-map, the top row where \(y=0\) and the bottom row, where \(y=1.\) Since \(x\) is unchanged, the subcube represents the degenerate product \(x.\) Analogously, the 1-cube in row \(y=1\) covers both columns of the K-map and represents the degenerate product \(y.\) These two 1-cubes are the maximal subcubes, because extending the circles to cover the next larger cube, a 2-cube with four cells, would include the 0-point of cell \(\overline{x}\,\overline{y}.\) We have found two maximal subcubes, as required in step 2 of the K-map minimization method.

The algebraic transformation shows that we need to form the sum (disjunction) of the products representing the maximal subcubes to obtain the minimal SOP form \(f(x,y) = x + y.\) In this particular example, the two subcubes together cover all 1-cells of the on-set. In general, we wish to find the smallest subset of maximal subcubes that covers all 1-cells of the K-map. This is step 3 of the K-map minimization method.

Note that \(f(x,y) = x + y\) can be implemented with a single OR gate. Since no AND gates are required, we view the circuit as a degenerate two-level circuit that requires one level of logic gates only.

We minimize the Boolean function of Example 5.2:

The corresponding 3-cube and the K-map are shown below. The K-map has three maximal subcubes, each of which is a 1-cube that covers two neighboring 1-cells. Reading the products associated with the 1-cubes off the K-map, we find that the blue 1-cube changes in the \(x\)-dimension along the toroidal wrap-around and represents product \(\overline{y}\,z.\) The red 1-cube changes in the \(y\)-dimension and represents product \(xz.\) The green 1-cube changes in the \(z\)-dimension and represents product \(\overline{x}\,\overline{y}.\)

Step 3 of the K-map minimization method turns out to be slightly trickier in this example. If we form the sum of the products representing the maximal subcubes, we obtain the SOP form

This SOP form has cost \(\mathcal{C}(f) = 3 + (2 + 2 + 2) = 9.\) Notice in the K-map, however, that the blue subcube covers two 1-cells that are also covered by the other two subcubes. In particular, both the blue and the green subcubes cover the 1-cell of minterm \(\overline{x}\,\overline{y}\,z,\) and the blue and the red subcubes both cover the 1-cell of minterm \(x\,\overline{y}\,z.\) We conclude that the blue subcube is redundant, because the red and green subcubes include all 1-cells that the blue subcube covers. Therefore, a smaller SOP form for \(f\) is

with cost \(\mathcal{C}(f) = 2 + (2 + 2) = 6.\)

Since the red subcube is the only maximal subcube to cover the
1-cell of minterm \(xyz\) and the green subcube is the only
maximal subcube to cover the 1-cell of minterm
\(\overline{x}\,\overline{y}\,\overline{z},\) these two
subcubes cannot be removed without changing the logical function.
We say that a maximal subcube is **essential** if it is the only
one to cover a 1-cell in a K-map. The minimal cover must include
all essential maximal subcubes. Since the green subcube
\(\overline{x}\,\overline{y}\) and the red subcube \(x z\)
are essential maximal subcubes, we conclude that \(f(x,y,z) =
\overline{x}\,\overline{y} + x z\) is the minimal SOP form. The
implementation is a two-level circuit with two AND gates and one OR
gate. Note that \(\overline{y}\,z\) is the *consensus* of \(\overline{x}\,\overline{y}\) and
\(x z.\)

We wish to minimize Boolean function

The corresponding 3-cube and K-map are shown below. Identifying the maximal subcubes in the K-map yields seven 1-cubes that we can extend into two 2-cubes. The red subcube covers the four 1-cells in the bottom row, and the blue subcube is the square of four 1-cells wrapping around the rows. In the 3-cube, the two 2-cubes correspond to the bottom and rear faces.

The red subcube changes in the \(x\) and \(y\)-dimensions but not in the \(z\)-dimension, and represents the degenerate product \(z.\) The blue subcube does not change in the \(y\)-dimension, and represents the degenerate product \(\overline{y}.\) Both subcubes are maximal and essential. Therefore, the minimal SOP form is \(f(x,y,z) = \overline{y} + z.\)

If we insist on minimizing a Boolean function algebraically rather
than using a K-map, consider the following Gedankenexperiment. We
can derive the minimization steps due to the *combining
theorem* in the reverse order starting with
the minimal SOP form, which is equivalent to performing the
*Shannon expansion*, plus subsequent
removal of duplicate minterms:

For logic minimization by means of Boolean algebra, we start with the last equality, the SOP normal form, and proceed in the opposite direction. First, we would duplicate minterms \(\overline{x}\,\overline{y}\,z\) and \(x\,\overline{y}\,z\) by idempotence. Then, we would merge minterms into 1-cubes by applying the combining theorem four times. At last, we would merge 1-cubes into 2-cubes by applying the combining theorem twice. What appears like divine foresight in the first step reduces to craftsmanship in the K-map method. We conclude that the K-map method suits digital circuit designers who have not acquired the powers of an oracle yet.
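The constructive path can also be sketched in code. The following simplified Quine-McCluskey procedure, our own sketch rather than part of the text, applies the combining theorem repeatedly to find the prime implicants of the function of Example 5.2 and then selects the essential ones:

```python
from itertools import combinations

def prime_implicants(minterms):
    """Merge implicants pairwise (combining theorem) until no merge applies.
    An implicant is a (value, mask) pair; mask bits mark the eliminated
    (don't-care) variables."""
    current = {(m, 0) for m in minterms}
    primes = set()
    while current:
        merged, nxt = set(), set()
        for a, b in combinations(sorted(current), 2):
            if a[1] == b[1]:                         # same don't-care positions
                diff = a[0] ^ b[0]
                if diff and diff & (diff - 1) == 0:  # differ in exactly one bit
                    nxt.add((a[0] & b[0], a[1] | diff))
                    merged.update({a, b})
        primes |= current - merged                   # unmergeable -> prime
        current = nxt
    return primes

def covers(implicant, minterm):
    value, mask = implicant
    return minterm & ~mask == value

# Example 5.2: f = ~x~y~z + ~x~y z + x~y z + x y z  ->  minterms 0, 1, 5, 7
# with bit order (x, y, z), x being the most significant bit.
minterms = {0, 1, 5, 7}
primes = prime_implicants(minterms)
print(sorted(primes))   # ~x~y (00-), ~y z (-01), and x z (1-1)

# Essential primes: the only prime implicant covering some minterm.
essential = {p for p in primes
             if any(covers(p, m) and sum(covers(q, m) for q in primes) == 1
                    for m in minterms)}
print(sorted(essential))  # ~x~y and x z, the minimal cover of Example 5.4
```

The procedure recovers the three maximal subcubes \(\overline{x}\,\overline{y},\) \(\overline{y}\,z,\) and \(x\,z,\) and identifies \(\overline{x}\,\overline{y}\) and \(x\,z\) as essential, in agreement with the K-map analysis.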

We wish to minimize Boolean function

The K-map of \(f\) is shown below. We find that \(f\) consists of six 1-cubes \(\overline{x}\,\overline{y},\) \(\overline{x}\,\overline{z},\) \(y\,\overline{z},\) \(x y,\) \(x z,\) and \(\overline{y} z.\)

All 1-cubes are maximal subcubes. Step 3 of the K-map minimization method, finding the minimal cover, is even trickier than in Example 5.4, because none of the maximal subcubes is essential. Every 1-cube covers 1-cells that are also covered by other 1-cubes. Thus, there is no obvious choice for the minimal cover. Instead, this function has two minimal covers

where \(f_1\) contains the blue subcubes and \(f_2\) the red subcubes. There exists no smaller cover, because the red and blue subsets of the set of maximal subcubes cover each 1-cell of the K-map exactly once. Both covers have minimal cost \(\mathcal{C}(f_1) = \mathcal{C}(f_2) = 3 + (2 + 2 + 2) = 9,\) and can be implemented with two-level circuits using three AND gates and one OR gate. If the minimal cover is not unique, the cost function provides no further guidance other than choosing one of them arbitrarily.

We wish to minimize 4-variable function

The 4-variable K-map is shown below. A 4-variable K-map wraps around horizontally and vertically. The blue subcube \(y\) is a 3-cube. The green subcube \(\overline{w}\,\overline{z}\) is a 2-cube that wraps around vertically. The red 2-cube \(\overline{x}\,\overline{z}\) is the only subcube of a 4-variable K-map that wraps around both horizontally and vertically.

All three subcubes are maximal and essential. Therefore, the minimal cover is unique and yields SOP form:

Overlooking the red subcube of the four corner cells is a common beginner’s mistake.

You may use this K-map to minimize Boolean functions with three or four variables in three steps: (1) define the function by selecting the 1-cells of the on-set, (2) group the 1-cells into prime implicants (maximal subcubes), and (3) choose the prime implicants for the minimal cover.

Consider 3-variable function \(f(x, y, z) = \overline{x}\,y\,\overline{z} + \overline{(x + y)} + x\,z\,.\)

- Identify all maximal subcubes (also called *prime implicants*).
- Identify the essential maximal subcubes (also called *essential prime implicants*).
- Determine all minimal covers of \(f\).

We wish to represent \(f\) in a K-map to identify all maximal subcubes. To that end, we need to determine all minterms of the on-set of \(f\). First, we apply *De Morgan’s theorem* to transform \(f\) into sum-of-products form:

\[\begin{eqnarray*} f(x, y, z) &=& \overline{x}\,y\,\overline{z} + \overline{(x + y)} + x\,z \\ &=& \overline{x}\,y\,\overline{z} + \overline{x}\,\overline{y} + x\,z\,. \end{eqnarray*}\]

Then, we apply *Shannon expansions* to transform each product into minterms. Note that the first product is a minterm already. We expand the second product about \(z\) and the third product about \(y\):

\[\begin{eqnarray*} f(x, y, z) &=& \overline{x}\,y\,\overline{z} + \overline{x}\,\overline{y} + x\,z \\ &=& \overline{x}\,y\,\overline{z} + (\overline{x}\,\overline{y}\,\overline{z} + \overline{x}\,\overline{y}\,z) + (x\,\overline{y}\,z + x\,y\,z)\,. \end{eqnarray*}\]

The K-map with the five minterms of the on-set and all maximal subcubes is:

The maximal subcubes are the orange subcube \(\overline{x}\,\overline{z},\) the green subcube \(\overline{x}\,\overline{y},\) the blue subcube \(\overline{y}\,z,\) and the red subcube \(x\,z.\)

The essential maximal subcubes are the orange and the red subcubes \(\overline{x}\,\overline{z}\) and \(x\,z.\) The green and blue subcubes are not essential. They are redundant prime implicants.

The minimal cover of \(f\) must include all essential subcubes, here the orange and red subcubes. These subcubes leave a single 1-cell uncovered, the 1-cell in the bottom left corner. To cover this 1-cell, we may use either the green or the blue subcube. Since they incur the same cost, function \(f\) has two minimal covers:

\[\begin{eqnarray*} f(x, y, z) &=& \overline{x}\,\overline{z} + x\,z + \overline{x}\,\overline{y} \\ &=& \overline{x}\,\overline{z} + x\,z + \overline{y}\,z\,. \end{eqnarray*}\]

Both minimal covers have cost 9.
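Both minimal covers can be checked against the original expression for \(f\) by exhaustive enumeration; a short Python sketch (function names are ours):

```python
# Exhaustive check that both minimal covers realize f.

def f(x, y, z):
    # f = ~x y ~z + ~(x + y) + x z
    return int((not x and y and not z) or not (x or y) or (x and z))

def f1(x, y, z):  # first minimal cover: ~x~z + x z + ~x~y
    return int((not x and not z) or (x and z) or (not x and not y))

def f2(x, y, z):  # second minimal cover: ~x~z + x z + ~y z
    return int((not x and not z) or (x and z) or (not y and z))

triples = [(x, y, z) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
assert all(f(*t) == f1(*t) == f2(*t) for t in triples)
```

The assertion passes for all eight input combinations, confirming that both covers are logically equivalent to \(f\).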

Consider the SOP normal forms:

- Identify the on-sets in the associated \(n\)-cubes.
- Represent the functions by means of K-maps.
- Minimize the functions:
  - identify all maximal subcubes (also called *prime implicants*),
  - identify the essential maximal subcubes (also called *essential prime implicants*),
  - find a minimal cover and its associated SOP expression.
- Determine the cost reductions due to two-level minimization.

Each minterm of an SOP normal form represents an element of the on-set of a function. The on-set of function \(f(A,B)\) is

\[\text{on-set}(f) = \{ \overline{A}\,\overline{B}\,, \overline{A}\,B\,, A\,B \}\,.\]

Since \(f\) is a function in two variables, \(A\) and \(B,\) the on-set of \(f\) consists of the 1-points of \(f,\) which are a subset of the vertices of a 2-cube. In the figure below, we draw the three 1-points of \(f(A,B)\) as black vertices, and the 0-point as a circle. The on-sets of \(g(A,B,C)\) on the 3-cube and \(h(A,B,C,D)\) on the 4-cube are drawn analogously.

The \(n\)-cubes in (a) have the corresponding K-maps shown below. We have marked the 1-cells associated with the minterms of the functions only. Blank cells represent 0-points.

We apply the K-map method to minimize the functions, starting with \(f(A,B).\)

The K-map of function \(f\) on the right contains two prime implicants. They are the largest subcubes of the 2-cube that contains \(f.\) The subcubes of a 2-cube are 1-cubes represented by two adjacent cells in the K-map, and 0-cubes represented by a single cell of the K-map. The blue prime implicant (1-cube) consists of the column where \(A = 0,\) and the red prime implicant (1-cube) consists of the row where \(B = 1.\) Therefore, the prime implicants of \(f\) are \(\overline{A}\) and \(B.\) Both prime implicants are essential, because the blue prime implicant \(\overline{A}\) is the only one to cover 1-cell \(\overline{A}\,\overline{B}\) and the red prime implicant \(B\) is the only one to cover 1-cell \(A\,B.\) Therefore, both prime implicants must be part of the minimal cover. Since together both prime implicants cover all 1-cells of \(f,\) we have found the minimal cover:

\[f(A,B) = \overline{A} + B\,.\]

As a 3-variable function, \(g(A,B,C)\) occupies a 3-cube. The K-map on the right shows three prime implicants, each of which covers two adjacent 1-cells and hence corresponds to a 1-cube. The next larger subcube would be a 2-cube of four adjacent 1-cells, corresponding to the four corners of one of the six faces of the 3-cube shown in (a). The four 1-cells of \(g\) do not form a 2-cube, though. The blue prime implicant covers 1-cells where \(C=1\) and \(A=0.\) Variable \(B\) changes, and can be eliminated by the combining theorem. Thus, the blue prime implicant is \(\overline{A}\,C.\) The red prime implicant is \(A\,B,\) because it covers 1-cells where \(A=1\) and \(B=1,\) while \(C\) changes. Similarly, the green prime implicant is \(B\,C,\) because it covers two 1-cells where \(B=1\) and \(C=1\) while \(A\) changes. The blue prime implicant is essential, because it is the only one to cover 1-cell \(\overline{A}\,\overline{B}\,C.\) The red prime implicant is essential too, because it is the only prime implicant to cover 1-cell \(A\,B\,\overline{C}.\) In contrast, the green prime implicant is not essential, because both 1-cells it covers are covered by another prime implicant as well. Therefore, the blue and red essential prime implicants must be part of the minimal cover. Since the blue and red prime implicants cover all 1-cells of \(g,\) they are also sufficient for the minimal cover:

\[g(A,B,C) = A\,B + \overline{A}\,C\,.\]

Function \(h(A,B,C,D)\) is a 4-variable function. The K-map on the right identifies three prime implicants. Each prime implicant covers four 1-cells, and corresponds to a 2-cube. The blue prime implicant covers 1-cells where \(A = 1\) and \(C = 1,\) while \(B\) and \(D\) change. Two applications of the combining theorem are necessary to eliminate variables \(B\) and \(D\) from the sum of minterms, and arrive at prime implicant \(A\,C.\) The red prime implicant wraps around vertically, and covers 1-cells where \(A=1\) and \(D=0\) while \(B\) and \(C\) change. Therefore, the red prime implicant is \(A\,\overline{D}.\) The green prime implicant wraps around in both directions, vertically and horizontally. It covers 1-cells where \(B=0\) and \(D=0\) while \(A\) and \(C\) change. Thus, the green prime implicant is \(\overline{B}\,\overline{D}.\) We notice that all three prime implicants are essential, because each of them covers at least one 1-cell that no other prime implicant covers. Therefore, the minimal cover of \(h\) is

\[h(A,B,C,D) = A\,C + A\,\overline{D} + \overline{B}\,\overline{D}\,.\]The cost reduction of the minimization is the difference of the costs of the SOP normal form and the minimal cover. The cost of an SOP form, normal or not, is the number of products plus the number of literals of all products. For function \(f,\) we save 7 units of cost:

\[\begin{eqnarray*} \mathcal{C}(\text{sop}(f)) &=& 3 + 3 \cdot 2 = 9 \\ \mathcal{C}(\text{mc}(f)) &=& 2 + 0 \cdot 1 = 2 \\ \Delta \mathcal{C}(f) &=& 9 - 2\ =\ 7\,. \end{eqnarray*}\]Minimization of function \(g\) saves 10 units of cost:

\[\begin{eqnarray*} \mathcal{C}(\text{sop}(g)) &=& 4 + 4 \cdot 3 = 16 \\ \mathcal{C}(\text{mc}(g)) &=& 2 + 2 \cdot 2 = 6 \\ \Delta \mathcal{C}(g) &=& 16 - 6\ =\ 10\,. \end{eqnarray*}\]The largest savings occur for function \(h\):

\[\begin{eqnarray*} \mathcal{C}(\text{sop}(h)) &=& 8 + 8 \cdot 4 = 40 \\ \mathcal{C}(\text{mc}(h)) &=& 3 + 3 \cdot 2 = 9 \\ \Delta \mathcal{C}(h) &=& 40 - 9\ =\ 31\,. \end{eqnarray*}\]
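The cost bookkeeping above is mechanical and easy to script. A minimal Python sketch of the cost metric, checked against the numbers for function \(g\) (literal names are transcribed from the example; `~` marks a complemented variable):

```python
def sop_cost(products):
    """Cost of an SOP form: number of products plus the
    total number of literals over all products."""
    return len(products) + sum(len(p) for p in products)

# SOP normal form of g: four minterms with three literals each
sop_g = [["~A", "~B", "C"], ["~A", "B", "C"], ["A", "B", "~C"], ["A", "B", "C"]]
# minimal cover of g: AB + ~A C
mc_g = [["A", "B"], ["~A", "C"]]

assert sop_cost(sop_g) == 16
assert sop_cost(mc_g) == 6
assert sop_cost(sop_g) - sop_cost(mc_g) == 10  # savings reported above
```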

Derive all minimal covers of Boolean function:

Function \(f\) has 12 *prime implicants* (maximal subcubes):

none of which is essential. There are 32 minimal covers with cost \(\mathcal{C}(f) = 24\):

#### Dual K-map Minimization Method¶

The *K-map minimization method* can
be used not only to determine a minimal SOP form but also for a
minimal POS form. The procedure is based on the dual *combining
theorem*. From the geometric perspective, we
combine subcubes of the off-set rather than subcubes of the on-set of
a function.

Recall function \(f(x, y, z)\) of Example 5.2 with SOP normal form

The truth table below specifies \(f,\) and lists both minterms and maxterms associated with the combination of input values in each row.

| \(x\) | \(y\) | \(z\) | \(f\) | minterm | maxterm |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 1 | \(\overline{x}\,\overline{y}\,\overline{z}\) | \(x + y + z\) |
| 0 | 0 | 1 | 1 | \(\overline{x}\,\overline{y}\,z\) | \(x + y + \overline{z}\) |
| 0 | 1 | 0 | 0 | \(\overline{x}\,y\,\overline{z}\) | \(x + \overline{y} + z\) |
| 0 | 1 | 1 | 0 | \(\overline{x}\,y\,z\) | \(x + \overline{y} + \overline{z}\) |
| 1 | 0 | 0 | 0 | \(x\,\overline{y}\,\overline{z}\) | \(\overline{x} + y + z\) |
| 1 | 0 | 1 | 1 | \(x\,\overline{y}\,z\) | \(\overline{x} + y + \overline{z}\) |
| 1 | 1 | 0 | 0 | \(x\,y\,\overline{z}\) | \(\overline{x} + \overline{y} + z\) |
| 1 | 1 | 1 | 1 | \(x\,y\,z\) | \(\overline{x} + \overline{y} + \overline{z}\) |

The *POS normal form* is the product of those
maxterms where \(f(x,y,z) = 0\):

Function \(f\) assumes value 0 if one of its maxterms is 0. Since the on-set of a function is the set of 1-points, the on-set of \(f\) is the set of maxterms that do not appear in the POS normal form:

on-set \(= \{x + y + z,\ x + y + \overline{z},\ \overline{x} + y + \overline{z},\ \overline{x} + \overline{y} + \overline{z}\}\,,\)

and the off-set is the set of maxterms that do appear in the POS normal form:

off-set \(= \{x + \overline{y} + z,\ x + \overline{y} + \overline{z},\ \overline{x} + y + z,\ \overline{x} + \overline{y} + z\}\,.\)
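As a quick sanity check, both sets can be enumerated directly from the truth table above; a minimal Python sketch:

```python
from itertools import product

# f from the truth table above: 1 on rows 000, 001, 101, 111
def f(x, y, z):
    return int((x, y, z) in {(0, 0, 0), (0, 0, 1), (1, 0, 1), (1, 1, 1)})

on_set = [v for v in product((0, 1), repeat=3) if f(*v) == 1]
off_set = [v for v in product((0, 1), repeat=3) if f(*v) == 0]

# one maxterm of the POS normal form per off-set point
assert len(on_set) == 4 and len(off_set) == 4
assert (0, 1, 0) in off_set and (1, 0, 0) in off_set
```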

From the geometric point of view, the maxterms are the 0-points in the 3-cube representation of \(f.\) The dual combining theorem applies to adjacent 0-points, for instance

We interpret the application of the combining theorem as merging two adjacent 0-points in the \(x\)-dimension into 1-cube \(\overline{y} + z\) by eliminating the \(x\)-literals in complemented and uncomplemented form and retaining the common literals. The geometric interpretation is identical to the interpretation in the K-map minimization method, except that we perform the minimization on the off-set of \(f\) rather than the on-set. Since the K-map minimization method is independent of the interpretation of its cells as minterms or maxterms, we can use the K-map minimization method as is to derive the minimal SOP or the minimal POS form.

In Example 5.4 we derive the minimal SOP form of Boolean function

Here, we derive the minimal POS form. The off-set of \(f\) consists of its 0-points. In the 3-cube, the 0-points are the vertices marked with circles. Since \(f\) is given in SOP normal form, we may populate the K-map by first marking each cell associated with a minterm of \(f\) with a 1, and then all remaining cells with a 0. After erasing the 1’s, we obtain the 0-cells of the off-set shown below. Note that K-map coordinates of a maxterm correspond to complemented literals. For example, maxterm \(x + \overline{y} + z\) has coordinates \(x=0,\) \(y=1,\) and \(z=0.\)

According to step 2 of the K-map minimization method we find the maximal subcubes of the off-set. We read the sums associated with the three 1-cubes off the K-map. The blue 1-cube changes in the \(x\)-dimension. Since the unchanged coordinates are \(y=1\) and \(z=0,\) the 1-cube represents sum \(\overline{y} + z.\) The red 1-cube changes in the \(y\)-dimension, and represents sum \(\overline{x} + z.\) The green 1-cube changes in the \(z\)-dimension, and represents sum \(x + \overline{y}.\)

Following step 3 of the K-map minimization method we determine the minimal cover by observing that the red and green subcubes are essential, because they are the only subcubes covering one 0-cell each. In contrast, the blue subcube is redundant, because it covers no 0-cell that is not also covered by another subcube. Since the essential red and green subcubes cover the entire off-set, they form the unique minimal cover with corresponding minimal POS form

This POS form minimizes the cost among all POS forms. The
*cost* of a POS expression is the number of
literals plus the number of sums. Thus, the cost of the minimal
POS form of \(f\) is \(\mathcal{C}(f) = 2 + (2 + 2) = 6.\)
The resulting circuit is a two-level OR-AND circuit with two OR
gates and one AND gate. Note that the minimal SOP form in
Example 5.4 has the same cost. In general,
the costs of the minimal forms differ as the next example
demonstrates.

We derive the minimal POS form for Boolean function

of Example 5.7 with a unique minimal SOP form of cost \(\mathcal{C}(f) = 3 + (1 + 2 + 2) = 8.\)

The 4-variable K-map below shows the maximal subcubes of the off-set of \(f.\) The blue subcube is a 2-cube that is unchanged in the \(y\)-dimension and the \(z\)-dimension. The corresponding sum is \(y + \overline{z}.\) The red 1-cube changes in the \(z\)-dimension, and represents sum \(\overline{w} + \overline{x} + y.\)

Both subcubes are maximal and essential. Therefore, the minimal cover is unique, and the minimal POS form is

The cost of the minimal POS form is \(\mathcal{C}(f) = 2 + (2 + 3) = 7,\) which is 1 less than the cost of the minimal SOP form. We conclude that the minimal POS form is superior to the minimal SOP form with respect to the cost metric of two-level logic minimization. The corresponding OR-AND circuit requires three gates, and is smaller than the AND-OR circuit with four gates.

Two-level logic minimization is a powerful tool for digital circuit designers. However, Example 5.9 emphasizes that the K-map minimization method solves a rather narrowly posed problem. If we wish to find the smallest two-level circuit of a Boolean function, we have the choice between an OR-AND and an AND-OR form. To find the smaller one, we need to solve two logic minimization problems, one to derive the minimal SOP form and the other to derive the minimal POS form, and then compare the costs of the minimal forms.

There exist Boolean functions where logic minimization is powerless.
The XOR and XNOR functions are examples, where the minimal SOP equals
the SOP normal form and the minimal POS equals the POS normal form,
and the costs of both forms are equal. The 3-input XOR or *parity
function* \(P(x,y,z) = x \oplus y \oplus z,\)
for instance, has these K-maps for the on-set shown on the left and
for the off-set on the right:

All maximal subcubes are 0-cubes covering a single minterm or maxterm
only. Both minimal two-level forms SOP and POS have cost
\(\mathcal{C}(P) = 16,\) which is the maximum cost of the minimal
two-level forms among all 3-variable functions. This is a sobering
indication as to why small and fast *XOR gates*
are hard to design.
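The claim that no subcubes can be combined is easy to verify mechanically: two K-map cells can merge into a 1-cube only if their input combinations differ in exactly one variable, and the parity function always takes opposite values on such neighbors. A small Python check:

```python
from itertools import product

def P(x, y, z):
    return x ^ y ^ z  # 3-input parity function

# every pair of K-map neighbors (inputs differing in exactly one bit)
# has opposite values, so neither 1-cells nor 0-cells can merge into
# a larger subcube
for v in product((0, 1), repeat=3):
    for i in range(3):
        w = list(v)
        w[i] ^= 1  # flip one input bit
        assert P(*v) != P(*w)
```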

#### Incompletely Specified Functions¶

An **incompletely specified function** is unspecified for a subset of
its input combinations. There are two common design scenarios that
lead to incompletely specified functions. First, we may define a
function assuming that its inputs are constrained. For example, we
may define a 3-variable function \(f(x,y,z)\) assuming that
\(y = \overline{x},\) because we do not want to include the
negation of \(x\) inside the circuit module for \(f.\) Thus,
we assume that input combinations \(x=y=0\) and \(x=y=1\)
never occur, so we do not need to define the corresponding function
values. In fact, since we do not care whether the unspecified
function values are 0 or 1, we introduce symbol \(X\) to denote a
**don’t care** value. Interpret \(X\) as a Boolean wildcard
character that we may replace with constant 0 or 1 as we please. The
second common design scenario enables us to exclude certain input
combinations from a function, because we can guarantee that the driver
circuits never output these input combinations. We illustrate this
scenario by designing a *seven-segment display* decoder below.

Incompletely specified functions introduce flexibility into the
circuit design process that we exploit when performing logic
minimization. More succinctly, an incompletely specified function
with \(k\) don’t cares can be viewed as a set of \(2^k\)
distinct completely specified Boolean functions. Among the minimal
two-level forms of all these functions we wish to find the minimal
one. K-maps support this extended minimization problem almost
effortlessly. During step 1 of the *K-map minimization method* we mark all unspecified cells with don’t
care symbol \(X.\) The decisive difference occurs in step 2.
When maximizing the subcubes, we interpret the \(X\)‘s either as 0
or 1, so as to increase the size of the subcubes where possible.
Covering a don’t care cell with a subcube binds its value in the
minimal cover to 0 or 1, so that the resulting two-level form is a
completely specified Boolean function.

We illustrate the logic minimization of incompletely specified
functions with the K-map minimization method by designing a decoder
for a **seven-segment display**. If you have a digital wristwatch or
alarm clock, you probably have a seven-segment display. Each of the
ten decimal digits is displayed by illuminating the corresponding
subset of its seven segments \(S_0, S_1, \ldots, S_6\):

The input of a seven-segment decoder is a 4-bit **BCD** code word,
short for **binary coded decimal**, and the outputs are seven signals
\(S_0, S_1, \ldots, S_6,\) one for each segment of the display.
If a segment signal is 1, the segment illuminates. The BCD code
represents the decimal digits with their *binary number*. Since 10 digits require \(\lceil\lg 10\rceil
= 4\) bits in binary representation, the BCD code uses the first
ten of the sixteen 4-bit binary numbers only:

| decimal | BCD |
|---|---|
| 0 | 0 0 0 0 |
| 1 | 0 0 0 1 |
| 2 | 0 0 1 0 |
| 3 | 0 0 1 1 |
| 4 | 0 1 0 0 |
| 5 | 0 1 0 1 |
| 6 | 0 1 1 0 |
| 7 | 0 1 1 1 |
| 8 | 1 0 0 0 |
| 9 | 1 0 0 1 |

Our first step in the design of a combinational seven-segment decoder is the specification of a truth table. For each 4-bit input \(A = A_3\,A_2\,A_1\,A_0\) in range \([0,9],\) we specify whether each of the outputs \(S_0, S_1, \ldots, S_6\) is 0 or 1. For the remaining inputs, where \(A\) is in range \([10,15],\) we don’t care whether the outputs are 0 or 1, silently assuming that the inputs are guaranteed to be legal BCD code words. Thus, we assign don’t cares to the corresponding outputs. We could also assign arbitrary 0 or 1 values, but that would restrict the logic minimization potential unnecessarily.

| | \(A_3\) | \(A_2\) | \(A_1\) | \(A_0\) | \(S_0\) | \(S_1\) | \(S_2\) | \(S_3\) | \(S_4\) | \(S_5\) | \(S_6\) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 |
| 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 |
| 2 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 |
| 3 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 |
| 4 | 0 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 1 | 0 |
| 5 | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 0 | 1 | 1 |
| 6 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 1 | 1 | 1 |
| 7 | 0 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 |
| 8 | 1 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 9 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 |
| 10 | 1 | 0 | 1 | 0 | X | X | X | X | X | X | X |
| 11 | 1 | 0 | 1 | 1 | X | X | X | X | X | X | X |
| 12 | 1 | 1 | 0 | 0 | X | X | X | X | X | X | X |
| 13 | 1 | 1 | 0 | 1 | X | X | X | X | X | X | X |
| 14 | 1 | 1 | 1 | 0 | X | X | X | X | X | X | X |
| 15 | 1 | 1 | 1 | 1 | X | X | X | X | X | X | X |

The second step of the design process is logic minimization. The decoder has seven output functions, each of which we minimize independently of the others. Below, the 4-variable K-map on the left shows the on-set and the don’t cares of function \(S_0(A_0, A_1, A_2, A_3).\) The K-map in the middle demonstrates the lack of minimization potential if we exclude the don’t care cells from the minimization process. In contrast, the K-map on the right exploits the don’t cares to minimize the cost of the SOP form.

The K-map in the middle shows the maximal subcubes assuming that the don’t cares are not included in the on-set, effectively treating all don’t cares as 0-cells. We find two essential subcubes \(\overline{A}_3\,A_2\,A_0\) and \(A_3\,\overline{A}_2\,\overline{A}_1,\) and one of the three minimal covers results in minimal SOP form \(S_0 = \overline{A}_3\,A_2\,A_0 + A_3\,\overline{A}_2\,\overline{A}_1 + \overline{A}_3\,\overline{A}_2\,A_1 + \overline{A}_3\,\overline{A}_2\,\overline{A}_0\) with cost \(\mathcal{C}(S_0) = 16.\) If we take the liberty to replace a don’t care with a 1 whenever it permits increasing the size of a subcube, then we obtain the maximal subcubes in the K-map shown on the right. We interpret don’t care cell \(A_3\,\overline{A}_2\,A_1\,\overline{A}_0\) in the bottom right corner as a 1-cell, which allows us to grow 1-cube \(\overline{A}_3\,\overline{A}_2\,\overline{A}_0\) into 2-cube \(\overline{A}_2\,\overline{A}_0\) covering the four corner cells. Likewise, interpreting don’t care cell \(A_3\,\overline{A}_2\,A_1\,A_0\) as a 1-cell permits growing 1-cube \(\overline{A}_3\,\overline{A}_2\,A_1\) into 2-cube \(\overline{A}_2\,A_1\) by wrapping around vertically. Analogously, we grow 1-cube \(\overline{A}_3\,A_1\,A_0\) into 2-cube \(A_1\,A_0,\) 1-cube \(\overline{A}_3\,A_2\,A_0\) into 2-cube \(A_2\,A_0,\) and 1-cube \(A_3\,\overline{A}_2\,\overline{A}_1\) into 3-cube \(A_3.\) There are two minimal covers, one of which corresponds to minimal SOP form \(S_0 = A_3 + A_2\,A_0 + \overline{A}_2\,\overline{A}_0 + A_1\,A_0\) with cost \(\mathcal{C}(S_0) = 11.\) The cost reduction is due to the implicit replacement of all don’t cares with value 1.
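We can sanity-check the don’t-care-based minimal SOP form of \(S_0\) mechanically: it must reproduce the specified values for BCD digits 0 through 9, while its values on inputs 10 to 15 are unconstrained. A short Python check, with the \(S_0\) column transcribed from the truth table above:

```python
# S0 column of the truth table for digits 0..9
S0 = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]

def s0_min(a3, a2, a1, a0):
    # minimal SOP exploiting don't cares: S0 = A3 + A2 A0 + ~A2 ~A0 + A1 A0
    return a3 | (a2 & a0) | ((1 - a2) & (1 - a0)) | (a1 & a0)

for d in range(10):
    bits = [(d >> i) & 1 for i in (3, 2, 1, 0)]  # A3, A2, A1, A0
    assert s0_min(*bits) == S0[d]
```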

The K-maps for the other six output functions are shown below with a minimal cover representing a minimal SOP form.

Note that in the K-maps of functions \(S_2,\) \(S_4,\) \(S_5,\) and \(S_6\) we interpret only a subset of the don’t care cells as 1-cells to obtain a minimal cover. The remaining don’t care cells are implicitly interpreted as 0-cells. We leave it as an exercise to determine the minimal POS forms for each output function of the seven-segment decoder.

We are given Boolean function

with don’t care conditions:

- Use a K-map to minimize \(f.\)
- Determine the cost reduction due to minimization.

We begin by translating the minimization problem into a K-map. We note that \(f\) is a 4-variable function, which requires the K-map of a 4-cube:

Function \(f\) is given in SOP normal form. Therefore, each of the three minterms specifies a 1-cell in the K-map. In addition, we are given don’t care conditions in the form of minterms. We mark the corresponding cells in the K-map with don’t care symbol X.

Next, we determine the *prime implicants* (maximal subcubes). We exploit the don’t care cells by interpreting them as 1-cells where convenient to enlarge a prime implicant. Our strategy is to start with a single 1-cell, and expand this 0-cube into the highest-dimensional cube possible covering 1’s or X’s. For example, start with 1-cell \(\overline{A}\,\overline{B}\,C\,\overline{D}\) in the top-right corner. We can expand this 0-cube into a 1-cube by including 1-cell \(A\,\overline{B}\,C\,\overline{D}\) in the bottom-right corner. Then, we include the X-cells to the left to obtain the red prime implicant, which is a 2-cube. Alternatively, we may start with 1-cell \(\overline{A}\,\overline{B}\,C\,\overline{D},\) and expand it into the blue prime implicant. The only other prime implicant is the green prime implicant covering the bottom row, which we find if we start the expansion from the 1-cell in the bottom-left or the bottom-right corner. Thus, the prime implicants of function \(f\) are:\[\overline{B}\,C\ \text{(red)},\ \overline{A}\,C\ \text{(blue)},\ A\,\overline{B}\ \text{(green)}\,.\]The green prime implicant is essential, because it is the only one to cover 1-cell \(A\,\overline{B}\,\overline{C}\,\overline{D}.\) The blue prime implicant is not essential, because the only 1-cell it covers is also covered by the red prime implicant. Similarly, the red prime implicant is not essential, because the two 1-cells it covers are also covered by the blue and green prime implicants. Therefore, we find two minimal covers for function \(f\):

\[\begin{eqnarray*} f(A,B,C,D) &=& A\,\overline{B} + \overline{B}\,C \\ &=& A\,\overline{B} + \overline{A}\,C\,. \end{eqnarray*}\]Note that the minimal covers fix all X’s to be either 1’s or 0’s. For example, minimal cover \(A\,\overline{B} + \overline{B}\,C\) interprets the X-cells covered by the green and red prime implicants as 1-cells, i.e. forces don’t care conditions \(A\,\overline{B}\,\overline{C}\,D,\) \(A\,\overline{B}\,C\,D,\) and \(\overline{A}\,\overline{B}\,C\,D\) into 1-cells, and interprets all other X-cells as 0-cells. In this interpretation the blue cover would not be a prime implicant of \(f,\) because it would cover two 0-cells.

The cost of function \(f\) in SOP normal form is

\[\mathcal{C}(\text{sop}(f)) = 3 + 3 \cdot 4 = 15\,.\]Both minimal covers are SOP forms with cost

\[\mathcal{C}(\text{mc}(f)) = 2 + 2 \cdot 2 = 6\,.\]The cost reduction due to minimization is the difference

\[\Delta \mathcal{C}(f) = 15 - 6 = 9\]cost units.
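Both minimal covers can be checked mechanically: each must contain the three specified 1-cells, stay inside the union of the on-set and don’t-care set, and the two covers may disagree only on don’t-care cells. A Python sketch, with the on-set and don’t-care minterms read off the K-map description above (minterm index \(8A + 4B + 2C + D\)):

```python
ON = {2, 8, 10}        # specified 1-cells
DC = {3, 6, 7, 9, 11}  # don't-care cells, as read from the K-map above

def minterms(fn):
    """Set of minterm indices on which fn evaluates to true."""
    return {8*a + 4*b + 2*c + d
            for a in (0, 1) for b in (0, 1) for c in (0, 1) for d in (0, 1)
            if fn(a, b, c, d)}

cover1 = minterms(lambda a, b, c, d: (a and not b) or (not b and c))  # A~B + ~B C
cover2 = minterms(lambda a, b, c, d: (a and not b) or (not a and c))  # A~B + ~A C

for cover in (cover1, cover2):
    assert ON <= cover <= ON | DC   # covers the on-set, touches no 0-cell
assert (cover1 ^ cover2) <= DC      # the covers differ only on don't cares
```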

#### Truth Table Compaction¶

When circuit designers don’t care about some of the output values of a
combinational circuit, they exploit the don’t care outputs of the
*incompletely specified function* to minimize the corresponding
two-level circuits. There are also situations where circuit
designers don’t care about some of the input values of a combinational
circuit. In the following we show how to exploit don’t care inputs to
reduce the number of rows of a truth table of a function. The more
compact a truth table, the better our chances of deducing the minimal
two-level logic function without even drawing a K-map. We illustrate
the use of truth table compaction by means of examples.

We wish to design a combinational circuit with two data inputs
\(A\) and \(B,\) plus an *enable* input \(EN,\) such
that output \(Y\) computes the implication of \(A\) and \(B\) if the enable
input is 1, and output \(Y\) is constant 0 if the circuit is
disabled, i.e. the enable input is 0.

As a first step towards a combinational circuit, we specify the truth table to formalize the problem statement. Observe that we require that output \(Y=0\) when the circuit is disabled. Thus, we don’t care about inputs \(A\) and \(B\) when \(EN = 0.\) Only when \(EN=1\) do we compute the implication of \(A\) and \(B.\) We use an asterisk \(*\) to mark don’t care inputs \(A\) and \(B\) when \(EN=0\) in the compact truth table:

| \(EN\) | \(A\) | \(B\) | \(Y\) |
|---|---|---|---|
| 0 | \(*\) | \(*\) | 0 |
| 1 | 0 | 0 | 1 |
| 1 | 0 | 1 | 1 |
| 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 |

The \(*\) symbol signifies that the input combination covers
all Boolean values, 0 and 1, whereas don’t care output symbol
\(X\) denotes the choice of one of 0 or 1. The compact truth
table enables us to conclude that the minimal SOP form for output
\(Y\) is \(Y = EN\cdot\overline{A} + EN\cdot B,\) because
\(Y\) is 1 if \(EN=1\) and if implication
\(\overline{A} + B\) is 1, i.e. if \(Y = EN \cdot
(\overline{A} + B).\) The latter expression is a minimal POS form
that is even less costly than the minimal SOP form that we obtain
by applying the *distributivity theorem*
to distribute \(EN\) over the disjunction.

We double-check this argument by expanding the compact truth table into a complete truth table and using the K-map minimization method to derive the minimal SOP form. To that end, we translate the specification into a complete truth table without don’t care inputs. In all rows where \(EN=0\) we set \(Y=0.\)

| \(EN\) | \(A\) | \(B\) | \(Y\) |
|---|---|---|---|
| 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 |
| 0 | 1 | 0 | 0 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 0 | 1 |
| 1 | 0 | 1 | 1 |
| 1 | 1 | 0 | 0 |
| 1 | 1 | 1 | 1 |

The 3-variable K-map of the on-set of \(Y\) reveals two essential maximal subcubes.

The unique minimal SOP form is \(Y = EN\cdot\overline{A} + EN\cdot B,\) in accordance with our derivation from the compact truth table.
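The agreement of the minimal SOP form, the factored POS form, and the truth-table specification is easy to confirm by exhaustive enumeration. A minimal Python sketch:

```python
from itertools import product

def y_out(en, a, b):
    return (en & (1 - a)) | (en & b)  # minimal SOP: EN ~A + EN B

for en, a, b in product((0, 1), repeat=3):
    pos = en & ((1 - a) | b)           # factored form: EN (~A + B)
    spec = ((1 - a) | b) if en else 0  # specification: implication when enabled
    assert y_out(en, a, b) == pos == spec
```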

We wish to design an 8-bit priority encoder. Recall that a
*priority encoder* is a basic combinational
circuit with \(n\) inputs \(A_i\) and \(n\) outputs
\(Y_i,\) where \(0 \le i < n,\) such that

Designing a combinational circuit with \(n=8\) inputs exceeds the applicability of the K-map minimization method. However, the minimization problem becomes tractable with a compact truth table. The key insight is that an input pattern with leading zeros, i.e. \(A_j = 0\) for \(j < i\) and \(A_i=1,\) determines all outputs independent of the remaining input values \(A_j\) for \(j > i.\) In other words, we don’t care about the input values beyond the first 1. This insight translates into a compact truth table for the 8-bit priority encoder:

| \(A_0\) | \(A_1\) | \(A_2\) | \(A_3\) | \(A_4\) | \(A_5\) | \(A_6\) | \(A_7\) | \(Y_0\) | \(Y_1\) | \(Y_2\) | \(Y_3\) | \(Y_4\) | \(Y_5\) | \(Y_6\) | \(Y_7\) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 0 | 0 | 0 | 0 | 0 | 0 | 1 | \(*\) | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 0 | 0 | 0 | 0 | 0 | 1 | \(*\) | \(*\) | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 0 | 0 | 0 | 0 | 1 | \(*\) | \(*\) | \(*\) | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 0 | 0 | 0 | 1 | \(*\) | \(*\) | \(*\) | \(*\) | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | \(*\) | \(*\) | \(*\) | \(*\) | \(*\) | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 0 | 1 | \(*\) | \(*\) | \(*\) | \(*\) | \(*\) | \(*\) | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | \(*\) | \(*\) | \(*\) | \(*\) | \(*\) | \(*\) | \(*\) | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |

The next step is to deduce the minimal output functions \(Y_i\) for \(0 \le i < 8\) from the compact truth table. We note that the column of output \(Y_0\) is identical to column \(A_0,\) and conclude that \(Y_0 = A_0.\) This function corresponds to a degenerate two-level circuit without any logic gates. For output \(Y_1,\) we notice that the function values are equal to \(A_1\) where input \(A_0 = 0,\) and \(Y_1 = 0\) where \(A_0 = 1,\) for all combinations of input values of \(A_2, A_3, \ldots, A_7.\) We conclude that output \(Y_1\) is independent of variables \(A_j\) for \(j>1.\) Since \(Y_1\) is 1 only if \(A_0 = 0\) and \(A_1 = 1,\) we know that \(Y_1\) consists of one maximal subcube only, i.e. \(Y_1 = \overline{A}_0\,A_1.\) The corresponding circuit is a degenerate two-level circuit consisting of a single 2-input AND gate. Analogously, output \(Y_2\) is 1 only if \(A_0 = 0,\) \(A_1 = 0,\) and \(A_2 = 1,\) independent of the remaining inputs. We conclude that \(Y_2\) consists of a single subcube \(Y_2 = \overline{A}_0\,\overline{A}_1\,A_2.\) The complete list of output functions is then

Each output function consists of a single maximal subcube only. Since the subcube is essential, the minimal cover is unique.
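The deduced output functions can be tested against a behavioral model of the priority encoder over all 256 input combinations. A Python sketch:

```python
from itertools import product

def y_outputs(a):
    """Deduced two-level functions: Y_i = ~A_0 ... ~A_{i-1} A_i."""
    return [int(a[i] == 1 and all(a[j] == 0 for j in range(i)))
            for i in range(8)]

# behavioral check: exactly the first 1 (if any) is flagged on the outputs
for a in product((0, 1), repeat=8):
    y = y_outputs(a)
    if any(a):
        first = a.index(1)
        assert y == [int(i == first) for i in range(8)]
    else:
        assert y == [0] * 8
```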

### 5.1.3. Algebra of Minimization¶

The *K-map minimization method* is an
insightful tool for paper-and-pencil design of combinational
circuits. However, when designing combinational circuits with more
than four inputs, we employ algorithmic methods. In this section, we
discuss the algebraic treatment of the two-level minimization method,
because the design of a minimization algorithm hinges on a concise
problem formulation.

We begin by formalizing the notion of covering a Boolean function.
Given two Boolean functions \(f\) and \(g,\) we say that
\(f\) **covers** \(g\) if \(f \ge g,\) i.e. if \(f\)
is 1 for all input combinations where \(g\) is 1, and possibly
more. If \(f\) covers \(g\) and, furthermore, \(g\)
covers \(f,\) then \(f\) equals \(g.\) We may express
this insight in terms of Boolean operations as

The use of the two equal signs in such logical expressions is often
perceived as confusing. To obtain an unambiguous Boolean equation,
for example for a proof by perfect induction, we can replace equality
\(f = g\) on the *rhs* with an XNOR operation:

Recall that the Boolean magnitude comparison \(f \le g\) can be interpreted as implication of formal logic, written \(f \Rightarrow g.\) Analogously, \(f \ge g\) is the converse implication of formal logic, written as \(f \Leftarrow g.\)

| \(f\) | \(g\) | \(\le\), \(\Rightarrow\) | \(\ge\), \(\Leftarrow\) |
|---|---|---|---|
| 0 | 0 | 1 | 1 |
| 0 | 1 | 1 | 0 |
| 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 1 |

Therefore, if \(f\) covers \(g\) then \(g\) **implies**
\(f,\) and vice versa. Covering and implication are two sides of
the same coin.

In general, when two Boolean functions \(f\) and \(g\) are given as arbitrary Boolean expressions, it is difficult to determine whether \(f\) covers \(g\) or, equivalently, whether \(g\) implies \(f.\) For example, given

to determine that \(f \ge g,\) we might use Boolean algebra,
perfect induction, or K-maps to compare the on-sets of \(f\) and
\(g.\) However, in case where the expressions of \(f\) and
\(g\) are **products of literals**, i.e. conjunctions of the
complemented or uncomplemented variables, then it is easy to determine
whether \(f\) covers \(g.\) For example, given the products
of literals

we find that \(g\) covers \(f,\) because \(g\) is a 1-cube
that covers 0-cube \(f.\) Thus, we have \(g \ge f\) or
\(f \Rightarrow g.\) In general, if a product of literals
\(f\) implies function \(g,\) we say that \(f\) is an
**implicant** of \(g.\) For example \(f(x,y,z) =
\overline{x}\,y\,z\) is an implicant of \(g(x,y,z) =
\overline{x}\,y + y\,z.\) The fewer literals an implicant has, the
larger it is w.r.t. the \(\ge\) order. Since \(g =
\overline{x}\,y + y\,z \ge \overline{x}\,y \ge \overline{x}\,y\,z =
f,\) we find that \(f\) is an implicant of \(\overline{x}\,y,\)
and \(\overline{x}\,y\) is an implicant of \(g.\)
Furthermore, we have \(g = \overline{x}\,y + y\,z \ge y\,z \ge
\overline{x}\,y\,z = f,\) so that \(f\) is also an implicant of
\(y\,z,\) and \(y\,z\) is an implicant of \(g.\)
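Implication between two Boolean functions can always be decided by perfect induction, i.e. exhaustive enumeration of all input combinations. A minimal Python sketch for the example above:

```python
from itertools import product

def implies(f, g, n=3):
    """f implies g iff f(v) <= g(v) for every input combination v."""
    return all(f(*v) <= g(*v) for v in product((0, 1), repeat=n))

f = lambda x, y, z: (1 - x) & y & z           # ~x y z, a product of literals
g = lambda x, y, z: ((1 - x) & y) | (y & z)   # ~x y + y z

assert implies(f, g)      # f is an implicant of g
assert not implies(g, f)  # the converse does not hold
```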

We call an implicant a **prime implicant** if it implies no other
implicant. Prime implicants are the largest implicants of a function.
For example, \(g = \overline{x}\,y + y\,z\) has three minterms or
0-cubes, \(x\,y\,z,\) \(\overline{x}\,y\,\overline{z},\) and
\(\overline{x}\,y\,z,\) and two 1-cubes, \(\overline{x}\,y\)
and \(y\,z.\) The 0-cubes are implicants but not prime, because
for each 0-cube there is a larger 1-cube, \(x\,y\,z \le y\,z,\)
\(\overline{x}\,y\,\overline{z} \le \overline{x}\,y,\) and
\(\overline{x}\,y\,z \le \overline{x}\,y.\) Both 1-cubes are
prime implicants, because \(g\) has no larger implicants.
Formally, \(p\) is a prime implicant of \(g,\) if no other
implicant \(f\) of \(g\) fulfills \(f \ge p.\) Indeed,
the prime implicants are the maximal subcubes we know from the
*K-map minimization method* already.

(1908-2000)

Prime implicants lead us to a fundamental theorem about Boolean functions due to Quine.

Theorem (Complete SOP Normal Form). Every Boolean function can be represented by the sum of all its prime implicants. The sum of all prime implicants of a function is unique up to their order, and is called **complete SOP** normal form.

**Proof.** Let \(f: \mathcal{B}^n \rightarrow \mathcal{B}\) be a
Boolean function and \(p_1, p_2, \ldots, p_k\) its prime
implicants. Assume \(f\) is given in SOP form \(f =
\sum_{i=1}^{m} c_i,\) where the \(c_i\) are the conjunctions or
products. In an SOP form, every \(c_i\) is an implicant, because
\(c_i \le f.\) Every implicant \(c_i\) has at least one prime
implicant \(p_j\) that covers \(c_i,\) i.e. \(c_i \le
p_j.\) If \(c_i\) implies \(p_j,\) then \(c_i + p_j =
p_j.\) Since all prime implicants cover all smaller implicants, we
conclude that any SOP form of \(f\) is equivalent to \(f =
\sum_{j=1}^k p_j.\)

The geometric interpretation of the theorem of the complete SOP is
that every Boolean function can be represented by the sum of all of
its maximal subcubes. Every subcube of a function is an implicant,
and each maximal subcube is a prime implicant. Recall that a maximal
subcube may be essential or not. Analogously, we define an
**essential prime implicant** of a Boolean function \(f\) to be a
prime implicant \(p_j\) such that removing \(p_j\) from the
complete SOP of \(f\) produces a sum \(\sum_{i\ne j} p_i < f\)
that does not cover \(f.\) If Boolean function \(f\) is
represented by a sum of a subset of its prime implicants such that
removing any one of them yields a sum that no longer covers \(f,\)
then the sum of prime implicants is **irredundant**.

Another theorem due to Quine involves *monotone functions*. We state the theorem without proof.

Theorem (Minimal Complete SOP). The complete SOP of a monotone Boolean function is monotone, irredundant, and unique.

Since the complete SOP of a monotone function is unique, it is equal to the minimal cover. This is a practically relevant insight. If a given function is known to be monotone, we can construct the minimal cover by identifying all prime implicants and forming their disjunction. For example, the majority function \(M(x,y,z)= x y + x z + y z\) is monotone because all literals appear in uncomplemented form. Each of its three implicants is prime, because we cannot remove one of its two literals without changing the function, as is easily verified. Therefore, the expression for \(M\) is the unique minimal complete SOP. In general, if a given SOP expression is not monotone or not known to be monotone, it is nontrivial to deduce whether the SOP is a minimal cover. There may be redundant prime implicants none of which is part of the minimal cover, see Example 5.4, or there are redundant prime implicants some of which are part of the minimal cover and the SOP is not unique, see Exercise 5.2, or in the extreme case all prime implicants are redundant and the SOP is not unique, see Example 5.6.
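Monotonicity itself can be verified by perfect induction: raising any input from 0 to 1 must never lower the output. A short Python check for the majority function:

```python
from itertools import product

def M(x, y, z):
    return (x & y) | (x & z) | (y & z)  # majority, all literals uncomplemented

# monotone: v <= w pointwise implies M(v) <= M(w)
for v in product((0, 1), repeat=3):
    for w in product((0, 1), repeat=3):
        if all(vi <= wi for vi, wi in zip(v, w)):
            assert M(*v) <= M(*w)
```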

We briefly mention that all of the properties and results about SOP
forms discussed above have duals. A **sum of literals** is an
**implicate** of function \(f\) if it is implied by \(f.\) An
implicate is prime if it is not implied by any other implicate of
\(f.\) Every Boolean function can be represented by the product
of prime implicates. For example, POS expression \(f(x,y,z) = (x
+ \overline{y}) (\overline{x} + z) (\overline{y} + z)\) consists of
three prime implicates. Prime implicate \(\overline{y} + z\) is
redundant, whereas prime implicates \(x + \overline{y}\) and
\(\overline{x} + z\) are essential, cf. Example 5.8.

### 5.1.4. Algorithmic Minimization¶

In this section, we outline an algorithmic solution to the
*two-level minimization problem*. As a first step, we find all prime
implicants or maximal subcubes of a given Boolean function. If the
function is known to be monotone, we can stop here, because the
*theorem of the minimal complete SOP*
tells us that the disjunction of the prime implicants is the unique
minimal cover. Otherwise, we perform a second step to determine those
prime implicants that are part of a minimal cover.

#### Identifying All Prime Implicants¶

The most widely known algorithm to identify all prime implicants of a
Boolean function is the **consensus method**. It transforms a given
SOP expression of Boolean function \(f\) into the *complete
SOP normal form* of \(f\) by applying the
dual *covering theorem* and the
*consensus theorem* repeatedly.

Let \(g(x_0, x_1, \ldots, x_{n-1})\) be an SOP expression of \(n\)-variable function \(f,\) such that \(g = \sum_{i=1}^m c_i\) is the sum of \(m\) products \(c_i.\) Then, repeat these two steps until neither applies any longer:

- [covering] If there exist two products \(c_i\) and \(c_j\) in \(g\) such that \(c_i\) covers \(c_j,\) i.e. \(c_i \ge c_j,\) then remove implicant \(c_j\) from \(g.\)
- [consensus] If there exist two products \(c_i = x_k \cdot c_i|_{x_k}\) and \(c_j = \overline{x}_k \cdot c_j|_{\overline{x}_k},\) then add consensus term \(c_i|_{x_k}\cdot c_j|_{\overline{x}_k}\) to \(g.\)

When the algorithm terminates, \(g\) is the sum of all prime implicants of \(f.\)
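
The two rules translate directly into a small program. The following Python sketch is our own illustration, not part of the original presentation; it assumes products are encoded as frozen sets of (variable, polarity) literal pairs, so that \(c_i \ge c_j\) holds exactly when \(c_i\)'s literal set is a subset of \(c_j\)'s:

```python
from itertools import combinations

def consensus_method(products):
    """Compute all prime implicants (the complete SOP) of a Boolean
    function given as a sum of products, using the consensus method.
    A product is a frozenset of literals; a literal is a pair
    (variable, polarity), e.g. (2, False) for the literal x2'."""
    g = set(products)
    changed = True
    while changed:
        changed = False
        # covering rule: remove every product covered by another one
        # (ci >= cj holds exactly when ci's literals are a subset of cj's)
        for ci, cj in combinations(sorted(g, key=len), 2):
            if ci <= cj and cj in g:
                g.discard(cj)
                changed = True
        # consensus rule: for products opposed in exactly one variable,
        # add the consensus term unless an existing product covers it
        for ci, cj in combinations(list(g), 2):
            opposed = [lit for lit in ci if (lit[0], not lit[1]) in cj]
            if len(opposed) == 1:
                v, p = opposed[0]
                term = (ci - {(v, p)}) | (cj - {(v, not p)})
                if not any(c <= term for c in g):
                    g.add(term)
                    changed = True
    return g
```

Starting, for instance, from the minterms of a function, the fixpoint of the two rules is the set of all prime implicants, which illustrates that the consensus method does not require a minimized starting expression.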

We illustrate the consensus method by means of SOP expression

\[g = x_1 x_2 + \overline{x}_0 \overline{x}_2 \overline{x}_3 + x_2 x_3 + x_2 \overline{x}_3 + x_0 \overline{x}_1 \overline{x}_2 \overline{x}_3.\]

In the first iteration, the covering theorem does not apply to any pair of products. However, the consensus theorem applies multiple times. The consensus of \(x_1 x_2\) and \(\overline{x}_0 \overline{x}_2 \overline{x}_3\) on \(x_2\) is \(x_1 \overline{x}_0 \overline{x}_3,\) and the consensus of \(x_2 x_3\) and \(x_2 \overline{x}_3\) on \(x_3\) is \(x_2 x_2 = x_2.\) We add the consensus terms to \(g\) and obtain

\[g_1 = x_1 x_2 + \overline{x}_0 \overline{x}_2 \overline{x}_3 + x_2 x_3 + x_2 \overline{x}_3 + x_0 \overline{x}_1 \overline{x}_2 \overline{x}_3 + x_1 \overline{x}_0 \overline{x}_3 + x_2.\]

In the second iteration, the covering theorem applies. In particular, \(x_2\) covers three products, \(x_2 \ge x_1 x_2,\) \(x_2 \ge x_2 x_3,\) and \(x_2 \ge x_2 \overline{x}_3.\) We remove the implicants from \(g_1\):

\[g_2' = \overline{x}_0 \overline{x}_2 \overline{x}_3 + x_0 \overline{x}_1 \overline{x}_2 \overline{x}_3 + x_1 \overline{x}_0 \overline{x}_3 + x_2.\]

The consensus theorem applies to several pairs of products of \(g_2'.\) The consensus of \(\overline{x}_0 \overline{x}_2 \overline{x}_3\) and \(x_2\) on \(x_2\) is \(\overline{x}_0 \overline{x}_3,\) and the consensus of \(x_0 \overline{x}_1 \overline{x}_2 \overline{x}_3\) and \(x_2\) on \(x_2\) is \(x_0 \overline{x}_1 \overline{x}_3.\) We add both consensus terms to \(g_2'\) and obtain

\[g_2 = \overline{x}_0 \overline{x}_2 \overline{x}_3 + x_0 \overline{x}_1 \overline{x}_2 \overline{x}_3 + x_1 \overline{x}_0 \overline{x}_3 + x_2 + \overline{x}_0 \overline{x}_3 + x_0 \overline{x}_1 \overline{x}_3.\]

In the third iteration, the covering theorem applies three times, because \(\overline{x}_0 \overline{x}_3 \ge \overline{x}_0 \overline{x}_2 \overline{x}_3,\) \(\overline{x}_0 \overline{x}_3 \ge \overline{x}_0 x_1 \overline{x}_3,\) and \(x_0 \overline{x}_1 \overline{x}_3 \ge x_0 \overline{x}_1 \overline{x}_2 \overline{x}_3.\) We remove the implicants from \(g_2\):

\[g_3' = x_2 + \overline{x}_0 \overline{x}_3 + x_0 \overline{x}_1 \overline{x}_3.\]

Now, the consensus theorem applies to \(\overline{x}_0 \overline{x}_3\) and \(x_0 \overline{x}_1 \overline{x}_3\) on \(x_0.\) We add consensus term \(\overline{x}_1 \overline{x}_3\) to \(g_3',\) and obtain

\[g_3 = x_2 + \overline{x}_0 \overline{x}_3 + x_0 \overline{x}_1 \overline{x}_3 + \overline{x}_1 \overline{x}_3.\]

In the fourth iteration, the covering theorem applies, because \(\overline{x}_1 \overline{x}_3 \ge x_0 \overline{x}_1 \overline{x}_3.\) We remove the latter term and obtain

\[g_4' = x_2 + \overline{x}_0 \overline{x}_3 + \overline{x}_1 \overline{x}_3.\]

Since the consensus theorem does not apply to any pair of implicants, the consensus method terminates. The complete SOP of function \(f\) is expression \(g_4',\) cf. Example 5.7. A decisive advantage of the consensus method is that we do not need to know all minterms of a function to generate all of its prime implicants.

#### Extracting a Minimal Cover¶

Given the complete SOP of Boolean function \(f(x_0, x_1, \ldots, x_{n-1}) = \sum_{i=1}^k p_i\) with prime implicants \(p_i,\) we want to choose those prime implicants that form a minimal cover of \(f.\)

We introduce indicator variables \(s_i \in \mathcal{B},\) where \(1 \le i \le k,\) such that \(s_i = 1\) if we select prime implicant \(p_i\) to be in a minimal cover, and \(s_i = 0\) otherwise. Then, we form the SOP of the products \(s_i\,p_i\):

\[C_S = \sum_{i=1}^k s_i\,p_i.\]

If \(s_i = 1\) for all \(i \in [1, k],\) then \(C_S\) is
the complete SOP of \(f.\) On the other hand, if we choose
\(s_i = 0,\) then \(s_i\,p_i = 0\) by *annihilation*, which effectively removes prime implicant
\(p_i\) from sum \(C_S.\) In the following, we interpret
\(S = (s_1, s_2, \ldots, s_k)\) as a Boolean vector \(S \in
\mathcal{B}^k.\) Since \(f\) covers \(C_S\) for every
\(S,\) we have

\[C_S \le f.\]

Given that \(C_S \le f,\) we can characterize those subsets of prime implicants \(S\) for which \(C_S = f\) by requiring also that \(C_S \ge f,\) because \(C_S \le f\) and \(C_S \ge f\) together are equivalent to \(C_S = f.\) Thus, a minimal cover is a selection \(S\) such that \(C_S\) covers \(f\) and the cost of the corresponding sum of prime implicants is minimized. The key observation is that we can turn constraint \(C_S \ge f\) into a system of linear inequalities, which we can solve with existing methods.

Recall that a Boolean function \(f(x_0, x_1, \ldots, x_{n-1})\) is
the sum of its *minterms*. Let \(J\) be the set of
minterms such that \(f = \sum_{j\in J} j,\) where \(j \in J\)
is the *decimal encoding* of the minterm. Then,
the *restriction* of \(f\) in minterm \(j\)
is \(f|_j = 1\) for all \(j \in J\) and \(f|_j = 0\) for
\(j \not\in J.\) Using Iverson brackets, we denote whether
prime implicant \(p_i\) of \(f\) covers minterm \(j\) or
not:

\[[p_i \ge j] = \begin{cases} 1, & \text{if } p_i \text{ covers minterm } j, \\ 0, & \text{otherwise.} \end{cases}\]

With this notation, we split the Boolean inequality \(C_S \ge f\) into a system of linear inequalities, one per minterm:

\[\sum_{i=1}^k s_i\,[p_i \ge j] \;\ge\; 1 \qquad \text{for all minterms } j \in J.\]

The inequality for minterm \(j\) is the restriction of constraint
\(C_S \ge f\) in \(j.\) We can interpret the disjunctions on
the *lhs* as arithmetic additions, and the system of Boolean
inequalities turns into a system of pseudo-Boolean linear
inequalities. This twist yields an integer linear program known as
the **set covering problem**:

Minimize \(\qquad \mathcal{C}(C_S) = \sum_{i=1}^k s_i (1 + \mathcal{C}(p_i)),\)

subject to \(\quad\ \,\sum_{i=1}^k s_i [p_i \ge j] \ge 1\) for all minterms \(j\) of \(f,\) and \(s_i \in \mathcal{B}.\)

Assuming that cost \(\mathcal{C}(p_i)\) is the number of literals
of prime implicant \(p_i,\) then \(\mathcal{C}(C_S)\) is the
number of the selected prime implicants plus the total number of
literals, which coincides with our *cost function*. The solution to the set covering problem is a
selection \(S\) of prime implicants whose disjunction is a minimal
cover of \(f.\) The simplest method for solving the set covering
problem is a search of the corresponding *binary decision tree*, that we know from our discussion of the
*ite-function* already. Since the problem size
grows exponentially in the number of prime implicants and minterms,
more efficient algorithms have been developed.[3]
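
For small instances, the set covering problem can also be solved by exhaustive enumeration of all \(2^k\) selection vectors. The Python sketch below is our own illustrative baseline, not the decision-tree search; the encoding of `covers[i]` as the set of minterms covered by prime implicant \(p_{i+1}\) is an assumption of the sketch:

```python
from itertools import product

def minimal_covers(prime_cost, covers, minterms):
    """Brute-force the set covering problem: enumerate every selection
    vector S in B^k and keep the cheapest selections whose chosen prime
    implicants together cover all minterms of f."""
    k = len(prime_cost)
    best, best_cost = [], None
    for S in product((0, 1), repeat=k):
        # feasibility: every minterm must be covered by a selected implicant
        if all(any(S[i] for i in range(k) if j in covers[i]) for j in minterms):
            # cost of the cover: one per selected implicant plus its literals
            cost = sum(S[i] * (1 + prime_cost[i]) for i in range(k))
            if best_cost is None or cost < best_cost:
                best, best_cost = [S], cost
            elif cost == best_cost:
                best.append(S)
    return best, best_cost
```

The enumeration mirrors the binary decision tree: each position of \(S\) corresponds to one expansion level, and infeasible leaves are discarded by the covering test.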

We illustrate the solution of the set covering problem by minimizing
the Boolean function of Example 5.6
algorithmically. The *consensus method*
yields the complete SOP

\[f(x,y,z) = \overline{x}\,\overline{y} + x\,y + \overline{y}\,z + y\,\overline{z} + \overline{x}\,\overline{z} + x\,z.\]

Call the prime implicants \(p_1 = \overline{x}\,\overline{y},\) \(p_2 = x\,y,\) \(p_3 = \overline{y}\,z,\) \(p_4 = y\,\overline{z},\) \(p_5 = \overline{x}\,\overline{z},\) and \(p_6 = x\,z.\) We introduce Boolean vector \(S = (s_1, s_2, \ldots, s_6)\) of indicator variables for each prime implicant, and obtain the extended SOP

\[C_S = s_1\,\overline{x}\,\overline{y} + s_2\,x\,y + s_3\,\overline{y}\,z + s_4\,y\,\overline{z} + s_5\,\overline{x}\,\overline{z} + s_6\,x\,z.\]

For each prime implicant, we may use the *combining theorem* or *Shannon’s expansion theorem* to deduce the minterms it covers. For example,
prime implicant \(p_1 = \overline{x}\,\overline{y} =
\overline{x}\,\overline{y}\,\overline{z} +
\overline{x}\,\overline{y}\,z = \sum (0, 1).\) Therefore \(p_1\)
covers minterm 0, \(p_1 \ge 0,\) and minterm 1, \(p_1 \ge 1.\)
Analogously, we find the minterms covered by the other prime
implicants:

| prime implicant | covered minterms |
|:---:|:---:|
| \(p_1\) | 0, 1 |
| \(p_2\) | 6, 7 |
| \(p_3\) | 1, 5 |
| \(p_4\) | 2, 6 |
| \(p_5\) | 0, 2 |
| \(p_6\) | 5, 7 |

For each of the covered minterms, we formulate an inequality associated with the restriction of constraint \(C_S \ge f.\) For instance, since minterm 0 is covered by prime implicants \(p_1\) and \(p_5,\) we obtain the restriction

\[C_S|_0 = s_1 + s_5.\]

Furthermore, we find that restriction \(f|_0 = f|_{\overline{x}\,\overline{y}\,\overline{z}} = 1,\) and constraint \(C_S \ge f\) turns for minterm 0 into inequality \(s_1 + s_5 \ge 1.\) Analogously, we derive the system of linear inequalities for all minterms \(j \in J = \{ 0, 1, 2, 5, 6, 7\}\):

\[\begin{aligned}
s_1 + s_5 &\ge 1 && (j = 0)\\
s_1 + s_3 &\ge 1 && (j = 1)\\
s_4 + s_5 &\ge 1 && (j = 2)\\
s_3 + s_6 &\ge 1 && (j = 5)\\
s_2 + s_4 &\ge 1 && (j = 6)\\
s_2 + s_6 &\ge 1 && (j = 7)
\end{aligned}\]

Call this system of linear inequalities \(\Psi.\) We construct a
binary decision tree to find all selection vectors \(S\) that
solve \(\Psi.\) To facilitate a concise notation, we introduce
the \(\models\) symbol, pronounced *entails*. For example,
formula \(\Psi|_{s_1 s_5} \models (0, 1, 2)\) expresses that the
restriction of \(\Psi\) in \(s_1\) and \(s_5\) entails
inequalities 0, 1, and 2, i.e. the inequalities associated with
minterms 0, 1, and 2 are satisfied. The inequality of minterm 0
yields \(s_1 + s_5 = 2 \ge 1,\) for minterm 1 inequality
\(s_1 + s_3 \ge 1\) holds if \(s_1 = 1\) independent of
\(s_3,\) and for minterm 2 inequality \(s_4 + s_5 \ge 1\)
holds if \(s_5 = 1\) independent of \(s_4.\) Rather than
using arithmetic additions, we may also interpret the \(+\)-sign as
Boolean disjunction, and evaluate the inequalities using Boolean
operations. Furthermore, formula \(\Psi|_{\overline{s}_1
\overline{s}_5} \not\models (0)\) expresses that the restriction of
\(\Psi\) in \(\overline{s}_1\) and \(\overline{s}_5\) does
not entail inequality 0, i.e. the inequality associated with minterm 0
is not satisfied, because \(s_1 + s_5 = 0 + 0 \not\ge 1.\)
Figure 5.5 shows the binary decision tree for
\(\Psi.\) The root of the tree is shown on the left and the
leaves on the right. Starting with the expansion around \(s_1\)
at the root, restriction \(\Psi|_{\overline{s}_1}\) sets
\(s_1 = 0,\) which does not permit any decisions yet. In
particular, setting \(s_1 = 0\) does not prevent any of the
inequalities from being satisfied nor does it suffice to satisfy any
of the inequalities. In contrast, \(\Psi|_{s_1}\) entails
inequalities 0 and 1 regardless of all other selections.

The binary decision tree has 18 leaves of selections \(S\) that cover all minterms \(J\) of \(f.\) To find a minimal cover, we compute the cost of these selections, and pick the selections of minimum cost. There are two selections with minimum cost, framed in Figure 5.5, selection \(S_1 = (0, 1, 1, 0, 1, 0)\) of prime implicants \(p_2,\) \(p_3,\) and \(p_5,\) and selection \(S_2 = (1, 0, 0, 1, 0, 1)\) of prime implicants \(p_1,\) \(p_4,\) and \(p_6.\) Since the cost of each of the six prime implicants is \(\mathcal{C}(p_i) = 2,\) the cost of both covers is \(\mathcal{C}(C_{S_1}) = \mathcal{C}(C_{S_2}) = 3 (1 + 2) = 9.\) The corresponding minimal covers are \(C_{S_1} = x\,y + \overline{y}\,z + \overline{x}\,\overline{z}\) and \(C_{S_2} = \overline{x}\,\overline{y} + y\,\overline{z} + x\,z,\) as we found with the K-map minimization method in Example 5.6 already.

## 5.2. Technology Mapping¶

A particular realization of a digital circuit in hardware depends on
the choice of the implementation technology, such as CMOS circuits.
Besides custom designed CMOS circuits, industry offers a multitude of
circuit technologies for various target applications with different
price-performance points, including TTL, PLAs, ASICs,
and FPGAs. The
design of a combinational circuit from specification to *logic
minimization* is largely technology independent.
The step from a paper-and-pencil design to a real circuit includes
solving the **technology mapping problem**:

Given a Boolean expression \(f\) and a set \(C\) of digital circuit elements, we wish to generate a circuit that implements \(f\) with the circuit elements of \(C.\)

Boolean expression \(f\) may be given in any form, including in
minimized SOP or POS form. If we wish to implement any Boolean
function, then \(C\) should contain at least a *universal
set* of Boolean operators.

### 5.2.1. Universal Set Transformations¶

A Boolean expression in SOP form, whether as normal form or minimized, reflects the way humans tend to think about logical operations in a natural fashion. Thus, when we design a digital circuit with paper and pencil, we commonly derive a minimized SOP form, which serves as the input to the technology mapping problem. If we want to implement the corresponding two-level circuit with complemented or uncomplemented inputs as a CMOS circuit with elements \(C =\) {INV, NAND, NOR}, the technology mapping problem corresponds to a transformation from universal set \(U =\) {NOT, AND, OR} to universal set \(C =\) {INV, NAND, NOR}.

One possible strategy to solve the technology mapping problem is to
first perform the universal set transformation for each operation in
\(U,\) independent of the Boolean expression \(f\) we wish to
implement. Then, we replace the logic gates of the two-level circuit
for \(f\) with the equivalent operators derived by the
transformation. For example, we devise the universal set
transformation from \(U =\) {NOT, AND, OR} to CMOS gates \(C
=\) {INV, NAND, NOR} by expressing each operation in \(U\) with the
CMOS gates. Formulated in terms of Boolean algebra, we obtain the
universal set transformation with the operations of \(U\) on the
*lhs* and the CMOS functions on the *rhs*:

\[\overline{x} = \overline{x}, \qquad x \cdot y = \overline{\overline{x \cdot y}}, \qquad x + y = \overline{\overline{x + y}}.\]

Now, consider implementing the 2-input XOR function given by minimal SOP form

\[Y(A,B) = A\,\overline{B} + \overline{A}\,B.\]

We apply the universal set transformation graphically to the corresponding two-level circuit, see Figure 5.6. To that end, we replace each AND gate with a NAND gate followed by an inverter, and the OR gate with a NOR gate followed by an inverter. The CMOS inverter implements the NOT operator. The circuit diagram shows the two-level SOP form on the left and the CMOS circuit after technology mapping on the right.

For many technology mapping problems, we can obtain a smaller circuit if we do not apply the universal set transformation systematically to each gate of the two-level circuit, but by transforming the original Boolean expression into an equivalent expression using the operators of \(C.\) We illustrate this alternative strategy using the XOR function as example. The key insight is to notice that we can transform the 2-input XOR function into an expression using NAND gates only:

\[Y = \overline{\overline{A\,\overline{B} + \overline{A}\,B}} = \overline{\overline{A\,\overline{B}} \cdot \overline{\overline{A}\,B}}.\]

This algebraic transformation generalizes to arbitrary SOP expressions.

**NAND transform:** Given a Boolean expression in SOP form, the SOP transforms into a NAND-NAND expression of literals by applying the involution theorem and then De Morgan’s theorem.

The same algebraic transformations applied to a POS form yield an expression using NOR gates only. Starting with the POS of the 2-input XOR function, we obtain:

\[Y = (A + B)(\overline{A} + \overline{B}) = \overline{\overline{(A + B)(\overline{A} + \overline{B})}} = \overline{\overline{A + B} + \overline{\overline{A} + \overline{B}}}.\]

It is easy to verify that this algebraic transformation generalizes to arbitrary POS expressions.

**NOR transform:** Given a Boolean expression in POS form, the POS transforms into a NOR-NOR expression of literals by applying the involution theorem and then De Morgan’s theorem.

The NAND and NOR transforms produce two-level circuits that are well suited for CMOS technology. As Figure 5.7 shows, the resulting CMOS circuits have a 3-stage critical path, including the input inverter, compared to the 5-stage critical path in Figure 5.6 on the right.
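
Both transforms are easy to check exhaustively. The following Python sketch, with gate helpers of our own naming, confirms that the NAND-NAND and NOR-NOR forms both compute XOR:

```python
def inv(a): return 1 - a
def nand(a, b): return 1 - (a & b)
def nor(a, b): return 1 - (a | b)

def xor_nand(a, b):
    # NAND-NAND form of the SOP  A B' + A' B
    return nand(nand(a, inv(b)), nand(inv(a), b))

def xor_nor(a, b):
    # NOR-NOR form of the POS  (A + B)(A' + B')
    return nor(nor(a, b), nor(inv(a), inv(b)))

# exhaustive check against the XOR operator
for a in (0, 1):
    for b in (0, 1):
        assert xor_nand(a, b) == xor_nor(a, b) == a ^ b
```

By idempotence, the input inverters themselves reduce to NAND or NOR gates, since `inv(a) == nand(a, a) == nor(a, a)`, which is exactly the transformation used for the singleton sets {NAND} and {NOR}.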

In case your target technology provides only NAND or only NOR gates,
the NAND and NOR transforms are still useful. We just need to
represent the input inverters with NAND or NOR gates. The
*idempotence theorem* and its dual suggest
the transformations

\[\overline{x} = \overline{x \cdot x} = \overline{x + x}.\]

We conclude that we can transform the two-level SOP and POS forms into circuits with the universal singleton sets {NAND} and {NOR} of operators. Figure 5.8 shows the 2-input XOR function implemented with NAND gates only and with NOR gates only.

The NAND and NOR transforms are useful for the *synthesis* of
two-level logic, i.e. the transformation from universal set \(U
=\) {NOT, AND, OR} to one of the universal sets {INV, NAND, NOR}, {INV,
NAND}, {INV, NOR}, {NAND}, or {NOR} for implementation with CMOS
gates. On the other hand, the *analysis* of a CMOS circuit, such as
shown in Figure 5.7 or Figure 5.8, is more
error-prone for human readers than analyzing a two-level circuit with
gates from universal set \(U.\) We can simplify the analysis of a
CMOS circuit by first applying the NAND or NOR transforms in the
reverse direction to derive a circuit with operators from universal
set \(U\) before extracting the Boolean expression of the circuit.

### 5.2.2. Bubble Pushing¶

In this section, we discuss a graphical method for the design and
technology mapping of CMOS circuits. Recall that the *bubble* of the inverter symbol can be interpreted as graphical
representation of the logical negation. Without the bubble, the
inverter symbol turns into a *buffer* symbol,
i.e. a logic gate for the identity function. From the functional
perspective, it does not matter whether we draw the inverter bubble at
the input or the output of the buffer symbol. Both symbols represent
an inverter with function \(Y = \overline{A}\):

We take this idea one step further, and treat bubbles as independent
negation symbols that can move freely along the wires of a circuit.
More interestingly, we define graphical operations on bubbles that
enable us to manipulate a combinational circuit. As a first such
operation, consider two adjacent bubbles on a wire. Such a circuit
represents the series composition of two negations. According to the
*involution theorem*, the composition of two
negations is equal to the identity function. Therefore, we can always
place two bubbles on a wire or remove two bubbles from a wire without
changing the logical function of the circuit.

Removing the two inverter bubbles in Figure 5.9 produces a logically equivalent circuit of two buffers in series. If our goal is circuit simplification, we may remove the two buffers as well, such that \(Y = A.\)

Besides adding or removing pairs of adjacent bubbles, we may push
bubbles across the gates of a circuit. The graphical versions of
*De Morgan’s theorem* and its dual

\[\overline{x \cdot y} = \overline{x} + \overline{y} \qquad\text{and}\qquad \overline{x + y} = \overline{x} \cdot \overline{y}\]

enable us to push bubbles from the inputs to the output of a gate or vice versa. De Morgan’s theorem states that a NAND gate is equivalent to an OR gate with complemented inputs. From the graphical perspective, see Figure 5.10, we interpret De Morgan’s theorem such that an AND gate with an output bubble is equivalent to an OR gate with a bubble on each input. Since both symbols represent a 2-input NAND gate, we can exchange them freely in a circuit diagram. We may also interpret the transformation from the AND gate to the OR gate representation as pushing the output bubble toward the inputs of the AND gate. The bubble-push operation removes the output bubble, replaces the AND with an OR gate, and places one bubble on each input. The reverse transformation pushes the input bubbles to the output, while exchanging the OR gate with an AND gate.

The dual of De Morgan’s theorem has an analogous graphical interpretation. A NOR gate, i.e. an OR gate with an output bubble, is equivalent to an AND gate with bubbles on its inputs. Transforming one representation into the other can be viewed as a graphical bubble-push operation. To transform the OR gate representation into an AND gate, we push the output bubble toward the inputs. More precisely, we remove the output bubble, replace the OR gate with an AND gate, and place one bubble on each input. The reverse transformation pushes the input bubbles to the output, while exchanging the AND gate with an OR gate.

It is easy to remember both forms of De Morgan’s theorem as a single
bubble-push rule: *when pushing bubbles from the input to the output or
vice versa, replace an AND gate with an OR gate or vice versa.*

#### Two-Level Circuits¶

Bubble pushing enables us to analyze or solve the technology mapping problem for two-level CMOS circuits. We demonstrate the graphical version of the NAND transform without using Boolean algebra.

As a concrete example, consider the NAND-NAND circuit of Figure 5.7, shown again in Figure 5.12 on the left with the input inverters drawn as input bubbles of the stage-1 NAND gates. The bubbles obscure the logic function \(Y(A,B)\) of this circuit in the eyes of most human readers. We use bubble pushing to transform this two-level circuit into a human friendly SOP form. First, we push the bubble at output \(Y\) across the AND gate to obtain the equivalent circuit in the middle of Figure 5.12. The resulting circuit has pairs of adjacent bubbles on the internal wires. Removing the pairs of bubbles produces the two-level circuit on the right of Figure 5.12. This circuit corresponds directly to the SOP form, which is now trivial to extract from the circuit diagram: \(Y = A\,\overline{B} + \overline{A}\,B.\)

We can reverse the order of the bubble push operations, from right to
left in Figure 5.12, to solve the technology mapping
problem of the SOP form into universal set {INV, NAND}. First, insert
pairs of bubbles on the internal wires, then push the input bubbles of
the OR gate toward the output. If you do not appreciate the tactical
move to first insert bubble pairs on the internal wires, you can also
add a pair of bubbles on output \(Y,\) and push one bubble toward
the inputs leaving one bubble behind. Figure 5.13
illustrates this alternative *NAND transform* by
bubble pushing. On the left, we add a pair of bubbles to output
\(Y\) of the circuit in SOP form. Then, we push one of the two
bubbles toward the inputs of the OR gate, which yields the circuit in
the middle. The second step simply moves the bubbles along the
internal wires from the inputs of the stage-2 NAND gate to the outputs
of the stage-1 AND gates.

The *NOR transform* by bubble pushing proceeds
analogously, provided we start with a two-level circuit that
corresponds to the POS form of a Boolean function.

#### CMOS Gates¶

The bubble-push method is not only useful for transforming gate-level circuits, but also for the design of the pull-up and pull-down networks of CMOS gates. Given a gate-level circuit, we can derive its dual by bubble pushing before deriving the pull-up and pull-down networks of transistors.

As an example, consider the compound gate derived in Figure 4.14 for Boolean expression \(Y = \overline{(A + BC) \cdot D}.\) Figure 5.14 reproduces the gate-level representation of the CMOS circuit.

Our goal is to derive the pull-up and pull-down networks for the CMOS
compound gate graphically. Recall from our discussion of the
*principle of duality* that the pull-up network
computes output \(Y\) and the pull-down network the complement
\(\overline{Y}.\) Furthermore, we plan to derive both networks as
dual series-parallel networks. Since the series composition of a
switch network represents a conjunction and the parallel composition a
disjunction, both gate-level circuits may comprise the corresponding
AND and OR gates only. If we control the pull-down network with
uncomplemented inputs, then the dual pull-up network must be
controlled with the complemented inputs.

As a first step of the transformation of the compound gate in Figure 5.14 into a CMOS circuit, we derive the gate-level representation of the pull-up network. Since the pull-up network produces output \(Y\) in uncomplemented form, we push the output bubble in Figure 5.14 toward the inputs, until all bubbles have propagated to the inputs. Figure 5.15 shows the transformations in three steps. The resulting circuit has the desired form of AND and OR gates with complemented inputs.

The pull-down network uses the uncomplemented inputs and computes the complemented output \(\overline{Y}.\) As shown in Figure 5.16, we simply remove the output bubble from the gate-level circuit in Figure 5.14 to obtain the desired form.

The second step of the transformation translates each AND gate into a series composition of transistors, and each OR gate into a parallel composition of transistors. The pull-down network uses the uncomplemented inputs as gate signals for nMOS transistors, and the pull-up network the complemented inputs as gate signals for pMOS transistors. Figure 5.16 shows the resulting CMOS gate, which coincides with the design in Figure 4.14. Note that no Boolean algebra is required for the transformation process. In fact, it is straightforward to generalize this example, and formulate a graphical transformation algorithm from gate-level to transistor schematics.
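
Such a transformation algorithm can be sketched in a few lines. The expression-tree encoding and the textual network description below are our own illustrative choices, not a prescribed format:

```python
def pulldown(expr):
    """Describe the series/parallel pull-down network of nMOS switches
    for a compound gate, given an AND/OR expression tree over the
    uncomplemented inputs. AND maps to a series composition, OR to a
    parallel composition; the dual pull-up network swaps the two and
    uses the complemented inputs on pMOS transistors."""
    op, *args = expr if isinstance(expr, tuple) else ("in", expr)
    if op == "in":
        return args[0]          # a single input controls one transistor
    joiner = " series " if op == "and" else " parallel "
    return "(" + joiner.join(pulldown(a) for a in args) + ")"

# compound gate Y = ((A + B C) D)' : pull-down computes (A + B C) D
network = pulldown(("and", ("or", "A", ("and", "B", "C")), "D"))
```

For the compound gate of Figure 5.14, the sketch yields `((A parallel (B series C)) series D)`, matching the transistor topology derived graphically.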

#### Multilevel Circuits¶

Bubble pushing applies to combinational circuits with more than two
levels, as the example of the three-level compound gate in
Figure 5.14 demonstrates. Combinational
circuits with more than two levels of gates are *multilevel
circuits*. In general, multilevel circuits do not
have the regular structure of an SOP or POS form. For example, the
multilevel circuit in Figure 5.17 has a non-trivial
topology due to its reconverging branches, and computes the sum of a
*full-adder*. We demonstrate the bubble-push method
to solve the technology mapping problem of this circuit into universal
set {INV, NAND, NOR} for a CMOS implementation.

Figure 5.18 illustrates the solution of the technology mapping problem by bubble pushing. The multilevel circuit offers many opportunities to place two bubbles on a wire and push the bubbles with the goal of turning AND and OR gates into NAND and NOR gates. In Figure 5.18(a) we choose to insert two bubbles at the output of the circuit. Our plan is to push bubbles systematically toward the inputs, rather than the other way around or inside out. Since De Morgan’s theorem turns an AND gate with an output bubble into an OR gate with input bubbles, we push one of the two output bubbles across the AND gate, and leave one bubble behind to obtain a NOR gate. Figure 5.18(b) shows the circuit after pushing one bubble from the output of the stage-4 AND gate toward its inputs. As a result, we have a NOR gate with one bubble on the upper and a pair of bubbles on the lower input. We might be tempted to remove the bubble pair. However, we can use the two bubbles to repeat the first bubble-push move and turn the stage-3 AND gate into a NOR gate. Figure 5.18(c) shows the result. We have also shifted the bubble from the upper input of the stage-4 NOR gate to the output of the driving OR gate, forming a NOR gate. Since we cannot push the input bubbles of the stage-3 NOR gate across wire joins, we choose to insert another pair of bubbles on the output of the stage-2 AND gate, and repeat the bubble-push steps as shown in Figure 5.18(d), Figure 5.18(e), and Figure 5.18(f). The resulting circuit requires four inverters and six NOR gates. No NAND gates are required.

We study the problem of pushing bubbles across wire joins as inspiration for an alternative technology mapping. Figure 5.19 shows an inverter driving two loads \(X = \overline{A}\) and \(Y = \overline{A}.\) If we wish to push the inverter across the wire join, we remove the inverter on the driver side and insert an inverter on each wire of the other side. This preserves the logical functions \(X = \overline{A}\) and \(Y = \overline{A}.\) In terms of a bubble-push operation, we can view the wire join like a gate: remove the bubble on the input and place one bubble on each output. If we want to push bubbles toward the input, we need one bubble on each output; then we remove the output bubbles and insert a bubble on the input. In Figure 5.18(c), only one of the two outputs has a bubble, the input bubble of the stage-3 NOR gate. Without a matching bubble on the other output, we cannot push this bubble across the wire join toward the inputs.

After inserting the pair of bubbles in Figure 5.18(d), we may push one of the bubbles across the wire join toward the outputs. This bubble-push move is shown in Figure 5.20(e). After removing the bubble pair from the input of the stage-3 NOR gate, and shifting the remaining bubble to the output of the stage-2 AND gate, we obtain a NAND gate in stage 2. We add another pair of bubbles, see Figure 5.20(f), and perform one more bubble-push move to turn all stage-1 and stage-2 gates into NAND gates. The resulting circuit in Figure 5.20(g) requires four inverters, three NAND gates, and three NOR gates. Since the 2-input NAND gate has a smaller logical effort than the 2-input NOR gate, the circuit in Figure 5.20 appears to be a superior alternative to the circuit in Figure 5.18.

Applying the bubble-push method to multilevel circuits can turn into an intellectual challenge, because the number of possible moves as well as the number of solutions to the technology mapping problem grows quickly. Undoubtedly, however, bubble pushing on a whiteboard or blackboard can be more fun than the equivalent method of rewriting expressions by Boolean algebra.

Consider the multilevel circuit:

Transform the circuit by bubble pushing such that it has

- AND and OR gates, and inverters on inputs only,
- NAND and NOR gates, and as few inverters as possible.

Our strategy is to push the output bubble towards the inputs until no bubbles are left, except on the bottom input, where we expand the bubble into an inverter.

We insert bubble pairs at the outputs of AND and OR gates, and push excess bubbles towards the inputs. We terminate bubble pushing when all AND and OR gates have been transformed into NAND and NOR gates and inserting additional bubble pairs would increase the total number of bubbles.

After step (a) all gates are NAND or NOR gates already, see (b). The circuit in (c) permits bubble pushing at the stage-1 NAND gate. The NAND gate would turn into a NOR gate, and introduce two bubbles at its inputs. However, this would increase the total bubble count by one. Therefore, we keep the bubble at the output of the NAND gate. The other opportunity for bubble pushing is at the output of the stage-1 NOR gate. There is a bubble on the upper branch of the wire join, but no obvious bubble push transformation that could utilize this bubble and reduce the total number of bubbles. Therefore, we keep this bubble as well. The circuit in (d) expands the two remaining bubbles into inverters.

### 5.2.3. Multiplexer Logic¶

We can implement any Boolean function with *multiplexers*, because multiplexers implement the *ite*-function and the *ite*-function is *universal*. In the following, we solve the technology
mapping problem to universal singleton set \(C =\) {*ite*} by
showing how to implement Boolean functions with multiplexers.

Recall the 4:1 multiplexer with \(n=2\) select inputs \(S=S_1\,S_0\) and \(2^n=4\) data inputs, \(D_0,\) \(D_1,\) \(D_2,\) and \(D_3.\) The multiplexer steers data input \(D_k\) to output \(Y\) if \(S = k,\) interpreting \(S\) as an \(n\)-bit binary number. We may express output \(Y\) as a function of inputs \(S\) and \(D,\) such that

\[Y = \overline{S}_1\,\overline{S}_0\,D_0 + \overline{S}_1\,S_0\,D_1 + S_1\,\overline{S}_0\,D_2 + S_1\,S_0\,D_3.\]

This form resembles the Shannon expansion of Boolean function \(f:\mathcal{B}^2 \rightarrow \mathcal{B}\) in two variables:

\[f(x_1, x_0) = \overline{x}_1\,\overline{x}_0\,f_0 + \overline{x}_1\,x_0\,f_1 + x_1\,\overline{x}_0\,f_2 + x_1\,x_0\,f_3,\]

where \(f_0 = f(\overline{x}_1, \overline{x}_0),\) \(f_1 =
f(\overline{x}_1, x_0),\) \(f_2 = f(x_1, \overline{x}_0),\) and
\(f_3 = f(x_1, x_0)\) are the function values. We observe that we
can implement an \(n\)-variable function with a \(2^n{:}1\)
multiplexer by connecting the input signals of the \(n\) variables
to the \(n\) select inputs, and by driving data input \(D_k\)
with function value \(f_k \in \{0, 1\}.\) Figure 5.21
illustrates the multiplexer implementations of a 2-variable NAND and
the 3-variable *majority function*. Each
data input of the multiplexer represents one row of the corresponding
truth table. This observation confirms that multiplexers are
universal circuit elements.
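
The observation that each data input represents one row of the truth table suggests a simple software model. In the Python sketch below, a \(2^n{:}1\) multiplexer is just a table lookup indexed by the select bits; the encoding of the truth-table columns as tuples is our own:

```python
def mux(d, select):
    """2^n:1 multiplexer: interpret the select inputs as an n-bit
    binary number k and steer data input d[k] to the output."""
    k = 0
    for s in select:        # select = (S_{n-1}, ..., S_0)
        k = 2 * k + s
    return d[k]

# a 2-variable NAND is a 4:1 mux driven by its truth-table column,
# and the 3-variable majority function an 8:1 mux driven by its column
nand_column = (1, 1, 1, 0)
majority_column = (0, 0, 0, 1, 0, 1, 1, 1)
```

For example, `mux(nand_column, (a, b))` reproduces the NAND truth table row by row, with the inputs acting purely as an address into the table.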

We can reduce the size of the multiplexer that implements a Boolean function by using an input signal to drive one or more data inputs rather than a select input. The key insight underlying this optimization is a logical equivalence that we can spot in the truth table of the function. For example, in the truth table of the NAND function in Figure 5.21, observe (1) that \(Y=1\) in the two top rows where \(A=0,\) and (2) in the two bottom rows where \(A=1\) we find \(Y=\overline{B}.\) This suggests that a 2:1 multiplexer suffices, at the expense of one inverter to complement input \(B,\) see Figure 5.22 on the left. An analogous optimization applies if we use variable \(B\) to drive the select input. According to the truth table, in both rows where \(B=0\) we have \(Y=1,\) and if \(B=1\) then \(Y=\overline{A}.\) Figure 5.22 shows the corresponding 2:1 multiplexer implementation on the right.

In general, we can implement an \(n\)-variable function with a \(2^{n-1}{:}1\) multiplexer by connecting all but one input signal to the \(n-1\) select inputs and use the remaining input signal in complemented or uncomplemented form to drive a subset of the data inputs. For example, consider the truth table of the majority function in Figure 5.21. Assume that we connect inputs \(A\) and \(B\) to the select inputs of a 4:1 multiplexer. Then, the four data inputs correspond to the four combinations \(AB = 00, 01, 10, 11.\) Each of these combinations covers two rows in the truth table, with variable \(C\) changing. We find \(Y=0\) for \(AB=00,\) \(Y=C\) for \(AB=01\) and \(AB=10,\) and \(Y=1\) for \(AB=11.\) Figure 5.23 shows the 4:1 multiplexer implementation on the left. The other two implementations can be deduced by analogous logical arguments from the truth table.
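
The optimized implementations are equally easy to model. The Python sketch below mirrors the 2:1 NAND implementation of Figure 5.22 (left) and the 4:1 majority implementation of Figure 5.23 (left); the helper names are our own:

```python
def mux2(d, s):
    """2:1 multiplexer: steer d[s] to the output."""
    return d[s]

def mux4(d, s1, s0):
    """4:1 multiplexer with select inputs S1 S0."""
    return d[2 * s1 + s0]

def nand_2to1(a, b):
    # A drives the select input; the data inputs are the compacted
    # truth-table column (1, B')
    return mux2((1, 1 - b), a)

def majority_4to1(a, b, c):
    # A, B drive the selects; the data inputs are the compacted
    # truth-table column (0, C, C, 1)
    return mux4((0, c, c, 1), a, b)
```

Note that the reduced multiplexer trades one level of select logic for the inverter (or wire) that drives the data inputs, exactly as the compacted truth tables of Figure 5.24 suggest.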

The optimized multiplexer implementations motivate refinements to
our methods for representing Boolean functions. We may view the size
reduction as an alternative style of *truth table compaction*, where output column \(Y\) may contain a
complemented or uncomplemented variable in addition to constants 0 or
1. For example, the multiplexer implementations on the left of
Figure 5.22 and Figure 5.23 are represented by the
compacted truth tables in Figure 5.24.

If we translate the compact truth table representation of a Boolean function into a K-map, then the K-map contains variables in the corresponding cells. Figure 5.25(a) shows the K-map with cell-entered variables for the majority function.

A cell marked with a variable represents a product term of the variable and the corresponding combination of coordinate variables. For example, the top-right cell in Figure 5.25(a) represents product term \(A\,\overline{B}\,C\) and the bottom-left cell product term \(\overline{A}\,B\,C.\) K-maps can have cells with different variables. A complemented variable can be treated as a new variable by renaming. We can extract a cover, although not necessarily a minimal cover, from a K-map with cell-entered variables. The procedure extracts maximal subcubes for each variable: First, replace all variables with 0’s, and extract the prime implicants. Then, for each variable \(x,\) replace all 1’s with don’t care X’s, replace \(x\)‘s with 1’s, replace all other variables with 0’s, extract the prime implicants, and for each prime implicant form the conjunction with variable \(x.\) The Boolean function of the K-map is the sum of the prime implicants. In case of the majority function, the first step is shown in Figure 5.25(b). The resulting prime implicant is \(A\,B.\) For variable \(C,\) we extract two prime implicants from the modified K-map in Figure 5.25(c), \(A\) and \(B,\) both of which we AND with cell-entered variable \(C.\) Thus, we obtain \(Y(A,B,C) = A\,B + A\,C + B\,C,\) which happens to be the minimal cover of the majority function.
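The extracted cover can be checked against the compacted truth table by simulation. The Python sketch below (function names are illustrative) compares \(Y = A\,B + A\,C + B\,C\) with the four rows of the compacted table of the majority function:

```python
def compact_table(a, b, c):
    # Rows of the compacted truth table: output is 0, C, C, 1
    # for the select combinations AB = 00, 01, 10, 11.
    return {(0, 0): 0, (0, 1): c, (1, 0): c, (1, 1): 1}[(a, b)]

def cover(a, b, c):
    # Cover extracted from the K-map with cell-entered variables.
    return (a & b) | (a & c) | (b & c)

assert all(compact_table(a, b, c) == cover(a, b, c)
           for a in (0, 1) for b in (0, 1) for c in (0, 1))
```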

Analyze the multiplexer CMOS circuit:

- Perform a switch analysis of the CMOS circuit.
- Derive a gate-level schematic for the CMOS circuit.
- Express the logic function in form of a Boolean equation with a case distinction.

Figure 4.35 contains an interactive switch model of the CMOS circuit. You can use it to derive the truth table of the circuit:

\(S\) | \(D_1\) | \(D_0\) | \(Y\) |
---|---|---|---|
0 | 0 | 0 | 1 |
0 | 0 | 1 | 0 |
0 | 1 | 0 | 1 |
0 | 1 | 1 | 0 |
1 | 0 | 0 | 1 |
1 | 0 | 1 | 1 |
1 | 1 | 0 | 0 |
1 | 1 | 1 | 0 |

We employ a K-map to derive the minimal cover of \(Y(S, D_1, D_0).\)

We find that \(Y = \overline{S}\,\overline{D}_0 + S\,\overline{D}_1.\) The corresponding two-level circuit is shown above.

We can extract a case distinction from the K-map or the gate-level circuit:

\[\begin{split}Y = \begin{cases} \overline{D}_0\,, & \text{if}\ S = 0\,, \\ \overline{D}_1\,, & \text{if}\ S = 1\,. \end{cases}\end{split}\]This case distinction emphasizes that the CMOS circuit implements a 2:1 multiplexer that outputs the complemented data input, also called an *inverting 2:1 multiplexer*.
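A quick way to confirm the case distinction is to simulate the two-level form. The Python sketch below (our own naming) checks that \(Y = \overline{S}\,\overline{D}_0 + S\,\overline{D}_1\) always outputs the complement of the selected data input:

```python
def inverting_mux2(s, d1, d0):
    # Two-level minimal cover from the K-map: Y = ~S ~D0 + S ~D1.
    return ((1 - s) & (1 - d0)) | (s & (1 - d1))

# The case distinction: output the complement of the selected data input.
for s in (0, 1):
    for d1 in (0, 1):
        for d0 in (0, 1):
            selected = d0 if s == 0 else d1
            assert inverting_mux2(s, d1, d0) == 1 - selected
```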

Implement XNOR function \(f_9 = \overline{A \oplus B}\) and implication \(f_{11} = A \Rightarrow B\) with multiplexers.

\(A\) | \(B\) | \(f_9\) | \(f_{11}\) |
---|---|---|---|
0 | 0 | 1 | 1 |
0 | 1 | 0 | 1 |
1 | 0 | 0 | 0 |
1 | 1 | 1 | 1 |

- Use one 4:1 multiplexer.
- Use one 2:1 multiplexer. *Note:* data inputs may be complemented.
- Verify (b) by perfect induction.

A 4:1 multiplexer enables us to implement the 2-variable functions \(f_9(A,B)\) and \(f_{11}(A,B)\) by connecting inputs \(A\) and \(B\) to the select inputs and applying the \(2^2 = 4\) function values as constants to the data inputs, as specified in the truth table.

We find the optimized 2:1 multiplexer implementations by inspecting the truth table specifications. First, consider \(f_9.\) We notice that \(f_9 = B\) in the bottom two rows where \(A=1,\) and \(f_9 = \overline{B}\) in the top two rows where \(A=0.\) We can cast this observation into a Boolean equation:

\[\begin{split}Y_9(A,B) = \begin{cases} \overline{B}\,, & \text{if}\ A = 0\,, \\ B\,, & \text{if}\ A = 1\,. \end{cases}\end{split}\]This case distinction matches the 2:1 multiplexer if select input \(S = A,\) data input \(D_0 = \overline{B},\) and data input \(D_1 = B.\) The associated schematic is shown below. Alternatively, we can drive the select input of the 2:1 multiplexer with input signal \(B.\) Then, the truth table tells us for \(B=0\) that \(f_9 = \overline{A}\) and for \(B=1\) that \(f_9 = A.\) We implement this circuit by connecting \(\overline{A}\) to \(D_0\) and \(A\) to \(D_1.\)

Second, we derive a 2:1 multiplexer implementation for \(f_{11}.\) We observe in the top two rows of the truth table, where \(A=0,\) that \(f_{11} = 1,\) and in the bottom two rows for \(A=1\) that \(f_{11} = B.\) The associated case distinction is:

\[\begin{split}Y_{11}(A,B) = \begin{cases} 1\,, & \text{if}\ A = 0\,, \\ B\,, & \text{if}\ A = 1\,. \end{cases}\end{split}\]We implement this expression by connecting \(A\) to the select input, setting data input \(D_0 = 1\) by tying it to \(V_{DD},\) and assigning \(D_1 = B\) by connecting input \(B\) to \(D_1.\) The 2:1 multiplexer circuit is shown below. The alternative 2:1 multiplexer implementation connects \(B\) to the select input, \(\overline{A}\) to \(D_0,\) and ties \(D_1\) to \(V_{DD}.\)

We verify our 2:1 multiplexer designs of (b) using perfect induction. To that end, we build up the truth table for each input combination of the 2:1 multiplexer circuit, and compare the result with the given specification. We begin with XNOR function \(f_9.\) The resulting truth table is:

\(A\) | \(B\) | \(Y_9\) | \(f_9\) |
---|---|---|---|
0 | 0 | 1 | 1 |
0 | 1 | 0 | 0 |
1 | 0 | 0 | 0 |
1 | 1 | 1 | 1 |

We derive \(Y_9\) in each row of the truth table from the circuit schematic in subproblem (b). For \(A=0\) and \(B=0,\) the 2:1 mux steers data input \(D_0 = \overline{B} = 1\) to output \(Y_9,\) such that \(Y_9 = 1,\) as shown in the first row of the truth table. In the second row, \(Y_9 = 0,\) because \(A=0\) steers \(D_0 = \overline{B} = 0\) to output \(Y_9.\) In the third row, we have \(A=1,\) and the 2:1 mux steers data input \(D_1 = B = 0\) to output \(Y_9,\) so that \(Y_9 = 0,\) and in the fourth row \(A=1\) steers \(D_1 = B = 1\) to the output, so that \(Y_9 = 1.\) We find that the columns of mux output \(Y_9\) and specification \(f_9\) are equal, and conclude that \(Y_9 = f_9.\) This verifies that our multiplexer circuit implements XNOR function \(f_9.\)

The perfect induction for the implication is analogous. We create the truth table below by arguing about output \(Y_{11}\) of the 2:1 multiplexer circuit in (b) for each input combination.

\(A\) | \(B\) | \(Y_{11}\) | \(f_{11}\) |
---|---|---|---|
0 | 0 | 1 | 1 |
0 | 1 | 1 | 1 |
1 | 0 | 0 | 0 |
1 | 1 | 1 | 1 |

When input \(A=0,\) the 2:1 mux steers data input \(D_0 = 1\) to output \(Y_{11}.\) Thus, \(Y_{11} = 1\) independent of \(B,\) i.e. in both the first and second row of the truth table. If \(A=1\) and \(B=0,\) the 2:1 mux steers data input \(D_1 = B = 0\) to the output, so that \(Y_{11} = 0\) in the third row. In the fourth row, the 2:1 mux steers data input \(D_1 = B = 1\) to the output and \(Y_{11} = 1.\) We conclude that \(Y_{11} = f_{11},\) because the corresponding columns are equal.
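Both perfect inductions amount to an exhaustive comparison of the multiplexer outputs with the function specifications. A Python sketch of this check (function names are ours, not from the text):

```python
def mux2(s, d1, d0):
    """2:1 multiplexer: output d0 if s=0, d1 if s=1."""
    return d1 if s else d0

def y9(a, b):   # XNOR via 2:1 mux: S=A, D0=~B, D1=B
    return mux2(a, b, 1 - b)

def y11(a, b):  # implication A => B via 2:1 mux: S=A, D0=1, D1=B
    return mux2(a, b, 1)

# Perfect induction over all four input combinations.
for a in (0, 1):
    for b in (0, 1):
        assert y9(a, b) == (1 if a == b else 0)              # XNOR spec
        assert y11(a, b) == (0 if (a, b) == (1, 0) else 1)   # A => B spec
```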

## 5.3. Multilevel Logic¶

When designing a combinational circuit for a given Boolean function,
the *minimal two-level form*
rarely represents the best circuit. Often, circuits with
*multiple stages* exist that minimize the
*cost function* and may even have smaller delays
than their equivalent two-level circuits. We call a **multilevel
circuit** a combinational circuit with more than two levels of logic
gates on the critical path. For example, consider the *symmetric
functions*:

The costs of these minimal two-level forms are \(\mathcal{C}(S_{1,2}) = 24\) and \(\mathcal{C}(S_{3,4,5}) = 40.\) The equivalent multilevel circuits shown in Figure 5.26 have significantly smaller costs \(\mathcal{C}(S_{1,2}) = 12\) and \(\mathcal{C}(S_{3,4,5}) = 18.\)

Finding minimal multilevel circuits like those in Figure 5.26 remains black magic, primarily due to the computational complexity of the minimization problem. In this section, we discuss the design of multilevel combinational circuits. In particular, we consider tree-structured circuits and then algebraically factored circuits.

### 5.3.1. Tree-Structured Circuits¶

A circuit with tree structure has multiple inputs, one output, and no branches. Each gate output of the circuit drives exactly one gate input. Combinational circuits that implement SOP and POS forms of Boolean functions are trees, except for the inputs which may branch to drive multiple inputs of the circuit. The multilevel circuit on the left of Figure 5.26 has tree structure, whereas the circuit on the right does not because it contains branches.

In its purest form, a tree-structured combinational circuit is a
*complete k-ary tree* of \(k\)-input gates, with \(n = k^l\)
inputs to the gates at the leaves of the tree, where \(l\) is the
number of levels or height of the tree, and the output of the circuit
is the output of the gate at the root of the tree. The tree has
\(\sum_{i=0}^{l-1} k^i = (k^l-1)/(k-1)\) gates. For example, the
binary AND-tree below with \(k=2\) inputs per gate and \(l=3\)
levels has \(n = 8\) inputs and \(7\) gates.

If we have a supply of 2-input AND gates and need an 8-input AND gate,
we may view the task at hand as a *technology mapping* problem. The solution to this problem can be
found in our discussion of *tree-structured logic gates*. The prerequisite for the design of a tree-structured
gate is the associativity of the logic function. If we can prove the
associativity of a binary operator, we can design a tree-structured
gate.

We wish to implement a circuit for the *odd parity function* with \(n=8\) inputs, given an unlimited
supply of 2-input XOR gates.

The 8-variable parity function

equals 1 if the number of 1-inputs is odd. Two-level minimization of \(P\) is ineffective. However, if the binary XOR operation is associative, we can implement a tree-structured circuit. To find out whether the binary XOR is associative, we wish to prove that

From our discussion of the *theorems of Boolean algebra*, we know that we may use a perfect induction
or Boolean algebra. The proof by perfect induction is
straightforward if we recall that \(x \oplus y\) equals 1 if
one of \(x\) or \(y\) equals 1 but not both:

\(x\) | \(y\) | \(z\) | \(x \oplus y\) | \((x \oplus y) \oplus z\) | \(y \oplus z\) | \(x \oplus (y \oplus z)\) |
---|---|---|---|---|---|---|
0 | 0 | 0 | 0 | 0 | 0 | 0 |
0 | 0 | 1 | 0 | 1 | 1 | 1 |
0 | 1 | 0 | 1 | 1 | 1 | 1 |
0 | 1 | 1 | 1 | 0 | 0 | 0 |
1 | 0 | 0 | 1 | 1 | 0 | 1 |
1 | 0 | 1 | 1 | 0 | 1 | 0 |
1 | 1 | 0 | 0 | 0 | 1 | 0 |
1 | 1 | 1 | 0 | 1 | 0 | 1 |

Since the truth table columns for \((x \oplus y) \oplus z\) and \(x \oplus (y \oplus z)\) are equal, we conclude that the XOR operation is associative.

Our XOR tree has \(n=8\) inputs. Binary XOR gates have arity \(k = 2.\) Therefore, the number of levels of the tree is \(l = \log_k n = \log_2 8 = 3,\) and we need \(n-1 = 7\) XOR gates to implement the tree.

Not surprisingly, the 8-input XOR gate has the same tree-structure as the 8-input AND gate above with the AND gates replaced by XOR gates.
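The recursive structure of the XOR tree is easy to model in software. The following Python sketch (a simulation, not a circuit) evaluates a balanced tree of 2-input XOR gates and compares its output with the odd parity of the inputs:

```python
from itertools import product

def xor_tree(bits):
    """Evaluate a balanced tree of 2-input XOR gates over the input bits."""
    if len(bits) == 1:
        return bits[0]
    mid = len(bits) // 2
    # Each recursive call models one subtree; ^ models one XOR gate.
    return xor_tree(bits[:mid]) ^ xor_tree(bits[mid:])

# Exhaustive check: the tree computes the 8-input odd parity function.
for bits in product((0, 1), repeat=8):
    assert xor_tree(list(bits)) == sum(bits) % 2
```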

Tree-structured gates like the XOR gate of Example 5.12
are pleasantly simple, because each tree node has the same gate and
the regular structure of a tree is easy to remember. If the number of
inputs \(n\) is not a power of arity \(k\) of the
\(k\)-input gates, we sacrifice the complete tree and replace it
with a balanced tree. In a complete tree all leaves have the same
depth, whereas in a *balanced tree* the depths of the leaves differ by
at most 1. Figure 5.27 shows two balanced trees for
XOR gates. The construction of an \(n\)-input tree is easy, if
you first draw a complete tree with a number of inputs that is the
largest power of \(k\) less than \(n,\) and then add
gates in the next level of the tree until the total number of inputs
is \(n.\) The 9-input XOR tree requires adding one gate in level
3, and for the 11-input XOR tree we need three gates in level 3 of the
tree.
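For 2-input gates, any \(n\)-input tree, complete or balanced, uses exactly \(n-1\) gates, because each gate reduces the number of pending signals by one. A small Python helper illustrates the count (the generalization \(\lceil (n-1)/(k-1) \rceil\) for \(k\)-input gates is our own observation, not from the text):

```python
import math

def tree_gates(n, k=2):
    # Each k-input gate reduces the number of pending signals by k-1,
    # so reducing n inputs to one output takes ceil((n-1)/(k-1)) gates.
    return math.ceil((n - 1) / (k - 1))

assert tree_gates(8) == 7    # complete binary tree with 3 levels
assert tree_gates(9) == 8    # one extra gate in level 3
assert tree_gates(11) == 10  # three extra gates in level 3
```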

The design of more complex circuits can benefit from the tree
structure, if we view the tree as a substructure of a circuit or,
alternatively, superimpose additional connectivity on top of the tree.
The *multiplexer tree* is an example of a
multilevel circuit with tree structure. Here, the inputs drive not
only the data inputs of the multiplexers at the leaves of the tree,
but also the select inputs at the inner nodes of the tree.

### 5.3.2. Algebraically Factored Circuits¶

A **factored form** of a Boolean function is a Boolean expression that
represents a tree-structured circuit, whose internal nodes are AND or
OR gates and each leaf is a literal. Factoring is the process of
transforming an SOP form into a factored form. For example, given
5-variable function \(f:\mathcal{B}^5 \rightarrow \mathcal{B}\) in
SOP form

\(\qquad f(v, w, x, y, z)\ =\ v x + v y + w x + w y + z\,,\)

an equivalent, factored form of \(f\) is

\(\qquad f(v, w, x, y, z)\ =\ (v + w) (x + y) + z\,.\)

One form of \(f\) translates into the other by applying the
*distributivity theorem*. The
corresponding combinational circuits are shown in Figure 5.28. The factored form is smaller than the SOP form,
because it has fewer gates and fewer gate inputs.
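The equivalence of the SOP form and the factored form is easy to verify exhaustively. A Python sketch (our own check, using bitwise operators on 0/1 values):

```python
from itertools import product

def f_sop(v, w, x, y, z):
    # SOP form: f = v x + v y + w x + w y + z
    return (v & x) | (v & y) | (w & x) | (w & y) | z

def f_factored(v, w, x, y, z):
    # Factored form: f = (v + w)(x + y) + z
    return ((v | w) & (x | y)) | z

assert all(f_sop(*bits) == f_factored(*bits)
           for bits in product((0, 1), repeat=5))
```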

Factoring is a powerful method to derive multilevel circuits that are
smaller than their two-level counterparts. We present one particular
style, known as **algebraic factoring**, with the goal to minimize the
cost of the factored form. Although this optimization problem is too
complex in general to solve exactly, good heuristics lead to
acceptable solutions.

Given a Boolean expression \(f\) in SOP form, consider the
**algebraic division** \(f/d.\) Boolean expression \(d\) is
an **algebraic divisor** of dividend \(f\) if there exists a
quotient \(q \ne 0\) and a remainder \(r\) such that

and \(q\) and \(d\) have no variables in common. The latter
constraint guarantees that the Boolean expressions behave like
polynomials of real numbers, if we interpret a logical OR as an
algebraic addition and a logical AND as an algebraic multiplication.
Hence the name *algebraic* factoring. We say that \(d\) is a
**factor** of \(f\) if \(r = 0.\)

We investigate the factoring of Boolean function

given in SOP form. With the *distributivity theorem* in mind, observe that variable \(v\) is
a divisor of \(f,\) because we can rewrite \(f\) as

\(\qquad f = (x + y) \cdot v + (w x + w y + z)\,.\)

The quotient is \(q = x + y\) and the remainder is \(r = w x + w y + z.\)

Variable \(w\) is another divisor of \(f,\) also with quotient \(q = x + y\) but with a different remainder \(r = v x + v y + z,\) because

\(\qquad f = (x + y) \cdot w + (v x + v y + z)\,.\)

Analogously, we find that variables \(x\) and \(y\) are divisors of \(f,\) both with quotient \(q = v + w\) but with different remainders. For divisor \(d=x\) we find

\(\qquad f = (v + w) \cdot x + (v y + w y + z)\,,\)

and for \(d=y\):

\(\qquad f= (v + w) \cdot y + (v x + w x + z)\,.\)

Variable \(z\) is a divisor with trivial quotient \(q=1\) and remainder \(r = v x + v y + w x + w y\):

\(\qquad f = 1 \cdot z + (v x + v y + w x + w y)\,.\)

None of the variables is a factor of \(f,\) because each of the divisions has a nonzero remainder. The sums \(v+w\) and \(x+y\) are also divisors of \(f.\) The quotient associated with divisor \(v+w\) is \(q=x+y,\) and the remainder is \(r = z,\) because

\(\qquad f = (v+w) \cdot (x+y) + z\,.\)

The quotient associated with divisor \(x+y\) is \(q=v+w,\) because

\(\qquad f = (x+y) \cdot (v+w) + z\,.\)

Since the logical AND is commutative, the distinction between quotient and divisor is merely a matter of perspective. As such, the meanings of \(d\) as divisor and \(q\) as quotient are interchangeable.
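Each division can be validated by checking \(f = q\,d + r\) over all input combinations. The Python sketch below (our own check) does so for divisor \(d = v\):

```python
from itertools import product

def f(v, w, x, y, z):
    # f = v x + v y + w x + w y + z
    return (v & x) | (v & y) | (w & x) | (w & y) | z

def qd_plus_r(v, w, x, y, z):
    q = x | y                      # quotient q = x + y
    d = v                          # divisor d = v
    r = (w & x) | (w & y) | z      # remainder r = w x + w y + z
    return (q & d) | r

assert all(f(*b) == qd_plus_r(*b) for b in product((0, 1), repeat=5))
```

Replacing `d`, `q`, and `r` with the other divisor/quotient/remainder triples listed above verifies the remaining divisions the same way.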

We wish to factor the SOP form of Boolean function

The distributivity theorem does not apply, since the implicants
have no literals in common. Therefore, \(g\) has no obvious
factors and no nontrivial divisors. However, the *consensus
theorem* enables us to add implicant \(y
z\) by consensus on \(x,\) such that

This SOP form can be factored into

Both sums \(x+y\) and \(\overline{x} + z\) are factors of
\(g,\) because the remainder is zero. However, the factors are
not algebraic, because variable \(x\) appears in each of them.
We say that the factors are *Boolean* rather than algebraic,
because we need to invoke theorems specific to Boolean algebra to
prove that the factored form equals the SOP form. In this example,
we need the Boolean *complementation theorem*, \(x\,\overline{x} = 0\):

Note that \(g\) has yet another factor, \(y + z.\) This factor is Boolean too, because it shares variable \(y\) with factor \(x + y\) and variable \(z\) with factor \(\overline{x} + z.\) Using Boolean algebra, we can prove that the factorization into three factors equals the SOP form:

This proof relies on the *idempotence theorem*, \(x \cdot x = x,\) which is specific to
Boolean algebra. In the algebra of real numbers, \(x \cdot x
= x^2,\) and \(x^2 \ne x\) for \(x \ne 0,1.\)

Finding divisors of a Boolean expression, like factor \(x + y\) of \(g = \overline{x} y + x z,\) can be done in principle by translating the SOP form into the POS form. Algebraic factoring avoids the POS form by restricting the factors of a Boolean expression to algebraic factors, where divisor and quotient have no variables in common. The key advantage of this restriction is that it simplifies the identification of divisors.
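Both Boolean factorizations of \(g = \overline{x}\,y + x z\) can be confirmed by perfect induction in software. A Python sketch:

```python
from itertools import product

def g(x, y, z):
    # SOP form: g = ~x y + x z
    return ((1 - x) & y) | (x & z)

for x, y, z in product((0, 1), repeat=3):
    # Two Boolean factors: g = (x + y)(~x + z)
    assert g(x, y, z) == (x | y) & ((1 - x) | z)
    # Three Boolean factors: g = (x + y)(~x + z)(y + z)
    assert g(x, y, z) == (x | y) & ((1 - x) | z) & (y | z)
```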

The problem of factoring multilevel circuits is to find one or more
algebraic divisors of SOP form \(f\) such that the cost of the
multilevel circuit is minimized. In the following, we restrict
ourselves to divisors that are products of literals. For divisors in
product form, the quotient assumes SOP form. Below, we discuss more
complex divisors in SOP form, i.e. sums of at least two products, as
part of our study of *multilevel multioutput functions*. To maintain the algebraic nature
of the factorization, we restrict the SOP form of \(f\) to
products where no product covers another. Otherwise, we can simplify
the SOP form by virtue of the *covering theorem*. Furthermore, we treat complemented and
uncomplemented variables of \(f\) as distinct literals. We may
transform \(f\) into an SOP form with uncomplemented literals only
by replacing the complemented variables with new variable names.

Experience shows that a good divisor for the cost reduction of an SOP
form \(f\) is a large product, i.e. a conjunction of many
literals. The largest product that divides \(f\) is a product
that ceases to be a nontrivial divisor of \(f\) if we extend the
product by one more literal. Here, a nontrivial divisor \(d \ne
1\) is associated with a nontrivial quotient \(q \ne 1.\) For
example, \(d = x y\) is the largest product that is a divisor of
\(f = v x y + w x y\) with nontrivial quotient \(q = v + w.\)
Extending product \(x y\) with literal \(v\) yields a larger
divisor \(d = v x y,\) but with trivial quotient \(q = 1.\)
The extension with the only remaining variable \(w\) yields
\(q=1\) too. Variable \(x\) is also a divisor of \(f\)
with quotient \(v y + w y.\) However, \(x\) is not the largest
divisor because extending product \(x\) with literal \(y\)
yields nontrivial divisor \(x y.\) If \(f\) is a single
product, then \(f\) is also its own largest divisor but with
trivial quotient \(1.\) For example, \(x y z\) is the largest
trivial divisor of \(g = x y z.\) The unique nontrivial quotient
associated with the largest algebraic divisor of an SOP form is called
**kernel**, and the largest nontrivial divisor itself is the
associated **cokernel**. An SOP form must contain at least two
products to have a pair of kernel and cokernel. In the preceding
examples, \(g = x y z\) has no kernel, whereas \(f = v x y +
w x y\) has kernel \(v + w\) and cokernel \(x y.\)

We can identify all pairs of kernels and cokernels of an SOP form
\(f\) by means of **rectangle covering**, which resembles the
*K-map* method for two-level minimization. The rectangle
covering method is based on the **product-literal matrix** of a given
SOP form. This matrix has one row for each product of \(f\) and
one column for each literal of \(f.\) A matrix element is 1 if
the literal associated with its column appears in the product
associated with its row, otherwise it is 0. For example, SOP form

has the product-literal matrix below. For clarity, we omit the 0 elements of the matrix.

The product-literal matrix exposes products with common literals. If
a column contains 1 elements, then the associated literal appears in
as many products as there are 1’s in the column. For example, literal
\(w\) appears in two products and literal \(x\) in all three.
A literal is shared by multiple products if the associated column has
two or more 1’s. When multiple literals appear in a product of the
SOP form they constitute a product of literals. In the
product-literal matrix, a product of literals appears in the product
of the SOP form if the literals have 1’s in the row of the SOP
product. *Rectangle covering* finds the kernels and cokernels of an
SOP form as follows:

Cover rows and columns of 1’s in the product-literal matrix, not necessarily contiguous, that form the largest rectangle with at least two rows. The cokernel is the product of the literals associated with the columns of the rectangle. The kernel is the *restriction* of the sum of the products of the rows in the literals of the cokernel.

The largest rectangles of \(f = w x y + w x z + x y z\) are shown in the product-literal matrix below. The blue rectangle covers the columns of literals \(w\) and \(x\) and the rows of products \(w x y\) and \(w x z.\) The extension to columns \(y\) or \(z\) would include 0’s into the rectangle. Similarly, including row \(x y z\) would include a 0 in the column of literal \(x.\) Since we cannot increase the size of the blue rectangle without covering 0’s, the blue rectangle is a largest rectangle covering literals \(w\) and \(x.\) Analogously, the red and green rectangles are largest rectangles, except that they do not cover contiguous rows or columns.

From a largest rectangle, we deduce the cokernel as the product of the literals, and derive the kernel by forming the sum of the row products and then the restriction of the sum in the literals. For example, the blue rectangle identifies cokernel \(w x\) of \(f.\) The sum of the row products is \(w x y + w x z\) and the kernel is the restriction in the literals of the cokernel \((w x y + w x z)|_{w, x}\ =\ y + z.\) If we interpret the cokernel as largest divisor and the kernel as quotient, then we can factor \(f\) such that \(f = (y + z) w x + x y z.\) The factorizations of \(f\) corresponding to the three largest rectangles in the product-literal matrix are:

cokernel | kernel | factorization |
---|---|---|
\(w x\) | \(y + z\) | \((y + z) w x + x y z\) |
\(x y\) | \(w + z\) | \((w + z) x y + w x z\) |
\(x z\) | \(w + y\) | \((w + y) x z + w x y\) |
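All three factorizations can be verified against the SOP form by exhaustive simulation. A Python sketch (our own check):

```python
from itertools import product

def f(w, x, y, z):
    # SOP form: f = w x y + w x z + x y z
    return (w & x & y) | (w & x & z) | (x & y & z)

for w, x, y, z in product((0, 1), repeat=4):
    assert f(w, x, y, z) == ((y | z) & w & x) | (x & y & z)  # cokernel w x
    assert f(w, x, y, z) == ((w | z) & x & y) | (w & x & z)  # cokernel x y
    assert f(w, x, y, z) == ((w | y) & x & z) | (w & x & y)  # cokernel x z
```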

There is one more largest rectangle, but it does not cover as many 1’s as the three largest rectangles above. The rectangle consists of the entire column of literal \(x,\) which covers all three rows. The rectangle cannot be enlarged because no other column covers all rows. The kernel of cokernel \(x\) is \(wy + wz + yz.\)

We also discover cokernel \(x\) with a second round of rectangle covering, starting with any of the three factorizations above. We derive the factorization of a factorization by renaming the kernel. For example, introduce a new name \(v = y + z.\) Then, substitute \(v\) for the kernel in the factorization to obtain \(f = v w x + x y z,\) and treat \(v\) as a literal. The resulting product-literal matrix is:

We find only one largest rectangle that covers two rows, which is the rectangle consisting of column \(x.\) The kernel of cokernel \(x\) is the restriction of the sum of the row products in the literal of the cokernel \((v w x + x y z)|_x = v w + y z.\) Thus, we have found the factorization \(f = (v w + y z) x,\) or after expanding variable \(v\):

No further rounds of rectangle covering are necessary, because factorization \((v w + y z) x\) has reduced \(f\) to a single product \(u x,\) after introducing \(u\) as a new name for kernel \(u = v w + y z.\) Note that we obtain similar factored forms if we start the second round of rectangle covering with the other two factorizations, \(f = ((w + z) y + w z) x\) and \(f = ((w + y) z + w y) x.\) The corresponding multilevel circuits have the same topology, as shown in Figure 5.29 on the right.
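The final factored form can likewise be checked against the original SOP form. A Python sketch with variable \(v = y + z\) expanded:

```python
from itertools import product

def f(w, x, y, z):
    # SOP form: f = w x y + w x z + x y z
    return (w & x & y) | (w & x & z) | (x & y & z)

def f_final(w, x, y, z):
    # Factored form f = (v w + y z) x after expanding v = y + z.
    return (((y | z) & w) | (y & z)) & x

assert all(f(*b) == f_final(*b) for b in product((0, 1), repeat=4))
```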

Factoring has reduced the cost from 12 units for the SOP form to 10 units for the factored form. The cost of the factored form can be deduced from the factored expressions by including the new variable names, here \(u\) and \(v\):

The cost of \(f\) is the number of literals plus the cost of the next level: \(\mathcal{C}(f) = 2 + \mathcal{C}(u),\) where \(\mathcal{C}(u) = 4 + 2 + \mathcal{C}(v),\) where \(\mathcal{C}(v) = 2,\) for a total of \(\mathcal{C}(f) = 2 + 6 + 2 = 10.\)

Unlike two-level minimization, there is no guarantee that the
multilevel circuit of factored form \(f\) minimizes the cost
\(\mathcal{C}(f).\) Picking the largest rectangles in each
round of rectangle covering is a *greedy heuristic* that succeeds
in many cases but not always.

We apply the rectangle covering method to derive multilevel circuits for the symmetric functions \(S_{1,2}(w, x, y, z)\) and \(S_{3,4,5}(v, w, x, y, z).\)

First, we consider the 4-variable function

given in minimal SOP form. The product-literal matrix exposes two largest \(2\times 2\) rectangles.

The blue rectangle represents cokernel \(w \overline{y}\) and kernel \(\overline{x} + \overline{z},\) and the red rectangle represents cokernel \(y \overline{z}\) and kernel \(\overline{w} + \overline{z}.\) With new names for the kernels, \(a = \overline{x} + \overline{z}\) and \(b = \overline{w} + \overline{z},\) we obtain the factorization

The product-literal matrix of this factorization has three largest \(2\times 1\) rectangles associated with literals \(\overline{x},\) \(\overline{y},\) and \(\overline{w}.\)

Since the rectangles share row products, the corresponding factorizations are not independent of each other. Hence, we pick one cokernel \(\overline{w}.\) The corresponding kernel is \(x \overline{y} + \overline{x} z.\) This pair of cokernel and kernel yields factorization

The cost of the factored form is \(\mathcal{C}(S_{1,2}) = ((3 + \mathcal{C}(a)) + (3 + \mathcal{C}(b)) + 8) + 3,\) where \(\mathcal{C}(a) = \mathcal{C}(b) = 2,\) for a total of \(\mathcal{C}(S_{1,2}) = 21.\) Compared to 24 units of the minimal two-level form, the factored form saves 3 units of cost. However, the cost is still relatively large compared to Knuth’s multilevel circuit [DEK4A] on the left of Figure 5.26 with a cost of just 12 units.

Second, we derive a factored form for 5-variable function

The product-literal matrix of \(S_{3,4,5}\) (round 1) contains ten largest \(3\times 2\) rectangles. Only two of the ten rectangles, the blue and red rectangles, are shown.

We pick the blue and red rectangles because they do not share any rows. Furthermore, we pick the green and orange \(2\times 2\) subrectangles of the largest \(3\times 2\) rectangles. Together, these four rectangles cover each row of the product-literal matrix exactly once. Therefore, we can factor \(S_{3,4,5}\) w.r.t. all kernels in a single round. The cokernel of the blue rectangle is \(v w\) and the kernel \(x + y + z.\) For the red rectangle cokernel and kernel are \(y z\) and \(v + w + x,\) for the green rectangle \(v x\) and \(y + z,\) and for the orange rectangle \(w x\) and \(y + z.\) We rename the kernels such that \(a = x + y + z,\) \(b = v + w + x,\) and \(c = y + z,\) and obtain the factorization

We use this factored form for a second round of rectangle covering. We find cokernel \(c x\) and kernel \(v + w.\) The resulting factorization in expanded form is

Notice that kernel \(c = y + z\) is a summand of kernel
\(a\) and kernel \(d = v + w\) is a summand of kernel
\(b.\) We discuss the identification of shared kernels as part
of our study of *multioutput functions* below. We observe that we can
reduce the cost of \(S_{3,4,5}\) further by sharing kernels
\(c\) and \(d\) through algebraic rewriting:

Sharing kernels \(c\) and \(d\) yields a branching circuit that requires explicit naming of the associated common subexpressions. In contrast, naming common subexpressions is not necessary to cast a tree-structured circuit like \(S_{1,2}\) into factored form, because each kernel expression occurs only once. The cost of the factored form of \(S_{3,4,5}\) is \(\mathcal{C}(S_{3,4,5}) = 20.\) This cost comes pretty close to Knuth’s factored form [DEK4A] on the right of Figure 5.26 with a cost of 18 units, in particular when compared to the cost of 40 units of the two-level form of \(S_{3,4,5}.\)

## 5.4. Multioutput Functions¶

A combinational circuit with \(n\) inputs and \(m\) outputs
implements a **multioutput function** \(f:\mathcal{B}^n
\rightarrow \mathcal{B}^m\) if \(m > 1.\) We may write a function
with multiple outputs in vector notation as a row vector:

For example, the *seven-segment decoder*
implements a multioutput function with \(n=4\) and \(m=7.\)

Multiple functions present the opportunity of sharing combinational
subcircuits if the functions have common subexpressions. As a result,
we may save logic gates at the expense of introducing *branches* that potentially complicate the task of minimizing
circuit delay.

### 5.4.1. Two-Level Multioutput Functions¶

Assume our goal is to design a multioutput function where each
function shall be represented by a two-level AND-OR circuit. If we
derive each of the functions by means of *two-level minimization*, an obvious opportunity for saving
AND gates is to share prime implicants.

As a concrete example, consider multioutput function \(f(x,y,z) = (f_0(x,y,z),\ f_1(x,y,z)),\) where

Figure 5.30 shows the K-map for \(f_0\) on the left and for \(f_1\) in the middle. If we minimize both functions independent of each other, we obtain the minimal covers

The total cost of the multioutput circuit is the sum of the costs of each function, \(\mathcal{C}(f_0) + \mathcal{C}(f_1) = 6 + 6 = 12.\) Observe that we can reduce the cost of the circuit because the functions have prime implicant \(y\,\overline{z}\) in common.

If two functions have a common prime implicant, we can share the prime
implicant by connecting the output of the AND gate to the OR gates of
both functions. Figure 5.31 contrasts the two-level
circuits without and with sharing of prime implicant
\(y\,\overline{z}.\) Sharing saves one AND gate. To account for
the savings, we determine the *cost* of a
two-level multioutput function as the total number of AND gate inputs
plus the total number of OR gate inputs. Then, the cost of the
circuit without sharing is the sum of the costs of each function as
before. The cost of the circuit with sharing is
\(\mathcal{C}(f_0,f_1) = 6 + 4 = 10,\) which is two units smaller
than without sharing.

The key to minimizing the cost of a two-level multioutput function is
to identify shared prime implicants. Observe in Figure 5.30 that the shared, red prime implicant covers those cells that
are 1-cells in both K-maps for \(f_0\) and \(f_1.\) The
1-cells shared among functions \(f_0\) and \(f_1\) are the
1-cells of their conjunction \(f_0\cdot f_1,\) i.e. the
*intersection* of the sets of 1-cells, as illustrated in
Figure 5.30 on the right. In general, a **shared
prime implicant** of two functions \(f_0\) and \(f_1\) is a
prime implicant of \(f_0\cdot f_1.\)
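The conjunction test is easy to carry out by simulation. Since the definitions of \(f_0\) and \(f_1\) appear only in the K-maps of Figure 5.30, the Python sketch below uses *hypothetical* stand-in functions that both contain implicant \(y\,\overline{z},\) and collects the 1-cells of \(f_0 \cdot f_1\) as candidates for shared prime implicants:

```python
from itertools import product

def f0(x, y, z):  # hypothetical function containing implicant y ~z
    return (x & z) | (y & (1 - z))

def f1(x, y, z):  # hypothetical function also containing y ~z
    return ((1 - x) & y) | (y & (1 - z))

# The 1-cells shared by f0 and f1 are the 1-cells of the conjunction f0*f1.
shared = [(x, y, z) for x, y, z in product((0, 1), repeat=3)
          if f0(x, y, z) & f1(x, y, z)]

# Both minterms of implicant y ~z appear among the shared 1-cells.
assert {(0, 1, 0), (1, 1, 0)} <= set(shared)
```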

If the number of functions is larger than two, then sharing may be possible among pairs of functions, triples of functions, etc. For \(m\) functions, there are \(\binom{m}{k}\) combinations of pairs for \(k=2,\) triples for \(k=3,\) and so on, where \(2 \le k \le m.\) Figure 5.32 illustrates an example with \(m=3\) functions \(f_0,\) \(f_1,\) and \(f_2.\)

There are \(\binom{3}{2}=3\) pairs of functions with shared prime implicants, \(f_0 \cdot f_1,\) \(f_0 \cdot f_2,\) and \(f_1 \cdot f_2.\) Furthermore, we have \(\binom{3}{3}=1\) triple of all three functions, \(f_0 \cdot f_1 \cdot f_2.\) We find that the three functions have shared prime implicant \(\overline{x}\,y\,\overline{z}\).

We can reduce the cost of function pair \((f_0, f_1)\) by sharing prime implicant \(y\,\overline{z},\) as shown in Figure 5.31. Sharing is also effective between functions \(f_1\) and \(f_2,\) although these functions do not share a common prime implicant, but only a subcube of their prime implicants. As shown in Figure 5.32, the shared prime implicant of \(f_1 \cdot f_2\) is the orange minterm \(\overline{x}\,y\,\overline{z}.\) Although this shared prime implicant requires a 3-input AND gate, sharing still reduces the size of the circuit, because the 3-input AND gate replaces two 2-input AND gates. As a result, the cost of the multioutput circuit without sharing on the left in Figure 5.33, \(\mathcal{C}(f_1) + \mathcal{C}(f_2) = 6 + 6 = 12,\) is one unit larger than the cost of the circuit with sharing, \(\mathcal{C}(f_1,f_2) = 7 + 4 = 11.\)

Function pair \((f_0, f_2)\) shares prime implicant \(\overline{x}\,y,\) the green prime implicant in Figure 5.32. Although this prime implicant is essential for \(f_2,\) it is not for \(f_0.\) There exists no selection of the four prime implicants of \(f_0\) and \(f_2\) or their subcubes such that sharing \(\overline{x}\,y\) would reduce the cost of the multioutput circuit below that of the separately minimized circuits, \(\mathcal{C}(f_0) + \mathcal{C}(f_2) = 6 + 6 = 12.\)

Before solving the minimization problem for multioutput function \((f_0, f_1, f_2),\) we first establish five useful facts:

- Shared prime implicants are subcubes of the prime implicants of the multioutput function. For example, the orange shared prime implicant \(\overline{x}\,y\,\overline{z}\) is a 0-cube, and is a subcube of two 1-cubes, the red prime implicant \(y\,\overline{z}\) common to \(f_0\) and \(f_1\) and the green prime implicant \(\overline{x}\,y\) common to \(f_0\) and \(f_2.\) The red and the green shared prime implicants are not strictly subcubes but are equal to the prime implicants of the corresponding functions.
- The shared prime implicants of function \(f_i\) are the maximal subcubes shared by \(f_i\) with one or more functions. For example, function \(f_0\) has the orange shared prime implicant \(\overline{x}\,y\,\overline{z},\) because it is a maximal subcube of \(f_0\cdot f_1\cdot f_2.\) Function \(f_1\) does not have the green shared prime implicant \(\overline{x}\,y,\) because it is not a shared prime implicant of \(f_0\cdot f_1,\) \(f_1\cdot f_2,\) or \(f_0\cdot f_1\cdot f_2.\)
- A prime implicant of \(f_i\) is essential for \(f_i\) if it is the only prime implicant to cover a 1-cell and no other shared prime implicant of \(f_i\) covers this 1-cell. For example, the red prime implicant \(y\,\overline{z}\) of \(f_0\) is essential for \(f_0,\) because it is the only one to cover 1-cell \(x\,y\,\overline{z},\) and the only other shared prime implicant of \(f_0,\) the orange prime implicant \(\overline{x}\,y\,\overline{z},\) does not cover 1-cell \(x\,y\,\overline{z}.\) The red prime implicant is not essential for \(f_1,\) however. Although it is the only prime implicant to cover 1-cell \(\overline{x}\,y\,\overline{z},\) this 1-cell is also covered by the orange shared prime implicant of \(f_1.\) Therefore, we have the choice between the red and orange implicants to cover 1-cell \(\overline{x}\,y\,\overline{z}\) of \(f_1.\)
- The minimal cover of function \(f_i\) includes the essential prime implicants of \(f_i.\) For example, in Figure 5.32 the essential prime implicants are those that are the only ones covering a 1-cell marked with a bold 1. The blue prime implicants are essential, and must be included in the minimal cover of their functions.
- The minimal cover of \(f_i\) may include shared prime implicants of \(f_i\) to reduce the total cost. For example, the orange shared prime implicant is not essential for either function, but may be used together with the blue prime implicants to form a cover for \(f_1\) and \(f_2,\) respectively.

We summarize our insights about prime implicants in the following theorem.

**Theorem (Minimal Multioutput SOP Form).** The minimal SOP form of multioutput function \(f=(f_0, f_1, \ldots, f_{m-1})\) consists of an SOP form for each output \(f_i,\) which is the sum of the essential prime implicants of \(f_i\) and a subset of the non-essential prime implicants and shared prime implicants of \(f_i\) such that the total cost across all outputs is minimized.

In practice, given the prime implicants and shared prime implicants of a multioutput function, the question is which of the non-essential and shared prime implicants to include in the minimal cover. We illustrate the selection by means of the K-maps in Figure 5.32. According to Figure 5.32, the prime implicants of the three functions, with essential prime implicants marked with a \(*\) and shared prime implicants marked with symbol \(\dagger,\) are:

The minimal cover of function \(f_0\) consists of its two essential prime implicants, \(f_0 = \overline{x}\,z + y\,\overline{z}.\) Including non-essential prime implicant \(\overline{x}\,y\) or shared prime implicant \(\overline{x}\,y\,\overline{z}\) would increase the cost of \(f_0\) unnecessarily. Since function \(f_1\) shares prime implicant \(y\,\overline{z}\) with \(f_0,\) it is also a candidate for the cover of \(f_1.\) The second possibility for sharing is to include shared prime implicant \(\overline{x}\,y\,\overline{z}.\) Function \(f_2\) might share \(\overline{x}\,y\,\overline{z}\) with \(f_1.\) We can exclude the third option of sharing prime implicant \(\overline{x}\,y\) between \(f_2\) and \(f_0,\) because including \(\overline{x}\,y\) in the SOP form of \(f_0\) increases its cost unnecessarily.

This leaves us with two options for the minimal cover. Option 1 shares the red prime implicant \(y\,\overline{z}\) between \(f_0\) and \(f_1.\) Since \(y\,\overline{z}\) completes the cover of \(f_1,\) we minimize \(f_2\) separately by including the green prime implicant \(\overline{x}\,y\) rather than shared prime implicant \(\overline{x}\,y\,\overline{z}\) in the SOP form:

Option 2 uses the orange shared prime implicant for functions \(f_1\) and \(f_2.\) In this case, we do not include the red prime implicant \(y\,\overline{z}\) in \(f_1,\) because \(f_1\) is already covered with the orange shared prime implicant \(\overline{x}\,y\,\overline{z}\) and the blue essential prime implicant \(y\,z.\) Thus, we minimize \(f_1\) and \(f_2\) separately from \(f_0\):

The cost of option 1, \(\mathcal{C}_1(f_0,f_1,f_2) = 10 + 6 = 16,\) is one unit smaller than the cost of option 2, \(\mathcal{C}_2(f_0,f_1,f_2) = 11 + 6 = 17.\) We conclude that option 1 constitutes the minimal cover.
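The cost accounting used throughout this section, OR-gate inputs per output plus AND-gate inputs counted once per distinct product, can be sketched in a few lines. The cover of \(f_0\) is the one quoted above; the cover of \(f_1\) is a hypothetical stand-in chosen to reproduce the arithmetic \(6 + 6 = 12\) without sharing versus \(6 + 4 = 10\) with sharing:

```python
def sop_cost(covers):
    """Cost of a two-level multioutput circuit: OR-gate inputs per output plus
    AND-gate inputs for each distinct product term (shared products count once)."""
    distinct, cost = set(), 0
    for products in covers:
        if len(products) > 1:
            cost += len(products)                # OR-gate inputs of this output
        distinct.update(products)
    cost += sum(len(p) for p in distinct if len(p) > 1)   # AND-gate inputs
    return cost

# f0 = x'z + yz' as in the text; f1 is a hypothetical cover sharing product yz'
f0 = [frozenset({"x'", "z"}), frozenset({"y", "z'"})]
f1 = [frozenset({"y", "z"}), frozenset({"y", "z'"})]
print(sop_cost([f0]), sop_cost([f1]), sop_cost([f0, f1]))  # -> 6 6 10
```

Passing both covers to `sop_cost` at once models sharing, because the common product \(y\,\overline{z}\) contributes its AND-gate inputs only once.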

### 5.4.2. Multilevel Multioutput Functions¶

The design of a multioutput function as a multilevel circuit opens up
opportunities for sharing larger subcircuits than prime
implicants in *two-level multioutput functions*. In the following, we extend the
method of *algebraic factoring*, and discuss
how to discover shared kernels, i.e. SOPs with at least two
products, in multioutput functions. As a concrete example, consider
multioutput function \(F: \mathcal{B}^4 \rightarrow
\mathcal{B}^3,\) where \(F(w,x,y,z) = (f(w,x,y,z), g(w,x,y,z),
h(w,x,y,z))\) and:

Note that SOP \(x + w y\) is a factor of \(f,\) a divisor of \(g\) with quotient \(z,\) and a divisor of \(h\) with trivial quotient \(1.\) Thus, \(x + w y\) is a common kernel of \(f,\) \(g,\) and \(h.\) Figure 5.34 contrasts the two-level multioutput circuit without sharing and the multilevel multioutput circuit with the shared kernel. Sharing the kernel reduces the cost \(\mathcal{C}(F) = 28\) of the two-level implementation to \(\mathcal{C}(F) = 19\) units.

We can solve the problem of identifying shared kernels by means of
rectangle covering. This method uses rectangle covering twice, first
to identify all kernels of each function as introduced for
*algebraic factoring* and, second, to find
the largest shared kernel. More specifically, the steps of the method
for **shared kernel factorization** by means of rectangle covering
are:

- For each function \(f_i\) of multioutput function \(f = (f_0, \ldots, f_{m-1})\) given in SOP form, determine all *kernels* by rectangle covering the product-literal matrix of \(f_i.\)
- Construct the *kernel-product matrix* with one row per kernel of all functions of \(f\) and one column per product term of all kernels of \(f.\) A matrix element is 1 if the product of the column is part of the kernel SOP in the row, and otherwise 0.
- The largest rectangle in the kernel-product matrix that covers at least two rows and two columns is a shared kernel. Factor the functions \(f_i\) using the shared kernel.
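The kernel machinery behind step 1 can be sketched with algebraic division on sets of literals. The sketch assumes single-character variable names and, as a simplification, draws candidate cokernels from pairwise product intersections only, which suffices for the example functions below:

```python
def parse(term):
    """Parse a product like "wyz'" into a frozenset of literals."""
    out = []
    for ch in term.strip():
        if ch == "'":
            out[-1] += "'"      # attach the complement mark to the previous literal
        else:
            out.append(ch)
    return frozenset(out)

def parse_sop(s):
    """Parse "xz' + wyz'" into a set of products."""
    return {parse(t) for t in s.split("+")}

def divide(sop, cube):
    """Algebraic quotient of an SOP by a cube: drop the cube's literals from
    every product that the cube divides."""
    return {p - cube for p in sop if cube <= p}

def kernels(sop):
    """Map cokernel cube -> kernel SOP. A kernel must have at least two
    products and be cube-free (no literal common to all of its products)."""
    cands = {frozenset(a & b) for a in sop for b in sop if a != b}
    cands.add(frozenset())                       # trivial cokernel 1
    found = {}
    for c in cands:
        q = divide(sop, c)
        if len(q) >= 2 and not set.intersection(*map(set, q)):
            found[c] = q
    return found

f = parse_sop("xz' + wyz'")
g = parse_sop("xz + wxy + wyz")
print(len(kernels(f)), len(kernels(g)))   # -> 1 4
```

For \(f\) the only kernel is \(x + wy\) with cokernel \(\overline{z},\) and for \(g\) the four kernels have cokernels \(wy,\) \(x,\) \(z,\) and the trivial cokernel \(1,\) in agreement with the discussion below.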

We illustrate the method by rediscovering shared kernel \(x + w y\) of multioutput function \(F\) systematically. In step 1, we use the product-literal matrix for each of the functions \(f,\) \(g,\) and \(h\) to determine all of their kernels, not just the kernels corresponding to the largest rectangles. We also include the kernel associated with trivial cokernel 1. The covered product-literal matrices are:

Function \(f\) has kernel \(x + w y\) with cokernel \(\overline{z}.\) Since \(f = x\,\overline{z} + w\,y\,\overline{z}\) is divisible by \(\overline{z},\) the SOP form of \(f\) itself does not qualify as a kernel. Function \(g\) has one \(2\times 2\) rectangle with cokernel \(w y\) and kernel \(x + z.\) In addition, \(g\) has two \(2\times 1\) kernels, one with cokernel \(x\) and kernel \(z + w y\) and another with cokernel \(z\) and kernel \(x + w y.\) For trivial cokernel \(1,\) SOP form \(g = x z + w x y + w y z\) is a trivial kernel that we include in our search for a common divisor. Function \(h\) has three \(2\times 1\) rectangles, cokernel \(w\) with kernel \(y + \overline{z},\) cokernel \(y\) with kernel \(w + \overline{z},\) and cokernel \(\overline{z}\) with kernel \(w + y.\) SOP form \(h = x + w\,y + w\,\overline{z} + y\,\overline{z}\) itself is a trivial kernel with trivial cokernel \(1.\)

In step 2 we construct the kernel-product matrix for multioutput function \(F.\) Each row corresponds to one kernel for each function:

For clarity, we have also listed the cokernels associated with the
kernels. Each column corresponds to one product term of the kernels.
Note that the product in the name *kernel-product matrix* refers to
the *kernel products*, whereas the product in the *product-literal
matrix* refers to the *SOP products* of the given function in SOP
form. To find the products for all columns of the matrix, list the
product terms of all kernels determined in step 1, and remove all
duplicates from the list. For example, kernel \(x + w y\) is a
sum of two products, \(x\) and \(w y.\) Each product is
represented by one column in the kernel-product matrix. Since SOP
\(x + w y\) is a kernel of \(f\) and \(g,\) we include one
row for this kernel for each function. In each of these rows we mark
the matrix elements in product columns \(x\) and \(w y\) with
a 1. The 0-elements are omitted for clarity. Duplicate kernel rows
are desired in the kernel-product matrix, because they expose the
sharing of kernel products across functions.

According to step 3, we find a shared kernel by applying rectangle covering to the kernel-product matrix. A good shared kernel corresponds to the largest rectangle covering at least two rows and two columns. The largest rectangle in the kernel-product matrix of multioutput function \(F\) is the \(3\times 2\) rectangle shown in the matrix. The corresponding shared kernel \(x + w y\) is the sum of the column products. The factorizations of \(f,\) \(g,\) and \(h\) with kernel \(x + w y\) are:

In function \(h,\) we have also factored the remainder \(w
\overline{z} + y \overline{z}\) by extracting cokernel
\(\overline{z}.\) The corresponding multilevel multioutput
circuit is shown in Figure 5.34 on the right. In
general, the largest rectangles of the kernel-product matrix lead to
large cost reductions. However, as for *algebraic factoring* of single-output functions, there is no
guarantee that the largest rectangles minimize the cost of the
resulting multilevel circuit.

Shared kernel factorization is particularly effective in the context of multioutput functions, but it can also benefit single-output functions like \(S_{3,4,5}\) of Example 5.15. Through two rounds of factorization we obtain the factored form

with four kernels, \(x + y + z,\) \(v + w + x,\) \(v + w,\) and \(y + z.\)

Rather than applying algebraic rewriting to reduce the cost of \(S_{3,4,5}\) further, we now employ rectangle covering to discover kernels rather than cokernels. To that end, we construct the kernel-product matrix with the four kernels identified in Example 5.15:

We find two largest \(2\times 2\) rectangles: the blue rectangle corresponds to shared kernel \(v + w\) and the red rectangle to shared kernel \(y + z.\) We introduce new names \(c = y + z\) and \(d = v + w,\) and substitute the shared kernels in

One additional, cost-neutral step of renaming, such that \(a = x + c\) and \(b = d + x,\) yields the same factored form as in Example 5.15:

In general, we have the choice to apply either of the two methods of factorization in any order. For the time being, the decision of which order minimizes the cost of the multilevel circuit remains in the realm of black magic.

### 5.4.3. Tree-structured Multioutput Circuits¶

There exist multioutput functions where sharing of subcircuits appears to be an obvious design choice, yet deriving such circuits is anything but trivial. Nevertheless, the insights to be gained from the study of tree-structured multioutput circuits are particularly enlightening. As a concrete example, consider function \(f: \mathcal{B}^8 \rightarrow \mathcal{B}^8\) with eight inputs and eight outputs such that \(f_i\) is the conjunction of inputs \(x_0\) up to \(x_i\):

Thus, the eight output functions are

If we wish to implement \(f\) with as few AND gates as possible, we observe that we can express \(f\) as a linear recurrence:

This recurrence enables us to compute \(f_i\) once we know \(f_{i-1}\) by means of a conjunction with \(x_i.\) The corresponding combinational circuit in Figure 5.35 forms a chain of AND gates.
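A minimal sketch of this recurrence, with Python integers 0 and 1 standing in for bits:

```python
from itertools import accumulate
from operator import and_

x = [1, 1, 1, 0, 1, 1, 1, 1]      # sample input bits x_0 .. x_7
f = list(accumulate(x, and_))     # f_i = f_{i-1} AND x_i, one AND per output
print(f)                          # -> [1, 1, 1, 0, 0, 0, 0, 0]
```

Once a single input is 0, every later output is 0, mirroring how the carry of information flows down the chain of AND gates.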

The AND chain maximizes the sharing of subcircuits. Each output except \(f_0\)
requires only one 2-input AND gate, for a total of seven AND gates.
If we generalize the number of inputs and outputs to \(n,\) then
the AND chain needs only \(n-1\) AND gates. However, all AND
gates lie on the critical path of the circuit from \(x_0\) to
\(f_{n-1}.\) Thus, for increasing \(n\) the circuit delay
grows quite rapidly, even with *gate sizing*.

If we wish to minimize the circuit delay, we can use a tree-structured circuit for each output. Figure 5.36 shows a forest of AND trees constructed with 2-input AND gates without any sharing. Output \(f_7\) requires the tree with a maximum height of \(\lg 8 = 3\) AND gates. In general, the maximum number of AND gates on the critical path of a circuit with \(n\) inputs and outputs is \(\lceil \lg n\rceil.\) For large \(n,\) the tree structure reduces the maximum path delay significantly compared to the chain circuit.

The obvious disadvantage of the AND forest compared to the AND chain
is the significant increase in the number of AND gates. However, we
can reduce the number of gates by sharing common subcircuits among the
trees. For example, output \(f_1\) can be used as input for the
other trees \(f_2, f_3, \ldots, f_7,\) which would save six AND
gates. On the other hand, we may want to share the larger subcircuit
of output \(f_3\) as input of the trees for \(f_4\) through
\(f_7.\) In fact, there is a large number of choices for sharing
subcircuits among the trees of a forest. Two topologies of shared
forests have gained sufficient popularity among circuit designers that
they have been named by their inventors. The **Kogge-Stone circuit**
for \(f\) is shown in Figure 5.37 and the
**Ladner-Fischer circuit** in Figure 5.38.

To denote the intermediate conjunctions in the Kogge-Stone and
Ladner-Fischer circuits, we introduce the **bracket notation**. For
an associative, binary operator \(\otimes\,,\) like the *AND
operation* for instance, we define the bracket for
indices \(i\) and \(j,\) where \(0 \le i \le j < n\) on
inputs \(x_0, x_1, \ldots, x_{n-1}\) such that

In case of the AND operation, bracket \([i{:}j]\) denotes the conjunction of the inputs \(x_k\) in index range \(i \le k \le j.\) Since brackets refer to contiguous index ranges, we can define the composition of two consecutive brackets \([i{:}k]\) and \([k+1{:}j],\) where \(0 \le i \le k < j < n\) such that

The composition of two consecutive brackets concatenates the index ranges. For example, assuming the operator is the AND operation, given brackets \([0{:}1] = x_0 \cdot x_1\) and \([2{:}3] = x_2 \cdot x_3,\) then the composition denotes the conjunction of all inputs in index range 0 through 3, i.e. \([0{:}1] \cdot [2{:}3] = [0{:}3] = x_0 \cdot x_1 \cdot x_2 \cdot x_3.\) The circuits in Figure 5.37 and Figure 5.38 compose intermediate results such that the bracket composition applies. Output \(f_i = [0{:}i]\) in bracket notation.
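The concatenation property of bracket composition can be verified exhaustively for small \(n,\) here for the AND operation with \(n=4\):

```python
from functools import reduce
from itertools import product
from operator import and_

def bracket(x, i, j):
    """[i:j]: the conjunction of inputs x_i through x_j."""
    return reduce(and_, x[i:j + 1])

# check [i:k] . [k+1:j] == [i:j] for all 4-bit inputs and all index splits
for x in product([0, 1], repeat=4):
    for i in range(4):
        for k in range(i, 4):
            for j in range(k + 1, 4):
                assert bracket(x, i, k) & bracket(x, k + 1, j) == bracket(x, i, j)
print("bracket composition verified for n=4")
```

The same check works for any associative operator \(\otimes\) by replacing `and_` accordingly.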

The bracket notation is convenient for analyzing the Kogge-Stone and
Ladner-Fischer circuits. For each output \(f_i,\) verify that
\(f_i = [0{:}i]\) by means of bracket compositions. The
Kogge-Stone circuit combines all pairs of next-neighbor inputs in the
first level, all pairs of neighbors with distance two in the second
level, all pairs of neighbors with distance four in the third level,
and so on for larger \(n.\) The Ladner-Fischer circuit uses fewer
intermediate results and, therefore, requires fewer AND gates than the
Kogge-Stone circuit. On the other hand, the Ladner-Fischer circuit has
AND gates with a larger fan-out than the Kogge-Stone circuit, which
requires careful *gate sizing* to minimize the
propagation delay.

So far, we have considered three different circuit topologies for
multioutput function \(f,\) the chain circuit, the forest of
trees, and the shared tree circuits of which the Kogge-Stone and
Ladner-Fischer circuits are representative examples. We compare the
quality of these multioutput circuits as a function of the number of
inputs and outputs \(n\) based on their *cost*, which reflects the number of 2-input gates, and the
number of gates on the critical path. The table below lists the
asymptotic behavior of cost and critical path as a function of
\(n,\) neglecting constant factors.

|  | cost | critical path length |
| --- | --- | --- |
| chain | \(n\) | \(n\) |
| forest | \(n^2\) | \(\lg n\) |
| shared trees | \(n \lg n\) | \(\lg n\) |

We find that the chain circuit minimizes the cost, whereas the forest and shared tree circuits minimize the critical path length. Although the shared trees have a smaller cost than the forest without sharing, the cost of the shared trees is a factor of \(\lg n\) larger than the cost of the chain circuit.

The comparison above raises the question whether we can design a
circuit that combines the best of both worlds, a cost proportional to
\(n\) and a critical path length proportional to \(\lg n.\)
This is the case indeed, and a class of circuits with these properties
is known as **prefix circuits** because they perform prefix
computations:

A **prefix computation** is a multioutput function with \(n\) inputs \(x_0, x_1, \ldots, x_{n-1}\) and an associative binary operator \(\otimes\) that produces \(n\) outputs \(y_0, y_1, \ldots, y_{n-1}\) such that

\[\begin{split}y_i = \begin{cases} x_0\,, & \text{if}\ i = 0\,, \\ y_{i-1} \otimes x_i\,, & \text{if}\ 1 \le i < n\,. \end{cases}\end{split}\]

The name *prefix computation* is borrowed from the prefix of a
sequence or a string, considering that \(y_i\) combines the prefix
of the input sequence which starts at index \(0\) and ends at index
\(i.\) Note that prefix computations extend beyond the Boolean
domain. For example, if the \(x_i\) and \(y_i\) are integer
numbers and we use integer addition as associative binary operator,
then the prefix computation is a **prefix sum**:

for \(0 \le i < n.\) Our Boolean multioutput function \(f\)
is a *prefix conjunction* with \(n=8.\)
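For example, the prefix sum of the first eight positive integers can be computed with Python's `accumulate`, which implements exactly the defining recurrence:

```python
from itertools import accumulate

x = [i + 1 for i in range(8)]     # x_i = i + 1
y = list(accumulate(x))           # y_i = y_{i-1} + x_i, with y_0 = x_0
print(y)                          # -> [1, 3, 6, 10, 15, 21, 28, 36]
```

Each \(y_i\) equals the triangular number \((i+1)(i+2)/2,\) as tabulated below.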

The key idea of a prefix circuit is a recursive construction algorithm:

- Combine even pairs of next neighbors of the inputs.
- Recurse on the outputs of step 1.
- Combine odd pairs of the outputs of step 2 and inputs.

We illustrate the recursive construction by means of a prefix sum of \(n=8\) numbers \(x_i = i+1.\) The outputs of the prefix sum are the sums of the first positive integers, \(y_i = \sum_{k=0}^i (k+1) = \sum_{k=1}^{i+1} k = (i+1)(i+2)/2\):

| \(i\) | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| \(x_i\) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
| \(y_i\) | 1 | 3 | 6 | 10 | 15 | 21 | 28 | 36 |

The circuit below illustrates the recursive construction algorithm for the prefix sum with \(n=8.\) Step 1 performs the pairwise addition of next neighbors. Here, even pairs have even indices in the first element of each pair, that is (0,1), (2,3), etc. The recursive step 2 uses the sums of step 1 as inputs of a prefix sum computation with \(n=4.\) We draw the recursion as a black-box, assuming that the outputs are computed as shown. In step 3, we combine odd pairs of outputs of step 2 with inputs to compute the prefix sum.

The total number of additions in steps 1 and 3 is 7 or \(n-1\) in general. Thus, we can express the number of additions \(A(n)\) recursively as \(A(n) = A(n/2) + n-1.\) We halve problem size \(n\) recursively until \(n = 1,\) where no additions are required. Solving this recurrence yields an upper bound for the number of additions \(A(n) \le 2 (n-1).\) This count is proportional to \(n\) and, thus, asymptotically equal to the minimum number of additions required in an equivalent chain circuit. Nevertheless, the prefix circuit requires up to a constant factor of 2 more additions than a chain circuit. Analogously, we find that the number of adders on the critical path is less than \(2 \lg n,\) which is up to a constant factor of 2 more than the minimum of the equivalent forest or shared tree circuits. Thus, beyond the factors of 2, the recursive construction algorithm guarantees a cost proportional to \(n\) and a critical path length proportional to \(\lg n.\)
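The three construction steps translate directly into a recursive sketch. The function below assumes \(n\) is a power of two and counts the operations performed, which confirms the bound \(A(n) \le 2(n-1)\) for \(n=8\):

```python
def prefix(x, op):
    """Recursive prefix circuit; returns (outputs, operator count).
    Assumes len(x) is a power of two."""
    n = len(x)
    if n == 1:
        return list(x), 0
    # step 1: combine even pairs of next neighbors
    pairs = [op(x[2 * i], x[2 * i + 1]) for i in range(n // 2)]
    # step 2: recurse on the pairwise results
    inner, ops = prefix(pairs, op)
    ops += n // 2
    # step 3: odd outputs come from the recursion; even outputs combine
    # a recursion output with one more input
    y = [None] * n
    y[0] = x[0]
    for i in range(1, n):
        if i % 2 == 1:
            y[i] = inner[i // 2]
        else:
            y[i] = op(inner[i // 2 - 1], x[i])
            ops += 1
    return y, ops

y, ops = prefix(list(range(1, 9)), lambda a, b: a + b)
print(y, ops)   # -> [1, 3, 6, 10, 15, 21, 28, 36] 11
```

Steps 1 and 3 together perform \(n-1\) operations per level, reproducing the recurrence \(A(n) = A(n/2) + n - 1;\) for \(n=8\) the count of 11 operators matches the circuit discussed below.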

Figure 5.39 shows the topology of a prefix circuit for \(n=8\) with the expanded recursion. The circuit can be viewed as two back-to-back binary trees, one with the leaves at the inputs and the other with the leaves at the outputs. For a prefix sum replace each \(\otimes\) operator with an adder, and for a prefix conjunction with an AND gate. The prefix circuit with problem size \(n=8\) requires 11 operators, which is less than \(2 (n-1),\) and has a critical path length of 4 operators, which is less than \(2 \lg n.\) The advantages of the prefix circuit become more pronounced for larger values of \(n.\)

## 5.5. Basic Arithmetic Circuits¶

In this section, we introduce combinational circuits for arithmetic operations that can be found in every digital computer. We discuss the design of an adder and a magnitude comparator for unsigned binary numbers.

### 5.5.1. Binary Adder¶

When we add two decimal numbers by paper and pencil, we tabulate the
numbers aligning the least significant digits in the rightmost column.
Then we add the digits from right to left, starting with the least
significant digits, potentially adding a **carry** digit to the more
significant column on the left. The example on the right shows an
**addition chart** for adding decimal numbers 4528 and 937. First, we
add 8 and 7, which is 15. Since number 15 occupies two digits, it
generates a carry of value 1. This is easily seen by expanding 15
into its polynomial representation \(15 = 1 \cdot 10 + 5.\) For
the addition chart to preserve the *positional notation* of the sum, digit 5 is the least
significant sum digit, whereas digit 1 counts tens, and is carried
into the next position, where we need to add the carry to the sum of
digits 2 and 3. The result is 6, which does not generate a carry into
the next position. Equivalently, we may interpret 6 as a two-digit
number 06, and carry the leading zero into the next position.

Since binary numbers and decimal numbers are both instances of a
*positional number system*, adding
two binary numbers by paper and pencil works analogously to decimal
numbers. The example on the right shows the addition chart for binary
numbers 1011 and 11. In fact, the addition of binary numbers is even
simpler than with decimal numbers because there are only four
combinations of two bits compared to one hundred for two decimal
digits. A carry occurs if the sum is larger than 1, which is the case
for the least significant bits in the example on the right. Since
\(1 + 1 = 10_2,\) we add the carry of the least significant
position into the next position to obtain \(1 + 1 + 1 = 11_2.\)

#### Ripple-Carry Adder¶

As a first step towards an adder circuit consider the addition of two 1-bit binary numbers \(a\) and \(b.\) Rather than using an addition chart, we list all four combinations of values for \(a\) and \(b\) in a truth table, and derive the carry and sum bits.

| a | b | carry | sum |
| --- | --- | --- | --- |
| 0 | 0 | 0 | 0 |
| 0 | 1 | 0 | 1 |
| 1 | 0 | 0 | 1 |
| 1 | 1 | 1 | 0 |

Since \(0 + 0 = 0,\) the top row contains 0 bits for both carry and sum. If one of \(a\) or \(b\) is 1 then \(0 + 1 = 1 + 0 = 1\) such that the sum bit is 1 and the carry bit is 0. Only if both \(a\) and \(b\) are 1 is the sum \(1 + 1 = 2_{10} = 10_2,\) that is the carry bit is 1 and the sum bit is 0.
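The carry and sum columns of this truth table equal the AND and the XOR of the two input bits, which a short exhaustive check confirms (Python integers stand in for bits):

```python
def half_adder(a, b):
    """1-bit addition: carry is the AND, sum is the XOR of the inputs."""
    return a & b, a ^ b   # (carry, sum)

for a in (0, 1):
    for b in (0, 1):
        carry, s = half_adder(a, b)
        assert 2 * carry + s == a + b   # the two output bits encode a + b
print("1-bit addition verified")
```

The assertion states precisely why two output bits suffice: carry and sum form the 2-bit binary representation of \(a + b.\)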

A **half adder** is a combinational circuit that implements the
addition of two 1-bit numbers. It has two inputs \(A\) and
\(B,\) and two outputs, one for carry \(C_{out}\) and the other
for sum \(S.\) Thus, the half adder is a *multioutput
function* with the two Boolean functions
defined in the truth table above and algebraic expressions:

The half adder has a carry output but no carry input. If we wish to build an adder for two multibit binary numbers, we need to extend the half adder with a carry input. The truth table for all combinations of input bits \(a,\) \(b,\) and a carry-in bit is:

| a | b | carry-in | carry-out | sum |
| --- | --- | --- | --- | --- |
| 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 0 | 1 |
| 0 | 1 | 0 | 0 | 1 |
| 0 | 1 | 1 | 1 | 0 |
| 1 | 0 | 0 | 0 | 1 |
| 1 | 0 | 1 | 1 | 0 |
| 1 | 1 | 0 | 1 | 0 |
| 1 | 1 | 1 | 1 | 1 |

If all three inputs are 0, then the carry-out and the sum are both 0. If one of the three inputs is 1, then the sum is 1 and the carry-out 0. If two of the three inputs are 1, then the carry-out is 1 and the sum is 0. These three cases are covered by the half adder as well. New is the case in the bottom row where all three inputs are 1. In this case, \(1 + 1 + 1 = 3_{10} = 11_2,\) so that both carry-out and sum assume value 1.

The combinational circuit that implements this truth table is called a
**full adder**. With three inputs, \(A,\) \(B,\) and carry-in
\(C_{in},\) and two outputs for carry-out \(C_{out}\) and sum
\(S,\) the full adder is a *multioutput function*. We know the 3-variable functions defined in
the truth table already. The carry-out is the *majority function* \(M(A,B,C_{in})\) and the sum is the
*odd parity function* \(P(A,B,C_{in})\):
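A behavioral sketch of the full adder, with the majority function for the carry-out and the odd parity function for the sum, verified exhaustively against the truth table:

```python
from itertools import product

def full_adder(a, b, cin):
    """Full adder: carry-out is the majority, sum the odd parity of the inputs."""
    cout = (a & b) | (a & cin) | (b & cin)   # majority M(a, b, cin)
    s = a ^ b ^ cin                          # odd parity P(a, b, cin)
    return cout, s

for a, b, cin in product((0, 1), repeat=3):
    cout, s = full_adder(a, b, cin)
    assert 2 * cout + s == a + b + cin       # outputs encode the 2-bit sum
print("full adder verified")
```

As with the half adder, the assertion captures the arithmetic meaning of the two outputs: they are the 2-bit binary representation of \(a + b + c_{in}.\)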

The full adder is the building block for an adder of two \(n\)-bit
binary numbers. A **carry propagate adder (CPA)** is a combinational
circuit with two \(n\)-bit inputs \(A\) and \(B\) and a
1-bit carry input \(C_{in}\) that computes the \(n\)-bit sum
\(S\) and the carry bit \(C_{out}.\) Before implementing a
CPA, we may wonder whether the \(n+1\) output bits, \(n\) sum
bits plus one carry-out *msb* suffice to represent all sums of two
\(n\)-bit numbers. The answer is affirmative, and important
enough to formulate as a lemma:

**Lemma (CPA bit width).** The sum of two unsigned \(n\)-bit binary numbers plus a carry bit into the least significant position can be represented with \(n+1\) bits.

**Proof.** We show that the largest possible sum of a CPA can be
represented with \(n+1\) bits. The range of an unsigned \(n\)-bit binary number is \([0,
2^n-1],\) and the largest unsigned \(n\)-bit number is
\(2^n-1.\) The largest possible sum of two \(n\)-bit unsigned
numbers and a carry into the *lsb* is the sum of the two largest
\(n\)-bit numbers plus a carry-in of 1:

which is the largest unsigned binary number representable with \(n+1\) bits.
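The arithmetic of the proof can be checked numerically, here for \(n=8\):

```python
n = 8
largest = (2**n - 1) + (2**n - 1) + 1   # two largest n-bit numbers plus carry-in
assert largest == 2**(n + 1) - 1         # the largest (n+1)-bit unsigned number
print(largest)                           # -> 511
```

The identity \(2(2^n - 1) + 1 = 2^{n+1} - 1\) holds for every \(n,\) so the check generalizes beyond \(n=8.\)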

Now that we know that the black-box specification of a CPA fits our
needs for unsigned binary numbers, we mimic the paper-and-pencil
method to implement a CPA with a chain of full adders. The resulting
**ripple carry adder (RCA)** in Figure 5.40 constitutes the
simplest implementation of a CPA. Alternative CPA designs are the
carry-lookahead adder and the prefix adder, which are generally faster
but more complex logic designs.

As a naming convention we use subscript \(i\) to refer to the full
adder in bit position \(i.\) It has inputs \(A_i,\)
\(B_i,\) and carry-in \(C_{i-1},\) and the outputs are sum
\(S_i\) and carry-out \(C_i.\) Carry-out \(C_i\) of the
full adder in position \(i\) drives the carry-in of the full adder
in position \(i+1.\) The boundary cases are carry-in
\(C_{in} = C_{-1}\) and carry-out \(C_{out} = C_{n-1}.\) When
adding two unsigned numbers \(A\) and \(B,\) we enforce
\(C_{in} = 0\) and apply the input signals \(A_i\) and
\(B_i\) for \(0 \le i < n.\) Since the ripple carry adder is
a combinational circuit, the propagation delay is determined by the
critical path, which spans positions 0 to \(n-1\) along the carry
wires. The carry signal propagates, or *ripples*, from *lsb* position
0 through the carry chain to *msb* position \(n-1.\) Therefore,
to the first order, the propagation delay of the adder is proportional
to the number of bits \(n.\) Figure 5.41 shows
on the left a glass-box diagram of the full adder based on a 3-input
majority gate for the carry output and a 3-input XOR or parity gate
for the sum. The 4-bit RCA on the right replicates this full adder
design, and emphasizes the carry chain as the critical path.
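A behavioral sketch of an \(n\)-bit RCA chains one full adder per bit position, with the carry rippling from *lsb* to *msb*; the example adds \(1011_2 + 0011_2\) from the addition chart above:

```python
def full_adder(a, b, cin):
    """Carry-out is the majority, sum the odd parity of the three inputs."""
    return (a & b) | (a & cin) | (b & cin), a ^ b ^ cin   # (carry-out, sum)

def ripple_carry_add(a_bits, b_bits, cin=0):
    """n-bit RCA on bit lists with index 0 = lsb; returns (sum bits, carry-out)."""
    s, c = [], cin
    for a, b in zip(a_bits, b_bits):      # the carry ripples from lsb to msb
        c, si = full_adder(a, b, c)
        s.append(si)
    return s, c

def to_bits(v, n):  return [(v >> i) & 1 for i in range(n)]
def to_int(bits):   return sum(b << i for i, b in enumerate(bits))

s, cout = ripple_carry_add(to_bits(11, 4), to_bits(3, 4))   # 1011 + 0011
print(to_int(s) + (cout << 4))                              # -> 14
```

The sequential Python loop mirrors the critical path: each iteration corresponds to one full adder waiting for the carry of its less significant neighbor.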

The design of an RCA does not end with Figure 5.41.
Instead, one may argue that Figure 5.41 is where the
fun really starts. In the following, we apply the *method of
logical effort* to explore the design space of RCA
circuits. We wish to assess the propagation delay of an \(n\)-bit
RCA as a function of \(n\) by comparing different circuit designs.
We assume that the adder encounters relatively small load capacitances
at its outputs. Otherwise, we may insert inverter stages to drive
larger loads. Our strategy is to design a full adder, and to
replicate this full adder \(n\) times as suggested in
Figure 5.41. The primary design goal is to identify
a combinational circuit that minimizes the delay of the carry chain.
To that end we present three alternative circuit designs. Additional
alternatives, for example circuits based on compound gates or majority
and parity CMOS gates, are left as exercises.

#### RCA with Two-Level Logic¶

As a reference design, we implement the 3-input majority and parity
gates of the full adder using two-level logic. More specifically, we
choose to implement the full adder with NAND gates only. To that end,
we apply the *NAND transform* to the minimal
SOP form of the majority gate:

and the minimal SOP form of the parity gate:

Figure 5.42 shows the corresponding two-level circuits. Both circuits are symmetric in the sense that all paths from any input to the output are 2-stage paths of two NAND gates. However, for the sum we need the inputs in both complemented and uncomplemented form.

The logical efforts of the 2-input (Figure 1.48), 3-input, and 4-input NAND gates are \(g_{nand2} = 4/3,\) \(g_{nand3} = 5/3,\) and \(g_{nand4} = 6/3.\) Assuming path electrical effort \(H_C\) for the carry and \(H_S\) for the sum, and with branching effort \(B=1,\) the path efforts \(F_C\) of the carry circuit and \(F_S\) of the sum circuit in Figure 5.42 are

Since the carry chain is on the critical path of an \(n\)-bit RCA, we inspect the load of the carry output to determine electrical effort \(H_C.\) According to Figure 5.41, carry output \(C_{out}\) of position \(i\) drives one input of the majority gate and one input of the parity gate in position \(i+1.\) Since the parity gate requires the complemented and uncomplemented input signal, we use a 2-fork to generate the complement. Figure 5.43 shows the gate-level circuit of a majority gate in position \(i\) driving the inputs of the 2-fork and the majority gate in position \(i+1.\)

To keep the area requirements of the RCA small, we assume that the stage-1 gates of the subcircuits are matched gates of minimum size, i.e. with scale factor \(\gamma = 1.\) Thus, as annotated in Figure 5.43, the input capacitance of a stage-1 inverter is \(C_{in}(inv) = 3\) units and of a 2-input NAND gate \(C_{in}(nand2) = 4\) units. Since each majority gate input drives two NAND gates, the input capacitance of the majority gate is \(C_{in}(M) = 8\) units. The load capacitance of the majority gate is \(C_L(M) = 2 \cdot C_{in}(inv) + C_{in}(M) = 14\) units. Therefore, the electrical effort of the majority gate is \(H_C = C_L(M)/C_{in}(M) = 7/4\) with path branching effort \(B_C = 2\) due to the input fork. If we size the stage-2 NAND gate of the majority gate for minimum delay according to the method of logical effort, we obtain a minimum delay for the carry output of

\[\hat{D}_C = 2\,\sqrt{G_C B_C H_C} + P_C = 2\,\sqrt{70/9} + 5 \approx 10.58\]

time units, where \(P_C = p_{nand2} + p_{nand3} = 5\) is the path parasitic delay. Thus, an \(n\)-bit RCA with 14 units of load capacitance at its carry output has a delay of \(D_{2\text{-}level} = n\,\hat{D}_C\) or

\[D_{2\text{-}level}(n) \approx 10.58\,n\,.\]
Note that the sum outputs are not on the critical path of the RCA in Figure 5.41, with the exception of sum output \(S_{n-1}.\) However, for large \(n\) the difference of the delays for \(C_{out}\) and \(S_{n-1}\) can be considered negligible.
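The delay arithmetic above is mechanical enough to script. The sketch below recomputes the two-level carry delay with the method of logical effort; the parasitic delays \(p_{nand2} = 2\) and \(p_{nand3} = 3\) are the usual matched-gate values, assumed here because they are not restated in this section:

```python
from math import sqrt

# Logical efforts of matched NAND gates, and the assumed parasitic delays
g_nand2, g_nand3 = 4 / 3, 5 / 3
p_nand2, p_nand3 = 2, 3            # assumption: parasitic delay = number of inputs

G = g_nand2 * g_nand3              # path logical effort through NAND2 then NAND3 (20/9)
B = 2                              # branching effort of the input 2-fork
H = 7 / 4                          # electrical effort: 14 units of load on 8 units of input cap
F = G * B * H                      # path effort, 70/9
P = p_nand2 + p_nand3              # path parasitic delay

D = 2 * sqrt(F) + P                # minimum delay of the optimally sized 2-stage carry path
print(round(D, 2))  # 10.58
```

Varying \(B,\) \(H,\) or the gate choice in this script is a quick way to explore the design space before committing to a transistor-level layout.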

#### RCA with Multilevel Logic¶

As an alternative to the full adder implementation with two-level logic, we now implement the full adder of the RCA with multilevel circuits. We seek inspiration from Boolean algebra, and observe that the inclusive and exclusive OR operations are related through these identities

\[A + B = (A \oplus B) \oplus A\,B\,, \qquad A + B = (A \oplus B) + A\,B\,,\]
that are easily proven by perfect induction. We use the second
identity to prove that we can assemble a full adder from two
*half adders* plus an OR gate as shown in
Figure 5.44 below. Call the outputs of the
stage-1 half adder \(C_1\) and \(S_1.\) Then, the outputs of
the full adder are

\[\begin{eqnarray*} S &=& S_1 \oplus C_{in} \;=\; A \oplus B \oplus C_{in}\,, \\ C_{out} &=& C_1 + S_1\,C_{in} \;=\; A\,B + (A \oplus B)\,C_{in} \;=\; A\,B + A\,C_{in} + B\,C_{in}\,, \end{eqnarray*}\]

which proves that the circuit implements a *full adder*.

Note that the NAND transform applies to \(C_{out},\) if we restrict the transform to those two levels of the 3-level circuit with SOP form, ignoring the XOR gate in HA1. The NAND transform replaces the OR and AND gates in Figure 5.44 with NAND gates. The XOR gates remain unaffected. If we replicate this multilevel design to form an \(n\)-bit RCA, we find that the critical path consists of one pair of half adder and OR gate per position. Figure 5.45 shows the adder design with the HA1 half adders drawn near the inputs to emphasize the carry chain. Since the carry path through a half adder consists of an AND gate only, this design should offer competitive performance to our reference design with two-level logic.
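The half-adder decomposition is easy to confirm by perfect induction. A small Python sketch (function names are ours) checks all eight input combinations against the majority/parity definition of the full adder:

```python
from itertools import product

def half_adder(a, b):
    return a ^ b, a & b                 # (sum, carry)

def full_adder(a, b, cin):
    """Full adder assembled from two half adders plus an OR gate."""
    s1, c1 = half_adder(a, b)           # HA1
    s, c2 = half_adder(s1, cin)         # HA2
    return s, c1 | c2                   # the two carries are never both 1, so OR suffices

# Perfect induction against the parity/majority definition
for a, b, cin in product((0, 1), repeat=3):
    assert full_adder(a, b, cin) == (a ^ b ^ cin,
                                     (a & b) | (a & cin) | (b & cin))
print("full adder verified")
```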

Using the method of logical effort, we can quickly assess the delay by considering the gate-level circuit of the critical carry chain. In Figure 5.46, we have applied the NAND transform, so that in position \(i\) of an \(n\)-bit RCA the carry signal passes through two stages of 2-input NAND gates. The XOR gate for the sum computation branches off the carry path. For small electrical efforts, the evaluation in Figure 4.28 shows that our fastest option is an XOR circuit with a 2-fork to generate the complemented and uncomplemented inputs for the CMOS XOR gate. Therefore, the carry output drives a stage-1 inverter in each leg of the 2-fork, and the NAND gate of the stage-1 half adder.

Assuming that the stage-1 gates of the half adders are matched and have minimum size with \(\gamma = 1,\) each input of the NAND gate has an input capacitance of \(C_{in}(nand2) = 4\) units and each stage-1 inverter of the 2-fork driving an XOR gate has \(C_{in}(inv) = 3\) units of capacitance. Therefore, the branching effort of the carry path is

\[B_C = \frac{C_{in}(nand2) + 2\,C_{in}(inv)}{C_{in}(nand2)} = \frac{4 + 6}{4} = \frac{5}{2}\,.\]
The electrical effort of the carry path is \(H_C = 1,\) because both input capacitance and load capacitance of the carry chain are equal to the input capacitance of one half adder. The logical effort of the carry path through two stages of NAND gates is \(G_C = g_{nand2}^2 = (4/3)^2\). Therefore, the carry path effort is \(F_C = G_C B_C H_C = 40/9,\) and the minimum path delay is

\[\hat{D}_C = 2\,\sqrt{F_C} + P_C = 2\,\sqrt{40/9} + 4 \approx 8.22\]

time units, where \(P_C = 2\,p_{nand2} = 4\) is the path parasitic delay. Although position 0 in Figure 5.45 shows an additional half adder on the critical carry path, for large \(n\) we may approximate the delay of an \(n\)-bit RCA with multilevel circuits reasonably well as \(n\) times \(\hat{D}_C,\) such that

\[D_{multilevel}(n) \approx 8.22\,n\,.\]
We find that the multilevel RCA design is marginally faster than the two-level design, whose delay is \(D_{2\text{-}level}(n) = 10.58\,n.\)

#### RCA with Carry-Propagation Gate¶

If we wish to speed up the carry chain beyond the two-level and
multilevel circuits, our best bet is to consider designing a CMOS
circuit for fast carry propagation. Such a circuit requires abstract
thinking to arrive at a Boolean expression for the carry-out signal of
the full adder. More succinctly, we introduce two intermediate
signals from the truth table of the full adder that enable us to
express \(C_{out}\) such that we obtain a faster circuit. To that
end, consider the truth table of the *full adder*
below, reorganized such that the upper four rows are associated with a
carry-in of 0, and the lower four rows with a carry-in of 1. Because
the majority and parity functions are *symmetric*, the
carry-out and sum columns remain unchanged.

| \(C_{in}\) | \(A\) | \(B\) | \(C_{out}\) | \(S\) | \(G\) | \(K\) |
|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 0 | 0 | 1 | 0 | 1 | 0 | 0 |
| 0 | 1 | 0 | 0 | 1 | 0 | 0 |
| 0 | 1 | 1 | 1 | 0 | 1 | 0 |
| 1 | 0 | 0 | 0 | 1 | 0 | 1 |
| 1 | 0 | 1 | 1 | 0 | 0 | 0 |
| 1 | 1 | 0 | 1 | 0 | 0 | 0 |
| 1 | 1 | 1 | 1 | 1 | 1 | 0 |

We make the following abstract observations about the carry-out signal:

- The carry-out equals 0, independent of the carry-in, if and only if both inputs \(A\) and \(B\) are 0. We say that input combination \(A=B=0\) **kills** the carry-in and outputs a carry-out of 0.
- The carry-out equals 1, independent of the carry-in, if and only if both inputs \(A\) and \(B\) are 1. We say that input combination \(A=B=1\) **generates** a carry-out of 1 independent of the carry-in.
- If exactly one of inputs \(A\) and \(B\) equals 1, then the carry-out equals the carry-in. We say that the full adder **propagates** the carry input to the carry output.

The first two observations motivate the definition of two Boolean functions, the generate signal, defined as

\[G = A\,B\,,\]

and the kill signal

\[K = \overline{A}\;\overline{B}\,.\]
The truth table of the full adder above contains separate columns for the generate and kill functions. We can use these functions to express the carry-out and its complement as follows:

\[C_{out} = G + \overline{K}\,C_{in}\,, \qquad \overline{C_{out}} = \overline{G + \overline{K}\,C_{in}}\,.\]
The carry-out equals 1 if we generate the carry or if we do not kill a
carry-in of value 1. The complemented form corresponds to the
inverting **carry-propagation gate** shown in
Figure 5.47. The pull-up and pull-down networks
of this CMOS gate are not duals of each other. Furthermore, the
carry-propagation gate is *asymmetric*, with
different logical efforts per input, \(g_{cp}(G) = 5/3,\)
\(g_{cp}(\overline{K}) = 4/3,\) and \(g_{cp}(C_{in}) = 2.\)
Although the parasitic delay \(p_{cp} = 3\) is relatively large
for a CMOS gate, the key for a fast RCA is to have just a single
carry-propagation gate per bit position.
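Perfect induction confirms that the generate/kill formulation reproduces the majority function. The sketch below encodes the observations directly (function name is ours); \(G = AB\) and \(K = \overline{A}\,\overline{B}\) follow from the kill and generate conditions above:

```python
from itertools import product

def carry_out(a, b, cin):
    g = a & b                        # generate: A = B = 1
    k = (1 - a) & (1 - b)            # kill: A = B = 0
    return g | ((1 - k) & cin)       # C_out = G + K'*C_in

# Perfect induction over all eight input combinations
for cin, a, b in product((0, 1), repeat=3):
    majority = (a & b) | (a & cin) | (b & cin)
    assert carry_out(a, b, cin) == majority
print("carry expression verified")
```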

If we substitute the inverting carry-propagation gate for the majority
gate in the *full adder*, we complement the carry
output. We indicate the change in the full-adder symbol by placing a
bubble on the carry output. Rather than using an inverter to compute
the uncomplemented carry-out signal, we construct an \(n\)-bit RCA
using the inverting ripple carry chain shown in Figure 5.48.

This design exploits the fact that 3-input majority and parity functions
are *self-dual*, i.e. the Boolean identities

\[\overline{M(A,B,C)} = M(\overline{A},\overline{B},\overline{C}) \qquad \text{and} \qquad \overline{A \oplus B \oplus C} = \overline{A} \oplus \overline{B} \oplus \overline{C}\]
hold. Therefore, rather than adding inverters to the critical carry chain, we can add inverters in every other position of the RCA to the noncritical inputs and sum outputs. Figure 5.49 shows a 4-bit RCA with the inverting carry chain. The inverting carry propagation gates (\(CP\)) and the kill and generate (\(\overline{K} G\)) logic are shown as black boxes.

We approximate the delay of an \(n\)-bit RCA as \(n\) times the delay of the carry propagation gate. We assume that all carry propagation gates are minimum sized, matched gates. Then, the relevant logical effort for the carry chain is \(g_{cp}(C_{in}) = 2\) of the carry input. The load capacitance of the gate is the input capacitance of the carry-input of the carry propagation gate in the next position, \(C_{in}(cp(C_{in})) = 6\) units, plus the input capacitance of the 2-fork for the XOR gate, \(2 C_{in}(inv) = 6\) units, for minimum sized stage-1 inverters in each leg. Therefore, the electrical effort of the carry propagation gate is \(h_{cp} = C_L/C_{in}(cp(C_{in})) = 12/6 = 2.\) The delay of the gate is hence

\[d_{cp} = g_{cp}(C_{in})\,h_{cp} + p_{cp} = 2 \cdot 2 + 3 = 7\]
time units. For large \(n,\) the delay of an RCA with carry propagation gates amounts to approximately

\[D_{cp}(n) \approx 7\,n\,,\]
which is roughly 20% faster than the two-level and multilevel designs. Another delay reduction is possible by reducing the off-path capacitance for the sum outputs. Instead of driving the 2-fork of the XOR gate directly with the carry output, we may insert a minimum sized inverter before the 2-fork, and obtain a gate delay of \(d_{cp} = 6\) time units.
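The stage-delay bookkeeping for the carry-propagation gate is a one-liner with the values quoted above (\(g_{cp}(C_{in}) = 2,\) \(h_{cp} = 2,\) \(p_{cp} = 3\)); the variable names below are ours:

```python
# Stage delay of the inverting carry-propagation gate: d = g*h + p
g_cin = 2        # logical effort of the carry input
h_cp = 12 / 6    # electrical effort: 12 units of load over 6 units of input capacitance
p_cp = 3         # parasitic delay of the gate
d_cp = g_cin * h_cp + p_cp
print(d_cp)  # 7.0 time units per bit position
```

Rerunning the script with a load of \(6 + 3 = 9\) units, i.e. with the minimum sized inverter inserted before the 2-fork, reproduces the 6-time-unit figure quoted above.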

### 5.5.2. Magnitude Comparator¶

A fundamental operation on numbers is their comparison. We have
discussed circuits for the special case of *equality comparison* already. Here, we consider the more general case of
comparing two \(n\)-bit unsigned binary numbers \(A\) and
\(B.\) The four **magnitude comparisons** are \(A < B,\)
\(A \le B,\) \(A > B,\) \(A \ge B.\) We can design
circuits for each of these operations or combinations of them. If we
have a binary adder, we can use the equivalence \(A < B
\Leftrightarrow A - B < 0\) to compute the less-than comparison of
\(A\) and \(B\) by adding the 2’s complement of \(B\) to
\(A\) and inspecting the sign bit, the *msb*, of the sum. If the
sign bit is 1, then the subtraction is negative, i.e. \(A < B.\)
Otherwise, if the sign bit is 0, we conclude that \(A \ge B.\)

We may view two unsigned numbers as bitstrings, and perform the comparison based on their lexicographical order. Given bitstrings \(A = A_{n-1} A_{n-2} \ldots A_0\) and \(B = B_{n-1} B_{n-2} \ldots B_0\) both of length \(n,\) then \(A\) is lexicographically less than \(B\) if there exists an integer \(k,\) \(0 \le k < n,\) such that

\[A_i = B_i \ \ \text{for } k < i \le n-1\,, \qquad \text{and} \qquad A_k < B_k\,.\]
That is, if the prefix of \(A\) equals the prefix of \(B\) up to and excluding index \(k,\) and in position \(k\) we find \(A_k < B_k,\) then \(A < B.\) The lexicographical greater-than order is defined analogously. For example,

because both numbers have prefix \(11\) in bit positions 3 and 2, and for \(k=1\) we have \(A_1 = 0\) is less than \(B_1 = 1.\) Here is an example for the extreme case where \(k = n-1,\) i.e. the numbers have no common prefix:

In the other extreme case the common prefix covers the numbers entirely, and the numbers are equal, for instance:

The lexicographical order suggests combinational circuits for
magnitude comparisons, with a chain structure similar to the
*ripple-carry adder*. In the following, we
discuss three different comparator designs at the logic level. As for
the RCA designs, alternative circuits can be compared in terms of
delay by means of the *method of logical effort*.

#### Downward Comparator Chain¶

The downward comparator chain implements the lexicographical
comparison discussed above, starting at the *msb* down to the *lsb*.
We choose to implement a less-than-or-equal comparator for two
\(n\)-bit unsigned binary numbers \(A\) and \(B\) with output \(Y,\)
such that \(Y = 1\) if \(A \le B,\) and \(Y = 0\) otherwise.
A greater-than-or-equal comparator results in similar logic, whereas a less-than or a greater-than comparator can be built with less logic.

We consider bit position \(i,\) with the goal to design a comparator circuit for bit position \(i\) as a function of inputs \(A_i\) and \(B_i,\) and the comparison results of the next more significant position \(i+1.\) Our comparator generates two outputs. Output \(E_{out} = E_i\) shall be 1 if the prefix of \(A\) equals the prefix of \(B\) up to and including position \(i.\) Output \(L_{out} = L_i\) shall be 1 if the prefix of \(A\) is less than the prefix of \(B\) up to and including position \(i.\) The comparator receives \(E_{in} = E_{i+1}\) and \(L_{in} = L_{i+1}\) as inputs from the comparator in position \(i+1.\) Next, we derive a compact truth table for the comparator.

| \(E_{in}\) | \(L_{in}\) | \(A_i\) | \(B_i\) | \(E_{out}\) | \(L_{out}\) |
|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 1 | 0 |
| 1 | 0 | 0 | 1 | 0 | 1 |
| 1 | 0 | 1 | 0 | 0 | 0 |
| 1 | 0 | 1 | 1 | 1 | 0 |
| 0 | 0 | \(\ast\) | \(\ast\) | 0 | 0 |
| 0 | 1 | \(\ast\) | \(\ast\) | 0 | 1 |

The four top rows of the truth table cover the case where the prefix of \(A\) equals the prefix of \(B\) up to and excluding position \(i.\) Then, the outputs depend on the values of \(A_i\) and \(B_i.\) If \(A_i = B_i,\) the common prefix extends into position \(i,\) and we output \(E_{out} = 1\) and \(L_{out} = 0.\) If \(A_i = 0\) and \(B_i = 1,\) we have \(A_i < B_i,\) and output \(E_{out} = 0\) and \(L_{out} = 1.\) Otherwise, if \(A_i = 1\) and \(B_i = 0,\) we have \(A_i > B_i,\) resulting in \(A > B.\) Therefore, we output \(E_{out} = 0\) and \(L_{out} = 0.\)

Row five of the truth table covers the case where the decision that \(A > B\) is made in one of the more significant bit positions. When the comparator in position \(i\) receives inputs \(E_{in} = L_{in} = 0,\) then \(A > B\) independent of the values of \(A_i\) and \(B_i.\) Therefore, we output \(E_{out} = L_{out} = 0.\) The bottom row of the truth table covers the case where the decision that \(A < B\) is made in one of the more significant bit positions. In this case, input \(E_{in} = 0\) and \(L_{in} = 1.\) Since \(A < B\) independent of the values of \(A_i\) and \(B_i,\) we output \(E_{out} = 0\) and \(L_{out} = 1.\) The compact truth table does not include the case where \(E_{in} = L_{in} = 1,\) because the equality and less-than orderings are exclusive, and cannot occur simultaneously.

The corresponding K-maps below enable us to derive Boolean expressions for the outputs \(E_{out}\) and \(L_{out}\):

\[E_{out} = E_{in}\;\overline{(A_i \oplus B_i)}\,, \qquad L_{out} = L_{in} + E_{in}\,\overline{A_i}\,B_i\,.\]
Given the comparator module for bit position \(i,\) we assemble an \(n\)-bit comparator by composing a chain of these modules. Figure 5.50 shows a 4-bit comparator as an example. We drive a logical 1 into input \(E_n\) and logical 0 into \(L_n\) assuming that an imaginary prefix of leading zeros is equal for both numbers. Output \(Y\) of the comparator shall be 1 if numbers \(A\) and \(B\) are less than or equal. Consequently, we include an OR gate with inputs \(E_0\) and \(L_0\) to compute \(Y.\)

The delay of an \(n\)-bit comparator with a downward chain is proportional to \(n.\) The proportionality constant depends on the specific circuit design for \(E_{out}\) and \(L_{out}.\) Figure 5.51 shows a logic design of the comparator module emphasizing the logic on the critical chain path.

This downward chain has one AND gate on the \(E\)-path and one compound gate on the \(L\)-path. Since \(E_{out}\) drives both gates of the next comparator module, signal \(E_{out}\) has a larger load than \(L_{out}.\) These circuit features deserve our attention when minimizing the propagation delay.
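The compact truth table translates directly into executable form. The Python sketch below (names are ours; bit lists are msb-first) ripples \(E\) and \(L\) downward using a Boolean reading of the table, equality extending the prefix and the first \(A_i < B_i\) deciding the comparison:

```python
def compare_le_downward(a_bits, b_bits):
    """Less-than-or-equal via the downward chain; bit lists are msb-first."""
    e, l = 1, 0                            # E_n = 1, L_n = 0: imaginary equal prefix
    for a, b in zip(a_bits, b_bits):
        # E_out = E_in AND (A_i == B_i); L_out = L_in OR (E_in AND A_i < B_i)
        e, l = e & (1 - (a ^ b)), l | (e & (1 - a) & b)
    return e | l                           # Y = E_0 OR L_0

assert compare_le_downward([1, 1, 0, 1], [1, 1, 1, 0]) == 1  # 13 <= 14
assert compare_le_downward([1, 0, 0, 1], [0, 1, 1, 1]) == 0  # 9 <= 7 is false
assert compare_le_downward([0, 1, 1, 1], [0, 1, 1, 1]) == 1  # equal numbers
print("downward chain verified")
```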

#### Upward Comparator Chain¶

In search for a faster comparator circuit, we may consider reversing
the direction of the chain such that the signals propagate upwards
from *lsb* to *msb*, as in a *ripple-carry chain*. This change requires a redesign of the
comparator logic. We briefly show that this redesign is well worth
the effort for the less-than-or-equal comparison.

As before, we consider bit position \(i,\) now assuming that the chain propagates information from less significant bit position \(i-1\) to position \(i.\) Thus, we compare the suffix of number \(A\) with the suffix of number \(B\) at position \(i\) using the information about the suffix from position 0 up to and including position \(i-1.\) Introduce a new signal \(LE_i,\) and let \(LE_i\) be 1 if the suffix of \(A\) up to and including position \(i\) is less than or equal to the corresponding suffix of \(B.\) Then, the lexicographical order enables us to argue that \(LE_i = 1\) if

\[A_i < B_i\,, \qquad \text{or} \qquad A_i = B_i \ \text{and}\ LE_{i-1} = 1\,.\]

This argument implies that we need just one wire to carry the information of \(LE_{i-1}\) from the comparator in position \(i-1\) to the comparator in position \(i.\) Furthermore, we can formalize the argument directly into a Boolean expression:

\[LE_i = \overline{A_i}\,B_i + \overline{(A_i \oplus B_i)}\,LE_{i-1}\,.\]
Figure 5.52 shows a 4-bit comparator using a single wire between the comparators to propagate \(LE_i\) from position \(i\) to position \(i+1.\) In an \(n\)-bit comparator, output \(Y = LE_{n-1}\) and input \(LE_{-1} = 1.\)

One advantage of the comparator with an upward chain compared to the downward chain is that it saves wires. Whether the upward chain is faster than the downward chain is a matter of concrete circuit design, and is left as exercise.
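The single-wire recurrence is equally easy to check in software. This sketch (names ours; bit lists lsb-first) folds \(LE_i = \overline{A_i}\,B_i + \overline{(A_i \oplus B_i)}\,LE_{i-1}\) over the bits and compares against Python's built-in comparison for all 4-bit pairs:

```python
def compare_le_upward(a_bits, b_bits):
    """Less-than-or-equal via the upward chain; bit lists lsb-first, LE_{-1} = 1."""
    le = 1
    for a, b in zip(a_bits, b_bits):
        le = ((1 - a) & b) | ((1 - (a ^ b)) & le)   # A_i < B_i, or A_i = B_i and LE so far
    return le

def to_bits(x, n=4):
    return [(x >> i) & 1 for i in range(n)]   # lsb-first

assert all(compare_le_upward(to_bits(a), to_bits(b)) == (a <= b)
           for a in range(16) for b in range(16))
print("upward chain verified")
```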

#### Radix-4 Comparator¶

A common trick for speeding up arithmetic circuits is the grouping of bits with the goal to reduce the length of a chain at the expense of more complex group logic. In case of a comparator, we may group pairs of consecutive bits, essentially interpreting the numbers as base-4 or radix-4 numbers. Higher radix groups are possible too. Bit triples, for example, may be viewed as octal radix-8 digits and bit quadruples as hexadecimal radix-16 digits. We illustrate the idea by means of a radix-4 design of the less-than-or-equal comparator with a downward chain.

Recall our design of the 4-bit comparator in Figure 5.50, which we now view as a radix-2 comparator chain. We design the radix-2 comparator for bit position \(i,\) and replicate the radix-2 comparator four times. In a radix-4 design, we consider pairs of bits and design a comparator for 2-bit digits. We win if we can design a circuit for the radix-4 comparator such that its chain delay is less than two times the chain delay of the original radix-2 comparator, because the radix-4 chain requires only half as many comparators as the radix-2 chain. Figure 5.53 shows the block diagram of the radix-4 comparator for two 4-bit numbers.

The logic of the radix-4 comparator requires expressing \(E_{out}\) and \(L_{out}\) as functions of \(A_{2i+1},\) \(A_{2i},\) \(B_{2i+1},\) \(B_{2i},\) \(E_{in},\) and \(L_{in}\):

\[\begin{eqnarray*} E_{out} &=& E_{in}\;\overline{(A_{2i+1} \oplus B_{2i+1})}\;\overline{(A_{2i} \oplus B_{2i})}\,, \\ L_{out} &=& L_{in} + E_{in} \bigl(\overline{A_{2i+1}}\,B_{2i+1} + \overline{(A_{2i+1} \oplus B_{2i+1})}\,\overline{A_{2i}}\,B_{2i}\bigr)\,. \end{eqnarray*}\]
The corresponding circuit diagram in Figure 5.54 arranges the gates such that the critical chain path has the same logic as for the radix-2 design in Figure 5.51. We conclude that the chain delay of our radix-4 comparator module equals the chain delay of the radix-2 comparator. Therefore, for large \(n,\) we expect the \(n\)-bit radix-4 comparator to be approximately twice as fast as the radix-2 comparator.

The \(n\)-bit comparator also permits designs with a radix larger than 4. Every doubling of the group size, i.e. squaring of the radix, halves the number of modules on the chain, and thus roughly halves the chain delay. When the number of modules becomes so small that the delay of the comparator logic in the most significant comparator module can no longer be considered negligible w.r.t. the chain delay, the radix trick has reached its point of diminishing returns. Thus, for any given number of bits \(n,\) there exists a particular radix or group size that minimizes the comparator delay. The method of logical effort enables us to determine the best group size swiftly.
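A behavioral model of the radix-4 chain makes the halved module count concrete; the digit extraction and comparison below are a functional sketch (names ours), not the gate-level logic of Figure 5.54:

```python
def compare_le_radix4(a_bits, b_bits):
    """Radix-4 downward chain; msb-first bit lists, length a multiple of 2."""
    e, l = 1, 0
    for i in range(0, len(a_bits), 2):       # one chain module per 2-bit digit
        a = 2 * a_bits[i] + a_bits[i + 1]    # radix-4 digit of A
        b = 2 * b_bits[i] + b_bits[i + 1]    # radix-4 digit of B
        e, l = e & int(a == b), l | (e & int(a < b))
    return e | l

assert compare_le_radix4([1, 1, 0, 1], [1, 1, 1, 0]) == 1   # 13 <= 14
assert compare_le_radix4([1, 0, 0, 1], [0, 1, 1, 1]) == 0   # 9 <= 7 is false
print("radix-4 chain verified")
```

The loop body runs half as many times as in the radix-2 model, which is exactly the source of the speedup when the per-module delay stays the same.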

Use binary paper-and-pencil arithmetic to compute

- \(37_{10} - 44_{10}\) with 8-bit binary numbers,
- \(11_{10} \cdot 13_{10}\) with 4-bit binary operands.

We know from basic algebra that subtraction \(x - y\) is equal to addition \(x + (-y)\) of negative \(y.\) Therefore, our plan is to use 2’s complement format to represent \(x\) and \(y\) as signed binary numbers, negate \(y,\) and use paper-and-pencil addition. Using signed binary numbers, we need sufficiently many bits so as to include the sign bit. In this exercise, we are given the number of bits as 8.

We begin by converting \(37_{10}\) and \(44_{10}\) from decimal to unsigned binary:

\[\begin{eqnarray*} 37_{10} &=& 1 \cdot 32 + 0 \cdot 16 + 0 \cdot 8 + 1 \cdot 4 + 0 \cdot 2 + 1 \cdot 1 \\ &=& 100101_2 \\ 44_{10} &=& 1 \cdot 32 + 0 \cdot 16 + 1 \cdot 8 + 1 \cdot 4 + 0 \cdot 2 + 0 \cdot 1 \\ &=& 101100_2 \end{eqnarray*}\]

Note that both unsigned binary numbers occupy 6 bits. Before negating \(44_{10}\) by forming the 2’s complement, we zero extend the binary number to 8 bits. Thus, we include one more bit than needed for a sign bit.

\[\begin{eqnarray*} && 0010\,1100 \\ \text{1's complement:} && 1101\,0011 \\ \text{add 1:} && \phantom{0000\,000}1 \\ \text{2's complement:} && 1101\,0100 \end{eqnarray*}\]

We find that \(-44_{10} = 1101\,0100_2\) in 8-bit 2’s complement format. Next we perform the addition of the 8-bit signed binary numbers:

\[\begin{eqnarray*} 37_{10} + (-44_{10}) &=& \phantom{+}\ 0010\,0101 \\ && +\ 1101\,0100 \\ &=& \phantom{+}\ 1111\,1001_2 \end{eqnarray*}\]

We verify the difference by converting the signed binary number into decimal, and check whether it equals the expected difference \(37_{10} - 44_{10} = -7_{10}.\) Since the sign bit (*msb*) of our binary sum is 1, the number is negative, and we form the 2’s complement to determine its magnitude:

\[\begin{eqnarray*} && 1111\,1001 \\ \text{1's complement:} && 0000\,0110 \\ \text{add 1:} && \phantom{0000\,000}1 \\ \text{2's complement:} && 0000\,0111\ =\ 7_{10} \end{eqnarray*}\]

We conclude that the result of our paper-and-pencil subtraction is \(-7_{10},\) which equals the expected result by decimal subtraction.
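The same subtraction can be replayed with machine arithmetic; the masking below (helper name ours) mimics the 8-bit two's complement steps of the worked example:

```python
def sub_twos_complement(x, y, n=8):
    """Compute x - y as x + (-y) in n-bit two's complement; return (bit pattern, value)."""
    mask = (1 << n) - 1
    neg_y = (~y + 1) & mask        # negate y: one's complement, then add 1
    total = (x + neg_y) & mask     # n-bit addition; any carry out of the msb is discarded
    value = total - (1 << n) if total >> (n - 1) else total  # sign bit decides the weight
    return format(total, '0{}b'.format(n)), value

print(sub_twos_complement(37, 44))  # ('11111001', -7)
```

The returned bit pattern matches the paper-and-pencil sum \(1111\,1001_2\) above.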

We multiply two binary numbers with paper-and-pencil just as we multiply two decimal numbers: form the partial products and add. Recall the paper-and-pencil multiplication with decimal arithmetic:

Multiply multiplicand \(11\) with each digit of multiplier \(13,\) and right align the partial products with the multiplier digits. Then, add the partial products.

For paper-and-pencil binary multiplication, we convert the decimal numbers into unsigned binary format:

\[\begin{eqnarray*} 11_{10} &=& 1011_2 \\ 13_{10} &=& 1101_2\,. \end{eqnarray*}\]

Both numbers are 4-bit numbers, as the problem requests. The binary multiplication chart is:

To check the result we convert the unsigned binary product into decimal format:

\[\begin{eqnarray*} 1000\,1111_2 &=& 2^7 + 2^3 + 2^2 + 2^1 + 2^0 \\ &=& 143_{10}\,, \end{eqnarray*}\]

which matches the expected product of the decimal multiplication.
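The multiplication chart corresponds to the classic shift-and-add loop; the following sketch (function name ours) forms one shifted partial product per multiplier bit:

```python
def multiply_shift_add(a, b, n=4):
    """Binary paper-and-pencil multiplication: one shifted partial product per multiplier bit."""
    product = 0
    for i in range(n):
        if (b >> i) & 1:           # multiplier bit i is 1: take the multiplicand...
            product += a << i      # ...shifted left by i positions, like the chart rows
    return product

p = multiply_shift_add(11, 13)
print(p, format(p, '08b'))  # 143 10001111
```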

The paper-and-pencil method for subtraction with borrowing is
admired in the US for its simplicity, where it is known as
**Austrian subtraction**:

Subtraction \(294_{10} - 154_{10}\) does not require any borrowing, whereas \(8205_{10} - 4696_{10}\) does. If the difference in a digit position is negative, such as \(5 - 6\) in the least significant digit, we borrow \(10\) from the left, and remember the fact by writing borrow digit \(1\) between the digits. Now, we add the borrowed 10 to the minuend digit and subtract again, here \((10 + 5) - 6 = 9.\) Then, in the subtraction of the next position to the left, we make up for the borrowed \(10\) by adding borrow digit \(1\) to the subtrahend, here forming subtraction \(0 - (9 + 1).\) Since the difference is negative, we borrow another \(10\) from the left, and so on.

Apply Austrian subtraction to unsigned binary numbers:

- \(101_2 - 100_2,\)
- \(10010_2 - 01011_2.\)

We apply the Austrian subtraction to unsigned binary numbers \(101_2 - 100_2.\)

The paper-and-pencil chart is shown on the right. We begin with the least significant bits: \(1 - 0 = 1\) yields a positive 1. The next position \(0 - 0 = 0\) yields a 0, and the most significant position \(1 - 1 = 0\) yields a 0 as well. Since none of the differences is negative, no borrowing is required. We check the result by converting the operands to decimal, and compare the decimal difference with the binary result. We find the operands \(101_2 = 5_{10}\) and \(100_2 = 4_{10}.\) The difference computed in decimal is \(5_{10} - 4_{10} = 1_{10},\) which indeed matches our binary result \(001_2 = 1_{10}.\)

We apply the Austrian subtraction to unsigned binary numbers \(10010_2 - 01011_2.\)

This example requires borrowing. The difference of the least significant bits \(0 - 1 = -1\) is negative. Hence, we borrow a \(2\) from the next position, write borrow \(1\) between the bits, and redo the subtraction adding the borrowed \(2\) to the minuend: \((2 + 0) - 1 = 1.\) Note that we do the arithmetic in decimal for convenience, although the result is a binary digit. When designing a digital subtractor circuit, we would implement the subtraction in binary format using two bits, and place the borrow bit in the second (*msb*) position of the minuend: \((10_2 + 0) - 1 = 1.\)

In the second position, we make up for the borrowed \(2\) by adding borrow bit \(1\) to the subtrahend. The resulting subtraction becomes \(1 - (1+1) = -1.\) Since the result is negative, we borrow another \(2\) from the third position, and redo the subtraction: \((2+1) - (1+1) = 1\) yields bit \(1\) in the second position.

In the third position, we include the borrowed \(2\) in the subtrahend: \(0 - (0 + 1) = -1.\) This negative result requires borrowing yet another \(2\) from the fourth position. We redo the subtraction adding the borrowed \(2\) to the minuend: \((2 + 0) - (0 + 1) = 1\) yields bit \(1\) in the third position. We need another borrow bit for the fourth position, whereas the fifth position does not.

We check the result using decimal arithmetic. The operands are \(10010_2 = 18_{10}\) and \(01011_2 = 11_{10}.\) The difference is \(18_{10}-11_{10}=7_{10},\) which is equal to the binary difference \(111_2.\)
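The borrow bookkeeping of the Austrian method can be traced in code; this sketch (lsb-first bit lists, valid when the minuend is at least the subtrahend; names ours) adds the borrow to the subtrahend exactly as in the worked example:

```python
def borrow_subtract(a_bits, b_bits):
    """Austrian-style subtraction on lsb-first bit lists; assumes minuend >= subtrahend."""
    borrow, out = 0, []
    for a, b in zip(a_bits, b_bits):
        d = a - (b + borrow)       # the borrow is repaid by adding it to the subtrahend
        if d < 0:
            d += 2                 # borrow a 2 from the next position
            borrow = 1
        else:
            borrow = 0
        out.append(d)
    return out

# 10010_2 - 01011_2 = 00111_2 (18 - 11 = 7); bit lists are written lsb-first
assert borrow_subtract([0, 1, 0, 0, 1], [1, 1, 0, 1, 0]) == [1, 1, 1, 0, 0]
print("borrow subtraction verified")
```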

An \(n\)-bit comparator computes \(A < B\) of two \(n\)-bit numbers \(A\) and \(B\) by subtracting \(A-B\) and outputting the sign bit.

Implement these magnitude comparisons using the \(n\)-bit comparator circuit:

- \(A > B,\)
- \(A \ge B,\)
- \(A \le B.\)

Since \(A > B = B < A,\) we implement the greater-than comparison with a less-than comparator by exchanging the inputs.

Note that \(A \ge B = \overline{A < B}.\) Therefore, we implement the greater-than-or-equal comparison with a less-than comparator and an inverter.

Since \(A \le B = \overline{A > B},\) we implement the less-than-or-equal comparison using the less-than comparator with swapped inputs and an inverter to complement the output.

We study the connection between bit counting and addition.

- We wish to count the number of 1-bits in a 2-bit binary number \(A = a_1 a_0,\) and output the number of 1-bits as an unsigned binary number. Design a 1-bit counter using half adders as building blocks.
- We wish to count the number of 1-bits in a 3-bit binary number \(A = a_2 a_1 a_0,\) and output the number of 1-bits as an unsigned binary number. Design a 1-bit counter using full adders as building blocks.
- Design a full adder using half adders as building blocks.

We begin by formalizing the functionality of the 1-bit counter using a truth table. We are given 2-bit input \(A = a_1 a_0,\) and wish to count the number of bits with value 1. Our truth table lists the four input combinations, and for each input combination we count the number of 1-bits. We find that the number of 1-bits is in range \([0,2].\)

| \(a_1\) | \(a_0\) | \(\text{# 1-bits}\) | \(y_1\) | \(y_0\) |
|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 |
| 0 | 1 | 1 | 0 | 1 |
| 1 | 0 | 1 | 0 | 1 |
| 1 | 1 | 2 | 1 | 0 |

We wish to output the number of 1-bits as an unsigned binary number. Since the range of the number of 1-bits is \([0,2],\) we need two bits to encode the output. Denote output \(Y = y_1 y_0,\) then our 1-bit counter has input \(A\) and output \(Y\) as shown in the black box diagram on the right. We include columns for the binary encoding of the number of 1-bits, \(y_1\) and \(y_0,\) in the truth table.

Now, notice that our truth table specifies a *half adder* if we interpret \(y_1\) as carry and \(y_0\) as sum. Therefore, we can implement the 1-bit counter simply with a half adder. Even more insightful is the observation that a 1-bit counter for 2-bit number \(A = a_1 a_0\) is equivalent to an adder of two 1-bit numbers, \(a_0 + a_1.\)

We extend our insight that a 1-bit counter for 2-bit numbers is an adder for two 1-bit numbers to 3-bit numbers. We hypothesize that a 1-bit counter for 3-bit numbers is an adder of three 1-bit numbers. Since three bits can have a minimum of zero 1-bits and a maximum of three 1-bits, the number of 1-bits must be in range \([0,3],\) which we can encode with two bits. Thus, our 1-bit counter module must have the black box specification shown on the right.

To specify the 1-bit counter function, we derive a truth table with three input bits \(A = a_2 a_1 a_0,\) the number of 1-bits in decimal and in binary representation \(Y = y_1 y_0.\)

| \(a_2\) | \(a_1\) | \(a_0\) | \(\text{# 1-bits}\) | \(y_1\) | \(y_0\) |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 |
| 0 | 0 | 1 | 1 | 0 | 1 |
| 0 | 1 | 0 | 1 | 0 | 1 |
| 0 | 1 | 1 | 2 | 1 | 0 |
| 1 | 0 | 0 | 1 | 0 | 1 |
| 1 | 0 | 1 | 2 | 1 | 0 |
| 1 | 1 | 0 | 2 | 1 | 0 |
| 1 | 1 | 1 | 3 | 1 | 1 |

Compare this truth table with that of the *full adder*. If we interpret \(y_1\) as carry-out and \(y_0\) as sum bit, our 1-bit counter for 3-bit input \(A\) and the full adder truth tables specify the same function. Therefore, we can implement the 1-bit counter for three inputs using nothing but a full adder. Furthermore, we conclude that our hypothesis is true, i.e. a 1-bit counter for 3-bit number \(A = a_2 a_1 a_0\) is equivalent to an adder for three 1-bit numbers, \(a_0 + a_1 + a_2.\) The corresponding paper-and-pencil addition is illustrated on the right.

We design a full adder from half adders. From the perspective of bit counting, a full adder counts the number of 1-bits in three inputs, whereas a half adder counts the number of 1-bits in two inputs. Thus, we should be able to use two half adders to count the 1-bits in two of the three inputs, and combine the result with the 1-bit of the third input. From the perspective of adding three 1-bit numbers, we may view this composition as a parenthesization: \(a_0 + a_1 + a_2 = (a_0 + (a_1 + a_2)).\) Circuit (a) below uses one half adder to count the 1-bits in 2-bit number \(a_2 a_1,\) and the second half adder to count the 1-bits in 2-bit number \(s_0 a_0.\) This circuit is incomplete, because it does not incorporate the carry outputs of the half adders. Nevertheless, it counts correctly if the number of 1-bits is in range \([0,1],\) which we can represent in a single bit \(y_0.\) If all bits are 0, i.e. \(a_0 = a_1 = a_2 = 0,\) then \(s_0 = 0\) and \(s_1 = y_0 = 0.\) If \(a_0 = 1\) and \(a_1 = a_2 = 0,\) then \(s_0 = 0\) and \(s_1 = y_0 = 1\) as expected. Circuit (a) handles two more cases: if \(a_0 = 0\) and one of \(a_1\) or \(a_2\) equals 1, then \(s_0 = 1\) and \(s_1 = y_0 = 1.\)

Circuit (a) does not count two or three 1-bits. For example, if \(a_0 = 0\) and \(a_1 = a_2 = 1,\) then \(s_0 = s_1 = y_0 = 0.\) We need to incorporate output \(c_0,\) because it carries the information that both \(a_1\) and \(a_2\) are 1. Since value 2 is the weight of the second bit position in a binary number, carry output \(c_0\) contributes to bit \(y_1\) of binary count \(Y = y_1 y_0.\) Also, if \(s_0 = 1\) because one of \(a_1\) or \(a_2\) is 1, and \(a_0 = 1,\) then the 1-bit count is two and \(c_1 = 1.\) Hence, carry output \(c_1\) should contribute to \(y_1\) as well. Circuit (b) shows a solution that uses an OR gate to combine carry outputs \(c_0\) and \(c_1\) into output \(y_1.\) This circuit works because if \(c_0 = 1,\) i.e. both \(a_1\) and \(a_2\) are 1 and the 1-bit count is 2, then \(y_1 = 1\) independent of \(a_0.\) Furthermore, if one of \(a_1\) or \(a_2\) is 1, i.e. their 1-bit count is 1, and \(a_0 = 1,\) then \(y_1 = 1.\) For all other input combinations \(y_1 = 0,\) because the 1-bit count is less than two.

Circuit (c) replaces the OR gate of circuit (b) with a third half adder to produce the requested implementation of a full adder based on half adders. Carry output \(c_2\) remains unused. The sum of the *half adder* is the XOR rather than the OR of \(c_0\) and \(c_1,\) however. Using an XOR instead of the OR gate in circuit (b) does not affect the 1-bit count, if we notice that input combination \(c_0 = c_1 = 1\) cannot occur, because it would require four 1-bits in three inputs.

Circuits (b) and (c) are both implementations of the *full adder*. The abstract perspective of bit counting serves as an aid for the design of these circuits. However, for a rigorous proof of equivalence, we resort to a perfect induction or Boolean algebra.
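The perfect induction mentioned above can be mechanized. The following Python sketch (function names are ours, not from the text) builds the full adder from three half adders as in circuit (c) and checks all eight input combinations:

```python
def half_adder(a, b):
    """Half adder: sum and carry of two 1-bit inputs."""
    return a ^ b, a & b  # (sum, carry)

def full_adder(a0, a1, a2):
    """Full adder built from three half adders, as in circuit (c)."""
    s0, c0 = half_adder(a1, a2)   # first half adder counts a1 + a2
    y0, c1 = half_adder(s0, a0)   # second half adder adds a0
    y1, c2 = half_adder(c0, c1)   # third half adder combines the carries
    assert c2 == 0                # c0 = c1 = 1 cannot occur
    return y1, y0

# Perfect induction: binary count y1 y0 equals the 1-bit count of the inputs.
for a0 in (0, 1):
    for a1 in (0, 1):
        for a2 in (0, 1):
            y1, y0 = full_adder(a0, a1, a2)
            assert 2 * y1 + y0 == a0 + a1 + a2
```

The assertion on \(c_2\) confirms the observation that the carry of the third half adder stays 0 for every input combination.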

We investigate a modification of the ripple carry adder in Figure 5.41 with an inverting carry chain:

- Design a CMOS inverting majority gate with inputs \(A,\) \(B,\) \(C,\) and output \(Y\):

\[Y = \overline{A B + A C + B C}\,.\]

Estimate the delay of an \(n\)-bit RCA with inverting carry chain. The circuit fragment below shows two stages of the inverting carry chain.

For comparison, the RCA delay of the *non-inverting carry chain with a NAND-NAND mapping of the majority gate* is \(D_{2\text{-}level}(n) \approx 10.58 n\) and for the *fast carry chain with carry-propagation gates* \(D_{cp}(n) \approx 7 n.\)

The inverting majority gate is discussed in *Section Majority Gate*. In the symmetric majority gate all inputs have logical effort \(g_{Msym} = 4\) and the parasitic delay is \(p_{Msym} = 6\) capacitive units. The asymmetric majority gate has one fast input with logical effort \(g_{Masym} = 2\) and the parasitic delay is \(p_{Masym} = 4.\)

In an \(n\)-bit RCA each bit position contributes a single stage, one inverting majority gate, to the carry chain. To determine the delay of one stage in the chain, we consider bit position \(i.\) The load of carry output \(C_{out}\) in bit position \(i\) is the input capacitance of bit position \(i+1.\) Assuming that all gates at the input of a stage are minimum sized matched gates, the load capacitance is the sum of the input capacitances of the inverters in the 2-fork and of the majority gate:

\[C_L = 2 C_{inv} + C_{in}(M)\,.\]

If we use a symmetric majority gate, we have input capacitance \(C_{in}(M) = C_{in}(M_{sym}) = 12,\) and if we use the fast input of the asymmetric majority gate for the carry input, then the input capacitance is \(C_{in}(M_{asym}) = 6.\) Thus, the load capacitance using the symmetric majority gate is \(C_{Lsym} = 2 \cdot 3 + 12 = 18\) and with the asymmetric majority gate \(C_{Lasym} = 2 \cdot 3 + 6 = 12\) capacitive units. The delay of the symmetric inverting majority gate in the carry chain is

\[d_{Msym} = g_{Msym} \frac{C_{Lsym}}{C_{in}(M_{sym})} + p_{Msym} = 4 \frac{18}{12} + 6 = 12\]

time units, and the delay of the asymmetric inverting majority gate only

\[d_{Masym} = g_{Masym} \frac{C_{Lasym}}{C_{in}(M_{asym})} + p_{Masym} = 2 \frac{12}{6} + 4 = 8\]

time units. For large \(n,\) the propagation delay of the RCA is approximately the delay of the carry chain, here

\[D_{Msym}(n) \approx 12\,n\,,\qquad D_{Masym}(n) \approx 8\,n\]

for our two choices of symmetric and asymmetric inverting majority gates. We conclude that the asymmetric inverting majority gate provides a competitive alternative for an RCA design that is almost as fast as the RCA with carry-propagation gates and a delay of \(D_{cp}(n) \approx 7 n.\)
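The stage-delay arithmetic above can be double-checked with a few lines of Python (a sketch using the stated parameters, with minimum-sized inverters of input capacitance \(C_{inv} = 3\)):

```python
def stage_delay(g, c_load, c_in, p):
    """Logical-effort gate delay: effort delay g * h plus parasitic delay p."""
    return g * c_load / c_in + p

C_inv = 3
C_L_sym = 2 * C_inv + 12    # 2-fork inverters plus symmetric majority gate
C_L_asym = 2 * C_inv + 6    # 2-fork inverters plus fast asymmetric input

d_sym = stage_delay(4, C_L_sym, 12, 6)    # 12 time units per stage
d_asym = stage_delay(2, C_L_asym, 6, 4)   # 8 time units per stage
```

Multiplying the per-stage delays by \(n\) reproduces the RCA delay estimates \(D_{Msym}(n) \approx 12\,n\) and \(D_{Masym}(n) \approx 8\,n.\)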

## 5.6. Timing Analysis¶

A typical task for combinational circuit designers is to derive a
circuit with minimum delay for a given functional specification. We
accomplish this goal by means of the *method of logical effort*, because it enables us to analyze the delay of a
circuit, and to understand and fix potential shortcomings within one
or more design iterations. However, the timing behavior of a
combinational circuit is usually more complex than the single delay
number produced by the method of logical effort suggests. In the
following, we characterize the basic effects that cause combinational
circuits to exhibit a rather complex timing behavior.

### 5.6.1. Timing Diagrams¶

Consider the buffer circuit in Figure 5.55. Assume that both inverters have the same size, and the input capacitances are equal to the load capacitance. Since the electrical efforts of the inverters are equal, they have equal delays. If you build such a circuit, and measure the voltages of signals \(A,\) \(B,\) and \(Y\) with an oscilloscope, you may obtain an analog timing diagram, as shown on the left in Figure 5.55. The voltages transition between 0 and \(V_{DD}\) within a finite amount of time. If we could apply a step function as input signal \(A,\) the transitions would have the exponential response of Figure 1.38. However, the step function is merely a convenient idealization. In reality, signals cannot transition in zero time. Therefore, analog signals only resemble the shape of an exponential step response.

The finite slope of the analog voltage transitions complicates a
measurement of the buffer delay. It forces us to pick a particular
point in time for a transition. A convenient point in time is where
the voltage equals 50% of the maximum voltage, here \(V_{DD}/2.\)
We have marked these transition times in Figure 5.55 as \(t_0, t_1, \ldots, t_5.\) In our ideal model for the
exponential step response of Figure 1.38, the 50% crossing
occurs where \(e^{-t/RC} = 1/2,\) or \(t = RC \ln 2 =
0.69\,RC.\) The propagation delay of the buffer is the time difference
between the 50% crossing of the input transition and the corresponding
50% crossing of the output transition. In Figure 5.55, we find propagation delay \(t_{pd}(buf) = t_2 - t_0\)
for the buffer. If the rising and falling transitions are symmetric,
then the propagation delay of the buffer is also equal to
\(t_{pd}(buf) = t_5 - t_3.\) We can measure these delays, and
determine the technology specific time constant \(\tau\) of the
*model of logical effort* experimentally.

An immediate consequence of characterizing all transitions by their
50% crossing points is that the corresponding gate delays add up to
path delays. For example, buffer delay \(t_{pd}(buf) = t_2 -
t_0\) in Figure 5.55 is the propagation delay of
the stage-1 inverter \(t_{pd}(inv_1) = t_1 - t_0\) plus the
propagation delay of the stage-2 inverter \(t_{pd}(inv_2) = t_2 -
t_1.\) The gate delays of a path form a *telescoping sum*.

The *digital abstraction* ignores the details of the analog voltages,
in particular the actual voltage value of \(V_{DD}\) and the
finite slope of transitions. Instead, we approximate the analog
signals with digital step transitions between Boolean values 0 and 1.
As shown in Figure 5.55 on the right, we assume
that the ideal transitions occur at the 50% crossings of the actual
transitions. Then, all gate and path delays are reflected properly in
the digital timing diagram.

If we are interested primarily in the signal delays of a digital circuit, but not in their concrete Boolean values, then we draw the digital timing diagram as shown in Figure 5.56. This variant of a timing diagram displays both complemented and uncomplemented signal values. Transitions occur at their crossing points.

### 5.6.2. Path Delays¶

An inverter with a given size and load has a particular propagation
delay from input to output. Likewise, symmetric CMOS gates with
multiple inputs have equal propagation delays from each input to the
output. In contrast, *asymmetric gates* do not
have equal propagation delays from each input to the output, and
neither do most circuits with multiple inputs and *multiple
stages*. As a concrete example, consider
4-variable function \(Y = \overline{A}\,C + \overline{B}\,C +
\overline{D},\) that we can implement as a 3-stage NAND chain, shown in
Figure 5.57.

Using the method of logical effort, we can minimize the delay of the critical path from inputs \(A\) or \(B\) to output \(Y.\) Assuming that the stage-1 NAND gate has input capacitance \(C_{in} = 4,\) the path electrical effort is \(H = C_L/C_{in} = 64.\) With path logical effort \(G = (4/3)^3\) and branching effort \(B=1,\) the path effort is \(F = (16/3)^3.\) We obtain a minimum path delay of \(\hat{D} = 3 F^{1/3} + 3 p_{nand2} = 22\) time units if each stage bears effort \(\hat{f} = F^{1/3} = 16/3.\) Then, each NAND gate incurs a delay of \(\hat{d} = \hat{f} + p_{nand2} = 22/3\) time units. If we account for the technology specific time constant \(\tau,\) we obtain propagation delay \(t_{pd}(nand2) = \tau \hat{d}.\)
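The path-effort computation above is easy to reproduce numerically. In this sketch, the parasitic delay \(p_{nand2} = 2\) is implied by the stated result \(\hat{D} = 3 F^{1/3} + 3 p_{nand2} = 22\):

```python
p_nand2 = 2             # parasitic delay of a NAND2, implied by D_hat = 22
G = (4 / 3) ** 3        # path logical effort of three NAND2 gates
B = 1                   # branching effort
H = 64                  # path electrical effort C_L / C_in
F = G * B * H           # path effort, (16/3)^3

f_hat = F ** (1 / 3)              # optimal stage effort, 16/3
D_hat = 3 * f_hat + 3 * p_nand2   # minimum path delay, 22 time units
d_hat = f_hat + p_nand2           # per-stage delay, 22/3 time units
```

Scaling by the technology-specific time constant \(\tau\) converts these normalized delays into physical propagation delays.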

Minimum path delay \(\hat{D}\) of the circuit is a worst-case delay: it is obtained by minimizing the delay of the longest path, the critical path of the circuit. Actually, the circuit in Figure 5.57 has two critical paths with delay \(t_{A\rightarrow Y} = t_{B\rightarrow Y} = \hat{D} \tau = 3 \tau \hat{d}.\) Other paths of the circuit are shorter, and have smaller delays than \(\hat{D}.\) The path from input \(C\) to \(Y\) traverses two NAND gates, and has a delay of \(t_{C\rightarrow Y} = 2 \tau \hat{d}.\) The path from input \(D\) to \(Y\) is even shorter. It traverses only one NAND gate with a delay of \(t_{D\rightarrow Y} = \tau \hat{d}.\) In general, a circuit has one or more shortest paths with the smallest delay, one or more longest paths with the largest delay, and all other paths have delays between the smallest and largest delays.

We characterize the timing behavior of a combinational circuit with the smallest and largest delays.

The **contamination delay** \(t_{cd}\) is the minimum delay from any input to an output of a circuit.

The **propagation delay** \(t_{pd}\) is the maximum delay from any input to an output of a circuit.

The propagation delay is the worst-case delay that we minimize by means of the method of logical effort. Together the contamination and propagation delays bound the delay from any input to an output of a circuit. For example, the circuit in Figure 5.57 has \(t_{cd} = \tau \hat{d}\) and \(t_{pd} = 3 \tau \hat{d}.\) The delay of path \(C \rightarrow Y\) lies within this range, \(t_{cd} < 2 \tau \hat{d} < t_{pd}.\)

To observe a path delay in a circuit, we *stimulate* the path by means
of an input transition at time \(t_0\) and measure time
\(t_1\) when the output transition occurs. Then, the path delay
is the difference \(t_1 - t_0.\) Figure 5.58
illustrates the path delays in the timing diagram for the critical
path and the shortest path of the 3-stage NAND chain. We define an
initial state \(Y(A,B,C,D) = Y(0,1,1,1) = 1,\) apply the inputs
and assume that the circuit has enough time, i.e. more than its
propagation delay, to stabilize output \(Y = 1,\) see
Figure 5.58(a).

Stimulating the critical path of the NAND chain requires a transition on one of the inputs \(A\) or \(B.\) If we change input \(B\) from 1 to 0, then output \(W\) remains unchanged. This transition does not trigger a change at output \(Y.\) However, changing \(A\) from 0 to 1 causes \(W\) to transition from 1 to 0, which causes \(X\) to transition from 0 to 1, which in turn causes output \(Y\) to transition from 1 to 0. The sequence of transitions is shown in Figure 5.58(b) and in the timing diagram. We stimulate the transition of \(A\) at time \(t_0.\) After a delay of \(\tau \hat{d},\) i.e. at time \(t_1 = t_0 + \tau \hat{d},\) node \(W\) transitions from 1 to 0. Output \(Y\) transitions after the propagation delay of the circuit at time \(t_3 = t_0 + t_{pd} = t_0 + 3 \tau \hat{d}.\)

Figure 5.58(c) illustrates the transitions that enable us to observe the contamination delay of the 3-stage NAND chain. We stimulate the shortest path by changing input \(D\) from 1 to 0 at time \(t_4 > t_3.\) This causes output \(Y\) to transition from 0 to 1 after the delay of the stage-3 NAND gate, \(\tau \hat{d}.\) Therefore \(t_5 - t_4 = \tau \hat{d},\) which equals contamination delay \(t_{cd}\) of the circuit.

When we design larger circuits, we modularize smaller circuits. We draw the module as a black box and provide the functional and timing specifications. For example, to modularize the 3-stage NAND circuit, we may define a 4-bit input bus \(A,\) as shown in Figure 5.59 on the left, and specify its function \(Y = \overline{A}_0 A_2 + \overline{A}_1 A_2 + \overline{A}_3.\) Furthermore, we summarize the timing behavior by specifying the contamination delay and propagation delay of the circuit. The graphical version of the timing specification is shown in form of a timing diagram in Figure 5.59 on the right. A transition of input bus \(A\) causes output \(Y\) to transition after a delay in range \([t_{cd}, t_{pd}].\) We mark the signal of \(Y\) in this time range with a zig-zag pattern, indicating that the actual value is unknown or unstable during this period of time. The essential information provided by this diagram is (1) the output does not change until delay \(t_{cd}\) after an input transition, and (2) the output is stable beyond delay \(t_{pd}\) after an input transition.

### 5.6.3. Algorithmic Timing Analysis¶

Assume we have designed a combinational circuit and know the delay of each gate and the contamination and propagation delays of all subcircuits. We wish to determine the timing specification of the combinational circuit, i.e. its contamination delay and its propagation delay.

The following algorithm determines the propagation delay of an acyclic combinational circuit.

Algorithm (Propagation Delay)

1. Initialize the arrival times of all terminal inputs with 0.

2. For each circuit element, if the arrival times \(t_a(A_i)\) of all inputs \(A_i\) are determined, set the output times of all outputs \(Y_j\) to

\[\max_i(t_a(A_i)) + t_{pd}(Y_j)\,.\]

3. The maximum output time of all terminal outputs is the propagation delay of the circuit.

We illustrate the algorithm by means of the 3-stage NAND circuit in Figure 5.57. We assume that each NAND gate has a delay of \(\tau \hat{d} = 50\,\mathit{ps}.\) Since the NAND gate is symmetric, all paths from each input to the output have equal delay. Therefore, the contamination delay of the NAND gate equals its propagation delay, such that \(t_{cd}(nand2) = t_{pd}(nand2) = 50\,\mathit{ps}.\) We initialize the arrival time of terminal inputs \(t_a(A) = t_a(B) = t_a(C) = t_a(D) = 0\,\mathit{ps}.\) Figure 5.60(a) shows the annotated arrival times.

Next, according to step 2 of the algorithm, we update the output times of those NAND gates for which the arrival times of all inputs are known. This is the case for the stage-1 NAND gate only. Its input arrival times are \(t_a(A) = t_a(B) = 0\,\mathit{ps}.\) Therefore, the output time is

\[t_a(W) = \max(t_a(A), t_a(B)) + t_{pd}(nand2) = \max(0, 0) + 50 = 50\,\mathit{ps}\,,\]

as shown in Figure 5.60(b). Since output \(W\) of the stage-1 NAND gate is the input of the stage-2 NAND gate, we know the arrival times of all inputs of the stage-2 NAND gate, \(t_a(W) = 50\,\mathit{ps}\) and \(t_a(C) = 0\,\mathit{ps}.\) Therefore, we can update its output time to

\[t_a(X) = \max(t_a(W), t_a(C)) + t_{pd}(nand2) = \max(50, 0) + 50 = 100\,\mathit{ps}\,,\]

as shown in Figure 5.60(c). Now, we know the arrival time of nodes \(X\) and \(D,\) so that we can update the output time of the stage-3 NAND gate to

\[t_a(Y) = \max(t_a(X), t_a(D)) + t_{pd}(nand2) = \max(100, 0) + 50 = 150\,\mathit{ps}\,.\]

Output \(Y\) of the stage-3 NAND gate is the only terminal output of the circuit. Therefore, step 3 of the algorithm is trivial. The propagation delay of the circuit is the arrival time at output \(Y,\) that is \(t_{pd}(Y) = 150\,\mathit{ps}.\)

The algorithm can be adapted in a straightforward manner to determine the contamination delay of an acyclic combinational circuit. In steps 2 and 3 substitute contamination delay for propagation delay and minimum for maximum:

Algorithm (Contamination Delay)

1. Initialize the arrival times of all terminal inputs with 0.

2. For each circuit element, if the arrival times \(t_a(A_i)\) of all inputs \(A_i\) are determined, set the output times of all outputs \(Y_j\) to

\[\min_i(t_a(A_i)) + t_{cd}(Y_j)\,.\]

3. The minimum output time of all terminal outputs is the contamination delay of the circuit.

Figure 5.61 illustrates the algorithm for the 3-stage NAND chain. We assume a contamination delay for each NAND gate of \(t_{cd}(nand2) = 50\,\mathit{ps}.\) The output time of the stage-1 NAND gate is \(\min(0,0)+t_{cd}(nand2) = 50\,\mathit{ps}.\) Analogously, the output time of the stage-2 NAND gate is \(\min(50,0)+t_{cd}(nand2) = 50\,\mathit{ps}\) and of the stage-3 NAND gate \(\min(50,0)+t_{cd}(nand2) = 50\,\mathit{ps}.\) The contamination delay of the circuit is the arrival time at output \(Y,\) or \(t_{cd}(Y) = 50\,\mathit{ps}.\)

We conclude that the 3-stage NAND chain has a contamination delay of \(50\,\mathit{ps}\) and a propagation delay of \(150\,\mathit{ps}.\) These delays define the lower and upper delay bounds of the timing specification shown in Figure 5.59.
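Both algorithms differ only in the combining function, which suggests a single parameterized routine. A minimal Python sketch (names are ours), applied to the 3-stage NAND chain with a uniform gate delay of 50 ps:

```python
def arrival_times(gates, inputs, delay, combine):
    """Propagate arrival times through an acyclic circuit.

    gates is a list of (output, inputs) pairs in topological order;
    combine is max for propagation delay, min for contamination delay.
    """
    t = {node: 0 for node in inputs}   # step 1: terminal inputs arrive at 0
    for out, ins in gates:             # step 2: visit gates in topological order
        t[out] = combine(t[i] for i in ins) + delay
    return t

nand_chain = [("W", ("A", "B")), ("X", ("W", "C")), ("Y", ("X", "D"))]
t_pd = arrival_times(nand_chain, "ABCD", 50, max)["Y"]   # 150 ps
t_cd = arrival_times(nand_chain, "ABCD", 50, min)["Y"]   # 50 ps
```

The sketch assumes one delay value per gate; for gates with distinct contamination and propagation delays, the delay argument would be chosen per gate and per algorithm.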

We determine the contamination and propagation delay of the 4-bit RCA with majority and parity gates of Figure 5.41, redrawn below.

The critical path of the RCA is the carry chain. We assume that the majority and parity gates have equal contamination and propagation delays. The normalized delays are given as \(t_{cd}(M) = t_{pd}(M) = 9\) time units for the majority gates and \(t_{cd}(P) = t_{pd}(P) = 29\) time units for all parity or XOR gates.

We determine the contamination and propagation delays algorithmically. The annotated circuit diagram below shows pairs of arrival times for the contamination delay as first component and propagation delay as second component at each node. Terminal inputs \(A_i,\) \(B_i,\) \(C_{in}\) are initialized with arrival time 0. Since the RCA is a multioutput circuit, step 3 requires finding the minimum and maximum arrival times of the terminal outputs \(S_i\) and \(C_{out}.\) The contamination delay is the minimum of the first components, \(t_{cd}(rca) = 9\) time units, and the propagation delay is the maximum of the second components, \(t_{pd}(rca) = 56\) time units. Thus, the shortest paths of the RCA stretch from inputs \(A_3\) or \(B_3\) to output \(C_{out}.\) The longest, critical paths start at \(A_0,\) \(B_0,\) or \(C_{in},\) and end at output \(S_3.\)

The timing diagram below shows the delays of all nodes of the RCA. The arrows record which gate input transition causes the output transition. If a circuit node has an arrival time for the contamination delay that is smaller than the arrival time for the propagation delay, the signal begins switching at the former and stabilizes at the latter delay. The signal is unstable in between. For example, for node \(C_2,\) our delay algorithms yield arrival time 9 for the contamination delay and 27 for the propagation delay. Correspondingly, in the timing diagram below, signal \(C_2\) becomes unstable at time 9 and becomes stable at time 27 again.

If we wish to use the RCA as a subcircuit, we summarize this complex timing behavior with the simpler delay bounds \(t_{cd} = 9\) and \(t_{pd} = 56\) time units.
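The RCA analysis can be replayed with the same arrival-time propagation; the sketch below uses our own node names, with \(C_0\) for \(C_{in}\) and \(C_4\) for \(C_{out}\):

```python
def arrival(gates, inputs, combine):
    """gates: (output, inputs, delay) triples in topological order."""
    t = {node: 0 for node in inputs}
    for out, ins, d in gates:
        t[out] = combine(t[i] for i in ins) + d
    return t

M, P = 9, 29                         # majority and parity gate delays
rca = []
for i in range(4):
    a, b, c = f"A{i}", f"B{i}", f"C{i}"
    rca.append((f"C{i+1}", (a, b, c), M))   # carry chain stage
    rca.append((f"S{i}", (a, b, c), P))     # sum bit
inputs = [f"A{i}" for i in range(4)] + [f"B{i}" for i in range(4)] + ["C0"]
outputs = [f"S{i}" for i in range(4)] + ["C4"]

t_pd = max(arrival(rca, inputs, max)[o] for o in outputs)   # 56 time units
t_cd = min(arrival(rca, inputs, min)[o] for o in outputs)   # 9 time units
```

The maximum arrival time occurs at \(S_3\) and the minimum at \(C_4,\) matching the critical and shortest paths identified above.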

Perform a timing analysis of the multilevel circuit

with these gate delays:

gate | delay
---|---
nand | \(20\,ps\)
nor | \(25\,ps\)
and | \(30\,ps\)
or | \(35\,ps\)

- Determine the propagation delay of the circuit.
- Determine the contamination delay of the circuit.

We apply the *algorithm for propagation delay* to the circuit. We begin by assigning arrival time 0 to each of the inputs, as shown in step (a) below. In subsequent steps (b)-(e), we update the output times of those gates for which the arrival times of all inputs are known. The output time is the maximum of all input arrival times plus the gate delay. We find that the propagation delay of the circuit is \(t_{pd} = 110\,ps,\) i.e. the output stabilizes at most \(110\,ps\) after one of its inputs changes.

We apply the *algorithm for contamination delay* to the circuit. In step (a) we assign arrival time 0 to each of the inputs. In steps (b)-(e), we update the output times of those gates for which the arrival times of all inputs are known. The output time is the minimum of all input arrival times plus the gate delay. We find that the circuit has a contamination delay of \(t_{cd} = 75\,ps,\) i.e. the output begins to change at earliest \(75\,ps\) after one of its inputs changes.

## 5.7. Hazards¶

The timing behavior of combinational circuits can be surprising at
times. For example, let’s assume we have designed a 2:1 multiplexer
and want to verify its timing behavior with an oscilloscope.
Figure 5.62 shows the multiplexer circuit. All gates shall
be symmetric, so that their contamination and propagation delays are
equal. The inverter has a delay of \(t(inv) = 2\) time units, the
AND gates have a delay of \(t(and) = 5\) time units, and the OR
gate has a delay of \(t(or) = 5\) time units as well. Recall the
functional behavior of a *multiplexer*: output
\(Y = D_0\) if \(S = 0,\) and \(Y = D_1\) if \(S =
1.\) The timing diagram in Figure 5.62 shows initially
\(D_0 = 1,\) \(D_1 = 0,\) \(S = 1,\) and \(Y = D_1 =
0\) as expected. At time \(t_0,\) we toggle \(D_1\) from 0 to
1, expecting output \(Y\) to follow. We observe that \(Y\)
transitions to 1 at time \(t_1.\) When we switch \(S\) from 1
to 0 at time \(t_2,\) we expect no change in output \(Y,\)
because both data inputs are 1. However, we observe that output
\(Y\) transitions to 0 at time \(t_3,\) and at time
\(t_4\) back to 1. This so-called **glitch** in signal \(Y\)
cannot be explained by the functional behavior of the multiplexer.

Understanding the cause of the glitch requires a *timing analysis* of the multiplexer circuit. There are four paths
from the inputs to output \(Y.\) Three of the paths have a delay
of \(t(and) + t(or) = 10\) time units: (1) path \(D_0
\rightarrow Y\) via node \(V,\) (2) path \(D_1 \rightarrow Y\)
via node \(W,\) and (3) path \(S \rightarrow W \rightarrow Y\)
via node \(W.\) The fourth path \(S \rightarrow \overline{S}
\rightarrow V \rightarrow Y\) traverses the inverter, and has a delay
of 12 time units. We conclude that the multiplexer has a
contamination delay of \(t_{cd} = 10\) time units and a
propagation delay of \(t_{pd} = 12\) time units.

Next, we derive the detailed timing diagram in Figure 5.63 for transition \(S: 1 \rightarrow 0\) at time \(t_2\) and constant data inputs \(D_0 = D_1 = 1.\) Our analysis covers all inner nodes of the multiplexer. We reset time such that the stimulating transition of \(S\) occurs at time \(t_2 = 0,\) and begin the analysis with output \(\overline{S}\) of the inverter. Signal \(\overline{S}\) transitions from 0 to 1 after the inverter delay of \(t(inv) = 2\) time units, i.e. at time \(t = 2.\) Output \(V\) of the upper AND gate transitions \(t(and)=5\) time units later from 0 to 1, i.e. at time \(t = 7.\) Since \(V=1\) forces the output of the OR gate to 1, independent of input \(W\) of the OR gate, the OR gate assumes value 1 after a delay of \(t(or) = 5\) time units, which is at time \(t = 12 = t_4.\) In the meantime, the transition of input \(S\) at time \(t_2\) forces output \(W\) of the lower AND gate to 0. This occurs after a delay of \(t(and) = 5\) time units at time \(t = 5.\) After this transition, both \(W\) and \(V\) are 0, forcing output \(Y\) of the OR gate to 0. Since the delay of the OR gate is \(t(or) = 5\) time units, the output transition to 0 occurs at time \(t = 10 = t_3.\)
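The node-by-node analysis above can be replayed as a small timed simulation (our own sketch): each gate's output at time \(t\) is computed from its inputs at \(t\) minus the gate delay, with \(S\) falling at \(t = 0\) and \(D_0 = D_1 = 1\).

```python
def S(t):    return 1 if t < 0 else 0        # S transitions 1 -> 0 at t = 0
def D0(t):   return 1                        # constant data inputs
def D1(t):   return 1
def Sbar(t): return 1 - S(t - 2)             # inverter, delay 2
def V(t):    return Sbar(t - 5) & D0(t - 5)  # upper AND gate, delay 5
def W(t):    return S(t - 5) & D1(t - 5)     # lower AND gate, delay 5
def Y(t):    return V(t - 5) | W(t - 5)      # OR gate, delay 5

# Y is 1 until t = 10, glitches to 0 during [10, 12), and returns to 1 at t = 12.
trace = [Y(t) for t in range(14)]
```

The trace reproduces the static-1 hazard: the fall at \(t = 10\) corresponds to \(t_3\) and the rise at \(t = 12\) to \(t_4\) in Figure 5.63.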

Note that \(t_3\) marks the contamination delay of the multiplexer, \(t_{cd} = t_3 - t_2,\) and \(t_4\) marks the propagation delay \(t_{pd} = t_4 - t_2.\) The behavior of the output signal between \(t_{cd}\) and \(t_{pd}\) is consistent with our earlier discussion of the timing behavior of combinational circuits. The output signal becomes “unstable” after the contamination delay and is stable beyond the propagation delay again. In case of the multiplexer, this instability exhibits itself in the form of a glitch.

A glitch is a particular kind of **hazard**, where one input
transition causes two output transitions. The following four types of
hazards are common in combinational circuits. The glitch in
Figure 5.63 is a **static-1 hazard**, because the
output should stay static at value 1, but glitches temporarily to 0.
Analogously, a **static-0 hazard** occurs if the output should stay
static at value 0, but glitches temporarily to value 1 instead.
Whereas static-1 hazards can occur in AND-OR circuits, static-0
hazards can occur in OR-AND circuits, for example. Multilevel
circuits may have dynamic hazards, which incur more than two output
transitions in response to one input transition. A **dynamic-1
hazard** occurs if one input transition should cause an output
transition from 0 to 1, but instead causes the output to produce three
transitions, \(0 \rightarrow 1 \rightarrow 0 \rightarrow 1\).
Analogously, a **dynamic-0 hazard** has three transitions, \(1
\rightarrow 0 \rightarrow 1 \rightarrow 0,\) although a single output
transition from 1 to 0 would be the expected response to a single
input transition.

You shouldn’t be surprised to encounter glitches when analyzing a circuit with an oscilloscope. Furthermore, there is no reason to panic in face of a glitch. Combinational circuits may exhibit multiple transitions in the unstable time period between the contamination and propagation delays. As long as we give the circuit enough time to stabilize, a glitch is a harmless temporary behavior, despite the precarious aftertaste of the term hazard. Nevertheless, occasionally, glitch-free circuits are important, for example in edge-triggered sequential circuits where the clock signal must not exhibit any hazards for the circuit to function correctly.

We can detect and avoid static hazards in two-level circuits with the aid of K-maps. Figure 5.64 shows on the left the minimal cover that corresponds to the AND-OR multiplexer circuit in Figure 5.62. The glitch-causing input transition is marked with the red arrow. In the K-map, the glitch corresponds to the transition across the boundaries of two prime implicants. Not every transition across prime implicant boundaries causes a glitch; the reverse transition \(S: 0 \rightarrow 1,\) for example, is glitch-free. The hazardous transition \(S: 1 \rightarrow 0\) switches one AND gate to 0 before switching the other AND gate to 1, cf. output \(W\) at time \(t=5\) and output \(V\) at time \(t=7\) in Figure 5.63.

We may avoid the glitch by covering the glitching transition with the
*consensus* on \(S.\) Figure 5.64 shows the redundant prime implicant of consensus term
\(D_0 D_1.\) The extended SOP form of the multiplexer is \(Y
= \overline{S}\,D_0 + S\,D_1 + D_0\,D_1.\) Correspondingly, we extend
the multiplexer circuit with an AND gate for the consensus term. The
output of this AND gate will remain 1 during the transition of
\(S,\) because \(D_0\) and \(D_1\) maintain value 1. This
1-input suffices to pull output \(Y\) of the extended 3-input OR
gate to 1, while outputs \(V\) and \(W\) of the other two AND
gates change their values. This delightful insight enables us to
avoid glitches in two-level circuits at the expense of functionally
redundant hardware. We may add consensus terms to a minimal SOP form
to avoid static-1 hazards, and use the dual consensus theorem to avoid
static-0 hazards in minimal POS forms.
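That the consensus term is functionally redundant, yet holds the output at 1 during the hazardous transition, can be checked by perfect induction (a sketch; the function name is ours):

```python
def extended_sop(s, d0, d1):
    """Multiplexer SOP extended with the consensus term D0*D1."""
    return ((1 - s) & d0) | (s & d1) | (d0 & d1)

# Perfect induction: the extended SOP equals the 2:1 multiplexer function.
for s in (0, 1):
    for d0 in (0, 1):
        for d1 in (0, 1):
            mux = d1 if s else d0
            assert extended_sop(s, d0, d1) == mux

# During the hazardous transition, D0 = D1 = 1, so the consensus AND gate
# outputs constant 1 and pulls Y to 1 regardless of the value of S.
assert all(extended_sop(s, 1, 1) == 1 for s in (0, 1))
```

The final assertion captures the hazard-avoidance argument: the consensus gate's output does not depend on \(S,\) so it cannot glitch when \(S\) transitions.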

Footnotes

[1] | Combinational circuits are sometimes confused with
combinatorial circuits. Combinatorics is a branch of discrete
mathematics, whereas combinational circuits combine their inputs
to compute the output. We can design combinational circuits for
combinatorial problems such as the one in Example 5.1. |

[2] | In graph theory, a Hamiltonian path visits each vertex
of a graph exactly once. A Hamiltonian cycle is a cyclic path
that visits each vertex of a graph exactly once, except for the start
and end vertex, which is visited twice. |

[3] | The set covering problem has been studied extensively. To point to just a few solution methods, you can solve the binary decision problem with a search algorithm. If you are interested in the problem formulation as an integer linear program, study the simplex algorithm or interior point methods. The greedy approximation algorithm for the set covering problem is one of the classical approximation algorithms. It does not necessarily produce a minimal cover, but the cost of the resulting cover is at most a logarithmic factor larger than the cost of the minimal cover. |