24. Grade School Revisited: How to Multiply Two Numbers
Computers worldwide perform enormous numbers of multiplications each day. In most computers, each such operation consumes mere microseconds. Nonetheless, the general question of how quickly two n-bit numbers can be multiplied has not only great theoretical importance but also practical relevance.
Indeed, when it comes to multiplying two numbers, the best (or fastest) way to do it is often far from obvious.
Gauss and Complex Numbers
A complex number is an expression of the form a + bi, where a and b are real numbers, and i has the property i 2 = −1. The real number a is called the real part of the complex number, and the real number b is the imaginary part. When the imaginary part b is 0, the complex number is just the real number a.
Suppose that we want to multiply two complex numbers, a + bi and c + di. To do so, we use the following rule.
(a + bi)(c + di) = [ac – bd] + [ad + bc] i
=''Example''=
(2 + 3i)(1 + 2i) = [2 – 6] + [4 + 3]i = (–4 + 7i)
Expressed in terms of a program, we would have input a, b, c, and d and output ac – bd and ad + bc.
Let’s look at the computational cost of obtaining the output from the input. Suppose that multiplying two real numbers costs $1 and adding them costs a penny. To obtain ac – bd and ad + bc requires four multiplications and two additions, for a total of $4.02.
Is there a cheaper way to obtain the output from the input?
[Gauss optimization algorithm]
Here’s how it can be done for $3.05, with three multiplications and five additions.
¢ x1 = a + b
¢ x2 = c + d
$ x3 = x1 x2 = ac + ad + bc + bd
$ x4 = ac
$ x5 = bd
¢ x6 = x4 – x5 = ac – bd
¢¢ x7 = x3 – x4 – x5 = bc + ad
Gauss optimization saves one multiplication out of four. So, using this algorithm, the computation requires about 25% less work.
Recall that, in terms of time complexity, T(n), grade school addition is linear (see chapter 17). In other words, the amount of time grade school addition uses to add two n-bit numbers is T(n) = Θ(n).
In contrast, the amount of time grade school multiplication uses to add two n-bit numbers is quadratic. In other words, T(n) = Θ(n2).
No matter how dramatic the difference in the constants, the quadratic curve will eventually dominate the linear curve. So, in terms of time complexity, grade school multiplication is a more costly operation than is grade school addition.
Can we do better than linear time for addition? Is there a sub-linear method for addition?
The answer is no. Any addition algorithm takes Ω(n) time.
=''Proof''=
Claim: Any algorithm for addition must read all the input bits.
Proof: Suppose there is a mystery algorithm A that does not examine each bit.
Give A a pair of numbers. There must be some unexamined bit position i in one of the numbers.
• If A is not correct on the inputs, we’ve found a bug.
• If A is correct, flip the bit at position i and give A the new pair of numbers.
A gives the same answer as before, which is now wrong.
So, any algorithm for addition must use time at least linear in the size of the numbers.
Hence, grade school addition can’t be improved upon by more than a constant factor. Furthermore, it is optimal.
Divide, Conquer, and Glue
Is there a clever algorithm to multiply two numbers in linear time? Despite years of research, no one knows. Can we even break the quadratic time barrier? In other words, can we do something very different from grade school multiplication to achieve improved time complexity?
Here’s one way to approach this problem intuitively. Let’s say that each time an algorithm has to multiply a digit from one number with a digit from the other number, we call that a “kiss.” It seems as if any correct algorithm must kiss at least n2 times.
However, we can try a divide-and-conquer approach to try to obtain a faster algorithm.
1. DIVIDE a problem into smaller sub-problems.
2. CONQUER them recursively.
3. GLUE the answers together so as to obtain the answer to the larger problem.
Suppose that we want to multiply two n-bit numbers, x and y. We can divide each of these numbers into two parts, each part consisting of n/2 bits. We have
x = a 2n/2 + b and y = c 2n/2 + d .
We can then multiply the two n-bit numbers in the following way:
x ∙ y = ac2n + (ad + bc)2n/2 + bd .
Here’s a program, MULT(x, y), to implement this computation.
MULT(x, y):
If |x| = |y| = 1 then return xy
break x into a; b and y into c; d
return
MULT(a, c)2n + (MULT(a, d) + MULT(b, c))2n/2 + MULT(b, d)
We can do the same thing with numbers in decimal.
Given x = a 10n/2 + b and y = c 10n/2 + d, we have
x ∙ y = ac 10n + (ad + bc) 10n/2 + bd.
=''Example''=
Find 12345678 * 21394276.
Break each number into two 4-digit parts.
a = 1234 b = 5678 c = 2139 d = 4276
Find the products bd, bc, ad, and ac.
bd = 5678 * 4276
bc = 5678 * 2139
ad = 1234 * 4276
ac = 1234 * 2139
Consider 1234 * 2139.
Break each of the resulting numbers into 2-digit parts. Repeat the calculations with these parts.
12 34 * 21 39
12 * 21 12 * 39 34 * 21 34 * 39
Break each of the resulting numbers into 1-digit parts. Repeat the calculations with these parts.
1 2 * 2 1
1 * 2 1 * 1 2 * 2 2 * 1
Given that x ∙ y = ac 10n + (ad + bc) 10n/2 + bd, n = 2, and ac = 1 * 2 = 2, ad = 1 * 1 = 1, bc = 2 * 2 = 4, bd = 2 * 1 = 2, we obtain
12 * 21 = 2 * 102 + (1 + 4)101 + 2 = 252
Similarly, for 12 * 39:
1 * 3 1 * 9 2 * 3 2 * 9
3 9 6 18
3 * 102 + 9 * 101 + 6 * 101 + 18 * 1 = 468
So, we have 12 * 21 = 252, 12 * 21 = 468, 34 * 21 = 714, 34 * 39 = 1326.
This allows us to compute 1234 * 2139.
252 * 104 + 468 * 102 + 714 * 102 + 1326 * 1 = 2639526
Similarly, we obtain
1234 * 4276 = 5276584
5678 * 2139 = 12145242
5678 * 4276 = 24279128
Hence,
12345678 * 21394276
= 2639526 * 108 + 5276584 * 104 + 12145242 * 104 + 24279128 * 1
= 264126842539128
Kissing Numbers
How efficient is the “divide, conquer, and glue” (MULT) algorithm for multiplication? We need to work out the time taken by MULT on two n-bit numbers, T(n), and determine its growth rate? In particular, we want to see whether T(n) = Θ(n2).
We can see that the “conquering” time is 4 T(n/2) and the “divide-and-glue” time is (k'n + k"), for a total, T(n) = 4 T(n/2) + (k'n + k").
We also have the following recurrence relation:
T(1) = k for some constant k
T(n) = 4 T(n/2) + k'n + k" for constants k'n + k"
Keeping it simple by making all constants equal to 1, we can say
T(1) = 1
T(n) = 4 T(n/2) + n
Note that T(n) is inductively defined only for positive powers of 2.
What is the growth rate of T(n)? We have three techniques that we can use to obtain an answer.
Technique 1: Guess and verify
Guess: G(n) = 2n2 – n
Verify: G(1) = 1 and G(n) = 4 G(n/2) + n
4 [2(n/2)2 – n/2] + n
= 2n2 – 2n + n
= 2n2 – n
= G(n)
Smilarly: T(1) = 1 and T(n) = 4 T(n/2) + n
So T(n) = G(n) = Θ(n2)
Technique 2: Guess form and calculate coefficients
Guess: T(n) = an2 + bn + c for some a, b, c
Calculate: T(1) = 1 Þ a + b + c = 1
T(n) = 4 T(n/2) + n
Þ an2 + bn + c = 4 [a(n/2)2 + b(n/2) + c] + n
= an2 + 2bn + 4c + n
Þ (b + 1)n + 3c = 0
Therefore: b = –1 c = 0 a = 2
Technique 3: Labeled tree representation
Recall that a tree node-labeled by S is a tree T = <V, E> with an associated function Label: V to S (see chapter 3).
Level i is the sum of 4i copies of n/2i
Hence, n(1 + 2 + 4 + 8 + . . . + n) = n(2n – 1) = 2n2 – n
These arguments show that divide-and-conquer MULT runs in Θ(n2) time—the same as grade school multiplication.
All that work for nothing! However, in retrospect, it is obvious that the kissing number for divide-and-conquer MULT is n2, given that that leaves in the tree representation are in correspondence with the kisses.
Can we do better? Note that MULT calls itself four times. Is it possible to reduce the number of calls? The answer lies in the application of Gauss optimization.
Karatsuba Multiplication
In the late 1950s, Anatolii Alexeevich Karatsuba (1937–) formulated the first multiplication algorithm to break the kissing barrier. He did it by taking advantage of Gauss optimization, as follows.
Gauss MULT(x, y):
If |x| = |y| = 1 then return xy
break x into a; b and y into c; d
e = MULT(a, c) and f = MULT(b, d)
return
e 2n + (MULT(a + c, b + d) – e – f)2n/2 + f
In this case, the time complexity is roughly T(n) = 3 T(n/2) + n or, more accurately, T(n) = 2 T(n/2) + T(n/2 + 1) + kn.
In a tree representation, level i is the sum of 3i copies of n/2i.
n(1 + 3/2 + (3/2)2 + . . . + (3/2)log2 n) = 3n1.58. . . – 2n
Recall the formula for a geometric series. Substituting into that formula for x = 3/2 and k = log2n, we have:
[(3/2) × (3/2)log2n – 1]/½ = 3 × (3log2 n/2log2 n) – 2 = 3 × (3log2 n/n) – 2 = 3 n(log2 3)/n – 2
So for the running time, we get
Note that we have a dramatic improvement in running time for large n.
T(n) = 3nlog2 3 – 2n
= Θ(nlog2 3)
= Θ(n1.58…)
Indeed, we have a huge saving over Θ(n2) when n gets large, as seen in the following plot, where the upper curve is n2 and the lower curve is n1.584 .
This might seem a little curious. Gauss optimization simply replaces every four multiplications with three. So, it appears to be worth only a 25-percent speedup. Where does the power come from? It comes from the fact that we apply Gauss optimization recursively.
So, what is the kissing number of Karatsuba’s algorithm? It’s not clear that we kiss even once.
Here is an alternative multiplication algorithm. We will call it Mystery MULT.
Mys-MULT(x, y):
If |y| = 1 then return xy
break y into c; d where |d| = 1
return
2[Mys-MULT(x, c)] + Mys-MULT(x, d)
What’s going on here? Does this point to an even better way of doing multiplication? Mys-MULT is the Egyptian method (see chapter 5) stated in recursive language.
Let us summarize what we now know about the running times of different multiplication algorithms.
Kindergarten n2n
Grade school n2
Karatsuba n1.58…
Fastest known n logn loglogn
[conclusion]
Exercises
References
Karatsuba, A., and Y. Ofman. “Multiplication of multidigit numbers on automata.” Sov. Phys. Dokl. 7(1962), 595–596.