Archive for December, 2007

4.3.3 HOW FAST CAN WE MULTIPLY? 281 University, (Cheap web hosting)

Monday, December 24th, 2007

4.3.3 HOW FAST CAN WE MULTIPLY? 281 University, 1966), 51-771 later showed how Toom s method can be adapted to fast computer programs. Before we discuss the Toom-Cook algorithm any further, let us study a small example of the transition from U(z) and V(x) to the coefficients of W(z). This example will not demonstrate the efficiency of the method, since the numbers are too small, but it points out some useful simplifications that we can make in the general case. Suppose that we want to multiply u = 1234 times 2, = 2341; in binary notation this is u = (0100 1101 0010)s times v = (1001 0010 0101)s. Let r = 2; the polynomials V(s), V(z) in (8) are U(x) = 4X2 + 132 + 2, V(x) = 9×2 + 2x + 5. Hence we find, for W(x) = U(x)V(s), U(0) = 2, U(1) = 19, U(2) = 44, U(3) = 77, U(4) = 118; V(0) = 5, V(1) = 16, V(2) = 45, V(3) = 92, V(4) = 157; W(0) = 10, W(1) = 304, W(2) = 1980, W(3) = 7084, W(4) = 18526. (12) Our job now is to compute the five coefficients of W(z) from the latter five values. There is an attractive little algorithm that can be used to compute the coefficients of a polynomial W(z) = Wm-lxm-l + . . . + WIz + W , when the values W(O), W(l), . . . , W(m -1) are given: Let us first write W(x) = 8,-l xd + 8,-2 xm–2 + . . . + 81x + eo, (13) where xk = X(X -1). . . (X - k + l), and where the coefficients I!?~ are unknown. The falling factorial powers have the important property that W(X + I) -W(X) = (77~-~)e,-~ 28 + (772-2)e,-2 xd + . . . + el; hence by induction we find that, for all k 2 0, em-1 x– m-l-k em-2xzkSL=!5+…+ : ok. (14) 0 Denoting the left-hand side of (14) by (l/k!) Ak W(z), we see that PA - W(s) (k: l)! and (l/k!) Ak W(0) = ok. So the coefficients 0, can be evaluated using a very simple method, illustrated here for the polynomial W(s) in (12): 10 294 304 138212 = 691 1676 = 341 1980 3428/2 =1714 102313 14414 = 36 (15) 7084 5104 6338/2 = 3169 145513 =485 11442 18526

280 ARITHMETIC 43.3 where each Uj and each (Christian web host)

Sunday, December 23rd, 2007

280 ARITHMETIC 43.3 where each Uj and each Vj is an n-bit number. Consider the polynomials U(x) = &XT +. . . + UlX + VI& V(x) = V,x + * * * + VlX + vo, 63) and let W(x) = U(x)V(x) = W2rx2r + * * * + WlX + wi. (9) Since u = U(2n) and Y = V(2n), we have UZI = W(2n), so we can easily compute uz, if we know the coefficients of W(z). The problem is to find a good way to compute the coefficients of W(z) by using only 2r + 1 multiplications of n-bit numbers plus some further operations that involve only an execution time proportional to n. This can be done by computing U(O)V(O) = W(O), U(l)V(l) = W(l), . . . , U(2r)V(2r) = W(2r). (10) The coefficients of a polynomial of degree 2r can be written as a linear com- bination of the values of that polynomial at 2r + 1 distinct points; computing such a linear combination requires an execution time at most proportional to n. (Actually, the products U(j)V(j) are not strictly products of n-bit numbers, but they are products of at most (n + t)-bit numbers, where t is a fixed value depend- ing on T. It is easy to design a multiplication routine for (n + t)-bit numbers that requires only T(n) + cln operations, where T(n) is the number of operations needed for n-bit multiplications, since two products of t-bit by n-bit numbers can be done in csn operations when t is fixed.) Therefore we obtain a method of multiplication satisfying (6). Relation (6) implies that T(n) 2 csn10g~+l(27f1) < cgnl-t os~+l 2, if we argue as in the derivation of (5), so we have now proved the following result: Theorem A. Given f > 0, there exists a multiplication algorithm such that the number of elementary operations T(n) needed to multiply two n-bit numbers satisfies T(n) < c(e)nl+ , (11) for some constant C(E) independent of n. I This theorem is still not the result we are after. It is unsatisfactory for practical purposes in that the method becomes much more complicated as E –+ 0 (and therefore as r + oo), causing C(E) to grow so rapidly that extremely huge values of n are needed before we have any significant improvement over (5). And it is unsatisfactory for theoretical purposes because it does not make use of the full power of the polynomial method on which it is based. We can obtain a better result if we let r vary with n, choosing larger and larger values of T as n increases. This idea is due to A. L. Toom [Doklady Akad. Nauk SSSR 150 (1963), 496-498, English translation in Soviet Mathematics 3 (1963), 714-7161, who used it to show that computer circuitry for multiplication of n-bit numbers can be constructed involving a fairly small number of components as n grows. S. A. Cook [On the minimum computation time of functions (Thesis, Harvard

4.3.3 HOW FAST CAN WE MULTIPLY? (Web hosting plans) 279 Formula

Sunday, December 23rd, 2007

4.3.3 HOW FAST CAN WE MULTIPLY? 279 Formula (2) can be used for double-precision multiplication when we want a quadruple-precision result, and it will be just a little faster than the traditional method on many machines. But the main advantage of (2) is that we can use it to define a recursive process for multiplication that is significantly faster than the familiar order-n2 method when n is large: If T(n) is the time required to perform multiplication of n-bit numbers, we have T(2n) 5 3T(n) + cn (3) for some constant c, since the right-hand side of (2) uses just three multiplications plus some additions and shifts. Relation (3) implies by induction that 7727 5 C(3k -a ), k 2 1, if we choose c to be large enough so that this inequality is valid when k = 1; therefore we have T(n) I 7v q _ < c(3 rknl -2bl) < 3c. 31gn = 3cdg3. (5) Relation (5) shows that the running time for multiplication can be reduced from order n2 to order n1g3 z n1.585, so the recursive method is much faster than the traditional method when n is large. (A similar but more complicated method for doing multiplication with run- ning time of order n1g3 was apparently first suggested by A. Karatsuba in Doklady Akad. Nauk SSSR 145 (1962), 293-294 [English translation in Soviet Physics-Doklady 7 (1963), 595-5961. Curiously, this idea does not seem to have been discovered before 1962; none of the calculating prodigies who have be- come famous for their ability to multiply large numbers mentally have been reported to use any such method, although formula (2) adapted to decimal nota- tion would seem to lead to a reasonably easy way to multiply eight-digit numbers in one s head.) The running time can be reduced still further, in the limit as n approaches infinity, if we observe that the method just used is essentially the special case r = 1 of a more general method that yields qr + 1,g 2 (2r + l)T(n) + fm (6) for any fixed T. This more general method can be obtained as follows: Let zf = (U(,+l)n-1 . . . %740)2 and 21 = (y,+qn-1 . . . VlVO)Z be broken into r + 1 pieces, u = l&2,, + . . . + U12 + uo, 2, = V,2 +. . * + VIZ + vi, (7)

278 ARITHMETIC 4.3.2 k 13. [M.%] (Automorphic numbers.) (Dedicated web hosting)

Saturday, December 22nd, 2007

278 ARITHMETIC 4.3.2 k 13. [M.%] (Automorphic numbers.) Au n-place decimal number 5 > 1 is called an automorph by recreational mathematicians if the last n digits of z2 are equal to CL For example, 9376 is a 4-place automorph, since 93762 = 87909376. [See Scientific American 218 (January 1968), 125.1 a) Prove that an n-place number z > 1 is an automorph if and only if z mod 5n = 0 or 1 and ~rnod2~ = 1 or 0, respectively. (Thus, if ml = 2 and m2 = 5 , the only two n-place automorphs are the numbers Ml and MZ in (7).) b) Prove that if z is an n-place automorph, then (3~~ - 2~~) mod 102n is a an-place automorph. c) Given that cx E 1 (modulo y), find a simple formula for a number c depending on c and x but not on y, such that crx2 E 1 (modulo y2). l 4.3.3. How Fast Can We Multiply? The conventional method for multiplication in positional number systems, Algorithm 4.3.1M, requires approximately cmn operations to multiply an m-digit number by an n-digit number, where c is a constant. In this section, let us assume for convenience that m = n, and let us consider the following question: Does every general computer algorithm for multiplying two n-digit numbers require an execution time proportional to n2, as n increases? (In this question, a general algorithm means one that accepts, as input, the number n and two arbitrary n-digit numbers in positional notation, and whose output is their product in positional form. Certainly if we were allowed to choose a different algorithm for each value of n, the question would be of no interest, since multiplication could be done for any specific value of n by a table-lookup operation in some huge table. The term computer algorithm is meant to imply an algorithm that is suitable for implementation on a digital computer such as MIX, and the execution time is to be the time it takes to perform the algorithm on such a computer.) A. Digital methods. The answer to the above question is, rather surprisingly, No, and, in fact, it is not very difficult to see why. For convenience, let us assume throughout this section that we are working with integers expressed in binary notation. If we have two 2n-bit numbers u = (usn-r . . . uru~)s and v = (UZn-1. * * 211wo)a, we can write u = 2nUl + UrJ, 2) = 2 Vl + vo, (1) where Ur = (Q+-~. . . un)s is the most significant half of the number u and uo = (z&l.. . ~0)s is the least significant half ; similarly VI = (wzn–l . . . z1,)~ and V. = (u,-~. . .210)2. Now we have uw = (22, + 2 )U~V~ + 2yiY1 -Uo)(Vo -VI) + (2 + 1)UuVc. (2) This formula reduces the problem of multiplying an-bit numbers to three multi- plications of n-bit numbers, namely VI&, (Vi -Uo)(Vo -VI), and UoVo, plus some simple shifting and adding operations.

4.3.2 MODULAR (Jetty web server) ARITHMETIC 277 provided that ui Et

Friday, December 21st, 2007

4.3.2 MODULAR ARITHMETIC 277 provided that ui Et uj (module gcd(m,, m3)), l

Photo web hosting - 276 ARITHMETIC 4.3.2 An example of such an

Thursday, December 20th, 2007

276 ARITHMETIC 4.3.2 An example of such an application is the exact solution of linear equations with rational coefficients. For various reasons it is desirable in this case to assume that the moduli ml, 7732, . . . , m7 are all large prime numbers; the linear equations can be solved independently modulo each mj. A detailed discussion of this procedure has been given by I. Borosh and A. S. Fraenkel [Math. Comp. 20 (1966), 107-1121. By means of their method, the nine independent solutions of a system of 111 linear equations in 120 unknowns were obtained exactly in less than one hour s running time on a CDC 1604 computer. The same procedure is worthwhile also for solving simultaneous linear equations with floating point coefficients, when the matrix of coefficients is ill-conditioned. The modular technique (treating the given floating point coefficients as exact rational numbers) gives a method for obtaining the true answers in less time than conventional methods can produce reliable approximate answers! [See M. T. McClellan, JACM 20 (1973), 563-588, for further developments of this approach; and see also E. H. Bare&, J. Inst. Math. and Appl. 10 (1972), 68-104, for a discussion of its limitations.] The published literature concerning modular arithmetic is mostly oriented towards hardware design, since the carry-free properties of modular arithmetic make it attractive from the standpoint of high-speed operation. The idea was first published by A. Svoboda and M. Valach in the Czechoslovakian journal Stroje na Zpracovrini Informaci 3 (1955), 247-295; then independently by H. L. Garner [IRE Trans. EC-8 (1959), 140-1471. The use of moduli of the form 2e~ - 1 was suggested by A. S. Fraenkel [JACM 8 (1961), 87-961, and several advantages of such moduli were demonstrated by A. Schiinhage [Computing 1(1966), 182-1961. See the book Residue Arithmetic and its Applications to Computer Technology by N. S. Saab6 and R. I. Tanaka (New York: McGraw-Hill, 1967), for additional information and a comprehensive bibliography of the subject. A Russian book published in 1968 by I. &. AkushskiY and D. I. hditski i includes a chapter about complex moduli [cf. Rev. Romaine des Math. 15 (1970), 159-1601. Further discussion of modular arithmetic can be found in Section 4.3.3B. EXERCISES 1. [.20] Find all integers u that satisfy all of the following conditions: ~mod7 = 1, umodll = 6, umod13 = 5, 0 5 u < 1000. 2. [A&%] Would Theorem C still be true if we allowed a, ~1, ~2, . . , 2~~ and u to be arbitrary real numbers (not just integers)? b 3. [A&X] (Generalized Chinese Remainder Theorem.) Let ml, m2, . , rnT be positive integers. Let m be the least common multiple of ml, m2, . , m7, and let a, ~1, ~2, . , u7. be any integers. Prove that there is exactly one integer u that satisfies the conditions alu

43.2 MODULAR ARITHMETIC 275 0 5 u < (Web hosting)

Thursday, December 20th, 2007

43.2 MODULAR ARITHMETIC 275 0 5 u < m and 0 2 u < m, then we can tell if u < u by first doing the conversion to (VI,. . . , v,) and (vi,. . . , v:), then testing if w, < vi, or if v, = vi and q-1 < VI-~, etc. It is not necessary to convert all the way to binary or decimal notation if we only want to know whether (ul, . . . , u,) is less than (u;,...,u;). The operation of comparing two numbers, or of deciding if a modular number is negative, is intuitively very simple, so we would expect to have a much easier method for making this test than the conversion to mixed-radix form. But the following theorem shows that there is little hope of finding a substantially better method, since the range of a modular number depends essentially on all bits of all the residues (~1, . . . , u,): Theorem S (Nicholas Szab6, 1961). In terms of the notation above, assume that ml < fi, and let L be any value in the range ml u, and IJ -u is a multiple of m2 . . . m, = pm/ml, so v 2 TJ -u 2 m/m1 > ml. Therefore, if (28) does not hold for (u, v), it will be satisfied for the pair (u , 2) ) = (v - ml, u + m -ml). 1 Of course, a similar result can be proved for any my in place of ml; and we could also replace (28) by the condition a 2 u < a + L 5 v < a + m with only minor changes in the proof. Therefore Theorem S shows that many simple functions cannot be used to determine the range of a modular number. Let us now reiterate the main points of the discussion in this section: Mod- ular arithmetic can be a significant advantage for applications in which the predominant calculations involve exact multiplication (or raising to a power) of large integers, combined with addition and subtraction, but where there is very little need to divide or compare numbers, or to test whether intermediate results overflow out of range. (It is important not to forget the latter restric- tion; methods are available to test for overflow, as in exercise 12, but they are in general so complicated that they nullify the advantages of modular arith- metic.) Several applications of modular computations have been discussed by H. Takahasi and Y. Ishibashi, Information Proc. in Japan 1 (1961), 28-42.

Fedora web server - 274 ARITHMETIC 4.3.2 Since Mj mod m is

Wednesday, December 19th, 2007

274 ARITHMETIC 4.3.2 Since Mj mod m is uniquely determined if (7) is to be satisfied (because of the Chinese remainder theorem), we can see that, in any event, Eq. (9) requires a lot of high-precision calculation, and such calculation is just what we wished to avoid by modular arithmetic in the first place. So we need an even better proof of Theorem C if we are going to have a really usable method of conversion from (~1,. . . , u,) to U. Such a method was suggested by H. L. Garner in 1958; it can be carried out using (5) constants czj for 1 2 i < j 2 r, where mi S 1 (modulo my). (22) CiJ These constants cij are readily computed using Euclid s algorithm, since for any given i and j Algorithm 4.5.2X will determine a and b such that ami + bmj = gcd(mi, mj) = 1, and we may take Cij = a. When the moduli have the special form 2Q - 1, a simple method of determining cij is given in exercise 6. Once the cij have been determined satisfying (22), we can set 2rl c u1 modml, vz + (~2 -ul)m modma, (23) 21, + (. . . ((% -~1)cl7 -~12) ~2~ -+ * * -%-I) c(,-~)~ modm,. Then u = v,m,-l . . . m2m1 +. . . + v3m2ml+ v2ml + 211 (24) is a number satisfying the conditions OIu

Web hosting compare - 4.3.2 MODULAR ARITHMETIC 273 As we have already

Tuesday, December 18th, 2007

4.3.2 MODULAR ARITHMETIC 273 As we have already observed, the operations of conversion to and from modular representation are very important. If we are given a number U, its modular representation (~1, . . . , u,) may be obtained by simply dividing u by ml, , m, and sa,ving the remainders. A possibly more attractive procedure, if u = (u~v,-~. . . uO)~, is to evaluate the polynomial (. . . (v,b + ~,-~)b + . . .) b + v. using modular arithmetic. When b = 2 and when the modulus rnj has the special form 2e~ - 1, both of these methods reduce to quite a simple procedure: Consider the binary representation of u with blocks of ej bits grouped together, u = atAt + CX-~A~- + . . . + alA + ao, (20) where A = 2e~ and 0 5 ak < 2e3 for 0 2 k 5 t. Then u E at + utFl + . . . + al + a0 (modulo 2e3 - l), (21) since A G 1, so we obtain u3 by adding the e3 -bit numbers at $ . . . $ al $ ao, using (16). This process is similar to the familiar device of casting out nines that determines umod9 when u is expressed in the decimal system. Conversion back from modular form to positional notation is somewhat more difficult. It is interesting in this regard to make a few side remarks about the way computers make us change our viewpoint towards mathematical proofs: Theorem C tells us that the conversion from (ui, . . . , u,) to u is possible, and two proofs are given. The first proof we considered is a classical one that relies only on very simple concepts, namely the facts that i) any number that is a multiple of ml, of m2, . . . , and of m,, must be a multiple of m1m2.. . m7 when the mj s are pairwise relatively prime; and ii) if m things are put into m boxes with no two things in the same box, then there must be one in each box. By traditional notions of mathematical aesthetics, this is no doubt the nicest proof of Theorem C; but from a computational standpoint it is completely worthless. It amounts to saying, Try u = a, a + 1, . . . until you find a value for which u = u1 (modulo ml), . . . , u s u7 (modulo mr). The second proof of Theorem C is more explicit; it shows how to compute r new constants Ml, . . . , MT, and to get the solution in terms of these constants by formula (9). This proof uses more complicated concepts (for example, Euler s theorem), but it is much more satisfactory from a computational standpoint, since the constants Ml, . . . , M, need to be determined only once. On the other hand, the determination of My by Eq. (8) is certainly not trivial, since the evaluation of Euler s (o-function requires, in general, the factorization of my into prime powers. Furthermore, Mj is likely to be a terribly large number, even if we compute only the quantity Mj mod m (which will work just as well as Mj in (9)).

272 ARITHMETIC 4.3.2 On binary computers it is (1 on 1 web hosting)

Monday, December 17th, 2007

272 ARITHMETIC 4.3.2 On binary computers it is sometimes desirable to choose the mj in a different way, by selecting mj = 2e3 - 1. (14 In other words, each modulus is one less than a power of 2. Such a choice of rnj often makes the basic arithmetic operations simpler, because it is relatively easy to work modulo 2ej -1, as in ones complement arithmetic. When the moduli are chosen according to this strategy, it is helpful to relax the condition 0 5 uj < rnj slightly, so that we require only 0 5 Uj < 2e , Uj E u (modulo 2e - 1). (15) Thus, the value uj = mj = 2e~ - 1 is allowed as an optional alternative to Uj = 0, since this does not affect the validity of Theorem C, and it means we are allowing ZL~ to be any ej-bit binary number. Under this assumption, the operations of addition and multiplication modulo mj become the following: uj + vj7 ifUj+Vj < 2e3; Uj $ Vj = (16) ((Uj + vj)mod2ej) + 1, if uj + vj 2 2e3. (17) (Here $ and @ refer to the operations done on the individual components of (WY... , u,) and (VI , . . . , v,) when adding or multiplying, respectively, using the convention (15).) Equation (12) may be used for subtraction. These operations can be performed efficiently even when mj is larger than the computer s word size, since it is a simple matter to compute the remainder of a positive number modulo a power of 2, or to divide a number by a power of 2. In (17) we have the sum of the upper half and the lower half of the product, as discussed in exercise 3.2.1.1-8. If moduli of the form 29 -1 are to be used, we must know under what con- ditions the number 2e - 1 is relatively prime to the number 2f -1. Fortunately, there is a very simple rule, gcd(ze -1,2f -1) = 2gcd@lf) -1, (18) which states in particular that 2e - 1 and 2f -1 are relatively prime if and only if e md f are relatively prime. Equation (18) follows from Euclid s algorithm and the identity (2e -1) mod (2f -1) = 2e mod f -1. (19) (See exercise 6.) Thus we could choose for example ml = 235 - 1, m2 = 234 - 1, m3 = 233 -1, m4 = 231 -1, m.tj = 22g -1, if we had a computer with word size 235; this would permit efficient addition, subtraction, and multiplication of integers in a range of size m1m2m3m4m5 > 2161.