Archive for October, 2007

4.2.1 SINGLE-PRECISION CALCULATIONS 205 13 FDIV STJ (Web hosting isp) EXITF

Wednesday, October 31st, 2007

4.2.1 SINGLE-PRECISION CALCULATIONS 205 13 FDIV STJ EXITF Floating point division subroutine: 14 JOV OFLO Ensure overflow is off. 15 STA TEMP TEMP + v. 16 STA FV (0 : 4) FV+-ItffffO. 17 LDI TEMP (EXP) 18 LD2 ACC(EXP) 19 DEC2 -Q,l r12 e e, -e, + q. Z?O ENTX 0 .21 LDA ACC 22 SLA 1 rA+ fu. 23 CMPA FV(I : 5) 24 JL *+3 Jump if lful < IfJ. .25 SRA I Otherwise, scale f,, right 26 INC2 1 and increase r12 by 1. .27 DIV FV Divide. .28 JNOV NORM Normalize, round, and exit. 29 DVZRO HLT 3 Unnormalized or zero divisor 1 The most noteworthy feature of this program is the provision for division in lines 23-26, which is made in order to ensure enough accuracy to round the answer. If Iful < Ifv I, straightforward application of Algorithm M would leave a result of the form & 0 f f f f in register A, and this would not allow a proper rounding without a careful analysis of the remainder (which appears in register X). So the program computes fw + fu/fv in this case, ensuring that fw is either zero or normalized in all cases; rounding can proceed with five significant bytes, possibly testing whether the remainder is zero. We occasionally need to convert values between fixed and floating point representations. A fix-to-float routine is easily obtained with the help of the normalization algorithm above; for example, in MIX, the following subroutine converts an integer to floating point form: 01 FLOT STJ FXITF Assume that rA = u, an integer. 02 JOV OFLO Ensure overflow is off. 03 ENT2 Q+5 Set raw exponent. (10) 04 ENTX 0 05 JMP NORM Normalize, round, and exit. 1 A float-to-fix subroutine is the subject of exercise 14. The debugging of floating point subroutines is usually a difficult job, since there are so many cases to consider. Here is a list of common pitfalls that often trap a programmer or machine designer who is preparing floating point routines: 1) Losing the sign. On many machines (not MIX), shift instructions between registers will affect the sign, and the shifting operations used in normalizing and scaling numbers must be carefully analyzed. The sign is also lost frequently when minus zero is present. (For example, Program A is careful to retain the sign of register A in lines 30-34. See also exercise 6.)

204 ARITHMETIC 4.2.1 The rather long section of (Remote web server)

Tuesday, October 30th, 2007

204 ARITHMETIC 4.2.1 The rather long section of code from lines 25 to 37 is needed because MIX has only a 5-byte accumulator for adding signed numbers while in general 2p+ 1 = 9 places of accuracy are required by Algorithm A. The program could be shortened to about half its present length if we were willing to sacrifice a little bit of its accuracy, but we shall see in the next section that full accuracy is important. Line 55 uses a nonstandard MIX instruction defined in Section 4.5.2. The running time for floating point addition and subtraction depends on several factors that are analyzed in Section 4.2.4. Now let us consider multiplication and division, which are simpler than addition, and which are somewhat similar to each other. Algorithm M (Floating point multiplication or division). Given base b, excess 9, pdigit, normalized floating point numbers u = (eu, fiL) and v = (e,, fv), this algorithm forms the product w = u @ v or the quotient w = u @ v. Ml. [Unpack.] Separate the exponent and fraction parts of the representations of LL and V. (Sometimes it is convenient, but not necessary, to test the operands for zero during this step.) M2. [Operate.] Set . . . e, + e, + ev - 4, fw + fu fv for multiplication; (g) ew +- e, -e, + q + 1, fW c (F1fu)/fV for division. (Since the input numbers are assumed to be normalized, it follows that either fW = 0, or l/b* 5 If,,,] < 1, or a division-by-zero error has occurred.) If necessary, the representation of fW may be reduced to p + 2 or p + 3 digits at this point, as in exercise 5. M3. [Normalize.] Perform Algorithm N on (e,, fW) to normalize, round, and pack the result. (Note: Normalization is simpler in this case, since scaling left occurs at most once, and since rounding overflow cannot occur after division.) 1 The following MIX subroutines, which are intended to be used in connection with Program A, illustrate the machine considerations necessary in connection with Algorithm M. Program M (Floating point multiplication and division). 01 Q EQU BYTE/2 q is half the byte size 02 FMUL STJ EXITF Floating point multiplication subroutine: 03 JOV OFLO Ensure overflow is off. 04 STA TEMP TEMP + w. 0.5 LDX ACC rx +-u. 06 STX FU(O:4) FU+ztffffO. 07 LDI TEMP (EXP) 08 LD2 ACC@XP) 09 INC2 -&,I r12 +- e, + ev -q. 10 SLA 1 11 MUL FU Multiply fu times fv. 12 JMP NORM Normalize, round, and exit.

Web proxy server - 4.2.1 SINGLE-PRECISION CALCULATIONS 203 19 4H INCI 0,2

Tuesday, October 30th, 2007

4.2.1 SINGLE-PRECISION CALCULATIONS 203 19 4H INCI 0,2 20 5H LDA FV a ENTX 0 22 SRAX 0,l 23 6H ADD FU 24 JOV N4 25 JXZ NORM 26 CMPA =O=(l:l) 27 JNE N5 28 SRC 5 29 DECX 1 30 STA TEMP 31 STA HALF(O:O) 32 LDAN TEMP ADD HALF ADD HALF SRC 4 JMP N3A CON l//2 CON 0 CON 0 JAZ ZRO CMPA =O=(l:i) JNE N5 SLAX 1 DEC2 1 JMP N2 ENTX 1 SRC 1 INC2 1 CMPA =BYTE/2=(5:5) JL N6 JG 5F JXNZ 5F STA TEMP LDX TEMP(4:4) JXO N6 STA *+l(O:O) INCA BYTE JOV N4 J2N EXPUN ENTX 0,2 SRC I DEC2 BYTE STA ACC J2N * 33 34 35 36 37 HALF 38 FU 39 FV 40 NORM 41 N2 42 43 N3 44 N3A 45 46 N4 47 48 49 N5 50 51 52 53 54 55 56 5H 57 58 59 N6 60 N7 61 62 ZRO 63 8H 64 EXITF rI1 + e, -e,. (Step A4 unnecessary.) A5 Scale rig& Clear rX. Shift right e, -e, places. A6. Add. A7. Normdize. Jump if fraction overflow. Easy case? Is f normalized? If so, round it. IrXl ++ (rA[. (rX is positive.) (Operands had opposite signs, registers must be adjusted before rounding and normalization.) Complement least significant portion. Jump into normalization routine. One half the word size (Sign varies) Fraction part fU Fraction part fv Nl. Test fL N2. Is f normalized? To N5 if leading byte nonzero. N3. Scale left. Decrease e by 1. Return to N2. N4. Scale rig& Shift right, insert 1″ with proper sign. Increase e by 1. N5. Round. Is [tail1 < ib? Is ltaill > +b? Jtaill = 4 b; round to odd. To N6 if rX is odd. Store sign of rA. Add bd4 to IfI. (Sign varies) Check for rounding overflow. N6. Check e. Underflow if e < 0. N7. Pack. rX + e. r12 + e -b. Exit, unless e 2 b. 65 EXPOV HLT 2 Exponent overflow detected 66 EXPUN HLT 1 Exponent underflow detected 67 ACC CON 0 Floating point accumulator 1

202 ARITHMETIC 4.2.1 be expressed as a normalized (Web and email hosting)

Monday, October 29th, 2007

202 ARITHMETIC 4.2.1 be expressed as a normalized floating point number in the required range, special action is necessary.) N7. [Pack.] Put e and f together into the desired output representation. 1 Some simple examples of floating point addition are given in exercise 4. The following MIX subroutines, for addition and subtraction of numbers having the form (4), show how Algorithms A and N can be expressed as computer programs. The subroutines below are designed to take one input u from symbolic location ACC, and the other input v comes from register A upon entrance to the subroutine. The output w appears both in register A and location ACC. Thus, a fixed point coding sequence LDA A; ADD B; SUB C; STA D (7) would correspond to the floating point coding sequence LDA A, STA ACC; LDA B, JMP FADD; LDA C, JMP FSUEl; STA D. (8) Program A (Addition, subtraction, and normalization). The following program is a subroutine for Algorithm A, and it is also designed so that the normalization portion can be used by other subroutines that appear later in this section. In this program and in many others throughout this chapter, OFLO stands for a subroutine that prints out a message to the effect that MIX s overflow toggle was unexpectedly found to be on. The byte size b is assumed to be a multiple of 4. The normalization routine NORM assumes that r12 = e and rAX = f, where rA = 0 implies rX = 0 and r12 < b. 00 BYTE EQU 1(4:4) Byte size b 01 EXP E&U I:1 Definition of exponent field 0.2 FSUB STA TEMP Floating point subtraction subroutine: 03 LDAN TEMP Change sign of operand. 04 FADD STJ EXITF Floating point addition subroutine: 05 JOV OFLO Ensure overflow is off. 06 STA TEMP TEMP +-v. 07 LDX ACC rX + 21. 08 CMPA ACC @XI ) Steps Al. A2, A3 are combined here: 09 JGE IF Jump if e, 2 e,. 10 STX FU (0 : 4) FU+ fffffo. 11 LD2 ACC@XP) r12 + e,. 12 STA FV (0 : 4) 13 LDIN TEMP @XI ) rI1 + -e,. 1-4 JMP 4F 15 IH STA FU (0 : 4) FU t f f f f f 0 (u, v interchanged). 16 LD2 TEMP (EXP) r12 + e,. 17 STX FV (0 : 4) 18 LDlN ACC (!ZXF ) rI1 t -e,.

4.2.1 SINGLE-PRECISION (Cpanel web hosting) CALCULATIONS 201 N4. Scale right Nl.

Sunday, October 28th, 2007

4.2.1 SINGLE-PRECISION CALCULATIONS 201 N4. Scale right Nl. underflow N7. Pack Fig. 3. Normalization of (e, f). base-b digits to the right of the radix point. If such a large accumulator is not available, it is possible to shorten the requirement to p + 2 or p + 3 places if proper precautions are taken; the details are given in exercise 5.1 A6. [Add.] Set fw +- fU + fV. A7. [Normalize.] (At this point (G,,, fiu) represents the sum of u and 21, but lfwj may have more than p digits, and it may be greater than unity or less than l/b.) Perform Algorithm N below, to normalize and round (ezu, jiu) into the final answer. 1 Algorithm N (Normalization). A raw exponent e and a raw fraction f are converted to normalized form, rounding if necessary to p digits. This algorithm assumes that ]f 1 < b. Nl. [Test f .] If If] 2 1 ( fraction overflow ), go to step N4. If f = 0, set e to its lowest possible value and go to step N7. N2. [Is f normalized?] If ] f ] 2 l/b, go to step N5. N3. [Scale left.] Shift f to the left by one digit position (i.e., multiply it by b), and decrease e by 1. Return to step N2. N4. [Scale right.] Shift f to the right by one digit position (i.e., divide it by b), and increase e by 1. N5. [Round.] Round f to p places. (We take this to mean that f is changed to the nearest multiple of b-p. It is possible that (bpf)modl = f so that there are two nearest multiples; if b is even, we choose the one that makes bPf + $b odd. Further discussion of rounding appears in Section 4.2.2.) It is important to note that this rounding operation can make If] = 1 ( rounding overflow ); in such a case, return to step N4. N6. [Check e.] If e is too large, i.e., larger than its allowed range, an exponent overflow condition is sensed. If e is too small, an exponent underflow condition is sensed. (See the discussion below; since the result cannot

Web hosting providers - 200 ARITHMETIC 4.2.1 Al. Unpack i-; A7. Normalize

Sunday, October 28th, 2007

200 ARITHMETIC 4.2.1 Al. Unpack i-; A7. Normalize I Fig. 2. Floating point addition. Note: Since floating point arithmetic is inherently approximate, not exact, we will use round symbols $7 8, 8, 0 to stand for ffoating point addition, subtraction, multiplication, and division, respectively, in order to distinguish approximate operations from the true ones. The basic idea involved in floating point addition is fairly simple: Assuming that eu 2 e,, we take e, = eu, fW = fu. + fV/beu+~ (thereby aligning the radix points for a meaningful addition), and normalize the result. Several situations can arise that make this process nontrivial, and the following algorithm explains the method more precisely. Algorithm A (Floating point addition). Given base b, excess 4, p-digit, normalized floating point numbers u = (ezL, fu) and Y = (e,, fV), this algorithm forms the sum w = u$ w. The same procedure may be used for floating point subtraction, if -v is substituted for V. Al. [Unpack.] Separate the exponent and fraction parts of the representations of u and v. A2. [Assume e, 2 e,.] If e, < e,, interchange u and w. (In many cases, it is best to combine step A2 with step Al or with some of the later steps.) A3. [Set e,.] Set e, c ezL . A4. [Test e, -e, .] If e, -e, 2 p+ 2 (large difference in exponents), set fW c fu. and go to step A7. (Actually, since we are assuming that ZL is normalized, we could terminate the algorithm; but it is occasionally useful to be able to normalize a possibly unnormalized number by adding zero to it.) A5. [Scale right.] Shift f,, to the right e, -e, places; i.e., divide it by Vu- u. [Note: This will be a shift of up to p + 1 places, and the next step (which adds fiL to fV) thereby requires an accumulator capable of holding 2p + 1

4.2.1 SINGLE-PRECISION CALCULATIONS 199 used for this purpose, (Web hosting services)

Saturday, October 27th, 2007

4.2.1 SINGLE-PRECISION CALCULATIONS 199 used for this purpose, notably characteristic and mantissa ; but it is an abuse of terminology to call the fraction part a mantissa, since this concept has quite a different meaning in connection with logarithms. Furthermore the English word mantissa means a worthless addition. ) The MIX computer assumes that its floating point numbers have the form (4 Here we have base b, excess q, floating point notation with four bytes of precision, where b is the byte size (e.g., b = 64 or b = loo), and q is equal to [$b]. The fraction part is f j j j j, and e is the exponent, which lies in the range 0 2 e < b. This internal representation is typical of the conventions in most existing computers, although b is a much larger base than usual. B. Normalized calculations. A floating point number (e, j) is normalized if the most significant digit of the representation of f is nonzero, so that l/b I VI < 1; (5) or if j = 0 and e has its smallest possible value. It is possible to tell which of two normalized floating point numbers has a greater magnitude by comparing the exponent parts first, and then testing the fraction parts only if the exponents are equal. Most floating point routines now in use deal almost entirely with normalized numbers: inputs to the routines are assumed to be normalized, and the outputs are always normalized. Under these conventions we lose the ability to represent a few numbers of very small magnitude-for example, the value (0, .OOOOOOOl) can t be normalized without producing a negative exponent-but we gain in speed, uniformity, and the ability to give relatively simple bounds on the relative error in our computations. (Unnormalized floating point arithmetic is discussed in Section 4.2.2.) Let us now study the normalized floating point operations in detail. At the same time we can consider the construction of subroutines for these operations, assuming that we have a computer without built-in floating point hardware. Machine-language subroutines for floating point arithmetic are usually writ- ten in a very machine-dependent manner, using many of the wildest idiosyn- crasies of the computer at hand; so floating point addition subroutines for two different machines usually bear little superficial resemblance to each other. Yet a careful study of numerous subroutines for both binary and decimal computers reveals that these programs actually have quite a lot in common, and it is possible to discuss the topics in a machine-independent way. The first (and by far the most difficult!) algorithm we shall discuss in this section is a procedure for floating point addition, (eu, .fd $ (euj .fv) = (k, full. (6)

Medical web site - 198 ARITHMETIC 4.2 4.2. FLOATING POINT ARITHMETIC IN

Friday, October 26th, 2007

198 ARITHMETIC 4.2 4.2. FLOATING POINT ARITHMETIC IN THIS SECTION, we shall study the basic principles of doing arithmetic on floating point numbers, by analyzing the internal mechanisms underlying such calculations. Perhaps many readers will have little interest in this subject, since their computers either have built-in floating point instructions or their computer manufacturer has supplied suitable subroutines. But, in fact, the material of this section should not merely be the concern of computer-design engineers or of a small clique of people who write library subroutines for new machines; every well-rounded programmer ought to have a knowledge of what goes on during the elementary steps of floating point arithmetic. This subject is not at all as trivial as most people think; it involves a surprising amount of interesting information. 4.2.1. Single-Precision Calculations A. Floating point notation. We have discussed fixed point notation for numbers in Section 4.1; in such a case the programmer knows where the radix point is assumed to lie in the numbers he manipulates. For many purposes it is considerably more convenient to let the position of the radix point be dynamically variable or floating as a program is running, and to carry with each number an indication of its current radix point position. This idea has been used for many years in scientific calculations, especially for expressing very large numbers like Avogadro s number N = 6.02252 x 1023, or very small numbers like Planck s constant h = 1.0545 X 1O-27 erg sec. In this section we shall work with base b, excess q, Aoating point numbers with p digits: Such numbers will be represented by pairs of values (e, f), denoting (e, f) = f x bepq. 0) Here e is an integer having a specified range, and f is a signed fraction. We will adopt the convention that Ifl < 1; in other words, the radix point appears at the left of the positional representation of f. More precisely, the stipulation that we have pdigit numbers means that bpf is an integer, and that -bP < bPf < bP. (2) The term floating binary implies that b = 2, floating decimal implies b = 10, etc. Using excess-50 floating decimal numbers with 8 digits, we can write, for example, Avogadro s number N = (74, +.60225200); Planck s constant h = (24, +.10545000). (3) The two components e and f of a floating point number are called the exponent and the fraction parts, respectively. (Other names are occasionally

4.1 POSITIONAL NUMBER SYSTEMS 197 b 31. [M3.5]

Thursday, October 25th, 2007

4.1 POSITIONAL NUMBER SYSTEMS 197 b 31. [M3.5] A generalization of two s complement arithmetic, called 2-adic numbers, was invented about 1900 by K. Hensel. (In fact he treated p-adic numbers, for any prime p.) A 2-adic number may be regarded as a binary number u = (. .u3u2u1u0.u-1.. .U-&, whose representation extends infinitely far to the left, but only finitely many places to the right, of the binary point. Addition, subtraction, and multiplication of 2-adic numbers are done according to the ordinary procedures of arithmetic, which can in principle be extended indefinitely to the left. For example, 7=(… 000000000000111)2 ) = (. . .110110110110111)2 -7=(… 111111111111001)2 -f = (. . .001001001001001)2 $ = ( .000000000000001.11)2 & = (. .110011001100110.1)2 J–7 = (. .100000010110101)~ or (. .011111101001011)~ Here 7 appears as the ordinary binary integer seven, while -7 is its two s comple- ment (extending infinitely to the left); it is easy to verify that the ordinary procedure for addition of binary numbers will give -7 + 7 = ( . 00000)s = 0, when the procedure is continued indefinitely. The values of 3 and -+ are the unique 2-adic numbers that, when formally multiplied by 7, give 1 and -1, respectively. The values of 3 and & are examples of 2-adic numbers that are not 2-adic integers, since they have nonzero bits to the right of the binary point. The two values of J-7, which are negatives of each other, are the only 2-adic numbers that, when formally squared, yield the value (. . .111111111111001)2. a) Prove that any 2-adic number u can be divided by any nonzero 2-adic number v to obtain a unique 2-adic number w satisfying u = VW. (Hence the set of 2-adic numbers forms a field ; cf. Section 4.6.1.) b) Prove that the 2-adic representation of the rational number -1/(2n + 1) may be obtained as follows, when n is a positive integer: First find the ordinary binary expansion of +1/(2n + l), which has the periodic form (O.cucucu.. .)z for some string o of O s and 1 s. Then -1/(2n + 1) is the 2-adic number (. . ocycy)~. c) Prove that the representation of a 2-adic number u is ultimately periodic (that is, ZLN+X = UN for all large N, for some x 2 1) if and only if u is rational (that is, u = m/n, for some integers m and n). d) Prove that, when n is an integer, fi is a 2-adic number if and only if it satisfies nmod22k+3 = 22k for some nonnegative integer k. (Thus, the possibilities are either n mod S = 1, or n mod 32 = 4, etc.) 32. [M40] (I. Z. Ruzsa.) Prove that there are infinitely many integers whose ternary representation uses only O s and l s and whose quinary representation uses only O s, l s, and 2 s. 33. [M40] (D. A. Klarner.) Let D be any set of integers, let b be any positive integer, and let k, be the number of distinct integers that can be written as n-digit numbers (a,-1 . . . alac)b to base b with digits a, in D. Prove that the sequence (k,) satisfies a linear recurrence relation, and explain how to compute the generating function C,, kd. Illustrate your algorithm in the case b = 3 and D = (-1, 0,3}.

196 ARITHMETIC 4.1 27. [M.Zl] Show that every

Thursday, October 25th, 2007

196 ARITHMETIC 4.1 27. [M.Zl] Show that every nonzero integer has a unique reversing binary representa- tion 2-3 -2el + . + (-1)9, where es < el < .. < et. b 28. [M24] Show that every nonzero complex number of the form a + bi where a and b are integers has a unique revolving binary representation (1 + i) + i(l + i) -(1 + i) -i(1 + i) + . . + ?(l + i) , where es < el < ... < et. (Cf. exercise 27.) 29. [M35] (N. G. de Bruijn.) Let SO, S1, Ss, be sets of nonnegative integers; we will say that the collection {SO, ,571, Sz, . } has Property B if every nonnegative integer n can be written in the form n = so + Sl + s2 + . , s3 E s,, in exactly one way. (Property B implies that 0 E S, for all j, since n = 0 can only be represented as 0 + 0 + 0 + . . . .) Any mixed-radix number system with radices bo, bl, bz, . provides an example of a collection of sets satisfying Property B, if we let S, = (0, B3, , (b, -l)B,}, where B, = bobI . . b,-1; here the representation of n = so + Sl + s2 + '.. corresponds in an obvious manner to its mixed-radix repre- sentation (9). Furthermore, if the collection {SO, Si, Sz, . } has Property B, and if Ao, AI, A2, . . is any partition of the nonnegative integers (so that we have A0 U A1 U A2 U . = (0, 1,2,. . } and A, n A3 = 0 for i # j; some A3 s may be empty), then the collapsed collection {TO, ZJ, 7 2, . } also has Property B, where ? , is the set of all sums x2eA s, taken over all possible choices of si E S,. Prove that any collection {TO, T1, 7 2, , } that satisfies Property B may be obtained by collapsing some collection {SO, S1 , S2, . . } that corresponds to a mixed-radix number system. 30. [A4391 (N. G. de Bruijn.) The radix-(-2) number system shows us that every integer (positive, negative, or zero) has a unique representation of the form (-2)el + (--2)Q + . . + (--2)Q, el > e2 > f.. > et 2 0, t 2 0. The purpose of this exercise is to explore generalizations of this phenomenon. a) Let bc, bl, b2, . . be a sequence of integers such that every integer n has a unique representation of the form n = beI + be, + . . + b,, , el > e2 > … > et 2 0, t > 0. (Such a sequence (b,) is called a binary basis. ) Show that there is an index j such that b, is odd, but bk is even for all Ic # j. b) Prove that a binary basis (bn) can always be rearranged into the form do, 2di, 4dz, . . . = (2nd,), where each dk is odd. c) If each of do, di, ds, . in (b) is fl, prove that (b,) is a binary basis if and only if there are infinitely many +1 s and infinitely many -1 s. d) Prove that 7, -13. 2, 7 3 2 , -13. 23, . . , 7. 22k, -13. 22k+1, . is a binary basis, and find the representation of n = 1.