4.2.3 DOUBLE-PRECISION CALCULATIONS 237

For extension of the methods of this section to triple-precision floating point fraction parts, see Y. Ikebe, CACM 8 (1965), 175-177.

EXERCISES

1. [16] Try the double-precision division technique by hand, with $\epsilon = \frac{1}{1000}$, when dividing 180000 by 314159. (Thus, let $(u_m, u_l) = (.180, .000)$ and $(v_m, v_l) = (.314, .159)$, and find the quotient using the method suggested in the text following (2).)

2. [20] Would it be a good idea to insert the instruction 'ENTX 0' between lines 30 and 31 of Program M, in order to keep unwanted information left over in register X from interfering with the accuracy of the results?

3. [M20] Explain why overflow cannot occur during Program M.

4. [22] How should Program M be changed so that extra accuracy is achieved, essentially by moving the vertical line in Fig. 4 over to the right one position? Specify all changes that are required, and determine the difference in execution time caused by these changes.

▶ 5. [24] How should Program A be changed so that extra accuracy is achieved, essentially by working with a nine-byte accumulator instead of an eight-byte accumulator to the right of the decimal point? Specify all changes that are required, and determine the difference in execution time caused by these changes.

6. [23] Assume that the double-precision subroutines of this section and the single-precision subroutines of Section 4.2.1 are being used in the same main program. Write a subroutine that converts a single-precision floating point number into double-precision form (1), and write another subroutine that converts a double-precision floating point number into single-precision form (reporting exponent overflow or underflow if the conversion is impossible).

▶ 7. [M30] Estimate the accuracy of the double-precision subroutines in this section, by finding bounds $\delta_1$, $\delta_2$, and $\delta_3$ on the relative errors

$$\left|\frac{(u \oplus v) - (u + v)}{u + v}\right|, \qquad \left|\frac{(u \otimes v) - (u \times v)}{u \times v}\right|, \qquad \left|\frac{(u \oslash v) - (u / v)}{u / v}\right|.$$

8. [M28] Estimate the accuracy of the improved double-precision subroutines of exercises 4 and 5, in the sense of exercise 7.

9. [M42] T. J. Dekker [Numer. Math. 18 (1971), 224-242] has suggested an alternative approach to double precision, based entirely on single-precision floating binary calculations. For example, Theorem 4.2.2C states that $u + v = w + r$, where $w = u \oplus v$ and $r = (u \ominus w) \oplus v$, if $|u| \ge |v|$ and the radix is 2; here $|r| \le |w|/2^p$, so the pair $(w, r)$ may be considered a double-precision version of $u + v$. To add two such pairs $(u, u') \oplus (v, v')$, where $|u'| \le |u|/2^p$ and $|v'| \le |v|/2^p$ and $|u| \ge |v|$, Dekker suggests computing $u + v = w + r$ (exactly), then $s = (r \oplus v') \oplus u'$ (an approximate remainder), and finally returning the value $(w \oplus s, (w \ominus (w \oplus s)) \oplus s)$. Study the accuracy and efficiency of this approach when it is used recursively to produce quadruple-precision calculations.
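The pair-addition steps described in exercise 9 can be tried out with a minimal Python sketch. It is only an illustration under stated assumptions: Python floats are taken to be IEEE binary64 values (radix 2, p = 53) with round-to-nearest arithmetic standing in for the single-precision operations, and the names `fast_two_sum` and `pair_add` are hypothetical, not taken from the text or from Dekker's paper. The sketch does not attempt the accuracy analysis that the exercise asks for.

```python
from fractions import Fraction


def fast_two_sum(u, v):
    """Return (w, r) with w = fl(u + v) and r = fl(fl(u - w) + v).

    Assumes |u| >= |v| and binary round-to-nearest arithmetic, the
    conditions under which the text states u + v = w + r.
    """
    w = u + v          # rounded sum
    r = (u - w) + v    # rounding error of that sum
    return w, r


def pair_add(u, u1, v, v1):
    """Add the pairs (u, u1) and (v, v1) by the steps quoted in exercise 9.

    Assumes |u| >= |v|, with u1 and v1 small relative to u and v.
    """
    w, r = fast_two_sum(u, v)   # u + v = w + r
    s = (r + v1) + u1           # approximate remainder
    hi = w + s
    lo = (w - hi) + s
    return hi, lo


if __name__ == "__main__":
    # Check with exact rational arithmetic that no information is lost.
    w, r = fast_two_sum(0.2, 0.1)
    assert Fraction(w) + Fraction(r) == Fraction(0.2) + Fraction(0.1)

    # Two pairs whose low parts are far below one unit in the last place.
    print(pair_add(1.0, 2.0**-60, 1.0, 2.0**-60))
```

The `Fraction` comparison is there because ordinary float comparison cannot show that `w + r` reproduces `u + v` without rounding; converting each float to an exact rational makes the check meaningful.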