
240 ARITHMETIC 4.2.4

gave good grounds for believing that the leading digit $d$ occurs with probability $\log_{10}(1 + 1/d)$. The same distribution was discovered empirically, many years later, by Frank Benford, who reported the results of 20,229 observations taken from different sources [Proc. Amer. Philosophical Soc. 78 (1938), 551-572].

In order to account for this leading-digit law, let's take a closer look at the way we write numbers in floating point notation. If we take any positive number $u$, its leading digits are determined by the value $(\log_{10} u) \bmod 1$: The leading digit is less than $d$ if and only if

$$(\log_{10} u) \bmod 1 < \log_{10} d, \qquad (1)$$

since $10 f_u = 10^{(\log_{10} u) \bmod 1}$.

Now if we have a random positive number $U$, chosen from some reasonable distribution that might occur in nature, we might expect that $(\log_{10} U) \bmod 1$ would be uniformly distributed between zero and one, at least to a very good approximation. (Similarly, we expect $U \bmod 1$, $U^2 \bmod 1$, $\sqrt{U + \pi} \bmod 1$, etc., to be uniformly distributed. We expect a roulette wheel to be unbiased, for essentially the same reason.) Therefore by (1) the leading digit will be 1 with probability $\log_{10} 2 \approx 30.103$ percent; it will be 2 with probability $\log_{10} 3 - \log_{10} 2 \approx 17.609$ percent; and, in general, if $r$ is any real value between 1 and 10, we ought to have $10 f_U \le r$ approximately $\log_{10} r$ of the time.

Another way to explain this law is to say that a random value $U$ should appear at a random point on a slide rule, according to the uniform distribution, since the distance from the left end of a slide rule to the position of $U$ is proportional to $(\log_{10} U) \bmod 1$. The analogy between slide rules and floating point calculation is very close when multiplication and division are being considered.

The fact that leading digits tend to be small is important to keep in mind; it makes the most obvious techniques of average error estimation for floating point calculations invalid.
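As a rough numerical check of the leading-digit law, the following Python sketch tabulates $\log_{10}(1 + 1/d)$ and compares it with observed frequencies. The function names (`benford_probability`, `leading_digit`, `simulate`) are illustrative and not taken from the text, and the sample is an assumption: $U$ is drawn so that $\log_{10} U$ is uniform over six decades, which makes $(\log_{10} U) \bmod 1$ uniform by construction rather than demonstrating that natural data behave this way.

```python
import math
import random

def benford_probability(d: int) -> float:
    """Probability log10(1 + 1/d) that the leading digit equals d (1 <= d <= 9)."""
    return math.log10(1.0 + 1.0 / d)

def leading_digit(u: float) -> int:
    """Leading decimal digit of a positive u, computed from (log10 u) mod 1."""
    frac = math.log10(u) % 1.0      # (log10 u) mod 1, normally in [0, 1)
    # 10**frac is 10 f_u, normally in [1, 10); min() guards against rounding up to 10.
    return min(9, int(10.0 ** frac))

def simulate(n: int = 100_000, seed: int = 1) -> list:
    """Observed leading-digit frequencies when log10 U is uniform on [0, 6)."""
    rng = random.Random(seed)
    counts = [0] * 10
    for _ in range(n):
        u = 10.0 ** rng.uniform(0.0, 6.0)   # assumption: log-uniform U
        counts[leading_digit(u)] += 1
    return [c / n for c in counts]

if __name__ == "__main__":
    observed = simulate()
    for d in range(1, 10):
        print(d, round(benford_probability(d), 5), round(observed[d], 5))
```

Because `leading_digit` works in binary floating point, a value that sits exactly on a digit boundary (such as 3 or 30) may be misclassified by one digit; for randomly drawn values this effect is negligible.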
Because small leading digits are the most common, the relative error due to rounding is usually a little more than expected.

Of course, it may justly be said that the heuristic argument above does not prove the stated law. It merely shows us a plausible reason why the leading digits behave the way they do. An interesting approach to the analysis of leading digits has been suggested by R. Hamming: Let $p(r)$ be the probability that $10 f_U \le r$, where $1 \le r \le 10$ and $f_U$ is the normalized fraction part of a random normalized floating point number $U$. If we think of random quantities in the real world, we observe that they are measured in terms of arbitrary units; and if we were to change the definition of a meter or a gram, many of the fundamental physical constants would have different values. Suppose then that all of the numbers in the universe are suddenly multiplied by a constant factor $c$; our universe of random floating point quantities should be essentially unchanged by this transformation, so $p(r)$ should not be affected.

Multiplying everything by $c$ has the effect of transforming $(\log_{10} U) \bmod 1$ into $(\log_{10} U + \log_{10} c) \bmod 1$. It is now time to set up formulas that describe the desired behavior; we may assume that $1 \le c \le 10$. By definition,

$$p(r) = \text{probability that } (\log_{10} U) \bmod 1 \le \log_{10} r.$$
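The invariance requirement can be tried out numerically. The Python sketch below estimates $p(r)$ from a sample, multiplies every value by a factor $c$, and estimates $p(r)$ again. The helper names (`log_uniform_sample`, `p_hat`, `rescale`) are hypothetical, and the test data are an assumption: the sample is generated so that $\log_{10} U$ is uniform on $[0, 6)$, so this only illustrates that such a distribution is unaffected by scaling; it is not a derivation of $p(r)$.

```python
import math
import random

def log_uniform_sample(n=100_000, seed=2):
    """Assumed test data: n values whose log10 is uniform on [0, 6)."""
    rng = random.Random(seed)
    return [10.0 ** rng.uniform(0.0, 6.0) for _ in range(n)]

def p_hat(sample, r):
    """Estimate p(r): fraction of the sample with (log10 u) mod 1 <= log10 r."""
    threshold = math.log10(r)
    hits = sum(1 for u in sample if math.log10(u) % 1.0 <= threshold)
    return hits / len(sample)

def rescale(sample, c):
    """Multiply every value by the constant factor c (a change of units)."""
    return [c * u for u in sample]

if __name__ == "__main__":
    sample = log_uniform_sample()
    for c in (1.0, 3.0, 7.5):
        scaled = rescale(sample, c)
        # Each row should be close to log10(2), log10(5), log10(9).
        print(c, [round(p_hat(scaled, r), 3) for r in (2, 5, 9)])
```

For this sample the estimates should stay close to $\log_{10} r$ whatever factor $c$ is applied, up to sampling noise.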
