Archive for September, 2007

3.5 WHAT IS A RANDOM SEQUENCE? (Linux web host) 149 Thus

Monday, September 17th, 2007

3.5 WHAT IS A RANDOM SEQUENCE? 149 Thus a k-distributed sequence is the special case m = 1 in Definition E; the case m = 2 means that the k-tuples starting in even positions must have the same density as the k-tuples starting in odd positions, etc. Several properties of Definition E are obvious: An (m, k)-distributed sequence is (m, n)-distributed for 1 5 rc 5 k. (12) An (m, k)-distributed sequence is (d, k)-distributed for all divisors d of m. (13) We can also define the concept of an (m, k)-distributed bary sequence, as in Definition D; and the proof of Theorem A remains valid for (m, k)-distributed sequences. The next theorem, which is in many ways rather surprising, shows that the property of being co-distributed is very strong indeed, much stronger than we imagined it to be when we first considered the definition of the concept. Theorem C (Ivan Niven and H. S. Zuckerman). An co-distributed sequence is (m, k)-distributed for all positive integers m and k. Proof. It suffices to prove the theorem for bary sequences, by using the gen- eralization of Theorem A just mentioned. Furthermore, we may assume that m = k, because (12) and (13) tell us that the sequence will be (m, k)-distributed if it is (mk, mk)-distributed. So we will prove that any m-distributed b-ary sequence X0, X1, . . . is (m, m)- distributed for all positive integers m. Our proof is a simplified version of the original one given by Niven and Zuckerman in Pacific J. Math. 1 (1951), 103-109. The key idea we shall use is an important technique that applies to many situations in mathematics: If the sum of m quantities and the sum of their squares are both consistent with the hypothesis that the m quantities are equal, then that hypothesis is true. In a strong form, this principle may be stated as follows: Lemma E. Given m sequences of numbers (yjn) = yjlj~, ~71, . . . for 1 2 j 2 m, suppose that lim (YM + ~2~ + . . . + ymn) = ma, n+00 (14) lim sup (Y:, + yin + . . -+ yk.,) I mff . n-cc Then for each j, lim,,, yjn exists and equals cy. An incredibly simple proof of this lemma is given in exercise 9. 1 Resuming our proof of Theorem C, let z = zlz2.. . zm be a bary number, and say that 5 occurs at position p if Xp–m+lXp–m+2.. .X, = 2. Let ~j(n) be the number of occurrences of x at position p when p < n and pmodm = j. Let yyn = vj(n)/ n; we wish to prove that lim yjn = l/mbm. (15) n+m

Web hosting domain - 148 RANDOM NUMBERS 3.5 We can also show

Sunday, September 16th, 2007

148 RANDOM NUMBERS 3.5 We can also show that the serial correlation test is satisfied: Corollary S. If a [ 0,l) sequence is (k + 1)-distributed, the serial correlation coefficient between U, and Un+k tends to zero: lim k c UjUj+k -($ c uj)(k c Uj+k) n oo d(~cU32-(~~uj)2)(~CUjZ+k-(~Cuj+k)2) = * (All summations here are for 0 5 j < n.) Proof. By Theorem B, the quantities tend to the respective limits 4, 6, 6, 4, 4 as 72 + 00. 1 Let us now consider some slightly more general distribution properties of sequences. We have defined the notion of /c-distribution by considering all of the adjacent k-tuples; for example, a sequence is a-distributed if and only if the points @Jo, w, (Ul, u2>, 672, U3), 073, U4), P4, u51, . . . are equidistributed in the unit square. It is quite possible, however, that this can happen while alternate pairs of points (UI, Uz), (Us, U4), (Us, UG), . . . are not equidistributed; if the density of points (U2n–1, Uzn) is deficient in some area, the other points (Uz,, U2n+l) might compensate. For example, the periodic binary sequence (Xn) = o,o,o, 1, o,o,o, 1, l,l,O, 1, l,l,O, 1, o,o,o, 1, . . . , (11) with a period of length 16, is seen to be 3-distributed; yet the sequence of even- numbered elements (Xsn) = 0, 0, 0, 0, 1, 0, 1, 0, . . . has three times as many zeros as ones, while the subsequence of odd-numbered elements (Xan+i) = 0, 1, 0, 1, 1, 1, 1, 1, . . . has three times as many ones as zeros. If a sequence (Un) is oo-distributed, example (11) shows that it is not at all obvious that the subsequence of alternate terms (Uzn) = UO, U2, U4, Us, . . . will be co-distributed or even l-distributed. But we shall see that (Uzn) is, in fact, m-distributed, and much more is true. Definition E. A [ 0,l) sequence (Un) is said to be (m, k)-distributed if Pr( l 5 Umn+j < %r u2 2 Umn+j+l < 212, . . . , uk 5 Umn+j+k-1 < vk) = (211 -u1) . . . ( ?& -t k) for all choices of red numbers ur, V~ with 0 2 u7 < v, 5 1 for 1 < r 5 k, and for a1J integers j with 0 5 j < m.

3.5 WHAT IS A RANDOM SEQUENCE? (Web hosting billing) 147 Proof.

Saturday, September 15th, 2007

3.5 WHAT IS A RANDOM SEQUENCE? 147 Proof. The definition of a k-distributed sequence states that this result is true in the special case that 1, if u1 5 z1 < wl,.. .,uk 2 xk < vk; f(x1, . . . , xk) = (9) 0, otherwise. Therefore Eq. (8) is true whenever f = al fl +asf~ +. . . +a,fm and when each fj is a function of type (9); in other words, Eq. (8) holds whenever f is a step- function obtained by (i) partitioning the unit k-dimensional cube into subcells whose faces are parallel to the coordinate axes, and (ii) assigning a constant value to f on each subcell. Now let f be any Riemann-integrable function. If E is any positive number, we know (by the definition of Riemann-integrability) that there exist step func- tions f and 7 such that f(zl,. . . , zk) 2 f(xl, . . . , zk) 5 7(x1,. . . , xk), and such that the difference of the integrals of f and 7 is less than 6. Since Eq. (8) holds for f and 7, and since - we conclude that Eq. (8) is true also for f. 1 Theorem B can be applied, for example, to the permutation test of Section 3.3.2. Let (pl,pz,. .., pk) be any permutation of the numbers {1,2,. . . , k}; we want to show that pr(un+p~–l < &1+~~-1 < . . . < un+pk-l) = l/k!. (10) To prove this, assume that the sequence (Un) is k-distributed, and let 1, if xpl < xpz < . < xpr; f(x1,. . . , xk) = 0, otherwise. We have pr(un-tpl-1 < Un+pz–l < *. . < un+pk-l) -I J f(xl, . . . . . . , xk) dXl dXk ~;:;~p~~zpk…lx’” dq,,f2 dxpl = ;. Corollary P. If a [ 0,l) sequence is k-distributed, it satisfies the permutation test of order k, in the sense of Eq. (10). 1

Web hosting reseller - 146 RANDOM NUMBERS 3.5 Theorem A. Let (Un)

Saturday, September 15th, 2007

146 RANDOM NUMBERS 3.5 Theorem A. Let (Un) = UO, VI, UZ, . . . be a [ 0,l) sequence. If the sequence (1bjG.J) = lbjUoJ, Ly&J, 14U2J, . . . is a k-distributed bj-ary sequence for all bj in an infinite sequence of integers 1 < bl < bz < b3 < +. ., then the original sequence (Un) is k-distributed. As an example of this theorem, suppose that bj = 2j. The sequence [2Wo], 12jU1J, . . . is essentially the sequence of the first j bits of the binary representations of UO, VI, . . . . If all these integer sequences are k-distributed, in the sense of Definition D, then the real-valued sequence UO, UI, . . . must also be k-distributed in the sense of Definition B. Proof of Theorem A. If the sequence [bU,-,J, LbUlJ, . . . is k-distributed, it follows by the addition of probabilities that Eq. (5) holds whenever each Uj and V, is a rational number with denominator b. Now let Uj, VU~ be any real numbers, and let ~5, v$ be rational numbers with denominator b such that U> 5 Uj < IL; + l/b, 215 2 Vj < 215 + l/b. Let S(n) be the statement that u1 5 U, < VI, . . . , uk 5 Un+k-l < vk. we have FS(n)) I Pr 4 I U, < vi + f , . . . , ?k 5 h+k-1 < 21; + f 1 = v;-u;– b NOW I($ - U: f l/b) -(Vj -Uj)J 5 2/b; since our inequalities hold for all b = bj, and since bj + COas j + 00, we have (vl -ul). . . (vk -uk) 5 i?@(n)) 5 E(S(n)) 5 (Vl -Ul). . . (vk -uk). 1 The next theorem is our main tool for proving things about k-distributed sequences. Theorem B. Let (Un) be a k-distributed [ 0,l) sequence, and Jet f(zl, ~2,. . . , zk) be a Riemann-integrable function of k variables; then h c f(uj,Uj+l,…rUj+k–l) n-+m n OIj

Free web space - 3.5 WHAT IS A RANDOM SEQUENCE? 145 sequence

Friday, September 14th, 2007

3.5 WHAT IS A RANDOM SEQUENCE? 145 sequence 3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, . . . . It has been conjectured that this sequence is co-distributed, but nobody has yet been able to prove that it is even l-distributed. Let us analyze these concepts a little more closely in the case when Ic equals a million. A binary sequence that is lOOOOOO-distributed is going to have runs of a million zeros in a row! Similarly, a [ 0,l) sequence that is lOOOOOO-distributed is going to have runs of a million consecutive values each of which is less than 4. It is true that this will happen only (4) loooooo of the time, on the average, but the fact is that it does happen. Indeed, this phenomenon will occur in any truly random sequence, using our intuitive notion of truly random. One can easily imagine that such a situation will have a drastic effect if this set of a million truly random numbers is being used in a computer-simulation experiment; there would be good reason to complain about the random number generator. However, if we have a sequence of numbers that never has runs of a million consecutive U s less than f , the sequence is not random, and it will not be a suitable source of numbers for other conceivable applications that use extremely long blocks of U s as input. In summary, a truly random sequence will exhibit local nonrandomness. Local nonrandomness is necessary in some applications, but it is disastrous in others. We are forced to conclude that no sequence of random numbers can be adequate for every application. In a similar vein, one may argue that there is no way to judge whether a finite sequence is random or not; any particular sequence is just as likely as any other one. These facts are definitely stumbling blocks if we are ever to have a useful definition of randomness, but they are not really cause for alarm. It is still possible to give a definition for the randomness of infinite sequences of real numbers in such a way that the corresponding theory (viewed properly) will give us a great deal of insight concerning the ordinary finite sequences of rational numbers that are actually generated on a computer. Furthermore, we shall see later in this section that there are several plausible definitions of randomness for finite sequences. B. co-distributed sequences. Let us now undertake a brief study of the theory of sequences that are co-distributed. To describe the theory adequately, we will need to use a bit of higher mathematics, so we assume in the remainder of this subsection that the reader knows the material ordinarily taught in an advanced calculus course. First it is convenient to generalize Definition A, since the limit appearing there does not exist for all sequences. Let us define Fr(S(n)) = lim sup (~(n)/n), &(S(n)) = lim&f (~(n)/n). (7) 72-00 Then Pr(S(n)), i f i t exists, is the common value of Pr(S(n)) and E(S(n)). We have seen that a k-distributed [ 0,l) sequence leads to a k-distributed bary sequence, if U is replaced by [MY]. Our first theorem shows that a converse result is also true.

144 RANDOM NUMBERS 3.5 general, for any positive (Web design service)

Thursday, September 13th, 2007

144 RANDOM NUMBERS 3.5 general, for any positive integer k we can require our sequence to be k-distributed in the following sense: Definition B. The sequence (1) is said to be k-distributed if Pr(ul < U, . . . (vk -uk) (5) for a11 choices of real numbers Uj, vj, with 0 2 Uj < 1-j < 1, for 1 5 j 5 k. An equidistributed sequence is a l-distributed sequence. Note that if k > 1, a k-distributed sequence is always (k -1)-distributed, since we may set uk = 0 and vk = 1 in Eq. (5). Thus, in particular, any sequence that is known to be 4-distributed must also be 3-distributed, a-distributed, and equidistributed. We can investigate the largest k for which a given sequence is k-distributed; and this leads us to formulate Definition C. A sequence is said to be co-distributed if it is k-distributed for all positive integers k . So far we have considered [ 0,l) sequences, i.e., sequences of real numbers lying between zero and one. The same ideas apply to integer-valued sequences; let us say a sequence (Xn) = X0, X1, X2, . . . is a bary sequence if each X, is one of the integers 0, 1, . . . , b - 1. Thus, a 2-ary (binary) sequence is a sequence of zeros and ones. We also say that a k-digit bary number is a string of k integers ~1~2 . . . xk, where 0 5 xj < b for 1 5 j 5 k. Definition D. A b-ary sequence is said to be k-distributed if Pr(x,&+1.. .&f&l = X1X2.. . xk) = l/bk (6) for all b-ary numbers x1x2 . . . xk. It is clear from this definition that if U,-,, VI, . . . is a k-distributed [ 0,l) sequence, then the sequence LbUoJ, [bUIJ, . . . is a k-distributed bary sequence. (If we set Uj = xj/b, vj = (Zj + 1)/b, X, = [bun], Eq. (5) becomes Eq. (6).) Furthermore, every k-distributed bary sequence is also (k -1)-distributed, if k > 1: we add together the probabilities for the bary numbers xl.. . xk-10, Xl . . . xk-1 1, . . . , xl . . . xk-1 (b -1) to obtain Pr(x, . . .a&+&2 = X1 . . .x&l) = l/b - . (Probabilities for disjoint events are additive; see exercise 5.) It therefore is natural to speak of an co-distributed bary sequence, as in Definition C above. The representation of a positive real number in the radix-b number system may be regarded as a bary sequence; for example, 7r corresponds to the lo-ary

3.5 WHAT IS A RANDOM SEQUENCE? 143 an (Hosting your own web site)

Thursday, September 13th, 2007

3.5 WHAT IS A RANDOM SEQUENCE? 143 an adequate definition of randomness according to these criteria, although many interesting questions remain to be answered. Let u and u be real numbers, 0 5 u < u 2 1. If U is a random variable that is uniformly distributed between 0 and 1, the probability that u 5 U < w is equal to ZI - u. For example, the probability that 3 5 U < $ is 4. How can we translate this property of the single number U into a property of the infinite sequence UO, VI, Uz, . . . ? The obvious answer is to count how many times U, lies between u and w, and the average number of times should equal w - u. Our intuitive idea of probability is based in this way on the frequency of occurrence. More precisely, let v(n) be the number of values of j, 0 5 j < n, such that u 5 Uj < v; we want the ratio v(n)/n to approach the value v - u as n approaches infinity: lim V(n)/n = 2, - u. (4 n-00 If this condition holds for all choices of u and v, the sequence is said to be equidistributed. Let S(n) be a statement about the integer n and the sequence VI, U2, . . . ; for example, S(n) might be the statement considered above, u < U, < v. We can generalize the idea used in the preceding paragraph to define the probability that S(n) is true with respect to a particular infinite sequence: Let v(n) be the number of values of j, 0 5 j < n, such that S(j) is true. Definition A. We say Pr(S(n)) = X, if limn–rm u(n)/n = X. (Read, The probability that S(n) is true equals X, if the limit as n tends to infinity of v(n)/n equals A. ) In terms of this notation, the sequence Uo, VI, . . . is equidistributed if and only if Pr(u 2 U,, < V) = v -u, for all real numbers u, u with 0 5 u < v 5 1. A sequence may be equidistributed without being random. For example, if uo, Ul, … and VO, VI, . . . are equidistributed sequences, it is not hard to show that the sequence ~,~,W2,~,.*. = @Jo, $+po, gJ1, i+g4 7 … is also equidistributed, since the subsequence DUO, $UI, . . . is equidistributed between 0 and 4, while the alternate terms f + $VO, 3 + JVl, . . . , are equi- distributed between 4 and 1. In the sequence of W s, a value less than i is always followed by a value greater than or equal to 4, and conversely; hence the sequence is not random by any reasonable definition. A stronger property than equidistribution is needed. A natural generalization of the equidistribution property, which removes the objection stated in the preceding paragraph, is to consider adjacent pairs of numbers of our sequence. We can require the sequence to satisfy the condition Pr(ul 5 U, < 211 and ~2 2 G+I < 7~2)= (ul -u1)(v2 - u2) (4)

142 RANDOM NUMBERS 3.5 (Multiple domain web hosting) 3.5. WHAT IS A

Wednesday, September 12th, 2007

142 RANDOM NUMBERS 3.5 3.5. WHAT IS A RANDOM SEQUENCE? A. Introductory remarks. We have seen in this chapter how to generate sequences (1) of real numbers in the range 0 2 U, < 1, and we have called them random sequences even though they are completely deterministic in character. To justify this terminology, we claimed that the numbers behave as if they are truly random. Such a statement may be satisfactory for practical purposes (at the present time), but it sidesteps a very important philosophical and theoretical question: Precisely what do we mean by random behavior ? A quantitative definition is needed. It is undesirable to talk about concepts that we do not really understand, especially since many apparently paradoxical statements can be made about random numbers. The mathematical theory of probability and statistics carefully sidesteps the question; it refrains from making absolute statements, and instead expresses everything in terms of how much probability is to be attached to statements involving random sequences of events. The axioms of probability theory are set up so that abstract probabilities can be computed readily, but nothing is said about what probability really signifies, or how this concept can be applied meaningfully to the actual world. In the book Probability, Statistics, and Truth (New York: Macmillan, 1957), R. von Mises discusses this situation in detail, and presents the view that a proper definition of probability depends on obtaining a proper definition of a random sequence. Let us paraphrase here some statements made by two of the many authors who have commented on the subject. D. H. Lehmer (1951): A random sequence is a vague notion embodying the idea of a sequence in which each term is unpredictable to the uninitiated and whose digits pass a certain number of tests, traditional with statisticians and depending somewhat on the uses to which the sequence is to be put. J. IV. Franklin (1962): The sequence (1) is random if it has every property that is shared by all infinite sequences of independent samples of random variables from the uniform distribution. Franklin s statement essentially generalizes Lehmer s to say that the se- quence must satisfy all statistical tests. His definition is not completely precise, and we will see later that a reasonable interpretation of his statement leads us to conclude that there is no such thing as a random sequence! So let us begin with Lehmer s less restrictive statement and attempt to make it precise. What we really want is a relatively short list of mathematical properties, each of which is satisfied by our intuitive notion of a random sequence; furthermore, the list is to be complete enough so that we are willing to agree that any sequence satisfying these properties is random. In this section, we will develop what seems to be

3.42 RANDOM SAMPLING AND SHUFFLING 141 12. [M,E6] (Business web site)

Tuesday, September 11th, 2007

3.42 RANDOM SAMPLING AND SHUFFLING 141 12. [M,E6] The gist of Algorithm P is that any permutation T can be uniquely written as a product of transpositions in the form T = (att) . . . (as3)(~2), where 1 2 a3 2 j for t 2 j > 1. Prove that there is also a unique representation of the form r = (b22)(b33). . . (btt), where 1 5 bj 2 j for 1 < j 5 t, and design an algorithm that computes the b s from the u s in O(t) steps. 13. [MB] (S. W. Golomb.) One of the most common ways to shuffle cards is to divide the deck into two parts as equal as possible, and to riffle them together. (See the discussion of card-playing etiquette in Hoyle s rules of card games; we read, A shuffle of this sort should be made about three times to mix the cards thoroughly. ) Consider adeckof2n-1 cardsXl,Xz, . . . . X2,-1; a perfect shuffle s divides this deck into X1, X2, . . . , X, and Xn+l, . . . , X2n--1, then perfectly interleaves them to obtain Xl, xn+1, x2, -%x+2, . . . , Xzn-l, X,. The cut operation c3 changes X1, X2, . . , Xzn-l into X3+1, . . . , X2n-1, Xl, ...I Xj. Show that by combining perfect shuffles and cuts, at most (2n -1)(2n -2) different arrangements of the deck are possible, if n > 1. F 14. [so] (Ole-Johan Dahl.) If Xk = k for 1 5 k 5 t at the start of Algorithm P, and if we terminate the algorithm when j reaches the value t -n, the sequence Xten+l, . . . , Xt is a random permutation of a random combination of n elements. Show how to simulate the effect of this procedure using only O(n) cells of memory. b 15. [Mz~] Devise a way to compute a random sample of n records from N, given N and n, based on the idea of hashing (Section 6.4). Your method should use O(n) storage locations and an average of O(n) units of time, and it should present the sample as a sorted set of integers 1 5 Xl < X2 < … < X, 5 N. 16. [.%I Discuss the problem of weighted sampling, where each subset of n elements is obtained with probability proportional to the product of the weights of the elements.

140 RANDOM NUMBERS 3.4.2 This algorithm was first (Web hosting faq)

Tuesday, September 11th, 2007

140 RANDOM NUMBERS 3.4.2 This algorithm was first published by L. E. Moses and R. V. Oakford, in Tables of Random Permutations (Stanford University Press, 1963); and by R. Durstenfeld, CACM 7 (1964), 420. It can also be modified to obtain a random permutation of a random combination (see exercise 14). For a discussion of random combinatorial objects of other kinds (e.g., parti- tions), see Section 7.2 and/or the book Combinatorial Algorithms by Nijenhuis and Wilf (New York: Academic Press, 1975). EXERCISES 1. [Mz?] Explain Eq. (1). 2. [.20] Prove that Algorithm S never tries to read more than N records of its input file. b 3. [,% I The (t+l)st item in Algorithm S is selected with probability (n–m)/(N–t), not n/N, yet the text claims that the sample is unbiased-so each item should be selected with the same probability. How can both of these statements be true? 4. [A&B] Let p(m, t) be the probability that exactly m items are selected from among the first t in the selection sampling technique. Show directly from Algorithm S that 5. [M.24] What is the average value of t when Algorithm S terminates? (In other words, how many of the N records have been passed, on the average, before the sample is complete?) 6. [M2 4] What is the standard deviation of the value computed in the previous exercise? b 7. [Mz~] Prove that any given choice of n records from the set of N is obtained by Algorithm S with probability l/(E). Th ere f ore the sample is completely unbiased. 8. [A4461 Algorithm S computes one uniform deviate for each input record it handles. Find a more efficient way to determine the number of input records to skip before the first is selected, assuming that N/n is rather large. (We could iterate this process to select the remaining n -1 records, thus reducing the number of necessary random deviates from order N to order n.) 9. [1,6] Let n = 3. If Algorithm R is applied to a file containing 20 records numbered 1 thru 20, and if the random numbers generated in step R3 are respectively 4,1,6,7,5,3,5,11,11,3,7,9,3,11,4,5,4, which records go into the reservoir? Which are in the final sample? 10. [15] Modify Algorithm R so that the reservoir is eliminated, assuming that the n records of the current sample can be held in memory. b 11. [M%] Let pm be the probability that exactly m elements are put into the reservoir during the first pass of Algorithm R. Determine the generating function G(z) = C,,, pmzm, and find the mean and standard deviation. (Use the ideas of Section 1.2.10.)