Archive for April, 2007

72 RANDOM NUMBERS 3.3.2 L. Historical remarks and

Monday, April 30th, 2007

L. Historical remarks and further discussion. Statistical tests arose naturally in the course of scientists' efforts to prove or disprove hypotheses about various observed data. The best-known early papers dealing with the testing of artificially generated numbers for randomness are two articles by M. G. Kendall and B. Babington-Smith in the Journal of the Royal Statistical Society 101 (1938), 147-166, and in the supplement to that journal, 6 (1939), 51-61. These papers were concerned with the testing of random digits between 0 and 9, rather than random real numbers; for this purpose, the authors discussed the frequency test, serial test, gap test, and poker test, although they misapplied the serial test. Kendall and Babington-Smith also used a variant of the coupon collector's test; the method described in this section was introduced by R. E. Greenwood in Math. Comp. 9 (1955), 1-4. The run test has a rather interesting history. Originally, tests were made on runs up and down at once: a run up would be followed by a run down, then another run up, and so on. Note that the run test and the permutation test do not depend on the uniform distribution of the U's; they depend only on the fact that $U_i = U_j$ occurs with probability zero when $i \ne j$. Therefore these tests can be applied to many types of random sequences. The run test in primitive form was originated by J. Bienaymé [Comptes Rendus 81 (Paris: Acad. Sciences, 1875), 417-423]. Some sixty years later, W. O. Kermack and A. G. McKendrick published two extensive papers on the subject [Proc. Royal Society Edinburgh 57 (1937), 228-240, 332-376]; as an example they stated that Edinburgh rainfall between the years 1785 and 1930 was entirely random in character with respect to the run test (although they examined only the mean and standard deviation of the run lengths).
Several other people began using the test, but it was not until 1944 that the use of the chi-square method in connection with this test was shown to be incorrect. The paper by H. Levene and J. Wolfowitz in Annals Math. Stat. 15 (1944), 58-69, introduced the correct run test (for runs up and down, alternately) and discussed the fallacies in earlier misuses of that test. Separate tests for runs up and runs down, as proposed in the text above, are more suited to computer application, so we have not given the more complex formulas for the alternate-up-and-down case. See the survey paper by D. E. Barton and C. L. Mallows, Annals Math. Stat. 36 (1965), 236-260. Of all the tests we have discussed, the frequency test and the serial correlation test seem to be the weakest, in the sense that nearly all random number generators pass these tests. Theoretical grounds for the weakness of these tests are discussed briefly in Section 3.5 (cf. exercise 3.5-26). The run test, on the other hand, is a rather strong test: the results of exercises 3.3.3-23 and 24 suggest that linear congruential generators tend to have runs somewhat longer than normal if the multiplier is not large enough, so the run test of exercise 14 is definitely to be recommended. The collision test is also highly recommended, since it has been especially designed to detect the deficiencies of many poor generators that have unfortunately become widespread. This test, which is based on ideas of H. Delgas Christiansen [Inst. Math. Stat. and Oper. Res., Tech. Univ. Denmark (Oct. 1975),

3.3.2 EMPIRICAL TESTS 71 coefficient is not expected

Monday, April 30th, 2007

coefficient is not expected to be exactly zero. (See exercise 18.) A good value of C will be between $\mu_n - 2\sigma_n$ and $\mu_n + 2\sigma_n$, where

$$\mu_n = \frac{-1}{n-1}, \qquad \sigma_n = \frac{1}{n-1}\sqrt{\frac{n(n-3)}{n+1}}, \qquad n > 2. \tag{25}$$

We expect C to be between these limits about 95 percent of the time. Equations (25) are only conjectured at this time, since the exact distribution of C is not known when the U's are uniformly distributed. For the theory when the U's have the normal distribution, see the paper by Wilfrid J. Dixon, Annals Math. Stat. 15 (1944), 119-144. Empirical evidence suggests that we may safely use the formulas for the mean and standard deviation that have been derived from the assumption of the normal distribution, without much error; these are the values that have been listed in (25). It is known that $\lim_{n\to\infty} \sqrt{n}\,\sigma_n = 1$; cf. the article by Anderson and Walker, Annals Math. Stat. 35 (1964), 1296-1303, where more general results about serial correlations of dependent sequences are derived.

Instead of simply computing the correlation coefficient between the observations $(U_0, U_1, \ldots, U_{n-1})$ and their immediate successors $(U_1, \ldots, U_{n-1}, U_0)$, we can also compute it between $(U_0, U_1, \ldots, U_{n-1})$ and any cyclically shifted sequence $(U_q, \ldots, U_{n-1}, U_0, \ldots, U_{q-1})$; the cyclic correlations should be small for $0 < q < n$. A straightforward computation of Eq. (24) for all q would require about $n^2$ multiplications, but it is actually possible to compute all the correlations in only $O(n \log n)$ steps by using fast Fourier transforms. (See Section 4.6.4; cf. also L. P. Schmid, CACM 8 (1965), 115.)

K. Tests on subsequences. It frequently happens that the external program using our random sequence will call for numbers in batches. For example, if the program works with three random variables X, Y, and Z, it may consistently invoke the generation of three random numbers at a time.
In such applications it is important that the subsequences consisting of every third term of the original sequence be random. If the program requires q numbers at a time, the sequences

$$U_0, U_q, U_{2q}, \ldots; \quad U_1, U_{q+1}, U_{2q+1}, \ldots; \quad \ldots; \quad U_{q-1}, U_{2q-1}, U_{3q-1}, \ldots$$

can each be put through the tests described above for the original sequence $U_0, U_1, U_2, \ldots$. Experience with linear congruential sequences has shown that these derived sequences rarely if ever behave less randomly than the original sequence, unless q has a large factor in common with the period length. On a binary computer with m equal to the word size, for example, a test of the subsequences for q = 8 will tend to give the poorest randomness for all q < 16; and on a decimal computer, q = 10 yields the subsequences most likely to be unsatisfactory. (This can be explained somewhat on the grounds of potency, since such values of q will tend to lower the potency.)
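The serial correlation coefficient of Eq. (23), its cyclic-shift generalization, and the acceptance limits of Eq. (25) can be sketched in Python as follows. This is only an illustrative sketch: the function names `serial_correlation` and `acceptance_limits` are assumptions made here, and the direct O(n) sum per shift is used rather than the FFT method mentioned above.

```python
from math import sqrt


def serial_correlation(u, q=1):
    """Correlation between (U_0..U_{n-1}) and the sequence cyclically shifted by q.

    With q = 1 this is the serial correlation coefficient of Eq. (23).
    The denominator is zero when all values are equal; that case is excluded.
    """
    n = len(u)
    total = sum(u)
    sum_sq = sum(x * x for x in u)
    sum_prod = sum(u[j] * u[(j + q) % n] for j in range(n))
    return (n * sum_prod - total * total) / (n * sum_sq - total * total)


def acceptance_limits(n):
    """Return (mu_n - 2 sigma_n, mu_n + 2 sigma_n) from Eq. (25); requires n > 2."""
    mu = -1.0 / (n - 1)
    sigma = sqrt(n * (n - 3) / (n + 1)) / (n - 1)
    return mu - 2 * sigma, mu + 2 * sigma


if __name__ == "__main__":
    import random

    rng = random.Random(12345)  # arbitrary seed, chosen only for this demo
    sample = [rng.random() for _ in range(1000)]
    c = serial_correlation(sample)
    low, high = acceptance_limits(len(sample))
    print(c, low <= c <= high)
```

A value of C outside the limits on a single run is not conclusive by itself, since the text expects about 5 percent of good sequences to fall outside them.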

70 RANDOM NUMBERS 3.3.2 S1. [Initialize.] Set A[j]

Monday, April 30th, 2007

S1. [Initialize.] Set $A[j] \leftarrow 0$ for $0 \le j \le n$; then set $A[1] \leftarrow 1$ and $j_0 \leftarrow j_1 \leftarrow 1$. Then do step S2 exactly $n - 1$ times and go on to step S3.

S2. [Update probabilities.] (Each time we do this step it corresponds to tossing a ball into an urn; A[j] represents the probability that exactly j of the urns are occupied.) Set $j_1 \leftarrow j_1 + 1$. Then for $j \leftarrow j_1, j_1 - 1, \ldots, j_0$ (in this order), set $A[j] \leftarrow (j/m)A[j] + ((1 + 1/m) - (j/m))A[j-1]$. If A[j] has become very small as a result of this calculation, say $A[j] < 10^{-20}$, set $A[j] \leftarrow 0$; and in such a case, if $j = j_1$ decrease $j_1$ by 1, or if $j = j_0$ increase $j_0$ by 1.

S3. [Compute the answers.] In this step we make use of an auxiliary table $(T_1, T_2, \ldots, T_{tmax}) = (.01, .05, .25, .50, .75, .95, .99, 1.00)$ containing the specified percentage points of interest. Set $p \leftarrow 0$, $t \leftarrow 1$, and $j \leftarrow j_0 - 1$. Do the following iteration until $t = tmax$: Increase j by 1, and set $p \leftarrow p + A[j]$; then if $p > T_t$, output $n - j - 1$ and $1 - p$ (meaning that with probability $1 - p$ there are at most $n - j - 1$ collisions) and repeatedly increase t by 1 until $p \le T_t$.

J. Serial correlation test. We may also compute the following statistic:

$$C = \frac{n(U_0U_1 + U_1U_2 + \cdots + U_{n-2}U_{n-1} + U_{n-1}U_0) - (U_0 + U_1 + \cdots + U_{n-1})^2}{n(U_0^2 + U_1^2 + \cdots + U_{n-1}^2) - (U_0 + U_1 + \cdots + U_{n-1})^2}. \tag{23}$$

This is the serial correlation coefficient, which is a measure of the amount $U_{j+1}$ depends on $U_j$. Correlation coefficients appear frequently in statistics; if we have n quantities $U_0, U_1, \ldots, U_{n-1}$ and n others $V_0, V_1, \ldots, V_{n-1}$, the correlation coefficient between them is defined to be

$$C = \frac{n\sum(U_jV_j) - \left(\sum U_j\right)\left(\sum V_j\right)}{\sqrt{\left(n\sum U_j^2 - \left(\sum U_j\right)^2\right)\left(n\sum V_j^2 - \left(\sum V_j\right)^2\right)}}. \tag{24}$$

All summations in this formula are to be taken over the range $0 \le j < n$; Eq. (23) is the special case $V_j = U_{(j+1) \bmod n}$. (Note: The denominator of (24) is zero when $U_0 = U_1 = \cdots = U_{n-1}$ or $V_0 = V_1 = \cdots = V_{n-1}$; we exclude this case from discussion.) A correlation coefficient always lies between -1 and +1.
When it is zero or very small, it indicates that the quantities $U_j$ and $V_j$ are (relatively speaking) independent of each other, but when the correlation coefficient is $\pm 1$ it indicates total linear dependence; in fact $V_j = \alpha \pm \beta U_j$ for all j in such a case, for some constants $\alpha$ and $\beta$. (See exercise 17.) Therefore it is desirable to have C in Eq. (23) close to zero. In actual fact, since $U_0U_1$ is not completely independent of $U_1U_2$, the serial correlation

3.3.2 EMPIRICAL TESTS 69 Suppose we have m

Monday, April 30th, 2007

Suppose we have m urns and we throw n balls at random into those urns, where m is much greater than n. Most of the balls will land in urns that were previously empty, but if a ball falls into an urn that already contains at least one ball we say that a collision has occurred. The collision test counts the number of collisions, and a generator passes this test if it doesn't induce too many or too few collisions.

To fix the ideas, suppose $m = 2^{20}$ and $n = 2^{14}$. Then each urn will receive only one 64th of a ball, on the average. The probability that a given urn will contain exactly k balls is $p_k = \binom{n}{k} m^{-k}(1 - m^{-1})^{n-k}$, so the expected number of collisions per urn is

$$\sum_{k \ge 1} (k-1)p_k = \sum_{k \ge 0} k\,p_k - \sum_{k \ge 1} p_k = n/m - 1 + p_0.$$

Since $p_0 = (1 - m^{-1})^n = 1 - n/m + \binom{n}{2}m^{-2} +$ smaller terms, we find that the average total number of collisions taken over all m urns is very slightly less than $n^2/(2m) = 128$.

We can use the collision test to rate a random number generator in a large number of dimensions. For example, when $m = 2^{20}$ and $n = 2^{14}$ we can test the 20-dimensional randomness of a number generator by letting d = 2 and forming 20-dimensional vectors $V_j = (Y_{20j}, Y_{20j+1}, \ldots, Y_{20j+19})$ for $0 \le j < n$. It suffices to keep a table of $m = 2^{20}$ bits to determine collisions, one bit for each possible value of the vector $V_j$; on a computer with 32 bits per word, this amounts to $2^{15}$ words. Initially all $2^{20}$ bits of this table are cleared to zero; then for each $V_j$, if the corresponding bit is already 1 we record a collision, otherwise we set the bit to 1. This test can also be used in 10 dimensions with d = 4, and so on.

To decide if the test is passed, we can use the following table of percentage points when $m = 2^{20}$ and $n = 2^{14}$:

collisions <=       101   108   119   126   134   145   153
with probability   .009  .043  .244  .476  .742  .946  .989

The theory underlying these probabilities is the same we used in the poker test, Eq. (5); the probability that c collisions occur is the probability that $n - c$ urns are occupied, namely

$$\frac{m(m-1)\cdots(m-n+c+1)}{m^n} \left\{ {n \atop n-c} \right\}.$$

Although m and n are very large, it is not difficult to compute these probabilities using the following method:

Algorithm S (Percentage points for collision test). Given m and n, this algorithm determines the distribution of the number of collisions that occur when n balls are scattered into m urns. An auxiliary array A[0], A[1], ..., A[n] of floating point numbers is used for the computation; actually A[j] will be nonzero only for $j_0 \le j \le j_1$, and $j_1 - j_0$ will be at most of order log n, so it would be possible to get by with considerably less storage.
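The collision count and steps S1 and S2 of Algorithm S (whose steps are listed in the post for page 70) might be sketched in Python as below. This is a hedged sketch, not the book's program: the names `count_collisions`, `occupancy_distribution`, and `prob_at_most` are made up for illustration, a byte per urn is used instead of a bit per urn, the coefficient $(1 + 1/m) - (j/m)$ is written in the algebraically equivalent form $(m + 1 - j)/m$, and step S3 is replaced by a simple cumulative sum.

```python
def count_collisions(values, m):
    """Count collisions when each value in range(m) names the urn a ball lands in."""
    occupied = bytearray(m)  # assumption: one byte per urn instead of one bit
    collisions = 0
    for v in values:
        if occupied[v]:
            collisions += 1
        else:
            occupied[v] = 1
    return collisions


def occupancy_distribution(m, n, eps=1e-20):
    """Steps S1-S2: A[j] = probability that exactly j of the m urns are
    occupied after n balls have been thrown (requires n >= 1)."""
    A = [0.0] * (n + 1)
    A[1] = 1.0
    j0 = j1 = 1
    for _ in range(n - 1):
        j1 += 1
        # The range is fixed when the loop starts, so trimming j0 and j1
        # inside the loop does not disturb the current pass.
        for j in range(j1, j0 - 1, -1):
            A[j] = (j / m) * A[j] + ((m + 1 - j) / m) * A[j - 1]
            if A[j] < eps:
                A[j] = 0.0
                if j == j1:
                    j1 -= 1
                elif j == j0:
                    j0 += 1
    return A


def prob_at_most(A, n, c):
    """Probability of at most c collisions, i.e. at least n - c occupied urns."""
    return sum(A[max(n - c, 0):])


if __name__ == "__main__":
    m, n = 2 ** 20, 2 ** 14
    A = occupancy_distribution(m, n)
    for c in (101, 108, 119, 126, 134, 145, 153):
        print(c, prob_at_most(A, n, c))
```

The demo at the bottom prints cumulative probabilities for the same collision counts as the table above, so the two can be compared directly.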

68 RANDOM NUMBERS 3.3.2 Form the matrix C

Sunday, April 29th, 2007

Form the matrix C of the covariances of the R's; for example, $C_{13} = \mathrm{covar}(R_1, R_3)$, while $C_{1t} = \mathrm{covar}(R_1, R'_t)$. When t = 6, we have $C = nC_1 + C_2$ = n [matrix not preserved in this copy] + [matrix not preserved in this copy] if $n \ge 12$. Now form $A = (a_{ij})$, the inverse of the matrix C, and compute

3.3.2 EMPIRICAL TESTS 67

Sunday, April 29th, 2007

are

$$(p+q+1)\binom{p+q}{p} - \binom{p+q+1}{p+1} - \binom{p+q+1}{1} + 1 \tag{16}$$

ways to arrange them in the order (15), as shown in exercise 13; and there are $(n-p-q-1)!$ ways to arrange the remaining elements. Thus there are $\binom{n}{p+q+1}(n-p-q-1)!$ times (16) ways in all, and dividing by $n!$ we get the desired formula.

From relations (14) a rather lengthy calculation leads to

$$\mathrm{mean}(R'_p) = \mathrm{mean}(Z_{p1} + \cdots + Z_{pn}) = \frac{(n+1)p}{(p+1)!} - \frac{p-1}{p!}, \qquad 1 \le p \le n; \tag{17}$$

$$\mathrm{covar}(R'_p, R'_q) = \mathrm{mean}(R'_pR'_q) - \mathrm{mean}(R'_p)\,\mathrm{mean}(R'_q) = \begin{cases} \mathrm{mean}(R'_t) + f(p,q,n), & \text{if } p+q \le n; \\ \mathrm{mean}(R'_t) - \mathrm{mean}(R'_p)\,\mathrm{mean}(R'_q), & \text{if } p+q > n, \end{cases} \tag{18}$$

where $t = \max(p, q)$, $s = p + q$, and

$$f(p,q,n) = (n+1)\left(\frac{s(1-pq)+pq}{(p+1)!\,(q+1)!} - \frac{2s}{(s+1)!}\right) + 2\left(\frac{s-1}{s!}\right) + \frac{(s^2-s-2)pq - s^2 - p^2q^2 + 1}{(p+1)!\,(q+1)!}. \tag{19}$$

This expression for the covariance is unfortunately quite complicated, but it is necessary for a successful run test as described above. From these formulas it is easy to compute

$$\begin{aligned} \mathrm{mean}(R_p) &= \mathrm{mean}(R'_p) - \mathrm{mean}(R'_{p+1}), \\ \mathrm{covar}(R_p, R'_q) &= \mathrm{covar}(R'_p, R'_q) - \mathrm{covar}(R'_{p+1}, R'_q), \\ \mathrm{covar}(R_p, R_q) &= \mathrm{covar}(R_p, R'_q) - \mathrm{covar}(R_p, R'_{q+1}). \end{aligned} \tag{20}$$

In Annals Math. Stat. 15 (1944), 163-165, J. Wolfowitz proved that the quantities $R_1, R_2, \ldots, R_{t-1}, R'_t$ become normally distributed as $n \to \infty$, subject to the mean and covariance expressed above; this implies that the following test for runs is valid: Given a sequence of n random numbers, compute the number of runs $R_p$ of length p for $1 \le p < t$, and also the number of runs $R'_t$ of length t or more. Let

$$Q_1 = R_1 - \mathrm{mean}(R_1), \quad \ldots, \quad Q_{t-1} = R_{t-1} - \mathrm{mean}(R_{t-1}), \quad Q_t = R'_t - \mathrm{mean}(R'_t). \tag{21}$$
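For small n the mean and covariance formulas can be checked by brute force over all n! permutations. The sketch below does this with exact rational arithmetic. It reads Eq. (17) as $(n+1)p/(p+1)! - (p-1)/p!$, and reads Eqs. (18) and (19) as $\mathrm{covar}(R'_p, R'_q) = \mathrm{mean}(R'_{\max(p,q)}) + f(p,q,n)$ for $p + q \le n$, with $f(p,q,n) = (n+1)\bigl(\frac{s(1-pq)+pq}{(p+1)!(q+1)!} - \frac{2s}{(s+1)!}\bigr) + \frac{2(s-1)}{s!} + \frac{(s^2-s-2)pq - s^2 - p^2q^2 + 1}{(p+1)!(q+1)!}$ and $s = p + q$. That reading is an assumption about the damaged scan, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial


def runs_at_least(perm, p):
    """R'_p: the number of ascending runs of length >= p in perm."""
    count, length = 0, 1
    for a, b in zip(perm, perm[1:]):
        if a < b:
            length += 1
        else:
            count += length >= p
            length = 1
    return count + (length >= p)


def mean_formula(n, p):
    """Eq. (17) as read in the lead-in above."""
    return Fraction((n + 1) * p, factorial(p + 1)) - Fraction(p - 1, factorial(p))


def f(p, q, n):
    """Eq. (19) as read in the lead-in above, with s = p + q."""
    s = p + q
    d = factorial(p + 1) * factorial(q + 1)
    return ((n + 1) * (Fraction(s * (1 - p * q) + p * q, d)
                       - Fraction(2 * s, factorial(s + 1)))
            + 2 * Fraction(s - 1, factorial(s))
            + Fraction((s * s - s - 2) * p * q - s * s - p * p * q * q + 1, d))


def exact_moments(n, p, q):
    """Exact mean(R'_p), mean(R'_q), covar(R'_p, R'_q) over all n! permutations."""
    perms = list(permutations(range(n)))
    rp = [runs_at_least(x, p) for x in perms]
    rq = [runs_at_least(x, q) for x in perms]
    mp = Fraction(sum(rp), len(perms))
    mq = Fraction(sum(rq), len(perms))
    cov = Fraction(sum(a * b for a, b in zip(rp, rq)), len(perms)) - mp * mq
    return mp, mq, cov


if __name__ == "__main__":
    n = 6
    for p, q in [(1, 1), (1, 2), (2, 2)]:
        mp, mq, cov = exact_moments(n, p, q)
        print(p, q, mp == mean_formula(n, p),
              cov == mean_formula(n, max(p, q)) + f(p, q, n))
```

Enumeration is only practical for small n, but it gives an independent check on any transcription of the formulas.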

66 RANDOM NUMBERS 3.3.2 Given any permutation on

Sunday, April 29th, 2007

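Given any permutation on n elements, let $Z_{pi} = 1$ if position i is the beginning of an ascending run of length p or more, and let $Z_{pi} = 0$ otherwise. For example, consider the permutation (9) with n = 10; we have

$$Z_{11} = Z_{21} = Z_{31} = Z_{14} = Z_{15} = Z_{16} = Z_{26} = Z_{36} = Z_{19} = Z_{29} = 1,$$

and all other Z's are zero. With this notation,

$$R'_p = Z_{p1} + Z_{p2} + \cdots + Z_{pn} \tag{12}$$

is the number of runs of length $\ge p$, and

$$R_p = R'_p - R'_{p+1} \tag{13}$$

is the number of runs of length p exactly. Our goal is to compute the mean value of $R_p$, and also the covariance

$$\mathrm{covar}(R_p, R_q) = \mathrm{mean}\bigl((R_p - \mathrm{mean}(R_p))(R_q - \mathrm{mean}(R_q))\bigr),$$

which measures the interdependence of $R_p$ and $R_q$. These mean values are to be computed as the average over the set of all n! permutations.

Equations (12) and (13) show that the answers can be expressed in terms of the mean values of $Z_{pi}$ and of $Z_{pi}Z_{qj}$, so as the first step of the derivation we obtain the following results (assuming that i < j):

$$\frac{1}{n!}\sum Z_{pi} = \begin{cases} \dfrac{p + \delta_{i1}}{(p+1)!}, & \text{if } i \le n-p+1; \\ 0, & \text{otherwise;} \end{cases}$$

$$\frac{1}{n!}\sum Z_{pi}Z_{qj} = \begin{cases} \dfrac{(p + \delta_{i1})\,q}{(p+1)!\,(q+1)!}, & \text{if } i+p < j \le n-q+1; \\ \dfrac{p + \delta_{i1}}{(p+1)!\,q!} - \dfrac{p+q+\delta_{i1}}{(p+q+1)!}, & \text{if } i+p = j \le n-q+1; \\ 0, & \text{otherwise.} \end{cases} \tag{14}$$

Here the sums are taken over all permutations. To illustrate the calculations, consider the case $i + p = j \le n - q + 1$ with $i > 1$. Note that $Z_{pi}Z_{qj}$ is either zero or one, so the summation consists of counting all permutations $U_1 U_2 \ldots U_n$ for which $Z_{pi} = Z_{qj} = 1$, that is, all permutations such that

$$U_{i-1} > U_i < \cdots < U_{i+p-1} > U_{i+p} < \cdots < U_{i+p+q-1}. \tag{15}$$

The number of such permutations may be enumerated as follows: there are $\binom{n}{p+q+1}$ ways to choose the elements for the positions indicated in (15); there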

3.3.2 EMPIRICAL TESTS 65 G. Run test. A

Sunday, April 29th, 2007

G. Run test. A sequence may also be tested for runs up and runs down. This means we examine the length of monotone subsequences of the original sequence, i.e., segments that are increasing or decreasing.

As an example of the precise definition of a run, consider the sequence of ten numbers 1298536704; putting a vertical line at the left and right and between $X_j$ and $X_{j+1}$ whenever $X_j > X_{j+1}$, we obtain

| 1 2 9 | 8 | 5 | 3 6 7 | 0 4 |,    (9)

which displays the runs up: there is a run of length 3, followed by two runs of length 1, followed by another run of length 3, followed by a run of length 2. The algorithm of exercise 12 shows how to tabulate the length of runs up.

Unlike the gap test and the coupon collector's test (which are in many other respects similar to this test), we should not apply a chi-square test to the above data, since adjacent runs are not independent. A long run will tend to be followed by a short run, and conversely. This lack of independence is enough to invalidate a straightforward chi-square test. Instead, the following statistic may be computed, when the run lengths have been determined as in exercise 12:

$$V = \frac{1}{n}\sum_{1 \le i,j \le 6} (\mathrm{COUNT}[i] - n b_i)(\mathrm{COUNT}[j] - n b_j)\,a_{ij}, \tag{10}$$

where n is the length of the sequence, and the coefficients $a_{ij}$ and $b_i$ are equal to

$$(a_{ij}) = \begin{pmatrix} 4529.4 & 9044.9 & 13568 & 18091 & 22615 & 27892 \\ 9044.9 & 18097 & 27139 & 36187 & 45234 & 55789 \\ 13568 & 27139 & 40721 & 54281 & 67852 & 83685 \\ 18091 & 36187 & 54281 & 72414 & 90470 & 111580 \\ 22615 & 45234 & 67852 & 90470 & 113262 & 139476 \\ 27892 & 55789 & 83685 & 111580 & 139476 & 172860 \end{pmatrix}, \tag{11}$$

$$(b_1, b_2, b_3, b_4, b_5, b_6) = \left(\tfrac{1}{6}, \tfrac{5}{24}, \tfrac{11}{120}, \tfrac{19}{720}, \tfrac{29}{5040}, \tfrac{1}{840}\right).$$

(The values of $a_{ij}$ shown here are approximate only; exact values may be obtained by using formulas derived below.) The statistic V in (10) should have the chi-square distribution with six (not five) degrees of freedom, when n is large. The value of n should be, say, 4000 or more.
The same test can be applied to runs down. A vastly simpler and more practical run test appears in exercise 14, so a reader who is interested only in testing random number generators should skip the next few pages and go on to the maximum-of-t test after looking at exercise 14. On the other hand it is instructive from a mathematical standpoint to see how a complicated run test with interdependent runs can be treated, so we shall now digress for a moment.
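The statistic V of Eq. (10) might be computed along the following lines. This is a sketch under stated assumptions, not the algorithm of exercise 12: the names `run_lengths_up` and `run_statistic` are illustrative, runs of length 6 or more are all tallied in COUNT[6], the leading factor in (10) is taken to be 1/n, and the $b_i$ are taken to be 1/6, 5/24, 11/120, 19/720, 29/5040, 1/840.

```python
# Approximate coefficients a_ij from (11).
A = [
    [4529.4, 9044.9, 13568, 18091, 22615, 27892],
    [9044.9, 18097, 27139, 36187, 45234, 55789],
    [13568, 27139, 40721, 54281, 67852, 83685],
    [18091, 36187, 54281, 72414, 90470, 111580],
    [22615, 45234, 67852, 90470, 113262, 139476],
    [27892, 55789, 83685, 111580, 139476, 172860],
]
# Assumed values of b_1..b_6 (see the lead-in).
B = [1 / 6, 5 / 24, 11 / 120, 19 / 720, 29 / 5040, 1 / 840]


def run_lengths_up(xs):
    """Lengths of the successive ascending runs of xs (assumes no equal neighbours)."""
    lengths, length = [], 1
    for a, b in zip(xs, xs[1:]):
        if a < b:
            length += 1
        else:
            lengths.append(length)
            length = 1
    lengths.append(length)
    return lengths


def run_statistic(xs):
    """V of Eq. (10); runs of length >= 6 are tallied together in COUNT[6]."""
    n = len(xs)
    count = [0] * 6
    for r in run_lengths_up(xs):
        count[min(r, 6) - 1] += 1
    dev = [count[i] - n * B[i] for i in range(6)]
    return sum(dev[i] * dev[j] * A[i][j] for i in range(6) for j in range(6)) / n


if __name__ == "__main__":
    import random

    rng = random.Random(2007)  # arbitrary seed for the demo
    print(run_lengths_up([1, 2, 9, 8, 5, 3, 6, 7, 0, 4]))
    print(run_statistic([rng.random() for _ in range(4000)]))
```

The resulting V would then be compared with a chi-square table for six degrees of freedom, as described above; the same code applied to the negated sequence tests runs down.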

64 RANDOM NUMBERS 3.3.2 F. Permutation test. Divide

Saturday, April 28th, 2007

F. Permutation test. Divide the input sequence into n groups of t elements each, that is, $(U_{jt}, U_{jt+1}, \ldots, U_{jt+t-1})$ for $0 \le j < n$. The elements in each group can have t! possible relative orderings; the number of times each ordering appears is counted, and a chi-square test is applied with k = t! and with probability 1/t! for each ordering.

For example, if t = 3 we would have six possible categories, according to whether $U_{3j} < U_{3j+1} < U_{3j+2}$ or $U_{3j} < U_{3j+2} < U_{3j+1}$ or ... or $U_{3j+2} < U_{3j+1} < U_{3j}$. We assume in this test that equality between U's does not occur; such an assumption is justified, for the probability that two U's are equal is zero.

A convenient way to perform the permutation test on a computer makes use of the following algorithm, which is of interest in itself:

Algorithm P (Analyze a permutation). Given a sequence of distinct elements $(U_1, \ldots, U_t)$, we compute an integer $f(U_1, \ldots, U_t)$ such that $0 \le f(U_1, \ldots, U_t) < t!$, and $f(U_1, \ldots, U_t) = f(V_1, \ldots, V_t)$ if and only if $(U_1, \ldots, U_t)$ and $(V_1, \ldots, V_t)$ have the same relative ordering.

P1. [Initialize.] Set $r \leftarrow t$, $f \leftarrow 0$. (During this algorithm we will have $0 \le f < t!/r!$.)

P2. [Find maximum.] Find the maximum of $\{U_1, \ldots, U_r\}$, and suppose that $U_s$ is the maximum. Set $f \leftarrow r \cdot f + s - 1$.

P3. [Exchange.] Exchange $U_r \leftrightarrow U_s$.

P4. [Decrease r.] Decrease r by one. If r > 1, return to step P2.

Note that the sequence $(U_1, \ldots, U_t)$ will have been sorted into ascending order when this algorithm stops. To prove that the result f uniquely characterizes the initial order of $(U_1, \ldots, U_t)$, we note that Algorithm P can be run backwards: For r = 2, 3, ..., t, set $s \leftarrow f \bmod r$, $f \leftarrow \lfloor f/r \rfloor$, and exchange $U_r \leftrightarrow U_{s+1}$. It is easy to see that this will undo the effects of steps P2-P4; hence no two permutations can yield the same value of f, and Algorithm P performs as advertised.
The essential idea that underlies Algorithm P is a mixed-radix representation called the factorial number system: Every integer in the range $0 \le f < t!$ can be uniquely written in the form

$$f = (\ldots(c_{t-1} \times (t-1) + c_{t-2}) \times (t-2) + \cdots + c_2) \times 2 + c_1 = (t-1)!\,c_{t-1} + (t-2)!\,c_{t-2} + \cdots + 2!\,c_2 + 1!\,c_1 \tag{7}$$

where the digits $c_j$ are integers satisfying

$$0 \le c_j \le j, \qquad \text{for } 1 \le j < t. \tag{8}$$

In Algorithm P, $c_{r-1} = s - 1$ when step P2 is performed for a given value of r.
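Algorithm P and its backwards form can be sketched in Python as follows. The function names `permutation_index` and `unrank` are illustrative assumptions; indices are 0-based, so the 0-based position of the maximum plays the role of s - 1, and the backwards pass exchanges position r with position (f mod r) + 1 in the text's 1-based terms.

```python
from itertools import permutations


def permutation_index(seq):
    """Algorithm P: map distinct elements to an integer f with 0 <= f < t!."""
    u = list(seq)          # work on a copy; it ends up sorted ascending
    f = 0
    for r in range(len(u), 1, -1):                  # r = t, t-1, ..., 2
        s = max(range(r), key=lambda i: u[i])       # 0-based index of the max of u[0..r-1]
        f = r * f + s                               # f <- r*f + s - 1 in 1-based terms
        u[r - 1], u[s] = u[s], u[r - 1]             # P3: exchange U_r <-> U_s
    return f


def unrank(f, sorted_items):
    """Run Algorithm P backwards, starting from the sorted arrangement."""
    u = list(sorted_items)
    for r in range(2, len(u) + 1):
        s = f % r
        f //= r
        u[r - 1], u[s] = u[s], u[r - 1]
    return u


if __name__ == "__main__":
    for p in permutations([1, 2, 3]):
        print(p, permutation_index(p))
```

For the permutation test, `permutation_index` applied to each group of t numbers yields the category number whose occurrences are then counted for the chi-square test.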

3.3.2 EMPIRICAL TESTS 63 Algorithm C (Data for

Saturday, April 28th, 2007

Algorithm C (Data for coupon collector's test). Given a sequence of integers $Y_0, Y_1, \ldots$, with $0 \le Y_j < d$, this algorithm counts the lengths of n consecutive coupon collector segments. At the conclusion of the algorithm, COUNT[r] is the number of segments with length r, for $d \le r < t$, and COUNT[t] is the number of segments with length $\ge t$.

C1. [Initialize.] Set $j \leftarrow -1$, $s \leftarrow 0$, and set COUNT[r] $\leftarrow 0$ for $d \le r \le t$.

C2. [Set q, r zero.] Set $q \leftarrow r \leftarrow 0$, and set OCCURS[k] $\leftarrow 0$ for $0 \le k < d$.

C3. [Next observation.] Increase r and j by 1. If OCCURS[$Y_j$] $\ne 0$, repeat this step.

C4. [Complete set?] Set OCCURS[$Y_j$] $\leftarrow 1$ and $q \leftarrow q + 1$. (The subsequence observed so far contains q distinct values; if q = d, we therefore have a complete set.) If q < d, return to step C3.

C5. [Record the length.] If $r \ge t$, increase COUNT[t] by one, otherwise increase COUNT[r] by one.

C6. [n found?] Increase s by one. If s < n, return to step C2.

For an example of this algorithm, see exercise 7. We may think of a boy collecting d types of coupons, which are randomly distributed in his breakfast cereal boxes; he must keep eating more cereal until he has one coupon of each type.

A chi-square test is to be applied to COUNT[d], COUNT[d + 1], ..., COUNT[t], with k = t - d + 1, after Algorithm C has counted n lengths. The corresponding probabilities are
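Steps C1 to C6 of Algorithm C can be sketched in Python as follows. The function name `coupon_collector_counts` and the use of a dictionary for COUNT are assumptions made for illustration; the sketch also assumes the input list is long enough to contain n complete segments and raises `IndexError` otherwise.

```python
def coupon_collector_counts(ys, d, t, n):
    """Count the lengths of n consecutive coupon collector segments of ys.

    Returns a dict COUNT with keys d..t; COUNT[t] also absorbs lengths > t.
    """
    count = {r: 0 for r in range(d, t + 1)}   # C1
    j = -1
    for _ in range(n):                        # C6 repeats until n segments are found
        occurs = [False] * d                  # C2
        q = r = 0
        while q < d:                          # C3-C4: read until all d values are seen
            r += 1
            j += 1
            y = ys[j]
            if not occurs[y]:
                occurs[y] = True
                q += 1
        count[min(r, t)] += 1                 # C5
    return count


if __name__ == "__main__":
    ys = [0, 1, 2, 0, 0, 1, 2, 2, 2, 1, 0]
    print(coupon_collector_counts(ys, d=3, t=5, n=3))
```

In the demo input the three segments are 0 1 2, then 0 0 1 2, then 2 2 1 0, with lengths 3, 4, and 4.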