| main page (gr) | home | main page |
| novel (gr) | selections | readings | gallery | pre-meaning | advanced computing | str (gr) | internet epitome (gr) | win stuff |
| lecture #1 | lecture #2 | lecture #3 | lecture #4 | lecture #5 |

Advanced Computing Course - Lecture #5
Copyright © Constantin S. Chassapis - amorphous.art
Version 1: 19/1/95 - 15/2/95. Version 2: 26 Oct 2000.
In this lecture we will give a unified picture (i) of discrete and continuous probability distributions, (ii) of their transformations and, (iii) of their generation using a digital computer. The main tool for accomplishing this will be the Dirac delta function.
Before we continue, we must find a way to represent integrals in this document because HTML is not supporting a lot of mathematics notation. So, in what follows, the construct: aSbf(x)dx denotes the definite integral of function f(x) from a to b, while the construct S f(x)dx denotes the indefinite integral of f(x). Also we write inf to denote infinity.
For a mathematical definition of probability, please see the section on probability, some pages below. In reality, I have made here a conceptual swapping. Sorry. One must first understand what probability is all about and then try to recover some probability law through statistical procedures. But, well ..., we all know what a probability is! Don't we?
One may consider ä(x) as being the following limit:
(1) ä(x) = lima->0 sin(ax)/(ðx)
A very useful representation of delta function is also
(2) ä(x-x') = (1/(2ð)) -infS+inf exp(-ik(x-x')) dk
Grossly speaking, the delta function, ä(x), is everywhere zero, except at point zero where it jumps to infinity. It does so in such a manner to keep its integral equal to one:
(3) S ä(x) dx = 1
where the region of integration always includes zero.
Another way of defining delta function is, that, for an arbitrary function f(x) that is continuous at x=0, the equation
(4) S f(x) ä(x) dx = f(0)
is valid (where again the integration includes x=0).
It is important to note that, because of its singular character, the delta function cannot be the end result of a standard calculation and has meaning only so long as a subsequent integration over its argument is carried out (or is understood that it will be carried out !).
With this understanding, we may write
(5a) ä(x) = ä(-x)
(5b) ä'(x) = -ä'(-x)
(5c) xä'(x) = -ä(x)
(where the single quote denotes differentiation)
Keeping in mind what we stated above, we may also write
(6) È'(x) = ä(x)
where, È(x) is the step function:
(7) È(x) = 1, if x > 0, otherwise, = 0.
Let us see now how we may use all the above information to distributions. There are two types of random variables. Continuous and discrete. In the continuous case, the random variable may take any value inside a given interval [a,b]. A discrete random variable may take only some specific values from a given set Ù = {Xr, r=1, 2, ..., N}. In this discussion we assume that there are well defined probability laws associated with the above sets, [a,b], or Ù, but we do not necessarily know them.
If a certain process provides us with independent values xj (trials) with j = 1, 2, ..., M, and with xj in Ù, then a way to determine the unknown probabilities associated with Ù is by taking the limit of very big M. Then
(8) Pr = limM->inf mr/M
where the numbers mr are the number of appearances of each Xr. Obviously,
(9) Ór=1N mr = M
Therefore
(10) Ór=1N Pr = 1
We should point here that since we cannot in practice make an infinite number of trials, our knowledge of Pr will always be limited.
In the continuous case now, we may again have a process that gives to us the numbers xj with j = 1, 2, ..., M, but now xj belongs to [a,b], and due to the continuity it looses meaning to ask for the number of appearances of some particular value x inside [a,b]. There are simply infinite x's between a and b. The best alternative is to ask for the number of appearances of xj inside some subinterval [aâ, bâ] of [a,b]. Then again we may write (the index now points to a subinterval, not to a value)
(11) Pâ = limM->inf mâ/M
Now, if there is a well defined probability law behind our process which gives numbers, x, inside [a,b], there should exist a function ö(x) which will provide the probabilistic information we need. This is usually called probability density function (p.d.f.). This has the following property
(12) Pâ = aâSbâ ö(x) dx
(the strange limits just denote interval [aâ, bâ]. The whole problem arrises due to the limitations of HTML.)
Obviously it follows
(13) aSb ö(x) dx = 1
The mean, or average value of x, <x>, is defined to be
(14) <x> = aSb ö(x') x' dx'
More generally, for any well behaved function g(x) we will write
(15) < g(x) > = aSb ö(x') g(x') dx'
Now, observe a point that will enable us to unify our treatment of continuous and discrete random variables. We can define a p.d.f. for the discrete case in the following manner
(16) ö(x) = Ór=1N ä(x-Xr) Pr
with -inf < x < +inf. We may now use freely (15) for example in the discrete case, one easily verifies (using eq. (4)) that:
(15') < g(x) > = Ór=1N Pr g(Xr)
The validity of this result provides a justification for (16) even if one can approach (16) in a purely formal way.
We will, therefore, discuss mainly continuous distributions, since via the use of (16) the same formalism is also usable for the discrete case.
How we discover/recover a distribution? Obviously we have at hands only the values xj, j = 1, 2, ..., M, and we need to determine mâ. This is done by counting, or
(17) mâ= Ój=1M { È(bâ-xj) - È(aâ-xj) }
Now, a Cartesian diagram having â in the horizontal axis and mâ in the vertical axis is called a histogram and gives a picture that reflects the probability law (if there exists one). If there is no probability law behind the numbers, then the picture will vary unpredictably as M or the partioning of the x-axis varies.
If the sectioning becomes very detailed, i.e. if Ä -> 0, with Ä = max{(bâ-aâ)}, then, the picture will show if the distribution is discrete (for sufficiently big M) or no. This is accomplished by the clear presence of peaks and of nothing between them. In that case, because we will have Ä -> 0, the interval index will become a discrete value index and we will get
(18) Pr = mr/M (approximate equality)
In the case of a continuous distribution, (17) will still be o.k., but we will not be able to locate specific values by decreasing the value of Ä. The representation of the p.d.f. will become more faithful though as Ä -> 0 showing signs of shape-convergence (again for sufficiently big M).
Assuming that a p.d.f. exists therefore, we use (12) and (17)-(18) to write
(19) aâSbâ ö(x) dx = (1/M) Ój=1M { È(bâ-xj) - È(aâ-xj) } (approximate equality)
(where the equality becomes exact when M goes to infinity). Now (19) takes a really useful form if we have a very small Ä. Then
(20) ö(aâ) = (1/M[bâ-aâ]) Ój=1M { È(bâ-xj) - È(aâ-xj) } (approximate equality)
Equation (20) represents an O(Ä2) equality.
Here ends the numerical recovery/discovery of ö(x).
If we push now equation (20) a step further we should be able to corroborate the auto-coherence of our approach. We let now Ä -> 0 (and M>>1), then (19) becomes, by using (6)
(21) ö(aâ) = (1/(MÄ)) Ój=1M ä(aâ-xj)
and, in the limits Ä -> 0 and M -> inf we obtain again equation (4) in a more form (because the summation becomes an integration over the true p.d.f. of x, i.e. of ö(x)):
(21') ö(x) = aSb ö(x') ä(x-x') dx'
Assuming that we know the distribution ö(x), and if it holds that y=f(x), what is the distribution ø(y)? The answer, maybe not so obvious at first, is very simple, especially if one has a good understanding of the selective properties of the delta function and has well understood (21'):
(22) ø(y) = <ä(y-f(x))> = aSb ö(x) ä(y-f(x)) dx
To define probability we require three things: (a) sample set, (b) classes, (c) the statistical weight of each class. (In the previous section we named classes using the r index and mr gave the weight of the class. In the continuous case the integral of ö(x) in some small subinterval gave the statistical weight of the events happening in the subinterval). Now the above three things must satisfy some conditions. These are:
(a) The sample set E must be nonempty
(b) Each class is a subset of E, and, each subset is a class. Classes should satisfy the following conditions: if R1 is a classand R2 is another, then, their union R1 AND R2 is also a class. Their intersection R1 OR R2 is also a class. In addition, if R is a class, ten its complement is also a class.
(c) Each class is assigned a statistical weight P(R), that satisfies:
(c1) P(R) ≥ 0
(c2) P(R1 OR R2) = P(R1) + P(R2) - P(R1 AND R2)
(c3) P(E) = 1.
The last equation is the normalisation of the statistical weights. Any function P(R) that satisfies the above conditions is a probability.
The joint probability of R1 and R2 is denoted by
(23a) P(R1,R2) = P(R1 OR R2)
Given condition R2, the probability of R1 is
(23b) P(R1 | R2) = P(R1 OR R2)/P(R2)
The condition of "given R2" means that we take R2 as the sample set and then classify within R2.
The last equation can be also given in the form:
(23c) P(R1 OR R2) = P(R1 | R2) P(R2)
But note that we can equally well write P(R1 OR R2) = P(R2 | R1) P(R1), so that P(R1 | R2) P(R2) = P(R2 | R1) P(R1). Then, since P(R2) = P(R2 OR R1) + P(R2 OR (E-R1)) = P(R2 | R1) P(R1) + P(R2 | (E-R1)) P(E-R1), this important result follows:
(23d) P(R1 | R2) = {P(R2 | R1) P(R1)} / {P(R2 | R1) P(R1) + P(R2 | (E-R1))P(E-R1)}
This is Bayes' theorem. This important formula shows how to reverse the roles of R1 and R2 to obtain P(R1|R2) from P(R2|R1).
From the above definitions and equations we can determine the joint distribution ñ(x,y) and the conditional distribution ñ(x|y) of two variables x and y.
Define
(24) ñ(x,y)dxdy = P(R AND Q)
where event R is the value of x being within the interval corresponding to R and event Q is the case where y is within the limits corresponding to Q. Also,
(25) ñ(x|y) = ñ(x,y)/ñ(y)
It holds that
(26) ñ(x) = S ñ(x,y) dy
The definition of independence is as follows: if
(27) P(R1,R2) = P(R1) P(R2)
or
(28) P(R1 | R2) = P(R1)
then R1 and R2 are mutually independent, i.e. R2 does not affects R1, or vice versa.
The definition of independence for the variables x and y is
(29) ñ(x,y) = ñ(x)ñ(y)
or
(30) ñ(x|y) = ñ(x)
That is, the statistical data on x does not depend on y. If x and y are mutually independent, then any functions f and g satisfy
(31) < f(x)g(y) > = <f(x)> <g(y)>
Let the correlation of x and y be
(32) C = <xy> - <x> <y> = < (x-<x>) (y-<y>) >
Hence, if x and y are mutually independent, then C=0.
Mutually independent variables can be considered separately, simplifying many complicated problems.
The problem we will consider in this section belongs to the general class of "transformation of distributions". Given a distribution (say ñ(x,y,z) = const.), of points into a given volume, we ask what will this distribution be, if transformed to another coordinate system, ñ(r,è,ö)=?. Also we would like to know the distributions ñ(r), ñ(è), ñ(ö).
Before immediately attacking the 3D problem, we will discuss the problem in 2D.
Let r=(x,y) be distributed uniformly in the square (0<x<1, 0<y<1), i.e. ñ(x,y)=1. Calculate the distribution in the polar coordinates as well as ñ(r) and ñ(è).
Suppose we have analysed a problem with variables x1 and x2, and that we know ñ(x1,x2). The question is what will be ñ(î,ù) if
(33) î = u(x1,x2) x1 = ó(î,ù)
(34) ù = v(x1,x2) x2 = ô(î,ù)
The answer comes from the formula (generalised (22)),
(35) ñ(î,ù) = < ä(î-u(x1,x2)) ä(ù-v(x1,x2)) >
Now, the generalised (21') reads
(36) S S f(x1,x2) ä(x1-a1) ä(x2-a2) dx1 dx2 = f(a1,a2)
Putting (33) and (34) in (36) we get
(37) S S f(ó,ô) ä(ó(î,ù)-a1) ä(ô(î,ù)-a1) |J| dî dù = f(a1,a2)
where
(38) J = d(ó,ô)/d(î,ù)
is the Jacobian of the transformation (33)-(34). Since here we go from cartesian to polar coordinates, i.e, x = r cosè and y = r sinè, the Jacobian is equal to r, i.e. J = r. So, (37) becomes
(39) 1 = S S ñ(è,ö) ä(r-r1) ä(è-è1) (1/r) dr dè
Therefore, from (26),
(40) ñ(r,è) = r
(41) ñ(r) = 2ð r
(42) ñ(è) = 1
We are ready now to solve the 3D problem. The problem's structure is similar to the 2D case, but, of course, the results are different. The box is now (-1,1)x(-1,1)x(-1,1), and we have
(43) 1= ñ(x1,y1,z1) = S S S ñ(r,è,ö) ä(x1-rsinècosö)ä(y1-rsinèsinö)ä(z1-rcosè) drdèdö
The Jacobian of this transformation is J = r2sin è, so
(44) 1 = ñ(r1,è1,ö1)/(r12sin è1)
So, using (26), we get the normalised distributions
(45) ñ(r,è,ö) = {1/(4ð 3½)} r2 sin è
(46) ñ(r) = {1/(3½)} r2 0 ≤ r ≤ 3½
(47) ñ(è) = {1/2} sin è 0 ≤ è ≤ ð
(48) ñ(ö) = {1/(2ð)} 0 ≤ ö ≤ 2ð
Gregory Chaitin [1], showed in 1987 explicitly (using Lisp), that there exist symbolic objects (texts) which are "algorithmically inexplicable", i.e., cannot be specified by any text shorter than themselves. We shall proceed in this line of conceptualisation for approaching randomness.
Consider a number as a word, or string, written in a very particular language -- that of arithmetic -- then, define a random string as being an object describable by a string at least as large as the object itself. It comes naturally then, to say that a random number is not redundant i.e. cannot be defined by a program shorter than the length of the number itself.
Generating truly random numbers is both impractical and in fact undesirable as we shall see shortly. Pseudorandom numbers (the objects that this lecture discusses), have the great advantage that they can be reproduced exactly, without having to store them. This is accomplished via the pseudorandom number generators (prng) which are objects that produce numbers with many of the statistical properties of truly random numbers. The more statistical properties they reproduce the better prng are. This also makes them bigger and complicated. This is inevitable.
The ability to recall the pseudorandom sequence via the prng is desirable, both for improving the simulation program that uses it, and also (and most important) to debug it.
Random numbers from virtually any distribution can be obtained by transforming (0,1)-uniform random numbers (for the latter we reserve the symbol U). This is based on the two propositions below. (We call the appropriate integral of a given p.d.f. a "distribution" or a "probability distribution".).
Proposition 1: Let P[X=a] = q and P[X=b] = 1-q. Generate U. If U < q, output a, else output b. Then the output and X have the same distribution.
Proof: P[U ≤ q]=q=P[output=a]. What is missing goes to b. So the two distributions are identical.
Proposition 2: Let X have distribution F. Suppose that F is continuous and strictly increasing. Generate U. Then F-1(U) has distribution F.
Proof: U being uniform on (0,1): P[U ≤ x] = x for 0≤x≤1 . Hence P[F-1(U)≤t] = P[U≤F(t)] = F(t).
The interesting question now, becomes how to generate uniform pseudorandom numbers, since, due to the above propositions, we can produce any other p.d.f.
There are many methods. The most frequently used methods are the so called linear congruential generators. We will not discuss the number-theoretic issues involved in their definition, nor the statistical tests that pass or not pass with success. We will just give the minimum of theory behind their definition and we will also give a fortran program producing good uniform random numbers.
Linear congruential generators compute the i-th integer Xi in the pseudorandom sequence from Xi-1 by the recursion
(49) Xi = ( a Xi-1 + c ) mod m.
The parameters (all integer numbers) a, c, and m determine the statistical quality of the generator. If c is zero, then the resulting generator is called a (pure) "multiplicative congruential" generator. The particular multiplicative generator with:
(50a) m = 231 - 1 = 2147483647
(50b) a = 75 = 16807
is a good minimal standard against which other generators should be judged.
A major weakness of linear multiplicative congruential generators is that successive overlapping sequences of N numbers all fall on, at most:
(51) h = (N!m)1/N
parallel hyperplanes. For example, for m = 231 - 1, the approximate value of h is tabulated below.
N h
1 2147483647
2 ~ 65536
3 ~ 2344
4 ~ 476
5 ~ 192
6 ~ 108
For certain applications, a small number of covering hyperplanes can give grossly wrong results.
If the period of a generator (49) is m, then it generates every integer in [0,m-1] before repeating it (in this case it is called "full-cycle"). This, which is good, can happen if:
(i) c and m are relative prime,
(ii) a-1 is multiple of every prime p which divides m, and
(iii) a-1 is a multiple of 4, if 4 divides m.
In the c=0 case, the prng is full-cycle if:
(i) m is prime and if
(ii) an-1 is a multiple of m for n=m-1 but for no smaller n.
A full-cycle generator with m large is clearly good: its one dimensional p.d.f., when reconstructed approximately by means of a histogram, appears uniform. A less than full-cycle generator might have gaps in the sequence of possible values 1, ..., m-1 and that is bad.
A possible fortran implementation of the minimal standard (p199, [2]) is
function unif ( ix )
ix = dmod ( 16 807 d0*ix, 2 147 483 647 d0 )
unif = ix / 2 147 483 647 d0
end
Another program implementing the same prng but being more flexible and fast is (p202, [2])
function unif ( ix )
k1 = ix / 127773
ix = 16807*(ix - k1*127773) - k1*2836
if ( ix .lt. 0 ) ix = ix + 214 748 3647
unif = ix*4.656612875E-10
end
A variant of this is also named ran0 and is "the minimal standard" of [3].
Initial values for ix (seeds) might be one of the following:
748932582, 1985072130, 1631331038, 67377721, 366304404.
These particular seeds may produce (if needed) sequences (of at most 131072 numbers) which do not overlap.
A final note. Suppose we want integers uniformly distributed over the interval [0,K-1]. A seemingly sensible way of doing this is to perform
j = K*unif(ix)
where unif(ix) is a function returning a floating point number uniformly distributed over (0,1). For simplicity assume that unif(ix) is a linear congruential generator with ix iterating over the successive integers in the sequence. Using floating point arithmetic for what is fundamentally an integer operation can cause problems:
(i) Different computers may use different floating point formats, thus even though unif may generate the same sequence ix on two different computers, the j sequence may differ.
(ii) If K is large, the roundoff inherent in floating point operations may cause the j's to be nonuniformly distributed. In particular, the values for j near K may be systematically over or under represented.
Both weaknesses can be virtually avoided by using
junk = unif (ix)
j = mod(ix, K)
This of course, is not a monotone transformation of the original sequence.
Proposition 3: Let X have distribution function F. Let h be a strictly increasing function. Then h(X) is a random variable with distribution function F(h-1(x)).
Proof: It is a generalisation of proposition 2. Now, P[X≤x]=F(x). Hence P[h(X)≤h(x)]=F(x) also. Hence P[h(X)≤y] = F(h).
Proposition 4: If the F of proposition 3 has as its corresponding p.d.f. the function f and h-1 is absolutely continuous then, h(X) has density (h-1)'(x) f(h-1(x)), for almost all x.
Proof 1: From Eq.(22) the searched density, say g(z), is (we write y=h(x)): g(z) = S f(x)ä(z-h(x))dx = S f(h-1(y))ä(z-y)(h-1)'(y)dy = (h-1)'(z) f(h-1(z)).
Proof 2: From prop. 3, P[h(X)≤y] = F(h-1(y)), therefore, g(z) = dF(h-1(y))/dy = (h-1)'(z) f(h-1(z)).
Examples:
a) The exponential distribution. When X has distribution
function F and ë>0 is a real number, then, the variate (-1/ë)log(X) has distribution function 1-F(e-ëx), as can be directly verified:
P[-(1/ë)log(X)≤y] = P[X≤e-ëx] = 1-F(e-ëx) (x > 0).
In particular, if X is uniform in (0,1), then -(1/ë)log(U) is ë-exponential. Vice versa, when X is ë-exponential, then e-ëx is uniform in (0,1).
b) Non-monotone transformations. Non-monotonous transformations are best handled by computing the distribution function first from general principles. To illustrate this, let us consider a random variable X with distribution function F and density f. Then, the variable X2 has distribution function
P[X2≤x] = P[ |X|≤x½] = F(x½)-F(-x½) (x > 0)
and density [f(x½)+f(-x½)]/[2 x½]
In particular, when X is a normal variate with p.d.f. g(x; 0,1), then X2 is gamma distributed. The latter density is known as chi-square with one degree of freedom.
Having, now, the complete theoretical basis at hands, we turn to the discussion of the general principles in random variate generation.
The inversion method is based upon Proposition 2. This proposition can be used to generate random variates with an arbitrary continuous distribution F provided that F-1 is explicitly known. The faster the inverse can be computed, the faster we can compute X from a given uniform variate U. Formally, we have:
Algorithm 3. The inversion method: 1. Generate a uniform (0,1) random variate U. 2. Return X = F-1(U).
Often, the formulae for the F-1 can be simplified, by noting that 1-U is distributed as U.
As an example, the variate ó tan(ðU) generated at step 2 of algorithm 3 generates the Cauchy p.d.f.: ó/[ð(x2+ó2)].
If the inverse of F cannot be computed explicitly, then one may use a root finder line Newton's method to invert F. This is done by locating the root X of equation F(X)-U=0, after a U has been drawn. Of course this can be time-consuming.
The basic version of the rejection algorithm assumes the existence of a density g and the knowledge of a constant c ≥ 1 such that f(x) ≤ c g(x) for all x.
Random variates with density f can then be obtained as follows
Algorithm 4. The rejection method: Repeat 1. Generate a variate X with density g and a U (uniform on (0,1)). 2. Set T = c g(X)/f(X) Until UT ≤ 1 Return X.
The closer g approximates f, the higher is the probability that the Until condition will be satisfied, giving, thus, the correct output fast.
In what follows, by normal or Gaussian we mean random numbers having p.d.f. g(x; 0,1). Also, by log(x) we mean natural logarithms: elog(x)= x.
One can generate random variates by using the general methods we have discussed. But for the normal density special two-to-two methods produce excellent and fast results. There are methods that explicitly use two iid uniforms to generate two iid normal randoms. These methods sometimes are regarded as "ingenious"!
The simplest algorithm (Wiener or Box-Muller) begins by generating two iid uniforms U1 and U2 both belonging to [0,1]. Then two iid normal deviates are obtained by computing
X1 = (-2log(U1)½cos(2ðU2)
X2 = (-2log(U1)½sin(2ðU2)
A variation of the above algorithm (Marsaglia) uses a rejection scheme to obtain two iid uniforms V,W belonging to [-1,1] which give a point (V,W) uniformly distributed within the unit circle. When this is accomplished, then, the following transformation (S=V2+W2)
X1= V (-2log(S)/S)½
X2= W (-2log(S)/S)½
also produces two iid normal randoms with mean zero and standard deviation one.
As we shall see shortly Box-Muller method has some troubles, therefore we choose to give here a program for the Marsaglia method:
subroutine gauss (z1, z2)
implicit none
real z1, z2, u1, u2, S, fac
integer count
count = 0
10 continue
u1 = 2.*ran()-1.
u2 = 2.*ran()-1.
S = u1*u1 + u2*u2
if (S.gt.1) then
count = count + 1
if (count.gt.10) stop 'bizarre matters in gauss'
goto 10
else
fac = sqrt(-2.*alog(S)/S)
z1 = u1*fac
z2 = u2*fac
return
endif
end
Closing this section on the normal density a word of caution is to be said on using the obviously fast box-Muller method in conjunction with a linear congruential generator. The B.-M. method generates a pair of variables (Y,Z) which are supposedly independently normally distributed. Because cos(x) and sin(x) are periodic, simple algebra shows that all possible values of (Y,Z) fall on a spiral !! As an approximation to a pair of independent variates, (Y,Z) is terrible! This is a case where a pathology of linear congruential generators can be demonstrated analytically.
Another example where two simple uniform variates can produce a random variate of some other p.d.f. is the following. Let us ask what p.d.f. a random variate has if it is the product of two iid uniforms? Answer: f(x) = S S ä(x-U1U2) dU1dU2. The reader is welcomed to take as an exercise to prove that f(x) = -log(x).
Suppose that we want to estimate
(50) Ø = 0S1f(x) dx
Further, assume that (50) cannot be evaluated analytically and that numerical evaluation would be difficult. So we try simulation! In this essentially deterministic context, simulation is often called Monte Carlo, after that well-known mecca where chance phenomena are intensively studied [2]. Monte Carlo methods are generally not recommended to evaluate one-dimensional integrals, in high-dimensional cases, however, they may offer the only practical approach. If you prefer, the above integral can be interpreted as being over the unit hypercube rather than over the ordinary unit interval.. Now put Vi=f(Ui) and
(51) ø=(1/Í) Ói=1N Vi
It can be shown, using the delta function techniques of this lecture, that the pdf of ø is a sharp gaussian centered at Ø. So, ø is an unbiased estimator of Ø and, more, its expected square error is O(N).
1. Algorithmic Information Theory, G. J. Chaitin, Cambridge Univ. Press, Cambridge, 1987.
2. A Guide to Simulation, P. Bratley, B. L. Fox, L. E. Schrage, Springer-Verlag, New York, 1983.
3. Portable random number generators, W. H. Press, S. A. Teukolsky, Computers in physics, Vol. 6, No. 5, Sep/Oct 1992.
4. Introduction to probability and statistics, fourth edition, B. W. Lindgren, G. W. McElrath, D. A. Berry, Mcmillan Publishing Co., New York, 1978.
5. Non-Uniform random variate generation, Luc Devroye, Springer-Verlag, New York, 1986.
6. Statistical Mechanics, S. K. Ma, World Scientific, 1985.
7. Applied mathematics for engineers and physicists, third edition, L. A. Pipes, L. R. Harvill, McGraw-Hill, 1971.
8. The percentage points of the normal distribution, R. E. Odeh, J. O. Evans, Applied Statistics, Vol. 23, pp. 96-97, 1974.
9. Gaussian random number generators on a Cyber-205, S.C.Black & A.D.Kennedy, Computers in Physics, May/Jun 1989, pp 59-67.
10. Monte carlo methods and simulation of physical systems, lectures by M. H. Kalos, notes by P. A. Whitlock, NYU G22.2960 photocopied lecture notes.
If you have something to say about anything in this site, please, communicate! Contact me at cschassapis@acm.org. But, if you download and save this page to your machine, or print this page, then, please send me an e-mail saying that you did that. Thanks.
![]() |
The next lecture, entitled "Integration and differentiation", will be uploaded some time in the near (or far) future. |
Last updated: June 7, 2008