By Julienne Walker
License: Public Domain
The informal definition of a random sequence is where
each newly generated number has no immediately obvious relation to the numbers generated
before it. While in computer science there is no way to generate random numbers
deterministically, algorithms do exist that give the appearance of a random sequence.
This tutorial will cover the most common class of algorithms for generating pseudo-random
numbers: Linear Congruential Generators. After discussing the theory behind LCG
algorithms, we will go on to look at built-in support for random numbers in C and
C++, and effective ways to make use of it.
Linear Congruential Generators
A linear congruential sequence is a series of numbers
based on the recurrence relation formula:
Xn = (aXn-1 + c) mod m
In this formula, m is called the modulus, a is called
the multiplier, and c is called the increment. It is not difficult to imagine the
resulting sequence of this formula when given different values. For example, with
a = 2, c = 3, m = 10, and a starting value of X = 5, the following sequence would
be produced:
5 3 9 1 5 3 9 1 5 3...
While this particular sequence is simple to calculate
by hand, an algorithm to produce it is a trivial direct translation from the formula
to source code:
1 #include <stdio.h>
2
3 int jsw_lcg ( int seed )
4 {
5 return ( 2 * seed + 3 ) % 10;
6 }
7
8 int main ( void )
9 {
10 int seed = 5;
11 int i;
12
13 for ( i = 0; i < 10; i++ ) {
14 printf ( "%d ", seed );
15 seed = jsw_lcg ( seed );
16 }
17
18 printf ( "...\n" );
19
20 return 0;
21 }
Notice that the sequence repeats after four numbers
have been generated. This is due to the modular arithmetic that forces wrapping
of values into the desired range. In random number theory, the length of this cycle
is called the period of the sequence. As you might expect, a longer period allows
for sequences that appear more random because there is no immediately discernable
pattern. The longest period possible for a linear congruential algorithm is the
value of m itself. As such, the larger m is, the better the chances are of having
a long period.
Also notice that despite a modulus of 10, the period
of the algorithm given was only 4. This suggests that both the increment and the
multiplier should be chosen with care. It is possible to obtain a sequence that
covers the full range of m such that the resulting sequence is a valid permutation
of the values in the range. This is called a Full-Cycle Linear Congruential Generator.
There are a few rules (described by Knuth), that when adhered to, guarantee a full-cycle.
However, these rules only apply to the formula when c is not zero. When c is not
zero, the algorithm is typically called a mixed congruential generator.
Unfortunately, a long period does not guarantee a
random sequence. The majority of combinations of a, c, and m will either have sufficient
randomness but an unacceptably short period, or a long period with an obviously
non-random sequence. Of the ways to work around this problem, the most common is
to ignore the mixed congruential generator and instead use the formula for a pure
multiplicative generator. The formula is the same, except for the removal of the
variable c:
Xn = aXn-1 mod m
The difference between a mixed congruential generator
and a pure multiplicative generator is that the former uses c, where c is not equal
to zero, and the latter uses c, where c is always equal to zero. Because c is now
a constant with no effect, it can be removed from the formula entirely. The result
is a simpler algorithm, so many random number algorithms use a pure multiplicative
generator.
One note should be made before moving on, and that
is that pure multiplicative generators cannot generate 0, or every subsequent value
will be 0 as well. This was not an issue with mixed congruential generators because
if a value of 0 was generated, c would be added to it.
As already described, pure multiplicative generators
have been widely accepted as random number generators, and studied extensively.
Through trial and error, many acceptable combinations of modulus and multiplier
have been discovered. However, Park and Miller have called one of these combinations
the Minimal Standard Generator. The resulting sequence from m = 2,147,483,647 and
a = 16,807 is recommended as the bare minimum for random number generators (though
Park and Miller now recommend a = 48,271 for better statistical properties). While
there are better combinations of modulus and multiplier, and even other (and better)
categories of algorithm than the linear congruential generator, the Minimal Standard
Generator is the one by which all others should be judged, and an acceptable algorithm
should not be worse.
The Minimal Standard Generator is relatively simple
to write (assuming at least 32-bit integers):
1 #include <stdio.h>
2
3 #define M 2147483647
4 #define A 16807
5 #define Q ( M / A )
6 #define R ( M % A )
7
8 static int seed = 1;
9
10 int jsw_rand ( void )
11 {
12 seed = A * ( seed % Q ) - R * ( seed / Q );
13
14 if ( seed <= 0 )
15 seed += M;
16
17 return seed;
18 }
19
20 int main ( void )
21 {
22 int i;
23
24 for ( i = 0; i < 10; i++ )
25 printf ( "%d ", jsw_rand() );
26 printf ( "...\n" );
27
28 return 0;
29 }
This is quite different from the basic LCG algorithm.
The reason that extra work is required, is because, unlike the theoretical world,
real programming languages have limits. If the intermediate values of aX exceed
the limits of a signed integer, the type will overflow and the algorithm will invoke
undefined behavior. Unsigned types could be used instead, but then the Minimal Standard
Generator would not be strictly correct, as it would produce the wrong sequence.
The generally accepted solution is called Schrage's Method, a clever approach that
avoids overflow through careful use of division and subtraction, however, an alternative
method is to use floating-point for intermediate values.
Because the resulting sequence from just about every
general random number generator may consist of very large numbers, it may not be
practical for use in programs that need a random number within a smaller range.
The need arises to scale random numbers to fit the application, and a whole new
class of problems is created as a result.
There are two potential needs for scaling a random
number sequence. First, you might need random numbers in the range of [0, n). Once
again turning to modular arithmetic, this is a simple affair. In the above program,
only one line must to be changed to to force the sequence into the range of, say,
[0, 100):
1 printf ( "%d ", jsw_rand() % 100 );
To shift this range to [1, 101), simply add 1 to the
result:
1 printf ( "%d ", jsw_rand() % 100 + 1 );
The second need for scaling a random number sequence
is into a range with specific high and low values. To do this is only slightly more
difficult:
1 printf ( "%d ", low + ( jsw_rand() % ( high - low ) ) );
However, be aware that the randomness of a random
number sequence is global for the sequence as a whole. If you shrink the range then
there is a higher probability of the sequence appearing less random. For example,
you may have long runs of the same number if the scaled range is small. This is
immediately obvious if you shift the range to [0,2):
1 #include <stdio.h>
2
3 #define M 2147483647
4 #define A 16807
5 #define Q ( M / A )
6 #define R ( M % A )
7
8 static int seed = 1;
9
10 int jsw_rand ( void )
11 {
12 seed = A * ( seed % Q ) - R * ( seed / Q );
13
14 if ( seed <= 0 )
15 seed += M;
16
17 return seed;
18 }
19
20 int main ( void )
21 {
22 int i;
23
24 for ( i = 0; i < 10; i++ )
25 printf ( "%d ", jsw_rand() % 2 );
26 printf ( "...\n" );
27
28 return 0;
29 }
It is theoretically possible that in a sequence of
one million random numbers, of any range, a single value will be generated one million
times. However, this is highly unlikely for long ranges, but for short sequences,
apparent non-randomness can be a problem. Also, scaling a random range through modulus
is generally frowned upon with linear congruential generators. A more reliable approach
is to find the deviate of a range and multiply by
N rather than use a given random number and take the remainder of division:
1 #include <stdio.h>
2
3 /* ... */
4
5 int main ( void )
6 {
7 int i;
8
9 for ( i = 0; i < 10; i++ ) {
10 jsw_rand();
11 printf ( "%d ", (int)( uniform_deviate() * 100 ) );
12 }
13
14 printf ( "...\n" );
15
16 return 0;
17 }
The linear congruential generator shown so far is
the bottom-of-the-bucket generator, and it does have problems. In fact, linear congruential
generators as a whole have generally acceptable performance, but in randomness testing,
they still show uncomfortable patterns. Fortunately, there is a solution to this
without turning to dreadfully complicated algorithms such as the Mersenne Twister
(the most popular random number generator at the time of writing).
Dual-Phase Linear Congruential Generators
The linear congruential generator shown so far is
a single-phase algorithm. This means that it calculates one seed and returns the
value calculated. Another approach uses a two-phase algorithm, where the results
of two linear congruential generators are combined to produce a significantly more
random sequence. Pierre L'Ecuyer studied this brand of LCG and suggests a combination
of two LCG algorithms with m1 = 2,147,483,563, a1 = 40014, and m2 = 2,417,483,399,
a2 = 40692.The implementation is only marginally more difficult than a single-phase
algorithm using either one of those:
1 #define M1 2147483647
2 #define M2 2147483399
3 #define A1 40015
4 #define A2 40692
5 #define Q1 ( M1 / A1 )
6 #define Q2 ( M2 / A2 )
7 #define R1 ( M1 % A1 )
8 #define R2 ( M2 % A2 )
9
10 static int seed1 = 1, seed2 = 1;
11
12 /* Dual-Phase Linear Congruential Generator */
13 int jsw_rand ( void )
14 {
15 int result;
16
17 seed1 = A1 * ( seed1 % Q1 ) - R1 * ( seed1 / Q1 );
18 seed2 = A2 * ( seed2 % Q2 ) - R2 * ( seed2 / Q2 );
19
20 if ( seed1 <= 0 )
21 seed1 += M1;
22
23 if ( seed2 <= 0 )
24 seed2 += M2;
25
26 result = seed1 - seed2;
27
28 if ( result < 1 )
29 result += M1 - 1;
30
31 return result;
32 }
The only problem with this approach is that multiple
seeds must be maintained. This is an issue because to avoid repeating the sequence
each time a program is run, client code is often required to specify the starting
seed. While only one such seed needs to be initialized with the Minimal Standard
Generator, two are needed with a dual-phase generator, three for a tri-phase generator,
and so on. This initialization can be done with a helper function that “seeds”
the random number generator:
1 #include <stdio.h>
2
3 #define M1 2147483647
4 #define M2 2147483399
5 #define A1 40015
6 #define A2 40692
7 #define Q1 ( M1 / A1 )
8 #define Q2 ( M2 / A2 )
9 #define R1 ( M1 % A1 )
10 #define R2 ( M2 % A2 )
11
12 static int seed1 = 1, seed2 = 1;
13
14 void jsw_seed ( int s1, int s2 )
15 {
16 seed1 = s1;
17 seed2 = s2;
18 }
19
20 /* Dual-Phase Linear Congruential Generator */
21 int jsw_rand ( void )
22 {
23 int result;
24
25 seed1 = A1 * ( seed1 % Q1 ) - R1 * ( seed1 / Q1 );
26 seed2 = A2 * ( seed2 % Q2 ) - R2 * ( seed2 / Q2 );
27
28 if ( seed1 <= 0 )
29 seed1 += M1;
30
31 if ( seed2 <= 0 )
32 seed2 += M2;
33
34 result = seed1 - seed2;
35
36 if ( result < 1 )
37 result += M1 - 1;
38
39 return result;
40 }
41
42 int main ( void )
43 {
44 int i;
45
46 for ( i = 0; i < 10; i++ )
47 printf ( "%d ", jsw_rand() % 100 );
48 printf ( "...\n" );
49
50 jsw_seed ( 12345, 23456 );
51
52 for ( i = 0; i < 10; i++ )
53 printf ( "%d ", jsw_rand() % 100 );
54 printf ( "...\n" );
55
56 return 0;
57 }
Because seed1 and seed2
are statically initialized to 1, that is the default seed for the generator. However,
by calling jsw_seed, the client code can “re-seed”
the generator to produce a different sequence at will. This method is employed by
the standard library functions rand and srand, which we will cover later in the
tutorial.
Sometimes, a number between 0 and N is not what is
desired, such as in simulations, but a random floating-point number between 0 and
1. In its simplest form, this is called a uniform deviate, and it is easy to produce
from the result of a call to jsw_rand. Simply multiply the value
returned by jsw_rand by 1/m:
1 #include <stdio.h>
2
3 #define M 2147483647
4 #define A 16807
5 #define Q ( M / A )
6 #define R ( M % A )
7
8 static int seed = 1;
9
10 /* Single-Phase Linear Congruential Generator */
11 int jsw_rand ( void )
12 {
13 seed = A * ( seed % Q ) - R * ( seed / Q );
14
15 if ( seed <= 0 )
16 seed += M;
17
18 return seed;
19 }
20
21 double uniform_deviate ( void )
22 {
23 return (double)seed * ( 1.0 / M );
24 }
25 int main ( void )
26 {
27 int i;
28
29 for ( i = 0; i < 10; i++ ) {
30 jsw_rand();
31 printf ( "%f\n", uniform_deviate() );
32 }
33
34 printf ( "\n" );
35
36 return 0;
37 }
Standard Library
Fortunately, or unfortunately, depending on your opinion
of the quality of common implementations of the standard library, the C standard
library (also inherited by the C++ standard library) supports two functions for
generating random numbers and seeding that generator. These functions are called
rand, and srand, respectively. Their use is surprisingly
simple, and fairly consistent with the design of jsw_rand and
jsw_seed:
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 int main ( void )
5 {
6 int i;
7
8 for ( i = 0; i < 10; i++ )
9 printf ( "%d ", rand() % 100 );
10 printf ( "\n" );
11
12 srand ( 12345U );
13
14 for ( i = 0; i < 10; i++ )
15 printf ( "%d ", rand() % 100 );
16 printf ( "\n" );
17
18 return 0;
19 }
Now, instead of the definition for jsw_rand
and jsw_seed, all you have to do is include stdlib.h
(or cstdlib for C++). To obtain a random number in a range, the
same concepts apply as with jsw_rand. However, because the implementation
of rand is unknown, it is impossible to tell whether or not using modulus to shrink
the range will result in an acceptably random sequence. The reason that this might
not be the case is due to the very definition of linear congruential generators.
The details are complicated, and I'm not entirely sure I understand them completely
myself, so I will refrain from embarrassing myself in an attempt to describe the
issues. Just be aware that scaling the range of a linear congruential generator
with modulus may have a detrimental effect on the randomness of the sequence.
The common solution is to use the constant value
RAND_MAX, defined with every standard library, to use the high-order
bits of a random number through division rather than low-order bits with the remainder
of division:
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 int main ( void )
5 {
6 int i;
7
8 for ( i = 0; i < 10; i++ )
9 printf ( "%d ", rand() / ( RAND_MAX / 100 + 1 ) );
10 printf ( "...\n" );
11
12 return 0;
13 }
An alternative to dividing RAND_MAX
by N is to force the expression to floating-point and multiply RAND_MAX
by N:
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 int main ( void )
5 {
6 int i;
7
8 for ( i = 0; i < 10; i++ )
9 printf ( "%d ", (int)( rand() / ( RAND_MAX + 1.0 ) * 100 ) );
10 printf ( "...\n" );
11
12 return 0;
13 }
Yet another alternative is to utilize the uniform_deviate
function provided not too long ago. The rules still apply, even though we are now
using a library function rather than rolling our own, so modifying uniform_deviate
to work with rand is an exercise in the trivial:
1 #include <stdio.h>
2 #include <stdlib.h>
3
4 double uniform_deviate ( int seed )
5 {
6 return seed * ( 1.0 / ( RAND_MAX + 1.0 ) );
7 }
8
9 int main ( void )
10 {
11 int i;
12
13 for ( i = 0; i < 10; i++ )
14 printf ( "%d ", (int)( uniform_deviate ( rand() ) * 100 ) );
15 printf ( "...\n" );
16
17 return 0;
18 }
However, at the time of writing, implementations have
improved such that modulus will likely work, and if it does not work, chances are
good that the standard rand is unsuited to the application anyway.
In such a case, an algorithm known to be good should be used, such as the dual-phase
generator provided in this tutorial, or the Mersenne Twister, which has an incredibly
large period and very good statistical properties.
Mersenne Twister
Most tutorials will simply recommend the Mersenne
Twister and be done with it, but I will be kind enough to offer a working implementation.
How it works is far beyond the scope of this tutorial. A simple web search for “Mersenne
Twister” will give you several resources including the original paper, which
is an interesting read and provided the majority of the information required to
create the following two functions:
1 #define N 624
2 #define M 397
3 #define A 0x9908b0dfUL
4 #define U 0x80000000UL
5 #define L 0x7fffffffUL
6
7 static unsigned long x[N];
8 static int next;
9
10 void jsw_seed ( unsigned long s )
11 {
12 int i;
13
14 x[0] = s & 0xffffffffUL;
15
16 for ( i = 1; i < N; i++ ) {
17 x[i] = ( 1812433253UL
18 * ( x[i - 1] ^ ( x[i - 1] >> 30 ) ) + i );
19 x[i] &= 0xffffffffUL;
20 }
21 }
22
23
24 unsigned long jsw_rand ( void )
25 {
26 unsigned long y, a;
27 int i;
28
29 /* Refill x if exhausted */
30 if ( next == N ) {
31 next = 0;
32
33 for ( i = 0; i < N - 1; i++ ) {
34 y = ( x[i] & U ) | x[i + 1] & L;
35 a = ( y & 0x1UL ) ? A : 0x0UL;
36 x[i] = x[( i + M ) % N] ^ ( y >> 1 ) ^ a;
37 }
38
39 y = ( x[N - 1] & U ) | x[0] & L;
40 a = ( y & 0x1UL ) ? A : 0x0UL;
41 x[N - 1] = x[M - 1] ^ ( y >> 1 ) ^ a;
42 }
43
44 y = x[next++];
45
46 /* Improve distribution */
47 y ^= (y >> 11);
48 y ^= (y << 7) & 0x9d2c5680UL;
49 y ^= (y << 15) & 0xefc60000UL;
50 y ^= (y >> 18);
51
52 return y;
53 }
The Mersenne Twister is surprisingly short for how
good it is, but, like most effective random number generators, the code is almost
completely opaque. At first glance, it is a mess of magic numbers and bitwise operations,
and comments generally do not help much. Therefore, I highly recommend reading the
paper by Matsumoto and Nishimura if understanding is your goal. If not, the two
functions have passed stringent testing and have come out clean in every case, so
you can be comfortable just using it if you want.
Random Seeds
As a final note, the seed for a random number generator
itself should appear to be random, or at least change with each run of the program.
As a result, most people prefer to seed their random number generator with the current
system time. The code typically looks like this:
1 #include <stdio.h>
2 #include <stdlib.h>
3 #include <time.h>
4
5 int main ( void )
6 {
7 int i;
8
9 for ( i = 0; i < 10; i++ )
10 printf ( "%d ", rand() % 100 );
11 printf ( "...\n" );
12
13 srand ( (unsigned)time ( NULL ) );
14
15 for ( i = 0; i < 10; i++ )
16 printf ( "%d ", rand() % 100 );
17 printf ( "...\n" );
18
19 return 0;
20 }
Now, while I am not aware of a system where this would
not work, it is technically not portable because the C (and C++) standards do not
require that time_t, which is the return value of the time
function, be convertable to unsigned int. To be strictly portable, it has been suggested
that a hash can be taken of the bytes of the time_t, converted
to unsigned, and passed to srand. The following
is based on a function by Lawrence Kirby via Ben Pfaff:
1 #include <limits.h>
2 #include <stdio.h>
3 #include <stdlib.h>
4 #include <time.h>
5
6 unsigned time_seed ( void )
7 {
8 time_t now = time ( NULL );
9 unsigned char *p = (unsigned char *)&now;
10 unsigned seed = 0;
11 size_t i;
12
13 for ( i = 0; i < sizeof now; i++ )
14 seed = seed * ( UCHAR_MAX + 2U ) + p[i];
15
16 return seed;
17 }
18
19 int main ( void )
20 {
21 int i;
22
23 for ( i = 0; i < 10; i++ )
24 printf ( "%d ", rand() % 100 );
25 printf ( "\n" );
26
27 srand ( time_seed() );
28
29 for ( i = 0; i < 10; i++ )
30 printf ( "%d ", rand() % 100 );
31 printf ( "\n" );
32
33 return 0;
34 }
Conclusion
Random numbers are a pain in the butt. However, they're
terribly interesting, and very useful in many applications of computer science.
Hopefully this tutorial has laid the foundation necessary for using random numbers
properly and enough interest in the topic to promote further study. This article
has only scratched the surface, as random number generation is a popular topic of
study for computer scientists and mathematicians, so I encourage you to branch out
from here and continue your research.