One suggestion I got on the GSL list was to call this suite "dieharder" as it includes and extends George Marsaglia's "Diehard" suite of tests of random number generators. This actually sounds pretty good as a tribute to Diehard's enduring usefulness as a test suite. Only now, many years after it was introduced, are computers getting to be fast enough that extensions to diehard are required to push the existing pseudorandom number generators to failure.
This is the current snapshot of the rand_rate random number tester. It encapsulates all of the Gnu Scientific Library random number generators, as well as /dev/random and /dev/urandom, in a single harness that can time them and subject them to various tests for randomness. The tests are variously drawn from Marsaglia's "Diehard" battery of random number tests, from the NIST STS suite, from useful binning or spectral tests I've found while researching random number testing, and from tests I've invented myself or that improve on tests from other sources.
The primary point of this tester is to make it easy to time and test (pseudo)random number generators, whether software, hardware, or otherwise. Three examples are provided of wrapping a random number generator and inserting it so that it can be called via the GSL interface. An interface is planned that would allow random numbers to be read in from a file, allowing the tool to be used with any generator (wrappable or not) that can produce a table of random numbers.
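To give a concrete idea of what wrapping involves, here is a minimal sketch (not one of the three examples actually shipped with rand_rate; the generator "mylcg" and its helper functions are invented names for illustration) of filling in a gsl_rng_type so that an arbitrary generator can be allocated, seeded, and called through the standard GSL interface:

  #include <gsl/gsl_rng.h>
  #include <stdio.h>

  /* State for a deliberately simple (and bad) example LCG. */
  typedef struct {
    unsigned long int x;
  } mylcg_state_t;

  static void mylcg_set(void *vstate, unsigned long int s)
  {
    mylcg_state_t *state = (mylcg_state_t *) vstate;
    state->x = s ? s : 1;
  }

  static unsigned long int mylcg_get(void *vstate)
  {
    mylcg_state_t *state = (mylcg_state_t *) vstate;
    state->x = (69069 * state->x + 1) & 0xffffffffUL;
    return state->x;
  }

  static double mylcg_get_double(void *vstate)
  {
    return mylcg_get(vstate) / 4294967296.0;
  }

  /* The GSL type descriptor: name, max, min, state size, and the
     set/get/get_double methods above. */
  static const gsl_rng_type mylcg_type = {
    "mylcg", 0xffffffffUL, 0, sizeof(mylcg_state_t),
    &mylcg_set, &mylcg_get, &mylcg_get_double
  };

  int main(void)
  {
    gsl_rng *r = gsl_rng_alloc(&mylcg_type);  /* allocate and seed it       */
    gsl_rng_set(r, 1);                        /* exactly like any GSL rng   */
    printf("%lu\n", gsl_rng_get(r));
    gsl_rng_free(r);
    return 0;
  }

Once a generator is wrapped this way, the rest of the harness neither knows nor cares whether it is talking to a built-in GSL generator, a hardware source, or something homemade.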
Another important motivation for writing rand_rate is to have the entire test code and documentation be fully Gnu Public Licensed and hence openly available for adaptation, testing, and modification, including the addition of new tests. My primary objection to diehard and sts is not that they are inadequate or incomplete; it is that their code is obscure and not explicitly published for reuse. It is my hope that by providing this tool as self-documenting source, it becomes easy to add new tests, critique older tests, and improve the suite in general.
If you compile the test and run it as
  rand_rate -t -1

it should run the entire suite on the default generator. Choose an alternative generator with -r number; rand_rate -t 0 will list all the generator numbers known to the current build (mostly from the GSL).
The following are the tests so far implemented and (possibly) functional:
Works OK. It behaves pretty much the way it is described in the original documentation, but is very definitely new, clean, original C code. Use a large number of samples (e.g. -s 1000000) for the best results.
Works well with the KS test; you need to choose -s 500. I fancied it up a bit: it uses GSL's rmax_bits correctly, uses a cyclic permutation under a 24-bit mask to test all 32 cyclic permutations of bit position (not just 9, a la Marsaglia), and computes Pearson's chisq directly from gsl_ran_poisson_pdf() and the observations per cycle. Good at solidly rejecting bad generators (try -r 40, randu, as the universal "bad generator":-).
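The chisq-to-pvalue step is worth sketching, since it recurs throughout the suite. Something like the following (the function name, binning cutoff, and calling convention are invented for illustration; the conversion via the incomplete gamma function is the standard one) captures the idea:

  #include <gsl/gsl_randist.h>
  #include <gsl/gsl_sf_gamma.h>

  /* Given obs[k] = number of cycles in which k duplicate birthday spacings
   * were observed (out of ncycles cycles), form Pearson's chisq against the
   * Poisson expectation with mean lambda and return the corresponding
   * p-value. */
  double poisson_chisq_pvalue(const unsigned int *obs, int kmax,
                              unsigned int ncycles, double lambda)
  {
    double chisq = 0.0;
    int dof = 0;
    for (int k = 0; k <= kmax; k++) {
      double expected = ncycles * gsl_ran_poisson_pdf(k, lambda);
      if (expected > 5.0) {       /* keep only bins with adequate expectation */
        double delta = obs[k] - expected;
        chisq += delta * delta / expected;
        dof++;
      }
    }
    dof--;                        /* one constraint: the totals are fixed */
    /* p-value of a chisq statistic with dof degrees of freedom */
    return gsl_sf_gamma_inc_Q(0.5 * dof, 0.5 * chisq);
  }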
This works well; it uses a KS test on lots of samples. It's relatively slow, as it has to test a lot of pairs of points to ferret out the minimum distance, and I haven't written this optimally (yet) out of sheer laziness. Doing it efficiently would require e.g. an index sort, and while doing primary development I prefer brute force.
This works well also; it is in fact the same test as the 2d test above but yields a different target distribution. Again, I'm not sorting to find the minimum (yet), and checking all the pairs of points is relatively slow, if sure.
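For the curious, the (deliberately unoptimized) inner loop the two tests share is nothing more than a brute-force sweep over all pairs; a sketch in two dimensions, with the obvious extension to three:

  #include <math.h>
  #include <float.h>

  /* O(n^2) brute-force minimum pairwise distance in the unit square. */
  double min_distance_2d(const double *x, const double *y, int n)
  {
    double dmin = DBL_MAX;
    for (int i = 0; i < n - 1; i++) {
      for (int j = i + 1; j < n; j++) {
        double dx = x[i] - x[j];
        double dy = y[i] - y[j];
        double d = sqrt(dx * dx + dy * dy);
        if (d < dmin) dmin = d;
      }
    }
    return dmin;
  }

An index sort on one coordinate would cut the work down enormously, which is exactly the optimization I've been putting off.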
This is one of my own clever ideas. Some rng's, for some seeds, have bits that simply don't change. Obviously that makes the rng bad, but bad in a somewhat different way than "just" non-randomness of some more occult order. This is a very fast test that can iterate samples over seeds and extract a collective mask of all the bits that didn't change over an entire sequence of rands for at least one seed in the test. It very efficiently rejects e.g. randu (GSL 40).
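The heart of the test is just a pair of running masks; a minimal sketch (names invented) for a single seed looks like the following, with the harness combining the per-seed results across seeds:

  #include <stdint.h>
  #include <gsl/gsl_rng.h>

  /* Return a mask of the bits that never changed over nsamples rands from
   * one seed: and_mask keeps bits that were always 1, ~or_mask keeps bits
   * that were always 0, and their union is the set of "stuck" bits. */
  uint32_t persist_mask(gsl_rng *r, int nsamples)
  {
    uint32_t first = (uint32_t) gsl_rng_get(r);
    uint32_t and_mask = first, or_mask = first;
    for (int i = 1; i < nsamples; i++) {
      uint32_t u = (uint32_t) gsl_rng_get(r);
      and_mask &= u;   /* survives only where every sample had a 1 */
      or_mask  |= u;   /* set anywhere any sample had a 1 */
    }
    return and_mask | ~or_mask;
  }

In practice the result has to be masked down to the generator's rmax_bits, since bits the generator never produces at all would otherwise show up as trivially "stuck"; any remaining set bit is an immediate failure.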
This is an awesome test. No rng I've tested gets out alive. Basically, the test runs through a random bitstring somewhere between 32 and 128 bits wide. It slides a window n bits wide (n from 1 to 8) across the bitstring and counts the number of occurrences of the integer(s) it finds there, per bitstring, and bins the results in a histogram, per test of tsamples. It then uses the binomial probability to compute the theoretical occupations of the histogram bins and converts the results for a single integer into a p-value per test. It saves these p-values in a result vector and subjects them all to a KS test at the end, providing a single p-value for the test.
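The innermost counting step is simple enough to sketch directly (the helper name and bit-packing layout are invented; the real code wraps this in the per-test sampling machinery described above):

  #include <stdint.h>

  /* Slide an n-bit window cyclically across a bitstring of len bits packed
   * into 32-bit words, and count how often each of the 2^n possible n-bit
   * integers occurs. The harness accumulates these per-bitstring counts
   * into a histogram over many bitstrings and compares that histogram with
   * the binomial expectation. */
  void count_ntuples(const uint32_t *bits, int len, int n, unsigned int *counts)
  {
    for (int v = 0; v < (1 << n); v++) counts[v] = 0;

    for (int pos = 0; pos < len; pos++) {
      unsigned int value = 0;
      for (int b = 0; b < n; b++) {
        int idx = (pos + b) % len;                     /* cyclic wraparound */
        unsigned int bit = (bits[idx / 32] >> (idx % 32)) & 1u;
        value = (value << 1) | bit;
      }
      counts[value]++;
    }
  }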
As implemented, it does each n-tuple (window width) in turn from 1 to 8. n = 1 is a more powerful version of e.g. sts_monobit, one that determines not only whether 0's and 1's are balanced but whether their binomial histogram is correct when sampling relatively small bitstrings. n = 2 is a more powerful version of the sts "runs" test, since it can easily be shown that counting "runs" is equivalent to counting 2 x (# of 01 pairs). n = 5 appears to be redundant with several diehard tests, although that remains to be proven. All the generators I've tested in the GSL fail at 6-tuples or 7-tuples -- none make it to 8-tuples (bytes).
This has profound implications for simulations and samplings of bytes. Existing rng's appear to have a bitwise bias that is clearly apparent long before getting to bitstrings as long as a single byte of data.
This is a weak (but still useful) version of counting the 1's in a string of N bits: it examines the number of 1's in a very long list of bits and compares it to N/2. Here one can be very sure chisq and p are correctly computed per trial in the STS prescription, but the test now collects a vector of p's and applies a KS test rather than forcing you to run it lots of times by hand and visually assess the p distribution as "uniform" (or use the other methods described in the STS documentation). Weak or not, monobit can definitely find truly weak rng's. It is not so good in general, as it is quite possible for an rng to yield the right fraction of 1's and 0's but still be very nonrandom.
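For reference, the per-trial statistic is a one-liner once the bits are counted; a minimal sketch in the published STS formulation (the erfc form used here is equivalent to a one-degree-of-freedom chisq, and the actual rand_rate code may differ in detail):

  #include <math.h>
  #include <stdint.h>

  /* Count 1's minus 0's over nbits bits, normalize, and convert to a
   * p-value with the complementary error function. The harness collects
   * one such p per trial and feeds the vector of p's to the final KS test. */
  double monobit_pvalue(const uint32_t *bits, int nbits)
  {
    long sum = 0;
    for (int i = 0; i < nbits; i++) {
      int bit = (bits[i / 32] >> (i % 32)) & 1;
      sum += bit ? 1 : -1;
    }
    double s_obs = fabs((double) sum) / sqrt((double) nbits);
    return erfc(s_obs / sqrt(2.0));
  }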
This is distinct from Diehard's runs. In fact, this test is TRYING to compute what I will eventually compute far more directly in the pairwise bit distribution, since runs of 0's or 1's always end on a 01 or 10 pair, right? 1/4 of all pairs should be 01, 1/4 should be 10, 1/4 should be 11, 1/4 should be 00. Why bother calling it "runs" (or counting the intervening bits)? Just count the damn pairs!
I'm mostly through a rewrite of the code that puts it all in a common format, where a shell calls and samples the actual test after setting any test-specific parameters and spitting out a more or less standard-form description of the test (which can be suppressed by the -q flag). A few diehard tests remain to be converted, and rgb_persist cannot be converted as it doesn't produce a p-value; it just fails or doesn't fail. Once I finish converting the existing tests it is back to adding more diehard tests, as my current goal is to get all of diehard implemented this year (and more selected tests out of sts).
I think that it would be "interesting" to apply this exact framework to the GSL distribution generators as a test of random numbers TOTALLY within the GSL -- if one simply samples (e.g.) a Poissonian using randu as the rng, compares the result to the Poissonian probability density function, uses it to compute Pearson's chisq and a p-value (vector), and eventually a KS p-value, does that prove to be as sensitive a test as the various (far more complex) STS and Diehard tests? An interesting question, if nothing else.
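The machinery needed is already in the GSL; a sketch of the idea (the sample size and lambda are arbitrary choices for illustration):

  #include <gsl/gsl_rng.h>
  #include <gsl/gsl_randist.h>

  #define KMAX 32

  int main(void)
  {
    const double lambda = 4.0;
    const int nsamples = 100000;
    unsigned int hist[KMAX + 1] = {0};

    gsl_rng *r = gsl_rng_alloc(gsl_rng_randu);   /* the canonical bad generator */
    gsl_rng_set(r, 1);

    /* Sample a Poissonian with randu and histogram the results. */
    for (int i = 0; i < nsamples; i++) {
      unsigned int k = gsl_ran_poisson(r, lambda);
      if (k <= KMAX) hist[k]++;
    }

    /* hist[k] vs nsamples * gsl_ran_poisson_pdf(k, lambda) then feeds
     * Pearson's chisq and a p-value exactly as in the earlier sketch. */
    gsl_rng_free(r);
    return 0;
  }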
For example, the birthday test seems like a difficult way to generate a Poissonian compared to just "doing it" from an rng. Is there any evidence that it has greater sensitivity to various failures of the rng than a test that didn't have to reach an asymptotic limit (large bit windows, many birthdays, many samples) in order to become Poissonian in the first place? Similarly, does the sts runs test have anything to recommend it relative to just counting windowed integer values returned by the rng (integer values extracted from the actual bits viewed through a "port" of fixed width, with cyclic/periodic wraparound of the source integer/bitstring)? I doubt it, but sensitivity tests require a lot of time and energy to conduct. When the tool is finished, I'll conduct them.
Anyway, I hope that even during development, you find rand_rate useful. If you have any interest in participating in its development, be sure to let me know, especially if you have decent coding skills and a basic knowledge of statistics. I have documents to help with the latter, if you have the programming skills and want to LEARN statistics.