Welcome to the dieharder distribution website.
Version 3.29.4beta is the current snapshot. Some of the documentation below may not quite be caught up to it, but it should be close.
Dieharder is a random number generator (rng) testing suite. It is intended to test generators, not files of possibly random numbers as the latter is a fallacious view of what it means to be random. Is the number 7 random? If it is generated by a random process, it might be. If it is made up to serve the purpose of some argument (like this one) it is not. Perfect random number generators produce "unlikely" sequences of random numbers -- at exactly the right average rate. Testing a rng is therefore quite subtle.
dieharder is a tool designed to permit one to push a weak generator to unambiguous failure (e.g. at the 0.0001% level), not leave one in the "limbo" of 1% or 5% maybe-failure. It also contains many tests and is extensible, so that eventually it will contain many more tests than it already does.
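To make the "limbo" point concrete, here is a minimal sketch (not dieharder code) of a monobit-style test applied to a hypothetical weak generator whose bits come up 1 with probability 0.501. The bias value, the function name and the use of expected counts instead of simulated draws are all assumptions made purely for illustration.

```python
import math


def monobit_p_value(n_bits: int, n_ones: float) -> float:
    """Two-sided p-value for seeing n_ones ones in n_bits fair coin flips,
    using the normal approximation to the binomial."""
    z = abs(2.0 * n_ones - n_bits) / math.sqrt(n_bits)
    return math.erfc(z / math.sqrt(2.0))


# Hypothetical weak generator: each bit is 1 with probability 0.501.
BIAS = 0.501

for n_bits in (10**4, 10**6, 10**8):
    expected_ones = BIAS * n_bits  # the typical count, not a simulated draw
    p = monobit_p_value(n_bits, expected_ones)
    print(f"{n_bits:>11d} bits: p = {p:.3g}")
```

With ten thousand bits the bias is invisible, with a million bits the p-value sits in the few-percent maybe-failure zone, and with a hundred million bits it is far below any sensible threshold. Being able to crank a test up like this is what turns "maybe" into "unambiguous".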
If you are using dieharder for testing rngs either in one of its prebuilt versions (rpm or apt) or built from source (which gives you the ability to e.g. add more tests or integrate your rng directly with dieharder for ease of use) you may want to join either or both of the dieharder-announce or the dieharder-devel mailing lists here. The former should be very low traffic -- basically announcing when a snapshot makes it through development to where I'm proud of it. The latter will be a bit more active, and is a good place to post bug reports, patches, suggestions, fixes, complaints and generally participate in the development process.
At the suggestion of Linas Vepstas on the Gnu Scientific Library (GSL) list this GPL'd suite of random number tests will be named "Dieharder". Using a movie sequel pun for the name is a double tribute to George Marsaglia, whose "Diehard battery of tests" of random number generators has enjoyed years of enduring usefulness as a test suite.
The dieharder suite is more than just the diehard tests cleaned up and given a pretty GPL'd source face in native C. Tests from the Statistical Test Suite (STS) developed by the National Institute for Standards and Technology (NIST) are being incorporated, as are new tests developed by rgb. Where possible or appropriate, all tests that can be parameterized ("cranked up") to where failure, at least, is unambiguous are so parameterized and controllable from the command line.
A further design goal is to provide some indication of why a generator fails a test, where such information can be extracted during the test process and placed in usable form. For example, the bit-distribution tests should (eventually) be able to display the actual histogram for the different bit ntuplets.
Dieharder is by design extensible. It is intended to be the "Swiss army knife of random number test suites", or if you prefer, "the last suite you'll ever ware" for testing random numbers.
Dieharder can be freely downloaded from the Dieharder download site. On this page there should be a long list of previous versions of dieharder, and it should tell you what is the current snapshot. The version numbers have the following specific meaning which is a bit different than usual:
The single-tree dieharder source files (.tgz and .src.rpm) can be downloaded from this directory. In addition, binary rpm's built on top of Fedora Core whatever (for either or both of i386 and x86_64) may be present. Be warned: the GSL is a build requirement. The current packaging builds both the library and the dieharder UI from a single source rpm, or from running "make" in the toplevel directory of the source tarball. With a bit of effort (making a private rpm building tree), "make rpm" should work for you as well in this toplevel directory.
This project is under very active development. Considerable effort is being expended so that the suite will "run out of the box" to produce a reasonably understandable report for any given random number generator it supports via the "-a" flag, in addition to the ability to considerably vary most specific tests as applied to the generator. A brief synopsis of command options to get you started is presented below. In general, though, documentation (including this page, the man page, and built-in documentation) may lag the bleeding edge snapshot by a few days or more.
An rpm installation note from Court Shrock:
I was reading about your work on dieharder. First, some info about getting dieharder working in Gentoo:
    cd ~
    emerge rpm gsl
    wget http://www.phy.duke.edu/~rgb/General/dieharder/dieharder-0.6.11-1.i386.rpm
    rpm -i --nodeps dieharder-0.6.11-1.i386.rpm
Rebuilding from tarball source should always work as well, and if you are planning to play a lot with the tool it may be a desirable way to proceed, as there are some documentation goodies in the ./doc subdirectory and the ./manual subdirectory of the source tarball (such as the original diehard test descriptions and the STS white paper).
George Marsaglia retired from FSU in 1996. For a brief time diehard appeared to have finally disappeared from FSU webspace, but what had really happened is that Google's favorite path to it had disappeared when his personal home directory was removed. Diehard is still there, at the URL http://www.stat.fsu.edu/pub/diehard as well as at a Hong Kong website. The source code of diehard itself is (of course) Copyright George Marsaglia, but Marsaglia did not incorporate an explicit license into his code, which muddles the issue of how and when it can be distributed, freely or otherwise. Existing diehard sources are not directly incorporated into dieharder in source form for that reason, to keep authorship and GPL licensing issues clear.
Note that the same is not true about data. Several of the diehard tests require that one use precomputed numbers as e.g. target mean, sigma for some test statistic. Obviously in these cases we use the same numbers as diehard so we get the same, or comparable, results. These numbers were all developed with support from Federal grants and have all been published in the literature, though, and should therefore be in the public domain as far as reuse in a program is concerned.
Note also that most of the diehard tests are modified in dieharder, usually in a way that should improve them. There are three improvements that were basically always made if possible.
Unfortunately, some of the diehard tests that rely on weak inverses of the covariance matrices associated with overlapping samples seem to have errors in their implementation, whether in the original diehard (covariance) data or in dieharder-specific code it is difficult to say. Fortunately, it is no longer necessary to limit the number of random numbers drawn from a generator when running an integrated test, and non-overlapping versions of these same tests do not require any treatment of covariance. For that reason non-overlapping versions of the questionable tests have been provided where possible (in particular testing permutations and sums) and the overlapping versions of those tests are deprecated pending a resolution of the apparent errors.
In a few cases other variations are possible for specific tests. This should be noted in the built-in test documentation for that test where appropriate.
Aside from these major differences, note that the algorithms were independently written more or less from the test descriptions alone (sometimes illuminated by a look at the code implementations, but only to clear up just what was meant by the description). They may well do things in a different (but equally valid) order or using different (but ultimately equivalent) algorithms altogether and hence produce slightly different (but equally valid) results even when run on the same data with the same basic parameters. Then, there may be bugs in the code, which might have the same general effect. Finally, it is always possible that diehard implementations have bugs and can be in error. Your Mileage May Vary. Be Warned.
The primary point of dieharder (like diehard before it) is to make it easy to time and test (pseudo)random number generators, both software and hardware, for a variety of purposes in research and cryptography. The tool is built entirely on top of the GSL's random number generator interface and uses a variety of other GSL tools (e.g. sort, erfc, incomplete gamma, distribution generators) in its operation.
Dieharder differs significantly from diehard in many ways. For example, diehard uses file based sources of random numbers exclusively and by default works with only roughly ten million random numbers in such a file. However, modern random number generators in a typical simulation application can easily need to generate 10^18 or more random numbers, generated from hundreds, thousands, millions of different seeds in independent (parallelized) simulation threads, as the application runs over a period of months to years. Those applications can easily be sensitive to rng weaknesses that might not be revealed by sequences as short as 10^7 uints in length even with excellent and sensitive tests. One of dieharder's primary design goals was to permit tests to be run on very long sequences.
To facilitate this, dieharder prefers to test generators that have been wrapped up in a GSL-compatible interface so that they can return an unbounded stream of random numbers -- as many as any single test or the entire suite of tests might require. Numerous examples are provided of how one can wrap one's own random number generator so that it can be called via the GSL interface.
Dieharder also supports file-based input in three distinct ways. The simplest is to use the (raw binary) stdin interface to pipe a bit stream from any rng, hardware or software, through dieharder for testing. In addition, one can use "direct" file input of either raw binary or ascii formatted (usually uint) random numbers. The man page contains examples of how to do all three of these things, and dieharder itself can generate sample files to use as templates for the appropriate formatting.
Note Well! Dieharder can consume a lot of random numbers in the course of running all the tests! To facilitate this, dieharder should (as of 2.27.11 and beyond) support large file (> 2GB) input, although this is still experimental. Large files are clunky and relatively slow, and LFS (large file support) in linux/gcc is still relatively new and may have portability issues if dieharder is built with a non-gcc compiler. It is therefore strongly recommended that both hardware and software generators be tested either by being wrapped within the GSL interface (emulating the source code examples) or through the pipe/stdin interface, so that they can return an essentially unbounded rng stream.
Dieharder also goes beyond diehard in that it is deliberately extensible. In addition to implementing all of the diehard tests it is expected that dieharder will eventually contain all of the NIST STS and a variety of tests contributed by users, invented by the dieharder authors, or implemented from descriptions in the literature. As a true open source project, dieharder can eventually contain all rng tests that prove useful in one place with a consistent interface that permits one to apply those tests to many generators for purposes of comparison and validation of the tests themselves as much as the generators. In other words, it is intended to be a vehicle for the computer science of random number generation testing as well as a practical test harness for random number generators.
To expand on this, the development of dieharder was motivated by the following, in rough order of importance:
Although this tool is being developed on Linux/GCC-based platforms, it should port with no particular difficulty to other Unix-like environments (at least ones that also support the GSL), with the further warning that certain features (in particular large file support) may require tweaking and that the dieharder authors may not be able to help you perform that tweaking.
If you compile the tool or install the provided binary rpm's and run it as:
    dieharder -a
it should run -a(ll) tests on the default GSL generator.
Choose alternative generators with -g number, where:
    dieharder -g -1
will list all possible numbers known to the current snapshot of dieharder.
    dieharder -l
should list all the tests implemented in the current snapshot of dieharder. Finally, the venerable and time tested:
    dieharder -h
provides a Usage synopsis (which can be quite long) and
    man dieharder
is the (installed) man page, which may or may not be completely up to date as the suite is under active development. For developers, additional documentation is available in the toplevel directory or doc subdirectory of the source tree. Eventually, a complete dieharder manual in printable PDF form will be available both on this website and in /usr/share/doc/dieharder-*/.
List of GSL and user-defined random number generators that can be tested by dieharder:
#===========================================================================#
#       dieharder version 3.29.4beta Copyright 2003 Robert G. Brown        #
#===========================================================================#
# Id  Test Name           | Id  Test Name          | Id  Test Name          #
#===========================================================================#
| 000 borosh13            |001 cmrg                |002 coveyou             |
| 003 fishman18           |004 fishman20           |005 fishman2x           |
| 006 gfsr4               |007 knuthran            |008 knuthran2           |
| 009 knuthran2002        |010 lecuyer21           |011 minstd              |
| 012 mrg                 |013 mt19937             |014 mt19937_1999        |
| 015 mt19937_1998        |016 r250                |017 ran0                |
| 018 ran1                |019 ran2                |020 ran3                |
| 021 rand                |022 rand48              |023 random128-bsd       |
| 024 random128-glibc2    |025 random128-libc5     |026 random256-bsd       |
| 027 random256-glibc2    |028 random256-libc5     |029 random32-bsd        |
| 030 random32-glibc2     |031 random32-libc5      |032 random64-bsd        |
| 033 random64-glibc2     |034 random64-libc5      |035 random8-bsd         |
| 036 random8-glibc2      |037 random8-libc5       |038 random-bsd          |
| 039 random-glibc2       |040 random-libc5        |041 randu               |
| 042 ranf                |043 ranlux              |044 ranlux389           |
| 045 ranlxd1             |046 ranlxd2             |047 ranlxs0             |
| 048 ranlxs1             |049 ranlxs2             |050 ranmar              |
| 051 slatec              |052 taus                |053 taus2               |
| 054 taus113             |055 transputer          |056 tt800               |
| 057 uni                 |058 uni32               |059 vax                 |
| 060 waterman14          |061 zuf                 |                        |
#===========================================================================#
| 200 stdin_input_raw     |201 file_input_raw      |202 file_input          |
| 203 ca                  |204 uvag                |205 AES_OFB             |
| 206 Threefish_OFB       |                        |                        |
#===========================================================================#
| 400 R_wichmann_hill     |401 R_marsaglia_multic. |402 R_super_duper       |
| 403 R_mersenne_twister  |404 R_knuth_taocp       |405 R_knuth_taocp2      |
#===========================================================================#
| 500 /dev/random         |501 /dev/urandom        |                        |
#===========================================================================#
| 600 empty               |                        |                        |
#===========================================================================#
Two "gold standard" generators in particular are provided to "test the test" -- AES_OFB and Threefish_OFB are both cryptographic generators and should be quite random. gfsr4, mt19937, and taus (and several others) are very good generators in the GSL, as well. If you are developing a new rng, it should compare decently with these generators on dieharder test runs.
Note that the stdin_input_raw interface (-g 200) is a "universal" interface. Any generator that can produce a (continuous) stream of presumably random bits can be tested with dieharder. The easiest way to demonstrate this is by running:
    dieharder -S 1 -B -o -t 1000000000 | dieharder -g 200 -d 200 -n 2
where the first invocation of dieharder generates a stream of binary bits drawn from the default generator with seed 1 and the second reads those bits from stdin and tests them with the rgb bitdist test on two-bit sequences. Compare the output to:
    dieharder -S 1 -d 200 -n 2
which runs the same test on the same generator with the same seed internally. They should be the same.
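Feeding your own generator through the stdin interface only requires something that emits raw binary words. The following is a minimal sketch under stated assumptions: the function name write_raw_uint32 is made up for this example, Python's built-in Mersenne Twister is merely a stand-in for the generator you actually want to test, and the choice of 32-bit little-endian words is an assumption rather than anything dieharder is documented here to require.

```python
import io
import random
import struct
from typing import BinaryIO, Callable


def write_raw_uint32(next_uint32: Callable[[], int], out: BinaryIO, count: int) -> None:
    """Write `count` 32-bit unsigned integers to `out` as raw binary."""
    chunk = 4096  # pack in batches so a huge stream never sits in memory
    remaining = count
    while remaining > 0:
        n = min(chunk, remaining)
        values = [next_uint32() & 0xFFFFFFFF for _ in range(n)]
        # Little-endian is an arbitrary choice; for random bits the byte
        # order should not matter statistically.
        out.write(struct.pack(f"<{n}I", *values))
        remaining -= n


# Demonstration with an in-memory buffer. random.Random is only a stand-in
# for the generator under test.
rng = random.Random(1)
buf = io.BytesIO()
write_raw_uint32(lambda: rng.getrandbits(32), buf, 10_000)
print(len(buf.getvalue()), "bytes written")
```

To test a generator for real, one would pass sys.stdout.buffer in place of the in-memory buffer, ask for far more words than any single test consumes, and pipe the script's output into dieharder -g 200 -a.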
Similarly the file_input generator requires a file of "cooked" (ascii readable) random numbers, one per line, with a header that describes the format to dieharder. Note Well! File or stream input rands (with any of the three methods for input) are delivered to the tests on demand, but if the test needs more than are available dieharder either fails (in the case of a stdin stream) or rewinds the file and cycles through it again, and again, and again as needed. Obviously this significantly reduces the sample space and can lead to completely incorrect results for the p-value histograms unless there are enough rands to run EACH test without repetition (it is harmless to reuse the sequence for different tests). Let the user beware!
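As a rough sketch of what preparing such a file involves, the code below formats a list of integers with a header and estimates how often a file would have to be rewound for a single test. The header layout (the "type:", "count:" and "numbit:" lines) is an assumption made for illustration; generate a template with dieharder itself and compare before relying on it. Both function names are hypothetical.

```python
import random
from typing import Iterable


def format_dieharder_ascii(values: Iterable[int], generator_name: str,
                           seed: int, numbit: int = 32) -> str:
    """Format integers one per line under an ASSUMED dieharder-style header."""
    vals = list(values)
    bar = "#" + "=" * 66
    lines = [
        bar,
        f"# generator {generator_name}  seed = {seed}",
        bar,
        "type: d",               # assumed: decimal integers
        f"count: {len(vals)}",   # assumed: number of rands that follow
        f"numbit: {numbit}",     # assumed: significant bits per rand
    ]
    lines += [f"{v:10d}" for v in vals]
    return "\n".join(lines) + "\n"


def rewinds_needed(rands_required: int, rands_in_file: int) -> int:
    """How many times the file must be rewound to serve one test."""
    if rands_required <= 0:
        return 0
    return (rands_required - 1) // rands_in_file


rng = random.Random(1)  # stand-in generator
text = format_dieharder_ascii((rng.getrandbits(32) for _ in range(5)), "demo_rng", 1)
print(text, end="")
print("rewinds for a test needing 25 rands from a 10-rand file:", rewinds_needed(25, 10))
```

Any nonzero rewind count for a single test means that test is seeing repeated data, which is exactly the situation warned about above.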
List of the CURRENT fully implemented tests (as of the 08/18/08 snapshot):
#=============================================================================#
#        dieharder version 3.29.4beta Copyright 2003 Robert G. Brown         #
#=============================================================================#
Installed dieharder tests:
 Test Number   Test Name                                     Test Reliability
===============================================================================
  -d 0         Diehard Birthdays Test                        Good
  -d 1         Diehard OPERM5 Test                           Suspect
  -d 2         Diehard 32x32 Binary Rank Test                Good
  -d 3         Diehard 6x8 Binary Rank Test                  Good
  -d 4         Diehard Bitstream Test                        Good
  -d 5         Diehard OPSO                                  Good
  -d 6         Diehard OQSO Test                             Good
  -d 7         Diehard DNA Test                              Good
  -d 8         Diehard Count the 1s (stream) Test            Good
  -d 9         Diehard Count the 1s Test (byte)              Good
  -d 10        Diehard Parking Lot Test                      Good
  -d 11        Diehard Minimum Distance (2d Circle) Test     Good
  -d 12        Diehard 3d Sphere (Minimum Distance) Test     Good
  -d 13        Diehard Squeeze Test                          Good
  -d 14        Diehard Sums Test                             Do Not Use
  -d 15        Diehard Runs Test                             Good
  -d 16        Diehard Craps Test                            Good
  -d 17        Marsaglia and Tsang GCD Test                  Good
  -d 100       STS Monobit Test                              Good
  -d 101       STS Runs Test                                 Good
  -d 102       STS Serial Test (Generalized)                 Good
  -d 200       RGB Bit Distribution Test                     Good
  -d 201       RGB Generalized Minimum Distance Test         Good
  -d 202       RGB Permutations Test                         Good
  -d 203       RGB Lagged Sum Test                           Good
  -d 204       RGB Kolmogorov-Smirnov Test Test              Good
Full descriptions of the tests are available from within the tool. For example, enter:
rgb@lilith|B:1003>./dieharder -d 203 -h
OK, what is dtest_num = 203
#==================================================================
#                     RGB Lagged Sums Test
# This package contains many very lovely tests.  Very few of them,
# however, test for lagged correlations -- the possibility that
# the random number generator has a bitlevel correlation after
# some fixed number of intervening bits.
#
# The lagged sums test is therefore very simple.  One simply adds up
# uniform deviates sampled from the rng, skipping lag samples in between
# each rand used.  The mean of tsamples samples thus summed should be
# 0.5*tsamples.  The standard deviation should be sqrt(tsamples/12).
# The experimental values of the sum are thus converted into a
# p-value (using the erf()) and a ks-test applied to psamples of them.
#==================================================================
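The description above translates almost line for line into code. What follows is a minimal sketch written from that description alone, not the dieharder implementation: the function name, the decision to skip the lag samples before (and not after) each rand used, and the use of a one-sided normal tail for the p-value are all assumptions.

```python
import math
import random
from typing import Callable


def lagged_sum_p_value(uniform: Callable[[], float], tsamples: int, lag: int) -> float:
    """Sum tsamples uniform deviates, discarding `lag` deviates before each
    one used, and convert the sum to a one-sided normal p-value."""
    total = 0.0
    for _ in range(tsamples):
        for _ in range(lag):
            uniform()            # skipped samples
        total += uniform()       # the sample actually used
    mean = 0.5 * tsamples
    sigma = math.sqrt(tsamples / 12.0)
    z = (total - mean) / sigma
    # Upper-tail probability of a standard normal: uniform on (0,1)
    # if the generator is good.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Python's Mersenne Twister stands in for the generator under test.
rng = random.Random(1)
pvalues = [lagged_sum_p_value(rng.random, tsamples=10_000, lag=2) for _ in range(20)]
print("smallest p:", min(pvalues), "largest p:", max(pvalues))
```

Each call yields one p-value; the psamples of them collected in the list are what the final KS test would then be applied to.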
Note that all tests have been independently rewritten from their description, and may be functionally modified or extended relative to the original source code published in the originating suite(s). This has proven to be absolutely necessary; dieharder stresses random number generator tests as much as it stresses random number generators, and tests with imprecise target statistics can return "failure" when the fault is with the test, not the generator.
The author (rgb) bears complete responsibility for these changes, subject to the standard GPL code disclaimer that the code has no warranty. In essence, yes it may be my fault if they don't work but using the tool is at your own risk and you can fix it if it bothers you and/or I don't fix it first.
All tests are encapsulated to be as standard as possible in the way they compute p-values from single statistics or from vectors of statistics, and in the way they implement the underlying KS and chisq tests. Diehard is now complete in dieharder (although two tests are badly broken and should not be used), and attention will turn towards implementing more selected tests from the STS and many other sources. A road map of sorts (with full supporting documentation) is available on request if volunteers wish to work on adding more GPL tests.
Note that a few tests appear to have stubborn bugs. In particular, the diehard operm5 test seems to fail all generators in dieharder. Several users have attempted to help debug this problem, and it tentatively appears that the problem is in the original diehard code and not just dieharder. There is extensive literature on overlapping tests, which are highly non-trivial to implement and involve things like forming the weak inverse of covariance matrices in order to correct for overlapping (non-independent) statistics.
A revised version of overlapping permutations is underway (as an rgb test), but is still buggy. A non-overlapping (rgb) permutations test is provided now that should test much the same thing at the expense of requiring more samples to do it.
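The trade-off between the two sampling schemes is easy to see in a few lines. This is only an illustration with made-up function names: overlapping tuples reuse each rand in up to k different tuples (so the resulting statistics are not independent), while non-overlapping tuples are independent but consume k rands apiece.

```python
from typing import List, Sequence, Tuple


def overlapping_tuples(seq: Sequence[int], k: int) -> List[Tuple[int, ...]]:
    """Every window of length k, advancing one element at a time."""
    return [tuple(seq[i:i + k]) for i in range(len(seq) - k + 1)]


def non_overlapping_tuples(seq: Sequence[int], k: int) -> List[Tuple[int, ...]]:
    """Consecutive disjoint blocks of length k; a short tail is dropped."""
    return [tuple(seq[i:i + k]) for i in range(0, len(seq) - k + 1, k)]


rands = list(range(10))  # stand-in for ten rands drawn from a generator
print(len(overlapping_tuples(rands, 5)), "overlapping 5-tuples")
print(len(non_overlapping_tuples(rands, 5)), "non-overlapping 5-tuples")
```

For the same number of tuples the non-overlapping scheme needs roughly k times as many rands, which is affordable when the generator can supply an unbounded stream.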
Similarly, the diehard sums test appears to produce a systematically non-flat distribution of p-values for all rngs tested, in particular for the "gold standard" cryptographic generators aes and threefish, as well as for the "good" generators in the GSL (mt19937, taus, gfsr4). It seems very unlikely that all of these generators would be flawed in the same way, so this test also should not be used to test your rng.
I hope that even during its development, you find dieharder useful. Remember, it is fully open source, so you can freely modify and redistribute the code according to the rules laid out in the Gnu Public License (version 2b), which might cost you as much as a beer one day. In particular, you can easily add random number generators using the provided examples as templates, or you can add tests of your own by copying the general layout of the existing tests (working toward a p-value per run, cumulating (say) 100 runs, and turning the resulting KS test into an overall p-value). Best of all, you can look inside the code and see how the tests work, which may inspire you to create a new test -- or a new generator that can pass a test.
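The last step of that layout, turning a pile of per-run p-values into one overall p-value, can be sketched as follows. This is an illustrative stand-in and not the KS code in dieharder: the function name is hypothetical, and it uses the asymptotic Kolmogorov series with a common small-sample correction factor, which is an approximation.

```python
import math
from typing import Sequence


def ks_uniform_p_value(pvalues: Sequence[float]) -> float:
    """Approximate KS p-value for the hypothesis that pvalues are U(0,1)."""
    xs = sorted(pvalues)
    n = len(xs)
    # Largest gap between the empirical CDF and the uniform CDF.
    d = max(max((i + 1) / n - x, x - i / n) for i, x in enumerate(xs))
    lam = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if lam < 0.2:
        return 1.0  # the series below is 1 to within rounding here
    # Q(lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2)
    total = 0.0
    for k in range(1, 101):
        term = math.exp(-2.0 * k * k * lam * lam)
        total += term if k % 2 == 1 else -term
        if term < 1e-12:
            break
    return min(1.0, max(0.0, 2.0 * total))


# Stand-in data for 100 runs of some test (not real test output):
flat = [(i + 0.5) / 100 for i in range(100)]   # ideally spread p-values
clumped = [0.5] * 100                          # every run gave the same p
print("flat set:   ", ks_uniform_p_value(flat))
print("clumped set:", ks_uniform_p_value(clumped))
```

A new test only has to produce the per-run p-values; a routine along these lines does the rest, and a tiny overall p-value signals that the per-run p-values are not uniformly distributed.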
To conclude, if you have any interest in participating in the development of dieharder, be sure to let me know, especially if you have decent C coding skills (including familiarity with Subversion and the GSL) and a basic knowledge of statistics. I even have documents to help with the latter, if you have the programming skills and want to LEARN statistics. Bug reports or suggestions are also welcome.
Submit bug reports, etc. to rgb at phy dot duke dot edu