Robert G. Brown's General Tools Page

Things on the site itself that may be of interest to students or philosophers of any age or generation include complete online books of poetry, various support materials for the study of physics, and links related to beowulfery. All materials on this site that are authored by Robert G. Brown are Copyright 2004. The details of their Open Public License (modified) can be viewed here.

To use yum repositories on this site, you'll probably need to run rpm --import on Robert G. Brown's GPG public key.



General Interest (Miscellaneous) Links

Various links and publications of Robert G. Brown that fit under none of the other categories. All of these materials are published under a modified Open Publication License that permits unlimited free noncommercial and personal use. The materials (books, presentations, or otherwise) may not be published in any form or media that is sold for profit. The details of the license can be viewed here and in each available version viewed below. Stats for this page can be viewed here.

Commercial publishers interested in producing an actual book (or other media form) of the material below are encouraged to contact the author.

Flashcard Program

Gflashcard: A Personal Learning Tool

by Robert G. Brown (rgb)

gflashcard Version 0.7.1



Description

gflashcard is a program based on GTK and XML for presenting simple flashcards to students in a standard terminal (e.g. xterm) window. Its license (GPL 2b) can be viewed at the bottom of this page. Its current features include:

  • Arithmetic flashcards (ones that take a numerical answer).
  • Spelling flashcards (it "reads each word out loud" and then checks the spelling). It can handle alternative spellings and homonyms (words with the same sound but different spellings and meanings).
  • True/False flashcards.
  • Multiple Choice flashcards. The multiple choice engine is powerful. One can define up to 1024 possible answers. Multiple choices can be selected to be counted "correct"; the program can manage combinations like "A and/or B" or "A and B" and hence is suitable for type K questions. Up to ten answers and foolers combined can be presented per question. The answers are always accompanied by randomly selected foolers and the choices are shuffled before presentation, so that choice "A" this time may be choice "D" the next time you see it.
  • Foreign Language flashcards are an alternative application of the spelling or multiple choice flashcards. For example, an English word can be read and a Spanish translation required, or vice versa.
  • All flashcards can be accompanied by an easily recorded audio snippet. In the case of spelling or foreign language flashcards, an audio presentation is likely the only presentation schema used. In all cases, though, the audio feature means that flashcards that are suitable for the sight-impaired can be prepared. A future version of the program will provide audio feedback on each flashcard as well.
  • Flashcard sets can be presented in random order (the default, in fact) or list order, where the presentation order is determined by the author.
  • Flashcard sets can be mixed type. Spelling, arithmetic, multiple choice, foreign language, true/false -- a whole day's worth of material can easily be reviewed in a mixture less likely to be boring to small children. In fact, "fun" trivia type questions can be mixed in with academic material.
  • The program automatically scores each session.
  • The program automatically times each session.
  • The program has a practice mode where it loops indefinitely, presenting and representing a shuffled list of the same problems.
  • In practice mode, a user can be required to repeat each question that they miss one or more times, to help "drill" them briefly on their mistakes.
  • The program has a test mode where it runs for a specified amount of time and scores all questions that a student answers in that much time.
  • In all modes, each flashcard can include an explanation that appears when a wrong answer is entered. This can be a powerful teaching tool, as a student isn't just informed that their answer is wrong, they are presented with both the right answer and why the right answer is right, or why the wrong answers were wrong.
  • All problems are defined by a simple and straightforward XML tagging not unlike the tagging that produces this web page. Most problem types can be created by any text editor. A (still fairly crude, but functional) scripted recording tool is provided to automate the generation of audio/spelling clips for inclusion in flashcard sets. More advanced tools are under development (developer participation welcome).
  • The XML-based format of the flashcards is both open and easy to read, write, and parse in nearly any programming environment. It furthermore means that with an associated DTD a single set of flashcards can be presented by many tools in many ways. For example, a single set of flashcards might be:
    • presented in an ncurses/tty flashcard engine like "flashcard", which is also included on this site.
    • presented in a GUI flashcard engine like "gflashcard".
    • presented in a web-based flashcard engine using java, perl-cgi, php (not yet written)
    • transformed into simply formatted text with a suitable DTD.
    • transformed into text formatted for printing onto card stock to make real flashcards, again via DTD and tools such as jade.
    • similarly transformed into latex-formatted snippets that could then be included in actual textbooks, quizzes, problem assignments. flashcard's XML offers the possibility that a large database of open content will one day exist from which many people can draw problems for either self-guided or directed study.
  • The flashcard programs' source itself and the XML that specifies flashcards are open source and open standards. This means that anybody can take the sources and write their own derived flashcard presentation engine, as long as that engine in turn remains open source. It also means that anybody can write non-derived proprietary engines for presentation that can instantly present "any" flashcards built according to the standard. Flashcard sets themselves can in fact be copyright protected and sold for money. This combination of open presentation engine and open language and protection of the rights of problem set authors should encourage the development of content, both proprietary (for sale) and free.
  • Although gflashcard (the GTK presentation engine) and flashcard (the ncurses/tty presentation engine) are fully functional already and reasonably bug free, this is an ongoing project. Using XML as the basis of flashcards means that they will be at least moderately extensible -- future evolutions should contain support for graphical presentations, mathematical formulae, foreign character sets, GUI based presentation and authoring tools, web based presentation and authoring tools, problem databases and more. Using new features based on open source tools and standards is straightforward; XML means that each addition shouldn't break existing flashcard sets, only permit more powerful content to be added to them (as XML parsers will generally ignore tags they do not recognize while parsing the ones that they do).
  • Open Source Developers are welcome to participate. Content development is also openly solicited. My most precious commodity is time -- to facilitate the rapid development of this powerful tool requires time and/or money. In the documents directory there should be a tarball of free flashcards covering a number of topics in their own directories -- many lamentably empty. Interested parties should certainly consider writing and contributing content back to this tarball.
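
The extensibility claim in the list above can be illustrated with a short sketch. The tag names used here (flashcards, card, question, answer, explanation, image) are hypothetical stand-ins and not the actual gflashcard tagset, and Python's standard ElementTree parser stands in for libxml2. A reader that only looks up the tags it knows is unaffected by a newer tag it has never heard of:

```python
import xml.etree.ElementTree as ET

# Hypothetical flashcard markup; NOT the real gflashcard tagset.
DECK = """
<flashcards>
  <card type="arithmetic">
    <question>7 x 8</question>
    <answer>56</answer>
    <explanation>7 x 8 = 7 x 7 + 7 = 56</explanation>
  </card>
  <card type="truefalse">
    <question>The American Revolution preceded the American Civil War.</question>
    <answer>true</answer>
    <image src="timeline.png"/>
  </card>
</flashcards>
"""

def load_cards(xml_text):
    """Return a list of dicts built only from the tags this reader knows."""
    cards = []
    for card in ET.fromstring(xml_text).iter("card"):
        cards.append({
            "type": card.get("type"),
            "question": card.findtext("question"),
            "answer": card.findtext("answer"),
            # Optional tag: None when the card carries no explanation.
            "explanation": card.findtext("explanation"),
        })
    return cards

cards = load_cards(DECK)
for c in cards:
    print(c["type"], "|", c["question"], "->", c["answer"])
```

The second card's image tag is simply never asked for, so an older reader would present the card as before while a newer one could display the picture.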

gflashcard requires libxml2 and Gtk-2.

flashcard requires libxml2 and ncurses.


About flashcard

flashcard is a program I wrote to help my second grader drill basic spelling and arithmetic, my seventh grader drill Spanish and vocabulary, and more advanced students get immediate feedback on "conceptual" or memory building multiple choice questions.

flashcard is not a game. No imaginary space aliens are harmed as one learns vocabulary or arithmetic. It is a very, very efficient way to get the practice and immediate feedback necessary to memorize a large number of dry, boring factoids. It never gets tired, impatient, angry, or has to cook dinner. It doesn't require lots of little cutout cards or pieces of paper to get lost. Suitably installed on a local area network of student-accessible computers, it can be used as a memorization aid for nearly any sort of factoid and as a relatively weak "deep" learning tool.

Based on my own personal experience so far with it, young kids initially find it at least intriguing (doing flashcards on the computer is kind of neat), then boring and to be resisted, and finally just another daily homework chore to be done as quickly as possible. Older kids progress through these stages more rapidly as they can see that using flashcard to e.g. learn their vocabulary words beats the hell out of trying to do it by just reviewing their handout list or writing each word three times. By the time any student is in the latter stage, the program is generally "working".

I've been quite pleased with the results. My second (now third) grader went from getting an average of 60% or worse on timed arithmetic tests to getting consistent high-90%s (and occasional 100%s!) in about one month of drill three or four times a week.

His spelling is improving as well, but more slowly as he learned to spell phonetically and is a bit stubborn (literally) about the non-phonetic spellings and multiple possible spellings that abound in English. He initially exhibited a similar stubbornness about memorizing arithmetic versus using his fingers to compute the answer, but repetition and time pressure eventually took their toll and he ended up memorizing the right answers in spite of himself.

One thing that I'm doing that may or may not be the right thing to do (and may be responsible for his initially relatively slow progress) is having him drill on all the second grade core vocabulary words at once, instead of doing a small list every week in phase with his class. In the short run this leaves him behind, as he sees his current vocabulary words only occasionally and mixed in with all the many others. However, I expect that very soon, with consistent drill, he will suddenly know the entire list (this is the way it worked with arithmetic) and be not only ahead but done with learning second grade vocabulary half way through the year.



Screenshots

________
New! gflashcard GUI!
The material displayed is the included "demo" tutorial that both demonstrates gflashcard's abilities and serves as a template for building your own content.
________
Multiple Choice Flashcards
The material is chemistry questions for an academic bowl team.
________
Arithmetic Flashcards
Times tables for third graders.
________
Spelling Flashcards
Although a screenshot cannot show it, the words are read out loud by the program and then typed in by the student.


Practice Strategies

I'd suggest the following strategies for using flashcard for yourself or with your kids:

  • Use practice mode a lot more often than test mode. Practice mode (the default) actively drills the kids with no external time pressure. Each time they make a mistake they have to enter the correct answer (which is then presented to them) at least one time. This is just annoying enough to make them try, a little bit, to remember, and repetition aids memorization anyway. The number of "reps" can be varied in a flashcard file with the count tag.
  • Don't get angry with your kids for poor performance either in practice or in test mode. gflashcard will work over time -- believe it and just let it work. My kids beat themselves up over mistakes so much that I've had to tell them to relax and let the program teach them when they make mistakes. Keep short-run expectations, and pressure, low.
  • Do insist on a consistent, regular schedule of using flashcard to "do their flashcards" in all subjects you're using them for. For younger kids, every day is ideal, every other day acceptable, every week probably not enough. gflashcard will not work if the students don't use it regularly and get used to it and start working with it and not against it (see notes above). We try for every day and get maybe every other day. Use it regularly until mastery is achieved, and then occasionally to reinforce it and make it permanent.
  • In practice mode, you can specify the number of problems or just run it "forever" (until they q(uit)). When younger kids first start using it, 30 or 40 arithmetic problems or 20 vocabulary words is "a lot" and they won't be very good at entering answers. Start them easy until they get to where they can read a problem and enter an answer in less than 10 seconds. Then you should be able to crank up to 100 arithmetic problems and 50 vocabulary words, which should take them about fifteen minutes to run, and less as they get better. Much less.
  • There is no harm at all in offering performance or participation rewards (I mean "bribes"), especially to younger kids. Tell your child that when they first get (set your expectation here) you'll buy them a toy or a treat, take them to a movie, do something they enjoy with them. You're going to make them use the program whether or not they like it (and of course they "won't like it") -- at least give them something to look forward to when they do.
  • Remember that the computer is "stupid" (or if you prefer, my grading program is stupid). If you type an extra space after a word it will mark the spelling wrong. It probably can't cope with 1.0 as an integer answer or 1 as a floating point answer. Typing on a computer is also error prone, and your kids are probably learning to type at the same time they are drilling the words! I make a few mistakes per 100 doing bone-simple arithmetic in a hurry. Take all this into account when assessing performance; a score in the high 90%s may well really be "perfect" from the point of view of the child's grasp of the material being drilled. Try the program yourself for a few hundred problems (presuming that you know simple arithmetic perfectly:-) and you'll see what I mean.
  • Feel free to contact me at rgb@phy.duke.edu with suggestions, comments, remarks, bug reports. Note well that I'm planning to add various features, especially "explanations", multiple choice problems, and "instructions" in future releases as I have time to cut the code. Check this site from time to time for the current revision number/snapshot.
  • I may or may not be willing to author flashcard files to add to the package that your child needs (depends on my time and whether my own kids can use them). I will gratefully accept and add flashcard files and supporting audio directories authored by others to add to the package. It would be nice to have ALL the vocabulary words for K-12, listed by year, spoken in a pleasant voice with a usage sentence added to differentiate words that are homonyms, already in the package, wouldn't it? I don't plan to sell it, and it will be GPL'd forever, but if I ever DO sell some packaging of it I probably won't pay you for any contributions so made. Sorry.
  • On the same note, I'd cheerfully accept and add to the GPL package anything like bug patches, additions to the program itself (probably should check with me before writing them), extensions to or corrections of the flashcard xmlish tagset, webware or Gtk or "other" versions of the basic program. Given the open nature of the xmlish flashcard files, it should be pretty easy to write a "flashcard program" that displays them and does the flashcard rituals in many venues, including just plain printing them out formatted for being cut up into, well, flashcards!
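
The "stupid computer" behaviour described in the list above is what one gets from exact string comparison. The following is a minimal sketch of that idea only, not gflashcard's actual grading code; grade_exact is a hypothetical name and the attempts are made up.

```python
def grade_exact(typed, expected):
    """Mark an answer correct only if it matches character for character."""
    return typed == expected

# (typed, expected) pairs a child might plausibly produce.
attempts = [
    ("receive", "receive"),   # exact match
    ("receive ", "receive"),  # trailing space
    ("1.0", "1"),             # float typed for an integer answer
    ("56", "56"),             # exact match
]

results = [grade_exact(typed, expected) for typed, expected in attempts]
score = 100.0 * sum(results) / len(results)
print(results, f"{score:.0f}%")
```

Two of the four answers would be accepted by any human grader yet are marked wrong here, which is why a score in the high 90%s may reflect typing rather than knowledge.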


Caveat, Warning, Read This!

As is the case for all drilling/memorization tools, gflashcard can be used for great evil as well as great good. As a professional teacher, I am constantly made aware of the difference, and you should be too if you are going to use this program to "teach", either in the classroom, as a parent, or (to teach yourself) as a student.

There is a significant difference between memorization of factoids and true learning. To speak metaphorically, factoids are the many tiny rocks which are cemented together by experience and conceptualization into the edifice of our full understanding. One should never confuse successful memorization of a large body of factoids with real comprehension.

It is entirely possible to comprehend things deeply without memorizing lots of factoids. Mathematics is not arithmetic, although arithmetic is a useful skill that underlies some mathematics. History is not a bunch of events and their associated dates. Language is not a collection of words. Science is not scientific data. The abstract rule is not the many concrete realizations of the rule. One could embark on a long discussion of semantics and epistemology, semiotics and psychology -- and indeed I'm working slowly on a book on these subjects -- but not here. The main point is to recognize that memorization can be a soul-sucking process for a young mind (or an older one!) when unsupported by any sort of reason.

For many of these subjects, of course, memorizing factoids is one essential step in beginning to comprehend the subject. It is difficult to understand American History without knowing when the American Revolution occurred and whether it occurred before or after (say) the American Civil War. It is difficult to read and write clearly and effectively if one's collection of vocabulary factoids is inadequate. For that reason I think flashcard can be a useful component of teaching and learning, but it does not teach anything like real comprehension of a subject, only its associated and foundational factoids, and its only real virtue here is its efficiency -- by drilling those factoids with a tool, one can quickly build up a base of factual knowledge sufficient to be a foundation for deeper learning.

I would therefore recommend that this tool be used ONLY as a factoid memorization tool, and NOT as a classroom "testing" tool or "teaching" tool, although it does have a timed test mode and other things that might be construed or abused into a classroom role. Don't expect flashcards to be more than they are or do more than they can do.



DieHarder Program

DieHarder: A Random Number Test Suite

Version 2.24.7

Robert G. Brown (rgb)

Dirk Eddelbuettel

At the suggestion of Linas Vepstas on the Gnu Scientific Library (GSL) list this GPL'd suite of random number tests will be named "DieHarder". Using a movie sequel pun for the name is a double tribute to George Marsaglia, whose "Diehard battery of tests" of random number generators has enjoyed years of enduring usefulness as a test suite.

The DieHarder suite is more than just the diehard tests cleaned up and given a pretty GPL'd source face in native C. Tests from the Statistical Test Suite (STS) developed by the National Institute of Standards and Technology (NIST) are being incorporated, as are new tests developed by rgb. Where possible or appropriate, all tests that can be parameterized ("cranked up") to where failure, at least, is unambiguous are so parameterized and controllable from the command line.

A further design goal is to provide some indication of why a generator fails a test, where such information can be extracted during the test process and placed in usable form. For example, the bit-distribution tests should (eventually) be able to display the actual histogram for the different bit ntuplets.

DieHarder is by design extensible. It is intended to be the "Swiss Army Knife of random number test suites", or if you prefer, "the last suite you'll ever ware" for testing random numbers.


DieHarder Download Area

The version numbers have the following meaning:

  • First number (major). Bumped only when major goals in the design roadmap are reached (for example, finishing all the diehard tests). Version 1.x.x, for example, means that ALL of diehard (and more) is now incorporated in the program. Version 2.x.x means that the tests themselves have been split off into the libdieharder library, so that they can be linked into scripting languages such as R, new UIs, or user code. 3.x.x would be expected to indicate that the entire STS suite is incorporated, and so on.
  • Second number (first minor). This number indicates the number of tests currently supported. When it bumps, it means new tests have been added from e.g. STS, Knuth, Marsaglia and Tsang, rgb, or elsewhere.
  • Third number (second minor). This number is bumped when significant features are added or altered. Bug fixes bump this number, usually after a few bumps of the release number for testing snapshots. This number and the release are reset to 0 when the major is bumped or a new test is added to maintain the strictly increasing numerical value on which e.g. yum upgrades rely.
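
Why the numbering has to stay strictly increasing can be seen in a small sketch. It assumes, purely for illustration, that versions are compared field by field as integers; real rpm/yum version comparison has more rules than this. parse_version is a hypothetical helper, and every version string other than 2.24.7 is invented for the example.

```python
def parse_version(text):
    """Split 'major.tests.features' into a tuple of integers."""
    return tuple(int(field) for field in text.split("."))

# Tuples compare field by field, left to right.
current = parse_version("2.24.7")
feature_bump = parse_version("2.24.8")   # feature change or bug fix
test_added = parse_version("2.25.0")     # new test added, third field reset
major_bump = parse_version("3.25.0")     # hypothetical major bump, third field reset

print(current < feature_bump < test_added < major_bump)

# Comparing the raw strings instead would order "2.9.0" after "2.24.7".
print("2.9.0" < "2.24.7", parse_version("2.9.0") < parse_version("2.24.7"))
```

Resetting the third number to 0 never makes a newer release compare as older, because a field to its left has always increased first.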

The single-tree dieharder sources (.tgz and .src.rpm) files can be downloaded from this directory. In addition, i386 binary rpm's built on top of Fedora Core 6 are present. Be warned: the GSL is a build dependency. The current packaging builds both the library and the dieharder UI from a single source rpm, or from running "make" in the toplevel directory of the source tarball. With a bit of effort (making a private rpm building tree), "make rpm" should work for you as well in this toplevel directory.

This project is under very active development. Considerable effort is being expended so that the suite will "run out of the box" to produce a reasonably understandable report for any given random number generator it supports via the "-a" flag, in addition to the ability to considerably vary most specific tests as applied to the generator. A brief synopsis of command options to get you started is presented below. In general, though, documentation (including this page, the man page, and built-in documentation) may lag the bleeding edge snapshot by a few days or more.

An rpm installation note from Court Shrock:

I was reading about your work on dieharder.  First, some info
about getting dieharder working in Gentoo:

cd ~
emerge rpm gsl
wget http://www.phy.duke.edu/~rgb/General/dieharder/dieharder-0.6.11-1.i386.rpm
rpm -i --nodeps dieharder-0.6.11-1.i386.rpm

Rebuilding from tarball source should always work as well, and if you are planning to play a lot with the tool may be a desirable way to proceed as there are some documentation goodies in the ./doc subdirectory and the ./manual subdirectory of the source tarball (such as the original diehard test descriptions and the STS white paper).

George Marsaglia retired from FSU in 1996. For a brief time diehard appeared to have finally disappeared from FSU webspace, but what had really happened is google's favorite path to it had disappeared when his personal home directory was removed. Diehard is still there, at the URL http://www.stat.fsu.edu/pub/diehard as well as at a Hong Kong website. The source code of diehard itself is (of course) Copyright George Marsaglia but Marsaglia did not incorporate an explicit license into his code which muddles the issue of how and when it can be distributed, freely or otherwise. Existing diehard sources are not directly incorporated into dieharder in source form for that reason, to keep authorship and GPL licensing issues clear.

Note that the same is not true about data. Several of the diehard tests require that one use precomputed numbers as e.g. target mean, sigma for some test statistic. Obviously in these cases we use the same numbers as diehard so we get the same, or comparable, results. These numbers were all developed with support from Federal grants and have all been published in the literature, though, and should therefore be in the public domain as far as reuse in a program is concerned.

Note also that most of the diehard tests are modified in dieharder, usually in a way that should improve them. There are three improvements that were basically always made if possible.

  • The number of test sample p-values that contribute to the final Kolmogorov-Smirnov test for the uniformity of the distribution of p-values of the test statistic is a variable with default 100, which is much larger than most diehard default values. This change alone causes many generators that are asserted to "pass diehard" to in fact fail -- any given test run generates a p-value that is acceptable, but the distribution of p-values is not uniform.
  • The number of actual samples within a test that contribute to the single-run test statistic was made a variable when possible. This was generally possible when the target was an easily computable function of the number of samples, but a number of the tests have pre-computed targets for specific numbers of samples and that number cannot be varied because no general function is known relating the target value to the number of samples.
  • Many of diehard's tests investigated overlapping bit sequences as being "independent identically distributed" (iid) samples. This was generally done because it used file-based input of random numbers and the files that could reasonably be generated and tested in the mid-90's contained on the order of a million random deviates. The restriction of testing to small, overlapping samples is neither necessary nor desirable in modern testing -- numerical simulation can easily consume ten to the eighteenth or more uniform deviates and the integration of the test with the built-in generators in the GSL permits "most" generators to be tested with essentially no limits on the number that can be generated for testing purposes. Indeed, some tests are likely to reveal the short period or limited number of returns by some generators because they can consume far more numbers than are available within the period, which was not so easy with diehard. dieharder therefore permits overlapping sequences to be a non-default option selected by the user wherever possible.
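
The first point in the list above, that individually acceptable p-values can still be collectively non-uniform, can be shown with a short sketch. It computes the one-sample Kolmogorov-Smirnov distance from the uniform distribution on [0,1] and compares it with the usual large-sample 5% critical value of roughly 1.36/sqrt(n). This is an illustration in Python, not dieharder's C implementation, and both sets of p-values are made up for the example.

```python
import math

def ks_uniform(pvalues):
    """Kolmogorov-Smirnov distance between the sample and uniform(0,1)."""
    xs = sorted(pvalues)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # The empirical CDF steps from i/n to (i+1)/n at x; the uniform CDF is x.
        d = max(d, (i + 1) / n - x, x - i / n)
    return d

n = 100                          # same count as the default mentioned above
critical = 1.36 / math.sqrt(n)   # approximate 5% level for large n

# Evenly spread p-values: what a good generator should resemble.
spread = [(i + 0.5) / n for i in range(n)]
# Every value lies between 0.4 and 0.6: none would be rejected on its own.
clustered = [0.4 + 0.2 * (i + 0.5) / n for i in range(n)]

for name, sample in (("spread", spread), ("clustered", clustered)):
    d = ks_uniform(sample)
    print(f"{name:9s} D = {d:.3f}  reject uniformity: {d > critical}")
```

With only a handful of p-values the clustered set would be hard to tell from the spread one; with 100 of them the distance is far past the critical value.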

In a few cases other variations are possible for specific tests. This should be noted in the built-in test documentation for that test where appropriate.

Aside from these major differences, note that the algorithms were independently written more or less from the test descriptions alone (sometimes illuminated by a look at the code implementations, but only to clear up just what was meant by the description). They may well do things in a different (but equally valid) order or using different (but ultimately equivalent) algorithms altogether and hence produce slightly different (but equally valid) results even when run on the same data with the same basic parameters. Then, there may be bugs in the code, which might have the same general effect. Finally, it is always possible that diehard implementations have bugs and can be in error. Your Mileage May Vary. Be Warned.


About DieHarder

The primary point of DieHarder (like Diehard before it) is to make it easy to time and test (pseudo)random number generators, both software and hardware, for a variety of purposes in research and cryptography. The tool is built entirely on top of the GSL's random number generator interface and uses a variety of other GSL tools (e.g. sort, erfc, incomplete gamma, distribution generators) in its operation. Five examples are provided of wrapping a random number generator (including both file I/O and the entropy-based /dev/random and /dev/urandom available on many linux systems) and inserting it so that it can be called via the GSL interface. It is strongly suggested that any software random number generator to be tested be provided with such a GSL-compatible interface.

A file interface (still consistent with the GSL to the extent possible) has been added that allows random numbers in either binary unsigned integer or a variety of ascii encoded formats to be read in from a file. This permits dieharder to be used with "any" generator (directly wrappable or not) that can generate a table of random numbers, but this will place severe limits on some of the tests, which can require very large numbers of random numbers. For this reason software generators should be implemented directly if at all possible and not via the file interface.
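
A generator that cannot be wrapped directly can still dump its output to a file. The sketch below writes 32-bit unsigned integers in raw binary form, one of the two input styles mentioned above. The file name, the native byte order, and the use of Python's random module as the "generator under test" are all assumptions made for illustration; consult the dieharder man page for the exact format and options your version expects.

```python
import os
import random
import struct
import tempfile

def write_raw_uint32(path, count, seed=1):
    """Write `count` 32-bit unsigned integers to `path` in native byte order."""
    rng = random.Random(seed)          # stand-in for the generator under test
    with open(path, "wb") as out:
        for _ in range(count):
            out.write(struct.pack("=I", rng.getrandbits(32)))

# Hypothetical file name in the system temporary directory.
path = os.path.join(tempfile.gettempdir(), "rands.bin")
write_raw_uint32(path, 1_000_000)
print(os.path.getsize(path), "bytes written")
```

A million deviates occupy only 4 MB on disk, yet some tests want far more numbers than that, which is why wrapping a software generator directly is preferred over the file interface.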

In this respect, DieHarder differs significantly from Diehard, which used file based sources of random numbers exclusively and would "work" with only a few million random numbers in such a file. Modern random number generators in a typical simulation application can easily need to generate 10^18 or more random numbers, generated from hundreds, thousands, millions of different seeds, over months to years of accumulated run time and are therefore sensitive to weaknesses that might not be revealed by such short sequences even with excellent and sensitive tests.

This was, in part, the motivation for the development of the Statistical Test Suite by NIST, which focuses more on cryptographic strength (although the general testing methodology is much the same).

The development of DieHarder was motivated by the following, in rough order of importance:

  • To provide a readily available, rpm-installable toolset so that "consumers" of random numbers (who typically use large numbers of random numbers in e.g. simulation or other research) can test the generator(s) they are using to verify their quality or lack thereof.
  • To provide a very simple user interface for that toolset for random number consumers. A GUI is on the list of things to do, although it adds little to the practical utility of the tool.
  • To provide lots and lots of knobs and dials and low level control for statistical researchers that want to study particular generators with particular tests in more detail.
  • To have the entire test code and documentation be fully Gnu Public Licensed and hence openly available for adaptation, testing, comment, and modification so that the testing suite itself becomes reliable and can be easily extended.
  • To provide a fairly simple API for adding new tests with a common set of low-level testing tools and a common test structure that leads (one hopes) to an unambiguous decision to accept or reject any given random number generator on the basis of any given test for a suitable choice of controllable test parameters.
  • To allow all researchers to be able to directly test, in particular, the random number generators interfaced with the GSL. This is a deliberate design decision justified by the extremely large and growing number of random number generators prebuilt into the GSL and the ease of adding new ones (either contributing them to the project or for the sole purpose of local testing).
  • To allow researchers that use e.g. distributions directly generated by GSL random distribution generation routines (which can in principle fail two ways, due to the failure of the underlying random number generator or due to a failure of the generating algorithm) to be able to directly validate their particular generator/distribution combination, where possible.

Note well that the primary objections I have towards diehard and STS are not that they are or are not adequate, accurate and complete; it is that the code itself is not properly packaged for reuse, testing, and extension. Diehard is remarkably poorly documented (with one small paragraph of text describing each test, even very complex ones, and with no accompanying description of how certain important data used in the program were actually computed). STS, in contrast, is really nothing but its description in documentation with no readily available open source code for implementation. DieHarder will hopefully rectify both situations and be both well documented and available in clearly publicly licensed code to make it extremely easy for anybody to test random numbers on any GSL-supported platform.

Although this tool is being developed on Linux/GCC-based platforms, it should port with no particular difficulty to other Unices, especially ones that support RPMs. No particular effort is being expended at this time to make it run on Windows based compute platforms (due to a lack of availability of such platforms and compilers to rgb) but there is no reason to think that such a port would be terribly difficult PROVIDED that the Gnu Scientific Library is installable under Windows.

Essential Usage Synopsis

If you compile the test or install the provided binary rpm's and run it as:

dieharder -a

it should run -a(ll) tests on the default GSL generator.

Choose an alternative generator with -g number, where

dieharder -g -1

will list all the generator numbers known to the current snapshot of DieHarder (mostly from the GSL).

dieharder -l

should list all the tests implemented in the current snapshot of DieHarder. Finally, the venerable and time tested:

dieharder -h

provides a Usage synopsis (which can be quite long) and

man dieharder

is the (installed) man page, which may or may not be completely up to date as the suite is under active development. For developers, additional documentation is available in the toplevel directory or doc subdirectory of the source tree. Eventually, a complete DieHarder manual in printable PDF form will be available both on this website and in /usr/share/doc/dieharder-*/.

List of Random Number Generators and Tests Available

List of GSL and user-defined random number generators that can be tested by DieHarder:

rgb@lilith|B:1344>dieharder
              Listing available built-in gsl-linked generators:           |
 Id Test Name           | Id Test Name           | Id Test Name           |
==========================================================================|
  0 borosh13            |  1 cmrg                |  2 coveyou             |
  3 fishman18           |  4 fishman20           |  5 fishman2x           |
  6 gfsr4               |  7 knuthran            |  8 knuthran2           |
  9 lecuyer21           | 10 minstd              | 11 mrg                 |
 12 mt19937             | 13 mt19937_1999        | 14 mt19937_1998        |
 15 r250                | 16 ran0                | 17 ran1                |
 18 ran2                | 19 ran3                | 20 rand                |
 21 rand48              | 22 random128-bsd       | 23 random128-glibc2    |
 24 random128-libc5     | 25 random256-bsd       | 26 random256-glibc2    |
 27 random256-libc5     | 28 random32-bsd        | 29 random32-glibc2     |
 30 random32-libc5      | 31 random64-bsd        | 32 random64-glibc2     |
 33 random64-libc5      | 34 random8-bsd         | 35 random8-glibc2      |
 36 random8-libc5       | 37 random-bsd          | 38 random-glibc2       |
 39 random-libc5        | 40 randu               | 41 ranf                |
 42 ranlux              | 43 ranlux389           | 44 ranlxd1             |
 45 ranlxd2             | 46 ranlxs0             | 47 ranlxs1             |
 48 ranlxs2             | 49 ranmar              | 50 slatec              |
 51 taus                | 52 taus2               | 53 taus113             |
 54 transputer          | 55 tt800               | 56 uni                 |
 57 uni32               | 58 vax                 | 59 waterman14          |
 60 zuf                 |
                   Listing available non-gsl generators:                  |
 Id Test Name           | Id Test Name           | Id Test Name           |
==========================================================================|
 61 /dev/random         | 62 /dev/urandom        | 63 empty               |
 64 file_input          | 65 file_input_raw      |

Note that the last five entries are examples of random number generators that have been wrapped up in GSL compatible clothes and linked to the GSL so that the standard GSL interface works for them. Any random number generator that one wishes to test can thus easily be added for testing using these as prototypes, and can likely be submitted to the GSL for inclusion if it passes the tests as well as or better than the generators that are already there. That makes this a very convenient tool for testing new RNGs.

Note also that the last two non-gsl generators are "universal" generators in the sense that they permit you to input your OWN random number stream from a file (but NOT from /dev/random or /dev/urandom, be warned). The file_input generator requires a file of "cooked" (ascii readable) random numbers, one per line, with a header that describes the format to dieharder. This interface is still somewhat experimental -- not all ascii formats have been tested. However, it has been tested and should work for 32 bit unsigned integers represented directly in ascii or as 32 bits of binary. An example of the required header for these formats is given here:

#==================================================================
# generator mt19937_1999  seed = 1274511046
#==================================================================
type: u
count: 100000
numbit: 32
3129711816
  85411969
2545911541
 903839182
2564046000
1157728411
 202655667
 969286899
1519043834
... (for 100,000 rands total).
#==================================================================
# handmade.  Comments are ignored, obviously.
#==================================================================
type: b
count: 10
numbit: 32
00000000000000000000000000000001
00000000000000000000000000000010
00000000000000000000000000000011
00000000000000000000000000000100
00000000000000000000000000000101
00000000000000000000000000000110
00000000000000000000000000000111
00000000000000000000000000001000
11111111111111111111111111111111
11111111111111110000000000000000

(where the latter is clearly not very random).
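A file in the first (type: u) layout can be produced by any program that prints the three header fields followed by one number per line. The following is a minimal sketch only: write_file_input_u is a hypothetical helper, not part of dieharder, and it assumes nothing about the format beyond what the example above shows (comment lines starting with #, then type, count and numbit, then the numbers).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Write n 32-bit unsigned integers to fp in the ascii "type: u" layout
 * shown in the example: a comment line, the three header fields, then
 * one number per line.  Hypothetical helper, not part of dieharder.
 * Returns 0 on success, -1 if any write fails.
 */
int write_file_input_u(FILE *fp, const uint32_t *rands, size_t n)
{
  size_t i;

  assert(fp != NULL && rands != NULL);

  if (fprintf(fp, "# written by write_file_input_u (illustrative)\n") < 0) return -1;
  if (fprintf(fp, "type: u\ncount: %zu\nnumbit: 32\n", n) < 0) return -1;
  for (i = 0; i < n; i++) {
    /* Right-justify in 10 columns, matching the example listing. */
    if (fprintf(fp, "%10" PRIu32 "\n", rands[i]) < 0) return -1;
  }
  return 0;
}
```

Called with an open FILE pointer and an array filled from whatever generator is under study, this yields a file that follows the same layout as the first example. Whether dieharder accepts every variation (for instance, no comment lines at all) is not something this sketch establishes.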

The last type, file_input_raw, accepts a file of raw bits as input, such as might be generated by

 dd if=/dev/urandom of=testrands.raw bs=4 count=1000000

(to generate 1,000,000 four-byte ints directly from the software-augmented kernel entropy generator). That is, running the tests from such a file should be approximately the same as testing /dev/urandom directly.

The main (important!) difference is that some of the tests require a lot of random numbers -- far more than were needed by diehard. Indeed, dieharder runs many of the diehard tests 100 independent times, generating a p-value for each, and plots a histogram of the p-values and generates a p-value for the (presumed uniform) distribution of p-values! This approach mimics the histogram presented in the STS suite but augments it with a hard Kolmogorov-Smirnov p-value that describes the distribution of p itself in many independent test runs!

This protects one somewhat from the "p happens" problem described by Marsaglia -- every now and then you will have a run with a very low p from a good generator, but overall a good generator will generate a uniform distribution of p-values. Dieharder lets you visually decide if the distribution is or isn't credibly uniform, while giving you an index that in most cases is a fairly clear "good" or "bad" indicator for a given random sequence or generator. Direct control over the number of samples used in the computation of the KS p-value (as well as other important test parameters) permits one to "crank up" the generator to clarify what appears to be a marginal level of success or failure based on a few separate runs.
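To make the "p-value of p-values" idea concrete, here is a minimal sketch of the first half of a Kolmogorov-Smirnov test against the uniform distribution: computing the D statistic (the largest gap between the empirical distribution of the p-values and the straight line expected for uniform p). The function name ks_uniform_d is an assumption made for illustration. This is the textbook one-sample KS statistic, not dieharder's own code, which may differ (the To Do list further down mentions a Kuiper variant).

```c
#include <assert.h>
#include <stdlib.h>

/* qsort comparison function for doubles. */
static int cmp_double(const void *a, const void *b)
{
  double x = *(const double *) a, y = *(const double *) b;
  return (x > y) - (x < y);
}

/*
 * Textbook one-sample KS statistic D for n p-values in [0,1] tested
 * against the uniform distribution.  Sorts p in place.  Illustrative
 * only; not taken from the dieharder sources.
 */
double ks_uniform_d(double *p, size_t n)
{
  size_t i;
  double d = 0.0;

  assert(p != NULL && n > 0);
  qsort(p, n, sizeof(double), cmp_double);
  for (i = 0; i < n; i++) {
    /* Empirical CDF just after p[i], minus the uniform CDF (which is p[i]). */
    double above = (double) (i + 1) / (double) n - p[i];
    /* Uniform CDF minus the empirical CDF just before p[i]. */
    double below = p[i] - (double) i / (double) n;
    if (above > d) d = above;
    if (below > d) d = below;
  }
  return d;
}
```

Evenly spread p-values give a small D, while p-values clumped near 0 or 1 push D toward 1. Turning D into a final p-value requires the Kolmogorov distribution for the given n, which is omitted here.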

File input rands are delivered to the tests on demand, but if a test needs more than are available dieharder simply rewinds the file and cycles through it again, and again, and again as needed. Obviously this significantly reduces the sample space and can lead to completely incorrect results for the p-value histograms unless there are enough rands to run EACH test without repetition (it is harmless to reuse the sequence for different tests). Let the user beware!

List of the CURRENT fully implemented tests (as of the 07/12/06 snapshot):

rgb@lilith|B:1346>dieharder -l

                     DieHarder Test Suite
========================================================================
The following tests are available and will be run when diehard -a is
invoked.  Special options or suggested parameters are indicated if
they are needed to get a satisfactory result (such as completion in a
reasonable amount of time).

            Diehard Tests
   -d 1  Diehard Birthdays test
   -d 2  Diehard Overlapping Permutations test
   -d 3  Diehard 32x32 Binary Rank test
   -d 4  Diehard 6x8 Binary Rank test
   -d 5  Diehard Bitstream test
   -d 6  Diehard OPSO test
   -d 7  Diehard OQSO test
   -d 8  Diehard DNA test
   -d 9  Diehard Count the 1s (stream) test
   -d 10 Diehard Count the 1s (byte) test
   -d 11 Diehard Parking Lot test
   -d 12 Diehard Minimum Distance (2D Spheres) test
   -d 13 Diehard 3D Spheres (minimum distance) test
   -d 14 Diehard Squeeze test
   -d 15 Diehard Sums test
   -d 16 Diehard Runs test
   -d 17 Diehard Craps test
   -d 18 Marsaglia and Tsang GCD test

             RGB Tests
   -r 1 Bit Persist test
   -r 2 Bit Ntuple Distribution test suite (-n ntuple for 1-8)
   -r 3 Timing test (times rng)

      Statistical Test Suite (STS)
   -s 1 STS Monobit test
   -s 2 STS Runs test

            User Tests
   -u 1 User Template (Lagged Sum Test)

Note that the design goal of completely encapsulating diehard is COMPLETED with all tests apparently functional as of 7/12/06. dieharder is now in a "beta" debugging/testing phase until the new code shakes out, but it produces what are for the most part very reasonable and consistent values for all the tests on known "good" or "bad" random number generators encapsulated in the GSL.

Full descriptions of the tests are available (as you can see) from within the tool and source documentation. All tests are completely and independently rewritten from their description alone, and may be functionally modified or extended relative to the original source code published in the originating suite. The author (rgb) bears complete responsibility for these changes, subject to the standard GPL code disclaimer (in essence, yes it's my fault if they don't work but using the tool is at your own risk and you can fix it if it bothers you and/or I don't fix it first).

Development Notes

All tests are encapsulated to be as standard as possible in the way they compute p-values from single statistics or from vectors of statistics, and in the way they implement the underlying KS and chisq tests. Diehard is now complete in dieharder, and attention will turn towards implementing more selected tests from the STS. I also have my eye on the as-yet unimplemented tests from Knuth's The Art of Computer Programming, lagged correlation, and more bitwise tests that have occurred to me as I ported diehard (which does some things somewhat backwards or indirectly, IMO).

Thoughts for the Future/Wish List/To Do

  • Tests of GSL random distribution (as opposed to number) generators, as indirect tests of the generators that feed them.
  • Anderson-Darling KS test. Kuiper works, but AD is more common. It therefore should be a user choice, or should even do both. Why not? The computation for either is trivial compared to the effort required to run the tests in the first place.
  • New tests, compressions of existing ones that are "different" but really the same. Hyperplane tests. Spectral tests. Especially the bit distribution test with user-definable lag or lag pattern (to look for subtle, long period correlations in the bit patterns produced).
  • Collaborators. Co-developers welcome, as are contributions or suggestions from users. Note well that users have already provided critical help debugging the early code! Part of the point of a GPL project is that you are NOT at the mercy of a black box piece of code. If you are using dieharder and are moderately expert at statistics and random numbers and observe something odd, please help out!

Conclusions

I hope that even during its development, you find dieharder useful. Remember, it is fully open source, so you can freely modify and redistribute the code according to the rules laid out in the Gnu Public License (version 2b), which might cost you as much as a beer one day. In particular, you can easily add random number generators using the provided examples as templates, or you can add tests of your own by copying the general layout of the existing tests (working toward a p-value per run, cumulating (say) 100 runs, and turning the resulting KS test into an overall p-value). Best of all, you can look inside the code and see how the tests work, which may inspire you to create a new test -- or a new generator that can pass a test.

To conclude, if you have any interest in participating in the development of dieharder, be sure to let me know, especially if you have decent C coding skills (including familiarity with Subversion and the GSL) and a basic knowledge of statistics. I even have documents to help with the latter, if you have the programming skills and want to LEARN statistics. Bug reports or suggestions are also welcome.

Submit bug reports, etc. to

rgb at phy dot duke dot edu



Benchmaster Program

Benchmaster: A System Testing Suite

by Robert G. Brown (rgb)

benchmaster Version 1.1.3



Description

Benchmaster is a fairly sophisticated program designed to time and exercise very specific systems functions. It uses the fastest onboard clock that it can find (generally the CPU's cycle counter on x86-derived architectures) to time test "objects", and determines the precision of that timer including the overhead of the timer call.

A test object contains (in addition to test constructor/destructor functions) a test routine with two branches -- one "empty" and one "full" -- that are structured to be, as nearly as possible, identical except for the additional code to be timed in the full loop.

The test harness then determines iteration counts -- the number of times it has to run the empty or full branches to accumulate a time much greater than the timer resolution. It then proceeds to generate a requested number of samples of the timings of the empty and full branches. Finally, it subtracts the average empty time from the average full time to determine the result, and evaluates the means and standard deviations of both to produce the cumulative expected error.
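The iteration-count step can be sketched as a simple doubling loop: run the fragment 1, 2, 4, ... times until a single timed batch takes much longer than the timer can resolve. The sketch below is an illustration under stated assumptions, not benchmaster's code. It uses the standard C clock() function as a stand-in for the CPU cycle counter, and calibrate_iterations, busy_fragment and MAX_ITERATIONS are hypothetical names.

```c
#include <assert.h>
#include <time.h>

#define MAX_ITERATIONS (1UL << 30)   /* hypothetical safety cap */

static volatile double sink = 0.0;   /* volatile so the work is not optimized away */

/* Trivial stand-in for the "full" or "empty" branch of a test. */
void busy_fragment(void)
{
  sink += 1.0;
}

/*
 * Double the iteration count until one timed batch of calls to
 * fragment() takes at least min_seconds of CPU time (or the cap is
 * reached).  Returns the iteration count found, always a power of two.
 */
unsigned long calibrate_iterations(void (*fragment)(void), double min_seconds)
{
  unsigned long iters = 1, i;

  assert(fragment != NULL && min_seconds > 0.0);
  for (;;) {
    clock_t start = clock();
    for (i = 0; i < iters; i++) fragment();
    double elapsed = (double) (clock() - start) / CLOCKS_PER_SEC;
    if (elapsed >= min_seconds || iters >= MAX_ITERATIONS) return iters;
    iters *= 2;
  }
}
```

For example, calibrate_iterations(busy_fragment, 0.01) returns the first power of two for which a batch takes at least 10 ms. A harness would then time that many iterations of each branch once per sample.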

The results are then printed out in a standard XML based format with an optional header describing the test and certain runtime details. Numbers that are returned include the length of the vector (see discussion of vector tests below), the stride (ditto), and the mean time (with error bars), in nanoseconds, required to execute just the tested code fragment in the particular context of the test routine. In addition, a "megarate" is returned that is generally the number of millions of times the test fragment is executed per second. There are a few exceptions to this; see below.

The use of XML in the test output is one of the project's major design goals, as it in principle makes it possible to build e.g. DTDs for the benchmark description language to generate standard reports in a variety of media. Of course this is not yet done -- it is one of the next major design goals. Volunteers/contributions of code welcome... as are comments on the XML itself (which is still pretty malleable until at least one program or transformation process is written that uses it).


Benchmaster Download Area

The version numbers have the following meaning. Note that these aren't necessarily what you might expect, so be sure to at least glance at this.

  • First number (major). Bumped only when major goals in the design roadmap are reached (for example, finishing all the test ideas outlined below:-). Version 1.x.x, for example, means that the basic testing structure is well defined and stable.
  • Second number (first minor). Bumped for bugfixes, feature additions, major or minor improvements (but reset to zero when a major design goal is reached).
  • Third number (second minor). This number indicates the number of tests currently supported.

All benchmaster source files (.tgz and .src.rpm) can be downloaded from this directory. In addition, i386 binary rpm's (currently built on top of Fedora Core 5) are present.

This project is currently semi-stable. The tests seem to work (at least for me) fairly consistently, there just aren't as many as I'd like there to eventually be. Alas, I have too many projects going at once, and have recently been spending a lot of time with the Dieharder project available elsewhere on this website. If you are interested in seeing the benchmaster project advanced more aggressively, contact me (especially with an offer to help out:-).

Below are descriptions of the different kinds of tests already in benchmaster.


Vector Tests

As noted above, in addition to being sampled in one loop (with a controllable number of samples) and iterated inside that loop per sample (with an automatically set but user controllable number of iterations) some of the tested code fragments are loops that operate on vectors of numbers. For example, the stream benchmark by John D. McCalpin consists of four simple numerical operations -- a vector copy, scaling a vector, adding two vectors, and adding a rescaled vector to another (the "triad"). Each of these stream operations is one of the tests already programmed into benchmaster, so benchmaster is capable of "running stream" on a system.

In stream, the length of the tested vector is generally fixed -- programmed directly into the test as a compile-time parameter. In benchmaster's stream (and other related vector arithmetic tests) the vector length is a variable and can be selected by the user at runtime. This permits one to directly observe how cache improves performance for various vector lengths and strides.

Most modern CPUs have elaborate optimization mechanisms designed to improve numerical performance on vectors in particular. They prefetch data from memory into the cache in order to ensure that it is waiting there when needed. They have special registers and pipelines that speed repeated operations iterated over a vector. However, sometimes one runs code that (although iterative) does not have such a fortunate memory access pattern. Benchmaster therefore contains a facility for permitting at least some vector tests to be performed in "shuffled order". Basically, a matching vector of vector indices is shuffled and used as the indices for the test. The overhead associated with the shuffling process per se is duplicated in the "empty" code fragment so that only the actual time required to access the memory in shuffled order contributes.
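The shuffled-order idea can be sketched as follows: build a vector of indices 0..n-1, shuffle it, and walk the data vector through it. This is a minimal illustration, assuming a Fisher-Yates shuffle driven by the C library's rand(); the names shuffle_indices and shuffled_copy are hypothetical and benchmaster's own implementation may differ.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Fill idx[0..n-1] with 0..n-1 and shuffle it in place (Fisher-Yates).
 * rand() is used for brevity; its modulo bias and limited range
 * (RAND_MAX may be as small as 32767) make it unsuitable for very
 * long vectors.
 */
void shuffle_indices(size_t *idx, size_t n)
{
  size_t i;

  assert(idx != NULL);
  for (i = 0; i < n; i++) idx[i] = i;
  for (i = n; i > 1; i--) {
    size_t j = (size_t) rand() % i;     /* 0 <= j < i */
    size_t tmp = idx[i - 1];
    idx[i - 1] = idx[j];
    idx[j] = tmp;
  }
}

/* "Full" branch of a shuffled copy: visits every element once, in shuffled order. */
void shuffled_copy(double *d, const double *a, const size_t *idx, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++) {
    d[idx[i]] = a[idx[i]];
  }
}
```

A matching "empty" branch would read idx[i] in the same loop without touching a or d, so that the cost of the index lookup cancels when the two timings are subtracted.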


Test Design

The general function and design of the timing harness is explained above and documented in the code itself. This section indicates how to add a test to the suite, as one of its major design goals is to make it easy for you to add your own test fragments and operations.

Before showing you an example of a test (the easiest way to document how to create a new one) let me remark on a few of the many problems that plague "benchmarking". For one, the speed with which code is executed depends on many, many things, some of which are out of our control. For example, there is an obvious dependence on system state -- if your task is swapped off of the CPU on a multitasking system in mid-instruction, your time for that sample will be quite high. There is an uncontrollable dependence on the compiler. I'm assuming that the full and empty branches are both likely to be close together in memory and both likely to be kept resident in cache in order for the timings to be comparable and subtractable. This is likely enough to be true, but there are no guarantees from the compiler.

It is also very difficult to test/time single instructions of just about any kind. A multiply on a system can take a fraction of a clock cycle. The finest-grained timekeeper on the system is the clock cycle counter (typically order of a nanosecond), and the code required to read it takes many nanoseconds to execute. Timing a multiply is thus akin to timing the beating of a hummingbird's wings with an hourglass, a process made worse by the compiler's tendency to optimize away instructions that it can tell are doing nothing and can be compressed.

This is just a warning. As it says in the program's Usage statement and man page, the "Mega-rates" returned by this tool are BOGUS and may not be even approximately correct. When interpreting results of existing tests or adding your own, be cautious and test the tester as much as the code fragment itself until the results make sense in your own mind.

One final warning concerns hidden optimizations, overflows, etc. Many CPUs are smart enough to use superfast internal arithmetic in order to perform certain operations. For example, multiplying by 0.0 or 1.0 or 2.0 on many CPUs will take much, much less time than multiplying by (say) 3.141592653589. For that reason I typically use pi as the number to multiply by whenever I am testing multiplication. Of course if one iterates multiplication by pi it doesn't take long to overflow a floating point variable, so one needs to use caution in designing test loops to avoid this without using 1.0 as a multiplier or dividing (which takes much longer than multiplication) when one wants to test only multiplication.
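How quickly repeated multiplication overflows is easy to check directly. The sketch below counts how many successive multiplications of a double starting at 1.0 it takes before the product exceeds DBL_MAX (that is, becomes infinity). The function name multiplies_until_overflow is made up for illustration, and IEEE 754 double precision is assumed.

```c
#include <assert.h>
#include <float.h>

/*
 * Starting from x = 1.0, multiply by m repeatedly and return the number
 * of multiplications performed when x first exceeds DBL_MAX (i.e. has
 * overflowed to +infinity).  Returns -1 if that does not happen within
 * max_steps.  volatile forces each product to be rounded to a double.
 */
int multiplies_until_overflow(double m, int max_steps)
{
  volatile double x = 1.0;
  int n;

  assert(m > 1.0);
  for (n = 1; n <= max_steps; n++) {
    x *= m;
    if (x > DBL_MAX) return n;
  }
  return -1;
}
```

With m = 3.141592653589 the overflow arrives after a little over 600 multiplications, so a test loop that iterates thousands of times on the same variable will spend most of its time multiplying infinity. Resetting the variable to its starting value well before that point is one possible guard, offered here as a suggestion rather than as what benchmaster does.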

With all that said, a test consists of two pieces. The first is an include file that minimally contains the function prototypes for the test itself, for example (for the stream copy test):

/*
* $Id: benchmaster.abs,v 1.7 2004/12/17 15:31:56 rgb Exp $
*
* See copyright in copyright.h and the accompanying file COPYING
*
*/

/*
 * Goes with stream_copy_test.c.
 */
void stream_copy_init(Test *newtest);
void stream_copy_alloc();
void stream_copy_free();
int stream_copy_test(int full_flag);
void stream_copy_results();
void stream_copy_about();

and the stream copy test source for these components:

/*
 * $Id: benchmaster.abs,v 1.7 2004/12/17 15:31:56 rgb Exp $
 * See copyright in copyright.h and the accompanying file COPYING
 */

#include "benchmaster.h"

 /*
  *==================================================================
  * This is the "copy" test from the stream suite.  It is not
  * directly comparable to stream results for a variety of reasons.
  * For one, it uses malloc to allocate all vectors so vector
  * length may be different from what is compiled into any given copy
  * of stream.  Also, it uses the average time to determine the rate
  * and not the best time.  It will therefore generally return
  * results that are very SLIGHTLY LOWER/SLOWER than regular stream
  * (but which may be more realistic for general purpose code).
  *
  * It also uses a different timing harness, one that is both
  * more accurate (uses a superior timer) and which repeats the
  * computation many times, typically order 100, to obtain both a 
  * mean time and its standard deviation as test results.
  *==================================================================
  */


void stream_copy_init(Test *mytest){

 int i;

 mytest->alloc = stream_copy_alloc;
 mytest->free = stream_copy_free;
 mytest->test = stream_copy_test;
 mytest->results = stream_copy_results;
 snprintf(mytest->name,K,"stream copy");
 snprintf(mytest->about,K,"d[i] = a[i] (%d byte double vector)",(int) sizeof(double));

 if(verbose == VERBOSE || verbose == V_INIT){
   printf("# Init for test %s\n",mytest->name);
 }


}

void stream_copy_alloc()
{

 int i;

 /*
  * Allocate vector(s) to be tested with and initialize it and all
  * associated test-specific variables.
  */
 d = (double *) malloc((size_t) (size*sizeof(double)));
 a = (double *) malloc((size_t) (size*sizeof(double)));
 /*
  * Initialize the vector. xtest is set from the command line, default PI.
  */
 for(i=0;i < size;i+=stride){
   a[i] = xtest;
 }

}

void stream_copy_free()
{

 int i;

 /*
  * Free all the memory we just allocated, to be neat and clean and
  * all that.
  */
 free(a);
 free(d);

}

int stream_copy_test(int full_flag)
{

 int i;
 
 if(full_flag){
   for(i=0;i < size;i+=stride){
     d[i] = a[i];
   }
   return(full_flag);
 } else {
   return(full_flag);
 }
}

void stream_copy_results(Test *mytest)
{

 double nanotime_norm;

 /*
  * This is the number of copy operations in the core loop.  We adjust the
  * test normalization so it is the SAME as that of stream, which computes
  * the rate as "megabytes/seconds": 1.0e-6*2*sizeof(double)*nsize/time
  * (in seconds).  We measure nanoseconds, so ours is just a teeny bit
  * different.
  */
 nanotime_norm = (double)size/stride;

 mytest->avg_time = fabs(mytest->avg_time_full - mytest->avg_time_empty)/nanotime_norm;
 mytest->sigma = (mytest->sigma_time_full + mytest->sigma_time_empty)/nanotime_norm;
 mytest->avg_megarate = 1000.0*2*sizeof(double)/mytest->avg_time;

 show_results(mytest);

}

Note that xtest is a user-settable value for the real number used to initialize the vector. It defaults to pi, but can be overridden on the command line so you can see for yourself the effect of using 1.0 or 0.0 in certain contexts of certain tests instead of a number that cannot be optimized at the CPU microcode level. Note that the avg_megarate is a bit different from that of other tests, as it returns a "bandwidth" in Mbytes/sec (to be comparable with stream) instead of a "bogomegarate", which is just the number of millions of iterations of the test fragment completed per second.

This code fragment in my testing harness produces results that are generally a few percent slower than those of off-the-shelf stream. This is understandable -- I use the average time instead of the minimum time to generate the number, so my stream number is average/expected performance over a reasonably large number (default 100) of samples while stream reports the observed peak performance on a fairly small sample size (default 10).

Once you have written a "microbenchmark" test fragment of your own, you have merely to insert it into the overall structure of the benchmaster program. The best way to do that is to follow exactly the way the existing tests are inserted. Put your test's include file and NAME into the enum and include list in tests.h. Initialize the test in startup.c. At that point the code should "just work". Try to use the provided global variables and pointers for things like xtest and vectors, just to keep the primary code from getting too gooped up. You should be able to put globals into the include file for your test if you need to add any new ones for specific tests, and should be sure to add code to parsecl.c and startup.c as required to manage them.

Note Well! Benchmaster is not written for the purpose of encouraging "benchmarketing" by vendors, so take any publication of results from the benchmark with a grain of salt.

Consider the source! I mean that quite literally. Benchmaster is a fully GPL tool that is available in rpm form. That means that you or a vendor can hack up the source, insert optimized machine code, etc. Note that you should insist on results of unmodified benchmaster source, compiled with the default flags as at least one component of what you consider when you are trying to measure or understand systems performance. Only then is it reasonable to have fun tuning things up to see how fast you can make them.


Conclusion

As noted above, benchmaster is designed to be easy to modify to insert your own tests as "test objects". You should feel free to contribute particularly good or revealing tests back to me for inclusion in the primary distribution of the tool. If and when I get so many tests into the tool that the current method of selecting tests by number no longer works, I'll probably have to rewrite the toplevel interface to make it easier to keep track of everything, but that is likely some time off (and should not affect results).

You should also feel free to make suggestions or report bugs or contribute code to the timing harness itself -- I am constantly trying things to see if I can obtain better control over system state (and hence get more reproducible results) and to see if I can finagle accurate timings of those hummingbird wing beats using my hourglass. Perhaps one of you reading this knows just how to go about doing it in userspace code...

Future Plans for benchmaster include:

  • More microbenchmark tests, including ones for e.g. disk access, network latency/bw (will need an accompanying daemon to run on the target)
  • More shufflable tests (shuffled stream?)
  • Selected application or library level tests.
  • XML test interface on the input side as well as the output? It would be nice to be able to read in a "test descriptor file" that made benchmaster work through a suite of tests according to a set of instructions and produce a table/report as output. This would also facilitate the development of a...
  • ...GUI. Both web and Gtk interfaces would greatly simplify the use of the tool and would permit the immediate-gratification presentation of e.g. performance curves as vector sizes are swept across the size of L1 and L2 caches.

Participation in benchmaster is openly solicited and encouraged. All code is written in C for efficiency and its tight coupling to the operating system, for all that it is still "object oriented" in general design to keep it scalable and extensible. Contact me at rgbATphy.duke.edu if you are interested in participating.

I hope that you find benchmaster useful.



The C Book

NOTE WELL! This book is not written by or copyrighted by Robert G. Brown! It is a mirror of an online book on C programming that -- curiously enough -- has a license almost identical to my Gnu Public License, v2b (b for beverage). However, you have to buy the actual authors of the book a beer, not me.

From the book's front page:


This is the online version of The C Book, second edition by Mike Banahan, Declan Brady and Mark Doran, originally published by Addison Wesley in 1991. This version is made freely available.

While this book is no longer in print, its content is still very relevant today. The C language is still popular, particularly for open source software and embedded programming. We hope this book will be useful, or at least interesting, to people who use C.

If you have any comments about this book, or if you find any bugs in its presentation, please send a message to consulting@gbdirect.co.uk.



Brain: a User's Guide

The Yum HOWTO

This is a draft of a future yum HOWTO, offered up here for private comment, initially by Seth and Michael and later by (I imagine) the yum list. If you've found this in the meantime via google, feel free to use it but beware -- it may be openly wrong or misleading in places.

Note Well: This is a Request for Comments only; use at your own risk, and please return comments and corrections to rgb (or better, the yum mailing list) for incorporation into what I hope to be a dynamic document associated with a project, rather than a static prescription that may, in fact, be heavily flawed.



CVS Mini HOWTO

The CVS Mini HOWTO is a tutorial on how to set up and use CVS for personal use or simple shared projects. It goes through things step by step with examples. To use it effectively, it is strongly suggested that you try the things illustrated as you go along.

It was written primarily to teach colleagues and students with whom I collaborate the rudiments of CVS, as I use it as an essential component of the management of any project I'm involved in that generates documents and other data objects (such as teaching a course as part of a team, guiding students in research involving programming, writing papers).

The HOWTO appears to fill a community need -- a quick google reveals only one standard-format HOWTO for CVS, and that one is extremely terse and specific to NetBSD. This particular HOWTO is structured as a tutorial that leads you step by step through the basic commands required to set up CVS root repositories for various purposes and use CVS to manage project archives within those repositories.

It deliberately omits the setup and use of CVS with pserver (password-authenticated network access) as this is documented in a variety of places and because we do not use this locally (internal to our department LAN) due to security concerns -- vulnerabilities have been found and patched within the last year, and while they are patched, one always worries about whether there are more. In any event, this requires root privileges to set up and manage and therefore almost by definition is advanced usage (in my opinion) and hence is inappropriate for this document.

In other words, this is very much a Getting Started with CVS sort of document -- CVS is very powerful and hence very complex, and given half a chance it will, as they say, eat your meatloaf for you. However, in my own direct experience, well over 95% of all CVS usage in a simple LAN environment is encompassed within the commands demonstrated in this HOWTO. Many users will never need all that power; users who do will need to master the conventional and relatively simple aspects of CVS documented here before using the actual manual and more advanced documentation to learn how to (for example) manage multiple branches of code anyway.

At any rate, if you are interested in learning to use CVS (and you should be, if you work at all with dynamic information of any type whatsoever) you may find this document useful. Feel free to provide me with feedback, but remember that I did not write CVS and am not a CVS developer, so any actual bug reports or feature requests should probably go to the CVS developers via its home page:

http://cvshome.org

There are several additional sources of tutorial information, manuals, and other documentation on the primary website, but unfortunately none are formatted in the familiar HOWTO style and they tend (in my opinion) to be too simple (omitting enough information to be able to set up or participate in a project as a member of a group) or too complex (an exhaustive manual complete with visuals and lots of detail and instructions for Windows as well as Unices). Nevertheless, new CVS users will find it worthwhile to visit this site and quickly browse and bookmark these resources as they will provide additional support for this document when some feature or detail it omits is needed for a project.

Note Well! This mini-HOWTO was developed on a linux system and makes the following assumptions:

  • A Unix-like filesystem with Unix groups and file permissions. Commands such as chgrp, chmod, mkdir and so forth should be available and hopefully familiar to the reader. Although CVS exists for Windows, it is documented elsewhere and I don't use Windows.
  • A reasonably recent version of CVS, in particular one that supports the CVS_RSH environment variable. I'm using 1.11.17 as I write this. If your version is very different be aware that features and options tend to creep and commands illustrated may not work for you.
  • The presence of and familiarity with /usr/bin/ssh (ssh = s(ecure )sh(ell)), configured as you prefer for remote access (with or without a password, for example).
  • Reasonable familiarity with environment variables and how to set them within your shell. I give a few examples here, but make no effort to be thorough.
  • A text editor installed on the system that you are familiar with and that can be set as the default editor for logging cvs changes.

I'll try to indicate any others as they occur to me.
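
As a rough illustration of the environment assumed above, a Bourne-style shell startup file might contain something like the following. The user name, host name and repository path are placeholders of my own choosing rather than anything taken from the HOWTO, and the fallback to vi is only an assumption about what is installed on your system.

```shell
# Sketch only: the user, host and path below are hypothetical placeholders.

# Tell CVS to reach remote repositories over ssh.
CVS_RSH=ssh

# Repository location, using the :ext: access method that honors CVS_RSH.
CVSROOT=":ext:username@cvshost.example.org:/home/cvsroot"

# Editor used for log messages; fall back to vi if EDITOR is unset or empty.
CVSEDITOR="${EDITOR:-vi}"

export CVS_RSH CVSROOT CVSEDITOR
```

For a purely local repository, CVSROOT would simply be a directory path instead of the :ext: form shown here.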



Back to top
random_pvm Demo/Template

random_pvm is a C source demo/template for generating random numbers using a PVM master/slave program. It is derived from the C source project template also available from rgb's website. It was written for a Cluster World Magazine column, and although it is freely available and GPL'd, users are encouraged to subscribe to the magazine to get all sorts of other goodies that come with every issue.

random_pvm actually installs from the tarball ONLY. In most cases a program template I write will create a workable rpm, but it isn't really desirable to install this demo in a rootspace /usr/share/pvm3 directory, so although the make targets are there (and might even work, although I doubt it) I advise against messing with them.

To build it, create or change to your source directory (I use $HOME/Src but suit yourself), put random_pvm.tgz there and unpack it:

 tar xvfz random_pvm.tgz

(and it should verbosely unpack).

Change to the random_pvm directory. There is a random_pvm.1 man page there that gives instructions on how to build and install and use the program. (Basically stuff like make, make install, and then running the program.) Remember to start pvm and build a virtual machine (instructions NOT included herein) before trying to run the program, and make sure that the random_pvm_slave program is installed in the appropriate place on all the nodes.

If you have any fundamental trouble getting it working, let me know and I'll try to help you. My email address is rgb@phy.duke.edu.



Back to top
Jove: Jonathan's Own Version of Emacs

JOVE: Jonathan's Own Version of Emacs

by Jonathan Payne (NOT by Robert Brown)

jove Version 4.16.0.65


This is a portable/semimaintainable rpm packaging of jove. Jove stands for Jonathan's Own Version of Emacs, and in my opinion it has been the best text editor available for decades (as emacs, its progenitor, has become ever more crack-ridden until I can no longer stand to use it at all even as a stand-in for jove). Jove is, in particular, a really great editor for e.g. C source code, latex source code, and in general source codes that require an invocation of "make" to build internally. It has all the essential features of emacs without losing its attractive sparseness.

Since I use jove exclusively (having done so for getting on 18 years at this point) and since I also use rpm-based systems exclusively and rpm-centric distribution tools such as yum, I need jove to be neatly packaged. The first thing I ever do on a system is go in and install jove so I can work on it. It needs to be cleanly rpm-buildable and (I think) distributed as prebuilt source rpm if not binary rpm for some of the major distributions.

Jove is currently maintained (as far as I can tell) as a tarball-only product within Toronto's CS department. From their base, I've hacked the Makefile, the spec file, and the versions.h file (trivially) as follows:

  • Added sed automagic so that version numbers and build date are set in one place only (the Makefile) and updated to spec file and version.h.
  • Added gpg signatures to the rpm's.
  • Added a set of macros and targets to support the maintenance of this website, including targets for "make tgz", "make rpm", "make yum" and "make installweb". With these targets I can (re)build rpm's for three or four architectures (e.g. RH 9, FC 2, FC 3, i386, x86_64, Centos) and install them in yum repositories by "make yum;make installweb" on each architecture from one set of sources.
  • Cleaned up the specfile in some trivial ways that may not be the best ways to get clean builds on RH/FC-derived rpm systems.
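
The "set the version in one place" idea from the first item above can be sketched in shell roughly as follows. This is not the actual rule from the jove Makefile, which is not reproduced on this page; the function name stamp_version and the PROJECT_VERSION macro are hypothetical, and only the "Version:" tag is standard rpm spec syntax. The sketch also assumes the version string contains no "/" characters.

```shell
# stamp_version VERSION SPECFILE HEADER
# Rewrite the version in a spec file and a C header from a single value.
# Hypothetical helper for illustration; not taken from the jove sources.
stamp_version() {
    version="$1"
    spec="$2"
    header="$3"

    # Replace the whole "Version:" line in the spec file.
    sed "s/^Version:.*/Version: $version/" "$spec" > "$spec.new" \
        && mv "$spec.new" "$spec" || return 1

    # Replace the (hypothetical) PROJECT_VERSION macro in the header.
    sed "s/^#define PROJECT_VERSION .*/#define PROJECT_VERSION \"$version\"/" \
        "$header" > "$header.new" \
        && mv "$header.new" "$header"
}
```

A Makefile target could then call something like this with its own version variable before building the tarball or rpm, so that the number never has to be edited in more than one file.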

These changes SHOULD NOT affect any other build targets or build processes (with the possible exception of the specfile changes, where I don't have enough distribution alternatives to test across all of them). Either way, if you want a repository from which to mirror relatively current signed jove rpm's, yum update jove rpms, or grab a tarball of jove that has the above make targets for your own local builds, feel free to use this site.

I'm also willing to provide some debugging support if the rpm's on this site don't work for you or rebuild for you. I have to emphasize the "some" because I have a lot of projects and as long as jove works for me, I'm happy and may be busy as well as happy. However, if you encounter a bug or just need some help feel free to contact me at rgb at phy.duke.edu.



Back to top
Project Template

project abstract (in html) .



Back to top
PVM Project Template

project_pvm is a C source project template for PVM master/slave projects. It is derived from the C source project template also available from rgb's website and does the usual automagical project maintenance via simple (and not so simple) make targets: make tgz, make rpm, make installweb, make cvs for example. It is worth several hours of work, minimally, in getting a pvm project off the ground, and gets it off on the right foot, with a lot of things you might add "eventually" already there in template form, ready to fill in.

project_pvm actually installs from the tarball ONLY -- or if you prefer, the tarball IS the project template ready to use -- but this page and distribution set is more or less automagically created by the make installweb target, so it seems worthwhile to include the rpm's even if they only install a trivial (cluster parallelized) "Hello World!" program.

To use this template, create or change to your source directory (I use $HOME/Src but suit yourself), put the project_pvm.tgz there and unpack it:

 tar xvfz project_pvm.tgz

(and it should verbosely unpack).

Change to the project directory. There is a README that gives instructions on how to use the template. Most of the mucky part of the process is encapsulated in a script called "newproject" that you can read to see how it works. To use this script for a new "rgb standard" PVM project, start pvm and build a virtual machine (instructions NOT included herein) and enter:

  • newproject projectname
  • cd ../projectname
  • cvs import -m "initial revision" projectname projectname start
  • cd ..
  • /bin/rm -rf projectname
  • cvs checkout projectname
  • cd projectname
  • make
  • ./projectname
    (and Hello World example should run).

This presumes that you've got CVS set up (a functioning CVSROOT). If you want to use the make rpm targets, you must additionally:

  • Create a barebones rpm build tree. I put all my sources in ~/Src (as in ~/Src/project) and have my own rpm build tree in ~/Src/rpm_tree. This is basically:
       ./rpm_tree
                 \
                 |-BUILD
                 |-RPMS
                 |-SOURCES
                 |-SPECS
                 |-SRPMS
    
    
    (all empty and chmod 755). Note that this is in USERSPACE. You don't need to be root to build root-installed rpm's.
  • Create the file ~/.rpmmacros containing: %_topdir /home/rgb/Src/rpm_tree (where you should put in the path to YOUR Src tree, not mine:-).
  • You'll need to edit the RPM_TOPDIR macro in the Makefile to point to this directory. If I were REALLY good I'd make this a part of the project, but it really is reusable and should be outside the project source tree.
  • Edit the project.spec file and make the obvious changes. This is not intended to be an RPM build tutorial, but the enclosed spec file should be pretty obvious. If you complete the aforementioned two steps before messing too much with the project template, you SHOULD be able to execute "make rpm" and have it crank out both installation rpm's and src rpms totally automagically. Then all you have to do is keep it working with frequent tests while adding new code and modules (which it is presumed you know how to do). Note that it is a GOOD IDEA to "make tgz" after all final edits (including such details as changing version numbers in the Makefile).
  • Complicated dependencies, fancy distributions, and so forth are all up to you. As I said, this isn't a tutorial in rpm building, just a useful small project template.
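
The first two steps in the list above can be sketched as a small shell function. The function name make_rpm_tree is hypothetical and not part of the template; the directory names and the %_topdir macro are the ones given above. As a precaution of my own, the sketch refuses to overwrite an existing macros file.

```shell
# make_rpm_tree TOPDIR MACROSFILE
# Create an empty user-space rpm build tree and point rpm at it.
# Hypothetical helper for illustration; not part of the project template.
make_rpm_tree() {
    topdir="$1"
    macros="$2"

    # The five standard subdirectories, all mode 755.
    for d in BUILD RPMS SOURCES SPECS SRPMS; do
        mkdir -p "$topdir/$d" && chmod 755 "$topdir/$d" || return 1
    done

    # Do not clobber an existing macros file; edit it by hand instead.
    if [ -e "$macros" ]; then
        echo "$macros already exists; add '%_topdir $topdir' yourself" >&2
        return 1
    fi
    printf '%%_topdir %s\n' "$topdir" > "$macros"
}
```

With the layout described above this would be invoked as something like make_rpm_tree "$HOME/Src/rpm_tree" "$HOME/.rpmmacros", with no root privileges required. The RPM_TOPDIR macro in the Makefile still has to be edited by hand to match.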

Optionally edit the man page template, the README, the abstract (this file), and the php file. Remember, the man page is your friend. Also remember to update/modify the "Usage" statement in parsecl.c as you add new command line options, if any.

If you grab this project template and have any fundamental trouble getting it working, let me know and I'll try to help you. My email address is rgb@phy.duke.edu.



Back to top
Latex Project Template

Latex Project Template

by Robert G. Brown (rgb)

This is a reusable template for latex projects. Download the tarball and unpack it with e.g.

  tar xvfz latex.proj-0.4.1.tgz
  cd latex.proj-0.4.1

Place it under e.g. Subversion control so that you can back out any changes you make to it.

Look over the Makefile to familiarize yourself with the various targets. Not all of them are guaranteed to work, and some may not work for you without some editing, at least. Do not modify the Makefile yet, though -- work at first on a copy created as below and then backport the changes to the original Makefile carefully when you are sure they work.

In this directory, run the "newproject" command:

 ./newproject projectname
 cd ../projectname

Place the new project under your choice of version control. Subversion is currently supported by a couple of targets in the Makefile that will require a bit of editing to make them work for you, but CVS is equally possible with a bit more editing.

Note that newproject SHOULD have changed all the basic filenames around so that they correspond to your project name. Only make changes in the template itself if you want to make them permanent features of a new latex project!

You should be able to type "make" in the new project directory at any time and have it just work to build dvi, pdf, and a4pdf (a pdf for A4 sized paper commonly used in Europe). Bring up the dvi file for preview with a command such as:

 xdvi projectname &

Then start up your favorite editor on the projectname.tex source file. If it is emacs or jove (or any editor that permits you to invoke make from inside the editor) you should be able to make simple changes to the latex source, invoke make via e.g. ^x ^e, and bring the preview screen to the foreground, where it will automatically refresh to the current/new dvi file! This makes the edit, make/debug, view/debug, edit cycle quite painless and generally faster than most latex IDE GUI tools.

Good luck! Feel free to contact me with e.g. bug reports or problems.



Back to top
HOWTO project template

This is a HOWTO project template, using the linuxdoc dtd. It isn't intended to be very complete at the display side, but in the actual compressed tarball there are complete templates for both dtd and linuxdoc straight from the tldp.org website. The wrapping might prove useful as well.



Back to top
LaTeX Manual (online)

This is the online latex manual. It is here for my own use (the NASA site where I found it can be pretty busy) but it is also freely available to others if you should find it and want to use it here.



Back to top

This page is maintained by Robert G. Brown: rgb@phy.duke.edu