Brahma: The Duke Physics Department's Beowulf Compute Cluster
Brahma is the Duke University Physics Department's beowulf-style parallel supercomputer cluster. More properly, Brahma is at this point a collection of distinct clusters of various vintage, supported by various groups. It can still be viewed, loosely, as a single cluster by virtue of the fact that the computational resources represented are liberally shared between the various contributing research groups so that little of the resource is wasted in an idle state.
Brahma was funded by the Army Research Office as part of its long-running support for the research of Brown and Ciftan. It was first "commissioned" in 1996, when its first systems were purchased and configured as a "distributed parallel supercomputer". The original systems used were dual Pentium Pro 200MHz CPUs, interconnected by (then very new) 100 Mbps switched ethernet. A more detailed history or photo tour of Brahma is available.
Although Brahma was not by any means the first linux-based PC cluster, it was one of the first, and in particular one of the first to be based on dual CPU systems (the original dual CPU nodes ran the brand new linux 2.0.0 kernel out of Slackware, which was the first "production" kernel that could support SMP operation). Consequently a fair bit of effort was expended just in dealing with SMP issues, especially with networking, and in endeavoring to get the Adaptec SCSI bus to work correctly.
As time passed, more systems were added to the cluster, including systems belonging to other theoretical physics research groups interested in doing large scale computations, notably Berndt Mueller's groups studying quark-gluon plasmas by a variety of means. At the same time the physics department, which up to this time had been heavily invested in Sun hardware running SunOS or Silicon Graphics hardware running Irix, started deploying considerably less expensive commodity hardware linux systems throughout the department. At first this was done retaining a core of Sun servers, but as it became increasingly clear that Linux-based servers were at least as stable and efficient as Solaris-based servers (especially in a department scale LAN operation), the department gradually shifted over to a linux-only configuration.
This, and an Intel Equipment grant received by the University in general with the Duke Physics Department as one of the main participants, permitted Brahma to scale up to well over 50 processors by around 2000. This was "small" by the standards of the largest research clusters, but nevertheless the cluster has been in nearly continuous use from its inception in 1996 until today, with only very limited periods of idle time or significant instability. As a consequence, a lot of research computation was performed, at a very high ratio of benefit to cost.
Duke Physics and the Brahma project have participated heavily on the Beowulf list since shortly after its inception, at one point even housing a mirror of the primary beowulf website when confusion at NASA Goddard over funding and ownership issues temporarily shut it down. In the earlier years they also participated heavily in linux SMP kernel development, housing a mirror of the Linux SMP FAQ written by David Mentre until it was finally converted into a proper HOWTO and moved onto the Linux Documentation Project website.
The Brahma website is also well-known as a resource site for would-be beowulfers or linux-based cluster supercomputer builders. It houses the only free (modified Gnu Open Publication License) Book on Beowulf Design, by Robert G. Brown (who designed and constructed the original Brahma cluster and all its various descendants and relatives in the department). It houses a variety of other cluster resources, including software links, talks on parallel computing with commodity off-the-shelf (COTS) compute clusters, a list of at least some vendors who market COTS components useful to cluster computer builders or turnkey clusters or the like, and finally a collection of useful cross-reference URL links to other web-based resources that have been found useful in our own cluster computing efforts.
At this point Brahma has grown tremendously. It now consists of the old (soon to be retired) second generation nodes and the third generation nodes, both from the Intel grant (the original Pentium Pros are out of service). To these systems, which still make up the brahma cluster proper, have been added the ganesh cluster (our first cluster of Athlons), the rama cluster, the champ cluster, and the nano cluster, and we expect to add a small shiva cluster soon.
All in all, the Brahma cluster has well over 150 CPUs and counting (mostly in dual configurations of both Intel and Athlon processors) in almost continuous use, housed in a dedicated cluster/server room contributed by the University to help support the many participating grant-funded research projects that use the Brahma resource. This room also houses the physics department's servers and several clusters belonging to other departments on the campus, as can be seen on the Duke Beowulf User's Group website.
At this point, Brahma is only one of many clusters on Duke's campus, which are only a very few of the many, many clusters in the United States, which in turn are only a fraction of the clusters in active use all over the world. From an idea shared by a handful of visionaries at Oak Ridge, at NASA Goddard, and at a few Universities, beowulfs and compute clusters have come of age. At this point, it is beyond any doubt that more high-performance computing cycles are provided and used on linux-based compute clusters than on all other technologies combined, with the margin widening every day and with no possible competition on the horizon. A beowulf (or beowulf-like open-source-based compute cluster or GRID) simply is the most cost-effective way to obtain compute power for parallelizable problems ranging from the simple to the very complex indeed.