
Power and Cooling for your Beowulf

Once you have located a space that is big enough and convenient to the administrators and/or users [11.5] and that can hold all the systems you plan to put there without overstressing the physical structure, it is time to think about electrical power and air conditioning. You must think about the two together, because the amount of one you require determines the amount of the other you require.

A rule of thumb to use in estimating your power requirements is to assume 100 Watts per (Intel) node. This is only a rule of thumb - if you get dual CPU nodes with all the memory they can hold and a big power supply and add a big, fast disk and a CD ROM and four network cards and a video card and an extra fan, you might need twice that. Certain Alpha nodes tend to be very power hungry as well - rumor has it that an XP1000 draws around 250 Watts, and although I haven't measured it on ours, it is a true fact that the air they blow out is twenty degrees or so warmer than the air they draw in when operating. On the other hand a ``stripped'' diskless node might end up only drawing 50 or 60 Watts or even less.

Once again, if you are only planning on building a ``small'' beowulf (less than or equal to 16 nodes) you don't have to worry too much about power as most homes and businesses have circuits that can provide 1500-2000 Watts (15-20 Amps at 120 Volts) without blowing a fuse or breaker. Obviously you should check (thinking about other things, like a monitor and room lights, that might also be on the circuit) but you are likely OK.

With any more nodes than this, you are likely to need multiple circuits. You will also very likely need to have the room wired to obtain them, as (unless it is already a computer equipment room or an ex-machine shop or something) most rooms don't have multiple circuits already installed - they can actually be a bit dangerous in a home where somebody might mistakenly assume that because the lamp went off when they switched off a breaker it means that the next receptacle over is actually dead.
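The circuit arithmetic above can be sketched in a few lines of Python. This is only an estimate under stated assumptions: the function name is made up for illustration, the 100 Watts per node is the rule of thumb from the text, and the 0.8 derating factor (loading a breaker to only 80% of its rating for continuous use) is an assumption that your local code and your electrician get to overrule.

```python
import math


def circuits_needed(nodes, watts_per_node=100.0, breaker_amps=20.0,
                    volts=120.0, derate=0.8):
    """Estimate total power draw and the number of circuits required.

    derate is an assumed safety margin for continuous loads; it is not
    taken from the text and should be checked against local codes.
    """
    total_watts = nodes * watts_per_node
    usable_watts_per_circuit = breaker_amps * volts * derate
    circuits = math.ceil(total_watts / usable_watts_per_circuit)
    return total_watts, circuits


if __name__ == "__main__":
    for node_count in (8, 16, 32, 128):
        watts, circuits = circuits_needed(node_count)
        print(node_count, "nodes:", watts, "watts,", circuits, "circuit(s)")
```

With these defaults a 16 node beowulf fits on a single 20 Amp circuit and 128 nodes want about seven. Remember to add in monitors, lights and anything else that shares the circuit before trusting the answer.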

Anticipating that some of the folks who read this are expectant hobbyists or amateurs when it comes to electrical engineering, it seems like a good idea to learn a bit about electricity at this point. After all, electricity is one of those areas where what you don't know can kill you. Fairly easily, actually. Be Scared.

Let's discuss How Electricity Gets Around [11.6].

In North America, electricity is typically delivered to your home or office as a 60 cycle per second (Hertz or Hz) alternating voltage from a step-down transformer (from a much higher voltage) outside the building. In general, it will come in as either two-phase (home; strictly speaking a split single phase, with two legs 180 degrees apart) or three-phase (Y, office) with each phase at 120 (rms) AC volts above ground. By the time this 120 VAC gets to where it is going to be used, it often has dropped to only 110 VAC because of the resistance of the distribution wires, hence its generic name of 110 VAC.

To get a 110-120 VAC circuit, one connects a line with one of the phases through a fuse or circuit breaker to the black wire of a standard cable. The white wire is connected to a grounded stake or sometimes the plumbing. The bare copper (ground) wire is also connected to the grounded stake, but should never be used to deliberately carry current according to most electrical codes.

To get a 240 VAC circuit, one runs one 120 VAC phase on the red wire, and the opposite phase (of a two phase supply) on the black wire of appropriate cabling. Both are colored to indicate that either wire will provide 120 VAC with respect to white (current carrying ground) or copper (safe ground) or with respect to your delicate and easily damaged human body in contact with just about anything connected to the ground. So don't touch them if there is the faintest chance that they are ``hot''. Don't touch the white current carrying wire either - under certain circumstances it can carry enough voltage to kill you.

Kill you? Did I just say that? I did. Electricity is very dangerous and will kill you in a heartbeat [11.7]. Electricity can also start fires very easily, and fires can also kill you dead. The best way to get your beowulf's space wired is by a certified professional who knows your local codes and is a lot less likely to come up with something that produces a blast of sparks when the breaker is thrown.

If you have three phase (Y or Wye, which is fairly commonly provided to businesses or industries but not common in homes) electricity, you can get a ``sort of'' 240 volt circuit out of it by running between any two of the three phases. The phase difference is only 120 degrees instead of 180 degrees so one ends up with only 208 VAC or so between the wires. This is enough to run most 240 VAC devices simply because the manufacturers aren't fools and know that Y/Wye supplies are fairly common. This is also true for a lot of computer equipment that requires 240 VAC (like some racks or uninterruptible power supplies (UPS) or some big-iron computers).
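Where the 208 comes from is easy to check. For two legs that are each V volts (rms) above ground and separated by a phase angle theta, the voltage between them is 2 V sin(theta/2). A minimal sketch, with a function name invented for illustration:

```python
import math


def line_to_line_voltage(phase_volts, phase_angle_degrees):
    """RMS voltage between two legs that are each phase_volts above
    ground and separated by the given phase angle."""
    half_angle = math.radians(phase_angle_degrees) / 2.0
    return 2.0 * phase_volts * math.sin(half_angle)


if __name__ == "__main__":
    # Opposite legs of a home supply (180 degrees apart).
    print(round(line_to_line_voltage(120.0, 180.0), 1))
    # Two legs of a three phase Wye supply (120 degrees apart).
    print(round(line_to_line_voltage(120.0, 120.0), 1))
```

The first case gives the full 240 VAC, the second about 208 VAC, which is the ``sort of'' 240 volt circuit described above.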

The thickness of the wires used to distribute the electricity and the length of the run from the primary distribution panel dictate how much current you can safely pull through a circuit. As a general rule (according to most local codes), 14 (for up to 15 amps) or 12 gauge wire (for up to 20 amps) is used in household dwellings to move electricity up to 100 feet. 10 gauge carries up to 30 amps (for e.g. air conditioners or the like). 8 gauge up to 40 amps. The smaller the gauge, the thicker the wire, the more it can carry without getting too hot. To go farther than 100 feet, one typically goes up a size (or more) of wire.
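The gauge rule above amounts to a small lookup table. The sketch below encodes only the four sizes and the 100 foot rule mentioned in the text; the names are hypothetical, and it is emphatically not a substitute for your local code or a licensed electrician.

```python
# Maximum amps per wire gauge for runs up to 100 feet, as quoted above.
AMPACITY_BY_GAUGE = {14: 15, 12: 20, 10: 30, 8: 40}


def suggest_gauge(amps, run_feet=100.0):
    """Return the thinnest gauge in the table that carries amps, going
    up one size (a smaller gauge number) for runs over 100 feet."""
    gauges = sorted(AMPACITY_BY_GAUGE, reverse=True)  # thinnest first
    for position, gauge in enumerate(gauges):
        if AMPACITY_BY_GAUGE[gauge] >= amps:
            if run_feet > 100.0:
                position += 1  # one size thicker for a long run
            if position >= len(gauges):
                break
            return gauges[position]
    raise ValueError("load or run length is outside this small table")


if __name__ == "__main__":
    print(suggest_gauge(20))         # 20 amp circuit, short run
    print(suggest_gauge(20, 150.0))  # same circuit, 150 foot run
```

A 20 amp circuit gets 12 gauge for a short run and 10 gauge for a 150 foot run. Anything the table cannot answer raises an error rather than guessing, which seems like the right default where wiring is concerned.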

From this you can see that if you have a ``large'' beowulf, you will almost certainly need multiple circuits in the room (typically 20 amps each) and in many cases these circuits will have different phases. This means that if you are foolish enough to connect a black wire from one circuit to the black wire of another circuit, you could be basically shorting out 208-240 VAC. Amazingly enough, this happens (sometimes inside racks or computers that have more than one plug that manufacturers somehow assumed would always be plugged into the same circuit) with predictably spectacular results. This is just one of many reasons to have reliable fuses or circuit breakers in each and every line.

Once the electricity has made it to the room, there is no real difference between installing a bunch of receptacles in the wall for each circuit or just one or two and plugging power strips (with appropriately heavy gauge supply wires) into them, and the latter is likely more scalable and convenient. Just don't overload the circuits themselves and avoid thin extension cords and the like. Electricity ``likes'' to run over nice, fat wires and really hates it when it's squeezed down into a thin, scrawny wire. It responds by making those thin wires hot, which wastes energy, drops the voltage at the appliance, and can be dangerous.

You may want to think about uninterruptible power supplies (UPS) and power conditioning. In my area, the power goes off fairly frequently for tiny little times like ten seconds. This is just enough to cause all of your kitchen clocks and coffee makers to reset, and is plenty long enough to hard-crash your computer(s) as well, which is most annoying if you've been running a calculation for a day or two (or longer!) and have to start over. Almost any kind of UPS can keep a computer up through these short outages.

More expensive UPS can provide a degree of power conditioning and surge protection, which is also useful when you have many nodes and want maximal hardware reliability. Some of them also have other clever or desirable features, like the ability to control them and cycle the power remotely via a serial port connection or the like. This can sometimes save one a trip into the cluster in the middle of the night or can allow you to reboot while on your ski vacation in Europe, if that sort of thing is worth it to you (the bells and whistles aren't cheap).

So fine, you've got your space, it has room, the floor will hold all your systems (and you and your desk and your stereo), you've got electricians running one 20 Amp circuit in for each 16 nodes (or thereabouts). There's just one last major problem to worry about. You're delivering a lot of power to the room to run all those machines. When they're done with all that energy, they give it up as heat. Every watt that goes in to your computer room has to come out in a steady state.

Believe me here, I'm a physicist. Think of your 16 node beowulf as a 1600 watt space heater or sixteen 100 Watt light bulbs, and you won't go far wrong. 1600 watts is the rate at which energy is being delivered into the room [11.8]. If you don't remove all that energy at the same rate, it will build up. As it builds up, the room will get hotter and hotter until the temperature difference between the inside of the room and the outside of the room is big enough to drive all the heat out through the walls.

This may or may not happen before all your computers melt or catch on fire and turn into an expensive little puddle of metal and epoxy. Or just break, which is actually more likely but not as impressive. The former can happen, though - as you may discover the hard way if you are foolish enough to put 128 nodes (or approximately 13,000 watts) into a small, closed room with no kind of thermal kill switch and the air conditioning fails.
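To get a feel for how fast the heat builds up, one can estimate how quickly the air alone warms in a sealed room. This is a deliberately crude sketch: it assumes standard values for the density and specific heat of air, assumes no heat escapes at all, and ignores the walls, furniture and the machines themselves, all of which soak up heat and slow things down considerably. The result is therefore an upper bound on the initial rate, not a prediction. The function name and the room size in the example are made up for illustration.

```python
AIR_DENSITY_KG_PER_M3 = 1.2            # assumed, near room temperature
AIR_SPECIFIC_HEAT_J_PER_KG_K = 1005.0  # assumed, at constant pressure
CUBIC_METERS_PER_CUBIC_FOOT = 0.0283168


def air_heating_rate_f_per_minute(watts, room_cubic_feet):
    """Upper-bound warming rate of the room air, in degrees Fahrenheit
    per minute, if none of the heat escapes."""
    air_mass_kg = (room_cubic_feet * CUBIC_METERS_PER_CUBIC_FOOT
                   * AIR_DENSITY_KG_PER_M3)
    kelvin_per_second = watts / (air_mass_kg * AIR_SPECIFIC_HEAT_J_PER_KG_K)
    return kelvin_per_second * 60.0 * 9.0 / 5.0


if __name__ == "__main__":
    # 16 nodes at 100 watts in a 10 x 10 foot room with an assumed
    # 8 foot ceiling (800 cubic feet).
    print(round(air_heating_rate_f_per_minute(1600.0, 800.0), 1))
```

For 1600 watts in 800 cubic feet this comes out to several degrees Fahrenheit per minute for the air by itself. Real rooms warm more slowly than that, but the point stands: a closed room with no cooling does not give you much time.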

Once again, most rooms in most houses or office buildings can probably handle as many as eight nodes with their existing air conditioning arrangements. In my house, for example, my office gets a bit warm during the summer with five nodes (two with monitors) for around 700 watts, plus a couple of lights (150 watts more) plus a couple of warm bodies (200 watts more). 1000 watts in a 10 foot square room with a door and the house air conditioning set in the low seventies keeps the office temperature in the high seventies, but I can live with that and so can my nodes.

Sixteen nodes, of course, would be intolerable unless I added a window air conditioning unit (or unless I spread them out throughout the house). Once again, you'll have to work this out for however many nodes you plan to have, but if you have more than a very few nodes you must work it out.

A useful True Fact is that air conditioning is usually bought in ``tons'', but any sane measurement of power being delivered to a room will be in watts (or maybe kilowatts). So, MaryLou, what's a Ton? A ton of air conditioning removes enough heat to melt a ton of ice at the melting point (0 degrees C) in 24 hours. Calculating the power this represents is a pain in the butt, however straightforward [11.9], and the result is that one ton of air conditioning can remove almost exactly 3500 watts continuously from a room.
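For anyone who wants to check the 3500 watt figure, the calculation can be sketched as follows. It assumes a ``short'' ton of 2000 pounds and a latent heat of fusion for ice of roughly 334 kilojoules per kilogram; both are standard values but neither is stated in the text, and the function name is invented for illustration.

```python
POUNDS_PER_TON = 2000.0             # assumed: short ton
KG_PER_POUND = 0.45359237
ICE_LATENT_HEAT_J_PER_KG = 334000.0  # assumed: heat to melt ice at 0 C
SECONDS_PER_DAY = 24 * 60 * 60


def watts_per_ton_of_cooling():
    """Average power needed to melt one ton of ice in 24 hours."""
    joules_to_melt = POUNDS_PER_TON * KG_PER_POUND * ICE_LATENT_HEAT_J_PER_KG
    return joules_to_melt / SECONDS_PER_DAY


if __name__ == "__main__":
    print(round(watts_per_ton_of_cooling()))
```

The answer lands within a percent or so of 3500 watts, which is close enough for sizing an air conditioner.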

So, in an ideal universe we could run perhaps 32 nodes per ton of available air conditioning (to stay a bit on the safe side). A 128 node beowulf might need four tons of air conditioning (depending on the actual power required by the nodes, which may well vary). However, reality might well be less than ideal - if your machine room is considerably cooler than its ambient surroundings, or has a large sunny window, or has a lot of electric lights, you may not be far enough on the safe side. Heat can flow in to the room from any of these sources and 1 square meter of sunny window can let a lot of heat into a room on a hot and sunny day.
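The same sizing can be written down as a sketch. The 100 watts per node and 3500 watts per ton come from the text; the function name and the extra_heat_watts parameter (for lights, people and sunny windows) are illustrative assumptions, and you have to supply your own estimate for the latter.

```python
import math


def tons_needed(nodes, watts_per_node=100.0, extra_heat_watts=0.0,
                watts_per_ton=3500.0):
    """Tons of air conditioning needed to carry away the heat from the
    nodes plus any other heat entering the room (fractional tons)."""
    heat_watts = nodes * watts_per_node + extra_heat_watts
    return heat_watts / watts_per_ton


if __name__ == "__main__":
    for node_count in (32, 128):
        exact = tons_needed(node_count)
        print(node_count, "nodes:", round(exact, 2), "tons, buy",
              math.ceil(exact))
```

Thirty-two nodes come in just under one ton and 128 nodes just under four, which leaves the small safety margin mentioned above only as long as nothing else is heating the room.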

I'm tempted to expound on the additional power needed by all that air conditioning, but that depends on the efficiency of your air conditioning unit and the temperature of the outside air and all that. A reasonable estimate is that you'll have to buy a watt of air conditioning power for every three to five watts of power consumed in your beowulf. Ahhh, physics. A wonderful thing.
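As a rough sketch of that estimate, the function below (a hypothetical name) turns the ``one watt per three to five'' ratio into a range of electrical power for the air conditioning. The ratio is the only input taken from the text; real units vary with their efficiency and the outside temperature, as noted above.

```python
def air_conditioning_power_range(cluster_watts, best_ratio=5.0,
                                 worst_ratio=3.0):
    """Return (low, high) watts of electricity the air conditioning is
    likely to draw, given watts of heat removed per watt consumed."""
    return cluster_watts / best_ratio, cluster_watts / worst_ratio


if __name__ == "__main__":
    low, high = air_conditioning_power_range(12800.0)  # 128 nodes
    print(round(low), "to", round(high), "watts")
```

For a 128 node beowulf drawing 12,800 watts, that is roughly 2.6 to 4.3 kilowatts more that has to be budgeted (and wired) for cooling.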

Let me remind you one last time that if there is any chance at all that your air conditioning can shut down while your computers are still operating and they are not in a large room with plenty of circulation, you should think seriously about some sort of thermal kill switch. Computer hardware breaks or even catches on fire if it gets hot enough, and I can tell you from bitter experience that the temperature in a smallish closed room (in an otherwise cool building) will go up to well over 100 degrees Fahrenheit in a remarkably short time if there is more than a kilowatt being released inside with no ventilation or air conditioning. The temperature inside the cases will be considerably higher, and the temperature of the CPU and memory chips and hard drives higher still.

We're now done with the serious stuff. I'll wrap up this section by reminding you to think about other kinds of infrastructure that you might want to provide for your beowulf room if it is in some sort of organization; fiber or copper lines to your organization LAN switch or router, for example, or connections to printing facilities. A phone (or two) is often nice, possibly equipped with a modem and terminal or network server if you plan on managing remotely (as in from someplace network-inaccessible).

Finally, you may want to think about physically securing the location. You've just built a pile of PCs that (however cheap the nodes) is worth thousands, possibly hundreds of thousands of dollars. It would be a shame if you came in after a weekend to discover that an entrepreneur with a pickup truck had disassembled and made off with a large chunk of them.

I wish that I could say that this is very unlikely, but we've had computers stolen (including one high end beowulf node) from just outside our beowulf room, which is itself located on a low-traffic hall inside a generally locked building with a carded lot. We're likely going to move our beowulf room to new digs on a NO traffic corridor that you have to have a building map to find. So think about locks, traffic patterns, access both day and night, and don't make it too easy for an ``entrepreneur'' to make off with your hard-earned nodes and support hardware.

The answer, fortunately, is that it is not difficult at all to build, and once built and configured, it is extremely easy (and cheap!) to maintain. Linux (or at least some sort of Unix) expertise is obviously very useful, but most Linux distributions fully support generic cluster computing ``out of the box''. The most difficult single things to master are how to implement a scalable installation mechanism for your cluster (or LAN), and how to largely automate software maintenance for your cluster (or LAN) so that you do work once, and it is automatically applied to all the nodes (or workstations) you manage.

Why do I keep putting down nodes (or workstations)? Hmmm, good question! I suppose the answer is that from one point of view a generic compute cluster can be thought of as a LAN consisting of specialized workstations. In particular, workstations with no X or GUI installed, that indeed might not even have video and a keyboard installed at all, that are missing sound and games and office tools and a whole lot of user applications, but that do have compilers and other development tools, a wide range of application and development libraries, specialized libraries and toolsets for supporting e.g. PVM or MPI computations, and perhaps some specialized node monitoring daemons or batch job management tools installed.

Nearly all of this could equally be installed on a workstation, and if you run cluster nodes in your workstation LAN, you are very likely (and wise) to go ahead and install all the cluster tools but perhaps the batch schedulers on your workstations as well, so that the only difference between a workstation and a cluster node is that most ``desktop user'' interactive/user interface components are missing on the latter.

Note that this is not the strategy adopted by the ``true beowulf'' package builders [11.10], who install custom kernels and tools to make cluster nodes look like ``CPUs'' in a big multiprocessor system with a unified PID space and transparent job distribution and management. In this latter approach, nodes are not workstations, and you can't ``log into a node'' any more than one can ``log into a CPU'' on an MP system.

This suggests that it is time for a pretty fundamental split in the discussion. All those who want to build a beowulfish cluster on top of their existing LAN, integrated with and possibly even transparently including their desktop workstations, creating nodes that are basically specialized, particularly simple workstations (that one can log into to run jobs or do whatever you like, just as one could a workstation) please move one full pace to the left. Unless, of course, you happen to be sitting down, or moving to the left would cause you to fall off of a tall building and die; can't have that.

All the rest of you, who want none of this ``workstation cluster'' crap and want to build a beowulf, pure and simple, similarly step to the right, if only metaphorically. Wishy washy ones can stay where they are and read both of the following sections to figure out which one they might be, or might become, and how.


Robert G. Brown 2003-05-12