The Back Story
The school where I work was getting rid of a 1970 GMF/Fanuc S-Model 1 welding robot that was donated for training purposes by GM 20 years ago. The components either went to the scrap or were free to a good home.
I had been thinking about building a Beowulf Cluster for a while. I had built several before, but they were all ad-hoc or haphazardly planned. The controller cabinet, dubbed “The Fridge”, was a perfect fit for housing a more organized cluster. (See pic below. No, I did not use the three phase transformer in the bottom!)
Anatomy of A Beowulf Cluster
For those not in the know: a Beowulf Cluster is essentially many computers acting as one. A master node hands out parts of processes to the other nodes.
The nodes that are not the master are just dumb and headless drones that are PXE booted and just do what they're told. All of the drone nodes are used just for their processing and memory capabilities and nothing else.
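To make the master/drone split a little more concrete, here is a toy sketch in Python. It runs on a single machine and only simulates the hand-off; a real cluster would typically use something like MPI for this, and the names split_work, drone and master are made up for illustration.

```python
def split_work(items, node_count):
    """Master side: deal the work items out into one chunk per drone node."""
    return [items[i::node_count] for i in range(node_count)]


def drone(chunk):
    """Drone side: do what it is told (here, sum the squares of its chunk)."""
    return sum(x * x for x in chunk)


def master(items, node_count):
    """Hand a chunk to every drone, then combine the partial results."""
    chunks = split_work(items, node_count)
    # On real hardware each chunk would be processed on a different node
    # at the same time; here the "drones" simply run one after another.
    partials = [drone(chunk) for chunk in chunks]
    return sum(partials)


if __name__ == "__main__":
    print(master(list(range(1000)), node_count=32))
```

The point is only the shape of it: the master owns the job, the drones each see a slice, and nobody but the master ever sees the whole picture.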
If you still don't get it: stop reading now and get on with your life.
My budget for this is next to $0. I will be relying on donations for nodes and extra equipment as well as my supply of crap that has accumulated in my basement over the years. IT people out there: you know what I'm talking about.
Number of Nodes
The number of nodes that you can have in a Beowulf Cluster depends on how well you plan for them. In my case, I chose the nice, round, binary number of 32. Aim high.
AC power for all of these nodes is rather tricky. With power supplies rated at around 300W on average, each node could draw up to 300W/120V = 2.5A at peak. Herein lies the tricky part.
With 32 nodes at 2.5A each, the worst-case scenario is that the whole cluster will draw 80 amps! If I had the facilities to use 240V and cut the amperage in half I would use it in a heartbeat - but that is not a possibility. :(
To get around that, I will have to spread the cluster over four 20A breakers.
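The arithmetic above is easy to replay if the numbers change. This is a small sketch using the figures from the text (300W supplies, 120V, 32 nodes, 20A breakers). The derate parameter is my own addition and not part of the original plan: it lets you keep each breaker below its full rating if you want some headroom.

```python
import math

PSU_WATTS = 300      # average power supply rating from the text
LINE_VOLTS = 120     # mains voltage used in the text
NODE_COUNT = 32
BREAKER_AMPS = 20


def amps_per_node(watts=PSU_WATTS, volts=LINE_VOLTS):
    """Worst-case current for one node, assuming the supply runs at its rating."""
    return watts / volts


def total_amps(nodes=NODE_COUNT):
    """Worst-case current for the whole cluster."""
    return nodes * amps_per_node()


def breakers_needed(nodes=NODE_COUNT, breaker_amps=BREAKER_AMPS, derate=1.0):
    """Breakers required when whole nodes are assigned to each breaker.

    derate is a hypothetical safety factor (1.0 = load breakers to their rating).
    """
    nodes_per_breaker = math.floor(breaker_amps * derate / amps_per_node())
    return math.ceil(nodes / nodes_per_breaker)


if __name__ == "__main__":
    print(amps_per_node())     # 2.5
    print(total_amps())        # 80.0
    print(breakers_needed())   # 4
```

Note that four breakers only works out if every breaker is loaded right to its 20A rating at peak. Calling breakers_needed(derate=0.8) shows what leaving 20% headroom would cost.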
All of the nodes need to have some uniform way of being powered on and also some way to let the operator know that the node is on.
To jam as many nodes into The Fridge as possible, I will have to mount them vertically instead of horizontally. I will also need some way of isolating the boards from one another to prevent short circuits.
Since all of the communication in a Beowulf Cluster is done via Ethernet, a reliable switch will need to be used. In my case, there are a ton (literally and figuratively) of Nortel 24 port 10/100 switches that are free to be put to good use at work. It is important that SPANNING TREE BE TURNED OFF. Otherwise, PXE booting will not work: the switch can hold a freshly linked port out of forwarding long enough for the node's DHCP request to time out.
Master Node Access
The master node will be mounted in the cabinet along with its subservient nodes. It will need to have mouse, keyboard, video and possibly Ethernet access to the outside world. I could just run a cable from a second NIC and either RDP or SSH into the master, but what would be the fun in that?
There wouldn't be software without hardware. But the hardware is much more interesting on this one.
Normally, purpose-built Beowulfs tend to be homogeneous in make-up. Every node is identical in every way possible. This is a low-budget operation. The nodes are whatever I can get my hands on for free.
Just because a computer is freely given away doesn't mean that standards have to be lax. My minimum requirements are as follows:
- Processor: >= 1GHz Pentium 4
- Memory: >= 1GB (but flexible with this one)
- On-board Ethernet with PXE capability
- Video: No on-board video preferred (on-board video usually borrows its memory from system RAM)
- Working Power Supply
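For anyone sorting through a pile of donations, the checklist above can be written down as a quick screening function. This is only a sketch: the DonatedBox fields and the screen function are hypothetical names, it checks clock speed rather than the exact processor family, and it treats memory and on-board video as soft preferences because the list itself does.

```python
from dataclasses import dataclass


@dataclass
class DonatedBox:
    """Hypothetical description of a donated machine (not from the original post)."""
    cpu_mhz: int
    ram_mb: int
    pxe_onboard_nic: bool
    onboard_video: bool
    psu_works: bool


def screen(box, min_cpu_mhz=1000, min_ram_mb=1024):
    """Return (accepted, notes). Hard failures reject; soft ones only add a note."""
    failures = []
    notes = []
    if box.cpu_mhz < min_cpu_mhz:
        failures.append("processor below 1GHz")
    if not box.pxe_onboard_nic:
        failures.append("no on-board PXE-capable Ethernet")
    if not box.psu_works:
        failures.append("dead power supply")
    if box.ram_mb < min_ram_mb:
        notes.append("less than 1GB of memory (flexible)")
    if box.onboard_video:
        notes.append("has on-board video (not preferred)")
    return (not failures, failures + notes)


if __name__ == "__main__":
    pentium2 = DonatedBox(cpu_mhz=400, ram_mb=256, pxe_onboard_nic=False,
                          onboard_video=False, psu_works=True)
    print(screen(pentium2))
```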
Seems reasonable, right? I had no idea how many P2s there were left running after all these years. They received a polite "no".
The ultimate goal of assembling this behemoth is to model proteins for cancer and disease research. I'm not 100% sure, but I've heard you can run Folding@Home on a Beowulf just fine.
No use in installing software without an OS to run it. My requirements for an OS were simple.
- Installable to a hard drive
- DHCP Daemon I don't have to mess with
- Driver support for as many different kinds of network cards as possible
- PXE server with a PXELinux image set to go
- Some sort of cluster monitoring software
I then stumbled on ABC GNU Linux and chose to use it with the cluster for its simplicity in set-up. It's an added plus that it can be tested on a live CD before having to spend time installing it on a hard drive. It had everything I needed turn-key. It also has a really good cluster/node monitoring utility. In my opinion, the best open source beta I've ever tried.
This is really a bunch of sub-projects.
Gutting The Fridge/OS Install
The Fridge had one metric assload of old electronics in it. They were removed and either scavenged by electronics gurus or tossed.
In parallel, the OS was installed on the hard drive of the master node. Note: this is the only hard drive in this entire cluster.
Power Control Panel
To solve the issue with having to turn the nodes on from outside of the cabinet, a button/light panel needed to be made. Handily, the Fridge had a 10 gauge steel plate bolted to the front of it that would accommodate 32 buttons and 32 power LEDs.
Really, that was the easy part. Soldering and shrink tubing the tiny ribbon cable wires to the contacts of the buttons and LEDs was very tedious. I can think of many things that are more fun. Being on fire or dysentery for example...
Wire Wrap Motherboard Header Pins
Now that the cabinet was almost ready for nodes, it was time to tackle the issue of the different styles of motherboard connectors for power on and power LED.
Most motherboards have two separate connectors that are pretty uniform. But some manufacturers (I'm looking at you, Gateway) have every single front panel connector amalgamated into one giant brick of a connector.
To get around this, my friend suggested wire wrapping. I wrapped wires from the header pins on the motherboard to header pins that I can just plug into a ribbon cable female end.
Shelves also needed to be installed in the Fridge in order to... hell it's obvious.
There was some "T" aluminum that was lying around the office. An adjunct professor was storing it there. He quit and never picked it up. It hadn't been touched except by the occasional tripping foot in 2 years. It made for perfect shelf brackets.
For the shelf platforms, I decided to use pegboard. Not only do I have about a dozen sheets in my garage, but the holes also encourage the flow of air. With all of these machines being in the same cabinet at once, it gets pretty toasty.
The Fridge looks like a bread cart with just the aluminum brackets in place. Master node is on the top shelf.
With the shelves installed, it was then time to install nodes!
The vertical mounting was done just by leaning the boards against the side of the cabinet and each other. Where the bottoms of two boards met, a section of cardboard was added to isolate them from one another.
Power and Network
I scavenged some power conditioners. Not my first choice, but I'll take what I can get.
The aforementioned Nortel switch.
I tossed around several ideas as to how the master node would be mouse, keyboard and video accessible. I thought about having VGA and USB cables sticking out an access hatch. At one point I had a KVM sticking out the top. You can see that one in several of the above photos.
Then my attention was grabbed by an innocuous little hatch on the front door. It used to have a parallel port that was used to communicate with some sort of ancient printer.
I removed the parallel port and elongated the slot it was in by 1/2 inch with a Dremel. I tore the case off the KVM, stuck a block of wood behind it and mounted it to the hatch.
The KVM switch has 2 whips on it. One goes to the master node. The video connector of the second whip has an AGP video card attached to it for troubleshooting nodes when they misbehave; that whip is set aside until it is needed.
The startup procedure is simple. Boot the master node completely. Power on node 1. Wait 20 seconds. Power on node 2. Rinse, lather, repeat.
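If you want to know how long that button-pushing ritual takes, the schedule is easy to work out. A tiny sketch, assuming all 32 nodes are installed and the 20 second gap is kept exactly (the function name is made up):

```python
def startup_schedule(node_count=32, delay_s=20):
    """Return (node_number, seconds_after_master_is_up) for each power-on."""
    return [(n, (n - 1) * delay_s) for n in range(1, node_count + 1)]


if __name__ == "__main__":
    for node, offset in startup_schedule():
        minutes, seconds = divmod(offset, 60)
        print(f"node {node:2d}: press the button at {minutes}:{seconds:02d}")
```

With those numbers the last button gets pressed 10 minutes and 20 seconds after the first one, not counting the time the master takes to boot.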
Apparently I wired some of the power LEDs backwards, so they didn't work. Thanks to wire wrap and not soldering, that problem was fixed in minutes.