Tuesday, July 20, 2010

The Back Story

The school where I work was getting rid of a 1970 GMF/Fanuc S-Model 1 welding robot that was donated for training purposes by GM 20 years ago. The components either went to the scrap or were free to a good home.

I had been thinking about building a Beowulf Cluster for a while. I had built several before, but they were all ad-hoc or haphazardly planned. The controller cabinet, dubbed “The Fridge”, was a perfect fit for housing a more organized cluster. (See pic below. No, I did not use the three phase transformer in the bottom!)

Anatomy of a Beowulf Cluster

For those not in the know: a Beowulf Cluster is essentially many computers acting as one. There is a master node that hands out parts of processes to the various nodes.

The nodes that are not the master are just dumb and headless drones that are PXE booted and just do what they're told. All of the drone nodes are used just for their processing and memory capabilities and nothing else.
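A toy way to picture the master/drone split, as a rough sketch only: it runs in a single Python process and says nothing about how the cluster software actually distributes work over the network. The names `split_work` and `run_on_cluster` are made up for illustration.

```python
def split_work(items, node_count):
    """Deal work items out round-robin, producing one list per node."""
    chunks = [[] for _ in range(node_count)]
    for i, item in enumerate(items):
        chunks[i % node_count].append(item)
    return chunks


def run_on_cluster(items, node_count, task):
    """Pretend cluster: the 'master' splits the job, each 'node' sums
    task(x) over its own share, and the master adds up the partials."""
    partials = [
        sum(task(x) for x in chunk)
        for chunk in split_work(list(items), node_count)
    ]
    return sum(partials)


if __name__ == "__main__":
    # Sum of squares of 0..9, spread over 4 imaginary drone nodes.
    print(run_on_cluster(range(10), 4, lambda x: x * x))
```

The point is only that the drones never need the whole job, just their share of it, and the master is the only one that sees the combined result.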

If you still don't get it at this point, it is a small-scale and local version of SETI@home, Folding@home or countless others.

If you still don't get it: stop reading now and get on with your life.

Considerations

Budget

My budget for this is next to $0. I will be relying on donations for nodes and extra equipment as well as my supply of crap that has accumulated in my basement over the years. IT people out there: you know what I'm talking about.

Number of Nodes

The number of nodes that you can have in a Beowulf Cluster depends on how well you plan for them. In my case, I chose the nice, round, binary number of 32. Aim high.

Power

AC power for all of these nodes is rather tricky. Considering the power supplies have an average wattage rating of around 300W, 300W/120V = 2.5A per node at peak draw. Herein lies the tricky part.

With 32 nodes at 2.5A each, worst case scenario is that the whole cluster will draw 80 amps! If I had the facilities to use 240V and cut the amperage in half I would use it in a heartbeat - but that is not a possibility. :(

To get around that, I will have to spread the cluster over four 20A breakers. That works out to 8 nodes, or 20A, per breaker in the worst case.
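The arithmetic above can be sketched in a few lines of Python. This is a rough planning aid, not an electrical design: `power_budget` and its `derate` parameter are my own additions. `derate=0.8` reflects a common rule of thumb of loading a breaker to only 80% for continuous loads, which is an assumption and not something from this build.

```python
import math


def power_budget(nodes, psu_watts, volts, breaker_amps, derate=1.0):
    """Worst-case current planning, treating the PSU rating as peak draw.

    Returns (amps_per_node, total_amps, nodes_per_breaker, breakers_needed).
    derate is the fraction of the breaker rating you allow yourself to use.
    """
    amps_per_node = psu_watts / volts
    total_amps = nodes * amps_per_node
    usable_amps = breaker_amps * derate
    # Small epsilon guards against floating-point results like 7.999999.
    nodes_per_breaker = math.floor(usable_amps / amps_per_node + 1e-9)
    breakers_needed = math.ceil(nodes / nodes_per_breaker)
    return amps_per_node, total_amps, nodes_per_breaker, breakers_needed


if __name__ == "__main__":
    # Numbers from the post: 32 nodes, 300W supplies, 120V, 20A breakers.
    print(power_budget(32, 300, 120, 20))
    # Same cluster with the assumed 80% derating applied.
    print(power_budget(32, 300, 120, 20, derate=0.8))
```

With the post's numbers and no derating, this gives 8 nodes per breaker and 4 breakers, which leaves no headroom if every supply really pulled its rated wattage. With the assumed 80% derating it comes out to 6 nodes per breaker and 6 breakers.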

All of the nodes need to have some uniform way of being powered on and also some way to let the operator know that the node is on.

Board Mounting/Isolation

To jam as many nodes into The Fridge as possible, I will have to mount them vertically instead of horizontally. I will also need some way of isolating the boards from one another to prevent short circuits.

Network

Since all communication in a Beowulf Cluster is done via Ethernet, a reliable switch will need to be used. In my case, there are a ton (literally and figuratively) of Nortel 24 port 10/100 switches at work that are free to be put to good use. It is important that SPANNING TREE BE TURNED OFF. Otherwise, PXE booting will not work: a port running spanning tree sits in its listening and learning states for a while after link-up, and the PXE client can give up on DHCP before the port starts forwarding.

Master Node Access

The master node will be mounted in the cabinet along with its subservient nodes. It will need to have mouse, keyboard, video and possibly Ethernet access to the outside world. I could just run a cable from a second NIC and either RDP or SSH into the master, but what would be the fun in that?

Hardware

There wouldn't be software without hardware. But the hardware is much more interesting on this one.

Normally, purpose-built Beowulfs tend to be homogeneous in make-up. Every node is identical in every way possible. This is a low-budget operation. The nodes are whatever I can get my hands on for free.

Just because a computer is freely given away doesn't mean that standards have to be lax. My minimum requirements are as follows:

  • Processor: >= 1GHz Pentium 4
  • Memory: >= 1GB (but flexible with this one)
  • On-board Ethernet with PXE capability
  • Video: No on-board video preferred (the shared memory thing)
  • Working Power Supply
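For anyone sorting through a pile of donations, the checklist above could be written down as a quick filter. This is only a sketch: the `Donation` fields and the `meets_minimum` function are invented for illustration, and treating on-board video as a preference rather than a deal-breaker is my reading of the list.

```python
from dataclasses import dataclass


@dataclass
class Donation:
    name: str
    p4_or_newer: bool        # Pentium 4 class or better
    cpu_ghz: float
    ram_gb: float
    onboard_pxe_nic: bool    # on-board Ethernet that can PXE boot
    working_psu: bool
    onboard_video: bool = False  # not preferred, but not disqualifying


def meets_minimum(pc, strict_ram=False):
    """Apply the minimum requirements; RAM is only enforced if strict_ram."""
    if not pc.p4_or_newer or pc.cpu_ghz < 1.0:
        return False
    if strict_ram and pc.ram_gb < 1.0:
        return False
    return pc.onboard_pxe_nic and pc.working_psu


if __name__ == "__main__":
    pile = [
        Donation("old P2 tower", False, 0.4, 0.25, False, True),
        Donation("office P4", True, 2.4, 0.5, True, True),
    ]
    for pc in pile:
        print(pc.name, "->", "keep" if meets_minimum(pc) else "polite no")
```

The `strict_ram` switch is there because the memory requirement is the one the list calls flexible.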

Seems reasonable, right? I had no idea how many P2s there were left running after all these years. They received a polite "no".

Software

The ultimate goal of assembling this behemoth is to model proteins for cancer and disease research. I'm not 100% sure, but I've heard you can run Folding@Home on a Beowulf just fine.

No use in installing software without an OS to run it. My requirements for an OS were simple.

  1. Installable to a hard drive
  2. DHCP Daemon I don't have to mess with
  3. Driver support for as many different kinds of network cards as possible
  4. PXE server with a PXELinux image set to go
  5. Some sort of cluster monitoring software

I tried ClusterKnoppix, PelicanHPC and ParallelKnoppix. Each was disqualified for not offering an installable version, for being an abandoned project, or for being just too cumbersome and unpredictable to set up.

I then stumbled on ABC GNU/Linux and chose to use it with the cluster for its simplicity in set-up. It's an added plus that it can be tested from a live CD before having to spend time installing it on a hard drive. It had everything I needed, turn-key. It also has a really good cluster/node monitoring utility. In my opinion, it's the best open source beta I've ever tried.

Construction

This is really a bunch of sub-projects.

Gutting The Fridge/OS Install

The Fridge had one metric assload of old electronics in it. They were removed and either scavenged by electronics gurus or tossed.

In parallel, the OS was installed on the hard drive of the master node. Note: this is the only hard drive in this entire cluster.

Power Control Panel

To solve the issue of having to turn the nodes on from outside of the cabinet, a button/light panel needed to be made. Handily, The Fridge had a 10-gauge steel plate bolted to the front of it that would accommodate 32 buttons and 32 power LEDs.

Really, that was the easy part. Soldering and shrink tubing the tiny ribbon cable wires to the contacts of the buttons and LEDs was very tedious. I can think of many things that are more fun. Being on fire or having dysentery, for example...

Wire Wrap Motherboard Header Pins

With the cabinet almost ready for nodes, it was time to tackle the issue of the different styles of motherboard connectors for power on and power LED.

Most motherboards have two separate connectors that are pretty uniform. But some manufacturers (I'm looking at you, Gateway) have every single front panel connector amalgamated into one giant brick of a connector.

To get around this, my friend suggested wire wrapping. I wrapped wires from the header pins on the motherboard to header pins that I can just plug into a ribbon cable female end.

Shelving

Shelves also needed to be installed in The Fridge in order to... hell, it's obvious.

There was some "T" aluminum that was lying around the office. An adjunct professor was storing it there. He quit and never picked it up. It hadn't been touched in 2 years, except by the occasional tripping foot. It made for perfect shelf brackets.

For the shelf platforms, I decided to use pegboard. Not only do I have about a dozen sheets in my garage, but the holes also encourage the flow of air. With all of these machines being in the same cabinet at once, it gets pretty toasty.

Node Installation

With the shelves installed, it was then time to install nodes!

The vertical mounting was done just by leaning the boards against the side of the cabinet and each other. Where the bottoms of two boards met, a section of cardboard was added to isolate them from one another.

Power and Network

I scavenged some power conditioners. Not my first choice, but I'll take what I can get.

The network runs through the aforementioned Nortel switch.

KVM

I tossed around several ideas as to how the master node would be mouse, keyboard and video accessible. I thought about having VGA and USB cables sticking out an access hatch. At one point I had a KVM sticking out the top. You can see that one in several of the above photos.

Then my attention was grabbed by an innocuous little hatch on the front door. It used to have a parallel port that was used to communicate with some sort of ancient printer.

I removed the parallel port and elongated the slot it was in by 1/2 inch with a Dremel. I tore the case off the KVM, stuck a block of wood behind it and mounted it to the hatch.

The KVM switch has two whips on it. One goes to the master node. The second is set aside, with an AGP video card mounted on its video lead, for troubleshooting nodes when they misbehave.

Launch/Testing

The startup procedure is simple. Boot the master node completely. Power on node 1. Wait 20 seconds. Power on node 2. Rinse, lather, repeat.
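To get a feel for how long that takes, here is a small sketch that prints the power-on timetable. `power_on_schedule` is a made-up helper, and assuming all 32 panel buttons belong to drone nodes is my guess, so adjust the count to whatever is actually installed.

```python
def power_on_schedule(node_count, gap_seconds=20):
    """Return (node_number, seconds after the master finished booting)."""
    return [(n, (n - 1) * gap_seconds) for n in range(1, node_count + 1)]


if __name__ == "__main__":
    schedule = power_on_schedule(32)
    for node, offset in schedule:
        print(f"node {node:2d}: press the button at +{offset // 60}m{offset % 60:02d}s")
    last_offset = schedule[-1][1]
    print(f"last button pressed {last_offset} seconds after the first")
```

With a 20 second gap, the last of 32 nodes is switched on a little over ten minutes after the first.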

Apparently I wired some of the power LEDs backwards, so they didn't work. Thanks to wire wrap and not soldering, that problem was fixed in minutes.

Comments:

  1. Hello,

    I'm Iker Castaños, ABC GNU/Linux designer. Next week I will upload an improved version of ABC GNU/Linux. The nodes start much faster.

    Best regards.

  2. Thank you Mr. Castaños. I really love using your distro. It has made the software part of building a Beowulf Cluster nearly trivial.

  3. Thank you very much. If you want to try the new version, please contact me at abclinuxsupport@gmail.com. The new version is faster than the beta release, and some small bugs have been fixed.

    Best regards,

