Big Red RAID
Dave Schuller with three of the four Big Red RAID servers. The fourth has
already been assigned for duty on the MacCHESS Linux Beowulf cluster
Sirius.
MacCHESS staff recently (February 2002) completed assembly of four two-terabyte
Linux RAID disk servers in the Big Red RAID project.
© Last updated March 2002
Webmeister: David J. Schuller
Design Goals
The point was to build a lot of RAID capacity for not a lot of money.
Performance was a secondary consideration. Therefore we went with ATA disks
instead of SCSI, and did not invest in hot swap drives or power supplies.
The Name
The enclosures are beige. Big Red is the unofficial nickname of
Cornell University athletics.
Hardware & Construction
Parts, per server:
- 1x Amtrade 1200RC cube series enclosure with 26 5.25" half-height open disk
bays plus room for a motherboard and several power supplies; includes 4x fans,
25 pairs of mounting rails, mounting hardware, and casters with brakes
- 1x Tyan S2462 Thunder K7 motherboard
- 2x AMD Athlon 1.2 GHz MP CPUs
- 2x Monarch Computer #130955-2 CPU cooling fans
- 2x 256 MB ECC registered DDR PC2100 RAM, Micron part# CT3272Y265
- 1x IBM DeskStar 60GXP 20 GB hard disk (for root disk)
- 1x CD-ROM drive, 56X EIDE, for software installation & updates
- 1x NMB SD025A460WSW power supply for Thunder K7
- 1x 300W ATX power supply
- 18x Jameco #117217 3.5/5.25" hard drive bracket
- 7x custom 3.5/5.25" perforated aluminum disk sleds constructed by Bill Miller
- 3x 3Ware Escalade 7810 storage switch, 8 ATA100 ports, 64 bit 33 MHz PCI;
includes 4x power splitter cables and 8x 18" ATA100 cables
- 24x Maxtor 536DX ATA100 5400 RPM 100 GB disks
- 1x Intel Pro/1000T gigabit copper ethernet NIC
- 12x 3Ware single drive ATA100 cable, 36", 40 pin/80 conductor, pack of 2,
3W-CBL-36
- 2x Jameco #140724 insulation displacement wire splice, 18-14 ga
- 1x Jameco #34323 plastic drive mounting rails, pair (for root disk)
- 1x Jameco #147379 ATX power connector
- 2x Adaptec ACK-68I-68E-LVD SCSI internal to external cable
Construction
Some of the interesting points:
- Two of the eight Athlon CPUs were DOA. We burned a third one in testing.
- The extended ATX motherboards are big. They hung over into the disk bays.
Two of the Amtrade enclosures were an inch shorter than the others, which
exacerbated the problem in those units. We were able to use all the bays
anyway after Bill Miller constructed some custom disk sleds.
- The motherboard trays came pre-installed in the left side of the enclosures
(as seen from the front). We switched them to the right side to keep all those
ATA cables from blocking airflow to the motherboard.
- The Tyan ThunderK7 motherboard requires a special 460 W power
supply whose connectors differ from the ATX standard. This supply alone was
unable to handle the startup surge from a completed server, so we added a 300 W
ATX power supply to handle half of the RAID disks. We spliced into the green
wire of the NMB power supply (thus the wire splices, ATX power connector, and
some unspecified wire) so that both supplies start up at the same time through
the logic control power switch. I miss the days when computers had real power
switches. It's not a good feeling when you're into a chassis up to your elbows
and then notice that the fan is still running.
Cabinet views

Front view, doors removed
The 7 lowest drives on the right hand side have the custom
sleds. The root disk is located behind the power switch panel at the top
left.

Front view, with doors

Inside view
Look at that mess of cables! We moved the motherboard to the right side of the
chassis to make more room for the cables coming off the PCI cards. The special
460 W NMB power supply is just above the motherboard; it runs the motherboard,
fans, root disk, CD-ROM and right-side RAID disks. The additional 300 W ATX
power supply (bottom foreground) runs the left-side RAID disks.
Software & Configuration
- For an OS, we used Red Hat Linux 7.1.
Aside from an undeniable Open Source bias on our part, it has good SMP support,
good reliability, and excellent software RAID support. It comes with all the
drivers we needed. It also supports NFS, which is how we'll be sharing space
with our Unix systems. Oh yeah, and it's free. The Linux Software-RAID HOWTO
document was very helpful for RAID configuration.
- We used a separate root disk and saved ourselves the trouble of trying to
boot from a RAID volume.
- We considered a number of different RAID configurations. Factors to consider
included the number of disks per controller, the number of controllers, capacity,
reliability and speed. The 3Ware cards can be configured to run RAID levels
0 (stripe), 1 (mirror), 10 (striped mirrors) or 5. Linux software RAID is
similarly flexible, and can also support multiple tiers of RAID. We tried two
layouts: RAID5 on each 3Ware card with software striping across the 3
controllers, and striping across various numbers of disks on the cards with
RAID5 in software. We settled on the latter after seeing the Benchmark results
below.
- The 3Ware cards need to be configured during the boot sequence, so we
hooked up a keyboard and monitor during configuration. The kernel driver for
the 3Ware controller cards comes standard with Red Hat 7.1. Load the 3Ware
module at boot time by adding this line to the file /etc/rc.d/rc.sysinit, on the
line after 'swapon':
/sbin/insmod /lib/modules/2.4.9-12smp/scsi/3w-xxx.o
You could alternatively tailor a kernel to include the module. If the 3Ware
support is running properly, the devices you configured on the controllers
should show up as SCSI devices listed in /proc/scsi/scsi. 3Ware distributes a
daemon, 3dmd, to check up on the controllers and report any problems, but the
version available did not run with recent kernels.
- Construct an /etc/raidtab file for your Linux software
RAID configuration. Implement the configuration like so:
mkraid /dev/md0
Check results by looking in the /proc/mdstat file.
- Build the filesystem. We wanted a journaling filesystem, because waiting
for a 2 TB volume to fsck is no fun at all. We used Reiserfs because it's
already integrated into the Linux kernel and the tools come with Red Hat Linux.
mkreiserfs /dev/md0
- Put an entry in the /etc/fstab file:
/dev/md0 /RAID reiserfs noauto 0 0
(the noauto flag was just for testing)
- Construct the mount point and mount your new RAID volume:
mkdir /RAID
mount /RAID
df -k /RAID
Filesystem            1k-blocks       Used  Available Use% Mounted on
/dev/md0             1957031468      32840 1956998628   1% /RAID
- Export the space via NFS, if that's your pleasure. Make sure NFS is
installed and configured, and put an entry in /etc/exports.
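The /etc/raidtab file itself is not reproduced here. As a rough sketch only, the shell function below prints a raidtab in the raidtools format covered by the Software-RAID HOWTO, for the layout marked as final under Benchmarks (software RAID5 over 12 devices: 11 active plus 1 spare, 64 KB chunks). The device names /dev/sda1 through /dev/sdl1 are assumptions, as are the persistent-superblock and parity-algorithm settings; check /proc/scsi/scsi for the names on your own system.

```shell
#!/bin/sh
# Sketch: print an /etc/raidtab for software RAID5 over twelve devices
# (11 active + 1 spare). Device names are hypothetical placeholders.
gen_raidtab() {
    cat <<'EOF'
raiddev /dev/md0
    raid-level              5
    nr-raid-disks           11
    nr-spare-disks          1
    persistent-superblock   1
    parity-algorithm        left-symmetric
    chunk-size              64
EOF
    i=0
    for letter in a b c d e f g h i j k; do
        printf '    device                  /dev/sd%s1\n' "$letter"
        printf '    raid-disk               %d\n' "$i"
        i=$((i + 1))
    done
    # The twelfth device is held back as the hot spare.
    printf '    device                  /dev/sdl1\n'
    printf '    spare-disk              0\n'
}

gen_raidtab
```

Review the output, redirect it to /etc/raidtab, and only then run mkraid.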
Capacity
Each server contains 24x 100 GB disks for a total raw RAID space of 2.4 TB.
After RAID configuration, formatting and assignment of spares, the usable space
is 1.95 TB. Current Linux kernels have a hard limit of 2 TB per filesystem, so
there was no point in trying to squeeze any extra space out of it.
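The capacity figures can be cross-checked with a little shell arithmetic. This sketch assumes the final layout described under Benchmarks (12 two-disk RAID0 pairs, with software RAID5 over 11 of them and 1 held as a spare) and uses the 1k-blocks value from the df output in the configuration steps; the variable names are illustrative.

```shell
#!/bin/sh
# Sketch: nominal usable capacity of the final layout, in decimal GB.
disks=24                          # drives per server
disk_gb=100                       # nominal size of each drive
pair_gb=$((2 * disk_gb))          # each RAID0 pair
pairs=$((disks / 2))              # 12 striped pairs
spares=1                          # one pair held as hot spare
parity=1                          # RAID5 costs one device's worth of parity
data_pairs=$((pairs - spares - parity))
usable_gb=$((data_pairs * pair_gb))
echo "nominal usable: ${usable_gb} GB"

# Convert the 1k-blocks value reported by df to bytes and decimal GB.
df_kblocks=1957031468
df_bytes=$((df_kblocks * 1024))
df_gb=$((df_bytes / 1000000000))
echo "df reports: ${df_bytes} bytes (about ${df_gb} GB)"
```

Both figures land at roughly 2000 GB, which is what ten data pairs of 100 GB drives should give. The 1.95 TB quoted above appears to correspond to reading df's 1.957 billion 1k-blocks directly.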
Benchmarks
I found a Linux RPM version of Bonnie. Here are the numbers, with some other
systems included for comparison:
               ---Sequential Output (nosync)---    ---Sequential Input--   --Rnd Seek-
               -Per Char-  --Block---  -Rewrite--  -Per Char-  --Block---  --04k (03)-
Machine     MB  K/sec %CPU  K/sec %CPU  K/sec %CPU  K/sec %CPU  K/sec %CPU   /sec %CPU
alpha1  1*4096  19303 49.2  31346 17.9  17858 12.1  33897 78.9  48043 18.2  241.2  2.5
alpha2  1*4096  26601 67.4  65757 38.6  41512 29.5  37615 86.5 106638 40.7  514.1  6.1
bigrd1  1*2047  17510  100  32501 41.9  10908  8.0  10430 63.5  27525 14.3  116.6  1.0
bigrd2  1*2047  14475 87.9  27616 20.9  21327 12.4  14566 90.1 125586 57.3  332.8  3.2
bigrd3  1*2047  15737 97.9 103108 93.2  81002 67.4  15455 97.6 151118 84.7  353.1  6.3 ****
- alpha1: 72 GB, 10K RPM SCSI disk on Compaq XP1000 Tru64 system
- alpha2: 1+ TB turnkey Ultra160 SCSI RAID5 (8x 7200 RPM disks) system on
Compaq XP1000 Tru64 system
- bigrd1: Big Red server with single 10 GB ATA disk
- bigrd2: Big Red server with RAID5 on each 3Ware card (7 RAID disks, 1
spare) with software RAID0 over the 3 controllers.
- bigrd3: Big Red server; each 3Ware controller running RAID0 striping over 4
pairs of disks and software RAID5 over the resulting 12 devices (11 RAID5
devices, 1 spare)
- chunk size optimised to 64 KB, both on the 3Ware card and in software.
- **** final configuration
- the Alpha/Tru64 machine had >1 GB of RAM, while the Linux box had only 0.5
GB; that's why the file size used for Bonnie testing (the MB column) varies.
2047 is also the maximum size with the Linux version of Bonnie used.
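To put the two Big Red RAID5 layouts side by side, the following sketch divides the bigrd3 (software RAID5) figures by the bigrd2 (3Ware RAID5) figures, using the block K/sec values from the table above. The function name is illustrative, and awk is used only for the floating-point division.

```shell
#!/bin/sh
# Sketch: ratio of bigrd3 (software RAID5) to bigrd2 (3Ware RAID5)
# throughput, using the K/sec figures from the Bonnie table.
ratio() {
    # $1 = bigrd3 K/sec, $2 = bigrd2 K/sec
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f", a / b }'
}

write_ratio=$(ratio 103108 27616)     # sequential block output
rewrite_ratio=$(ratio 81002 21327)    # sequential rewrite
read_ratio=$(ratio 151118 125586)     # sequential block input

echo "block write: ${write_ratio}x"
echo "rewrite:     ${rewrite_ratio}x"
echo "block read:  ${read_ratio}x"
```

Block write and rewrite come out a little under 4x in favor of software RAID5, while block reads differ by only about 1.2x.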
Commentary
Why we did what we did, and what we would do differently:
- Motherboard: We chose the Tyan ThunderK7 because we like Athlons. At the
time this was the only dual Athlon board available. We wanted dual CPUs for
handling all the I/O requests and doing the RAID5 parity calculations. The
five 64 bit/33 MHz PCI slots are great for getting the most out of the disk and
ethernet controllers. The on-board video saved us an extra card during
configuration. On the down side, the size of the motherboard caused us some
grief by extending into the disk bays. We had considered the possibility of
serving the RAID space over one of the on-board SCSI ports, but have been
unable to find out anything about daemons to accomplish this. Therefore the
dual on-board SCSI is going to waste, unless we tack on a tape drive at some
point. The on-board dual 10/100 ethernet is also a bit of a waste considering
we added a gigabit ethernet card anyway.
- The 3Ware 7810 storage switches have the most ATA disks per PCI slot we have
seen. The design would have had to change considerably without these. The 64
bit/33 MHz PCI interface and ATA100 support go nicely with our other parts. The
3Ware "we're discontinuing these, no we're not" act was a bit disconcerting.
There was also a recall on the 7810s; we never did hear from our distributor
about that and we hope it doesn't affect our configuration. The RAID5 write
performance of the 3Ware cards is also disappointing, but we're doing just fine
with running our RAID5 in Linux. Drivers for the 3Ware controllers come
standard with Red Hat Linux, but the 3dmd daemon doesn't seem to work with our
kernel version.
- Disks: We bought the highest capacity ATA disks available at the time.
Obviously the technology keeps improving; today we could get either a higher
capacity or higher spindle speed. The 536DX drives are quiet and cool, which
is good. During configuration of the four servers (96 disks) we encountered
only one bad disk. No complaints.
- Enclosure: lots of disks per box. We could have fit the disks in a smaller
box with 3in2 or 5in3 adaptors, but then heat might be an issue. The systems
seem plenty cool with the standard 4 fans plus a fan in each power supply.
Also, the thought of fitting all those ATA cables in a smaller space gives me
the willies. We moved the motherboard from the left side of the cabinet to
the right in order to make more room for all those ATA cables. Say, why
doesn't anyone sell chassis with lots of 3.5" bays instead of 5.25" bays?
- Cables: The 3Ware cards come with a full complement of standard 18" ATA100
single drive cables, but in a box this size about half the disks need something
longer, which is why we ordered the 3Ware 36" single drive cables. All parts
were purchased before any assembly took place, so we went conservative and
ordered too many. Make us an offer! The cabling is a nightmare; we can't wait
for serial ATA.
- Performance: The Bonnie test results above lead to a number of conclusions:
- The 24 disks in parallel make for some excellent streaming performance.
- Random seek performance does not match the SCSI RAID competition. Using
disks faster than 5400 RPM might give some improvement here, but speed was not
our primary design goal anyway.
- Running RAID5 on the 3Ware cards slowed down sequential block write and
rewrite performance, but did not greatly affect input performance. RAID5 in
software was faster in every category and about 4x faster in sequential block
write and rewrite.
- Reliability: Hopefully the RAID5 configuration will work for us. We could
have gone with a two-tier RAID5 configuration for extra reliability, or even
RAID10 for bullet-proof status. Another possibility would be to break the space
into several smaller volumes (probably a more experienced and sane person would
have done this).
Personnel
Planning and construction were carried out by Dave Schuller and Bill Miller of
MacCHESS.