Bringing your data home
(AKA The rsync fanboy page)

In which the author conveys a few facts and spouts a lot of opinions on methods for getting your data home from MacCHESS.
Last update 2008-03

Modern equipment at high-powered synchrotrons can generate huge amounts of data. With 1-second exposure times, our Quantum 210 (A1, F2) and Quantum 270 (F1) detectors can stream 11 frames/minute in binned mode (8 MB files). In unbinned mode (32 MB files), a set of 180 images can take up over 5.7 GB. If you want to save the raw (*.imx_0, pre-dark-current-correction) files, you can double that number. (I personally don't recommend this, but some people do it and they have their reasons.) Many of our users end up with 50-100 GB of data in a few days. You probably want to take the images home with you in case there are uncertainties about the indexing or you want to tweak the data processing. There are a variety of methods for getting your data home.
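As a rough check on those numbers, here is a small shell sketch of the arithmetic. It uses the frame count and file size quoted above and assumes 1 GB = 1000 MB; the variable names are only for illustration.

```shell
frames=180            # images in one data set (figure from the text)
mb_per_frame=32       # unbinned file size in MB (figure from the text)

total_mb=$((frames * mb_per_frame))
with_raw_mb=$((total_mb * 2))     # keeping the raw *.imx_0 files doubles it

echo "corrected images only: ${total_mb} MB"
echo "with raw files too:    ${with_raw_mb} MB"
```

Plug in your own frame counts to see how much disk space to bring.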


Bring your own computer

By bringing your own computer, presumably a laptop, to MacCHESS, you can exercise a lot of control over how you store your data. Recent laptops come with bigger and bigger disks, and many have FireWire and USB2.0 connections suitable for external disks.

See the section below on network data transfer for suggestions on copying the data to your own computer.


Portable disks

This seems to be the up-and-coming method for data transport. It is great for you for a number of reasons:


It's great for us too; PCI cards for USB2.0 and FireWire are a lot cheaper than tape drives.

MacCHESS can handle a variety of device interfaces and filesystems, especially since the introduction of Linux boxes at each of our beamlines:

My recommendations:


Here's a Unix/Linux TAR command which will copy an entire directory from one location to another while preserving file modification times:
tar cBf - olddir | (cd newdir && tar xBf -)
Here's a comparable command using RSYNC which has some notable advantages:
rsync -av olddir newdir
See the network section for more details on the RSYNC command.
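Whichever command you use, it may be worth confirming that the copy matches the original before you leave. The following is a minimal sketch on throwaway directories; the file names are made up, and the B flag from the TAR command above is left out here (my assumption, to keep the sketch portable across tar versions).

```shell
# Build a tiny stand-in for a data directory in a temporary location
work=$(mktemp -d)
mkdir -p "$work/olddir/sub" "$work/newdir"
echo "frame data" > "$work/olddir/sub/xtal1_001.img"

# Same pipeline as above: the copy ends up in newdir/olddir
(cd "$work" && tar cf - olddir | (cd newdir && tar xf -))

# diff -r walks both trees and is silent when they match
if diff -r "$work/olddir" "$work/newdir/olddir" > /dev/null; then
    copy_ok=yes
else
    copy_ok=no
fi
echo "copy verified: $copy_ok"
rm -rf "$work"
```

On a real data set the diff pass reads every file twice, so allow time for it.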

A word of caution: many FireWire and USB2.0 disks are sold as 'external' devices rather than 'portable', and may not be sufficiently ruggedized for portable use. Since you're going to use them anyway, you can do a few things to help assure the reliability of your data:


Network transfer

Some groups have taken to transferring their data home via the Internet. Your contentment with this will probably depend on the speed of connections between Cornell and your home lab. CHESS has an internal gigabit ethernet network, and our connection to the Cornell backbone is also gigabit, but then there are all those connections between here and your home lab, which may vary considerably in speed. The transfer takes longer than you think, so don't put it off until the last hour or two.
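To get a feel for how long a transfer might take, here is a back-of-the-envelope sketch. The 10 MB/s sustained rate is purely an assumption for illustration; measure your own end-to-end rate with a test file before relying on any estimate.

```shell
data_gb=100           # upper end of the 50-100 GB range mentioned earlier
rate_mb_per_s=10      # ASSUMPTION: sustained end-to-end rate, not a measured value

seconds=$((data_gb * 1000 / rate_mb_per_s))   # taking 1 GB = 1000 MB
hours=$((seconds / 3600))
minutes=$(((seconds % 3600) / 60))

echo "about ${hours} h ${minutes} min"
```

At that assumed rate, 100 GB needs most of three hours, which is why starting early matters.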

Cornell is a member of the National LambdaRail (NLR) consortium, which operates a nationwide high-speed optical network. Cornell provides NLR connections to several New York and New England institutions through NorthEast LambdaRail (NeLR).

CHESS Firewall
CHESS uses a firewall for network security. For connections from CHESS, any network protocol is OK. Consider requirements on the other end of your connection, especially if a firewall is in use there as well.
For connection to CHESS, only the ssh protocol is allowed in, and only from certain addresses. If you need to initiate connections from outside the CHESS firewall, contact your MacCHESS staff scientist so the appropriate arrangements can be made.

RSYNC
RSYNC is a Unix/Linux utility for backing up directory systems. It is well thought out and has a number of advantages over other options. If rsync is available to you, I recommend using it.

Here's an example script with options I would recommend, to be run on the MacCHESS machine:
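#!/bin/csh -f
# -a: archive mode (recursive, keeps times and permissions); -v: verbose; -z: compress in transit.
# --delete removes files on the destination that no longer exist in the source.
# No trailing slash on the source, so the files land in macc/xtal1/xtal1 on the remote host.
rsync -avz --delete --exclude "*.imx_0" /A1a/jones/xtal1 jones@myhost.mynet.edu:macc/xtal1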
Alternatively, you can run RSYNC from your own host. This will be preferable if your host uses DHCP and therefore does not have a static IP address. If either the source or the target disk is reached over a network link, RSYNC will prompt for the remote account password.
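A pull from your own host might look like the sketch below. The function name and the host name in the example call are hypothetical placeholders, not real MacCHESS names; inbound ssh has to be arranged with your staff scientist first, as described in the firewall section.

```shell
# Hypothetical helper: pull a remote directory to a local one, skipping raw files.
#   $1 = remote source, e.g. user@host:/path    $2 = local destination directory
pull_from_macchess() {
    rsync -avz --exclude "*.imx_0" "$1" "$2"
}

# Example call, left commented out because the host name is a placeholder:
# pull_from_macchess jones@macchess-host.example.edu:/A1a/jones/xtal1 "$HOME/macc/"
```

I have left out --delete here so that a mistyped path cannot remove anything on your own disk.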
The FAT filesystem is limited in several respects: file timestamps have low resolution (modification times are stored in 2-second steps), and it is not possible to set owner and group ID. Choosing different options on the rsync command can forestall warning messages. This script worked well on Macintosh and Linux computers with a FAT32 disk:
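#!/bin/csh -f
# -rltv instead of -av: skip permissions, owner and group, which FAT cannot store.
# --modify-window=1 treats timestamps that differ by up to 1 second as equal.
rsync -rltv --modify-window=1 --exclude "*.imx_0" /A1a/smith /media/disk1/chess_040812/.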

The FAT filesystem has another limitation: apparently it has trouble differentiating between upper and lower case. (The FAT filesystem dates back to the days of DOS.) All characters in file names will be converted to upper case on the FAT filesystem (and translated back to lower case when read by Linux). This means that rsync may not recognize your files as being the same on both disks. It might then proceed to delete all affected files on the FAT disk and recopy them. This defeats the benefit of incremental backup that makes rsync so great.
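One way to see in advance which files might be caught by this is to list the names that contain upper-case letters. This is only a sketch: the function name and the demo file names are made up, and whether a given name is actually altered depends on how the FAT disk is mounted.

```shell
# Read file names on stdin and keep only those containing an upper-case letter
names_with_uppercase() {
    grep '[[:upper:]]'
}

# Demo on a made-up list; for a real tree, try:
#   (cd /A1a/smith && find . -type f) | names_with_uppercase
affected=$(printf '%s\n' 'xtal1_001.img' 'Xtal2_001.img' 'LYSO_001.img' | names_with_uppercase)
affected_count=$(printf '%s\n' "$affected" | grep -c .)

echo "$affected"
echo "files that may be recopied: $affected_count"
```

Sticking to lower-case file names during data collection sidesteps the problem entirely.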
Here's a repeat of a typical rsync command to copy files from one place to another on the same computer. You could do this if the data disk was mounted via NFS:
rsync -av olddir newdir

!! CAUTION !! In the rsync command, the source comes first and the destination last. Do not confuse the two and wipe out your files before they have been backed up. We cannot recover lost files. Our RAID systems use the XFS file system.
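If you are unsure which way round a command will work, rsync's -n (--dry-run) option lets you rehearse it first. The sketch below uses throwaway directories and made-up file names; note that the trailing slashes here mean "the contents of" each directory.

```shell
# Throwaway source and backup directories with one file each
work=$(mktemp -d)
mkdir -p "$work/source" "$work/backup"
echo "new frame" > "$work/source/xtal1_001.img"
echo "old frame" > "$work/backup/xtal1_000.img"

# -n (--dry-run) reports what would be copied and deleted but changes nothing
rsync -avn --delete "$work/source/" "$work/backup/" \
    || echo "dry run did not complete (is rsync installed?)"

# The backup directory is untouched after the dry run
remaining=$(ls "$work/backup")
echo "still in backup: $remaining"
rm -rf "$work"
```

Read the dry-run listing carefully, especially any "deleting" lines, and only then repeat the command without -n.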

FTP
FTP is available on most platforms, even Windows.

SCP
SCP is a file transfer program which uses the SSH network protocol. It is widely available on Unix/Linux computers, and ports exist for Windows as well (e.g. WinSCP). SCP does not have the sophisticated features of RSYNC, so investigate that option first.


Tape

The section on tape backup has been moved to a separate page to shorten this one. With the retirement of our Alpha workstations, we have fewer tape drives plugged in, so if you want to use tape, please make a request in advance.

Optical

MacCHESS has quite a few DVD burners on Macintosh and Linux machines. A single-sided single-layer DVD holds less than 5 GB, so I have trouble taking optical media seriously for synchrotron backup. K3b is a popular DVD-burning front-end application on the Linux machines. Bring your own media or get directions to the local shops; our stockroom does not carry them. If you encounter any difficulties in burning DVDs, consult your MacCHESS staff scientist.
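To put a number on that, here is a quick sketch of how many discs a large visit would need. It assumes the nominal 4.7 GB capacity of a single-layer DVD and 1 GB = 1000 MB, and ignores the fact that files cannot be split neatly across discs.

```shell
data_mb=100000        # 100 GB of data
dvd_mb=4700           # nominal single-layer DVD capacity (4.7 GB)

# Integer division rounded up: a partly filled disc still counts as a disc
discs=$(( (data_mb + dvd_mb - 1) / dvd_mb ))

echo "DVDs needed: $discs"
```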

Blu-ray has finally won out over HD-DVD as the higher density optical format for pre-recorded media. Burners are still somewhat expensive and not common for data backup. If you use Blu-ray and would like to see a drive available at MacCHESS, let us know and we'll look into it.
