Image Data Storage Media for DEIMOS

Sept 14 2000

De Clarke, UCO/Lick Observatory


Executive Summary

Observers will want to take DEIMOS images home with them for reduction. The size of these images poses a serious challenge to current removable storage media technology. We are currently leaning towards SCA removable hot-swap SCSI drives as the best compromise between cost, capacity, speed, convenience, and robustness. A detailed comparison of several currently available storage technologies follows.

Introduction

DEIMOS images will run about 140MB for spectra, 70MB for direct frames. Using lossless fcompress, we may get a factor of 2 to 3 compression; it may be overoptimistic to assume anything much better than 2.

These are the largest images being produced by any Keck instrument, as well as the most complicated FITS files we have yet written.

Ever since the project's inception we have been worried about the "media problem." On what media are we going to write these images for backup/archival storage? And how is the observer going to bring the data back to the home institution for reduction?

At both reviews (PDR and CDR) we expressed an optimism that has not been substantiated in reality. We thought that new developments in optical or magneto-optical storage, using CDROM sized media, would "catch up with" our needs by the time commissioning drew near. While there are exciting things going on in this sector, they are not happening as fast as we would have liked. We're left, realistically, with a set of choices each of which is unsatisfactory.

These are some trade-offs you may want to keep in mind when considering current storage technologies. It is impossible (today) to "win" on all these counts. Most media will only look good on two or three points out of this list.
Media Cost (and are those media re-usable?)
Reader Cost (cost to observer of drive to read media at home institution)
Writer Cost (cost to project or to CARA to write media in Waimea)
Write Speed (how long to copy a run's data to media?)
Read Speed/Access (random access for disc type media, linear for tape; how long to access an image?)
Media Density (number of media needed to hold 1 run's worth of FITS image files)
Media Longevity (how long will they last physically? how long will the technology last?)
Media Convenience (size/weight/fragility)
Media Openness (how proprietary is the medium/format?)

Size of the Data Set vs Current Storage Technologies

There are two problems which we may want to address separately. One is the archiving or backup of acquired data at Keck, the process that is currently achieved using Exabyte tapes and STB software. DEIMOS images pose a challenge to this system, but this document is not primarily concerned with this side of the problem. What we are considering here is the secure transport of observed data to the observer's home institution. How does the astronomer take these very large DEIMOS datasets back home?

For conservative estimating, we'll say that an image is 140 MB, and that fcompress could reduce this to 70 MB. We could fit 100 images in 7GB of storage. If we assume a night is about 150 images -- counting all calibration exposures as well as science data -- then we could say a night is about 10GB.

The average run is probably 2 or 3 nights. We guess the astronomer will want to take home something like 20 to 30 GB of data from the run. Again to be conservative, we might call "one run" 30 GB.
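
The sizing argument above can be sketched as a few lines of arithmetic. This is only an illustration of our estimate: the function name and the choice of 1 GB = 1000 MB are assumptions made for the sketch, and the inputs are the conservative figures quoted above (150 images per night, 140 MB per image, a compression factor of 2, 3 nights per run).

```python
def run_size_gb(images_per_night, image_mb, compression, nights):
    """Return (GB per night, GB per run), taking 1 GB = 1000 MB."""
    night_gb = images_per_night * image_mb / compression / 1000.0
    return night_gb, night_gb * nights

# Conservative inputs taken from the text; the helper itself is hypothetical.
night_gb, run_gb = run_size_gb(images_per_night=150, image_mb=140,
                               compression=2, nights=3)
print(f"per night: {night_gb:.1f} GB, per run: {run_gb:.1f} GB")
```

The result lands slightly above the rounded 10 GB per night and 30 GB per run used in the rest of this document.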

Thirty gigabytes is not going to be such a depressingly large number for very much longer; there are a couple of very high-density storage technologies in development which may be of great interest to us. One (C3D) will not get to beta test for another six months or more. It is well behind the original schedule mentioned in press releases 2 years ago. Another effort (electron beam writing) seems further along in the prototyping process, but will not be commercially available until 2002. Both of these technologies promise over 100GB on a CDROM-sized plastic medium. C3D promises to be low-cost and 140GB per medium. The electron beam technology will probably be expensive (it's aimed at large governmental and corporate customers, not the consumer market) but it claims 200 GB per medium and a very stable, multi-decade, archival end product.

For our immediate needs, both these technologies are out of the question. Their only relevance is the ominous likelihood that whatever medium we select now will become obsolete and undesirable sometime in the next 2 years, because one of these technologies or a new competitor will be offering a large density improvement. In other words, we are likely to want to throw away whatever we are doing now, after only a couple of years of use. The impermanence of this situation colours our choice of technology.

We are looking, therefore, for a bandaid -- a bandaid of moderate cost, yet reasonable efficacy. If it's too expensive, we'll regret it later when the new technologies mature. But if it's too feeble, we'll waste thousands of dollars compensating for its inadequacy while we wait for something better to come along.

A Digression: Network Transfer?

One potential transport mechanism that we haven't been considering very seriously is "Internet2". We might ask whether it is possible to upload the observed data via TCP/IP to the observer's home site.

The theoretical max bandwidth we could ever get, as of today, is 35 Mb/s (the size of the pipe between Oahu and Hawai'i). But all the astronomy traffic from Big Island is on that link, so we are not going to get it all to ourselves! Let's take an optimistic scenario. Suppose we have 10 GB to move each day, and we get something close to ethernet spec: 10Mb/s (roughly 1MB/sec). At that rate, it would take 10,000 sec to transport the images, or about 167 minutes. (We'd be hogging a greedy share of the Oahu link for nearly 3 hours, but for the moment let's ignore any potential political issues.)

That is not very competitive with most removable media storage alternatives. But there's worse news: we have every reason to doubt that in practice we could get that kind of transfer rate. Why?

Without special TCP/IP driver tuning (i.e. on any stock workstation) we will never even get close to the ethernet spec of 10Mb/sec over such a long round trip. The propagation delay is about 100 ms from anywhere in the contiguous 48 states to the Big Island. This delay combines with the default 8K window size for TCP/IP to limit the actual transfer rate (to any unmodified workstation) to about 10 windows per second, or 80K/sec. To move 1 night's worth (10GB) of data at this rate would take over 1.5 days.
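
The window-limited rate described above is the window size divided by the round-trip delay, since at most one window can be in flight per round trip. A minimal sketch follows, assuming binary units (1K = 1024 bytes, 1 GB = 1024^3 bytes) and hypothetical function names. It also computes the window that would be needed to sustain roughly 1MB/sec over the same delay, which is the bandwidth-delay product.

```python
def window_limited_rate(window_bytes, rtt_seconds):
    """Upper bound on throughput with one TCP window in flight per round trip."""
    return window_bytes / rtt_seconds

def window_needed(rate_bytes_per_sec, rtt_seconds):
    """Bandwidth-delay product: window required to sustain the given rate."""
    return rate_bytes_per_sec * rtt_seconds

KB = 1024          # assumption: binary units throughout
GB = 1024 ** 3
RTT = 0.100        # 100 ms delay, as quoted in the text

stock_rate = window_limited_rate(8 * KB, RTT)    # default 8K window
stock_days = (10 * GB) / stock_rate / 86400      # one night's data, in days
needed_window = window_needed(1_000_000, RTT)    # window for about 1MB/sec

print(f"stock rate: {stock_rate / KB:.0f}K/sec")
print(f"one night (10GB): {stock_days:.2f} days")
print(f"window needed for 1MB/sec: {needed_window / KB:.0f}K")
```

The needed window comes out more than ten times larger than the stock default, which is why tuning would be required at both ends of the connection.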

Obviously, if it takes 1.5 days (under ideal conditions) to move 1 night of observing data, we will never catch up. We could try to establish specially tuned workstations at various home institutions; but the special IP tuning is an expert process involving kernel parameters, root privilege, etc. It would be lost at each OS upgrade. It seems unreasonable to expect that specially hacked "data upload" workstations will be reliably maintained at observers' home institutions.

There are other drawbacks to this method: the observer's home institution would have to make 10GB per night of disc space available for the data, or the transfer will fail; the transfer is vulnerable to routing failures, power outages, and other network mishaps along the way; the transfer must be initiated from inside Keck, which means the observer will not be able to retry it after having returned home and detected a problem.

For all these reasons we feel we can go on ignoring "network upload" as a practical method of bringing DEIMOS data home.


Physical Media Transfer (aka SneakerNet) Options

What is more usual and customary today than network transfer is for the observer to carry the data home from the mountain. In the old days this was done using 8-inch floppy discs, DECtapes, or 9-track tapes; more recently, we've used Exabytes and DAT tapes.

There is a psychological appeal to this method: the observer has a physical copy of the data in a tangible, securable form. Drawbacks are obvious: it's not always possible to verify the medium perfectly; any error in format or content is very unpleasant to discover after getting home again; media have to be transported (on airplanes), raising questions of weight, fragility, packaging, immunity to dust and dirt, etc. A major cost, in some cases, occurs when N observers at N institutions all have to buy their own expensive media readers for some new media standard imposed by the instrument.

These are some currently available media options.

Tape Drives

Exabyte: The new Exabyte Mammoth-2 claims a capacity of 60 GB per cartridge.
    Drive Price: $4000
    Write Speed: 12 MB/sec; 60GB uncompressed in 1.5 hrs; 2 runs' worth of data, no operator intervention
    Bus: SCSI
    Medium: Exatape AME; $92 for one full-length cartridge; $1.50/GB

Ecrix VXA: The Ecrix VXA-1 drive claims a capacity of 33 GB per cartridge.
    Drive Price: $1000
    Write Speed: 3MB/sec; (est) 30 GB uncompressed in 1.5 hrs; 1 run, no operator intervention
    Bus: SCSI
    Medium: Ecrix VXA cartridge; $80 each; $2.66/GB

DLT: Current DLT drives claim a capacity of 40 GB per cartridge.
    Drive Price: $2000-$4000
    Write Speed: 1.5MB/sec; 40 GB in 2 hours; 1 run, no operator intervention
    Bus: SCSI
    Medium: DLT helical-scan TK-style monohub cartridge; $65 each; $1.62/GB

Sony 8mm: Sony's latest high-density 8mm drive claims a capacity of 25 GB per cartridge.
    Drive Price: $2200
    Write Speed: 3MB/sec; over an hour to write 25GB; might not quite fit one run; operator intervention
    Bus: SCSI
    Medium: 170m AME 8mm; not sure if Exatapes would work (specs are hard to get; Sony's link is broken)
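
As a rough check on the "operator intervention" entries, the number of cartridges needed for one run follows directly from the claimed capacities. This sketch assumes the conservative 30 GB run used in this document and takes the vendors' capacity claims at face value; the dictionary and function names are hypothetical.

```python
import math

RUN_GB = 30  # conservative size of one run, as estimated in this document

# Claimed capacity per cartridge in GB, as quoted above (vendor claims)
tape_capacity_gb = {
    "Exabyte Mammoth-2": 60,
    "Ecrix VXA-1": 33,
    "DLT": 40,
    "Sony 8mm": 25,
}

def cartridges_per_run(capacity_gb, run_gb=RUN_GB):
    """Number of cartridges needed to hold one run, rounded up."""
    return math.ceil(run_gb / capacity_gb)

cartridges = {name: cartridges_per_run(cap)
              for name, cap in tape_capacity_gb.items()}

for name, count in cartridges.items():
    note = "unattended" if count == 1 else "needs a tape change"
    print(f"{name}: {count} cartridge(s), {note}")
```

Only the Sony 8mm drive needs a second cartridge for a 30 GB run under these assumptions.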

Tape Drive Summary

All the cartridge tape solutions have some characteristics in common:

BUT

Disc Drives

In this category we'll include both optical and magnetic media; they are all spinning-platter removable disc media.
Big Optical: Old-style platters claim 15-30GB.
    Drive Price: Still waiting for quotes; expensive; $5000-$10,000 guesstimate
    Write Speed: 2.7MB/sec; an hour or so to write 15GB; 2 hrs to write a run's worth; operator intervention if 15GB media
    Bus: SCSI
    Medium: 12 inch optical platter; $50 each; $3.33/GB if 15GB media

DVD: Current media claim 4.7GB/side. Only one-sided drives/media seem to be available. The format is plagued with contending standards.
    Drive Price: writer $4500-$5000; reader $250
    Write Speed: 11.08Mb/sec (similar to CDROM x1); about 1.1 MB/sec; 1 hour to write 1 4.7GB disc; 5 or 6 discs for 1 run; operator intervention
    Bus: writer SCSI; reader various: SCSI, IDE, firewire, USB, etc
    Medium: 5.25in plastic DVD; $40 each; $8.50/GB

SCA SCSI hotswap: These are standard SCSI hard drives packaged in a carrier with a handle; they can be "hot swapped" in and out of a SCSI bay (single or multiple). A reasonable capacity today would be 36GB per disc.
    Drive Price: base unit $350-$400
    Write Speed: sales claim: 40MB/sec; actual limit about 25MB/sec; 30GB in 1200sec, or 20 min; sub-10ms random access times
    Bus: SCSI (fast wide, Ultra, etc)
    Medium: 5.25in magnetic multiplatter; $400-$500 with carrier; $11/GB

PCMCIA mini: These are standard 2.5in IDE half-height disc assemblies packaged in compact portable units with attached PCMCIA card. They can be read on any laptop having a PCMCIA port and configured with a recent Linux or Win/NT release. Today's capacities stop around 18GB.
    Drive Price: laptop with PCMCIA slot; wide price range
    Write Speed: 16.6MB/sec; 12ms random access times
    Bus: PCMCIA
    Medium: 2.5in IDE "wrapped" in PCMCIA adaptor; handmade: 18GB drive $460, adapter $100, total $560; prepackaged with flip-card: $700; $31/GB to $38/GB
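
The cost-per-GB and write-time figures quoted above can be reproduced, to within rounding, from the capacities, prices, and write speeds. The sketch below assumes 1 GB = 1000 MB and uses the low end of each quoted price range; the table layout and function names are hypothetical.

```python
# (capacity in GB, medium price in USD, write speed in MB/sec), as quoted above
discs = {
    "Big optical (15GB media)": (15, 50, 2.7),
    "DVD": (4.7, 40, 1.1),
    "SCA SCSI hotswap": (36, 400, 25),
    "PCMCIA mini (handmade)": (18, 560, 16.6),
}

def cost_per_gb(price_usd, capacity_gb):
    """Media cost in dollars per GB."""
    return price_usd / capacity_gb

def write_minutes(size_gb, speed_mb_per_sec):
    """Minutes to write size_gb at a sustained rate, taking 1 GB = 1000 MB."""
    return size_gb * 1000 / speed_mb_per_sec / 60

RUN_GB = 30  # conservative size of one run

for name, (capacity, price, speed) in discs.items():
    # Time to fill one medium, or to write one run if the medium is larger
    size = min(capacity, RUN_GB)
    print(f"{name}: ${cost_per_gb(price, capacity):.2f}/GB, "
          f"{write_minutes(size, speed):.0f} min to write {size}GB")
```

The write times assume the quoted rate is sustained for the whole copy, which is optimistic for any of these devices.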

Disc Summary

The big optical laserdisc style platters score very badly on portability and drive cost; there is also the fear that they are already obsolete or very close to obsolescence. They seem to be very proprietary technology; each vendor has a format, media standard, etc. of its own.

DVD-R is unique in that the investment required for *reading* is very low, so the cost to N observers' home sites to read DVDs is very small compared to most of our other solutions. However, it's not dense enough, and incredibly slow to write (slower than most tape drives). Worse, the standards wars are still raging. On the positive side, DVD media are small, light, and tough; they resist impacts and fingerprints quite well.

With either DVD or the large optical discs, the question of standards and format longevity is a vexed one. The DVD format wars may be won soon by some format other than DVD-R. Or another technology, as mentioned above, may overtake both media and cast them into the pit of obsolescence almost overnight. We would then feel we had wasted any investment made in them.

SCSI discs, though they may be considered "small" in capacity after a year or two, can continue to be useful. Most sites have at least one or two unix workstations with SCSI capability, so an external SCA "cage" with one or more bays is a small incremental investment -- not quite as cheap as a DVD reader, but far cheaper than most tape drives. The removable drives themselves may be seen as overpriced or undersized after a couple of years, but they will remain compatible with large numbers of workstations.

The SCA discs are in the borderland of "convenience" -- they are a little too heavy, fragile, and large to be really "portable technology" by today's standards; however, they can be fitted into one corner of a suitcase, or a backpack.

The winner for pure portability would be the mini IDE discs, which literally fit in a shirt pocket (and are packaged specially for this kind of transport). Even the 5-6 DVDs needed to hold a run's-worth of data can be transported fairly easily if stored in tyvek sleeves rather than jewelboxes.

The SCA plug/play discs are heavier and larger than any other single medium, but their capacity is very large and they win hands-down on read and write speed and ease of access for data reduction. Also, they are the least proprietary or "exotic" of the technologies listed here.


Conclusions

At present we are inclined to favour the SCA SCSI removable disc drives. The cost to acquire multiple writers is small. Indeed the "bay" or "cage" for these devices is the cheapest *writer* of the collection. The cost per medium is high, but not much higher than DVD, and because a single medium will hold a run's-worth of data, there is no labour involved in babysitting the copying of data, switching media, etc. The medium can be reused -- hopefully *after* archival copying is done to some slower (probably tape) medium at the home site.

The SCA discs likewise do not require an expensive *reader*. And if we were to abandon this format, the discs can be taken out of their carriers and mounted internally in PCs or workstations for ordinary use; very little investment is lost if we change media, because these discs are more or less generic and can be recycled into other applications.

The winner for sheer convenience would be the PCMCIA-based mini-IDE discs, if it were not for the very high cost and not quite enough capacity to count on getting a whole run onto one disc. Being able to take one's data home in a shirt pocket is very attractive, but possibly needing to buy two of a $500-700 medium is nasty.

There are some concerns about transporting the SCA style discs in a suitcase, through airports, etc. One possible solution would be to FedEx them from Waimea to the home institution; another would be the adoption of a standard ATA-approved carrier.

Now, if we return briefly to the "other problem" of backup/archival storage, we have to admit that disc-based solutions beg the question of CARA's need to make nightly backups of observed data. Obviously CARA cannot go on stockpiling $400 discs forever.

The archive/backup problem is not really DEIMOS-specific and can't be addressed by any one instrument project. However, the size of our images does exacerbate the problem considerably, so we should make some kind of suggestion or recommendation for dealing with it.

Our recommendation today would probably have to be the Exabyte Mammoth, due to its high density, decent speed (for a tape drive) and excellent reviews in the trade press. Should DEIMOS present CARA with a Mammoth drive, to aid in the backup of large DEIMOS images? We should discuss this along with the other issues raised above.