9.7: Additional Resources Needed

This section describes resources (software, hardware, and human) required in addition to the inheritable or pre-existing resources described in Chapter 9.6.


Software Resources

There are two software cost elements: the cost of acquiring licensed software, and the cost of developing and integrating custom software. The development costs are shown in Section 12 (unified schedule and budget). Here we will consider only the purchase price of commercial software.

The only major commercial component of our specification is one RDBMS server license. RDBMS licenses are priced according to a "number of concurrent users" rubric, and in our case I would say this number can be kept small. A license on the order of 8-16 users would probably meet our foreseeable needs. A Sybase license of this size costs UC sites about $6K at this time.

Obviously, whichever engine is chosen must run on one of the two standard Unix platforms chosen for the DEIMOS project: a SparcStation running Solaris or a DEC Alpha running OSF. Both Oracle and Sybase support these operating systems. (See below for the recommended hardware configuration.)

I am not including any cost for commercial software development tools. Although there are many (expensive) GUI tools available for schema design and implementation, to date I have found them counterproductive rather than helpful for the experienced RDBMS hacker. I don't feel that any additional commercial software is required to complete the project. While there are certain costs associated with avoiding commercial software, my experience over the last decade has indicated that in many cases, the costs associated with adopting commercial software are far higher.

I therefore assume about $6000 as the cost of acquiring additional commercial software for the information management component. This assumes that we need to build a Sybase server for the Keck-II computer room, to support DEIMOS, and that the "offsite server" which offers the public archive, etc. is one of the existing UCO/Lick servers. If we assume that the "archival" server described in the text is not at UCO/Lick, then an additional server license is required for that machine, raising the total software acquisition cost to $12000.


Hardware Resources

Assuming for the moment that we decide to use a Sybase RDBMS (for various reasons of convenience; see Section 9.6), we know enough about Sybase to make some useful predictions about performance and platform requirements for the database engine. Given that the RDBMS is an essential logging and archival tool, it wants to run on a host which is secure and relatively immune to radical load average shifts, reboots, crashes, etc. A non-login machine not directly connected to any experimental or custom hardware would be a good choice.

The host which supports the database server should be located in the Keck-II computer room with the rest of the machines which directly support the observing process. The database server will be used to store (log) operational data during the night, as well as provide information for the observer or for the rest of the observing software. It should be considered an integral part of the observing software/hardware, and co-located with its peer machines which perform instrument, telescope, and dome control.
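To make the logging role concrete, a minimal event/condition log table might look something like the sketch below. This is purely illustrative: the table name, column names, and sizes are assumptions, not part of the specification.

    -- Illustrative Sybase (Transact-SQL) sketch of an operational event log.
    -- All names and column sizes here are placeholders, not a schema proposal.
    create table event_log
    (
        event_time  datetime     not null,  -- when the event was recorded
        subsystem   varchar(30)  not null,  -- originating subsystem, e.g. "detector"
        severity    tinyint      not null,  -- 0 = info ... 3 = fault
        message     varchar(255) not null   -- free-text description or telemetry value
    )
    go

    -- A clustered index on time keeps nightly inserts and time-range queries cheap.
    create clustered index event_time_idx on event_log (event_time)
    go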

Configuration

We must obviously choose a standard platform, which for this project means either a Sparc running Solaris or a DEC Alpha running OSF. We should then estimate the class of machine (memory, disk, speed) required, to get an idea of the cost.

64MB of main memory is a good median configuration for a Sybase server. Data space is reconfigurable after server installation and startup, but we would want to start with a reasonable disk configuration (enough space for at least a year's worth of operation). If we wish to do volume mirroring, then we need to duplicate the partitions we choose to mirror on another spindle. For example, one of my Sybase servers has a 500MB data partition on a 2GB drive; this partition is mirrored to a 500MB drive on the same machine. The rest of the data on the 2GB drive are either non-Sybase or non-mirrored. If we choose not to mirror any partitions, then only one spindle is really required (though performance improvements could be realized by using multiple smaller spindles).
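For concreteness, a device layout of this kind would be declared to a Sybase server roughly as follows. This is a sketch only: the device name and raw-partition paths are placeholders, and the size argument is counted in 2KB pages as Sybase expects.

    -- Create a 500MB data device on a partition of the 2GB drive
    -- (device name and physical paths are placeholders; 256000 x 2KB pages = 500MB).
    disk init
        name     = "deimos_data",
        physname = "/dev/rdsk/c0t1d0s4",
        vdevno   = 2,
        size     = 256000
    go

    -- Mirror that device onto the dedicated 500MB drive on a second spindle.
    disk mirror
        name   = "deimos_data",
        mirror = "/dev/rdsk/c1t0d0s2"
    go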

One Server or Two?

We should decide whether there is just one Sybase server which handles logging, interactions with the observing software, and also random queries to the public portion of the archive, or whether there are two: a "production" server handling public queries from a downloaded copy of the data, and a "critical" server at the telescope which is protected from outside access altogether. I strongly recommend a 2-server model, mostly because it is easier to ensure security, good performance, and uptime if no random outside connections are permitted to the critical machine.

A 2-server model also ensures a working backup of the data, and the second server should preferably not be at the same site. For example, a Sybase server at UCO/Lick might offer the public portion of the data archive via WWW pages, getting fresh data daily from the private server on Mauna Kea.
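One plausible way to implement the daily refresh is Sybase's own dump/load mechanism, sketched below with placeholder database and file names; the dump file would have to be shipped between sites (e.g. by ftp) before the load step.

    -- On the critical server (Mauna Kea): dump the database to filesystem space.
    dump database deimos to "/dumps/deimos_nightly.dmp"
    go

    -- On the public server (UCO/Lick), after the dump file has been transferred:
    load database deimos_public from "/dumps/deimos_nightly.dmp"
    go

On newer Sybase releases an "online database" step follows the load; either way, the whole cycle is simple enough to be driven by a nightly cron job.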

Backups

Local backups of the archived data must also be made, in a format which permits fast recovery of the critical server should it suffer catastrophic media failure. Tools exist which make complete ASCII, human-readable backups of Sybase databases, from which an entirely destroyed server can be recovered in a matter of a few hours. The backup files go to normal filesystem space, whence they can be backed up again to tape media. Both the critical server and the production server should be backed up at their respective sites. Backups should be done to local hard disk if possible (which might entail an increase in the disk configuration as the datastore grows).

Disk Space: Goals and Requirements

We should consider our hardware requirements in the light of our three major goals: to preserve the slitmask library, the operational (logged) record, and the acquired images with their header data.

The slitmask library and operational (logged) data represent only a modest problem of volume and accumulation. The image header data likewise do not represent a real challenge in terms of storage space. It is the images themselves (as discussed in Appendix D (9.10.D)), not stored in the database, which pose the real problem of storage space and access time. The data actually stored in the RDBMS represent only a few hundred megabytes per year. Some maintenance and re-indexing may be needed to ensure rapid access to the data as the accumulated record grows, but these tasks can be at least partially automated.
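A back-of-envelope check (every figure below is an illustrative assumption, not a measurement) is consistent with this estimate:

    ~150 nights/yr x ~2,500 rows/night (headers, log events, telemetry samples)
                                             ~=  375,000 rows per year
    375,000 rows/yr x ~0.5 KB/row            ~=  190 MB per year of raw table data,
                                                 i.e. a few hundred MB/yr with indexes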

Given that these tables are unlikely to exceed a mega-record in a couple of years, I don't see a call for sophisticated multiprocessor architecture or other expensive high-performance CPU power. SCSI-II disk speed would help to improve response time, but otherwise no state-of-the-art or specialized hardware is required for this fairly basic application.

Location

The database server hardware should be situated in the computer room with the rest of the telescope control computers, not at the end of a long and vulnerable link to some other site. Failure of network connectivity to the database engine could have a perceptible impact on the observer, forcing manual processes which could slow down observing. Failure of network connectivity could result in lost log information as well, damaging the historical record which we are trying to preserve. The database server wants to be on the local network with the rest of the instrument and telescope control equipment.


In summary, no particularly "heavy" hardware must be acquired to support the RDBMS portion of the software specification. An older sparc2 with 64MB memory and 2GB dedicated data disk would probably meet our needs for at least the first 2 years of operation. This machine could also serve a second purpose, if that second purpose did not compromise security and/or uptime. We would be wise, of course, to overspecify slightly (4GB of disk and a sparc5, for example).

Approximate Hardware Costs and Alternatives

Suppose we purchased a sparc5 ($6000) and a 4GB disk ($2000) for this component; the hardware acquisition cost would be approximately $8000. This would provide a platform adequate to the tasks we have assigned to our database server (the Sybase license, at $6000, increases the total "server platform" price tag to $14000).

If we were to "make do and mend" by using the existing Sybase license on Mauna Kea, building a server out of miscellaneous used parts, etc., we could probably reduce this cost to no more than the price of the disk drive; however, we'd have to examine what other functions were required of the existing Sybase server and whether those requirements conflicted with the restrictions recommended above. It might not be practical to relocate the "Remedy" server, or other considerations might prohibit this "penny pinching" strategy.

An economical suggestion:
It is possible that functions could be combined so that the database server host was also the designated host for some other low-level integral DEIMOS function. This function would have to be non-login, and involve no unpredictable and/or sudden load changes or interruptions of uptime. It should also not consume so much memory as to compete heavily with the database engine and drive the host into swapping. If such functions can be identified, sharing the hardware would be practical and even desirable.

A more luxurious suggestion:
It's been suggested (D. Koo) that two hosts should be constructed: one primarily a database server platform which can, in an emergency, take over some other basic machine control function; the other primarily a machine control or other low-level service host which can also function as a database server. This would provide a rapid recovery path should either host suffer hardware failure; however, it involves doubled hardware costs and some maintenance overhead.

If we assume that the "public" data server is not at UCO/Lick, then we might have to build an additional server for the public archive. In that case it might be wise to construct a twin of the DEIMOS database engine, for an additional $14000. However, given the existence and present underutilization of the Lick science database server, it seems reasonable to assume that for some fairly lengthy initial period the public archive could be managed and served from that machine. The additional cost then would be $2000 or so for additional disk space for the Lick server, rather than $14000 for an entire system.

If we assume that a sizable archive of acquired images is to be offered to the public, there are hardware costs associated with the jukebox system needed to manage the extensive CDROM library (see Appendix D (9.10.D)). The approximate cost today of a 500-disc jukebox is on the order of $15000. However, the image library will not achieve this size immediately, and smaller jukeboxes in the $6000 range could probably be used initially.


Human Resources

The specifications and requirements listed above for hardware, software, and functionality imply further requirements for, or impacts upon, staff time and policy. This document will attempt to outline reasonable limits or guidelines for these kinds of costs.

The specifications call for

  1. one database server in the computer room (primary)
  2. one database server off the Mountain (secondary)
  3. one WWW server off the Mountain (talks to #2)
  4. a fairly large number of meta-data items which require human key entry (cannot be captured via telemetry) -- but update is very infrequent for these items
  5. an extensive event and condition logging system which might be applicable to Keck-II more generally than just for DEIMOS support
  6. date-sensitive access control to acquired data
  7. observer control over the archiving of certain kinds of data

System and Service Management

The existence of three information servers, two RDBMS servers and one WWW server, implies management and maintenance of each of those servers (and hosts) by qualified people.

The local (primary) database server is integral to the observing process; observing can proceed in its absence, but quick-look reduction might be less automated or slower than with the database functioning properly. Loss of engineering information for one or more nights could impede our efforts to diagnose or even describe instrument problems.

The maintenance and management of this host and the database software are therefore critical to the perceived success of the instrument. The obvious question that arises is, "Who (which institution, Lick/CARA/Keck) will provide the qualified staff to handle this?"

The secondary DB server and WWW server, though having no impact at all on the observing process, need competent management and maintenance to preserve continuous access to the published body of data. Unavailability or data corruption will make a bad impression, impairing the public image of Lick and Keck. If astronomers come to rely on the availability of these data for their research, then research efforts might also be obstructed or delayed if these machines and software engines are not kept in good shape. Some institution must commit to doing so.

The current design calls for many meta-data points which are not accessible via telemetry. We must discover whether any of these are logged to (e.g.) Keck's existing "Remedy" system, from which they could be automatically incorporated into the DEIMOS logs. If not, we must consider the staff time and degree of cooperation needed if Keck personnel are to enter reliable, accurate data. Typical of this class of meta-data are mirror re-aluminizing dates/times, optical alignment date/times, etc.
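As an illustration only (the table, columns, and sizes below are assumptions, not a schema proposal), the hand-entered items could live in a small table along these lines:

    -- Illustrative table for hand-entered maintenance meta-data.
    create table maintenance_event
    (
        event_date  datetime     not null,  -- when the maintenance was performed
        event_type  varchar(40)  not null,  -- e.g. "mirror re-aluminization"
        component   varchar(40)  not null,  -- e.g. "Keck-II primary"
        entered_by  varchar(30)  not null,  -- staff member who keyed the entry
        remarks     varchar(255) null       -- free-text notes
    )
    go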

If Keck personnel are assigned to support the logging system by key entry of critical maintenance and repair events, then Keck (the institution) may well feel that this system should serve interests more general than "just DEIMOS". In that case, the logging/monitoring component of the DEIMOS software might be running every night, not just when DEIMOS was in use, and other instrument systems might want to retrieve/store data using the server. Issues of access control, security, and authority immediately arise: whose software, under what circumstances, is to have which kinds of access to which tables when?

In general, information services flourish best when a single individual is personally responsible for the server. (A deputy is of course required during that person's absences). Joint management is rarely successful, unless the joint managers work exceptionally well together and/or share an office (or are otherwise conveniently accessible to one another). Joint management usually results, over time, in inconsistent interpretations of policy and other forms of "tripping over each other" which can have serious impacts on the service provided.

Close cooperation is also required between the systems (OS) management personnel of these 3 machines and the info services managers/maintainers.

Access Control and Observer Preferences

Not all issues of configuration and data access can reside with software engineers. Issues of confidentiality (which involve slitmask designs, spectral and image frames, and even guider frames) can only be decided by the individual observer.

The implications go beyond designing software to give the observer appropriate choices about publication of data: someone, probably the manager of the information service, must maintain and periodically test the access controls. Observers should feel confident that unauthorized access is prohibited, and that shareable central data archives are not a threat to their research programs. Security and access control require on-going responsible management.
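As a hedged sketch of how such a date-sensitive rule might be expressed inside the database (the two-year proprietary period and every table, column, and group name here are assumptions for illustration only), public users could be restricted to a view which exposes only released frames:

    -- Public users see only frames whose proprietary period has elapsed.
    -- Table, column, and group names (and the 2-year period) are placeholders.
    create view public_frames
    as
        select *
        from   frame_header
        where  dateadd(yy, 2, obs_date) <= getdate()
    go

    grant  select on public_frames to public_users
    go
    revoke all on frame_header from public_users
    go

Changing the release rule, or honoring a particular observer's preference, then means changing the view rather than every application that reads the archive.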

In summary of these points: One identifiable, well-qualified individual should be responsible for each of the information service "nexi" posited in the project requirements. One person could conceivably manage all three (2 DBs and a WWW). However, three people should not manage one nexus. There should be very close cooperation between these managers and the system management of each machine, if indeed they are not the same person. Disorganization or incompetence in this matter could be expensive and/or very visible.


It seems sensible to propose, as a solution to the first set of human resource requirements, that Lick should take responsibility for the information management system for some number of months or years after commissioning of the instrument -- the same number of months or years during which Lick is responsible for repairs, design adjustments, and other direct support of DEIMOS. After the end of this period, either Keck/CARA provides the staff resources to continue the information management program, or Lick continues it for Lick's own purposes, or it is terminated: written off as part of the instrument commissioning and engineering process, now obsolete.

An inherent contradiction lurks between this support plan and the implementation schedule for the science data archive. Most data taken with DEIMOS will be considered private by the observers until some period, probably about 2 years, has elapsed. Not until then will the publishing of DEIMOS science data begin; in other words, the moment when the data archive finally starts to become valuable and visible is about the same moment when Lick support for the instrument has faded away and the info management system is at most risk of being discontinued.

If there is any commitment to the archiving and public offering of data taken with DEIMOS, then that commitment must include some staff time allocated to handling, labelling and transport of CDROM media, and librarian/operator functions for the large jukebox system, as well as the system/database management functions described above. Some agreement should be worked out between the institutions involved, as to who will meet this minor but on-going cost.


de@ucolick.org
De Clarke
UCO/Lick Observatory
University of California