DEIMOS Information Systems Component

Software Resources Required

$Id: software.html,v 1.5 1996/03/12 19:34:41 de Exp de $
In order to manage, and offer open access to, the volume and variety of data described in these specifications, something more powerful than ASCII files and shell scripts (or C programs) is indicated. The most stable and well-understood currently available technology is the relational database management system (RDBMS) using some superset of ANSI SQL for data retrieval.

Consult the Glossary for explication of any specialized terms and abbreviations used in this document.

Commercial Software

Only a small handful of vendors offer full-featured RDBMS: Oracle, Sybase, and Ingres are the major players. Other vendors probably have too little market share to be a safe investment. Sybase is widely accepted in scientific research institutions as the database engine of choice for mid-size applications, whereas Oracle has a strong presence in financial and administrative, large-scale systems. Ingres has less market share than the other two of the Big Three.

Sybase has an attractive contractual agreement with the UC system, which makes it possible for us to acquire Sybase software at only 20% of list price. The Sybase product is also technically more advanced than the Oracle engine (the Oracle RDBMS suffers from a few basic design/implementation flaws, such as its inability to use multiple indices properly in query plan optimization).

Lick already has three years' experience with Sybase servers used both for science and administration. The existing body of expertise, tools, etc. makes the ramp-up cost of development using Sybase lower than the cost of a different product.

For the purposes of this specification, therefore, I will assume that Sybase is a good choice; but we should bear in mind that a schema designed according to good relational principles can be implemented using any RDBMS engine, with greater or lesser difficulty depending on the degree to which that manufacturer's features happen to suit the application. We do have some freedom of choice here.

Free Software

There is a free RDBMS, called Postgres-95, which perhaps should be considered as an alternative to the commercial products. I will install and test this engine, but I have a couple of reservations about it. As far as I know it does not support the use of raw devices for data storage, but only filesystem files. In the case of "live" (frequently updated) tables this can expose the application to the risk of lost data due to system crashes (if no sync is done to flush pending changes to the filesystem). I doubt that a freeware product like Postgres supports volume mirroring and transaction logging (I'll verify these assumptions soon), which also protect against catastrophic hardware/software failure.

Given the extremely dynamic nature of many of the tables in our proposed schema, I would advise the use of an RDBMS with good error recovery and volume mirroring features. Although this application can't be equated to a true OLTP (online transaction processing) app, it is lively enough that loss of a few log entries could reduce our ability to make effective use of the surviving data. Unless Postgres provides good transaction logging and other recovery features, I would have to advise against using it for critical dynamic data. It would be more applicable in that case for static "library" data.

Licensing, Platform, Tools

RDBMS are priced according to a "number of concurrent users" rubric, and in our case I would say this number can be kept small. A license on the order of 8-16 users would probably meet our forseeable needs. A Sybase license of this size costs UC sites about $6K.

Obviously whichever engine is chosen must run on a standard Unix platform such as a SparcStation running Solaris or a Dec Alpha running OSF. Both Oracle and Sybase support these operating systems. See recommended hardware specifications.

Although there are many costly GUI tools available for schema design and implementation, to date I have found them counterproductive rather than helpful for the experienced RDBMS hacker. I don't feel that any additional commercial software is required to complete the project. While there are certain costs associated with avoiding commercial software, my experience over the last decade has indicated that in many cases, the costs associated with adopting commercial software are far higher.

)Software Written for DEIMOS Project

Several modules or utilities will need to be written, as part of the project, to gather data from various sources and store it in the DB. Other modules will (for example) write out standard FITS tables based on data from the DB, or inload data from standard FITS tables to the DB. Other modules will perform sanity-checks on the contents of the DB, or generate standard status reports on a daily basis using the DB contents. All these modules can be written either in C or Tcl, using either Oracle or Sybase.

At this time the most flexible, popular, and successful interface to databases for public query is the WWW. We have had success using WWW query pages as a front end to various "databased" information, from the campus phone book to standard star catalogs. I feel that these tools and methods will evolve and continue to be the correct approach when we come to offer public DEIMOS data to the Net.


de@ucolick.org
webmaster@ucolick.org
De Clarke
UCO/Lick Observatory
University of California