Practical Applications of a Relational FITS Keyword Database

De Clarke, UCO / Lick Observatory


SLIDE 1 Introduce self and institution

Using a centralized database or information service to document FITS keywords is not a new idea. However, the previous incarnations of this concept that I know of have been either purely documentary or single-purpose (STB, for example).

In the course of designing and preparing to build the DEIMOS multi-slit spectrograph for Keck-2, we saw clearly the value of a central online database for documenting our new and existing FITS keywords. It rapidly became obvious, though, that such a database had many practical applications beyond providing an authoritative shared repository of our keywords. In other words, our keyword database is neither purely documentary nor single-purpose; it can serve many practical purposes at once.

SLIDE 2 Since these talks are necessarily very brief, I've put more detailed information online at this URL -- there you will find the development history, schema, examples, demos, and so on. What I would like to do here is show some of the output products of our keyword database and describe some of the functions it can fulfill for our project.

First, HTML-formatted documentation, indexed by keyword name, by keyword provenance, by header name, and so on. As noted, a live demo is available at the URL. SLIDE 3
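The documentation pages are generated from the dictionary itself. As a rough illustration of what such a generator might look like, here is a minimal C sketch that renders dictionary rows as an HTML table. The kw_doc structure and html_keyword_table() are hypothetical names, not part of the actual system, and a real generator would also sort or group the rows by name, provenance, or header.

```c
#include <string.h>

/* Hypothetical in-memory copy of one dictionary row (illustration only). */
typedef struct {
    const char *name;     /* FITS keyword name */
    const char *type;     /* value type, e.g. "double" */
    const char *header;   /* header in which the keyword appears */
    const char *comment;  /* one-line description */
} kw_doc;

/* Append s to out (capacity cap, current length *len). When escape is
   nonzero, the HTML-special characters <, > and & become entities.
   Output is silently truncated if cap is too small. */
static void append(char *out, size_t cap, size_t *len, const char *s, int escape)
{
    for (; *s; s++) {
        char one[2] = { *s, '\0' };
        const char *rep = one;
        if (escape && *s == '<')      rep = "&lt;";
        else if (escape && *s == '>') rep = "&gt;";
        else if (escape && *s == '&') rep = "&amp;";
        size_t r = strlen(rep);
        if (*len + r >= cap) return;   /* no room left: stop here */
        memcpy(out + *len, rep, r);
        *len += r;
        out[*len] = '\0';
    }
}

/* Render the keywords as an HTML table; returns the length written. */
size_t html_keyword_table(const kw_doc *kw, size_t n, char *out, size_t cap)
{
    size_t len = 0;
    if (cap == 0) return 0;
    out[0] = '\0';
    append(out, cap, &len, "<table>\n<tr><th>Keyword</th><th>Type</th>"
                           "<th>Header</th><th>Description</th></tr>\n", 0);
    for (size_t i = 0; i < n; i++) {
        const char *cells[4] = { kw[i].name, kw[i].type, kw[i].header, kw[i].comment };
        append(out, cap, &len, "<tr>", 0);
        for (int c = 0; c < 4; c++) {
            append(out, cap, &len, "<td>", 0);
            append(out, cap, &len, cells[c], 1);   /* escape cell text */
            append(out, cap, &len, "</td>", 0);
        }
        append(out, cap, &len, "</tr>\n", 0);
    }
    append(out, cap, &len, "</table>\n", 0);
    return len;
}
```

Escaping matters here because keyword comments routinely contain characters such as < and &.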

Second, "fake headers" for documentary or test use: we can generate synthetic headers that look just like the real thing, but contain semi-random data (constrained to real minima and maxima where possible, of course). SLIDE 4 We can use these to test FITS reader/writer code, among other things.
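One plausible way to produce such a fake header card is sketched below in C, under stated assumptions: each keyword carries documented numeric limits, and a small seeded generator makes the test headers reproducible. kw_range and fake_card() are hypothetical names; only numeric keywords are handled, and the card follows the FITS fixed format (keyword in columns 1-8, "= " in columns 9-10, value right-justified to column 30, then the comment, padded to 80 columns).

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical dictionary row carrying documented limits. */
typedef struct {
    const char *name;
    double      min, max;   /* real minimum and maximum, where known */
    const char *comment;
} kw_range;

/* Tiny linear congruential generator (Numerical Recipes constants) so a
   given seed always yields the same test header. Returns [0, 1). */
static double next_uniform(uint32_t *state)
{
    *state = *state * 1664525u + 1013904223u;
    return (*state >> 8) / 16777216.0;   /* top 24 bits scaled to [0, 1) */
}

/* Write one 80-character FITS card into out (81 bytes incl. NUL) with a
   value drawn uniformly from [min, max). */
void fake_card(const kw_range *kw, uint32_t *state, char out[81])
{
    double v = kw->min + (kw->max - kw->min) * next_uniform(state);
    char tmp[128];
    int n = snprintf(tmp, sizeof tmp, "%-8.8s= %20.6f / %s",
                     kw->name, v, kw->comment);
    if (n < 0) n = 0;
    if (n > 80) n = 80;          /* a card is exactly 80 columns */
    memset(out, ' ', 80);        /* pad the remainder with blanks */
    memcpy(out, tmp, (size_t)n);
    out[80] = '\0';
}
```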

Third, C data structures for storing the values read from a target header: this code is very tedious to write, and it's handy to be able to generate it automatically. SLIDE 5 We expect to auto-generate other useful source code as well, and quite probably to configure and control pipeline reduction, with the aid of our database in general and the FITS keyword dictionary in particular.
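A minimal sketch of such a generator, assuming a much-simplified dictionary row (name, value type, comment). kw_def, gen_struct() and the type mapping are hypothetical illustrations. Keywords containing a hyphen, such as DATE-OBS, are mapped to legal C identifiers, and string members are sized for the 68 characters a single FITS card can hold.

```c
#include <ctype.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef enum { KW_INT, KW_DOUBLE, KW_STRING, KW_LOGICAL } kw_type;

/* Hypothetical minimal dictionary row. */
typedef struct {
    const char *name;     /* FITS keyword, e.g. "DATE-OBS" */
    kw_type     type;
    const char *comment;
} kw_def;

/* printf-style append into out; returns -1 if the text would not fit. */
static int appendf(char *out, size_t cap, size_t *len, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int w = vsnprintf(out + *len, cap - *len, fmt, ap);
    va_end(ap);
    if (w < 0 || (size_t)w >= cap - *len) return -1;
    *len += (size_t)w;
    return 0;
}

/* Map a FITS keyword to a C identifier: lower case, '-' becomes '_'. */
static void c_ident(const char *kw, char *out, size_t cap)
{
    size_t i = 0;
    for (; kw[i] != '\0' && i + 1 < cap; i++)
        out[i] = (kw[i] == '-') ? '_' : (char)tolower((unsigned char)kw[i]);
    out[i] = '\0';
}

/* Emit a typedef'd struct with one member per keyword. Returns the
   number of characters written, or -1 on overflow or a bad name. */
int gen_struct(const char *tag, const kw_def *kw, size_t n, char *out, size_t cap)
{
    /* Indexed by kw_type: integer, floating, string, logical. */
    static const char *c_types[] = { "long", "double", "char", "int" };
    size_t len = 0;
    if (appendf(out, cap, &len, "typedef struct {\n")) return -1;
    for (size_t i = 0; i < n; i++) {
        char id[9];
        if (strlen(kw[i].name) > 8) return -1;   /* not a legal FITS name */
        c_ident(kw[i].name, id, sizeof id);
        if (appendf(out, cap, &len, "    %-6s %s%s; /* %s: %s */\n",
                    c_types[kw[i].type], id,
                    kw[i].type == KW_STRING ? "[69]" : "",   /* 68 chars + NUL */
                    kw[i].name, kw[i].comment)) return -1;
    }
    if (appendf(out, cap, &len, "} %s;\n", tag)) return -1;
    return (int)len;
}
```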

Fourth, C source code that calls FITSIO (so far this is a writer; we're still working on the reader). Again, this code is tedious to write and highly error-prone. SLIDE 6 Being able to regenerate it in seconds, whenever a header definition or keyword usage changes, may save us time and money. I should say that these last two tools are "quite beta" at this point, and there are unresolved issues in the application namespace; but our progress so far is encouraging.
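The writer code can be produced the same way. This sketch emits a C function that calls CFITSIO's fits_update_key() once per keyword, taking values from a generated struct of the kind described under the third item. It relies on CFITSIO's documented convention that a routine called with a positive status exits immediately, so one status check at the end is enough. gen_writer(), the write_<tag> function name, and the member naming are hypothetical; comments are assumed to contain no double quotes.

```c
#include <ctype.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef enum { KW_INT, KW_DOUBLE, KW_STRING, KW_LOGICAL } kw_type;

typedef struct {
    const char *name;     /* FITS keyword */
    kw_type     type;
    const char *comment;  /* assumed free of double quotes */
} kw_def;

/* printf-style append into out; returns -1 if the text would not fit. */
static int appendf(char *out, size_t cap, size_t *len, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int w = vsnprintf(out + *len, cap - *len, fmt, ap);
    va_end(ap);
    if (w < 0 || (size_t)w >= cap - *len) return -1;
    *len += (size_t)w;
    return 0;
}

/* Map a FITS keyword to a C identifier: lower case, '-' becomes '_'. */
static void c_ident(const char *kw, char *out, size_t cap)
{
    size_t i = 0;
    for (; kw[i] != '\0' && i + 1 < cap; i++)
        out[i] = (kw[i] == '-') ? '_' : (char)tolower((unsigned char)kw[i]);
    out[i] = '\0';
}

/* Emit C source for a function writing every keyword with CFITSIO.
   Returns characters written, or -1 on overflow or a bad name. */
int gen_writer(const char *tag, const kw_def *kw, size_t n, char *out, size_t cap)
{
    static const char *fits_types[] = { "TLONG", "TDOUBLE", "TSTRING", "TLOGICAL" };
    size_t len = 0;
    if (appendf(out, cap, &len,
                "int write_%s(fitsfile *fptr, %s *hdr)\n{\n    int status = 0;\n",
                tag, tag)) return -1;
    for (size_t i = 0; i < n; i++) {
        char id[9];
        if (strlen(kw[i].name) > 8) return -1;
        c_ident(kw[i].name, id, sizeof id);
        /* A string is passed as the char array itself; others by address. */
        if (appendf(out, cap, &len,
                    "    fits_update_key(fptr, %s, \"%s\", %shdr->%s, \"%s\", &status);\n",
                    fits_types[kw[i].type], kw[i].name,
                    kw[i].type == KW_STRING ? "" : "&", id, kw[i].comment)) return -1;
    }
    if (appendf(out, cap, &len, "    return status;\n}\n")) return -1;
    return (int)len;
}
```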

Fifth, the ability to define and re-use sub-groups of keywords, which simplifies the structure of FITS headers and documents it explicitly; here is a graphical representation of the structure of a MOS_Catalog header. SLIDE 7 We can make explicit in our database many complex relationships among FITS keywords, including structure, effective typedefs, index control, etc.
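One simple way to represent such reusable sub-groups, sketched in C: a group's members are either plain keywords or nested groups, and a repeat count expands a member with numeric suffixes, as a crude form of index control. The member and group types, expand_group(), and the keyword names used below are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

typedef struct group group;

/* One member of a keyword group: either a plain keyword or a nested
   group. repeat > 0 expands the member with suffixes 1..repeat. */
typedef struct {
    const char  *keyword;  /* non-NULL for a plain keyword */
    const group *sub;      /* non-NULL for a nested group */
    int          repeat;   /* 0 = not indexed */
} member;

struct group {
    const char   *name;
    const member *members;
    size_t        n;
};

/* Flatten g into names[count..max), appending suffix to each keyword.
   Returns the new count, or -1 on overflow or if an expanded name
   would exceed the 8-character FITS limit. */
int expand_group(const group *g, const char *suffix, char names[][9], int count, int max)
{
    for (size_t i = 0; i < g->n; i++) {
        const member *m = &g->members[i];
        int reps = m->repeat > 0 ? m->repeat : 1;
        for (int r = 1; r <= reps; r++) {
            char suf[16];
            if (m->repeat > 0)
                snprintf(suf, sizeof suf, "%s%d", suffix, r);  /* add index */
            else
                snprintf(suf, sizeof suf, "%s", suffix);       /* inherit only */
            if (m->sub != NULL) {
                count = expand_group(m->sub, suf, names, count, max);
                if (count < 0) return -1;
            } else {
                if (count >= max) return -1;
                int w = snprintf(names[count], 9, "%s%s", m->keyword, suf);
                if (w < 0 || w > 8) return -1;   /* name too long for FITS */
                count++;
            }
        }
    }
    return count;
}
```

For example, a group containing OBJECT plus a sub-group {SLITX, SLITY} repeated three times flattens to OBJECT, SLITX1, SLITY1, SLITX2, SLITY2, SLITX3, SLITY3.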

Sixth, the ability to sanity-check the entire FITS keyword library, checking for namespace collisions, syntax and usage violations, etc. SLIDE 8
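Two of these checks are easy to sketch in C: the FITS Standard restricts keyword names to 1-8 characters drawn from A-Z, 0-9, hyphen and underscore, and a namespace collision is simply the same name defined twice. check_library() is a hypothetical name, and real usage checks would need much more of the dictionary than the bare names used here.

```c
#include <stdio.h>
#include <string.h>

/* FITS Standard: a keyword name is 1 to 8 characters drawn from the
   upper-case letters A-Z, the digits 0-9, hyphen and underscore. */
static int valid_keyword(const char *k)
{
    size_t n = strlen(k);
    if (n == 0 || n > 8) return 0;
    for (size_t i = 0; i < n; i++) {
        char c = k[i];
        if (!((c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9') || c == '-' || c == '_'))
            return 0;
    }
    return 1;
}

/* Check every dictionary name for syntax errors and collisions.
   Problems are described on report (pass NULL to suppress); the return
   value is the number of problems found. The duplicate search is a
   simple O(n^2) scan. */
int check_library(const char *const *kw, size_t n, FILE *report)
{
    int problems = 0;
    for (size_t i = 0; i < n; i++) {
        if (!valid_keyword(kw[i])) {
            problems++;
            if (report) fprintf(report, "illegal keyword name: '%s'\n", kw[i]);
        }
        for (size_t j = 0; j < i; j++) {
            if (strcmp(kw[i], kw[j]) == 0) {
                problems++;
                if (report)
                    fprintf(report, "namespace collision: '%s' (entries %zu and %zu)\n",
                            kw[i], j, i);
                break;   /* report each later duplicate once */
            }
        }
    }
    return problems;
}
```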

Seventh, the ability to transform FITS headers and table extensions into data definition language and, conversely, to turn database tables into FITS table extensions for transport and portability. SLIDE 9 This is a Sybase table turned directly into a FITS table extension. We can also rotate FITS headers into FITS table extensions, so that it becomes fairly trivial to make a single table extension containing the headers for an entire run's worth of images. We can, by the way, also load data from any defined FITS table extension into a dynamically created Sybase table, without human intervention. We already have the code that produces C structs for storing these rotated headers, which the application can then manipulate as first-class objects.
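Rotating headers amounts to a transpose: each header becomes a row, and each distinct keyword becomes a column. A minimal sketch in C, assuming the headers are already parsed into key/value string pairs; card, header, rotate_headers() and the MAXCOLS limit are hypothetical, and writing the result out as an actual FITS table extension is not shown.

```c
#include <stddef.h>
#include <string.h>

#define MAXCOLS 64   /* hypothetical fixed limit on distinct keywords */

typedef struct { const char *key; const char *value; } card;
typedef struct { const card *cards; size_t n; } header;

/* Return the column index of key, or -1 if it is not yet a column. */
static int col_index(const char **cols, int ncols, const char *key)
{
    for (int c = 0; c < ncols; c++)
        if (strcmp(cols[c], key) == 0) return c;
    return -1;
}

/* Rotate nh headers into a table: one row per header, one column per
   distinct keyword in order of first appearance. cells[r][c] is the
   value string, or NULL where header r lacks keyword c. Returns the
   number of columns, or -1 if MAXCOLS is exceeded. */
int rotate_headers(const header *h, size_t nh, const char *cols[MAXCOLS],
                   const char *cells[][MAXCOLS])
{
    int ncols = 0;
    for (size_t r = 0; r < nh; r++) {
        for (int c = 0; c < MAXCOLS; c++) cells[r][c] = NULL;  /* all null */
        for (size_t i = 0; i < h[r].n; i++) {
            int c = col_index(cols, ncols, h[r].cards[i].key);
            if (c < 0) {                     /* first sighting: new column */
                if (ncols == MAXCOLS) return -1;
                c = ncols++;
                cols[c] = h[r].cards[i].key;
            }
            cells[r][c] = h[r].cards[i].value;
        }
    }
    return ncols;
}
```

Cells left NULL correspond to keywords missing from a particular header; in a FITS table they would become null values.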

This keyword dictionary looks a lot like a data dictionary, for those of you who have done time in the large database applications world; and as it turned out, it was trivial to use the same schema to define database tables. Keywords are equivalent to fields, headers look like database tables, and so forth. We can now use one central database to document not only keyword development for our project, but also database schema development in support of the project. The data dictionary in fact documents itself: SLIDE 10 and can itself be turned into a FITS table extension SLIDE 11
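Because keywords map onto fields and headers onto tables, the same dictionary rows can drive table creation. A sketch in C that emits a Sybase-style create table statement; gen_ddl(), the lower-casing of names, and the type mapping (int, float, varchar(68), bit) are assumptions for illustration, and nullability is left at the server's default.

```c
#include <ctype.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef enum { KW_INT, KW_DOUBLE, KW_STRING, KW_LOGICAL } kw_type;

/* Hypothetical minimal dictionary row: keyword name and value type. */
typedef struct {
    const char *name;
    kw_type     type;
} kw_def;

/* printf-style append into out; returns -1 if the text would not fit. */
static int appendf(char *out, size_t cap, size_t *len, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int w = vsnprintf(out + *len, cap - *len, fmt, ap);
    va_end(ap);
    if (w < 0 || (size_t)w >= cap - *len) return -1;
    *len += (size_t)w;
    return 0;
}

/* Map a FITS keyword to a column name: lower case, '-' becomes '_'. */
static void col_name(const char *kw, char *out, size_t cap)
{
    size_t i = 0;
    for (; kw[i] != '\0' && i + 1 < cap; i++)
        out[i] = (kw[i] == '-') ? '_' : (char)tolower((unsigned char)kw[i]);
    out[i] = '\0';
}

/* Emit "create table" DDL with one column per keyword. Returns the
   number of characters written, or -1 on overflow or a bad name. */
int gen_ddl(const char *table, const kw_def *kw, size_t n, char *out, size_t cap)
{
    /* Assumed mapping from keyword value types to Sybase column types. */
    static const char *sql_types[] = { "int", "float", "varchar(68)", "bit" };
    size_t len = 0;
    if (appendf(out, cap, &len, "create table %s (\n", table)) return -1;
    for (size_t i = 0; i < n; i++) {
        char id[9];
        if (strlen(kw[i].name) > 8) return -1;
        col_name(kw[i].name, id, sizeof id);
        if (appendf(out, cap, &len, "    %s %s%s\n", id, sql_types[kw[i].type],
                    i + 1 < n ? "," : "")) return -1;   /* no comma after last */
    }
    if (appendf(out, cap, &len, ")\n")) return -1;
    return (int)len;
}
```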

Our dictionary can describe information elements that end up in FITS headers and also those that don't; the obvious next step, it seemed to me, was to extend the database with tables describing information-handling agents, so that we could document (and generate graphical representations of) the flow of information through our software and hardware system. SLIDE 12 This documentation will serve for critical reviews and delivered paper documents, but it will also be online and dynamic, and (we hope) will remain useful long after the original paper becomes obsolete. In particular, the ability to attach URLs to our memes offers many opportunities for online, up-to-date documentation.

We hope to use the information flow database online after instrument deployment, as a diagnostic tool and a user resource. For example, here I ask the question: what information is handled by a software module called "scrap"? SLIDE 13 To be able to generate this diagram in a matter of seconds, online, could be very useful when trying to predict the consequences of changing or shutting down processes, scripts, hardware units, etc.
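A query like this reduces to selecting the flow records for one agent and drawing them. The sketch below assumes a hypothetical flow table of (agent, item, direction) rows and emits text in the Graphviz DOT language; flow, flow_graph(), and any item names are illustrative only.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

typedef enum { FLOW_IN, FLOW_OUT } flow_dir;

/* One row of a hypothetical information-flow table: an agent (program,
   script, or hardware unit) reads or writes one information element. */
typedef struct {
    const char *agent;
    const char *item;
    flow_dir    dir;
} flow;

/* printf-style append into out; returns -1 if the text would not fit. */
static int appendf(char *out, size_t cap, size_t *len, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int w = vsnprintf(out + *len, cap - *len, fmt, ap);
    va_end(ap);
    if (w < 0 || (size_t)w >= cap - *len) return -1;
    *len += (size_t)w;
    return 0;
}

/* Emit a DOT digraph of every item that agent reads (item -> agent) or
   writes (agent -> item). Returns characters written, or -1 on overflow. */
int flow_graph(const flow *f, size_t n, const char *agent, char *out, size_t cap)
{
    size_t len = 0;
    if (appendf(out, cap, &len, "digraph \"%s\" {\n", agent)) return -1;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(f[i].agent, agent) != 0) continue;   /* other agents */
        const char *from = f[i].dir == FLOW_IN ? f[i].item : agent;
        const char *to   = f[i].dir == FLOW_IN ? agent : f[i].item;
        if (appendf(out, cap, &len, "    \"%s\" -> \"%s\";\n", from, to)) return -1;
    }
    if (appendf(out, cap, &len, "}\n")) return -1;
    return (int)len;
}
```

The resulting text can be rendered with the Graphviz command-line tools, for example `dot -Tpng`.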

I'd like to thank the free software developers without whose excellent products we'd never have got this much done in the last three months. SLIDE 14

We do have some ambitions in the direction of widening our FITS keyword database to document keyword use from other institutions, and offering all the tools shown above freely via WWW or other means. If anyone is interested in this wider application of our work, please talk to me or Steve Allen at any time during the conference, or write to either of us. SLIDE 1

de@ucolick.org