FITS keywords attributes

The Memes database began as a single table, a catalog of FITS keywords. At that time it was cognizant only of the obvious attributes of a FITS keyword:

name
FORTRAN format
units
min/max values
null/default values
semantics (a couple of sentences at most)

To these we immediately added

mid (the unique ID of a meme, for database internal reference and to provide a single reliable primary key for the table)
context (for disambiguating FITS namespace)
alt_name (for compatibility with KTL systems, which have long as well as short keyword names)
access (again for KTL, whose keywords are Read, Write, or Read/Write)
URI (for linking the keyword with docs, source code, web sites, or any other additional information needed to explicate the semantics)
C format
Sybase type (host data storage type)

It soon became obvious that to store such keywords as NAXIS1, NAXIS2 as though they were legitimate keywords in their own right was foolish; they were both clearly instances of a general keyword NAXISn, whose appearance was controlled by NAXIS. We introduced

ctrl_mid (the ID of a keyword which is a counter and controls the appearance of an indexed keyword)
index_format (the format of the integer index value(s) in an indexed keyword)
iscounter (an explicit indication that a keyword serves as a counter and controls the appearance of indexed keywords)

and our keyword database became cleaner. (For the finished version of the schema, see the Memes Schema, Mschema.html. This document will continue to discuss the evolution of the schema.)
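As an illustration of how a counter controls an indexed keyword, here is a minimal sketch in Python. The function name expand_indexed and the convention that the template ends in a literal "n" placeholder are assumptions made for this example; they are not part of the Memes schema.

```python
def expand_indexed(template, index_format, count):
    """Expand an indexed keyword such as NAXISn into its concrete instances.

    template     -- general keyword name; assumed here to end in a literal 'n'
    index_format -- C-style format of the index, as in the index_format attribute
    count        -- value of the controlling counter keyword (e.g. NAXIS)
    """
    stem = template[:-1]  # drop the trailing 'n' placeholder (assumed convention)
    return [stem + (index_format % i) for i in range(1, count + 1)]


# A hypothetical two-dimensional image header.
header = {"NAXIS": 2}
print(expand_indexed("NAXISn", "%d", header["NAXIS"]))  # ['NAXIS1', 'NAXIS2']
```

Only NAXISn needs to be documented; NAXIS1 and NAXIS2 are derived whenever the counter's value is known.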

Some keywords have not one simple value but rather a list of values, or a complex value, in which the elements of the list have differing semantics and should be separately documented. We began by asserting an attribute

parent_mid

which permitted sub-elements of a keyword (such as month, day, year in a formatted date) to be owned by the parent keyword. This model turned out to be insufficiently flexible, and we ended up with an auxiliary table of Meme bundles, in which a parent (tmid or table mid) was associated with several child (emid or element mid) memes. This very powerful and flexible strategy permitted us to model these complex keywords (which we called tuples), FITS headers, FITS table extensions, Sybase tables, and recurring groupings of FITS keywords such as the mandatory table extension header block. We rescinded

parent_mid

and later converted the field to isa_mid (see below) to model a different relationship between memes.

There was another flavour of keyword to describe: enumerated keywords, those whose values can only be one of a pre-determined list. We had to establish an additional auxiliary table, Meme-values, in which we could specify the list of permitted values for any keyword ID, and the semantics of each of those values (e.g., when the value is C2, that means...). Our schema was almost complete:

Memes (describes the memes themselves)
Mvalues (describes lists of permitted values for specific memes)
Mbundles (describes groupings of memes)
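The relationship between the three tables can be sketched with an in-memory SQLite database. This is only an illustration: the real database ran on Sybase, the columns shown are a small assumed subset, and the sample rows (a DATE tuple, a MODE keyword, the DEMO context, and the placeholder value descriptions) are hypothetical.

```python
import sqlite3


def build_db():
    """Create a toy three-table database with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Memes    (mid INTEGER PRIMARY KEY, name TEXT, context TEXT);
        CREATE TABLE Mvalues  (mid INTEGER REFERENCES Memes(mid),
                               value TEXT, semantics TEXT);
        CREATE TABLE Mbundles (tmid INTEGER REFERENCES Memes(mid),
                               emid INTEGER REFERENCES Memes(mid));
    """)
    # Hypothetical memes: a date tuple with three elements, and an
    # enumerated keyword.
    conn.executemany("INSERT INTO Memes VALUES (?, ?, ?)", [
        (1, "DATE", "DEMO"), (2, "month", "DEMO"), (3, "day", "DEMO"),
        (4, "year", "DEMO"), (5, "MODE", "DEMO"),
    ])
    # The bundle: parent tmid 1 owns element memes 2, 3 and 4.
    conn.executemany("INSERT INTO Mbundles VALUES (?, ?)",
                     [(1, 2), (1, 3), (1, 4)])
    # Permitted values for the enumerated keyword, with placeholder semantics.
    conn.executemany("INSERT INTO Mvalues VALUES (?, ?, ?)", [
        (5, "C1", "placeholder description of C1"),
        (5, "C2", "placeholder description of C2"),
    ])
    return conn


def bundle_elements(conn, tmid):
    """Return the names of the element memes owned by the parent tmid."""
    rows = conn.execute(
        "SELECT m.name FROM Mbundles b JOIN Memes m ON m.mid = b.emid "
        "WHERE b.tmid = ? ORDER BY b.emid", (tmid,))
    return [r[0] for r in rows]


def permitted_values(conn, mid):
    """Return the permitted values recorded for an enumerated meme."""
    rows = conn.execute(
        "SELECT value FROM Mvalues WHERE mid = ? ORDER BY value", (mid,))
    return [r[0] for r in rows]


conn = build_db()
print(bundle_elements(conn, 1))   # ['month', 'day', 'year']
print(permitted_values(conn, 5))  # ['C1', 'C2']
```

Because both tmid and emid refer back to Memes, the same mechanism can describe a tuple, a header, or a table without any new table per kind of grouping.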

We now had to deal with the case in which a keyword value is complex AND the number of elements in the value is controlled by a counter such as NAXIS. For example, the Keck keyword BINNING consists of a list of integer values separated by commas. There will be NAXIS of these values. The standard (IRAF-legible) expression of an Array Section (rectangular n-dimensional section of image data) consists of N bracketed triples of image coordinates, where N is NAXIS. Bias sections, Trim sections, etc. are all NAXIS instances of this triple.

We introduced several new attributes to handle this:

delimiters (the character(s) which delimit a group)
separator (the character(s) which separate one repetition of the group from another)
group_format (the format of a repeating group)
repeat_mid (the ID of the keyword whose value controls the number of repetitions of the group)

So, for example, a Bias section has a delimiter of "[" (the matching delimiter can be inferred), a separator of ",", a group format (for the moment) of %s, and its repeat_mid is the ID of NAXIS. In other words, a Bias section for an image where NAXIS=3 will look like

'[something],[something],[something]'

(we know it is surrounded by quotes because its FORTRAN format is A68). How will we determine the internal structure of something? See below. We are now fairly advanced: we can document the general case of these complex keywords, and predict how to parse them later in real instances.
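The way these four attributes combine can be sketched as follows. The function render_groups and the CLOSERS lookup used to infer the matching delimiter are assumptions for illustration; the surrounding quotes are added unconditionally here because the example is a string-valued keyword.

```python
# Assumed lookup for inferring the closing delimiter from the opening one.
CLOSERS = {"[": "]", "(": ")", "{": "}", "<": ">"}


def render_groups(delimiter, separator, group_format, elements):
    """Build a quoted keyword value from one element per repetition.

    The number of elements stands in for the value of the keyword named by
    repeat_mid (NAXIS in the Bias section example).
    """
    closer = CLOSERS.get(delimiter, delimiter)
    groups = [delimiter + (group_format % e) + closer for e in elements]
    return "'" + separator.join(groups) + "'"


naxis = 3
print(render_groups("[", ",", "%s", ["something"] * naxis))
# '[something],[something],[something]'
```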

We now came to a point where it was evident that the same semantics was appearing in different formats (due to vagaries of development for different instruments or at different institutions; HIRES headers express PARANG, for example, as a quoted string), or where different fine semantics was being expressed in the same general format (as in the difference between a Bias and a Trim section). We needed some way to re-use either the semantics or the syntax. We settled for a known relational concept: ISA. ISA is the relationship "A is a B". It is many-to-one; any A can only be one B, though several As may all be the same B.

The ISA relationship was exactly what we needed to implement alternate formats. For example, PARANG in the context HIRES1 now ISA PARANG in the context KECK1DCS. The KECK1DCS version is implicitly the right, canonical, or (as we began to call it) Platonic version of the keywords. The HIRES1 version is a variant or idiosyncratic instance of it. In this case, the Platonic or ideal PARANG has the FORTRAN format F13.8. The HIRES1 version has the format A9. The semantics are identical; HIRES headers simply express it oddly.
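Following the ISA link from a variant to its Platonic meme might look like the sketch below. The dictionary layout, the mid values 1 and 2, and the function name platonic are hypothetical; only the PARANG contexts and FORTRAN formats come from the example above.

```python
# Hypothetical in-memory rows keyed by mid; isa_mid is None for a Platonic meme.
memes = {
    1: {"name": "PARANG", "context": "KECK1DCS",
        "fortran_format": "F13.8", "isa_mid": None},
    2: {"name": "PARANG", "context": "HIRES1",
        "fortran_format": "A9", "isa_mid": 1},
}


def platonic(memes, mid):
    """Follow isa_mid links until a meme with no ISA parent is reached."""
    seen = set()
    while memes[mid]["isa_mid"] is not None:
        if mid in seen:
            raise ValueError("ISA cycle detected at mid %d" % mid)
        seen.add(mid)
        mid = memes[mid]["isa_mid"]
    return mid


canonical = memes[platonic(memes, 2)]
print(canonical["context"], canonical["fortran_format"])  # KECK1DCS F13.8
```

Since ISA is many-to-one, each meme has at most one parent and the walk is a simple chain; the cycle check only guards against bad data.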

The ISA relationship also offered us something very like type definitions. We were now able to express the commonality between Bias sections and Trim sections: both of these ISA SubscriptTriple. A SubscriptTriple has the format %d:%d:%d, and expresses a single 1-dimensional extent of image data. By knowing that Bias Section ISA SubscriptTriple, we now know what to expect when we need to read one:

'[%d:%d:%d],[%d:%d:%d],...'

where there are NAXIS occurrences of the triple.
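Reading such a value back can be sketched as below, assuming a "[" delimiter, a "," separator, and plain integer subscripts. The function parse_section is hypothetical, and the sketch makes no attempt to handle wildcard characters.

```python
import re

# One SubscriptTriple in its delimiters: [start:end:stride], integers only.
TRIPLE = re.compile(r"\[(-?\d+):(-?\d+):(-?\d+)\]")


def parse_section(value, naxis):
    """Parse a quoted section value into a list of (start, end, stride)."""
    groups = value.strip("'").split(",")
    if len(groups) != naxis:
        raise ValueError("expected %d triples, found %d" % (naxis, len(groups)))
    triples = []
    for group in groups:
        match = TRIPLE.fullmatch(group)
        if match is None:
            raise ValueError("not a subscript triple: %r" % group)
        triples.append(tuple(int(x) for x in match.groups()))
    return triples


print(parse_section("'[1:2048:1],[1:32:1]'", 2))
# [(1, 2048, 1), (1, 32, 1)]
```

The subscript values in the usage line are invented for the example.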

The ISA concept lastly allowed us to "re-use" semantics in cases where the same meme appears in more than one place (as in the isa_mid field in Memes, which is an occurrence of the meme mid). We were able to encode explicitly the self-referential nature of the Memes table, as well as the multiple roles of agents in the Mpaths table (see below).

The SubscriptTriple provides even more features of interest, because it is itself a tuple, that is, a value with multiple elements each of which has its own distinct semantics. Its elements are StartSubscript, EndSubscript, and StrideSubscript. We document each of these as a meme; this tuple, like a table or header, is a meme bundle (see above). The subscript triple forced us to consider the presence of wildcard characters in keyword values, so we added

wildcard (the wildcard character, if any, for this meme)

Finally, under pressure from the rest of the DEIMOS team, we added

nominal min/max (nominal operating min/max values)

for the delivery of alarms.

In possession of all this information about FITS keywords, we were now able to generate formatted documentation for all the keywords in any context or header: see the Main Memes Database Page (http://www.ucolick.org/~deimos/memes/Memes.html) for a demo.

De Clarke
Mon Sep 9 16:46:16 PDT 1996