RTD performance issues

ds9 vs figdisp


We've had some user reports that ds9, the RTD proposed for DEIMOS, is "confusing" to use. We think these are merely "new user" problems; folks here have been using figdisp for 10 years, and it takes some time to adjust to a new tool.

More worrying are the user reports that ds9 is "very slow". Here I should note that "slow" means "compared to figdisp", of course -- we are seeing response times from ds9 at least an order of magnitude slower than figdisp's. That's quite a lot. A factor of two, people might not fuss over. A factor of 10 is scary.

Interactive responsiveness in the RTD is not merely a luxury; it's a very practical concern. Observing time at Keck is costed per minute (I forget the exact number, but we used to use a buck a second as the rule of thumb, so that would be about $60/minute). Needless to say, both the Keck administration and the scheduled observer want to waste, if possible, zero minutes in the course of a night.

So if the observers have to wait several seconds just to zoom in to part of the image to see whether the focus is any good, that delay will get to seem interminable after a few tens of images; and by the end of a 3-night run it will have added up to a lot of wasted observing minutes -- time that they won't get again until maybe next year or the year after. One of the design imperatives for our instruments and their support software is always to waste as little observing time as possible.
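
To put rough numbers on this, here is a back-of-envelope sketch using the buck-a-second rule of thumb. The number of images per night, the number of pan/zoom operations per image, and the two per-operation delays are illustrative assumptions, not measurements; the function names are made up for this example.

```python
# Back-of-envelope estimate of observing time lost waiting on the RTD.
# COST_PER_SECOND is the rule of thumb quoted above; every other number
# is a hypothetical input chosen only for illustration.

COST_PER_SECOND = 1.00  # dollars, "a buck a second"


def wasted_seconds(nights, images_per_night, ops_per_image, delay_per_op):
    """Total seconds spent waiting for pan/zoom operations over a run."""
    return nights * images_per_night * ops_per_image * delay_per_op


def wasted_cost(seconds, cost_per_second=COST_PER_SECOND):
    """Dollar value of the wasted seconds at the assumed rate."""
    return seconds * cost_per_second


if __name__ == "__main__":
    # Assumed: 3-night run, 50 images per night, 5 pan/zoom operations per image.
    for label, delay in (("sub-second RTD", 0.25), ("several-second RTD", 4.0)):
        secs = wasted_seconds(3, 50, 5, delay)
        print(f"{label}: {secs / 60:.1f} min, about ${wasted_cost(secs):,.0f}")
```

Changing the assumed counts changes the totals, but the loss scales linearly with the per-operation delay, which is the point being made here.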

Figdisp, though primitive (it's really a "dumb frame buffer" sort of tool), was extensively optimized during the ESI project and offers quite responsive performance during observing. After the readout begins (for ESI or HIRES -- LRIS is still stuck with an older, slower version), only a second later -- or even less -- image rows begin to paint on the screen. The user can also pan and zoom during readout, and abort an image right away if the first few tens of rows show it is unacceptable -- bad focus, clouds, whatever.

Readout times are fairly long with our larger detectors. Being able to abort an unsatisfactory image before spending a minute or so capturing it is quite important.

Using ds9, we experience first of all the mysterious "header-getting" delay, which means that 1/3rd of the readout has already gone by before any screen refresh. For 1/3rd of the total readout time, the observer sits there seeing nothing at all. Then chunks of image start to paint, and we have no difficulty keeping up with the readout after that. But unfortunately, ds9 is then so busy updating the frame that user mouse interactions -- pan and zoom commands for example -- are ignored or acted on belatedly. It is almost impossible to interact with the image until the readout is complete. This means effectively that the readout must complete (another 30 seconds or so) before the user can even decide whether it is acceptable. This is no worse than LRIS, but LRIS is the oldest and least capable of our instrument control systems, and its limitations are hardly what we would want for a flagship instrument like DEIMOS.

Having acquired the image, the observers now want to pan and zoom around in it, to answer some questions before committing to the next exposure -- or to help plan their observing strategy for the next hour or even the next few minutes. If each pan or zoom operation takes several seconds, precious observing time is being lost just waiting for the RTD. There is usually a bit of basic interactive image quality assessment and quick-look reduction taking place between images, and this activity wants to be very swift and efficient.

Figdisp's pan and zoom times on a 140 MB image, whether during readout or not, even on an encrypted X display, are sub-second -- so fast that we can't accurately time them with our stopwatches because our fingers aren't that fast. By unhappy contrast ds9's pan and zoom times on a 140 MB image (on an unencrypted display) are a factor of 10 or more slower -- 3, 4, even 5 seconds typically. Some actual benchmark results will be found below.

In figdisp, the old image (provided the new one was the same size) remained in the screen buffer while a new readout was starting, i.e. the new rows started overpainting the old rows. This meant that you could start the next exposure "hopefully," while still looking at the last one analytically. Then if you found something bad in the last image, you could abort the current exposure-in-progress.

In ds9, the start of the new image seems to require re-initializing the frame buffer, which destroys the image (reverts to white screen), so one cannot review the last image while also doing a new exposure. Again this is a lost efficiency -- activities which previously could be overlapped now being serialized. If ds9's response times were within a factor of 2 of figdisp's, I think no one would care terribly about this "screen being wiped" difference, but these little things add up ... Now we are not only waiting 25x longer to see the first pixels of the new image, but we can't even look at the old image while we wait :-(
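
The difference between the two behaviours can be sketched with a toy model. This is not code from figdisp or ds9. The class and method names are hypothetical, and the sketch shows only the overpaint-versus-wipe logic described above, with lists of numbers standing in for screen rows.

```python
# Toy model of the two display behaviours described above. This is not
# figdisp or ds9 code; FrameBuffer and its methods are hypothetical names.

class FrameBuffer:
    def __init__(self, rows, cols, blank=0):
        self.blank = blank
        self.pixels = [[blank] * cols for _ in range(rows)]

    def shape(self):
        return (len(self.pixels), len(self.pixels[0]))

    def start_readout(self, rows, cols, keep_old=True):
        """Prepare for a new image.

        keep_old=True models the figdisp behaviour: if the new image has
        the same size, the old pixels stay until they are overpainted.
        keep_old=False models a re-initialized (blank) frame.
        """
        if not keep_old or (rows, cols) != self.shape():
            self.pixels = [[self.blank] * cols for _ in range(rows)]

    def paint_row(self, index, values):
        """Overpaint one row as it arrives from the detector."""
        self.pixels[index] = list(values)


if __name__ == "__main__":
    fb = FrameBuffer(4, 3)
    for r in range(4):
        fb.paint_row(r, [1, 1, 1])      # previous exposure, fully read out

    fb.start_readout(4, 3, keep_old=True)
    fb.paint_row(0, [2, 2, 2])          # first row of the new exposure
    print(fb.pixels)                    # rows 1-3 still hold the old image
```

With `keep_old=False` the same sequence leaves rows 1-3 blank until the readout reaches them, which is the "white screen" case.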

Lately we have been reluctantly discussing a partial rollback to figdisp. No one is very keen on this, but there's a growing fear that ds9 will not be satisfactory for actual observing because of these inefficiencies (i.e. it is not, despite our efforts so far, a "quick-look" tool). This would be a disappointing outcome to say the least, since we had hoped to walk away from figdisp forever with this instrument, and we don't want to support both RTDs! We can only regard this as a bandaid solution, and we preserve a longer-term faith that ds9 interactive performance is improvable.

Yesterday in the lab, being pressed for time, the engineers reverted to non-mosaic images and figdisp, because it would have been impossible to acquire and review the necessary images in the time available using ds9. We particularly wish to maintain our standard image format for DEIMOS (multi-HDU mosaic files), so reverting to figdisp would mean some last-minute hackery to write multi-HDU files to disk, yet display a single-HDU file (all that figdisp can handle).


Here are the benchmarks.

ds9 RUNNING ON: Enterprise 450 (Sparc Solaris 5.8)
4 cpus, I believe, at the moment
3.25 GB RAM
3 GB swap
images on LOCAL RAID DISK
DISPLAY ON: Sparc Ultra 10 Solaris 5.7
openwin X server
256MB RAM
512MB swap
8 bit display
unencrypted display (xauth direct to this display, not virtual ssh encrypted display)

First we read FROM DISK a 16 amp full frame DEIMOS mosaic image. All the next series of timings are from this 140 MB, 16-HDU image.


	read file in from disk			22.7 sec

	resize ds9 window a bit larger		5 sec 		(!)

scale buttons

	zscale					3.4 sec

	histeq					57 sec		(!!!!!)

	minmax					4 sec

	98%					57 sec		(!!!!!)

panning (at 1/16 whole image in view)

	click in image in one quadrant		4 sec	(figdisp .25 sec)

	click in panner box			4 sec

	zoom to x1				4.5 sec

	drag cyan pan box to SW quadrant	2 sec

	drag to NE quad				5 sec

	drag to SW again			1.2 sec

	to SE					3.2 sec

	to SW					1.7 sec

	to NW					3.5 sec (!)

zoom back to 1/16				4 sec

	to x16					3.4 sec

	to 1/16 again				4 sec

middle click to re-centre image			3.5 sec

resize window even bigger, to take up
	about half of screen			10 sec (!)

resize window to take up all of screen		11.5 sec (!)

with this big window,

	zoom to x1				4.5 sec

	back to 1/16				11 sec

	to x1					5.8 sec

	to 1/16					10.8 sec

click in panner to pan

	to NE quad				14 sec 	(!)

	to SW					4 sec

	to NW					5.5 sec

	to SE					7.8 sec

and more scale (first return to linear etc)

	zscale					7 sec

	histeq					7.8 sec

	minmax					7 sec

	98 %					7 sec


shrink ds9 window				5 sec
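
The pan timings above can be summarized with a few lines of arithmetic. The ds9 values are copied from the listing (pan operations only, not zoom or scale). The figdisp reference is the single .25 sec value recorded there, and applying it to every row, including the full-screen window, is an assumption made only for illustration.

```python
# Summarize the ds9 pan timings recorded in the listing above (seconds).
# FIGDISP_PAN is the single figdisp value noted there; using it as the
# reference for every row is an assumption made for illustration.
from statistics import mean

FIGDISP_PAN = 0.25

DS9_PAN = {
    "small window": [4, 4, 2, 5, 1.2, 3.2, 1.7, 3.5],
    "full-screen window": [14, 4, 5.5, 7.8],
}


def summarize(timings, reference):
    """Return {label: (mean seconds, worst seconds, mean / reference)}."""
    return {
        label: (mean(values), max(values), mean(values) / reference)
        for label, values in timings.items()
    }


if __name__ == "__main__":
    for label, (avg, worst, ratio) in summarize(DS9_PAN, FIGDISP_PAN).items():
        print(f"{label}: mean {avg:.1f} sec, worst {worst:.1f} sec, "
              f"about {ratio:.0f}x the figdisp figure")
```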

We then did a live exposure. It's worth noting that on the first attempt to capture a live image, we did not succeed; ds9 hung and had to be restarted. The restarted copy accepted the incoming image OK, with the following timing.


	IMAGE READOUT STARTS
		new shmem seg opened
		frame initialized
		headers scanned
		header data grabbed

	PAINTING STARTS		25 seconds after readout started, about	
				1/3 of image has already read out.

	IMAGE READOUT COMPLETE

	DS9 FINISHED PAINTING	about 2 sec after image readout complete.	

That final 2 sec wait is not a big deal. But the 25 seconds that go by before we ever see any updates on the screen, those are an issue. We are "blind" for those 25 seconds. If you review our prior correspondence on this delay, you'll recall that we discovered it scaled more or less linearly with the number of hdus, i.e. it had something to do with the way ds9 was scanning for hdus in the shmem fits file. This was never resolved.
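
As an illustration of why locating every HDU costs at least one header parse per HDU, here is a minimal sketch that walks a FITS byte stream using only the FITS layout (2880-byte blocks, 80-character cards, END card). It is not ds9's code and makes no claim about where ds9 actually spends the 25 seconds; all function names are hypothetical, and the tiny zero-filled test images stand in for real amplifier data.

```python
# Sketch: walk the HDUs in a FITS byte stream, counting header parses.
# Not ds9 code; all function names here are hypothetical. It relies only on
# the FITS layout: 2880-byte blocks, 80-character cards, END card.

BLOCK, CARD = 2880, 80


def make_card(key, value=None):
    """Build one 80-character header card from a key and a value string."""
    if value is None:                 # e.g. the END card
        text = key
    elif value.startswith("'"):       # quoted string, starts in column 11
        text = f"{key:<8}= {value:<20}"
    else:                             # number or logical, right-justified
        text = f"{key:<8}= {value:>20}"
    return text.ljust(CARD).encode("ascii")


def pad(data, fill):
    """Pad bytes to a whole number of 2880-byte blocks."""
    return data + fill * (-len(data) % BLOCK)


def make_hdu(first_card, bitpix, shape, extension):
    """Build a header and zero-filled data unit for a test image."""
    cards = [first_card, make_card("BITPIX", str(bitpix)),
             make_card("NAXIS", str(len(shape)))]
    cards += [make_card(f"NAXIS{i + 1}", str(n)) for i, n in enumerate(shape)]
    if extension:
        cards += [make_card("PCOUNT", "0"), make_card("GCOUNT", "1")]
    cards.append(make_card("END"))
    nbytes = abs(bitpix) // 8 if shape else 0
    for n in shape:
        nbytes *= n
    return pad(b"".join(cards), b" ") + pad(bytes(nbytes), b"\0")


def scan_hdus(buf):
    """Return ([(header_offset, data_bytes), ...], number of cards read)."""
    hdus, cards_read, pos = [], 0, 0
    while pos < len(buf):
        start, keys, done = pos, {}, False
        while not done:               # a header may span several blocks
            block = buf[pos:pos + BLOCK]
            if len(block) < BLOCK:
                raise ValueError("truncated header")
            pos += BLOCK
            for i in range(0, BLOCK, CARD):
                card = block[i:i + CARD].decode("ascii")
                cards_read += 1
                key = card[:8].strip()
                if key == "END":
                    done = True
                    break
                if card[8:10] == "= ":
                    keys[key] = card[10:].split("/")[0].strip()
        naxis = int(keys["NAXIS"])
        npix = 1 if naxis else 0
        for i in range(1, naxis + 1):
            npix *= int(keys[f"NAXIS{i}"])
        # Data size: |BITPIX|/8 * GCOUNT * (PCOUNT + product of NAXISn).
        nbytes = (abs(int(keys["BITPIX"])) // 8 * int(keys.get("GCOUNT", 1))
                  * (int(keys.get("PCOUNT", 0)) + npix))
        hdus.append((start, nbytes))
        pos += nbytes + (-nbytes % BLOCK)   # skip the padded data unit
    return hdus, cards_read


if __name__ == "__main__":
    primary = make_hdu(make_card("SIMPLE", "T"), 16, (), extension=False)
    amp = make_hdu(make_card("XTENSION", "'IMAGE   '"), 16, (64, 32), extension=True)
    for n_ext in (1, 4, 16):
        hdus, cards = scan_hdus(primary + amp * n_ext)
        print(f"{n_ext:2d} extensions: {len(hdus)} headers parsed, {cards} cards read")
```

Because each header gives only the size of its own data unit, the offset of HDU n is not known until headers 1 to n-1 have been parsed. Any per-header overhead is therefore paid once per HDU, which would at least be consistent with a delay that grows linearly with the number of HDUs.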

I should note that for all pan and zoom operations, figdisp is sub-half-second, i.e. no perceptible delay at all. The contrast is astonishing. Figdisp screen painting starts about 1 second after image readout starts. I don't think we expect ds9 to be exactly as fast as a dumb frame buffer, but I also think we don't expect it to be 25 times slower :-) A factor of 2 slower, or maybe even 4, would not be worrying. Factors of 10 to 25 are scary.


Answers to questions in previous mail

 . . . .  1. are we in 8 bit or 24 bit mode?

always 8 bit. 24-bit is infeasible because one cannot scroll colormaps in real time (something we are always needing to do) on any 24 bit video hardware we have today. so at present we have no interest in 24 bit mode. I am running a 24 bit x server at present just out of curiosity, but will be switching back to 8 bit soon.

 . . . .  2. is the data local (ie on a local disk), or on an nfs mounted disk?

local. X display is remoted.

 . . . .  3. if yes, to #2, are you using nfs+ with cache enabled, and how fast is your 
 . . . .  network? 10mbit, 100mbit, 1Gbit? 

100 Mbit. but if severe network load were affecting the X display, it should have been affecting the figdisp X display equally -- same machine, same net, same X server, same glass.

 . . . .  and what kind of load? if you copy the mosaic 
 . . . .  file to a local disk, how long does that take?

in this case we are not concerned with files so much as with the data in shmem, so file transport shouldn't be an issue.

 . . . .  4. concerning the ultra 10 box, how much memory? how much swap? are you doing 
 . . . .  anything else at the same time? (for example, running netscape while displaying 
 . . . .  an animated gif will suck a cpu dry!)

We had no other x clients running, except a few x terms.

 . . . .  I'll be glad to download the 140mb mosaic, just to verify that all is well and 
 . . . .  normal.

we have lots of those :-)

try

http://www.ucolick.org/~de/deimos/backup.fits

(a 16 amp dark, taken yesterday, used for the benchmarks above)

btw while I remember it, pls also note (reported by users last week, a funny panner bug) -- notice how the colour map in the panner is not like the actual image, at the upper left? confusing for the user. it's almost as if 2 amps were swapped... but not really, 'cos the bad col is where it should be.


We are remembering now that we had to do a lot of optimization of figdisp over the last couple of years, to remedy inefficient ways in which the X11 API was used. What we propose to do now is to make a foray into the ds9 source code and see if we can identify any chunks of it which seem to replicate inefficiencies we ourselves committed with X11 while writing figdisp. A lot of these inefficiencies resulted in enormous performance hits with very large images or with high-latency connections. We achieved an enormous improvement in figdisp by correcting them. If we get really lucky we might find something you overlooked -- seems unlikely, but it's worth a shot -- in which case we may be able to suggest some fairly straightforward patches that would improve performance significantly.

 . . . .  It may be helpful if one of you sit down with this person/people and just
 . . . .  observe what they are doing/trying to do. This way we will have some real info,
 . . . .  and not just so general comments...

in our benchmarks we did pretty much everything they tried to do. we verified the slowness and the "can't load file" bug on the configuration described above.

de@ucolick.org
De Clarke