25 November 2008

A fistful of Rosetta stones

Reusing information after it has passed its initial shelf life is a bigger issue than it ought to be.
If I want to read what Bishop Erik Pontoppidan had to say about sea serpents in the eighteenth century CE, state expenditures in Babylon of the sixth century BCE, or records from Ptolemaic Egypt, the source material can be read as long as I can get my hands on it. Admittedly, this relies on the work of translators, scholars, and sometimes hard-won discoveries such as the Rosetta stone which enable their efforts. By and large, though, once a translation has been made I can count on its usefulness throughout my natural life and beyond. Age and medium are not problems in themselves.
Digital information, which is by definition recent and (you might think) ought to be more easily accessible, and more carefully curated, can sometimes be harder to reach than Pontoppidan or Nebuchadnezzar.
A charitable organisation recently presented me with a twenty-year-old digital data set for analysis. It was the only copy in the world of a large and detailed longitudinal epidemiological study ... and it was backed up onto a box of VHS video cassettes.
The first fear was for the vulnerability of the storage medium. Magnetic tape deteriorates with time, and is notoriously inclined to wind itself around capstans and other inner workings of the machinery used to play it. The stored signal also tends to "print through" from layer to layer when stored unplayed for long periods.
Finding a VHS cassette player wasn’t too difficult (though in another few years it may well be). One which was usefully connected to a PC (my first Rosetta stone) was more of a challenge, but an unsung hero in a university IT department was able to copy the content onto a backed-up network where it would be safe.
Reading the backup files was another matter. After investigating several defunct VHS backup systems, I eventually found a helpful computer hobbyist in Albania who had a copy of the necessary restore program (my second Rosetta stone). With that, we were able to decode the backups to yield sets of files created and saved by the spreadsheet Wingz.
Now, if you are thinking that you've never heard of Wingz, it was a spreadsheet program from Informix. It was groundbreaking in its day, far ahead of its time ... but that day and time came to an end in 1994.
GenStat has a well-deserved reputation for being able to import a wide range of file formats, so I tried it. No dice: even GenStat was stumped. On the off chance, though, I sent a hopeful email off to VSNi (suppliers of GenStat) late on Friday afternoon, asking if they had any suggestions. By Saturday morning I had a response, and a solution "from our expert in NZ". I was offered a direct file format import solution in a week or two, or an immediate workaround; how's that for service?
The immediate workaround (which worked a treat) involved yet another Rosetta stone. This time it was a nominal "player" application for Wingz files. I say nominal because the application is actually much more than a player: I could, if I so wished, create and save new Wingz files with it, and manipulate them with Hyperscripts. More usefully in my particular case, I could load the epidemiology files into it and then save them as WK2 (middle-period Lotus 1-2-3) format. WK2 files are readable by a number of current worksheet-oriented products, so the data could then be saved yet again in any form I wished.
Interestingly, the source of this player application is a US academic user (Professor Tom O'Haver at the University of Maryland's Department of Chemistry and Biochemistry) who generated and still uses "virtual lab" teaching models in Wingz which he is gradually migrating to OpenOffice Calc.
Once I had rigorously checked that the results preserved the integrity of the original data, converting the whole archive was a fairly simple process. An alternative would have been to copy and paste via the Windows clipboard, but file saves were more elegant and preserved fuller numerical precision.
By invoking the Rosetta stone I am not, of course, seriously comparing myself to Young and Champollion. Nevertheless ... that this degree of ingenuity should be required to read information recorded only two decades ago does make you think, dunnit?

