Perhaps the most famous data retrieval case in the history of science comes
from the orbital mechanics of the late sixteenth and early seventeenth
centuries. Copernicus had laid the foundations
for a viable heliocentric system; Kepler stood ready to finalise it. Between the
two, both problem and solution, lay the mysteries of Mars: "the wanderer
planet". The data which Kepler needed already existed, in a database of
naked-eye observations painstakingly constructed over two decades by Danish
philosopher Tycho Brahe.
The problem was twofold. Brahe had nailed his colours to a mixed system at
odds with that of Copernicus; and his data were his claim to posterity. He
employed Kepler as an assistant, but jealously guarded access to the full
observational data set.
Kepler did, eventually, gain access to the data. It wasn’t easy, nor always
amicable (though allegations that he murdered Brahe to achieve it have been
discounted), but it was done. He still had to learn how to retrieve them
productively, but six years of mining and analysis eventually bridged the gap
and produced a successful, validated model.
Things have changed almost unrecognisably over the four or five centuries
since Copernicus, Kepler and Brahe, but some features recognisably remain amid
the new. Investment in research is balanced against the advantages of shared
access. Boundaries, proprietary or otherwise, remain between researchers and
data repositories. Murder and less extreme espionage methods may be rare (though
not unheard of) as means of gaining access to data stores, but Kepler would no
doubt recognise in essence the processes of negotiation and persuasion which
allow those boundaries to be permeated.
The biggest early twenty-first-century data retrieval issue, however, is a
different one. Acquisition in large quantity is becoming ever easier. Storage
is, in relative terms, becoming cheaper. The headache often becomes how to
ensure that one retrieves the right data for particular purposes from the
ever-ballooning volumes which are thus becoming available.
And then there is the problem of storage format obsolescence. Unlikely as it
may seem, digital information which is by definition recent and (you might
think) ought to be more easily accessible, and more carefully curated, can
sometimes be harder to reach than older analogue stores.