09 June 2011

A healthy approach to data analysis

As this appears, by a happy piece of synchronicity from my point of view, the Wellcome Collection in the UK has on show an exhibition called Dirt: the filthy reality of everyday life. One exhibit in particular is of pivotal relevance to data analytic epidemiology: Dr John Snow’s so-called "ghost map". In 1854, using what would today be described as data visualisation, Dr Snow plotted cases of cholera on a map of Soho, London. From the results he deduced that a water pump was the source of infection. This was particularly impressive because water was not, at the time, suspected as a transmission vector and the pathogenic germ theory of disease had not become generally accepted. The local council's decision to disable the pump was therefore, in the circumstances, a seminal act of faith in datacentric deduction over conventional wisdom.

Seemingly unlikely causation chains are often discovered by more sophisticated variations on Snow’s theme, emerging through statistical winnowing of gathered data. More than most data analytic areas, epidemiology can benefit from pooled work by numerous users at the sharp end of their practice as well as from high-level overviews, and data analysis is vital across that whole range. Those who have helped me in preparing this article include theatre nurses, general practice managers, country vets and hospital porters.

In a more recent high-profile example, again involving cholera, an outbreak in Haiti after the devastating earthquake seems to have been traced¹ to a tragic “confluence of circumstances” arising from the aid effort itself. Identification of the apparent initial import vector didn’t require any sophisticated analysis in this case, but patterns of spatial spread within the country subsequent to that were a different matter. In an unfunded study of data from census and hospitalisation records (using Berkeley Madonna, widely used modelling software from the University of California at Berkeley) Tuite and others² were able to model transmission in a way which, “[d]espite limited surveillance data ... closely reproduces reported disease patterns”.


  1. Cravioto, A., et al., Final Report of the Independent Panel of Experts on the Cholera Outbreak in Haiti. 2011, New York: United Nations News Service Section.
  2. Tuite, A.R., et al., Cholera Epidemic in Haiti, 2010: Using a Transmission Model to Explain Spatial Spread of Disease and Identify Optimal Control Interventions. Annals of Internal Medicine, 2011. 154(8).

I would like to thank Dr Brian Corden for invaluable help in assessing an item which, as a result of his advice, was not eventually used. He thus saved me from making a fool of myself through lack of confidence in my own judgment.
