15 June 2012

Navigating the sea of genes

Over my working life, statistical data analysis has grown explosively in significance across every area of scientific endeavour. Over the last couple of decades, computerised methods have ceased to be an aid and have become, instead, simply the way in which statistical data analysis is done. Partly as a result, and partly as a driver, data set sizes in many fields have grown steadily. As with every successful application of technology, a ratchet effect has kicked in, and data volumes in many fields have reached the point where manual methods are unthinkable.

Some areas of study, however, have seen their data mushroom more than others. Of those, few can match the expansion found in genetics, a field which has itself burgeoned alongside scientific computing over a similar period. The drive to map whole genomes, in particular, generates data by the metaphorical shipload; an IT manager at one university quipped that “it’s called the selfish gene for a reason, and not the one Dawkins gave: it selfishly consumes as much computational capacity as I can allocate, and then wants more”.

1 comment:

Dr. C said...

Felix, nice work. For the first time I feel like science is moving ahead of me so fast that I am far in its wake. An interesting result of the data manipulation in genomics is that of Michael Snyder of Stanford, who had himself tested over two years for, I assume, not so much the genes themselves as gene products or expression:
http://www.wired.com/wiredscience/2012/03/diabetes-personal-genomics/
They must have used the same high-powered analysis that you describe.