23 January 2011

Analytically speaking

My first Damascene vision of what a wonderful tool data analysis can be was not in the physical sciences. In a holiday homework assignment, when I was fourteen, a maths teacher* asked us to explore, using what he had taught us that term, the suggestion that Shakespeare’s Hamlet might have been written by Marlowe. Two weeks of miscounted words, syllables, and parts of speech later, I understood the sheer intellectual thrill of using statistical analysis to explore the unknown.

Linguistics is a word used in many ways by different constituencies, but all of those uses have in common the scientific study of (usually, but not always, natural) language. This plurality of meaning makes it representative of language in general. Like all the words which make up natural languages and other means of human-to-human communication (including, for example, financial currencies) it is, to the exasperation and intrigue of science, an “arbitrary signifier”. Its meaning lies entirely in the intersection set of associations between those who transmit and receive it, and is defined by difference from what it is not rather than what it is.

The lure of the unknown is science’s greatest romantic pull. It can be along the banks of the Amazon or the Congo, it can be on the inaccessible ocean floors or in other galaxies, it can be down in the subatomic or up in the macrocosmic. But we can equally well seek and find it in the vast and ever-shifting jungles of arbitrary sign systems with which we attempt to communicate – and scientific computing methods are just as central there as in more physical arenas.

At the same time, these qualities of language have strong practical importance. Language is how we become fully human – whether or not it is an attribute unique to our species, as consensus suggests, it is certainly a powerful component in our dominance. It is how we become socialised and acculturated. It is how some of us make the long climb from newborn tabula rasa to mature professional scientist – encountering, along the way, what Evelyn Rodriguez-Alamo[1] called “the content and the vehicles of learning and scientific research for the 21st century”. Analytic approaches to language underpin the effectiveness of learning. Viewed from another perspective, language is how organisations are structured; analysis of effectiveness depends upon linguistic assumptions. As well as being itself an inviting subject for scientific enquiry, then, understanding how language does and doesn’t work is vital to both efficiency and outcomes for every stage and component of the context within which science happens.

Computerisation of linguistic data analysis can be traced...

* Mr Ernest Cothey: with affectionate respect.

[1] Evelyn Rodriguez-Alamo, "The Conflict Between Conceptual and Visual Thought and the Future of Science" in Social Science Computer Review, 1995. 13(2): p. 207.
