Showing posts with label Words. Show all posts

31 July 2013

A line dotted with shimmering pebbles

Julie Heyward's blog, Unreal nature, has comments turned off at the moment* so there is no way for me to express, in situ, my delight at the following metaphor for an individual life:
"...a line dotted with shimmering pebbles..."
It is taken from Multiple Arts by Jean-Luc Nancy (2006); you can find Ms Heyward's full extract here.

*I confess that my own inclination is often to switch comments off, too. Most feedback comes by email. I was persuaded by Ray Girvan of the importance of comments to transparency, early on, but have never really warmed to them...

21 February 2013

Omnilingual word spotting

In yesterday's out take from my recent “Joy of text” piece in Scientific Computing World, I mentioned in passing the term “word spotting”.
Word spotting looks for visually similar discrete components within a text, and classifies them using statistical comparisons. In approximate human terms, it is treating words (or combinations of words) as ideograms rather than as phonetic constructs.
Word spotting is not limited to written or printed material, though that is the context with which I'm concerned here: it also applies, for example, to speech recognition. Nor, intriguingly, is it necessarily limited to words of known meaning; it can equally well be applied to semantic units of entirely unknown signification. It could, as an extreme example, be used to analyse the manuscripts from a lost extraterrestrial civilisation in H Beam Piper's classic science fiction story Omnilingual.
Reverse the Omnilingual example to imagine a hypothetical extraterrestrial archaeologist trying to study post apocalyptic remains of our own cultures. It will be obvious from context that certain signification units are associated with the physical sciences: "volts" on electrical signs and appliances, just to pick one.
Our xenoarchaeologist (who may not have alphabetic scription systems, or even vocalisation, and certainly cannot assume that each letter represents a sound) looks at a mass of textual matter whose content, subject matter, purpose and reliability are unknowable. Where should attention be concentrated? The only certain knowledge is that words are identifiable visual entities found in isolation and occurring with spatial separation in books.
Word spotting, with no semantic assumptions, quickly shows that some books contain many instances of the visually related signifiers "volts", "volt", "voltage", "voltmeter" and so on, suggesting that those sections contain material related to electricity; others do not. There will, of course, be false hits such as "Voltaire" and "revolt", but as one discriminator amongst others in a multiple sieving process it would nevertheless be invaluable.
Handwritten notes and journals would be less amenable than printed books (in my own handwriting, for instance, computer transcription systems have trouble separating "volt" from "bolt" and sometimes even "void"), but could still be sieved using multiple discriminators in the same way.
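As a rough illustration of the sieving idea, here is a minimal Python sketch. It is not real word spotting, which compares the visual features of word images; as a stand-in it assumes that plain text is available and treats any token containing the letter shape "volt" as a hit. The function names and the three tiny "books" are hypothetical, invented purely for this example.

```python
import re

def spot(shape, text):
    """Return every token in `text` that contains `shape`, ignoring case."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return [t for t in tokens if shape.lower() in t.lower()]

def sieve(shape, books):
    """Count hits per book; a crude single discriminator, not a classifier."""
    return {title: len(spot(shape, text)) for title, text in books.items()}

# Hypothetical miniature library, for illustration only.
books = {
    "physics_primer": "The voltmeter reads nine volts. Voltage is measured "
                      "in volts, and one volt is small.",
    "history_of_france": "Voltaire wrote during an age of revolt and reason.",
    "cookery": "Simmer the onions gently, then add the stock.",
}

counts = sieve("volt", books)
false_hits = spot("volt", books["history_of_france"])
print(counts)
print(false_hits)
```

The physics text scores highest, the history text picks up only the "Voltaire" and "revolt" false hits, and the cookery text scores nothing, which is the sense in which one weak discriminator can still help to rank material for closer attention.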

  • Piper, H.B., Omnilingual, in Astounding Science Fiction. 1957, Dell Magazines: Norwalk CT.

20 February 2013

Making sense of the census

In preparing an article on any topic, there is often more good stuff than makes it into the final edit.
In the case of my recent “Joy of text” piece, about text analysis for Scientific Computing World, one of the out takes was a prototype content-based image retrieval system framework developed for census searches by Kenton McHenry and his ISDA group at NCSA. The problem to be solved is computerised searching of large volume handwritten census returns.
A user inputs a handwritten query – I might, for instance, write "Grant". The system derives a numerical feature vector which describes that input, then seeks occurrences of similar vectors within the image database.
The system is designed to self-validate, by recording which returned entities are selected by the user. I will not, for example, select false hits such as "Grand" or "Ghent" and the system will note which results I do or do not follow up. Over time, other Grants will make similar decisions and increase the system's confidence in selecting some hits for return and not others; gradually, those writing “Grant” as their query will see fewer and fewer offers of documents containing similar looking words.
The computer analytic process behind all this is progressive.
First, the lines and boxes on the census forms are used to carve up the content into image segments (for instance, surname will be in a box at the same location on each form and will become a data entity). Each segment is then converted into a numerical feature vector representing its appearance, and similar feature vectors are grouped hierarchically. Two million XSEDE (Extreme Science and Engineering Discovery Environment) CPU hours have been requested for initial record processing.
When the search query is entered, word spotting is used to compare its vector with those stored in the database, seeking matches within statistically defined limits of similarity. The search is not a blind one from the beginning of the database through seventy billion image segments to the end; the hierarchical grouping greatly reduces the number of entities which need to be compared.
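The search process described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the NCSA system: the feature vectors are made-up two-dimensional points, the hierarchy has only one level (groups with a centroid), similarity is plain Euclidean distance, and the names `make_group`, `search` and `limit` are hypothetical.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))

def make_group(members):
    """Bundle (label, vector) pairs with the centroid that summarises them."""
    return {"centroid": centroid([vec for _, vec in members]), "members": members}

def search(query, groups, limit):
    """Pick the group with the nearest centroid, then compare only its members.

    Returns (label, distance) pairs within `limit`, closest first.
    """
    nearest = min(groups, key=lambda g: math.dist(query, g["centroid"]))
    hits = []
    for label, vec in nearest["members"]:
        d = math.dist(query, vec)
        if d <= limit:
            hits.append((label, d))
    return sorted(hits, key=lambda hit: hit[1])

# Hypothetical feature vectors for handwritten surname segments.
groups = [
    make_group([("Grant", (1.0, 1.0)), ("Grand", (1.2, 0.9)), ("Ghent", (1.6, 1.5))]),
    make_group([("Smith", (5.0, 5.0)), ("Smyth", (5.1, 4.8))]),
]

results = search(query=(1.05, 1.0), groups=groups, limit=0.3)
labels = [label for label, _ in results]
print(labels)
```

With these toy numbers the query lands in the first group, so the "Smith" and "Smyth" segments are never compared at all, and "Grand" comes back as a plausible false hit of the kind that user selections would gradually filter out. One caveat of this shortcut is that a true match sitting just inside a neighbouring group would be missed; a production system would need to handle that.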

15 February 2013

The joy of text

It's a mundane truism, not normally worth mentioning, that words and phrases as signification units in natural language have only the fuzziest of relations to that which they signify. It is, nevertheless, a live issue for the many researchers attempting to computerise data analytic activity using text as raw material. It's also a truism of which I have been reminded afresh as I discussed the topic with practitioners and consumers of textual analysis, no two of whom used the term in exactly the same way.
Strictly speaking, textual analysis describes a social sciences methodology for examining and categorising communication content. In practice, though, it is widely used to cover a range of activities in which unstructured or partially structured textual material is submitted to rigorous analytic treatment. What they all have in common is a desire to wrestle the petabytes of potentially valuable information locked up in an ever inflating text reservoir (blogs, books, chat rooms, clinical notes, departmental minutes, emails, field journals, lab notebooks, patents, reports, specification sheets, web sites and a million other sources) into a form which is susceptible to useful, objective data analytic treatment. Temis, of whom more later, have on their website a headline which sums it up neatly: "Big data issue #1: a lot of content and no insights". Text mining, the consequent knowledge bases, and analysis of the results have become a major component of biomedical and pharmaceutical research.
For our purposes here, I have taken it to mean analysis whose purpose is to extract scientific value from texts, to examine those texts scientifically, or some combination of the two. [More]

28 April 2012

The girl on the hat-shelf

A couple of weeks or so ago, when I enthused about song as meme, Ray Girvan rightly reminded me that it is “...a very mutatable one, unless continually reinforced by knowledge of some canonical version. The very tendency to fit misheard lyrics to known words (even if semantically ridiculous) is part of that process.”

It will not be news to anyone who is a fan (and perhaps of Woodstock vintage...), and of no interest to those who are not, that Joan Baez is on tour in Europe. Having just heard her sing for the first time in many years, my partner and I fell to discussing her songs, and discovered a disagreement between us over the meaning of a particular line from the third stanza of Diamonds and rust: “The girl on the hat-shelf". One thing led to another and thence, inevitably, to one of the many lyrics websites ... where we discovered that for more than thirty five years we've been mishearing the line. It's not "hat-shelf" but "half shell".

[Edit: as Ray has subsequently pointed out, in a comment to this post, the "half shell" probably refers to Botticelli's The birth of Venus ... an interesting iconographic extension to the Marian image of "The Madonna" on the previous line. My assumption before this had been that "the girl on the hat-shelf" was Baez's position as a lover taken down and put back at Dylan's convenience.]

Unable to believe this we searched further and, eventually, alighted on Baez’ own web site which confirmed our mistake ... but not before (in our certainty that we were right) we'd done a web search for "the girl on the hat-shelf". That search netted us just one hit: a Harry Potter fan fiction site. It seems that Harry Potter fan fiction makes frequent use of lyrics from this song; but I'll stick with this one instance by Morag X Henegev because, apart from replicating our own mishearing, it introduces several others (I should point out that it would seem Henegev is not a native English speaker, which greatly increases the difficulty of transcription).

Here are the two versions of that stanza...

As Baez sang it:

Well you burst on the scene
Already a legend
The unwashed phenomenon
The original vagabond
You strayed into my arms
And there you stayed
Temporarily lost at sea
The Madonna was yours for free
Yes the girl on the half-shell
Would keep you unharmed

As Henegev heard it:

Well, you burst on scene
Already a legend
A young, washed phenomena
The original beg-a-bong
Heading straight into my arms
And there you stayed
Temporarily lost at sea
The Madonna was yours for free
Yes, the girl on the hat-shelf
Could keep you unharmed.

28 January 2012

A wordle in your ear

These days, it seems, everyone who is anyone must have a Wordle (especially if they are to any extent involved in education) and some of my readers have been pointedly commenting on my lack of this essential accessory.

So, here is a wordle based on a recent atom feed from The Growlery (as always, click it if you want to see a larger view).

Interestingly, "Ray" of JSB fame appears prominently but not "Girvan" ... the queen of Unreal Nature, on the other hand, reverses this with "Heyward" immediately spottable but not "Julie". Why that should be, I've been too lazy to investigate ... Dr C (but without the "C") nestles between the "T" and the "o" of the word "Toss".

Update, 2012/01/29: Just to show what an exciting life I lead ... this is a Wordle compiled not from text but from the first ten thousand digits of π.

Tomorrow, the heady delights of the exponential function, e... [only joking]

21 January 2012

Foe Furren

I've just chanced to see the opening credits of the film Enemy of the state. They employ a variation on the "faux Cyrillic" (or more generally, "faux foreign") theme ... but, unlike Ray Girvan's examples or mine, I'm not sure what the point is.

The letter "E" is replaced by "Σ" (Greek capital sigma), "A" by "Λ" (Greek capital lambda). So, for instance, the credit for Gene Hackman is rendered:

I've seen the E/Σ substitution before, as a ham fisted over-egging (excuse the accidental food theme...) of faux Cyrillic, but here it becomes faux Greek. Which might make sense if there was any Greek connection in the film ... but there isn't ... Enemy of the state is set in the US, with Will Smith's African American lawyer pitted primarily against a US intelligence agency and secondarily against Italian American mobsters.

To further confuse matters, "Y" is replaced with something that resembles the currency symbol for the Japanese yen (¥). What's that all about?
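For anyone who wants to try the effect, the three substitutions described above are easy to mimic in Python. This is only a sketch: it assumes the credits are set in capitals, and it swaps characters in ordinary text rather than reproducing the film's actual lettering. The name `faux_greek` is my own.

```python
# Character substitutions as described in the post: E -> Σ, A -> Λ, Y -> ¥.
FAUX_GREEK = str.maketrans({"E": "Σ", "A": "Λ", "Y": "¥"})

def faux_greek(credit):
    """Upper-case a credit (an assumption) and apply the three substitutions."""
    return credit.upper().translate(FAUX_GREEK)

hackman = faux_greek("Gene Hackman")
title = faux_greek("Enemy of the State")
print(hackman)  # GΣNΣ HΛCKMΛN
print(title)    # ΣNΣM¥ OF THΣ STΛTΣ
```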

Most odd...


Update: in a comment to this post, Ray has identified the font as Metrolox, which is available as a TTF font download (thanks for that, Ray). What its pseudo Greek references have to do with anything in the film is still a mystery.

However, there turns out to be another twist to this story. Having downloaded the font to look at, I opened the author's documentation file, the opening sentences of which are:

Metrolox is loosely based on the titling of the Enemy of the State movie. I say "loosely" because the movie titling showed only so many letters, and the lab's final version turned out so big.

So Metrolox was born of Enemy of the state, rather than the other way around, which is interesting.

Apart from the specifics of relevance in the case of this film and font, I also wonder about the reasoning behind uses to which typography is put.

The role of "fancy" fonts is, generally, to capture attention; they are suited to signage, labels, short headlines (the examples Ray offered are perfect). They are not well suited to conveying textual information, since the very quality which makes them effective eye catchers (the fact that reading is momentarily interrupted, the eye tripping up, so to speak, over unexpected elements) becomes a barrier to extended reading.

It could be argued, perhaps, that the names of actors in a film, superimposed one at a time over its opening scenes, are not continuous textual information but a form of bulleted headline. I also concede that I probably paid more attention to the names depicted than I might otherwise have done ... so perhaps that's the point.


Yet another update: in a second comment to this post, Ray has made a good suggestion about the rationale for the use of the font, which I find convincing:

Could the allusion be to the villains of the film being in the NSA: in the field of cryptography, security and surveillance of foreign communications? That could explain the mixed foreign characters; and the "O" looks very like the keyhole of a 180 degree toolbox cylinder key.

02 December 2011

History fatigue

I've just watched an episode of Anthony Horowitz's second world war police drama sequence Foyle's War, in which the eponymous Foyle tells his son that he (the son) is suffering from "combat fatigue".

My immediate assumption was that this was an anachronism. At a guess, I'd have fairly confidently said that the term dated from the 1960s. Not that I am viscerally opposed to anachronisms; it just surprised me in a fiction known for its diligent research.

Looking up combat fatigue, however, I discover that I couldn't be more wrong.

The OED seems to locate "combat fatigue" (“n. a nervous disorder resulting from prolonged or severe battle experience”) in 1943 – firmly in Foyle's time. There are references to it in US medical journals from the mid to late 1940s, even though the earliest PubMed hits are from 1945.

Google Labs' Ngram shows a peak frequency at 1948. It also shows occurrences from as early as 1860, but a quick sampling suggests that this is a red herring – a dozen spot checks all yield either usage such as “At present we have no drugs that combat fatigue of the central nervous system directly”* or retrospective reference from later dates.

On a lazy search, then, it seems that this description dates from about twenty years earlier than I had assumed.


* Psychiatric bulletin of the New York State hospitals: Volume 2, Page 311, 1917

24 September 2011

Hand ’is heye full hof harrer

Ray Girvan has just put up a JSBlog post entitled ’Arry and ’Arriet. It is, as always, fascinating (to me, anyway, since my interests and curiosities often run closely parallel to Ray's) ... but this post of mine has only the most tenuous connection with its substance. Instead, I found myself flicked back to childhood by the title itself.

My maternal grandfather played endless word games with me¹ and would often coach me through tongue twisters. One of my favourites, which he attributed to his friends ’Arry and ’Arriet (there you are – a connection at last!), was this evocation of an iconic moment from English mythology:

’Arold of Hengland
Sat hon ’is ’orse
With ’is ’awk hin ’is ’and
Hand ’is heye full hof harrer.

The specific memory which first slid into my mind when I read Ray's title was of walking along the lines of pea plants in my grandparents' huge garden², with my grandfather, when I was about four years old, trying to recite the whole thing with every deliberate error in place, whilst simultaneously scrumping peas straight from their pods...


  1. And probably, in doing so, played a very large part in making me the person I am. He teased me unmercifully (but always affectionately; I loved it and him) with things I couldn't understand. One strand was recounting to me conversations with, and the doings of, his friends ’Arry and ’Arriet. They always sounded wonderful people, who lived wonderful and joyous lives, and I wished that I could meet them ... I realise, now, of course, that they were imaginary ... and that they represented his own childhood, before a fluke of history and war shunted him into the military officer class where he adopted protective colouration with which he was never really comfortable.
  2. The same garden which, on another occasion, saw me burying chocolate buttons under its boundary hedge...

27 August 2011

Good vibes down the time line

In the middle of a conversation with Clarissa Vincent, she used the expression "good vibes" ... and I found myself wondering whether the Beach Boys invented "good vibrations", or whether we already had them?

Being a terminal nerd (not to mention terminally bad mannered), I wandered off to find out.

The first use of the phrase, in a book in English, according to Google Ngram, is from 1893: Law and the prophets: a scientific work on the relationship between physical bodies, vegetable, animal, human, and planetary by one Frank Earl Ormsby:

"You are embodied for the purpose of expressing your own spirit, see to it that no one robs you of the right. Receive all of the good vibrations that spirits can give you, but do something for yourself, if you expect results."

From then onwards, occurrence of the phrase in literature pootles along close to the bottom of the graph (though with a modestly significant increase from 1925) until 1966 ... after which it rises to a maximum in 1972 before dropping off again.

The Beach Boys released "Good vibrations" in 1966. So, it seems that the phrase had already been in existence for more than seventy years, but my generation (actually, probably the previous generation ... I was 14 in 1966, 20 in 1972, not yet writing books) picked it up from the Beach Boys and made it mainstream.

After 1972 it dropped back, but remained regularly used, until 1988 ... when it surged again, reaching a peak between 2004 and 2006 from which it now appears to be dropping off again.

(I've looked for Law and the prophets in the British Library catalogue, without success; the Library of Congress (probably a better bet, going by the author's name format) isn't responding at the moment ... perhaps later...)

[Later addition, 1611Z: Library of Congress still isn't talking to me ... but I've found Law and the prophets in the Library of Michigan. Published in Chicago by A.L. Fyfe]

[Later addition still, 1626Z: Thanks to Ray Girvan, voice of JSBlog, for an actual copy of Law and the prophets, from the cover page of which I note that Frank Earl Ormsby was "a magian mystic" whose book was "designed for the instruction and guidance of students in the occult sciences". It makes for fascinating reading.]

[And again, 1639Z: from Livia Passini, an MP3 copy of Good vibrations ... complete with authentic scratched vinyl 45rpm clicks and hisses...]

26 April 2011

Eletelephony

Continuing my occasional habit of inflicting childhood poetic memories on innocent readers, here is one which I remember with particular affection. It was introduced to my class by Ian Murray, grade 6 teacher at Elizabeth Grove Primary School in 1964.

Whatever else may have been good or bad about Australian primary education at that time, the teachers I encountered at Elizabeth Grove Primary had a knack for choosing poetry which would arouse my love of the form. There is a direct line (an elephone line, perhaps) of development from poems like this (and these) in my late primary years to my later embracing of Milton's Paradise lost, Dante's Divina commedia, T S Eliot's Four quartets, Muriel Rukeyser's Speed of darkness, Elizabeth Browning's Aurora Leigh, Frank Jones' Everything is like fire...

Here you go ... Laura E Richards' Eletelephony

Once there was an elephant,
Who tried to use the telephant -
No! No! I mean an elephone
Who tried to use the telephone -
(Dear me! I am not certain quite
That even now I've got it right.)
Howe'er it was, he got his trunk
Entangled in the telephunk;
The more he tried to get it free,
The louder buzzed the telephee -
(I think I'd better drop the song
Of elephop and telephong!)

It's possible that the last two lines are apocryphal. The version shown at The literature network lacks them. Other on line versions include them, or something like them, though some omit other lines. Ray Girvan would get to the bottom of it and track down the definitive version; so, if I were the respectable academic I pretend to be, would I; but my affection is for the version I remember, so let it stand.

(The spell checker has had a ball with this post, let me tell you.)

23 April 2011

Highlights

Highlight of the day: discovering the "highlight of the day" taglines in email sigs from Clarissa Vincent (author of The voyage of Storm Petrel).

For example:

Highlight of the day: A pair of orange tip butterflies enjoying the new herb growth along the path.

Highlight of the day: Butter bean and tuna with fried garlic, onion and ginger, with rice.

Highlight of the day: Using the fan heater, on cold setting, for a change.

Highlight of the day: River path walk with Loba.

“I wanted something with the shortness of Twitter”, she says, by way of explanation, “but without the crassness.”

I've seen (and used) many variations on the sig tagline, from the utilitarian to the surreal, but I like the upbeat simplicity of this one.

28 February 2011

The s?ien?e of spelling

I'm sitting in a café. At the next table a group of teenagers chatter cheerfully. I've just overheard:

“My chemistry teacher I swear you can't believe him like nuffin. He spell 'science' like it got a "c" in it, man!"

11 February 2011

Virtual books at the British Library

After recently giving a talk on the use of text within visual art, I followed up some suggestions (thank you, Maureen) and questions from the very lively and participatory audience. One place to which this process took me was the Virtual Books index at the British Library.

There is an excellent online Lindisfarne Gospels, a Qur'an and a Hebrew Bible (these links take you to static text/image pages; there are also animated page-turning versions). But I recommend looking through the whole list, in all its representative variety from Alice through botanical illustration to Leonardo, maps and Mozart.

05 February 2011

Slaving over half a trillion words...

A number of emailed queries suggest that I may, in my (19 December last year) "Picking over half a trillion words" post, have inadvertently made light of the task facing anyone who wants to do their own analysis of Google's raw data sets. Here is a quick run down on what to expect; I think it worth the effort myself, but not everyone shares my view of what constitutes light entertainment.

02 January 2011

My dear, I was literally metaphorical...

We all have our private irritations and exasperations, which may not be reasonable but are nonetheless real. Many of mine lie in the use of words, and telling which are reasonable and which are not can be a grey area.

Language is, I passionately believe, an evolving thing and we cannot tie it down with rules. Words change their meanings ... get used to it. On the other hand, its richness and complexity rely on the existence of rules ... the rules can (indeed, must) change with time, they can be broken with magnificent effect, but like (to borrow Robert Frost's analogy) a tennis net, we do need them. If they break down entirely, or even change too fast, the glorious moderated anarchy which is language falls apart and becomes a puddle on the floor.

My reason for wittering on like this is, of course, a particular exasperation which has just happened by ... though in this case it's amused me rather than irritating. Over a long period, now, I've noticed (and generally restrained my irritation over) misuse of the word "literally", for example “I was so embarrassed, I literally died!” Today I've heard the converse: “I was so frightened, I was almost metaphorically looking over my shoulder!”


  • "Writing free verse is like playing tennis with the net down." Robert Frost, in an address to students. 1935, Milton, Massachusetts: Milton Academy.
  • "I'd as soon write free verse as play tennis with the net down." In an interview with Edward C Lathem, 1966. (Manuscript, part of the boxed Papers of Edward C. Lathem, 1913 - 2009, Hanover, New Hampshire, USA: Rauner Special Collections Library, Dartmouth College)

27 December 2010

Half a trillion words ... plus two

The ever enquiring young mind of Julie Heyward can always be relied upon for penetration to the philosophical heart of any issue without fear, favour or delay. In the case of Google's Ngram viewer and datasets, for example, she maintains this reputation with a comment heroically posted early on Christmas Day:

How long can it take to run "chicken boogers" through that thing? Surely that was your first and most urgent task, on realizing the power of such linguistic machinery?

... a question which subtly refines Edmund Burke's demand, in 1770, to know “where were the boogers?” Such serious enquiry deserves serious reply, so I immediately applied myself to the task.

The immediate headline answer is ... it took mere milliseconds to establish that the bigram "chicken boogers" seems, between 1500 and 2008 CE, to have appeared ... approximately ... zero times.

Undeterred, I tried the separate words. Boogers seem to first appear in print, as noted above, in the late 18th century. Chickens are immortalised in ink from almost two hundred years earlier, with a 1586 culinary reference from John Trusler – and Sir Philip Sidney, no less, was moved in 1599 to pen the words O Mopsa my beloved chicken, here am I thine owne father... (about which, perhaps, the least said the better).

Tracking both words across the same periods of time, chickens appear far more often (by roughly three orders of magnitude) than boogers, which makes a comparative graph on the same axes somewhat unenlightening. Rescaling and using two separate y axes, however, permits the illustration shown here (double click the graphic for a full size view in a separate window) which shows that there is an approximate correlation over most of the past century ... but that in the past decade chickens continue to increase their popularity while that of boogers may (only time will tell) have started to decline.

I am grateful to Ms Heyward for bringing this important and previously overlooked research issue to my attention.


  • Edmund Burke, in The Annual register, or a view of the history, politics and literature of the year. 1771, London: J. Dodsley.
  • John Trusler, The London Adviser and Guide: containing every instruction ... necessary to persons living in London, and coming to reside there ... together with an abstract of all those laws which regard their protection against the frauds ... to which they are there liable. 1586, London.
  • Sir Philip Sidney, The Countesse of Pembrokes Arcadia. 1599, Edinburgh: printed by Robert Walde-graue.

19 December 2010

Picking over half a trillion words

Following JSBlog's enthusiasm, yesterday, (“Google just blew my bibliographic socks off”) for Google's new Ngram viewer, I've been busily catching up.

First stop was the viewer itself. Then a start on downloading the raw data sets which lie behind it, for more detailed analysis than the online viewer can deliver. Finally, while the data downloaded in the background (almost two gigabytes of it just for single words in English, even in ZIP form ... nearer to ten when expanded), reading the associated Science article by Michel et al.

It's going to be a good while before anything significant comes of the downloads, but I've done a couple of test drives. They can be intuitively checked with a quick visit to the viewer.

First experiment, resulting from a recent off the cuff discussion amongst a group of students: correlating uses of the words "twat", "twit" and "twerp". It's interesting to find positive correlation between the first and last from 1935 to 1980, but negative between them and "twerp" over the same period – which then reverses so that all three positively correlate over the past thirty years.

Second: the tendency to concatenate "bigrams" into single words. This train of thought was started by Google's example comparison of "child care" with "nursery school" and "kindergarten" ... I tried it out, and then added "childcare" to see if it made a difference. To cut a long story short with two examples, "child care" declines markedly as "childcare" slightly increases (a negative correlation) from 1996 to 2008; "brood mare" and "broodmare" show a similar negative correlation from 1960 to 2000 but then "brood mare" recovers and the correlation becomes positive through to the present.
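For readers who want to repeat this sort of comparison, the correlation between two yearly frequency series can be computed with a plain Pearson coefficient. The sketch below assumes that is the measure wanted; the two series are invented numbers, not values taken from the Ngram data, and the variable names are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

# Made-up relative frequencies for five successive years (illustration only).
two_word_form = [9.0, 8.5, 8.1, 7.4, 7.0]   # e.g. a declining "child care"
one_word_form = [1.0, 1.1, 1.3, 1.4, 1.6]   # e.g. a rising "childcare"

r = pearson(two_word_form, one_word_form)
print(round(r, 3))
```

A value near -1 indicates that one form falls as the other rises, a value near +1 that they move together; with real Ngram counts the series would first need to be normalised by the total words published each year.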

Those are, of course, trivial investigations and show nothing ... I mention them only to show the sort of five finger exercises that I've been playing with since yesterday. Much more interesting is some of the investigation mentioned by the authors of the Science article.

For example:

Suppression – of a person, or an idea – leaves quantifiable fingerprints... ... ... Such examples are found in many countries, including Russia (e.g. Trotsky), China (Tiananmen Square) and the US (the Hollywood Ten, blacklisted in 1947)...

We probed the impact of censorship on a person’s cultural influence in Nazi Germany. Led by such figures as the librarian Wolfgang Hermann, the Nazis created lists of authors and artists whose “undesirable”, “degenerate” work was banned from libraries and museums and publicly burned... We plotted median usage in German for five such lists ... ... ... The five suppressed groups exhibited a decline. This decline was modest for writers of history (9%) and literature (27%), but pronounced in politics (60%), philosophy (76%), and art (56%). The only group whose signal increased during the Third Reich was the Nazi party members [a 500% increase...].

Given such strong signals, we tested whether one could identify victims of Nazi repression de novo. We computed a “suppression index” s for each person by dividing their frequency from 1933 – 1945 by the mean frequency in 1925-1933 and in 1955-1965... In English, the distribution of suppression indices is tightly centered around unity. Fewer than 1% of individuals lie at the extremes... In German, the distribution is much wider, and skewed leftward: suppression in Nazi Germany was not the exception, but the rule... At the far left, 9.8% of individuals showed strong suppression... This population is highly enriched for documented victims of repression, such as Pablo Picasso..., the Bauhaus architect Walter Gropius, and Hermann Maas... ... ... At the other extreme, 1.5% of the population exhibited a dramatic rise... This subpopulation is highly enriched for Nazis and Nazi-supporters, who benefited immensely from government propaganda...

These results provide a strategy for rapidly identifying likely victims of censorship from a large pool of possibilities, and highlight how culturomic methods might complement existing historical approaches.
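The suppression index described in that extract is simple enough to sketch. What follows is my own reading of the quoted definition, not the authors' code: I take "frequency" to mean the mean yearly frequency within each window, and I apply the quoted year ranges literally (so 1933 falls in both windows). The per-year figures are invented for illustration.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)

def suppression_index(freq_by_year):
    """Mean frequency in 1933-1945 divided by the mean in 1925-1933 and 1955-1965."""
    during = [f for year, f in freq_by_year.items() if 1933 <= year <= 1945]
    reference = [
        f for year, f in freq_by_year.items()
        if 1925 <= year <= 1933 or 1955 <= year <= 1965
    ]
    return mean(during) / mean(reference)

# Hypothetical mentions per million words, sampled in a handful of years.
suppressed_person = {1928: 4.0, 1931: 4.0, 1936: 1.0, 1940: 1.0,
                     1944: 1.0, 1958: 4.0, 1962: 4.0}
unaffected_person = {year: 2.0 for year in suppressed_person}

s_suppressed = suppression_index(suppressed_person)
s_unaffected = suppression_index(unaffected_person)
print(s_suppressed, s_unaffected)
```

An index near 1 means the name was mentioned about as often during 1933 to 1945 as in the reference years; values well below 1 sit at the "far left" of the distribution the authors describe, and values well above 1 at the other extreme.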


  • Jean-Baptiste Michel, et al., "Quantitative Analysis of Culture Using Millions of Digitized Books" in Science, 2010. DOI 10.1126/science.1199644

26 September 2010

Synchronicity, yet again...

My brother recently asked about the (social interaction) process by which I arrived at the "On the bench" photographs. (Several others have asked related questions.)

In the course of answering, yesterday morning, I made reference to portraiture as a joint enterprise between portrayer and portrayed (in this case, photographer and subject). While still thinking about that, I found the following Unreal Nature extract from Deleuze's Cinema 2: The Time-Image:

… The author takes a step towards his characters, but the characters take a step towards the author: double becoming. Story-telling is not an impersonal myth, but neither is it a personal fiction: it is a word in act, a speech-act through which the character continually crosses the boundary …

Which seems much the same thing ... and could apply as much to characters in a novel as in a film. Perhaps all arts are a similar contract. Perhaps social interaction itself is, too.


  • Gilles Deleuze, Cinema 2: The Time-Image. 2005, London: Continuum. 0826477062. (Original: Cinéma II: L'image-temps, Collection "Critique", 1985, Paris; first English trans 1989).

06 June 2010

Measure for measure

It seems to be a Dr C day.

In yesterday's "Friday crab blogging (late)" post, the good doctor twice refers to “...the English Pound, that funny letter "L,"...”. Ignoring the usual USAmerican conflation of "English" and "British" (analogous to calling all USAmericans "New Yorkers" or, I concede, the British habit of calling anyone north of Mexico "Yankee") it's true that symbols are often perplexing.

The periodic table offers every child the puzzle of Ag for silver, K for potassium and Pb for lead. These derive, like "that funny letter L", from Latin origins: argentum, kalium, plumbum, librae, respectively.

Why "librae"? The Roman system of currency, before its departure from Britain, was arranged in a three tier system in which librae (from libra, the Roman standard unit of weight – from which also comes the abbreviation "lb" for the pound weight) were subdivided into solidi and then into denarii. The resulting abbreviations somehow survived Saxon and Norman invasions, to become the "£sd" (Pounds, shillings and pence) system which was still around through my youth and early adulthood.

For those too young to remember, there were twelve pennies, or pence, (d) in a shilling (s) and twenty shillings in a pound (£). In a sudden spasm of common sense (vigorously contested, but successful) when I was in my early twenties, the United Kingdom replaced this lovable but unwieldy cultural heirloom with a shiny new system. There were now one hundred pence (p) to the pound (still £).
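
For the arithmetically curious, a minimal Python sketch of the old sums follows. It uses only the ratios given above (twelve pence to the shilling, twenty shillings to the pound, hence 240 old pence to the pound); the function names are my own invention and the example amount is arbitrary.

```python
from fractions import Fraction

PENCE_PER_SHILLING = 12
SHILLINGS_PER_POUND = 20
OLD_PENCE_PER_POUND = PENCE_PER_SHILLING * SHILLINGS_PER_POUND  # 240


def lsd_to_old_pence(pounds, shillings, pence):
    """Collapse a pounds/shillings/pence amount into old pence."""
    return (pounds * SHILLINGS_PER_POUND + shillings) * PENCE_PER_SHILLING + pence


def old_pence_to_lsd(total_pence):
    """Split a count of old pence back into (pounds, shillings, pence)."""
    shillings, pence = divmod(total_pence, PENCE_PER_SHILLING)
    pounds, shillings = divmod(shillings, SHILLINGS_PER_POUND)
    return pounds, shillings, pence


def lsd_to_decimal_pounds(pounds, shillings, pence):
    """Express the same amount as an exact fraction of a pound."""
    return Fraction(lsd_to_old_pence(pounds, shillings, pence), OLD_PENCE_PER_POUND)


# One pound, two shillings and sixpence.
total = lsd_to_old_pence(1, 2, 6)
print(total)                                   # 270 old pence
print(old_pence_to_lsd(total))                 # (1, 2, 6)
print(float(lsd_to_decimal_pounds(1, 2, 6)))   # 1.125 pounds
```

The exact fraction avoids rounding surprises: since 240 old pence became 100 new ones, not every old amount lands on a whole number of new pence.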

The rest of Europe has since leapfrogged on from its various ancestral currencies to an even shinier and newer system: the common currency Euro, represented by a funny E (€) which has the virtue of a direct link to the currency unit. The UK has yet to find much enthusiasm for this obvious next step ... another 1300 years, perhaps.

The rest of Europe, and indeed the rest of the world, also shows greater enthusiasm than the UK for adoption of other standard measures – in particular, adoption of the almost completely logical SI system. The UK has adopted the system, but not to the extent of replacing day to day use of older customary units, which is somewhat confusing. The US (in company with those other two great global leaders Liberia and Myanmar) hasn't adopted it at all, preferring to stick with a quirkily unique variant on the Imperial system used by ... um ... nobody else.

So, to return to the point, the "funny letter L" is an abbreviation inherited from a predecessor currency whose name actually did start with an L.

The US, of course, uses the Dollar, the name of which derived from an old European silver (Ag!) coin called the Taler, and so its symbol is a funny letter T. Oh, wait a minute ... no it's not ... it's a funny letter S ($). Explanations for this vary, with the front runner being ... that ... S is an abbreviation for ... "Spanish peso" ... at this point, since my head hurts, I will hand you back to Dr C... or even onward to JSBlog, which has a stronger stomach for such labyrinthine historical cold trails than I...

Time for a cup of coffee and a shortcake biscuit.