06 June 2006

Living with big brother

[published in Imaging & Machine Vision Europe]

I’m a scientist, using machine vision as one tool in what I do. That tool is changing almost in my hands as I use it, and throwing up clouds of important sociopolitical issues in the process.

Recent issues of IMVE have included Nick Morris’s look at the rôle of MV beyond the factory floor, his particular examination with Tom Wilkie of its place in security applications (April/May issue this year), and Tim Gillet’s piece on facial recognition (August/September 2005) which closes with a commendably balanced reference to Big Brother. Machine vision is indeed spreading rapidly beyond industry, in uncountably many ways; and we have already moved into a more effective and sophisticated surveillance culture than George Orwell’s 1984 could imagine. But I’m not here to cry doom. Yes, the social changes concern me; but science and technology cannot be put back in the bottle, and neither accusing nor defending machine vision on apocalyptically dystopic charges really gets us anywhere. Better to consider what Big Brother might look like, down the line a way, so that we can think about influencing the world into which we are heading.

When considering the unknown, we need models if only as departure points, and fictions are all we have for the future. I have invoked the ready-made token of Big Brother (from George Orwell’s 1984) but you can pick from a plethora of alternatives. Our present (in the developed world, at least) owes more to Huxley’s Brave New World, and even that isn’t a very good fit. The DNA scanner described below, the omnipresence of ANPR (automatic number plate recognition), smart-tagged clothing, and pharmaceutical solutions everywhere point to Ira Levin’s This Perfect Day; but, again, only up to a point. The symbiotic human/machine society of Iain M Banks’ “Culture” novels offers another view. There are combinatorially many avenues which future-gazing can validly take, but I’ve settled on two which seem likely to become reality within my working life and materially affect what I do.

Visual approaches are only the start of it. “Imaging” is an inherent part of the hardware/software package which is the human window onto the world, but “machine vision”, while convenient, is a misleadingly restrictive tie to biological metaphor. I’ve just been to see the prototype of a machine which may one day sample your DNA for identification purposes as you pass through a controlled entrance (ticket gate, primary school, supermarket). DNA comparison may not seem to be “vision” – but think of films and TV dramas where the moment of revelation is invariably conveyed by visual overlay of two transparencies.

“Vision” has two parts. The physical eye or camera specifically receives light (or any electromagnetic radiation), but the software behind either is omnivorous, not necessarily concerning itself with modality: it processes input into the most useful form. Spend any length of time forced to function meaningfully in darkness, and you will find your brain beginning to form images from the sound, touch and smell stimuli it is receiving. “Machine vision” is partly an artefact of the modern emphasis on eyes as the highest-grade sensory input, and partly a recognition that visual information is more readily susceptible to productive algorithmic handling than, for example, tactile or olfactory input. As methods advance, the distinction between inputs will become purely one of availability and quality in context, the camera being only one tool in a larger rack, and this magazine will naturally evolve into something like “Machine Perceptual Imaging, Europe”.

As machine vision spreads throughout our technological infrastructures, it does so on an ad hoc basis. Today your client needs a site security system; yesterday it was a CCTV system at a football stadium; tomorrow it may be high-speed capture cameras for a production line monitoring solution. I, meantime, am applying similar methods to the study of malnutrition effects, the behaviour of ants, and roadside impacts. But since the rise of the internet, data doesn’t stay ad hoc for long.

Biological sensory inputs do not remain separate. Your visual, auditory, olfactory, and tactile inputs on entering a room, for example, are immediately synthesised into an integrated impression of that room. At the same time, both inputs and synthesis are compared with a lifetime’s catalogue of other situations for signs of threat or advantage. The new input data are simultaneously being added to that catalogue. So it is with machine inputs: we use captured industrial, social, or scientific data in the same way, combining primary sources with secondary to derive added value from our investment in collection. This used to be a deliberate manual process, but increasing connectivity has made automation more and more feasible – not only automated comparison and integration, but automatic selection of what is to be compared and integrated. In a research lab recently I was introduced to a software agent which tracks down any accessible video footage of multiple objects moving in complex patterns (termite colony, shopping mall, pebbles in a tidal race, billiard table, dance floor), analysing and cataloguing the patterns before adding the results to a database without human intervention. Commerce similarly mines disparate databases for relationships between customers, or behaviours of which those customers themselves are unaware. In an industrial context, with many variables being automatically monitored, the logic of automated and ongoing analysis is inescapable.

So, gradually, both local networks and the internet will increasingly resemble autonomous entities – with machine vision hardware as their eyes. Your client’s site security, CCTV, or production line monitoring will not be able to ignore the synergistic advantages of tapping into larger entities. Just as disparate CCTV systems have been integrated into large scale “big brother” ANPR networks in return for security payoffs, so your client will reap the benefits of at least partly submerging local systems in these extended structures. Your machine vision imaging will not, in many cases, be a separate operation but one of many eyes for a larger entity with considerable autonomy.

The scope of the integration will not be limited to industrial input: the same type of mutual back-scratching will link you into global input, analysis and catalogue sharing of all kinds. And vision will not be a particularly separate sense of that “big brother” organism of which you will be part: it will be one element of a holistic, undifferentiated sensory array stretching from thermometers to DNA scanners.

While I don’t suggest that this large machine entity will be “aware” any time soon, AI work and algorithmic theories of mind are progressing at a rate which makes any fixed view insecure. It is worth keeping an eye on the likes of Sourceforge’s AI mind project, the American Association for Artificial Intelligence, and similar sites.

But as electronic information handling becomes more efficient and more compact, it will inevitably be concentrated into more localised centres where that offers advantages. Some data, commercially or militarily sensitive, will never be shared, and some analyses are needed very rapidly, as real-time feedback into the control of volatile processes; both pressures will increasingly suggest the use of integrated perceptive machines instead of human beings.

Honda has a number of humanoid ASIMO robots with sophisticated machine vision. While currently childlike in size and restricted to cute demonstration tasks, they display generic capabilities which could well develop into data capture and intervention platforms. Obvious first points of application would be hazardous industrial contexts requiring flexibly mobile sensory perception linked to manipulation; extension to other industrial, scientific, military or social rôles takes little imagination. Many of these small entities will, despite their need for local autonomy, be mobile avatars of networks and thence of the global structure, though some will be largely (or completely) self-determining.

Knowledge has long been too large for any human individual to contain or contribute more than a small fraction of it; much of it will exist in machine entities without connection to any human individual at all. This is Big Brother being born, for better as well as for worse, and machine perception (vision and otherwise) is at its heart. We need to think ahead: how will we make use of it, rather than just fall into it?


How does it work?

Recognising any object involves a variety of mental tricks, but boils down to comparing features of the image formed with those of images previously stored. A face is one of the trickier recognitions, because it moves so much (particularly with expression), is seen from multiple viewpoints (as the head turns and tilts, or with camera angle) and changes with time. The human brain often seizes on transient features such as facial hair, but designers of machine software aim for invariants such as the distances between fixed physical points: eye centres, the tops of the cheek bones, and so on.
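
As a rough sketch of that invariant-distance idea, the Python fragment below is illustrative only: the landmark names, coordinates, and gallery are hypothetical stand-ins (a real system would obtain its landmarks from a detector). It builds a feature vector of pairwise distances, normalised by the inter-ocular distance so that overall image scale cancels out, and matches a probe against a stored gallery by nearest neighbour.

```python
import numpy as np

# Hypothetical set of "fixed" facial points, given as (x, y) pixel coordinates.
# In practice these would come from a landmark detector; here they are inputs.
LANDMARKS = ["left_eye", "right_eye", "nose_tip", "left_cheek", "right_cheek"]

def feature_vector(points):
    """Pairwise distances between landmarks, normalised by the inter-ocular
    distance so that absolute position and overall scale both drop out."""
    pts = np.asarray([points[name] for name in LANDMARKS], dtype=float)
    inter_ocular = np.linalg.norm(pts[0] - pts[1])
    dists = [np.linalg.norm(pts[i] - pts[j]) / inter_ocular
             for i in range(len(pts)) for j in range(i + 1, len(pts))]
    return np.array(dists)

def identify(probe, gallery):
    """Nearest-neighbour match of a probe feature vector against a gallery
    dictionary of {name: feature_vector}; returns (best name, distance)."""
    best_name, best_dist = None, np.inf
    for name, feats in gallery.items():
        d = np.linalg.norm(probe - feats)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name, best_dist

# Toy usage: enrol one face, then identify a scaled and shifted version of it.
enrolled = {"alice": feature_vector({"left_eye": (100, 120), "right_eye": (160, 121),
                                     "nose_tip": (130, 160), "left_cheek": (95, 170),
                                     "right_cheek": (165, 171)})}
probe = feature_vector({"left_eye": (202, 242), "right_eye": (322, 244),
                        "nose_tip": (262, 322), "left_cheek": (192, 342),
                        "right_cheek": (332, 344)})
print(identify(probe, enrolled))   # ('alice', ~0.0)
```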

Numerous software approaches deal with this variability, but Hidden Markov Models (HMMs) are important. A Markov chain (named after the mathematician Andrei Andreyevich Markov, 1856-1922) is a stochastic process modelling transitions between system states; an HMM does the same when the states themselves cannot be observed directly and must be inferred from the outputs they produce. An early application, predating machine vision, was to speech recognition, and HMMs played an important part in the development of DNA sequencing.
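
To make the idea concrete, here is a toy Viterbi decoder in Python for a two-state HMM. All of the probabilities are made up for illustration and come from no real recognition system; the point is simply that, given a sequence of observed symbols, the most probable sequence of hidden states can be recovered.

```python
import numpy as np

# Toy HMM: two hidden states, three possible observation symbols.
# All probabilities are illustrative, not taken from any real model.
start = np.array([0.6, 0.4])            # P(initial state)
trans = np.array([[0.7, 0.3],           # P(next state | current state)
                  [0.4, 0.6]])
emit = np.array([[0.5, 0.4, 0.1],       # P(observed symbol | state)
                 [0.1, 0.3, 0.6]])

def viterbi(obs):
    """Most probable hidden-state sequence for a list of observation indices."""
    n_states = len(start)
    best = np.zeros((len(obs), n_states))              # best path probability
    back = np.zeros((len(obs), n_states), dtype=int)   # where that path came from
    best[0] = start * emit[:, obs[0]]
    for t in range(1, len(obs)):
        for s in range(n_states):
            scores = best[t - 1] * trans[:, s]
            back[t, s] = np.argmax(scores)
            best[t, s] = scores.max() * emit[s, obs[t]]
    # Trace the winning path back from the final time step.
    path = [int(np.argmax(best[-1]))]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, int(back[t, path[0]]))
    return path

print(viterbi([0, 1, 2, 2]))   # -> [0, 0, 1, 1] with these toy numbers
```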

Much current work concentrates on reducing the variability of the surfaces to be recognised (the face, for instance). Michael Bronstein’s work with multidimensional scaling and photographic deconvolution at the Technion in Israel (http://www.cs.technion.ac.il/~mbron/research_facerec.html) is an example. R&D work by the likes of MERL (Mitsubishi Electric Research Laboratories) is focussed on principal component analysis of multiple laser-scanned images of the same face. Where the person being recognised coöperates (in door security systems, for example) the subject can be asked to assume a neutral expression and look at the camera, eliminating many problems of orientation and viewpoint. One approach, used for example by A4vision, is to project a grid and analyse the ways in which it is deformed by the contours of the face, thus avoiding much of the uncertainty inherent in images of the face itself.
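
As a sketch of the principal component idea, the fragment below uses the classic “eigenfaces” formulation on flattened 2D images rather than MERL’s laser-scanned data; the random arrays merely stand in for aligned, equally sized face images. PCA yields a low-dimensional basis, and faces are then compared by their coordinates in that basis.

```python
import numpy as np

def fit_eigenfaces(faces, n_components=10):
    """PCA on a stack of equally sized, flattened face images.

    faces: array of shape (n_images, n_pixels). Returns the mean face and the
    leading principal components ("eigenfaces"); projecting any face onto them
    gives a compact feature vector for comparison."""
    mean = faces.mean(axis=0)
    # SVD of the mean-centred data gives the principal directions directly.
    _, _, vt = np.linalg.svd(faces - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(face, mean, components):
    """Coordinates of one flattened face in the eigenface basis."""
    return components @ (face - mean)

# Toy usage: random data standing in for 20 aligned 64x64 grey-scale faces.
rng = np.random.default_rng(0)
gallery = rng.random((20, 64 * 64))
mean, comps = fit_eigenfaces(gallery, n_components=5)
codes = np.array([project(f, mean, comps) for f in gallery])
probe = project(gallery[3], mean, comps)
# Nearest neighbour in the reduced space picks out the closest gallery face.
print(int(np.argmin(np.linalg.norm(codes - probe, axis=1))))   # -> 3
```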
