Computer Vision: How Bots See The World Around Them

In my piece on neural network art, I looked at how bots generate images based on their existing ‘knowledge’ of shape and form. These computers are trained on large data sets of images, all classified and tagged so the machine can make sense of them. Google’s Deep Dream, for example, was trained on an ImageNet data set that includes 120 dog breed categories, which explains why almost everything it hallucinates contains some kind of dog, however subtle.

Projects like Deep Dream are more of an artistic side-project than a useful tool, but the tech they’re built on is a bridge towards computer programs that can make sense of the world around them, whether that’s an image tagger for a search engine or a robot with nuanced spatial awareness.

While Deep Dream works in reverse, matching an image against the many images it recognizes and mashing the pixels together, other projects like Microsoft’s Computer Vision API work by recognizing patterns in a source image and reporting what they find.

Generally speaking, an API is an interface that one company exposes so that other applications can use its service. For example, if I wanted to build a Twitter bot, I’d use Twitter’s API to access the Twitter service. To make a bot that can recognize images, I’d write it to communicate with the Computer Vision API.

Applications built on top of APIs are able to send queries (like a source image) to another server and get a response (like that image’s tags) without needing to ‘know’ how the code on the other side works. An application connected to Microsoft’s Computer Vision API can make use of the API’s pre-trained knowledge of image classification for its own purposes.
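To make that concrete, here’s a minimal sketch of such a query in Python. The endpoint URL, API version, and key below are placeholders (Microsoft’s documentation has the real values for your own resource), and tag_image() is a name I’ve made up for illustration:

```python
import requests

# Placeholder endpoint and key -- substitute the values from your
# own Computer Vision resource; the version segment may differ.
ENDPOINT = "https://YOUR_REGION.api.cognitive.microsoft.com/vision/v3.2/analyze"
KEY = "YOUR_SUBSCRIPTION_KEY"

def tag_image(image_url):
    """Send an image URL to the API; return its tags and caption."""
    response = requests.post(
        ENDPOINT,
        params={"visualFeatures": "Tags,Description"},
        headers={"Ocp-Apim-Subscription-Key": KEY},
        json={"url": image_url},
    )
    response.raise_for_status()
    result = response.json()
    tags = [t["name"] for t in result["tags"]]
    caption = result["description"]["captions"][0]["text"]
    return tags, caption

tags, caption = tag_image("https://example.com/street-scene.jpg")
print(tags)     # e.g. ['outdoor', 'street', 'person']
print(caption)  # e.g. 'a group of people walking down a street'
```

The application never sees the neural network itself; it just posts a URL and reads back structured data.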

This technology has obvious practical value but, as with any new and publicly available API, it has artistic and thought-provoking uses, too.

Computer Vision Sorting Daemon

As you can see from the above screenshot, the API tool generates tags and then turns these tags into a natural language sentence (highlighted).
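Under the hood, that sentence is just the top-ranked caption in the structured response. Parsed into Python, the result looks roughly like this (the values are illustrative, not a verbatim response):

```python
# Illustrative shape of the Computer Vision API's analysis result:
result = {
    "tags": [
        {"name": "outdoor", "confidence": 0.99},
        {"name": "person", "confidence": 0.97},
        {"name": "street", "confidence": 0.94},
    ],
    "description": {
        "captions": [
            {"text": "a group of people walking down a street",
             "confidence": 0.87},
        ],
    },
}
```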

The source image on the left is from David Rokeby’s Sorting Daemon, a 2003 installation built to protest the surveillance and categorization of the general public. Its computer vision technology predated the Computer Vision API and was rudimentary by today’s standards (it could only separate passers-by from the street backdrop), but it hinted at how computers can be taught to classify visual information, and it was one of the early windows into how that classification can become more than just a dry, practical tool.

The breakthrough came from improvements in data processing. Where Sorting Daemon followed a few simple rules to detect a person, a program that can hold and apply billions of learned parameters has many more creative applications.

One such application, with philosophical overtones, is the How Bots See Art project from Chris Johnson. Drawing from The Met museum’s open library of artwork, the program pushes randomly selected artworks to the Computer Vision API, and then tweets the outcome. The result is naive, often humorous, and always stripped of the context that makes art culturally and historically important.
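Johnson’s source code isn’t reproduced here, but the mechanics are simple enough to sketch. The version below assumes The Met’s public collection API for the artwork, the hypothetical tag_image() helper from earlier for the caption, and Tweepy for posting (all credentials are placeholders):

```python
import random
import requests
import tweepy

MET_API = "https://collectionapi.metmuseum.org/public/collection/v1"

def random_artwork():
    """Pick random Met objects until one has a public image."""
    ids = requests.get(f"{MET_API}/objects").json()["objectIDs"]
    while True:
        obj = requests.get(f"{MET_API}/objects/{random.choice(ids)}").json()
        if obj.get("primaryImage"):
            return obj["primaryImage"], obj.get("title", "Untitled")

image_url, title = random_artwork()
tags, caption = tag_image(image_url)  # hypothetical helper from earlier

# Tweet the machine's context-free reading of the artwork.
auth = tweepy.OAuth1UserHandler("KEY", "SECRET", "TOKEN", "TOKEN_SECRET")
tweepy.API(auth).update_status(f"{title}: {caption}")
```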

This stark reduction of art to bare facts shows why bots currently rely on human interaction to create art. Caedmon, for example, which we featured in a post and a podcast, asks humans on social networks to rate its abstract mash-ups of source material, and then feeds those ratings back into its creation algorithm.
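Caedmon’s algorithm isn’t public, but that feedback loop can be sketched in miniature: fragments that earn better ratings become more likely to appear in future mash-ups. The fragment names and the weighting scheme below are entirely hypothetical:

```python
import random

# Hypothetical sampling weights over source fragments.
weights = {"fragment_a": 1.0, "fragment_b": 1.0, "fragment_c": 1.0}

def make_mashup(k=2):
    """Sample k fragments, favoring those with higher ratings."""
    return random.choices(list(weights), weights=list(weights.values()), k=k)

def record_rating(mashup, likes):
    """Feed human ratings back into the sampling weights."""
    for fragment in mashup:
        weights[fragment] += 0.1 * likes

mashup = make_mashup()
record_rating(mashup, likes=5)  # ratings gathered from social networks
```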

Another bot that relies on computer vision (this time an API from Imagga) is Holiday. This German project (automatic English translation here) simulates the experiences of a traveling bot that sees the world through Google Street View.
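Holiday’s code isn’t public either, but its core loop can be approximated with two real services: Google’s Street View Static API for the frames and Imagga’s tagging endpoint for the machine’s impressions. The keys below are placeholders, and the pipeline is my assumption rather than the project’s actual implementation:

```python
import requests

STREETVIEW = "https://maps.googleapis.com/maps/api/streetview"
IMAGGA = "https://api.imagga.com/v2/tags"

def impressions(lat, lng, top=5):
    """Fetch a Street View frame and ask Imagga what it sees."""
    frame_url = (f"{STREETVIEW}?size=640x640"
                 f"&location={lat},{lng}&key=GOOGLE_KEY")
    resp = requests.get(
        IMAGGA,
        params={"image_url": frame_url},
        auth=("IMAGGA_API_KEY", "IMAGGA_API_SECRET"),
    )
    resp.raise_for_status()
    return [t["tag"]["en"] for t in resp.json()["result"]["tags"][:top]]

# Wherever the bot 'travels', the output takes the same flat form.
print(impressions(48.8584, 2.2945))  # e.g. ['tower', 'sky', 'architecture']
```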

Its output, occasionally grim and always mathematical, is a reflection on the reality of travel: wherever you go, the features are the same. Like How Bots See Art, it omits the magical, sentimental elements from the things humans appreciate. The absence of human feeling emphasizes the lengths people will go to in applying a greater context to their personal experience of the world around them.

Holiday (and How Bots See Art) makes observations that are both too simplistic and too informed to reflect the real human experience of vision and analysis. In an essay accompanying Holiday’s exhibition, Julia Pelta Feldman writes about how the bot’s unique way of perceiving its surroundings becomes a commentary on the alienating nature of travel, and on the simplicity of computer vision:

“Holiday also demonstrates that what is gratuitously obvious to me – the names of kitchen appliances, for example – may be a matter of some difficulty for someone without my knowledge and experiences. If you have been a stranger in a strange land, you know the feeling. But imagine how a computer must feel: it knows only what it is told, lacking Des Esseintes’s power of imaginative synthesis. Thus, in Holiday, the narrator’s observations veer from the superfluous to the spurious: whether the algorithm detects something that is obviously there, or apparently invents something that is not, it has missed the point of the photograph.”

The perceptions of machines — usually the reserve of science fiction — are finally put forward in human terms, and contextualized in a way that shows the depth of meaning we humans apply to the world around us. To some, Marcel Duchamp’s Fountain was a symbol of protest against traditional art, and a rallying cry to rip up the old rules and start again. To others — Computer Vision API included — it’s a toilet.

R Mutt Computer Vision
