As we’ve seen from many Twitter bots, software is more than capable of creating captivating art.
But art created by neural networks moves past basic random patterns. Trained on millions of source images, these AIs are advanced enough to create surreal landscapes from scratch, paint portraits of dog-men, and model alien volcanoes on planets we can’t closely observe.
In this post, I’m going to explain how neural networks (software designed to emulate the human brain) generate images, and speculate about what that could mean for the future of art and entertainment.
A quick history of machine vision and software art
Software art has its roots in applications that weren’t designed for art at all, but to further research in the field of robotics and AI.
To explain how computers generate art, it’s useful to know how they perceive and understand images they’re given. Research into artificial intelligence and image processing started in the ‘60s at MIT.
An excerpt from a 1966 paper by Seymour Papert, “The Summer Vision Project”:
What Papert was trying to accomplish in one summer (by simply connecting a camera to a computer, of course!) is still being perfected half a century later.
Even though Papert’s summer project didn’t do what he intended, he did lay the foundation for researchers in the 1970s to create the basis for algorithms still used today.
With the goal of giving a robot the power to understand the 3-dimensional world by making sense of what it sees, researchers set to work on creating ways for computers to process ‘hierarchical data structures’. For example, Hanan Samet’s 1980 paper on the topic was written to help robots walk through a multi-level area, like (a) below, instead of bumping into edges or, more likely, falling down the stairs.
Throughout the ‘80s, computers learned to detect the edges in images to infer 3D shapes, and understand shading and depth.
A decade later, the first research emerged that stopped focusing on understanding images, and started to generate them instead.
An early example is Oxford professor Paul Beardsley’s attempt to create a 3D model of a house first recorded with a shaky camcorder to emulate the movements a robot would make in practice:
From the recording, the software generated an outline of the building and rendered it in 3D:
From there, the rising popularity of the internet, the production of computers with higher processing power, and the ability to share large datasets of source material blew the world of computer vision wide open. With that came the generative art we know today, based on concepts and algorithms from the ’60s and ’70s, but finally with enough research behind them to generate complex images.
In almost all cases throughout history, generative art wasn’t the goal — just a side-effect, and something teams were given the freedom to play around with at companies like Google, who created DeepDream in 2015.
DeepDream, and how it works
In 2015, Google released an image processing program called DeepDream. It’s built on the same architecture as their facial recognition software, but it doesn’t detect patterns in the same way. Instead, it abstracts and distorts the image: rather than looking for existing objects, the neural network looks for patterns and reorganizes the pixels so that those patterns turn into objects.
For facial recognition, the software was given a huge archive of faces so it could learn what a face looks like, and later identify faces in images it had never seen before.
For the DeepDream project, the software was turned on its head: the input image could be anything a user uploads, and the software selects the training image itself based on patterns it recognizes. The AI picks out sections of the input image that resemble what it knows, and warps the pixels to match, over and over.
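To make that "warp the pixels to match, over and over" step a little more concrete, here is a minimal sketch in Python. It is not DeepDream's actual code: the real thing runs a deep network over a 2D photo, while this toy uses a 1D list of pixel values and a single hand-written pattern detector. The names `response`, `dream_step`, and `dream`, the example pattern, and the step size are all assumptions made up for illustration. What it shows is gradient ascent on the image itself: measure how strongly each patch matches the pattern, then nudge the pixels in the direction that makes the match stronger.

```python
def response(pattern, patch):
    """How strongly a patch matches the pattern (dot product, clipped at zero)."""
    return max(0.0, sum(w * p for w, p in zip(pattern, patch)))


def total_response(image, pattern):
    """Sum of the detector's responses over every patch in the image."""
    k = len(pattern)
    return sum(response(pattern, image[s:s + k])
               for s in range(len(image) - k + 1))


def dream_step(image, pattern, step_size=0.1):
    """One gradient-ascent step on 0.5 * sum(response ** 2) over all patches.

    Returns a new image; the input list is left untouched.
    """
    k = len(pattern)
    grad = [0.0] * len(image)
    for start in range(len(image) - k + 1):
        r = response(pattern, image[start:start + k])
        for j, w in enumerate(pattern):
            # Derivative of 0.5 * r**2 with respect to this pixel is r * w
            # (and zero wherever the patch does not match at all).
            grad[start + j] += r * w
    return [p + step_size * g for p, g in zip(image, grad)]


def dream(image, pattern, steps=10, step_size=0.1):
    """Repeat the step: patches that faintly match get pushed to match more."""
    for _ in range(steps):
        image = dream_step(image, pattern, step_size)
    return image
```

Starting from a faint bump such as `[0.0, 0.1, 0.2, 0.1, 0.0, 0.0]` and a bump-shaped pattern like `[-1.0, 2.0, -1.0]`, every call to `dream_step` makes the bump more pronounced. Patches that don't resemble the pattern at all are left alone, which is roughly why the real software exaggerates what it thinks it already sees.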
Alexander Mordvintsev, a Google software engineer, explained simply:
And, since DeepDream could generate images from pure noise, it only makes sense that it could algorithmically generate totally original work by stitching together a surreal collage of material:
Images like the one above are made by setting DeepDream to iterate on itself over and over again, each time zooming, warping, and overlaying images. It uses MIT’s Places library — an archive of over 2.5m scenes developed to make it easier to build software that relies on understanding the physical world.
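The iterate-and-zoom loop can be sketched in a few lines of Python. This is a rough illustration under my own assumptions, not how Google implemented it: images are plain 2D lists of numbers, the zoom is a crude nearest-neighbour crop of the centre, and `enhance` is a hypothetical stand-in for whatever the neural network does to the picture on each pass.

```python
def zoom_center(image, factor=2):
    """Nearest-neighbour zoom into the centre of a 2D grid, keeping its size."""
    h, w = len(image), len(image[0])
    crop_h, crop_w = h // factor, w // factor
    top, left = (h - crop_h) // 2, (w - crop_w) // 2
    return [
        [image[top + y * crop_h // h][left + x * crop_w // w] for x in range(w)]
        for y in range(h)
    ]


def feedback_loop(image, enhance, rounds=3, factor=2):
    """Zoom in, run the (hypothetical) enhance step, feed the result back in."""
    frames = []
    for _ in range(rounds):
        image = enhance(zoom_center(image, factor))
        frames.append(image)
    return frames
```

Each frame is built from the previous one, so whatever the enhance step invented in one round becomes the raw material for the next. Played back in order, the frames give the endless-zoom effect.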
Why neural networks generate dogs, flowers, and birds
Right now, neural networks are trained using ImageNet, a data set of 1,000 image classes, including poisonous plants, dogs, and volcanoes. Also included in the set are some particularly odd categories: ‘souls’, ‘defecators’, and ‘dead people’. I do wonder what future neural networks are going to do with that information…
While DeepDream draws from a huge pool of images, some researchers have taken to using specific categories from ImageNet to model galaxies and alien geology we don’t have the technology to see ourselves.
An AI to model volcanoes on alien planets
While there was (and still is) a great deal of interest in DeepDream’s ability to make art and what that means for us organic lifeforms, recent research from Jeff Clune shows deep learning can be used to predict and model the geology of distant worlds.
These volcanoes were generated from scratch by a pair of neural networks. One generates the images, and the other approves or rejects them. The first bot iterates and improves until the image passes:
You can see the iteration process in action in the video below. Each time the software is given a new word, it works to warp the previous image to match the word:
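The generate-and-judge loop described above can be imitated with a toy in Python. To be clear about the assumptions: in the real research both sides are neural networks trained with gradients, whereas here the "critic" is a hypothetical scoring function that simply compares an image to a target it already knows, and the "generator" improves by random trial and error. The names `critic_score` and `generate_until_approved`, the threshold, and the noise level are all invented for this sketch.

```python
import random


def critic_score(image, target):
    """Hypothetical critic: 1.0 means the image is exactly what it expects."""
    error = sum((a - b) ** 2 for a, b in zip(image, target)) / len(target)
    return 1.0 / (1.0 + error)


def generate_until_approved(target, threshold=0.99, max_rounds=10_000, seed=0):
    """Start from noise and keep tweaking until the critic approves."""
    rng = random.Random(seed)                      # seeded so runs are repeatable
    image = [rng.random() for _ in target]         # pure noise to begin with
    score = critic_score(image, target)
    rounds = 0
    while score < threshold and rounds < max_rounds:
        # Propose a slightly altered image...
        candidate = [p + rng.gauss(0, 0.05) for p in image]
        candidate_score = critic_score(candidate, target)
        # ...and keep it only if the critic likes it better.
        if candidate_score > score:
            image, score = candidate, candidate_score
        rounds += 1
    return image, score, rounds
```

Calling `generate_until_approved([0.0, 0.5, 1.0, 0.5, 0.0])` returns the final image, the critic's score for it, and how many rounds it took. Swapping in a different target is the toy version of handing the software a new word and watching it warp the previous image to match.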
Neural networks, trained on nightmare imagery
Going beyond the confines of ImageNet’s (mostly) tame source material, MIT has built a nightmare machine trained with disturbing imagery:
This is one of the first looks into what happens when you step outside of the bounds of dogs, trees, and ferns. A hint at the future of art generated by neural networks?
The future of generative art?
At the moment, DeepDream isn’t a particularly revolutionary ‘artist’.
While it can create some amazing things, a lot of it does just end up looking like some LSD-fueled album cover from a 70s band without enough imagination to make a name for themselves.
What makes it interesting is the way it’s built, how its code could be adapted in the future, and what artists and researchers can learn from it.
Google and MIT are making leaps in the field of neural network art. Static pieces are the easy part, but the approach has the potential to be revolutionary as VR progresses and as video games are built with more and more sophisticated procedural generation.
What we’re seeing right now is the basis for something that could change the way all media is created. In the future, it could be possible to use neural networks to generate TV shows, entire universes (hopefully ones that are actually interesting), and virtual environments.
Right now though, I guess you can just look at a portrait where the pixels have been jiggled around so it looks like the subject is a dog made up of hundreds of other dogs.