Ever see an awesome photo and wonder exactly where it was taken? Sure, we can usually figure out the general area from clues such as the language on signs or the style of architecture. We can even pin down the exact spot if the photo shows an obvious landmark such as the Eiffel Tower or the Grand Canyon. But what about photos that don’t have any landmarks or other indicators? Google’s answer to this problem is artificial intelligence.
Tobias Weyand, a computer vision specialist at Google, along with Ilya Kostrikov and James Philbin, trained an AI (dubbed PlaNet) to figure out the most likely location of almost any photo using only its pixels.
How they did it
First, Weyand and company divided the world into a grid consisting of thousands of squares: 26,263, to be exact. The size of each square depends on the number of images taken at that location, so heavily photographed places such as tourist hotspots are covered by many small squares, while remote areas get fewer, larger ones. Plus, the team ignored areas where few photographs have been taken, such as the oceans.
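To get a feel for how a grid like this can adapt to photo density, here is a minimal sketch. It is not the researchers' code: it assumes a plain latitude/longitude quadtree as a stand-in for whatever partitioning PlaNet actually uses, and the names build_grid, max_photos and min_photos are made up for illustration. A square keeps splitting into four while it holds too many photos, and squares with too few photos are dropped, which is how regions like the oceans would fall out.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in degrees


@dataclass
class Cell:
    south: float
    west: float
    north: float
    east: float
    count: int  # number of photos that fall inside this square


def build_grid(points: List[Point], max_photos: int, min_photos: int,
               max_depth: int = 12) -> List[Cell]:
    """Hypothetical adaptive grid: split crowded squares, drop sparse ones."""
    leaves: List[Cell] = []

    def split(s: float, w: float, n: float, e: float,
              pts: List[Point], depth: int) -> None:
        if len(pts) < min_photos:
            return  # too few photos here (think open ocean): ignore the square
        if len(pts) <= max_photos or depth == max_depth:
            leaves.append(Cell(s, w, n, e, len(pts)))
            return
        mid_lat, mid_lng = (s + n) / 2, (w + e) / 2
        quadrants = [
            (s, w, mid_lat, mid_lng), (s, mid_lng, mid_lat, e),
            (mid_lat, w, n, mid_lng), (mid_lat, mid_lng, n, e),
        ]
        for qs, qw, qn, qe in quadrants:
            # Half-open bounds: a point exactly on the north or east edge
            # of the whole map is not assigned to any child square.
            inside = [p for p in pts if qs <= p[0] < qn and qw <= p[1] < qe]
            split(qs, qw, qn, qe, inside, depth + 1)

    split(-90.0, -180.0, 90.0, 180.0, list(points), 0)
    return leaves


# Made-up photo locations: four on land in one hemisphere, one far away.
photos = [(10, 10), (20, 20), (50, 100), (60, 120), (-40, -100)]
for cell in build_grid(photos, max_photos=2, min_photos=2):
    print(cell)
```

With these toy numbers the lone faraway photo ends up in no square at all, while the crowded quarter of the map is split once more.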
Here is how PlaNet’s grid breaks down:
You can also see how the deep-learning machine identifies the location for different images. For the Eiffel Tower, it pinpoints the location with certainty. For the other two images, it spreads its guess across several locations, each with its own probability. In the beach image, it assigns the highest probability to southern California (which is correct) and lower probabilities to beaches in the Mediterranean and Mexico.
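Here is a rough sketch of how a set of per-square scores can turn into the kind of probabilities described above. It assumes the scores are normalized with a softmax, which is a common choice for classifiers but is not spelled out in this article, and the square labels and score values are invented for the example.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp(value - peak) for name, value in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def top_cells(scores: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most likely squares, most probable first."""
    probs = softmax(scores)
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]


# Invented scores for a generic beach photo: no single square dominates.
beach = {"southern_california": 2.0, "mediterranean": 1.0, "mexico": 0.5}
for name, prob in top_cells(beach):
    print(f"{name}: {prob:.2f}")

# Invented scores for a landmark photo: one square takes nearly everything.
landmark = {"paris": 10.0, "las_vegas": 0.0}
print(top_cells(landmark, k=1))
```

A landmark produces one sharp peak, while an ambiguous scene produces a flatter spread over several plausible squares.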
Besides the grid, the researchers created a database of geolocated images from the Internet and used the location data to determine the grid square for each image. We’re talking 126 million images and their location data.
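Turning location data into a grid square boils down to a lookup: find the square whose bounds contain the photo's coordinates, and use that square's index as the training label. The sketch below assumes squares are simple (south, west, north, east) boxes in degrees, and the function name label_for is hypothetical. The real system may index its squares quite differently.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (south, west, north, east) in degrees


def label_for(lat: float, lng: float, cells: List[Box]) -> Optional[int]:
    """Return the index of the square containing the point, or None."""
    for index, (south, west, north, east) in enumerate(cells):
        if south <= lat < north and west <= lng < east:
            return index
    return None  # the photo falls in an ignored area, e.g. open ocean


# Two made-up squares; a real grid would have thousands.
grid = [(0.0, 0.0, 45.0, 90.0), (45.0, 90.0, 90.0, 180.0)]
print(label_for(10.0, 10.0, grid))
print(label_for(-40.0, -100.0, grid))
```

A linear scan is fine for a toy grid. With tens of thousands of squares and millions of photos, a spatial index would be the more practical choice.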
Here’s a better look at the distribution of the grid squares.
After that, the researchers used 91 million of the 126 million images to teach a neural network to find the grid location using just the pixels in the image. “We applied very little filtering, only excluding images that are non-photos (like diagrams, clip-art, etc.) and porn. Our dataset is therefore extremely noisy, including indoor photos, portraits, photos of pets, food, products and other photos not indicative of location,” the researchers write.
How does PlaNet stack up against humans?
PlaNet faced off against 10 people who have travelled the world in a game of GeoGuessr. You can head on over to Geoguessr.com to give it a shot. It gives you a random street view panorama, and you place a marker on the map where you think the image was taken. Good luck. My first guess was off by more than 2,000 kilometers.
“In total, PlaNet won 28 of the 50 rounds with a median localization error of 1131.7 km, while the median human localization error was 2320.75 km,” the study reads. Here’s an example of five images from GeoGuessr.
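If you want to score your own GeoGuessr-style guesses the same way, here is a minimal sketch. It assumes the localization error is the great-circle distance between the guess and the true spot, computed with the haversine formula on a spherical Earth, and that a round is won by whoever has the smaller error. The helper names and the sample errors are made up; they are not the study's data.

```python
import math
from statistics import median
from typing import List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; the sphere is an approximation


def haversine_km(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in kilometers between two lat/lng points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def summarize(model_errors: List[float],
              human_errors: List[float]) -> Tuple[int, float, float]:
    """Return (rounds won by the model, model median, human median)."""
    wins = sum(1 for m, h in zip(model_errors, human_errors) if m < h)
    return wins, median(model_errors), median(human_errors)


# Error for a single guess: true spot in Paris, guess placed in London.
print(round(haversine_km(48.8566, 2.3522, 51.5074, -0.1278)), "km off")

# Illustrative per-round errors in km (not taken from the study).
print(summarize([100.0, 3000.0, 1200.0], [500.0, 2000.0, 4000.0]))
```

The median is a sensible summary here because one wildly wrong guess on another continent would otherwise drag a mean far off.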
PlaNet is far from perfect, but it’s already better than humans and other geolocation methods. “Our experiments show that PlaNet far outperforms other methods for geolocation of generic photos and even reaches superhuman performance,” says the study.
Read the full study here if you’re interested in learning more about PlaNet.