There are large regions of the planet that, although inhabited, remain unmapped to this day. DigitalGlobe has launched crowdsourcing campaigns to detect remote population centers in Ethiopia, Sudan and Swaziland in support of NGO vaccination and aid distribution initiatives. This is one of several current initiatives to fill in the gaps in the global map so first responders can provide relief to vulnerable, yet inaccessible, people.
Crowdsourcing the detection of villages is accurate but slow. Human eyes can easily detect buildings, but it takes them a while to cover large swaths of land. In the past, we have combined crowdsourcing with deep learning on GBDX to detect and classify objects at scale. This is the approach: collect training samples from the crowd, train a neural network to identify the object of interest, then deploy the trained model on large areas.
In the context of a recent large-scale population mapping campaign, we were faced with the usual question: find buildings with the crowd, or train a machine to do it? This led to another question: can the convolutional neural network (CNN) that we trained to find swimming pools in Adelaide be trained to detect buildings in Nigeria?
To answer this question, we chose an area of interest in northeastern Nigeria, on the border with Niger and Cameroon. DigitalGlobe’s image library furnished the required content: nine WorldView-2 and two GeoEye-1 image strips collected between January 2015 and May 2016.
We selected four WorldView-2 strips, divided them into square chips of 115 m per side (250 pixels at sensor resolution) and asked our crowd to label them as ‘Buildings’ or ‘No Buildings’. In this manner, we obtained labeled data to train the neural network.
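The chipping step can be sketched as follows. This is a minimal illustration, not the code used in the campaign: the function name `chip_windows`, the example image size, and the choice to skip partial chips at the image edges are all assumptions. Only the chip size (250 pixels, 115 m) comes from the text.

```python
from typing import Iterator, Tuple

CHIP_SIZE_PX = 250    # chip side in pixels, from the text
CHIP_SIZE_M = 115.0   # chip side in meters, from the text


def chip_windows(width_px: int, height_px: int,
                 chip_px: int = CHIP_SIZE_PX) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (col_off, row_off, width, height) for every full chip in an image.

    Partial chips along the right and bottom edges are skipped (an assumption).
    """
    for row_off in range(0, height_px - chip_px + 1, chip_px):
        for col_off in range(0, width_px - chip_px + 1, chip_px):
            yield (col_off, row_off, chip_px, chip_px)


# Ground sample distance implied by the chip size: 115 m / 250 px.
gsd_m = CHIP_SIZE_M / CHIP_SIZE_PX

# Hypothetical 1000 x 600 pixel image, used only to exercise the function.
windows = list(chip_windows(1000, 600))
```

Each window can then be cut out of the strip and sent to the crowd for labeling, or to the trained model for classification.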
The trained model was then deployed on the remainder of the strips. This involved dividing each image into chips of the same size as those that we trained on, then having the model classify each individual chip as ‘Buildings’ or ‘No Buildings’.
The result: a file which contains all the chips classified as ‘Buildings’ or ‘No Buildings’, along with a confidence score on each classification.
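As a rough sketch of this deployment step, the loop below labels each chip and records a confidence score. The record layout (`chip_id`, `label`, `confidence`), the 0.5 threshold, and the JSON output are assumptions made for illustration; `toy_model` is a placeholder standing in for the trained CNN, not the actual model.

```python
import json
from typing import Callable, Dict, List, Sequence

Chip = Sequence[Sequence[float]]  # a 2-D grid of pixel values


def classify_chips(chips: Dict[str, Chip],
                   predict_proba: Callable[[Chip], float],
                   threshold: float = 0.5) -> List[dict]:
    """Label each chip and record the confidence in the chosen label."""
    results = []
    for chip_id, chip in chips.items():
        p_buildings = predict_proba(chip)
        if p_buildings >= threshold:
            label, confidence = "Buildings", p_buildings
        else:
            label, confidence = "No Buildings", 1.0 - p_buildings
        results.append({"chip_id": chip_id, "label": label,
                        "confidence": round(confidence, 3)})
    return results


def toy_model(chip: Chip) -> float:
    """Placeholder for the trained CNN: the fraction of bright pixels."""
    pixels = [v for row in chip for v in row]
    return sum(1 for v in pixels if v > 0.5) / len(pixels)


# Two tiny hypothetical chips, used only to exercise the loop.
chips = {
    "chip_0": [[0.9, 0.8], [0.7, 0.1]],  # mostly bright
    "chip_1": [[0.0, 0.1], [0.2, 0.6]],  # mostly dark
}
results = classify_chips(chips, toy_model)
results_json = json.dumps(results, indent=2)  # contents of the output file
```

In a real deployment, `predict_proba` would wrap the trained network and each record would also carry the chip's geographic footprint.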
Here are sample classifications of the model:
The intensity of green is proportional to the confidence of the model in the presence of a building. It is apparent that confidence increases with building density. The model is doing its job!
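One simple way to produce such a rendering is to scale the green channel linearly with the confidence score. The function below is a hypothetical sketch of that mapping; the name `confidence_to_rgb` and the 8-bit color range are assumptions, and the actual visualization may have been produced differently.

```python
from typing import Tuple


def confidence_to_rgb(confidence: float) -> Tuple[int, int, int]:
    """Map a confidence in [0, 1] to an RGB color whose green level is proportional to it."""
    clamped = min(max(confidence, 0.0), 1.0)  # guard against out-of-range scores
    return (0, int(round(255 * clamped)), 0)


low = confidence_to_rgb(0.2)
high = confidence_to_rgb(1.0)
```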
What is the neural network actually learning? Below are examples of hidden layer outputs produced during classification of a chip that contains buildings. Note that as the chip is processed by successive layers, the locations of buildings become more and more illuminated, leading to a high confidence decision that the chip contains buildings.
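To give a feel for how a hidden layer can light up building locations, here is a toy convolution followed by a ReLU, written in plain Python. It is not the network described in this post: the 5x5 chip, the hand-picked kernel, and the bias are all invented for illustration, whereas a real CNN learns its filters from the training data.

```python
from typing import List

Grid = List[List[float]]


def conv2d_relu(image: Grid, kernel: Grid, bias: float = 0.0) -> Grid:
    """Valid 2-D cross-correlation plus bias, followed by ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            s = sum(image[r + i][c + j] * kernel[i][j]
                    for i in range(kh) for j in range(kw))
            row.append(max(0.0, s + bias))  # ReLU keeps only positive responses
        out.append(row)
    return out


# Toy chip: a bright 2x2 block (a stand-in for a rooftop) on a dark background.
chip = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
# Hand-picked filter that responds to fully bright 2x2 patches.
kernel = [[1, 1], [1, 1]]
feature_map = conv2d_relu(chip, kernel, bias=-3.0)
```

The resulting 4x4 feature map is zero everywhere except at the position of the bright block, which is the same localization effect seen in the hidden layer outputs, in miniature.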
Here is a bigger sample of the results. A quick check on Google Maps shows that most of these villages are not on the map.
So to answer our original question: yes, the same neural network architecture used successfully to detect swimming pools in a suburban environment in Australia can be used to detect buildings in the Nigerian desert. The trained model can classify approximately 200,000 chips (a little over 3,000 km²) on a GPU-equipped Amazon instance. GBDX allows the parallel deployment of the model over an arbitrary number of strips, making continental-scale mapping of population centers a reality.