One of GBDX’s most promising capabilities is its ability to identify and quantify a vast assortment of different objects visible in high-resolution satellite imagery. But GBDX’s strongest attribute—its ability to provide a very specific data set—sometimes creates an interesting dilemma: what’s the fastest, most cost-effective way to create a desired GBDX outcome?
We have seen the power and flexibility of artificial intelligence algorithms in the past, as when we successfully used a neural network architecture to identify properties with pools in Australia and remote villages in Nigeria. But training and deploying an effective model is expensive and slow, and while cloud-based computation is relatively cheap, the costs of feature detection do start to add up when you move from a relatively compact AOI to a regional or global scale. What are the alternatives?
Protogen (short for PROTOcol GENerator) is a geospatial image analysis and processing software suite developed within DigitalGlobe and available to GBDX subscribers. It uses state-of-the-art hierarchical image representation structures (called ‘trees’) to efficiently access, retrieve and organize image information content.
Here’s a real-world example of Protogen’s potential. Estimating oil reserves through analysis of high-resolution satellite imagery has become fashionable in geospatial analytics. Oil is typically stored in tanks with floating roofs. As the oil level (and therefore the lid) sinks, the shadow that’s cast on the inside of the tank (and is visible in Earth imagery) provides a good estimate of the fill level. A pretty neat idea.
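To make the shadow idea concrete, here is a rough sketch of the geometry in Python. It assumes a near-nadir view, a known sun elevation angle, and a known tank shell height. The function name and all parameter values are illustrative and not part of Protogen or GBDX. The rim casts a shadow onto the floating roof whose widest extent (measured along the sun direction) equals the roof depth divided by the tangent of the sun elevation.

```python
import math

def fill_fraction(shadow_width_m, sun_elevation_deg, tank_height_m):
    """Estimate the fill level (0..1) of a floating-roof tank from its interior shadow.

    Assumes a near-nadir view and a shadow width measured along the sun direction.
    """
    # Depth of the roof below the rim: shadow width times tan(sun elevation).
    roof_depth_m = shadow_width_m * math.tan(math.radians(sun_elevation_deg))
    # Clamp to the physical range of the tank shell.
    roof_depth_m = min(max(roof_depth_m, 0.0), tank_height_m)
    return 1.0 - roof_depth_m / tank_height_m

# Hypothetical numbers: 5 m shadow, sun 45 degrees above the horizon, 15 m tall tank.
estimate = fill_fraction(shadow_width_m=5.0, sun_elevation_deg=45.0, tank_height_m=15.0)
print(f"Estimated fill level: {estimate:.0%}")
```

In practice the tank height is not visible directly and would have to come from another source or be estimated from the tank's exterior shadow.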
But how are these oil tanks (regardless of fill level) detected in the first place? With sufficient training data, a neural network can probably learn to identify them—but as we’ve already established, this might not be the most efficient path. What are the other possibilities?
Oil tanks are distinctive. They’re round, they’re relatively big, and they look like bright disks when filled. Using the Protogen max-tree, we can extract oil tanks by simply selecting the max-tree nodes which satisfy certain size and compactness requirements. Here’s an example:
We’ve filtered a WorldView-3 panchromatic image chip from Houston, TX, to extract features with a size between 100 m² and 3,500 m² and a compactness greater than 0.97 (1.00 being a perfect disk). For an image of this size, this filtering operation is instantaneous.
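The size and compactness test can be sketched in a few lines of plain Python. This is not Protogen's implementation: it skips the max-tree entirely and works on hand-made pixel sets, and it assumes a moment-of-inertia definition of compactness (area squared over 2π times the polar moment of inertia, which equals 1 for a disk), since the definition Protogen uses is not given here. The 0.31 m pixel size and the region names are also assumptions.

```python
import math

def compactness(pixels):
    """Moment-based compactness of a set of (row, col) pixels; 1.0 for a perfect disk."""
    area = len(pixels)
    cy = sum(r for r, _ in pixels) / area
    cx = sum(c for _, c in pixels) / area
    # Polar moment of inertia about the centroid; each unit pixel contributes 1/6 itself.
    inertia = sum((r - cy) ** 2 + (c - cx) ** 2 for r, c in pixels) + area / 6.0
    return area ** 2 / (2.0 * math.pi * inertia)

def filter_candidates(regions, pixel_size_m, min_area_m2, max_area_m2, min_compactness):
    """Return the names of regions that satisfy the size and compactness requirements."""
    kept = []
    for name, pixels in regions.items():
        area_m2 = len(pixels) * pixel_size_m ** 2
        if min_area_m2 <= area_m2 <= max_area_m2 and compactness(pixels) >= min_compactness:
            kept.append(name)
    return kept

def disk(radius):
    """Pixels of a digital disk centered on the origin."""
    return [(r, c) for r in range(-radius, radius + 1)
            for c in range(-radius, radius + 1) if r * r + c * c <= radius * radius]

def rectangle(height, width):
    """Pixels of a filled rectangle."""
    return [(r, c) for r in range(height) for c in range(width)]

# Hypothetical bright regions standing in for max-tree nodes.
regions = {
    "tank": disk(20),
    "square_roof": rectangle(40, 40),
    "road_segment": rectangle(10, 150),
}
PIXEL_SIZE_M = 0.31  # assumed panchromatic pixel size

strict = filter_candidates(regions, PIXEL_SIZE_M, 100, 3500, 0.97)
relaxed = filter_candidates(regions, PIXEL_SIZE_M, 100, 3500, 0.80)
print(strict)   # ['tank']
print(relaxed)  # ['tank', 'square_roof']
```

Under this definition a filled square scores 3/π (about 0.955), so it is rejected at 0.97 but slips through at 0.8. That is the recall-versus-noise trade-off in miniature.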
If we want to increase recall, we can decrease the minimum compactness. Here is the result with the minimum compactness set to 0.8:
We’ve picked up most of the tanks and a bit of noise. Not really a problem: we can use our crowd to weed out the false positives, as we have successfully done in the past. You can imagine this workflow at scale: Protogen detects oil tank candidates on an entire strip, then the crowd cleans up the results. That is much faster than having the crowd scan the entire strip, and much more accurate than relying on Protogen alone.
Protogen also includes a vectorization module that produces a GeoJSON file with the bounding boxes of the detected oil tanks:
Having vectors makes it easy to count. According to Protogen, there are 133 oil tanks (give or take!) in this image segment.
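As a small illustration of why vectors make counting easy, here is a sketch that builds a GeoJSON FeatureCollection of bounding boxes and counts the features. The exact structure of Protogen's output file is not shown here, so the layout (one Polygon feature per detection), the helper names, and the coordinates are all assumptions made for the example.

```python
import json

def bbox_feature(min_lon, min_lat, max_lon, max_lat):
    """Build a GeoJSON Feature whose geometry is a rectangular bounding box."""
    # Exterior ring in counterclockwise order, closed by repeating the first position.
    ring = [
        [min_lon, min_lat],
        [max_lon, min_lat],
        [max_lon, max_lat],
        [min_lon, max_lat],
        [min_lon, min_lat],
    ]
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {},
    }

def count_detections(geojson_text):
    """Count the Polygon features in a GeoJSON FeatureCollection string."""
    collection = json.loads(geojson_text)
    return sum(
        1 for feature in collection["features"]
        if feature["geometry"]["type"] == "Polygon"
    )

# Made-up bounding boxes (lon/lat) standing in for detected tanks.
boxes = [
    (-95.2010, 29.7200, -95.2005, 29.7204),
    (-95.2000, 29.7200, -95.1995, 29.7204),
    (-95.1990, 29.7200, -95.1985, 29.7204),
]
collection = {"type": "FeatureCollection", "features": [bbox_feature(*b) for b in boxes]}
geojson_text = json.dumps(collection)

tank_count = count_detections(geojson_text)
print(tank_count)  # 3
```

The same count works on a file read from disk: pass its contents to count_detections instead of the string built here.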
GBDX makes it easy to run Protogen at scale. You can explore a full-resolution slippy map of oil tanks in Houston here. How about a different location? Want to find all the oil tanks in Cushing, OK?
The orderly spots to the north and south of the image center correspond to oil tanks, while the randomly scattered spots are noise. Check out this close-up:
You can find the full story here. We are currently working on improving the accuracy of our oil tank detector in two ways: by applying Protogen’s Land Use Land Cover classification method to the multispectral image to filter out false detections on soil and water, and by combining Protogen with machine learning. Stay tuned for updates!