Arpit Shah

Detecting Objects from Imagery using Deep Learning

Updated: Aug 22, 2021

Case 1: Detecting Building Footprint

A Building Footprint, as the name suggests, is an imprint of a constructed object on a surface. As with any object's imprint, one can determine the object's outline, features as well as area occupied from it. A building footprint data layer (comprising multiple building footprints across a geographic extent) acts as a foundation for several workflows - urban planning, insurance, transportation, security etc. In my previous blog post and video, we walked through the workflow to calculate rooftop solar potential in a locality using building footprint data. Essentially, if one knows what a building looks like from an aerial view, one can use Geoprocessing tools to identify how much sunlight the roof will receive over a duration of time.

Example: Building Footprints in a Locality (Source)
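To make the point about area concrete, here is a minimal sketch of how the area and perimeter of a single footprint could be derived from its outline. It assumes the outline is a simple polygon whose vertices are given in a projected coordinate system measured in metres; the function names and the L-shaped sample building are hypothetical and not taken from any GIS package.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def footprint_area(vertices: List[Point]) -> float:
    """Planar area of a simple polygon using the shoelace formula."""
    n = len(vertices)
    if n < 3:
        return 0.0
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the outline
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def footprint_perimeter(vertices: List[Point]) -> float:
    """Length of the closed outline."""
    n = len(vertices)
    total = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += math.hypot(x2 - x1, y2 - y1)
    return total


# Hypothetical L-shaped building, coordinates in metres (assumption).
l_shaped = [(0, 0), (20, 0), (20, 10), (10, 10), (10, 25), (0, 25)]

print(footprint_area(l_shaped))       # 350.0 square metres
print(footprint_perimeter(l_shaped))  # 90.0 metres
```

In practice a GIS computes these attributes for every polygon in the layer, and geographic (latitude/longitude) coordinates would first need to be projected for the result to be meaningful.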

Digitizing building footprints manually is a very tedious process and susceptible to errors. Imagine plotting the outline of a hundred buildings in your neighborhood manually on a computer! Thanks to advances in technology (higher resolution imagery, increase in computing / processing speed, object and pattern detection algorithms), performing this task is now more convenient and much faster, and it can be stunningly accurate for those who have the wherewithal.

Modern Mapping technology now integrates ready-to-use deep learning models to aid multiple workflows.


What is deep learning? It is a subset of Machine Learning - algorithms which improve automatically through experience. Recollect how, in my previous blog post, the Random Forest (ML) algorithm was used to identify deforestation, map crop types and predict voter turnout by feeding 'training data' to the model, on the basis of which it classified and predicted the output with a high level of accuracy.

A Deep Learning (ML) algorithm uses a method similar to the one our brain uses (neural networks) to arrive at a decision when faced with choices. Amazon, Google and Netflix use Deep Learning to show you recommended content on the basis of the training data you provide, i.e. your historical searches and set preferences. How often is the next song in your playlist, as recommended by a Spotify or a YouTube, so close to what you'd love to listen to at that very moment!

Source: Esri's Spatial Data Science MOOC

This article outlines 10+ fascinating applications of Deep Learning. In spatial applications, particularly imagery, Deep Learning is used for Object Detection, Instance Segmentation (identifying the boundary of each object detected), Image Classification (assigning an object to one of a set of predefined classes, say X, Y or Z) and Pixel Classification (Semantic Segmentation - identifying whether a pixel is from a desert, an ocean, a forested area and so on).


As with other Machine Learning algorithms, the more training data one provides, the more accurate the output tends to become.

One of the two topics covered in this blog is a ready-to-use deep learning model to extract building footprints (i.e. Object Detection) from a spatial dataset (satellite imagery). The model was trained on large quantities of U.S. imagery (30-60 cm resolution). Naturally, the model works best for building footprint detection and extraction in the U.S.; however, it is claimed to work reasonably well for other locations too.

Clicking on the image below will lead you to an engaging storymap on this topic where you can explore the DL model results for yourself.

Esri's ArcGIS Storymap containing samples of building footprints extracted. Image shows footprints extracted from Swedish imagery.


I would have preferred to try this Deep Learning model on an urban region in India. However, at the time of writing this post, I could only access a 30 cm resolution imagery sample over Madrid's Barajas Airport region from 2009 (from European Space Imaging), and hence used it to test the model.

Running the model at minimum settings produced the output below -

(The sliders below are best viewed on a PC.)

Output 1: Building Footprint Extraction using Esri's ready-to-use Deep Learning model

Output 2: Building Footprint Extraction using Esri's ready-to-use Deep Learning model


The outputs are appealing on two counts. Firstly, without my specifying what buildings look like in the European image under study, the model was able to use what it had learned about building types in the US to detect similar-looking objects here, differentiating them from the rest of the imagery by masking them out.

A total of 114 buildings were detected in Output 2. The area of each building is also captured.


It would be fair to assume that additional footprints would have been detected had I opted for more stringent parameters. Also, I didn't have access to the 'Regularize Building Footprint' geoprocessing tool, which would have lent more precision to the building outlines (Instance Segmentation).

That being said, from the model's perspective, as researchers feed it more training data and give it feedback on the output generated (in terms of false positives and false negatives), the Deep Learning model should evolve and become consistently better at what it does.
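Feedback in terms of false positives and false negatives is usually summarised as precision and recall. The sketch below shows that arithmetic under stated assumptions: the function name is hypothetical, and the counts are made-up illustrative numbers, not results from the Madrid test above.

```python
from typing import Tuple


def detection_scores(true_positives: int,
                     false_positives: int,
                     false_negatives: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for a set of detections.

    precision: share of detected objects that really are buildings
    recall:    share of real buildings that the model managed to detect
    """
    detected = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts only (assumption): 90 correct detections,
# 10 non-buildings flagged as buildings, 30 buildings missed.
precision, recall, f1 = detection_scores(90, 10, 30)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
# precision=0.90 recall=0.75 f1=0.82
```

A model that misses many buildings has low recall, while one that flags car parks or containers as buildings has low precision; tracking both is what makes the feedback useful.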


Case 2: Detecting Swimming Pools

The second case which we will cover in this article is fairly simple - we want to detect the number of newly constructed swimming pools in the city of Redlands, California. This information will be useful for tax assessors (a swimming pool within a property enhances its value and the resulting property taxes), who generally rely on 'infrequent' survey data for these calculations.

The methodology is largely similar to detecting building footprints - we identify training samples, load them into the model and validate it, and finally use the DL model to detect new swimming pools. A video of the workflow is below.

(The video is best viewed in YouTube's HD setting, either on a PC or in landscape mode on a mobile phone. You can reduce / increase the play speed as per your viewing preferences.)

Workflow for detecting Swimming Pools in the city of Redlands, California using Esri's Deep Learning model
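For the tax-assessor use case, the detections still have to be compared against existing records to find the pools that are actually new. The sketch below illustrates one simple way to do that, and it is not the method used in the video: it assumes each pool is reduced to a centroid in projected coordinates (metres), and the function name, the 5 m tolerance and the sample coordinates are all hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def find_new_pools(detected: List[Point],
                   recorded: List[Point],
                   tolerance_m: float = 5.0) -> List[Point]:
    """Return detected pools with no recorded pool within tolerance_m metres."""
    new_pools = []
    for dx, dy in detected:
        has_record = any(
            math.hypot(dx - rx, dy - ry) <= tolerance_m
            for rx, ry in recorded
        )
        if not has_record:
            new_pools.append((dx, dy))
    return new_pools


# Hypothetical centroids in metres (assumption, not Redlands data).
recorded_pools = [(100.0, 200.0), (350.0, 120.0)]
detected_pools = [(101.5, 199.0), (352.0, 121.0), (600.0, 480.0)]

new_pools = find_new_pools(detected_pools, recorded_pools)
print(new_pools)       # [(600.0, 480.0)]
print(len(new_pools))  # 1
```

The pairwise comparison is fine for a few thousand points; a city-wide dataset would normally rely on a spatial join or spatial index in the GIS instead.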

Isn't this fascinating?


'How strong can a Deep Learning Model get?', one may wonder.

A: Enough to beat the best humans in the business and go even further. I'd encourage you to watch this brilliant (yet scary) documentary - AlphaGo.

Intelloc Mapping Services is engaged in selling products which capture geo-data (Drones) and process geo-data (Geographic Information System), as well as services (PoI Datasets & Satellite Imagery). Together, these help organizations benefit from Geo-Intelligence for purposes such as operations improvement, project management and digitally enabled growth.




Many thanks to Esri & European Space Imaging for the training material.
