Automated Object Detection from Imagery using Deep Learning algorithm on GIS

Arpit Shah
Nov 21, 2020

Updated: Dec 8, 2025

SECTION HYPERLINKS

Extracting Building Footprint from Aerial/Satellite Optical Imagery
Extracting Swimming Pools from Aerial/Satellite Optical Imagery (With Video Demonstration)

In my previous post, I highlighted how Random Forest-based Machine Learning algorithms can be applied to geospatial datasets to map deforestation, classify agricultural land use, and predict voter turnout. The workflow was fairly consistent: feed the algorithm supervised training data, allow it to learn from the examples, and then let it sift through the remaining dataset using decision-tree logic to make fast and accurate predictions.

Another subset of Machine Learning—Deep Learning—pushes this capability several steps further. Deep Learning algorithms mimic the way our brain processes inputs and resolves choices: through Artificial Neural Networks.

What makes them so powerful is that their decision-making ability improves automatically as they interact with more data—without the need for explicit reprogramming.
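To make this idea concrete, here is a minimal sketch of a single artificial neuron (a perceptron) written in plain Python. It is a toy illustration, not how any production Deep Learning model is built: the dataset (the logical AND rule), the function names and the learning rate are all my own assumptions. The point is only that the same code gets better at its task after more exposure to labelled examples, without anyone rewriting its rules.

```python
# Toy single artificial neuron (a perceptron) learning the logical AND rule.
# The data, names and settings are hypothetical, chosen only for illustration.

def predict(weights, bias, inputs):
    """Output 1 if the weighted sum of the inputs is above zero, else 0."""
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0

def train(samples, epochs, learning_rate=1.0):
    """Perceptron rule: nudge the weights whenever a prediction is wrong."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in samples:
            error = label - predict(weights, bias, inputs)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias

def accuracy(weights, bias, samples):
    """Share of samples the neuron classifies correctly."""
    correct = sum(predict(weights, bias, x) == y for x, y in samples)
    return correct / len(samples)

# Labelled examples: the output should be 1 only when both inputs are 1.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

early = accuracy(*train(samples, epochs=1), samples)   # one pass over the data
later = accuracy(*train(samples, epochs=10), samples)  # ten passes over the data
print(early, later)
```

Real Deep Learning networks stack a very large number of such units in layers and use more sophisticated update rules, but the learn-from-examples loop is similar in spirit.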

Consider the recommendations made by YouTube or Spotify (two platforms with humongous repositories of content). The impressive aspect is not just that they infer one’s preferences from historical behaviour; it is their ability to suggest entirely new content that is surprisingly appealing, often at precisely the right moment. This is Deep Learning at work (the two hyperlinks in this paragraph redirect to relevant literature).

Read more about the applications of Deep Learning here.

The video below is an excellent explainer of Neural Networks:

Video 1: How Artificial Neural Networks function on quantitative data and images (Source: Esri's Spatial Data Science MOOC)

Modern geospatial platforms include ready-to-use Deep Learning Models capable of processing imagery at scale, particularly for:

Object Detection: identifying all objects in a scene by drawing bounding boxes and labeling them
Semantic Segmentation: classifying each pixel in an image (e.g., buildings, vegetation, water, roads). In a subsequent post, I demonstrate the use of Deep Learning to classify Power Lines.
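The difference between the two tasks is easiest to see in the shape of their outputs. The sketch below uses made-up values and hypothetical names (Detection, CLASS_NAMES, pixels_per_class); it does not reflect the output format of any particular GIS package.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class Detection:
    """One detected object: a label, a confidence score and a bounding box."""
    label: str
    confidence: float
    box: tuple  # (x_min, y_min, x_max, y_max) in pixel coordinates

# Object Detection output: a list of labelled boxes, one per object found.
detections = [
    Detection("building", 0.92, (0, 0, 2, 2)),
    Detection("swimming pool", 0.81, (3, 1, 4, 2)),
]

# Semantic Segmentation output: one class code for every pixel of the image.
CLASS_NAMES = {0: "vegetation", 1: "building", 2: "water"}
mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 2],
    [0, 0, 0, 0],
]

def pixels_per_class(mask):
    """Count how many pixels were assigned to each class."""
    counts = Counter(code for row in mask for code in row)
    return {CLASS_NAMES[code]: n for code, n in counts.items()}

object_count = len(detections)        # detection answers "how many, and where?"
class_pixels = pixels_per_class(mask)  # segmentation answers "what is each pixel?"
print(object_count, class_pixels)
```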

Other Computer Vision variants, such as Instance Segmentation, Panoptic Segmentation, and Image Classification, offer further capabilities at different levels of granularity.

In this post, I will demonstrate Deep Learning–based Object Detection through two workflows:

Extracting Building Footprints
Extracting Swimming Pools

Workflow 1: Extracting Building Footprint from Aerial/Satellite Optical Imagery

Building Footprint layers outline the shape of all buildings within a geographic area. From these outlines, one can derive attributes such as roof area or building density and, when combined with elevation data, building height. These layers are essential inputs to workflows across Urban Planning, Property Insurance, Public Transit Expansion, Emergency Response and even Solar Power Assessments.
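As a rough illustration of how attributes fall out of the outlines, the sketch below computes footprint area with the shoelace formula and a simple building coverage ratio. The footprints, the site area and the function name are hypothetical, and the coordinates are assumed to be in a projected coordinate system measured in metres. GIS software computes these attributes for you; this is only to show the arithmetic.

```python
def polygon_area(vertices):
    """Shoelace formula. Vertices are (x, y) pairs in metres, listed in order."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# Hypothetical footprints in a projected coordinate system (metres).
footprints = {
    "house": [(0, 0), (10, 0), (10, 8), (0, 8)],  # 10 m x 8 m rectangle
    "l_shaped_shed": [(20, 0), (26, 0), (26, 3), (23, 3), (23, 6), (20, 6)],
}

areas = {name: polygon_area(outline) for name, outline in footprints.items()}

site_area_m2 = 1000.0  # assumed area of the neighbourhood block
coverage_ratio = sum(areas.values()) / site_area_m2  # share of land under roofs
print(areas, coverage_ratio)
```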

Figure 1: Building Footprint layer of a neighborhood, used in the Rooftop Solar Power Potential study

Digitizing buildings by hand, even with semi-automated tools, is slow, expensive, and error-prone. Deep Learning can automate much of this process with high accuracy and speed, thanks to:

High-resolution imagery
Advances in computing hardware
GIS software that integrates Deep Learning models

Accuracy still depends on training data quality and processing parameters, but the results can be impressive.
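To give a sense of what "processing parameters" can mean in Object Detection, two common ones are a minimum confidence score and a maximum allowed overlap between detections (applied through a step called non-maximum suppression). The sketch below is a generic, simplified version with made-up boxes and hypothetical function names; it is not the internal logic of any specific tool.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def filter_detections(detections, min_confidence, max_overlap):
    """Drop weak boxes, then greedily keep the best box among overlapping ones."""
    candidates = sorted(
        (d for d in detections if d[0] >= min_confidence),
        key=lambda d: d[0],
        reverse=True,  # strongest detections are considered first
    )
    kept = []
    for score, box in candidates:
        if all(iou(box, kept_box) <= max_overlap for _, kept_box in kept):
            kept.append((score, box))
    return kept

# Hypothetical raw detections: (confidence, box).
raw = [
    (0.95, (0, 0, 10, 10)),
    (0.80, (1, 1, 11, 11)),    # near-duplicate of the first box
    (0.60, (20, 20, 30, 30)),
    (0.30, (40, 40, 45, 45)),  # weak detection
]

loose = filter_detections(raw, min_confidence=0.2, max_overlap=0.9)
tight = filter_detections(raw, min_confidence=0.5, max_overlap=0.5)
print(len(loose), len(tight))
```

Stricter settings remove duplicates and weak detections, at the risk of discarding some genuine buildings, which is why these values usually need tuning per image.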

Esri - the world's leading GIS software developer - has published a ready-to-use Deep Learning model for extracting Building Footprints. It was trained on labelled buildings from large volumes of high-resolution (10–40 cm) imagery across the USA. Naturally, the model performs best on US-like building structures, but it also generalizes reasonably well to developed regions with similar construction styles.

Below is an example from Esri’s StoryMap (image is hyperlinked to it):

Figure 2: Building Footprint extraction at a location in Sweden using Esri’s ready-to-use Deep Learning Model (image from the hyperlinked ArcGIS StoryMap)

I tested the model on a 30 cm optical satellite image over the Barajas Airport region in Madrid, Spain (2009 imagery from European Space Imaging). After running the model with conservative parameters, here is the output:

(Sliders best viewed on large screens)

Slider 1: Building Footprint extractions near Barajas Airport, Madrid

Slider 2: Building Footprint extractions near Barajas Airport, Madrid

Figure 3: 114 buildings detected and demarcated near Barajas Airport in Madrid, Spain by Esri's ready-to-use Deep Learning model, with dimensional attributes added using ArcGIS Pro

While the output is fairly accurate, tightening the parameters would likely have improved the extraction quality further. Additional geoprocessing tools, such as Regularize Footprint, also help refine building outlines (refer to the related video demonstration in this post of mine).

As Esri incorporates more diverse training data and as users report false positives and false negatives, the Deep Learning model will continue to improve.
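False positives and false negatives can be quantified by comparing detections against a manually verified reference layer. The sketch below is one simple way to do so: it matches boxes by overlap and reports precision and recall. The boxes, the 0.5 overlap cut-off and the function names are my own assumptions for illustration, not figures from the Madrid test.

```python
def box_iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def score_detections(predicted, reference, min_iou=0.5):
    """Greedily match each predicted box to its best unmatched reference box."""
    unmatched = list(reference)
    true_positives = 0
    for box in predicted:
        best = max(unmatched, key=lambda ref: box_iou(box, ref), default=None)
        if best is not None and box_iou(box, best) >= min_iou:
            true_positives += 1
            unmatched.remove(best)  # each reference building is matched once
    return {
        "true_positives": true_positives,
        "false_positives": len(predicted) - true_positives,  # detected, not real
        "false_negatives": len(unmatched),                   # real, but missed
        "precision": true_positives / len(predicted) if predicted else 0.0,
        "recall": true_positives / len(reference) if reference else 0.0,
    }

# Hypothetical reference buildings and model detections.
reference = [(0, 0, 10, 10), (20, 0, 30, 10), (40, 0, 50, 10)]
predicted = [(1, 0, 11, 10), (21, 1, 31, 11), (60, 60, 70, 70)]

report = score_detections(predicted, reference)
print(report)
```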

Workflow 2: Extracting Swimming Pools from Aerial/Satellite Optical Imagery

In this workflow, I extract swimming pools from high-resolution aerial imagery over Redlands, California (Credit: Esri Learn ArcGIS). The resulting layer is highly valuable for Tax Assessors, who routinely update property valuations based on improvements such as swimming pools.

Current practice largely relies on manual, infrequent surveys—which makes automated Object Detection a transformative capability.
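One way such a layer could feed an assessor's workflow is to overlay detected pools on parcel boundaries and flag parcels where a pool is detected but none is on record. The sketch below uses a standard ray casting point-in-polygon test; the parcels, the pool_on_record field and the pool locations are entirely hypothetical.

```python
def point_in_polygon(point, polygon):
    """Ray casting: count the polygon edges crossed by a ray going right."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # an odd number of crossings means inside
    return inside

# Hypothetical parcels with an assumed "pool_on_record" attribute.
parcels = {
    "parcel_A": {"outline": [(0, 0), (20, 0), (20, 20), (0, 20)],
                 "pool_on_record": True},
    "parcel_B": {"outline": [(20, 0), (40, 0), (40, 20), (20, 20)],
                 "pool_on_record": False},
    "parcel_C": {"outline": [(40, 0), (60, 0), (60, 20), (40, 20)],
                 "pool_on_record": False},
}

# Centre points of pools detected by the model (made-up coordinates).
detected_pool_centres = [(5, 5), (30, 12)]

def parcels_to_review(parcels, pool_centres):
    """Return parcels where a pool was detected but none is on record."""
    flagged = []
    for parcel_id, parcel in parcels.items():
        has_pool = any(point_in_polygon(c, parcel["outline"]) for c in pool_centres)
        if has_pool and not parcel["pool_on_record"]:
            flagged.append(parcel_id)
    return flagged

review_list = parcels_to_review(parcels, detected_pool_centres)
print(review_list)
```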

The methodology mirrors the building-extraction workflow:

Provide supervised samples (labelled swimming pools)
Train the Deep Learning model
Validate model parameters
Run the model on the full geographic area
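The last step in the list above, running the model over the full area, is commonly handled by cutting the large image into smaller overlapping tiles and processing them one by one. Here is a minimal sketch of how such tile windows could be generated. The image dimensions, tile size and overlap are assumed values, and tile_windows is a hypothetical helper rather than a function from any GIS library.

```python
def tile_starts(length, tile_size, stride):
    """Start offsets along one axis so that tiles cover the whole length."""
    starts = [0]
    while starts[-1] + tile_size < length:
        starts.append(starts[-1] + stride)
    return starts

def tile_windows(width, height, tile_size, overlap):
    """Yield (x_min, y_min, x_max, y_max) pixel windows covering the image."""
    if not 0 <= overlap < tile_size:
        raise ValueError("overlap must be non-negative and smaller than tile_size")
    stride = tile_size - overlap  # neighbouring tiles share `overlap` pixels
    for y in tile_starts(height, tile_size, stride):
        for x in tile_starts(width, tile_size, stride):
            # Clip tiles at the right and bottom edges of the image.
            yield (x, y, min(x + tile_size, width), min(y + tile_size, height))

# Hypothetical 1000 x 600 pixel image, 512-pixel tiles, 64-pixel overlap.
windows = list(tile_windows(1000, 600, tile_size=512, overlap=64))
print(len(windows), windows[0], windows[-1])
```

The overlap gives objects that straddle a tile boundary a chance to appear whole in at least one tile; duplicate detections in the shared strip are then typically cleaned up afterwards.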

The video below demonstrates the process:

(Best viewed in HD on desktop or landscape mode on mobile)

Video 2: Extracting Swimming Pools from Aerial Imagery in Redlands, California using Esri’s Deep Learning Model

Isn’t the utility of Deep Learning remarkable? I hope to re-run this model in the future to show you how accuracy improves over time as the underlying neural network is refined.

Even at this relatively early stage, some Deep Learning systems already surpass human capability in certain domains. If you haven’t watched it, I highly recommend the documentary AlphaGo, which showcases Google DeepMind's AI defeating world champion Lee Sedol in the strategy game Go—an eye-opening example of what these algorithms can achieve.

ABOUT US - OPERATIONS MAPPING SOLUTIONS FOR ORGANIZATIONS

Intelloc Mapping Services, Kolkata | Mapmyops.com offers a suite of Mapping and Analytics solutions that seamlessly integrate with Operations Planning, Design, and Audit workflows. Our capabilities include — but are not limited to — Drone Services, Location Analytics & GIS Applications, Satellite Imagery Analytics, Supply Chain Network Design, Subsurface Mapping and Wastewater Treatment. Projects are executed pan-India, delivering actionable insights and operational efficiency across sectors.

My firm's services can be split into two categories: Geographic Mapping and Operations Mapping. Our range of offerings is listed in the infographic below:

Range of solutions that Intelloc Mapping Services (Mapmyops.com) offers

A majority of our Mapping for Operations-themed workflows (50+) can be accessed from this website's landing page. We respond well to documented queries/requirements. Demonstrations/PoCs can be facilitated on a paid basis. Looking forward to being of service.

Regards,

Arpit Shah

Credits: Esri, European Space Imaging