I have fond memories of playing the Hot & Cold game as a kid. Perhaps you do, too. One player hides a tiny object somewhere in the house and asks siblings and friends to find it within a stipulated time. When asked, the hider gives a verbal clue - 'Hot', 'Very Hot', 'Very Cold', 'Not Hot-Not Cold' and other variations - to tell the seeker how close he/she is to the hidden object, with hot implying near and cold implying distant. The joy of hiding the object securely or spotting it in quick time was immense. Funnily enough, there were frequent disagreements if the seeker found the (subjective) call of proximity to be inaccurate or misleading!
We will use a similar Hot and Cold technique, albeit statistical in nature, to understand Vehicle Accident trends in a particular US county spatially and spatiotemporally on a modern mapping platform - Geographic Information System, commonly known as GIS.
The spatiotemporal method of analysis is a powerful way of making sense of geodata because it brings two dimensions (space and time) together in a single visualization. While we'll get to that in due course, you may want to refer to examples of unidimensional workflows from the following video links - spatial & temporal.
For our vehicle accidents case, we will use the Hot Spot Analysis tool & the Space-Time Cube tool. I recommend watching the two videos below to get a clear understanding of the methodology involved.
Video 1: Hot Spot Analysis in GIS
Video 2: Space-Time Cube in GIS
Interesting, isn't it?
Applying Location Analytics to Vehicle Accidents Records
I have attempted to explain this topic in detail. You may choose to watch the video compilation below if you are keener to see clips of the technology at work than to read about the concepts involved. (I'd recommend reading the article first, followed by the video if you'd like to explore further.)
Video 3: Walkthrough on deploying Location Analytics on Vehicle Accidents Records
Let's begin.
First, we will load more than 100,000 (1 lakh) vehicle accident records from 2010 to 2015 onto the mapping platform. Aside from the positional information, i.e. the exact coordinates of the accidents, several attributes of each accident are available, such as the date and time of the accident, the number of fatalities and injuries if any, whether the driver was under the influence of alcohol or was distracted at the time of the accident, and the weather conditions at the time of the accident.
These attributes appear very standard in nature. One expects that such records are captured at every major urban centre by the local law enforcement agencies.
Because the data we possess contains positional information i.e. the coordinates, we can plot it on a mapping platform - Esri's ArcGIS (Geographic Information System).
Alongside this accident data, we have another crucial piece of information stored as a separate information layer - the digitized road network of the Area of Interest (AoI).
Again, this is a standard piece of information expected to be available with law enforcement agencies worldwide.
The next step involves restructuring the data into a Space-Time Cube. In simple words, just as we format plain data into a pivot table in Microsoft Excel to lend a more meaningful structure to information, the GIS software arranges the multiple, complex data points into individual spatiotemporal buckets or Bins using the Space-Time Cube.
Each individual bucket of information (Bin) in the space-time cube will aggregate information pertaining to 2 miles of the territory (spatial) across 16 weeks i.e. 4 months of data (temporal).
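To make the binning idea concrete, here is a minimal sketch in Python. It is not how the GIS tool is implemented: it assumes coordinates already expressed in miles on a flat plane and uses a simple square grid (the actual output in this article uses hexagons). The record values and the function names `bin_key` and `build_cube` are hypothetical.

```python
from collections import Counter
from datetime import date
from math import floor

BIN_SIZE_MILES = 2.0   # spatial extent of one bin, as stated in the article
STEP_DAYS = 16 * 7     # one time step = 16 weeks, as stated in the article

def bin_key(x_miles, y_miles, when, start):
    """Return the (column, row, time_step) bin that one accident falls into."""
    col = floor(x_miles / BIN_SIZE_MILES)
    row = floor(y_miles / BIN_SIZE_MILES)
    step = (when - start).days // STEP_DAYS
    return (col, row, step)

def build_cube(records, start):
    """Count accidents per space-time bin (square grid, an assumption)."""
    return Counter(bin_key(x, y, d, start) for x, y, d in records)

# Hypothetical records: (x in miles, y in miles, accident date)
records = [
    (0.5, 1.2, date(2010, 1, 5)),
    (1.9, 0.1, date(2010, 3, 20)),
    (2.1, 0.1, date(2010, 3, 20)),
    (0.7, 1.8, date(2010, 6, 1)),
]
cube = build_cube(records, start=date(2010, 1, 1))
for key, count in sorted(cube.items()):
    print(key, count)
```

Each key identifies one bin in space and time, and the count is the value that the later hot spot steps would work with.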
Unlike a pivot table in MS Excel, the space-time cube is not visualized in the software. Instead, an output summary is available for us to review, while the output file is stored in the system and used as an input in the subsequent workflow, as we shall observe later on.
The single paragraph above gives us a good summary about our geodata and how it has been arranged spatiotemporally in the mapping platform.
The space-time cube forms an integral part of our next workflow - the Emerging Hot Spot analysis.
Emerging Hot Spots do not represent the density of accidents. Rather, they capture the 'trend' of accidents in a spatial area over a period of time and categorize it according to its statistical significance, i.e. how unlikely it is that the observed pattern arose from random chance alone.
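Esri's documentation describes the trend component of Emerging Hot Spot Analysis as a Mann-Kendall test run on each bin's time series. A minimal sketch of that statistic is given below, assuming a series without tied values (the tie correction is omitted for brevity). The yearly counts are hypothetical and the function name `mann_kendall_z` is my own.

```python
from math import sqrt

def mann_kendall_z(series):
    """Mann-Kendall trend z-score for a time-ordered series (no tie correction)."""
    n = len(series)
    s = 0
    # Compare every earlier value with every later value.
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)   # +1 if rising, -1 if falling, 0 if equal
    variance = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        return (s - 1) / sqrt(variance)
    if s < 0:
        return (s + 1) / sqrt(variance)
    return 0.0

# Hypothetical accident counts for one bin across six time steps
rising = [4, 6, 7, 9, 12, 15]
flat = [5, 5, 5, 5, 5, 5]
print(round(mann_kendall_z(rising), 3))
print(mann_kendall_z(flat))
```

A z-score far from zero suggests a steadily rising or falling trend, while a value near zero suggests no consistent trend. It says nothing about why the trend exists.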
At first, we will deploy Emerging Hot Spot Analysis on just the 'Count' of Vehicle Accidents over a single neighborhood time-step (technical note captured in the image below).
The output of the Emerging Hot Spot Analysis is depicted below. Read the map legend on the left to understand the hexagon symbology-
The Emerging Hot Spot output table below indicates to us that there are 2 new hot spots, 17 consecutive hot spots, 59 sporadic hot spots and 13 oscillating hot spots in our study area.
To know more about what each hexagon pattern means, you may read the infographic below -
The pattern most commonly found in our Emerging Hot Spot output is the Sporadic Hot Spot. It means that the spatial bin under observation repeatedly switches between being a statistically significant hot spot and not being one.
One can presume that, given the context of our topic, the New, Persistent & Intensifying hot spots are the ones which would capture the immediate attention of law enforcement agencies.
Some of you would have noticed that as part of our Emerging Hot Spot Analysis, we did not factor in the Road Network layer available with us. Yes, that is true: the 2-mile spatial distance within each Bin is Euclidean, i.e. based on straight-line computation, and does not account for distance along the existing Road Network. Factoring in the Road Network would lead to more accurate analysis and improve our interpretation of it.
After all, if I were to ask you how far you can travel by car in 45-50 minutes, which representation would be more accurate - the Euclidean one on your left or the Road Network factored one on your right in the depiction below?
The depiction to your right is more accurate as it mimics the real-world scenario more closely. One can only travel as much in 45 minutes as the existing Road Network allows us to.
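The difference between the two notions of distance can be sketched with a toy example. The three-node road layout below is hypothetical, and the shortest-path routine is a plain Dijkstra implementation, not the GIS platform's network solver.

```python
import heapq
from math import hypot, inf

# Hypothetical junctions (coordinates in miles) and the roads joining them
coords = {"A": (0, 0), "B": (3, 0), "C": (3, 4)}
roads = [("A", "B"), ("B", "C")]

def euclidean(u, v):
    """Straight-line distance between two junctions."""
    (x1, y1), (x2, y2) = coords[u], coords[v]
    return hypot(x2 - x1, y2 - y1)

def build_graph(road_list):
    """Adjacency list with road lengths as edge weights."""
    graph = {name: [] for name in coords}
    for u, v in road_list:
        length = euclidean(u, v)
        graph[u].append((v, length))
        graph[v].append((u, length))
    return graph

def network_distance(graph, source, target):
    """Shortest distance along the roads (Dijkstra's algorithm)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, inf):
            continue  # stale heap entry
        for neighbour, length in graph[node]:
            nd = d + length
            if nd < dist.get(neighbour, inf):
                dist[neighbour] = nd
                heapq.heappush(heap, (nd, neighbour))
    return inf  # target not reachable by road

graph = build_graph(roads)
straight = euclidean("A", "C")
by_road = network_distance(graph, "A", "C")
print(straight, by_road)
```

In this layout the straight line from A to C is 5 miles, but a car has to drive 7 miles via B, which is the gap the Road Network factored view captures.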
So how can we analyze Vehicle Accident spots factoring in the Road Network?
Before we proceed, we need to pre-process our geodata, as certain anomalies are present. Observe from the image below that some of the accident locations (red dots) do not fall directly on roads - rather, they are located outside the road boundaries. It could be that the location recorded is where the vehicle came to rest after the accident and not where the accident actually occurred. Or it could be a case of mistaken record-taking, faulty GPS calibration etc.
To correct this, we use the Snap tool in GIS wherein we command the mapping platform to link any accident data points within 0.25 miles of the road network to the nearest road.
This leads to a shift of the outlier accident spots to within the road boundary.
The revised output (below) corrects the anomaly - now virtually all the accident spots are located within the Road Network...
… which therefore allows us to integrate the two layers - Accident Spots and Road Network seamlessly, by using the Spatial Join tool.
Now, the Accident geo-data appears to be properly structured and directly linked to the Road Network.
We are ready to do another Hot Spot Analysis now...
Or are we?...
Unfortunately, no. Longer roads will have more accidents assigned to them simply because they are longer, so the hot spot output will be biased towards them. This isn't correct and will hamper the quality of our interpretation.
To correct for this bias in the geodata, we will first compute the 'Crash Rate per mile, per year'.
The Crash Rate is now decoupled from the length of the road. The newly computed data column Crash_Rate is added to the extreme right in the attribute table below.
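As a small worked example of the normalization, the sketch below divides the crash count by road length and by the number of years covered (2010 to 2015, i.e. 6 years). The two road segments and their figures are hypothetical, and this is my reading of 'per mile, per year', not a formula copied from the tool.

```python
YEARS_COVERED = 6   # 2010 to 2015 inclusive

def crash_rate(crash_count, length_miles, years=YEARS_COVERED):
    """Crashes per mile per year for one road segment."""
    if length_miles <= 0 or years <= 0:
        raise ValueError("length_miles and years must be positive")
    return crash_count / (length_miles * years)

# Hypothetical segments: the long road has more crashes in total...
long_road = crash_rate(crash_count=60, length_miles=5.0)
short_road = crash_rate(crash_count=30, length_miles=0.5)
# ...but the short road is far more crash-prone per mile.
print(long_road, short_road)
```

Here the long road works out to 2 crashes per mile per year and the short road to 10, even though the long road recorded twice as many crashes overall.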
Now we are ready to perform the Hot Spot Analysis. This time we will not use the Emerging Hot Spot Analysis tool. Instead, we will use the Hot Spot Analysis (Getis-Ord Gi*) tool, as we want our analysis to capture the spatial relationships within the road network as well.
To explain it simply, we want to assign weights not just based on the recorded location of the vehicle after the accident but also to the entire section of the road where the accident sequence would have played out (driver spotting a person / vehicle on the road ---> hitting the brakes ----> hitting the person / another vehicle ----> vehicle eventually halting).
The technical note for this tool reads - "To keep the crash hot spots local, the Impedance Distance Cutoff parameter was set to 360 feet (about the length of a football field), which is the minimum stopping sight distance for a vehicle traveling 45 mph."
In case you are interested, you may read the detailed concept note here.
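For readers who want to see the statistic itself, here is a minimal sketch of the standard Getis-Ord Gi* z-score. It makes two simplifying assumptions that differ from the workflow above: neighbours get a binary weight (1 inside the cutoff, 0 outside), and distance is measured along a single straight road instead of across a road network. The crash rates and positions are hypothetical.

```python
from math import sqrt

def getis_ord_gi_star(values, positions, cutoff):
    """Gi* z-score per feature, using binary weights within a distance cutoff."""
    n = len(values)
    mean = sum(values) / n
    std = sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for i in range(n):
        # Weight 1 for every feature within the cutoff (including feature i itself)
        weights = [1.0 if abs(positions[i] - positions[j]) <= cutoff else 0.0
                   for j in range(n)]
        sum_w = sum(weights)
        sum_w_sq = sum(w * w for w in weights)
        numerator = sum(w * v for w, v in zip(weights, values)) - mean * sum_w
        denominator = std * sqrt((n * sum_w_sq - sum_w ** 2) / (n - 1))
        scores.append(numerator / denominator if denominator > 0 else 0.0)
    return scores

# Hypothetical crash rates for road sections spaced 300 feet apart
positions_ft = [0, 300, 600, 900, 1200, 1500, 1800, 2100]
rates = [1, 1, 1, 9, 10, 9, 1, 1]
scores = getis_ord_gi_star(rates, positions_ft, cutoff=360)
for pos, z in zip(positions_ft, scores):
    print(pos, round(z, 2))
```

A section scores high only when it and its neighbours within the cutoff all have high crash rates, which is what separates a hot spot from a single bad data point. A z-score above roughly 1.96 corresponds to the usual 95% confidence level.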
Now that we've run the Hot Spot Analysis (Getis-Ord Gi*) Tool, you may see a cross-section of its output below-
Now, the hot spots are aligned with the road network (do not appear as hexagons as they did previously), allowing for more meaningful interpretation.
Next, we'll deep dive further and analyze hot spots for specific variables beginning with only those vehicle accidents which led to fatalities. We will use the same workflow as above, just the geodata is filtered to capture only those accidents which led to fatalities.
The Fatality hot spot output is naturally different from the All Accidents hot spot output.
The GIS platform allows us to compare both the outputs visually. See the All Accidents Hot Spot output (Left) v/s Accidents involving Fatalities Hot Spot output (Right) comparison from the depiction below-
This comparison is very illuminating. Some hot spots have emerged at new locations in the image on the right, which law enforcement has to pay close attention to. You would appreciate that running the hot spot analysis on a specific variable (fatality) brought to the fore certain areas of trouble which were diluted in the All Accidents hot spot and hence weren't visible there. Even within the hot spot output of the All Accidents analysis, we are able to narrow down the sections which are more fatality-prone. Naturally, several sections are not hot spots at all in the image on the right, as there was no statistically significant clustering of fatal accidents there - perhaps these are less troublesome roads and can be given second priority by law enforcement agencies and policy makers.
Similarly, we will use the same workflow to compare All Accident Hot Spots (left) to Accident Hot Spots where the driver was under the influence of Alcohol (right) from the depiction below -
A clear indication of river-side partying?
I hope you can appreciate how powerful the spatiotemporal hot spot analysis can be to develop a deep understanding of the accident trends. The analytical output can be useful for a wide variety of stakeholders from law enforcement agencies and policy makers to vehicle manufacturers and general public.
Do note that the quality of the output is dependent on the quality of the geodata captured.
I cannot emphasize this enough: organizations, especially in India, should lay stress on improving the quantity and quality of the geodata they capture.
Our next sequence of analysis is to demonstrate the power of modern map-based analytics where we can micro-analyze the accident geo-datasets at even greater depth and at much faster speeds.
So, after running variable-specific Hot Spot Analyses, the next questions you may ask are - during which hours of the day do vehicle accidents peak, and what do the hot spots for those hours look like compared to the original All Accidents hot spots?
Luckily, aside from doing map based analytics, modern GIS platforms are adept at doing chart and table based analytics just as we do on spreadsheet based platforms such as Microsoft Excel.
The GIS has created a line chart for us below. What trends can you observe?
Do the trends become more evident / visible in the differently color-coded line chart below?
Hours 1500 - 1700 (3 pm - 5 pm) on Weekdays (Monday-Friday) are the peak times for vehicle accidents.
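The same tabulation, and the time filter we will apply next, can be sketched outside the GIS as well. The timestamps below are hypothetical, and the filter treats 1500 - 1700 as the hours starting at 15:00 and 16:00, which is my assumption about how the window is bounded.

```python
from collections import Counter
from datetime import datetime

def is_weekday_peak(ts, start_hour=15, end_hour=17):
    """True for Monday-Friday accidents between start_hour and end_hour."""
    return ts.weekday() < 5 and start_hour <= ts.hour < end_hour

# Hypothetical accident timestamps
timestamps = [
    datetime(2014, 6, 2, 15, 30),   # Monday afternoon
    datetime(2014, 6, 2, 16, 10),   # Monday afternoon
    datetime(2014, 6, 3, 9, 0),     # Tuesday morning
    datetime(2014, 6, 4, 15, 45),   # Wednesday afternoon
    datetime(2014, 6, 7, 15, 30),   # Saturday afternoon
]

by_hour = Counter(ts.hour for ts in timestamps)
peak_records = [ts for ts in timestamps if is_weekday_peak(ts)]
print(by_hour.most_common(1))
print(len(peak_records))
```

Only the records that pass the filter would be fed into the peak-hours hot spot run.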
Using this discovery, we can fine-tune our hot spot analysis for this time filter and re-apply the same workflow. However, this time we will not proceed to do it one-step-at-a-time. Instead, we can replicate it by creating and executing a Geo-Processing Model as depicted below-
Such models are easy to configure and require minimal coding. As you would gauge, having such geo-analysis models allows us to have faster and error-free re-runs of validated geo-workflows. It saves enormous time and effort - location intelligence at the click of a button!
The depiction below compares All Accidents Hot Spot (left) to the output of the Accidents Hot Spot during Peak Hours of Weekdays (right) geo-model. The new output gives us fresh insights about vehicle accident patterns and when their probability of occurrence is at its maximum.
Those who aren't yet genuinely impressed by the capability of modern map-based location intelligence platforms are sure to be blown away by the final analysis step I'll demonstrate up next. After all, what good is Location Analytics in 2D when the option to do Spatiotemporal Analysis in 3D is available to us!
That being said, we can use the Spatiotemporal 3D Hot Spot analysis tool only if we have appropriate questions to ask the Geographic Information System (GIS). In the current context, we can run this tool if we wish to know the trends of Vehicle Accidents during Peak Hours on Weekdays on a Year-on-Year basis. To visualize the output, there are two geo-processing models which we would need to configure-
a) Yearly Hot Spot Maps for each year from 2010 - 2015 (6 years)
b) This output (6 yearly hot spot maps) flows as one of the inputs in our next model where we execute the Peak Hours during Week Days geo-processing model -
These models may appear complex; however, we have just 'codified' the existing workflows which I took you through earlier in this article. The only genuinely complex part is rendering the output in 3D and luckily, it is the GIS platform which has to do this bit of geoprocessing and not us!
3D Visualization of the Year-on-Year Vehicle Accidents during Peak Hours on Weekdays Model Output
Struggling to make sense of the red bars in the image above?
See the image above. At the intersection on Prospect Avenue lies the depiction of a sporadic hot spot - the type most common as per our hot spot analysis. The first year is right at the bottom and is statistically very significant (dark red) i.e. it represents a strong hot spot for vehicle accidents. In the second year, the hot spot vanishes. In the third year, the hot spot is statistically significant, yet weaker (light red) than the first year. In the fourth year, the hot spot vanishes again only to return in full force (dark red) in the fifth year and vanish in the sixth year.
Hope you can interpret the 3D visualization now and also understand the importance of seeing a trend in a spatiotemporal dimension on a 3D platform.
Using the same information that I've mentioned, can you interpret the following? -
And this?
And this?
Thank you for reading! Hope you found this content to be appealing.
(Much thanks to Esri authors Lauren Scott Griffith & Lixin Huang for developing this concept in their tutorial).
ABOUT US
Intelloc Mapping Services | Mapmyops.com is based in Kolkata, India and engages in providing Mapping solutions that can be integrated with Operations Planning, Design and Audit workflows. These include but are not limited to - Drone Services, Subsurface Mapping Services, Location Analytics & App Development, Supply Chain Services & Remote Sensing Services. The services can be rendered pan-India, some even globally, and will aid an organization to meet its stated objectives especially pertaining to Operational Excellence, Cost Reduction, Sustainability and Growth.
Broadly, our area of expertise can be split into two categories - Geographic Mapping and Operations Mapping. The Infographic below highlights our capabilities.
Our 'Mapping for Operations'-themed workflow demonstrations can be accessed from the firm's Website / YouTube Channel and an overview can be obtained from this flyer. Happy to address queries and respond to documented requirements. Custom Demonstration, Training & Trials are facilitated only on a paid-basis. Looking forward to being of service.