Studying Vehicle Accidents - the Location Analytics involved
Updated: May 7, 2022
I have very fond memories of playing the Hot and Cold game as a kid. Perhaps you do, too. You hide a tiny object somewhere in the house and ask your siblings and friends to find it within a stipulated time. When asked, you give a verbal clue ('Hot', 'Very Hot', 'Very Very Cold', 'Not Hot-Not Cold') to tell the seeker how close he/she is to the hidden object: hot means near and cold means distant. The joy of hiding the object securely, or of spotting it in quick time, was immense. However, there were frequent disagreements whenever the seeker found the (subjective) call of closeness to be inaccurate or misleading!
We will use a similar Hot and Cold technique, albeit a statistical one, to understand Vehicle Accident trends in a particular US county, both spatially and spatio-temporally, on a modern mapping platform: a Geographic Information System, commonly known as GIS.
Spatio-temporal analysis is a very powerful way of making sense of geo-data / location-data, as it adds a whole new dimension (space and time together) to data visualization. While we'll get to that in due course, you may want to refer to examples of the standard, standalone workflows in the following video links: spatial & temporal.
For our vehicle accident case analysis, we will use the Hot Spot Analysis tool and the Space Time Cube tool. I recommend watching the two videos below to get a clear understanding of the methodology involved.
Hot Spot Analysis
Interesting, isn't it?
Applying Location Analytics to Vehicle Accident Records
The topic of study is explained in considerable detail below. If you are more keen to see clips of the technology at work than to read about the topic and the concepts involved, you may choose to watch the video compilation below instead. (I'd recommend reading the article first, followed by the video if you'd like to explore further.)
To begin with, we will load more than 1 lakh (100,000) vehicle accident records from 2010 to 2015 onto the mapping platform. Aside from the positional information, i.e. the exact coordinates of each accident, several attributes of each accident are available, such as the date and time of the accident, the number of fatalities and injuries (if any), whether the driver was under the influence of alcohol or was distracted at the time, and the weather conditions at the time of the accident.
These attributes are standard in nature; one would expect such records to be captured by law enforcement agencies in every major urban centre worldwide.
Because the data we possess contains positional information i.e. the coordinates, we can plot it on a modern mapping platform - Esri's ArcGIS (Geographic Information System).
Alongside this accident data, we have another crucial piece of information stored as a separate information layer - the digitized road network of the territory under study.
Again, this is a standard piece of information expected to be available with law enforcement agencies.
The next step involves restructuring the data into a Space-Time Cube. In simple words, just as we format plain data into a pivot table in Microsoft Excel to lend a more meaningful structure to the information, the GIS software arranges the multiple, complex data points into individual spatio-temporal buckets or Bins using the Space-Time Cube.
Each individual bucket of information (Bin) in the space-time cube aggregates the accidents falling within a 2-mile bin of the territory (spatial) across 16 weeks, i.e. roughly 4 months of data (temporal).
Unlike a Pivot Table in MS Excel, the space-time cube is not visualized in the software. Rather, an output summary is available for us to review, while the cube itself is stored in the system as a non-visual file and used as an input to various location analytics workflows, as we shall see later.
The output summary above gives us a good overview of our geo-data and how it has been arranged in the mapping platform.
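To make the pivot-table analogy concrete, here is a minimal sketch of how points can be aggregated into space-time bins. It assumes accident coordinates are already projected into miles and uses square cells for simplicity, whereas the maps in this article use hexagons; the function and variable names are hypothetical and this is not how ArcGIS implements the Space Time Cube internally.

```python
from collections import Counter
from datetime import date

# Hypothetical settings mirroring the article: 2-mile bins, 16-week time steps.
BIN_SIZE_MILES = 2.0
TIME_STEP_DAYS = 16 * 7


def space_time_bin(x_miles, y_miles, day, start):
    """Return the (column, row, time_step) bin a single accident falls into."""
    col = int(x_miles // BIN_SIZE_MILES)
    row = int(y_miles // BIN_SIZE_MILES)
    step = (day - start).days // TIME_STEP_DAYS
    return col, row, step


def build_cube(accidents, start):
    """Count accidents per (column, row, time_step) bin, like a pivot table."""
    return Counter(space_time_bin(x, y, d, start) for x, y, d in accidents)


# Hypothetical accidents: (x in miles, y in miles, date)
accidents = [
    (0.5, 1.2, date(2010, 1, 10)),
    (1.9, 0.1, date(2010, 3, 1)),
    (2.5, 0.3, date(2010, 3, 1)),
    (0.7, 0.4, date(2010, 6, 15)),
]
cube = build_cube(accidents, start=date(2010, 1, 1))
print(cube)
```

The first two accidents share a cell and a time step, so that bin holds a count of 2; the June accident lands in the same cell but in the next 16-week step.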
The space-time cube forms an integral part of our next workflow - the Emerging Hot Spot analysis.
Emerging Hot Spots do not represent the density of the incidents; rather, they capture the 'trend' of the incidents in a spatial area over a period of time and categorize it according to its statistical significance, i.e. whether the pattern is non-random (unlikely to have occurred by chance).
First, we will run the Emerging Hot Spot Analysis on just the 'Count' of vehicle accidents, with a neighborhood time step of 1 (technical note captured in the image below).
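To judge whether a bin's values are trending up or down over time, Esri's documentation describes applying the Mann-Kendall trend test to each location's series of hot spot z-scores. The sketch below is a plain, hedged implementation of the Mann-Kendall statistic (with the standard tie correction) on a hypothetical series; it is illustrative only and not Esri's code.

```python
import math
from collections import Counter


def mann_kendall_z(series):
    """Mann-Kendall trend z-score for one bin's values over time.

    Positive z suggests an increasing trend, negative z a decreasing one.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # adds +1, 0 or -1
    # Variance of S, corrected for groups of tied values.
    tie_groups = Counter(series).values()
    var_s = (n * (n - 1) * (2 * n + 5)
             - sum(t * (t - 1) * (2 * t + 5) for t in tie_groups)) / 18
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0


# Hypothetical values for one bin across six time steps, oldest first.
print(round(mann_kendall_z([2, 3, 5, 6, 8, 9]), 2))  # 2.63 (increasing trend)
```

A |z| above about 1.96 would indicate a trend significant at the 95% level; a flat series returns 0.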
The output of the Emerging Hot Spot Analysis is depicted below. Read the map legend on the left to understand what the hexagon symbology means.
The Emerging Hot Spot output table below indicates to us that there are 2 new hot spots, 17 consecutive hot spots, 59 sporadic hot spots and 13 oscillating hot spots in our study area.
To know more about what each hexagon pattern means, you may read the infographic below -
The pattern most commonly found in our Emerging Hot Spot output is the Sporadic Hot Spot. This means that the spatial bin under observation repeatedly (and statistically significantly) switches from being a hot spot to not being one, and back again.
Given the context of our topic, one can safely presume that the New, Persistent and Intensifying hot spots are the ones that demand the immediate attention of law enforcement agencies.
Those of you who observed the workflow closely will have noticed that our Emerging Hot Spot Analysis did not factor in the Road Network layer available to us. That is true: the 2-mile spatial distance within each Bin is Euclidean, i.e. computed as a straight line, and does not account for distance along the Road Network.
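The categories in the infographic are rule-based. The sketch below approximates a few hot spot categories for one bin from its per-time-step labels, loosely following Esri's published category descriptions. It is a simplification under stated assumptions: the trend-based categories (intensifying, diminishing), all cold spot categories and the exact precedence between rules are left out or assumed, and Esri's definitions (for example, whether a sporadic hot spot must be hot in the final step) have differed between versions.

```python
def classify_hot_spot(steps):
    """Approximate emerging hot spot category for one bin.

    steps: labels per time step, oldest first: "hot", "cold" or None.
    Simplified sketch: no trend test, no cold spot categories.
    """
    hot = [s == "hot" for s in steps]
    n_hot = sum(hot)
    if n_hot == 0:
        return "not a hot spot"
    if hot[-1] and n_hot == 1:
        return "new"  # hot only in the final time step
    if n_hot / len(steps) >= 0.9:
        return "persistent" if hot[-1] else "historical"
    # Length of the uninterrupted run of hot steps ending at the last step.
    run = 0
    for h in reversed(hot):
        if not h:
            break
        run += 1
    if run >= 2 and run == n_hot:
        return "consecutive"  # never hot before the final run
    if "cold" in steps:
        return "oscillating" if hot[-1] else "other"
    return "sporadic"  # on-again, off-again, never a cold spot


print(classify_hot_spot(["hot", None, "hot", None, "hot", None]))  # sporadic
```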
Factoring in the Road Network would lead to more accurate analysis and improve our interpretation of it.
After all, if I were to ask you how far you can travel by car in 45-50 minutes, which representation in the depiction below would be more accurate: the Euclidean one on your left or the Road Network-based one on your right?
The depiction on your right is more accurate, as it mimics the real-world scenario. In 45 minutes, one can travel only as far as the existing Road Network allows.
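The gap between straight-line and road distance can be shown on a toy network. This minimal sketch assumes a hypothetical block of four intersections with roads only along its edges, and uses Dijkstra's shortest-path algorithm for the network distance; a GIS network dataset would do the equivalent on real roads.

```python
import heapq
import math

# Hypothetical road network: intersection -> (x, y) in miles.
nodes = {"A": (0, 0), "B": (3, 0), "C": (3, 4), "D": (0, 4)}
# Roads only run along the edges of the block (no diagonal road A-C).
roads = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C")]


def euclid(p, q):
    """Straight-line distance between two intersections."""
    return math.dist(nodes[p], nodes[q])


def network_distance(start, goal):
    """Shortest distance along the roads, using Dijkstra's algorithm."""
    graph = {n: [] for n in nodes}
    for u, v in roads:
        length = euclid(u, v)
        graph[u].append((v, length))
        graph[v].append((u, length))
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        dist, node = heapq.heappop(queue)
        if node == goal:
            return dist
        if dist > best.get(node, math.inf):
            continue  # stale queue entry
        for nxt, length in graph[node]:
            new = dist + length
            if new < best.get(nxt, math.inf):
                best[nxt] = new
                heapq.heappush(queue, (new, nxt))
    return math.inf


print(euclid("A", "C"))            # 5.0 miles as the crow flies
print(network_distance("A", "C"))  # 7.0 miles by road
```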
So how can we analyse Vehicle Accident spots factoring in the Road Network?
Before we do so, we need to pre-process our geodata, as it contains certain anomalies.
Observe in the image below that some of the accident locations (red dots) do not fall directly on roads; rather, they lie outside the road boundaries. The recorded location may be where the vehicle came to rest after the accident rather than where the accident actually occurred. Or it could be a case of record-taking errors, faulty GPS calibration, etc.
To correct this, we use the Snap tool, instructing the mapping platform to move any accident point lying within 0.25 miles of the road network onto the nearest road.
This shifts the outlying accident points onto the road network.
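Geometrically, snapping means projecting each point onto its closest road segment and moving it there only if that distance is within the tolerance. The sketch below assumes planar coordinates in miles and hypothetical segment data; it illustrates the idea rather than the internals of the ArcGIS Snap tool.

```python
import math

SNAP_TOLERANCE_MILES = 0.25  # the snap distance used in this workflow


def closest_point_on_segment(p, a, b):
    """Closest point to p on the segment a-b (all (x, y) tuples in miles)."""
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return a
    # Parameter t of the perpendicular projection, clamped to the segment.
    t = ((p[0] - ax) * dx + (p[1] - ay) * dy) / length_sq
    t = max(0.0, min(1.0, t))
    return (ax + t * dx, ay + t * dy)


def snap(point, segments, tolerance=SNAP_TOLERANCE_MILES):
    """Move point onto the nearest segment if it lies within tolerance."""
    best, best_dist = None, math.inf
    for a, b in segments:
        candidate = closest_point_on_segment(point, a, b)
        d = math.dist(point, candidate)
        if d < best_dist:
            best, best_dist = candidate, d
    return best if best_dist <= tolerance else point


segments = [((0, 0), (2, 0)), ((2, 0), (2, 2))]  # hypothetical roads
print(snap((1.0, 0.1), segments))  # 0.1 miles off the road: snapped
print(snap((1.0, 0.5), segments))  # 0.5 miles off the road: left as-is
```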
The revised output (below) corrects the anomaly - now virtually all the accident spots are located within the Road Network...
... which in turn allows us to integrate the two layers, Accident Spots and Road Network, seamlessly by using the Spatial Join tool.
Now, the Accident geo-data appears to be properly structured and directly linked to the Road Network.
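Conceptually, the join attaches each snapped accident to the road segment it lies on and counts accidents per segment, similar to the join count that the Spatial Join tool writes to its output. The sketch below is a hypothetical, simplified version that assigns each point to its nearest segment; road names and coordinates are made up.

```python
import math
from collections import Counter


def distance_to_segment(p, a, b):
    """Shortest distance from point p to segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    t = 0.0 if length_sq == 0 else (
        ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq)
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def count_accidents_per_road(accidents, roads):
    """Attach each (already snapped) accident to its nearest road segment."""
    counts = Counter({road_id: 0 for road_id in roads})
    for p in accidents:
        nearest = min(roads, key=lambda rid: distance_to_segment(p, *roads[rid]))
        counts[nearest] += 1
    return counts


roads = {"Main St": ((0, 0), (2, 0)), "Oak Ave": ((2, 0), (2, 2))}
accidents = [(0.5, 0.0), (1.5, 0.0), (2.0, 1.0)]
print(count_accidents_per_road(accidents, roads))
```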
We are ready to do another Hot Spot Analysis now...
Or are we?...
Unfortunately, not quite. Longer roads will have more accidents assigned to them, so the hot spot output will be biased towards longer roads. This isn't correct and would hamper the quality of our interpretation.
To correct for this inherent bias in the geo-data, we will first compute the 'Crash Rate per mile, per year'.
The Crash Rate is now decoupled from the length of the road. The new data column is added to the extreme right in the attribute table below.
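The exact expression used in the workflow isn't shown, but a natural reading of 'per mile, per year' is the crash count divided by segment length times the number of years studied (six, for 2010 to 2015). A minimal sketch under that assumption, with hypothetical segments:

```python
STUDY_YEARS = 6  # 2010 through 2015


def crash_rate_per_mile_per_year(crash_count, length_miles, years=STUDY_YEARS):
    """Crashes per mile per year, so long and short segments are comparable."""
    return crash_count / (length_miles * years)


# Hypothetical segments: name -> (crash count, length in miles)
segments = {"long segment": (30, 5.0), "short segment": (12, 0.5)}
for name, (count, miles) in segments.items():
    print(name, round(crash_rate_per_mile_per_year(count, miles), 2))
```

The long segment has more crashes in total (30 vs 12), yet once length is accounted for the short segment is four times as dangerous per mile (4.0 vs 1.0).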
Now we are ready to perform the Hot Spot Analysis. This time we will not use the Emerging Hot Spot Analysis tool; instead, we will use the Hot Spot Analysis (Getis-Ord Gi*) tool, as we want our analysis to capture the spatial relationships within the road network as well.
To put it simply, we want to assign weights not just to the recorded location of the vehicle after the accident, but to the entire section of road where the accident sequence would have played out (driver spots a person / vehicle on the road ---> hits the brakes ---> hits the person ---> vehicle eventually halts).
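For readers curious about the statistic itself, below is a minimal sketch of the Getis-Ord Gi* z-score with binary distance-band weights, where every feature within the cutoff (including the feature itself) counts as a neighbor. It assumes a precomputed distance matrix between road segments (in the actual workflow these would be network distances with a 360-foot impedance cutoff) and ignores options such as the False Discovery Rate correction; the data are hypothetical.

```python
import math


def getis_ord_gi_star(values, distances, cutoff):
    """Gi* z-score per feature using binary distance-band weights.

    values:    attribute per feature (e.g. crash rate per mile per year)
    distances: distances[i][j] between features i and j
    cutoff:    features j with distances[i][j] <= cutoff are neighbors of i
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(max(0.0, sum(x * x for x in values) / n - mean ** 2))
    z_scores = []
    for i in range(n):
        # Binary weights: 1 for every feature within the cutoff, itself included.
        weights = [1.0 if distances[i][j] <= cutoff else 0.0 for j in range(n)]
        w_sum = sum(weights)
        w_sq_sum = sum(w * w for w in weights)
        local_sum = sum(w * x for w, x in zip(weights, values))
        numerator = local_sum - mean * w_sum
        denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        z_scores.append(numerator / denominator if denominator else 0.0)
    return z_scores


# Six hypothetical road segments spaced 300 feet apart along one road,
# with high crash rates at the far end.
positions_ft = [i * 300 for i in range(6)]
distances = [[abs(a - b) for b in positions_ft] for a in positions_ft]
crash_rates = [1, 1, 1, 8, 9, 8]
z = getis_ord_gi_star(crash_rates, distances, cutoff=360)
print([round(v, 2) for v in z])
```

With a 360-foot cutoff each segment only "sees" its immediate neighbors, so the hot spot stays local: the segment at the center of the high-rate stretch gets the largest positive z-score.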
The technical note reads - "To keep the crash hot spots local, the Impedance Distance Cutoff parameter was set to 360 feet (about the length of a football field), which is the minimum stopping sight distance for a vehicle traveling 45 mph."
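The 360-foot figure can be reproduced with the standard AASHTO stopping sight distance formula (reaction distance plus braking distance), assuming its usual design values of a 2.5-second brake reaction time and 11.2 ft/s² deceleration:

```python
def stopping_sight_distance_ft(speed_mph, reaction_s=2.5, decel_ft_s2=11.2):
    """Stopping sight distance in feet: reaction distance + braking distance.

    1.47 converts mph to ft/s; 1.075 is roughly (5280 / 3600) ** 2 / 2,
    so the second term equals v**2 / (2a) with v in ft/s.
    """
    reaction = 1.47 * speed_mph * reaction_s
    braking = 1.075 * speed_mph ** 2 / decel_ft_s2
    return reaction + braking


print(round(stopping_sight_distance_ft(45)))  # 360
```

About 165 feet is covered during the driver's reaction time and about 194 feet while braking.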
If you are interested, you may read a detailed concept note here.
Now that we've run the Hot Spot Analysis (Getis-Ord Gi*) Tool, you may see a cross-section of the output below-
The hot spots are now aligned with the road network (they no longer appear as hexagons, as they did previously), allowing for a more meaningful interpretation.
Next, to dig deeper, we will analyze hot spots for specific variables, beginning with only those vehicle accidents that led to fatalities. We will use the same workflow as above, except that the geo-data is filtered to capture only the accidents that led to fatalities.
The Fatality hot spot output is, naturally, different from the All Accidents hot spot output.
The GIS platform allows us to compare both the outputs visually. See the All Accidents Hot Spot output (Left) vs Accidents involving Fatalities Hot Spot output (Right) comparison from the depiction below.
The comparison is very illuminating. Some hot spots have emerged at new locations in the image on the right, which law enforcement has to pay close attention to.
Notice that running the hot spot analysis on a specific variable (fatality) brought to the fore certain trouble areas that were diluted in the All Accidents hot spot output and hence weren't visible there.
Even within the hot spot output of the All Accidents analysis, we are able to narrow down on the sections which are more Fatality-prone.
Naturally, several sections are not hot spots at all in the image on the right, as there was no statistically significant clustering of fatal accidents there. Perhaps these are less troublesome roads that can be given second priority by law enforcement agencies and policy makers.
Similarly, we will use the same workflow to compare All Accident Hot Spots (left) to Accident Hot Spots where the driver was under the influence of Alcohol (right) from the depiction below -
A clear representation of river-side partying?
I hope you can appreciate how powerful spatio-temporal hot spot analysis can be for developing a deep understanding of accident trends. The analytical output can be useful for a wide variety of stakeholders, from law enforcement agencies and policy makers to vehicle manufacturers and the general public.
Do note that the quality of the output is dependent on the quality of the geo-data.
I cannot emphasize this enough: organizations, especially in India, should focus on improving both the quantity and the quality of the geodata they capture.
Our next sequence of analysis demonstrates the power of modern map-based analytics, which lets us micro-analyze the accident geo-datasets in even greater depth and at much faster speeds.
So, after computing variable-specific Hot Spot Analyses, the next question you may ask is: during which hours of the day do vehicle accidents peak, and how do the hot spots for those hours compare to the original All Accidents hot spots?
Luckily, aside from doing map based analytics, modern GIS platforms are adept at doing chart and table based analytics just as we do on spreadsheet based platforms such as Microsoft Excel.
The GIS has created a line chart for us below. What trends can you observe?
Do the trends become more evident / visible in the differently color-coded line chart below?
Hours 1500 - 1700 (3 pm - 5 pm) on Weekdays (Monday-Friday) are the peak times for vehicle accidents.
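Behind such a chart is a simple group-and-count. The sketch below counts weekday accidents by hour of day and returns the busiest hours; the timestamps are hypothetical, and the real records may store date and time in different fields.

```python
from collections import Counter
from datetime import datetime


def peak_weekday_hours(timestamps, top=3):
    """Most frequent accident hours on weekdays (Monday-Friday)."""
    # datetime.weekday(): Monday is 0, Sunday is 6.
    hours = Counter(ts.hour for ts in timestamps if ts.weekday() < 5)
    return hours.most_common(top)


sample = [
    datetime(2012, 5, 7, 15, 20),   # Monday, 3 pm hour
    datetime(2012, 5, 8, 16, 5),    # Tuesday, 4 pm hour
    datetime(2012, 5, 9, 15, 45),   # Wednesday, 3 pm hour
    datetime(2012, 5, 12, 15, 10),  # Saturday, excluded
    datetime(2012, 5, 10, 9, 30),   # Thursday, 9 am hour
]
print(peak_weekday_hours(sample, top=1))  # [(15, 2)]
```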
Using this discovery, we can fine-tune our hot spot analysis for this time filter and re-apply the same workflow. This time, however, we will not proceed one step at a time. Instead, we can replicate the workflow by creating and executing a Geo-Processing Model, as depicted below -
Such models are easy to configure and require minimal coding. As you can gauge, such geo-analysis models allow faster and error-free re-runs of validated geo-workflows.
Quite evidently, it saves enormous time and effort - location intelligence at the click of a button!
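The idea of a reusable model can be sketched in code as one function that runs the validated steps, with the accident filter passed in as a parameter. This is a hypothetical, heavily simplified analogue (field names such as "road", "hour" and "weekday" are assumptions) that stops at the crash-rate step; a real geo-processing model would also run the snapping, joining and hot spot steps.

```python
from collections import Counter

STUDY_YEARS = 6  # 2010 through 2015


def crash_rate_model(accidents, road_lengths, keep=lambda a: True):
    """Re-runnable workflow: filter accidents, count per road, compute rates.

    accidents:    dicts with a hypothetical "road" key (set by the spatial join)
    road_lengths: road id -> length in miles
    keep:         filter deciding which accidents enter this run
    """
    counts = Counter(a["road"] for a in accidents if keep(a))
    return {road: counts[road] / (miles * STUDY_YEARS)
            for road, miles in road_lengths.items()}


accidents = [
    {"road": "Main St", "hour": 15, "weekday": 0},
    {"road": "Main St", "hour": 22, "weekday": 5},
    {"road": "Oak Ave", "hour": 16, "weekday": 2},
]
lengths = {"Main St": 2.0, "Oak Ave": 0.5}

all_rates = crash_rate_model(accidents, lengths)
# Peak hours on weekdays; hours 15-17 treated as inclusive (an assumption).
peak_rates = crash_rate_model(
    accidents, lengths,
    keep=lambda a: a["weekday"] < 5 and 15 <= a["hour"] <= 17,
)
print(all_rates)
print(peak_rates)
```

Re-running the workflow for a new question only changes the `keep` argument, which is the main benefit the model offers over repeating each step by hand.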
The depiction below compares the All Accidents Hot Spot output (left) with the output of the Accidents Hot Spot during Peak Hours of Weekdays geo-model (right).
The comparison gives us new knowledge and insights about vehicle accident patterns when their probability of occurrence is at its maximum.
Those who aren't yet genuinely impressed by the capability of modern map-based location analytics are sure to be blown away by the one final analysis step we will execute. After all, what good is Location Analytics in 2D when Spatio-Temporal Analysis in 3D is at our disposal!
That being said, the Spatio-Temporal 3D Hot Spot analysis tool is only useful if we have suitable questions to ask the Geographic Information System (GIS).
In the current context, we can run this tool if we wish to know the Trends of Vehicle Accidents during Peak Hours on Weekdays on a Year-on-Year basis.
To visualize the output, there are two geo-processing models which we would need to configure-
a) Yearly Hot Spot Maps for each year from 2010 - 2015 (6 years)
b) The Peak Hours during Week Days geo-processing model, which takes this output (the 6 yearly hot spot maps) as one of its inputs -
These models may appear complex; however, we have simply 'codified' the existing workflows I took you through earlier. The complex part actually involves rendering the output in 3D and, luckily, it is the GIS platform that has to do that geo-processing, not us!
3D Visualization of the Year-on-Year Vehicle Accidents during Peak Hours on Weekdays Model Output
Struggling to make sense of the red bars in the image above?
See the image above. The intersection on Prospect Avenue is a sporadic hot spot, the most common type as per our initial finding. The first year is at the bottom of the column and is statistically very significant (dark red), i.e. it represents a strong hot spot for vehicle accidents. In the second year, the hot spot vanishes. In the third year, the hot spot is statistically significant, yet relatively weaker (light red) than in the first year. In the fourth year, the hot spot vanishes again, only to return in full force (dark red) in the fifth year and vanish in the sixth year.
I hope you can now interpret the 3D visualization and also understand the importance of seeing a pattern in both space and time on a 3D platform.
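The shades of red correspond to how significant each year's hot spot is. The sketch below maps a Gi* z-score to a signed confidence level, in the spirit of the Gi_Bin field that Esri's hot spot tools produce (±3 for 99%, ±2 for 95%, ±1 for 90% confidence). Treating "dark red" as 99% and "light red" as 90% is an assumption for illustration, the yearly z-scores are hypothetical, and any multiple-testing correction is ignored.

```python
def confidence_bin(z):
    """Map a Gi* z-score to a signed confidence level (two-tailed test).

    +3 / -3: 99% confidence, +2 / -2: 95%, +1 / -1: 90%, 0: not significant.
    """
    for level, critical in ((3, 2.576), (2, 1.960), (1, 1.645)):
        if abs(z) >= critical:
            return level if z > 0 else -level
    return 0


# Hypothetical yearly z-scores for one location, oldest year first,
# following an on-again, off-again pattern.
yearly_z = [3.1, 0.4, 1.8, -0.2, 2.9, 0.7]
print([confidence_bin(z) for z in yearly_z])  # [3, 0, 1, 0, 3, 0]
```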
Using the same information that I've mentioned, can you interpret the following? -
Thank you for reading! I hope you found the content engaging.
(Much thanks to Esri authors Lauren Scott Griffith & Lixin Huang for developing this concept in their tutorial).
Intelloc Mapping Services | Mapmyops is engaged in selling products which capture geodata (Drones & Drone Services), process geodata (Geographic Information System) and enhance geodata (Imagery & Location Analytics). Together, these help organizations to benefit from Geo-Intelligence for purposes such as operations improvement and project management.
Write to us on firstname.lastname@example.org.