Saturday, September 13, 2014

Crime Different as Night and Day

Among other datasets, Dallas Open Data publishes Dallas Police reports with narratives from October 2013 to May 2014, totaling over half a million reports. Focusing on the text fields that hold the crime description and narrative, I will plot several (not completely unexpected) text-based visualizations.

The first approach assigns each police report to one of several logical text corpora and displays word clouds of the most frequent terms in each. Rules defining the corpora may be based on time, location, gender, race, etc. To illustrate, I define 4 logical documents (Night, Morning, Day, and Evening) by dividing the 24-hour day into 4 segments: Night (reports with time of offense from midnight to 6 a.m.), Morning (from 6 a.m. to noon), Day (from noon to 6 p.m.), and Evening (from 6 p.m. to midnight). This results in 4 clouds, displayed in pairs of opposing times: Night vs. Day, then Morning vs. Evening:
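The corpus-building step can be sketched in Python. This is a minimal sketch, not the code behind the clouds: it assumes each report has already been reduced to a pair of (hour of offense as an integer 0-23, description plus narrative text), and the helper names `segment_for_hour` and `term_frequencies`, the tokenizing regex, and the tiny stopword list are all hypothetical. The per-segment counts are what a word cloud would size its words by; rendering the clouds themselves is left out.

```python
import re
from collections import Counter, defaultdict

# Segments as defined above: [0,6) Night, [6,12) Morning, [12,18) Day, [18,24) Evening.
SEGMENTS = ("Night", "Morning", "Day", "Evening")

# Hypothetical, deliberately tiny stopword list; a real analysis would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "at", "was", "by"}


def segment_for_hour(hour: int) -> str:
    """Map an hour of offense (0-23) to one of the four logical documents."""
    if not 0 <= hour <= 23:
        raise ValueError(f"hour out of range: {hour}")
    return SEGMENTS[hour // 6]


def term_frequencies(reports):
    """reports: iterable of (hour, text) pairs. Returns {segment: Counter of terms}."""
    corpora = defaultdict(Counter)
    for hour, text in reports:
        # Upper-case alphabetic tokens, matching the all-caps style of the reports.
        tokens = re.findall(r"[A-Za-z]+", text.upper())
        corpora[segment_for_hour(hour)].update(
            t for t in tokens if t.lower() not in STOPWORDS
        )
    return corpora


# Usage with made-up sample reports (not real Dallas data).
reports = [
    (2, "Suspect fled in vehicle after assault"),
    (14, "Theft of property, suspect unknown"),
    (3, "Assault causing pain"),
]
freqs = term_frequencies(reports)
print(freqs["Night"].most_common(1))  # [('ASSAULT', 2)]
```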

At night we are looking at ASSAULT, VEHICLE, and FLED, while the day is about SUSPECT, THEFT, and NFI (No F...ing Idea). I am not sure why CAUSING and CONSENT sit on top of the Night cloud, but PAIN there makes sense. Evening and Morning seem to differ the most on ASSAULT and BURGLARY.

Another way to look at the same data is a slopegraph (example), where terms (rows) move up or down according to their frequency across the 4 logical documents (columns), but I am leaving this for my next post.

Unstructured data such as text can still use the types of plots usually associated with structured data. Bar graphs (histograms) by hour of day for reports containing a certain term are one way to achieve this. To illustrate, let's count police reports containing the term BURGLARY for each hour (0 to 23) across all the data (roughly 6 months). This results in the following plot:
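Counting reports per hour for one term could look like the sketch below, again assuming each report is an (hour, text) pair; `hourly_counts` and `text_histogram` are hypothetical helpers, not part of any Dallas Open Data API. A report is counted once no matter how many times the term appears in it, and the match is a case-insensitive whole-word search, so BURGLARY does not match BMV.

```python
import re


def hourly_counts(reports, term):
    """Count reports whose text contains `term` as a whole word, per hour 0-23."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    counts = [0] * 24
    for hour, text in reports:
        if pattern.search(text):
            counts[hour] += 1  # one increment per report, however often the term occurs
    return counts


def text_histogram(counts, width=40):
    """Print a plain-text bar chart, scaling the tallest bar to `width` characters."""
    peak = max(counts) or 1  # avoid dividing by zero when no report matched
    for hour, n in enumerate(counts):
        print(f"{hour:2d} {'#' * round(n * width / peak)} {n}")


# Usage with made-up sample reports (not real Dallas data).
sample = [
    (9, "BURGLARY of habitation"),
    (9, "burglary reported by neighbor"),
    (15, "theft from building"),
    (17, "Burglary, forced entry"),
]
text_histogram(hourly_counts(sample, "BURGLARY"))
```

To get an actual bar chart rather than the text version, the `counts` list can be handed to a plotting library, for example matplotlib's `plt.bar(range(24), counts)`.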


Not exactly surprising, but it suggests several conclusions. For example, most burglars' work day runs from 9 in the morning until 10 in the evening. They also take a lunch break at about 3 p.m., or possibly it's the police officers who do. Finally, the high peak at 5 is something everyone should be aware of, unless, again, it's police officers who skew the time of offense in reports towards the end of their shift.

A different trend can be observed for reports containing BMV (Burglary Motor Vehicle):

At first it's rather surprising to see the peak in the morning hours, since we expect this type of crime to happen at night. More likely, though, victims discover this kind of trouble only in the morning, and the police have no better time to put on the report.

The last graph is for the term ASSAULT:

This crime is obviously reported by victims who know well when it took place. ASSAULT reports peak between 9 in the evening and 1 in the morning, cooling off significantly by the morning hours. Some day I would like to see a good night's sleep promoted as a crime prevention program.
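The ASSAULT window from 9 p.m. to 1 a.m. crosses midnight, so summing it over hour bins needs wrap-around arithmetic. Below is a hypothetical sketch, assuming a 24-element list of per-hour report counts indexed 0-23 and treating both window ends as inclusive hour bins (so 21 to 1 covers 21, 22, 23, 0, and 1); `window_hours` and `window_share` are illustrative names only.

```python
def window_hours(start: int, end: int) -> list[int]:
    """Hour bins from start to end inclusive, wrapping past midnight if end < start."""
    length = (end - start) % 24 + 1
    return [(start + i) % 24 for i in range(length)]


def window_share(counts, start, end):
    """Fraction of all counted reports whose hour falls in the (possibly wrapping) window."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(counts[h] for h in window_hours(start, end)) / total


print(window_hours(21, 1))  # [21, 22, 23, 0, 1]
```

Applied to the ASSAULT counts, `window_share(counts, 21, 1)` would quantify how concentrated the late-night peak is. Because of the modulo, the same function also handles windows that do not wrap, such as `window_hours(9, 22)` for the burglars' work day.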