Crime and Policing, Part 1: Motivation, Use Case and Dataset Summary

Many police departments in the United Kingdom have turned to data science to translate data into actionable insights. However, designing an effective predictive policing strategy involves many data-related challenges: gathering, processing, cleaning, manipulating and visualising data so that predictive engines are simple to understand and, at the same time, accurate enough to be useful.

The web application I want to present today is a simple graphical overview of crime and police force performance data that enables direct comparison of different regions and areas in a compact and easy way. You can view the running app here. It allows users to create and customise graphical data summaries (e.g. boxplots, heatmaps) and spatial visualisations in a few clicks, without prior knowledge of R, the statistical programming language used in this project.

One more thing I would like to mention here is that it is a self-learning project that I am doing for fun and in my spare time.


1. Data

Police recorded crime figures are an important indicator of police workload. They can be used for local crime pattern analysis and provide a good measure of trends in well-reported and well-recorded crimes. For some categories of crime, the volume of offences recorded is heavily influenced by police activities and priorities; in such cases, recorded crime figures may indicate police activity in that area rather than levels of criminality.

Recorded crime figures do not include crimes that have not been reported to the police or incidents that the police decide not to record. It was estimated in 2013/14 that around 43% of crimes comparable with the Crime Survey for England and Wales (CSEW) were reported by the public to the police, although this proportion varied considerably between individual offence types.

Recorded crime statistics are affected by changes in reporting and recording practices. To ensure consistency, police recording practice is governed by Home Office Counting Rules (HOCR) and the National Crime Recording Standard (NCRS). These rules provide a national standard for the recording and classifying of notifiable offences by police forces in England and Wales.

Last year, Her Majesty’s Inspectorate of Constabulary (HMIC) raised serious concerns in relation to significant under-recording of crime and weak management and supervision of crime recording in some police forces. It was implied that up to 20% of crimes may be going unrecorded.

In January 2014, the UK Statistics Authority published its assessment of ONS crime statistics. It found that statistics based on police recorded crime, having been assessed against the Code of Practice for Official Statistics, did not meet the required standard for designation as National Statistics.

Since these concerns were raised, there has been a renewed focus on the quality of police recorded crime data.

2. Software

I used the R statistical programming language and RStudio’s shiny framework for R. The graphical abilities of R and its packages ggplot2, shiny and shinyapps make it a very good choice for crime data analysis and visualisation, while shiny turns a statistical analysis into an interactive web application without requiring knowledge of HTML, CSS or JavaScript. On top of that, it adjusts the layout of the app for the best viewing experience on desktops, tablets and smartphones.
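To give a feel for how little code shiny needs, here is a minimal sketch of an app with one dropdown and one plot. It is not the code behind the app presented in this post: the toy data frame, the region names, and the input and output ids (`region`, `trend`) are all made up for illustration, and it uses base graphics rather than ggplot2 to stay short.

```r
library(shiny)

# Toy data, invented for illustration only (not real crime figures)
crime <- data.frame(
  Region.Name = rep(c("Region A", "Region B"), each = 3),
  Year        = rep(2012:2014, times = 2),
  Robbery     = c(120, 110, 95, 80, 85, 90)
)

# User interface: a region selector in the sidebar and a plot in the main panel
ui <- fluidPage(
  titlePanel("Recorded robbery by region (toy data)"),
  sidebarLayout(
    sidebarPanel(
      selectInput("region", "Region", choices = unique(crime$Region.Name))
    ),
    mainPanel(plotOutput("trend"))
  )
)

# Server logic: redraw the plot whenever the selected region changes
server <- function(input, output) {
  output$trend <- renderPlot({
    d <- crime[crime$Region.Name == input$region, ]
    plot(d$Year, d$Robbery, type = "b",
         xlab = "Year", ylab = "Recorded robberies")
  })
}

# Build the app object; call runApp(app) in an interactive session to launch it
app <- shinyApp(ui = ui, server = server)
```

The whole interface is described in R: `fluidPage()` and `sidebarLayout()` generate the HTML, and the fluid layout is what lets the page adapt to different screen sizes.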


  1. Context and Prerequisites
  2. Police Dataset: Download and Summary
  3. Data Extraction & Exploration
  4. Data Manipulation
  5. Data Visualisation

1. Context and Prerequisites

This use case is based on Office for National Statistics (ONS) figures derived from police recorded crime data for the years 2009-2014, supplied by the 43 territorial police forces of England and Wales via the Home Office (HO) to the ONS. The coverage of police recorded crime statistics includes a range of offences, from murder to minor criminal damage, theft and public order offences. Less serious offences (such as littering, begging, drunkenness, parking offences and TV licence evasion) are excluded from the recorded crime collection.

RStudio version 0.98.1028 – © 2009-2013 RStudio, Inc

R version 3.1.1 (2014-07-10); Platform: i386-w64-mingw32/i386 (32-bit)

Attached packages: rmarkdown_0.2.64, GGally_0.4.8, reshape2_1.4, ggplot2_1.0.0, markdown_0.7.4, shinyapps_0.3.57, shiny_0.10.1

2. Police Dataset

Download the dataset by clicking on the Download button in the Data tab (Navbar >> Data >> Original Crime Records). The file size is around 82.3 KB and will take a few seconds to download.

This dataset contains close to 500 records. Each record contains the type of crime, the year of the incident, region and area names and codes, a range of crime categories, officer, PCSO and police staff numbers, and the population of the given region and area.

Type: The type of record: RC (recorded crime) or SD (sanction detected, i.e. solved crime)*.
Year: The year of the incident.
Region.Name: The region’s (shire’s) name.
Region.Code: The code for this region.
Area.Name: The area’s name.
Area.Code: The code for this area.
Crime categories: All categories of crime, e.g. Robbery, Fraud, Theft, Arson and so on, organised into a one-level structure for ease of further data manipulation. For the crime types organised into a logic tree format, see the crime tree link.

*Sanction detected: “solved crimes”. A sanction detection is counted as any police-recorded crime where a suspect has been identified and notified as being responsible for committing that crime and has received an official sanction.
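Because recorded (RC) and sanction-detected (SD) counts sit in separate rows, a natural first step is to pair them up and compute a detection rate. The sketch below shows one way this could be done in base R. The CSV text is a tiny made-up sample that only mimics the column layout described above; the values, the region and area names, and the choice of the Robbery and Fraud columns are assumptions for illustration, not figures from the real dataset.

```r
# Toy records laid out like the columns described above.
# All values and names are invented for illustration only.
csv_text <- "Type,Year,Region.Name,Area.Name,Robbery,Fraud
RC,2013,Region A,Area 1,200,50
SD,2013,Region A,Area 1,50,20
RC,2013,Region A,Area 2,100,40
SD,2013,Region A,Area 2,30,10"

crime <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Split recorded (RC) and sanction-detected (SD) rows
recorded <- crime[crime$Type == "RC", ]
detected <- crime[crime$Type == "SD", ]

# Pair each RC row with the SD row for the same year, region and area
keys <- c("Year", "Region.Name", "Area.Name")
both <- merge(recorded, detected, by = keys, suffixes = c(".RC", ".SD"))

# Share of recorded robberies that ended in a sanction detection
both$Robbery.rate <- both$Robbery.SD / both$Robbery.RC

print(both[, c(keys, "Robbery.RC", "Robbery.SD", "Robbery.rate")])
```

With the real file, the `text = csv_text` argument would be replaced by the path to the downloaded CSV, and the same merge would work for any of the crime category columns.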

For more information about crime classification and definitions, see the About the data section of the HMIC website.





