Contributed by Gordon. Gordon took the NYC Data Science Academy 12-week full-time Data Science Bootcamp pr… between Sept 23 and Dec 18, 2015. The post is based on his second class project (due in the 4th week of the program).
The Problem: Interesting Data, Uninteresting Use
The Death Penalty Information Center provides news and resources about executions, both past and upcoming, in the United States. One of its best features is a database containing information on every person executed in the United States since 1977. The database has an interface that allows filtering by several categories, and a download option that exports the filtered data as a CSV. However, descriptive statistics and comparisons are either absent or hidden in a PDF.
My goal was to make an app that makes fuller use of the data, i.e., one that displays both the raw data and summary statistics in the same interactive environment. R Shiny is a great platform for building such an environment, one that could grow into a web app for the general public.
Methodology
Unexpectedly, the data had a lot of missing values. Given the nature of the data, however, neither imputing missing values nor discarding incomplete records was an option. Cleaning therefore came down to splitting dates into month, day, and year, and partitioning the column describing the victims of each execution by sex. Ultimately, I didn't use the victim data in the app, but it could be integrated in the future.
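The date-splitting step can be sketched in base R as follows. This is an illustration under assumptions, not the original cleaning script: it assumes a hypothetical `Date` column stored as text in month/day/year form, and the sample rows are made up; the real column name and date format in the DPIC export may differ.

```r
# Hypothetical sample rows; the real DPIC export's column name and date format may differ
executions <- data.frame(
  Name = c("A", "B", "C"),
  Date = c("01/17/1977", "12/07/1982", "03/14/1984"),
  stringsAsFactors = FALSE
)

# Parse the text dates once, assuming a month/day/year format
parsed <- as.Date(executions$Date, format = "%m/%d/%Y")

# Split into separate integer columns so the app can filter by year, month, or day
executions$Year  <- as.integer(format(parsed, "%Y"))
executions$Month <- as.integer(format(parsed, "%m"))
executions$Day   <- as.integer(format(parsed, "%d"))

executions
```

Storing `Year` as an integer also makes it easy to populate a year drop-down with `sort(unique(executions$Year))`.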
The App
The app consists of three main pages, the last of which is an “About” page.
The first page displays a map of the number of executions in each state. I used dplyr to wrangle the data and the Plotly API to generate an interactive choropleth map.
This is the code that produced the map:
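# Requires library(shiny) and library(plotly); state.executions needs a State
# column of two-letter abbreviations (for locationmode = 'USA-states') and a count column
output$map <- renderPlotly({
  # White state borders, 2 px wide
  l <- list(color = toRGB("white"), width = 2)
  # Restrict the map to the USA and use an Albers projection
  g <- list(scope = 'usa', projection = list(type = 'albers usa'))
  # `~` marks columns of state.executions (plotly >= 4.0 syntax)
  plot_ly(state.executions, z = ~count, locations = ~State, type = 'choropleth',
          locationmode = 'USA-states', color = ~count, colors = 'Purples',
          marker = list(line = l)) %>%
    colorbar(title = "Number of Executions") %>%
    layout(title = 'Executions In The U.S.A Since 1977<br>(Hover for Numbers By State)',
           geo = g)
})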
This is adapted directly from the examples on Plotly's documentation pages for using the service with R.
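The post doesn't show how `state.executions` was built. The author used dplyr for this (something like `executions %>% group_by(State) %>% summarise(count = n())`); the sketch below is a swapped-in base R equivalent, using made-up sample rows, that produces the two-column `State`/`count` layout the map code expects.

```r
# Hypothetical sample rows; the real data has one row per execution
executions <- data.frame(
  State = c("TX", "TX", "VA", "OK", "TX", "VA"),
  stringsAsFactors = FALSE
)

# Count executions per state; table() returns counts named by state
counts <- table(executions$State)

# Reshape into the two columns the map uses: State and count
state.executions <- data.frame(
  State = names(counts),
  count = as.integer(counts),
  stringsAsFactors = FALSE
)

# Sort so the states with the most executions come first
state.executions <- state.executions[order(-state.executions$count), ]
state.executions
```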
The second page lets the user explore the data. A menu on the left filters the data by state, year, method of execution, and race of the person executed. The filtered results are shown in a table in the first tab to the right of the menu.
What was most interesting here was that Shiny's data table is rendered in the browser by the DataTables JavaScript library, and the options list is passed straight through to it. That allowed me to customize the table to my heart's desire: removing pagination and the search box and adding a scroll bar, among other things.
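output$table <- renderDataTable({
  # Keep six columns of interest, selected by position
  data <- executions[, c(2, 3, 5, 8, 18, 6)]
  # Apply each menu filter unless it is set to "All";
  # which() skips rows where the column is NA instead of returning rows of NAs
  if (input$st != "All") {
    data <- data[which(data$State == input$st), ]
  }
  if (input$yr != "All") {
    data <- data[which(data$Year == input$yr), ]
  }
  if (input$md != "All") {
    data <- data[which(data$Method == input$md), ]
  }
  if (input$rc != "All") {
    data <- data[which(data$Race == input$rc), ]
  }
  data
},
# DataTables options: no search box, column sorting, pagination, or row-count
# text; a fixed-height scrolling area shows all matching rows instead
options = list(searching = FALSE, ordering = FALSE, paging = FALSE, info = FALSE,
               lengthChange = FALSE, scrollY = "310px", scrollCollapse = TRUE)
)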
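The four `if` blocks in the table code follow one pattern: skip the filter when a menu reads "All", otherwise keep only the matching rows. The sketch below factors that pattern into a hypothetical helper, `filter_all()`, which is not part of the author's code or any package. The sample data is made up; in the app, the selected values would come from `input$st`, `input$yr`, `input$md`, and `input$rc`.

```r
# Hypothetical helper: apply "All"-aware filters to a data frame.
# `choices` is a named list mapping column names to the selected menu value.
filter_all <- function(data, choices) {
  for (col in names(choices)) {
    value <- choices[[col]]
    # "All" means no filtering on this column
    if (!identical(value, "All")) {
      # which() drops rows where the column is NA
      data <- data[which(data[[col]] == value), , drop = FALSE]
    }
  }
  data
}

# Hypothetical sample data standing in for the executions table
executions <- data.frame(
  State  = c("TX", "TX", "VA", "OK"),
  Year   = c(1999, 2000, 1999, 2001),
  Method = c("Lethal Injection", "Lethal Injection", "Electrocution", "Lethal Injection"),
  stringsAsFactors = FALSE
)

result <- filter_all(executions,
                     list(State = "TX", Year = "All", Method = "Lethal Injection"))
result
```

Because `selectInput` values arrive as strings, a comparison like `Year == "1999"` still works: R coerces the numeric column to character before comparing.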
The second tab shows aggregate plots of variables such as race, method of execution, age, and sex, which can also be filtered by state.
Below is the central component of this tab:
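# Data for the aggregate plots: every execution, or only the selected state
# (filter() here is dplyr::filter, not stats::filter)
df2 <- reactive({
  if (input$st == 'All') executions
  else filter(executions, State == input$st)
})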
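Wrapping the filtered data in a reactive expression means that every graph and chart that calls df2() is regenerated automatically when the user changes the state selection, and the filtering is done once and shared by all of them.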
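The post doesn't show the plotting code for this tab. The sketch below is a hedged illustration of the kind of summary a chart could draw from `df2()`, using a made-up data frame in place of the reactive: the share of executions by race, as percentages. Note that `table()` silently drops `NA` values by default, so with this data's missing values the percentages would be of the rows where race is recorded.

```r
# Hypothetical stand-in for df2(): executions filtered to one state
filtered <- data.frame(
  Race = c("White", "Black", "White", "Latino", "White"),
  Sex  = c("Male", "Male", "Male", "Male", "Female"),
  stringsAsFactors = FALSE
)

# Share of executions by race, as percentages rounded to one decimal place
race_share <- round(100 * prop.table(table(filtered$Race)), 1)
race_share

# Inside the app, a chart could use the reactive directly, e.g. (hypothetical output name):
# output$race_plot <- renderPlot(barplot(table(df2()$Race)))
```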
The last tab is a simple timeline showing the number of executions per year from 1977 to the present day.
I generated this chart using ggplot2.
output$time.series <- renderPlot({
  # Alternative Plotly version, not used:
  #plot_ly(year.executions, x = Year, y = count, name = "Executions Year On Year", filename="r-docs/basic-time-series")
  ggplot(data = year.executions, aes(x = Year, y = count)) +
    geom_line(colour = "darkgreen") +
    # Start the y axis at zero without an upper cap; a fixed ylim(0, 50)
    # would drop any year with more than 50 executions from the line
    expand_limits(y = 0) +
    ggtitle('Executions Year On Year') +
    ylab('Number Executed') +
    theme_bw() +
    theme(plot.title = element_text(size = 20, face = "bold", vjust = 2),
          axis.title.x = element_text(face = "bold", vjust = -1),
          axis.title.y = element_text(face = "bold"))
})
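The post doesn't show how `year.executions` was built. One detail worth handling is that a simple per-year tally leaves out years with no executions, so `geom_line` would join the neighbouring years directly instead of dipping to zero. The base R sketch below, using made-up execution years, fills those gaps by tallying over a factor whose levels cover every year in the range.

```r
# Hypothetical execution years, one entry per execution
years <- c(1977, 1979, 1979, 1981, 1983, 1983)

# A factor with every year in the range as a level makes table()
# report zero for years with no executions instead of omitting them
all_years <- seq(min(years), max(years))
tally <- table(factor(years, levels = all_years))

# Two columns for the ggplot2 code: Year and count
year.executions <- data.frame(
  Year  = as.integer(names(tally)),
  count = as.integer(tally)
)
year.executions
```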
Demo
You can explore the app here and see the code here.
Further Work
With this base in place, there is room to turn the app into a complete front end for the executions database. This would involve finding a way to fully integrate the data about victims into the interface. There is also an opportunity to add far more data, since the Death Penalty Information Center links to another data set covering executions since 1602, although that data would need extensive cleaning before it could be integrated. Finally, the ultimate goal would be full automation: a script that checks the database for updates and refreshes the app whenever new data is added.