Los Angeles is a city of around 4 million[1] inhabitants from diverse backgrounds (about half[2] of the population is Latino) and had a crime rate of about 635[3] per 100,000 inhabitants per year in 2015. The city hosts the biggest port complex in the US and is an illegal drug hub[4] for the country. Finding a way to foster this diversity, so as to take advantage of the city's economic potential and reduce crime, would give children more room to develop, achieve high levels of education and move freely around the city.

The aim of this paper is to identify how demographic data at the zip code level in Los Angeles relate to crime rates in 2010. More precisely, the following two hypotheses are tested:

For the purpose of the study, three data sets are used:

First, an exploratory data analysis is performed to assess the distribution of the variables of interest. A non-spatial regression analysis then provides the basis for further exploration with spatial methods, namely bivariate spatial maps, centrographic statistics and LISA statistics based on Moran’s I.
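LISA statistics rest on Moran's I. As an illustration of what they measure, the following minimal sketch writes out global and local Moran's I by hand for a generic spatial weights matrix `W`; the weights definition and the helper names `morans_i` and `local_morans_i` are assumptions for illustration, not part of the original analysis. In practice, the spdep package loaded in the R script provides `poly2nb()`, `nb2listw()`, `moran.test()` and `localmoran()` for polygon data such as the zip code areas.

```r
# Global Moran's I for values x and an n x n spatial weights matrix W
# (W[i, j] > 0 when zones i and j are neighbours, 0 on the diagonal).
morans_i <- function(x, W) {
  z <- x - mean(x)
  n <- length(x)
  s0 <- sum(W)                                  # sum of all weights
  (n / s0) * sum(W * outer(z, z)) / sum(z^2)
}

# Local Moran's I (LISA): one value per zone.
# Positive I_i: zone i resembles its neighbours; negative: it differs from them.
local_morans_i <- function(x, W) {
  z <- x - mean(x)
  m2 <- sum(z^2) / length(x)
  as.vector(z * (W %*% z)) / m2
}

# Toy example (hypothetical): four zones in a row, rook contiguity
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W[cbind(2:4, 1:3)] <- 1
morans_i(c(1, 1, -1, -1), W)   # clustered values: positive
morans_i(c(1, -1, 1, -1), W)   # alternating values: -1
```

With row-standardised weights (`W / rowSums(W)`), the mean of the local values equals the global statistic, which is why LISA is described as a decomposition of Moran's I.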

Descriptive Statistics

The histograms below show that the average household size and the median age are centered around 3 and 40 respectively. The number of crimes has many values of 1, 2 and 3 (the underlying data contains no zeros), possibly because data are missing for certain areas. These low values were kept so as not to exclude zip code areas arbitrarily from the final analysis. Instead, the spatial plots use quantiles, which group all low values together and still allow comparison with the higher, more plausible crime counts.
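The quantile classification used by the maps can be sketched in base R. This is an illustration under assumptions, not leaflet's `colorQuantile()` implementation: breaks are placed at evenly spaced sample quantiles, and because many zip codes share the same low counts, duplicated breaks are dropped (`cut()` rejects non-unique breaks), so fewer classes than requested may result. The helper name `quantile_class` is hypothetical.

```r
# Assign each value to a quantile class (1 = lowest). Tied low counts can
# produce identical quantile breaks, so duplicates are removed before cut().
quantile_class <- function(v, n = 5) {
  br <- unique(quantile(v, probs = seq(0, 1, length.out = n + 1), na.rm = TRUE))
  if (length(br) < 2) return(ifelse(is.na(v), NA_integer_, 1L))  # all values equal
  as.integer(cut(v, breaks = br, include.lowest = TRUE))
}

# Hypothetical counts dominated by 1s, as in the crime data
toy <- c(1, 1, 1, 1, 1, 1, 2, 3, 50, 100)
quantile_class(toy)
```

All the 1s end up in the lowest class, while the few high counts remain distinguishable, which is the behaviour the paragraph above relies on.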

The relationship between the explanatory variables (median age and average household size) and the dependent variable (number of crimes per zip code) is shown in the plots below. The natural log of the number of crimes is used to reduce skewness, since the raw counts are right-skewed. At first glance there appears to be no particular relationship; this is tested by the regression below.
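Whether the log transform makes the crime counts more symmetric can be checked numerically. A minimal sketch on simulated log-normal counts (an assumption chosen only for illustration, not the Los Angeles data), using a hand-written moment estimator of skewness, since base R has no skewness function:

```r
# Moment-based sample skewness: mean cubed deviation over the cubed
# (population) standard deviation. Positive values indicate a long right tail.
skewness <- function(v) {
  z <- v - mean(v)
  mean(z^3) / mean(z^2)^1.5
}

# Hypothetical right-skewed counts (simulated, not the Los Angeles data)
set.seed(1)
counts <- rlnorm(1000, meanlog = 5, sdlog = 2)

skew_raw <- skewness(counts)       # large and positive
skew_log <- skewness(log(counts))  # close to zero
```

On the real data, comparing `skewness(zCrimes$crimes)` with `skewness(log(zCrimes$crimes))` would show how much the log reduces the skew; the log is only defined here because the counts contain no zeros.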

Non-spatial regression

Below is a regression of the log number of crimes on the two independent variables (average household size and median age). The results are disappointing in terms of explained variance and in terms of the explanatory value of median age. Average household size is, however, significantly different from zero at the 5% significance level: a larger average household size is associated with more crimes, as stated in the hypothesis in the introduction.

                          Dependent variable: log(crimes)
Average.Household.Size    0.675**
                          (0.305)
Median.Age                0.005
                          (0.028)
Constant                  3.398**
                          (1.441)
Observations              242
R²                        0.020
Adjusted R²               0.012
Residual Std. Error       3.993 (df = 239)
F Statistic               2.457* (df = 2; 239)
Note: *p<0.1; **p<0.05; ***p<0.01. Standard errors in parentheses.
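Because the dependent variable is log(crimes), the coefficients are easier to read after back-transformation. The sketch below re-uses only the rounded values reported in the table, so its results may differ slightly from the exact model output: exp(b) - 1 gives the relative change in predicted crimes for a one-unit increase in a predictor, holding the other constant, and the t and F statistics are converted back into p-values.

```r
# Values copied from the regression table (rounded)
b_hh   <- 0.675   # coefficient of Average.Household.Size
se_hh  <- 0.305   # its standard error
df_res <- 239     # residual degrees of freedom
f_stat <- 2.457   # overall F statistic, df = (2, 239)

# Relative change (in percent) in predicted crimes per additional person
# in the average household, holding median age constant
pct_change <- 100 * (exp(b_hh) - 1)

# Two-sided p-value of the coefficient's t statistic
t_hh <- b_hh / se_hh
p_hh <- 2 * pt(abs(t_hh), df = df_res, lower.tail = FALSE)

# p-value of the overall F test
p_f <- pf(f_stat, df1 = 2, df2 = df_res, lower.tail = FALSE)

round(c(pct_change = pct_change, p_hh = p_hh, p_f = p_f), 3)
```

With these numbers, exp(0.675) is about 1.96, i.e. roughly 96% more predicted crimes per additional household member. The coefficient's p-value lies between 0.01 and 0.05 and the F test's between 0.05 and 0.1, matching the stars in the table; with R² = 0.020, the model still explains very little of the variation.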

These results are not surprising given the scatter plots in the previous section. The weak relationship, especially for median age, does not rule out that trends appear when the data are treated as spatial data. This is investigated below.

Bivariate maps

Number of Crimes VS Average Household Size

The spatial data may reveal relationships that the regression does not capture. Under the hypothesis stated in the introduction, one would expect that the redder a circle (i.e. the higher the average household size), the bluer the surrounding area (i.e. the higher the number of crimes). However, this is not the case, except for three areas north-west of Santa Monica.
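Reading a bivariate map by eye can be complemented by a rank correlation, which suits the quantile-based colouring of the maps and is unaffected by the log transform (the log is monotone). A minimal sketch, assuming a data frame with one row per zip code such as `zCrimes@data`; `rank_associations` is a hypothetical helper, not part of the original script.

```r
# Spearman rank correlation between each predictor and the crime counts,
# using only zip codes where both values are available.
rank_associations <- function(df, predictors, outcome = "crimes") {
  sapply(predictors, function(p) {
    ok <- complete.cases(df[[p]], df[[outcome]])
    cor(df[[p]][ok], df[[outcome]][ok], method = "spearman")
  })
}

# Hypothetical toy data: 'a' rises with crimes, 'b' falls
toy <- data.frame(a = 1:10, b = 10:1, crimes = exp(1:10))
rank_associations(toy, c("a", "b"))
```

Applied to the study data, `rank_associations(zCrimes@data, c("Average.Household.Size", "Median.Age"))` would summarise in one number per variable what the two maps show visually.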

Number of Crimes VS Median Age

Similarly, no particular relationship between median age and the number of crimes can be read from the plot.

Centrographic statistics and maps

The following section explores the relationships using centroids and standard deviation ellipses (SDE). Each independent variable is split into two groups: median age above 40 (red) versus 40 or below (orange), and average household size above 3 (red) versus 3 or below (orange).
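A standard deviational ellipse summarises the spread and orientation of a point set around its mean center. The sketch below computes one common variant directly from the covariance of the centroid coordinates: the axes are the standard deviations along the principal directions of the covariance matrix. Implementations differ in scaling (for example n versus n - 1 denominators, or an extra factor of the square root of 2), so this is an assumption rather than the exact definition used by aspace or by ggplot2's `stat_ellipse()`; the helper names `sde` and `sde_outline` are hypothetical.

```r
# Mean center and standard deviational ellipse of points (x, y).
# Assumption: axes = standard deviations along the eigenvectors of cov(x, y).
sde <- function(x, y) {
  xy <- cbind(x, y)
  e <- eigen(cov(xy), symmetric = TRUE)   # eigenvalues in decreasing order
  vals <- pmax(e$values, 0)               # guard against tiny negative values
  list(
    centre   = colMeans(xy),
    sd_major = sqrt(vals[1]),
    sd_minor = sqrt(vals[2]),
    # orientation of the major axis in radians, measured from the x-axis
    angle    = atan2(e$vectors[2, 1], e$vectors[1, 1])
  )
}

# Points on the ellipse outline, e.g. for drawing with ggplot2::geom_path()
sde_outline <- function(s, n = 100) {
  t <- seq(0, 2 * pi, length.out = n)
  u <- s$sd_major * cos(t)
  v <- s$sd_minor * sin(t)
  data.frame(
    x = s$centre[[1]] + u * cos(s$angle) - v * sin(s$angle),
    y = s$centre[[2]] + u * sin(s$angle) + v * cos(s$angle)
  )
}
```

For example, with `older <- subset(zCrimesLoc, Median.Age > 40)` from the R script, `sde(older$x, older$y)` gives the red group's mean center and ellipse, and comparing `sd_major / sd_minor` and `angle` between groups turns "similar shape" into numbers.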

number of crimes VS median age

The plot above shows that zip codes with a younger population (median age of 40 or below) spread mostly to the north and south of the city, whereas those with an older population spread along a west-east axis. Since the number of crimes varies along both axes, it is again hard to draw a conclusion on the effect of age on crime rates. The plot below illustrates the relationship between crimes and household size.

number of crimes VS average household size

Zip codes with large and small average household sizes appear to be spread evenly across the city, since both SDEs have a similar shape.

Conclusion

The two hypotheses stated above cannot be confirmed by this analysis. The heterogeneity in the crime data (visible in the plots above) does not appear to be well modelled by either median age or average household size, except in the linear regression, where a higher average household size was associated with more crimes. Other variables available in the crime and zip code datasets will be investigated further to better model and predict crime rates.

R Script

knitr::opts_chunk$set(echo = FALSE, message = FALSE, warning = FALSE)

# Note: rgdal, rgeos and maptools were retired from CRAN in 2023 and may
# need to be installed from the CRAN archive.
packages <- c("rgdal", "foreign", "gdata", "ggmap", "ggplot2",
              "plyr", "rgeos", "sf", "ggrepel", "dplyr", "sp", "aspace",
              "spdep", "bookdown", "stringr", "maptools", "leaflet", "broom", "stargazer",
              "RColorBrewer")

# Install any missing package, then attach it
for (pkg in packages) {
  if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
  library(pkg, character.only = TRUE)
}


p <- read.csv("../Research Data/2010_Census_Populations_by_Zip_Code.csv")
load("../Research Data/crime.RData")
load(file = "../Research Data/zipCrimes.RData")

names(zc)[2] <- names(p)[1] <- "zip"

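# Merge on the Spatial object: merging zc@data alone sorts rows by zip and
# drops unmatched ones, misaligning data and polygons; sp's merge() keeps order.
zCrimes <- merge(zc, p, by = "zip")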
# zCrimes <- zCrimes[!is.na(zCrimes$crimes),]

# writeOGR(obj=zCrimes, driver="ESRI Shapefile", "../Research Data/zCrimes")



par(mfrow=c(1,3))
hist(zCrimes$crimes, xlab = "crimes since 2010", main=NULL)
hist(zCrimes$Average.Household.Size, xlab = "average household size in 2010", main=NULL)
hist(zCrimes$Median.Age, xlab = "median age in 2010", main=NULL)
par(mfrow=c(1,1))


par(mfrow=c(1,2))
plot(zCrimes$Median.Age, log(zCrimes$crimes), xlab = "median age", ylab = "ln(crimes)")
plot(zCrimes$Average.Household.Size, log(zCrimes$crimes), xlab = "average household size", ylab = "ln(crimes)")
par(mfrow=c(1,1))


regHH <- lm(log(crimes) ~ Average.Household.Size + Median.Age, data = zCrimes)

stargazer(regHH, no.space = T, type = "html")

# regMedAge <- lm(crimes ~ Median.Age, data = zCrimes)
# 
# stargazer(regMedAge, no.space = T,  type = "html")


centroids <- as.data.frame(gCentroid(zCrimes,byid=TRUE))

# pal <- colorNumeric(
#   palette = "YlGnBu",
#   domain = zCrimes$crimes
# )

qpal <- colorQuantile("YlGnBu", zCrimes$crimes, n = 5)
qHH <- colorQuantile("YlOrRd", zCrimes$Average.Household.Size, n = 5)

# Zip areas coloured by crime quintile (YlGnBu); circles at the zip centroids
# coloured by average household size quintile (YlOrRd)
leaflet(zCrimes) %>%
  addPolygons(weight = 1, smoothFactor = 0.5,
              opacity = 1.0, fillOpacity = 0.5,
              color = ~qpal(crimes),
              highlightOptions = highlightOptions(color = "white", weight = 2,
                                                  bringToFront = TRUE)) %>%
  addCircles(lng = ~centroids$x, lat = ~centroids$y, weight = 1,
             color = ~qHH(Average.Household.Size),
             radius = 800, popup = ~zCrimes$zip, opacity = 0.9, fillOpacity = 0.8) %>%
  addTiles() %>%
  addLegend(pal = qpal, values = ~crimes, opacity = 1) %>%
  addLegend(pal = qHH, values = ~Average.Household.Size, opacity = 1)

qMA <- colorQuantile("YlOrRd", zCrimes$Median.Age, n = 5)


# Same map, with circles coloured by median age quintile
leaflet(zCrimes) %>%
  addPolygons(weight = 1, smoothFactor = 0.5,
              opacity = 1.0, fillOpacity = 0.5,
              color = ~qpal(crimes),
              highlightOptions = highlightOptions(color = "white", weight = 2,
                                                  bringToFront = TRUE)) %>%
  addCircles(lng = ~centroids$x, lat = ~centroids$y, weight = 1,
             color = ~qMA(Median.Age),
             radius = 800, popup = ~zCrimes$zip, opacity = 0.9, fillOpacity = 0.8) %>%
  addTiles() %>%
  addLegend(pal = qpal, values = ~crimes, opacity = 1) %>%
  addLegend(pal = qMA, values = ~Median.Age, opacity = 1)


zCrimes@data$id = rownames(zCrimes@data)
crimePoints = fortify(zCrimes, region="id")
crimesDf = join(crimePoints, zCrimes@data, by="id")


# c$weapon <- !c$Weapon.Description %in% c("", "STRONG-ARM (HANDS, FIST, FEET OR BODILY FORCE)", "VERBAL THREAT")
# c$verbal <- c$Weapon.Description %in% "VERBAL THREAT"
# c$fists <- c$Weapon.Description %in% "STRONG-ARM (HANDS, FIST, FEET OR BODILY FORCE)"
# 
# clean_coords <- gsub(pattern = '[()]', replacement = '', x = c$Location)
# split_coords <- str_split(string = clean_coords, pattern = ', ', n = 2, simplify = T)
# c$lat <- as.numeric(split_coords[,1])
# c$lon <- as.numeric(split_coords[,2])


zCrimesLoc <- cbind(zCrimes@data, centroids)

# stat_ellipse() draws a 50% data ellipse (default type = "t"), an approximation
# of the standard deviational ellipse; each dot marks the group's mean center.
older <- subset(zCrimesLoc, Median.Age > 40)
younger <- subset(zCrimesLoc, Median.Age <= 40)
ggplot(crimesDf) +
  geom_polygon(aes(long, lat, group = group, fill = crimes)) +
  coord_equal() +
  stat_ellipse(data = older, aes(x = x, y = y), level = 0.5, color = "red") +
  geom_point(data = data.frame(x = mean(older$x), y = mean(older$y)),
             aes(x = x, y = y), color = "red", size = 0.5) +
  stat_ellipse(data = younger, aes(x = x, y = y), level = 0.5, color = "orange") +
  geom_point(data = data.frame(x = mean(younger$x), y = mean(younger$y)),
             aes(x = x, y = y), color = "orange", size = 0.5) +
  theme_void()


# Same plot, split by average household size (above 3 vs 3 or below)
bigHH <- subset(zCrimesLoc, Average.Household.Size > 3)
smallHH <- subset(zCrimesLoc, Average.Household.Size <= 3)
ggplot(crimesDf) +
  geom_polygon(aes(long, lat, group = group, fill = crimes)) +
  coord_equal() +
  stat_ellipse(data = bigHH, aes(x = x, y = y), level = 0.5, color = "red") +
  geom_point(data = data.frame(x = mean(bigHH$x), y = mean(bigHH$y)),
             aes(x = x, y = y), color = "red", size = 0.5) +
  stat_ellipse(data = smallHH, aes(x = x, y = y), level = 0.5, color = "orange") +
  geom_point(data = data.frame(x = mean(smallHH$x), y = mean(smallHH$y)),
             aes(x = x, y = y), color = "orange", size = 0.5) +
  theme_void()

  1. https://en.wikipedia.org/wiki/List_of_United_States_cities_by_crime_rate

  2. https://www.discoverlosangeles.com/press-releases/facts-about-los-angeles

  3. https://en.wikipedia.org/wiki/List_of_United_States_cities_by_crime_rate

  4. https://www.businessinsider.com/dea-maps-of-mexican-cartels-in-the-us-2016-12

  5. https://catalog.data.gov/dataset/crime-data-from-2010-to-present

  6. https://catalog.data.gov/dataset/crime-data-from-2010-to-present

  7. https://www.census.gov/cgi-bin/geo/shapefiles/index.php?year=2018&layergroup=ZIP+Code+Tabulation+Areas