We propose a geographic regression discontinuity approach to estimating the effect of a characteristic of a geographic unit (the electoral institutions of a country, for example) on a spatially distributed outcome (level of forest cover, for example). We use multiple borders over multiple time periods and show that keeping only data near the discontinuity can be more efficient than using all of the data.

Caption: When examining the effect of institutions on change in cropland, using only border areas controls for spatial confounders such as climate, soil, and elevation.

Introduction

There is no well-established method for causal inference with spatial data. We are often interested in how political variables affect a spatial outcome: for example, how do elections affect deforestation? Current methods, mainly regression discontinuity and matching, are insufficient. The geographic regression discontinuity design is a young and limited method: it covers only one time period and one boundary. Here, we extend the method to multiple borders and time series.

Data Generating Process

We assign each cell a geographic location, denoted by its “latitude” and “longitude,” and a country based on that cell’s location. Each cell is then assigned a treatment level (between 1 and 4, based on country) and an outcome (in this case, 2 times the treatment plus some normally distributed noise). Colors show the outcome as a function of the different treatment levels by country. The plots below show the full map, and then the data used to estimate the geographic regression discontinuity design (GRDD).
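The data generating process described above can be sketched as follows. This is a minimal illustration, not the original simulation code: the quadrant layout of the four countries, the grid size, the noise scale, and the names `make_grid` and `border_mask` are all assumptions made for the example.

```python
import numpy as np

def make_grid(n=20, effect=2.0, noise_sd=1.0, seed=0):
    """Simulate an n x n grid of cells split into four countries."""
    rng = np.random.default_rng(seed)
    lat, lon = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    # Assumption: countries are the four quadrants of the grid (indices 0..3).
    country = 2 * (lat >= n // 2) + (lon >= n // 2)
    treatment = country + 1  # treatment levels 1..4, one per country
    # Outcome = effect * treatment + normally distributed noise.
    outcome = effect * treatment + rng.normal(0.0, noise_sd, size=(n, n))
    return lat, lon, country, treatment, outcome

def border_mask(country):
    """True for cells with at least one 4-neighbour in a different country."""
    mask = np.zeros(country.shape, dtype=bool)
    diff_v = country[1:, :] != country[:-1, :]  # vertical neighbours differ
    diff_h = country[:, 1:] != country[:, :-1]  # horizontal neighbours differ
    mask[1:, :] |= diff_v
    mask[:-1, :] |= diff_v
    mask[:, 1:] |= diff_h
    mask[:, :-1] |= diff_h
    return mask
```

Under these assumptions, `border_mask(country)` on a 20 by 20 grid keeps 76 of the 400 cells, so the border-only sample discards well over half of the data.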

As expected, in the setup above, throwing out data leads to a less efficient estimate. However, to simulate spatially autocorrelated noise, we add a shock in a randomly chosen cell that dissipates with the square of distance. In the example shown, the shock disproportionately affects Country D.
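One way to write such a shock is sketched below. The functional form `magnitude / (1 + distance^2)` is an assumption: the text only says the shock dissipates with the square of distance, and the `+ 1` is added here so the value at the shocked cell itself stays finite. The name `spatial_shock` is hypothetical.

```python
import numpy as np

def spatial_shock(n, magnitude, rng):
    """Shock centred on a random cell, decaying with squared distance."""
    lat, lon = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    r0, c0 = rng.integers(0, n, size=2)  # randomly chosen epicentre
    dist2 = (lat - r0) ** 2 + (lon - c0) ** 2
    # Assumed decay: full magnitude at the epicentre, ~1/d^2 far away.
    return magnitude / (1.0 + dist2), (int(r0), int(c0))
```

Because the epicentre falls inside a single country, nearby cells in that country receive most of the shock, which is what makes the noise correlated with country-level treatment in any one draw.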

Findings

In the presence of a shock that is sufficiently large and sufficiently autocorrelated, an OLS estimate based on only the border cells is more efficient than an estimate that uses all of the data. The figure below shows that the distribution of the GRDD estimator has a smaller standard error than the usual OLS estimate. Both are centered on the true causal effect (5).
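The comparison can be reproduced in spirit with a small Monte Carlo experiment. The sketch below is an assumption-laden illustration: the quadrant countries, the grid size, the shock form `shock / (1 + distance^2)`, the parameter defaults, and the function names are not taken from the original analysis. It returns one full-sample OLS slope and one border-only OLS slope per replication, so the spread of the two columns can be compared.

```python
import numpy as np

def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x with an intercept."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    xc = x - x.mean()
    return float(xc @ (y - y.mean()) / (xc @ xc))

def simulate_once(rng, n=20, effect=2.0, noise_sd=1.0, shock=50.0):
    """One draw: returns (full-sample slope, border-only slope)."""
    lat, lon = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    half = n // 2
    country = 2 * (lat >= half) + (lon >= half)  # assumed quadrant layout
    treatment = country + 1
    r0, c0 = rng.integers(0, n, size=2)
    bump = shock / (1.0 + (lat - r0) ** 2 + (lon - c0) ** 2)
    y = effect * treatment + bump + rng.normal(0.0, noise_sd, size=(n, n))
    # Border cells: the rows and columns adjacent to either dividing line.
    near = (np.abs(lat - (half - 0.5)) < 1) | (np.abs(lon - (half - 0.5)) < 1)
    return ols_slope(treatment, y), ols_slope(treatment[near], y[near])

def monte_carlo(reps=500, seed=0, **kwargs):
    """Array of shape (reps, 2): columns are full-sample and border-only."""
    rng = np.random.default_rng(seed)
    return np.array([simulate_once(rng, **kwargs) for _ in range(reps)])
```

Calling `monte_carlo().std(axis=0)` gives the simulated standard deviation of each estimator. Which one is smaller depends on the shock size and noise level chosen, so varying `shock` and `noise_sd` is a direct way to explore the "sufficiently large" condition stated above.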

Introducing Time: Panel Data

In addition to estimating on multiple discontinuities, we show that a GRDD can be more efficient than a within estimator in a panel setting. Each cell starts with a dependent variable value of 0 and an independent variable value of 5. Then, the independent variable in each country follows a random walk, where the change in each round is -1, 0, or 1. The dependent variable for each cell takes its value in the previous period, plus the treatment effect (one times the independent variable), plus the (randomly located) spatial shock from above and some noise. Below we show the outcome variable for each cell over 100 periods, with blue indicating higher values:
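The panel process can be sketched as below. Several details are assumptions made for the example: the four quadrant countries, the grid size, a shock that is redrawn at a new random location every period, the shock form `shock / (1 + distance^2)`, and the name `simulate_panel`.

```python
import numpy as np

def simulate_panel(n=10, periods=100, effect=1.0, noise_sd=1.0,
                   shock=10.0, seed=0):
    """Return X and Y of shape (periods, n, n) plus the country map."""
    rng = np.random.default_rng(seed)
    lat, lon = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    country = 2 * (lat >= n // 2) + (lon >= n // 2)  # assumed quadrants
    x_country = np.full(4, 5.0)  # independent variable starts at 5
    y = np.zeros((n, n))         # dependent variable starts at 0
    X = np.empty((periods, n, n))
    Y = np.empty((periods, n, n))
    for t in range(periods):
        # Country-level random walk with steps of -1, 0, or 1.
        x_country += rng.integers(-1, 2, size=4)
        x = x_country[country]
        # Assumption: a fresh, randomly located shock each period.
        r0, c0 = rng.integers(0, n, size=2)
        bump = shock / (1.0 + (lat - r0) ** 2 + (lon - c0) ** 2)
        noise = rng.normal(0.0, noise_sd, size=(n, n))
        # Previous value + treatment effect + spatial shock + noise.
        y = y + effect * x + bump + noise
        X[t], Y[t] = x, y
    return X, Y, country
```

With `noise_sd=0.0` and `shock=0.0`, the period-to-period change in `Y` equals `effect * X` exactly, which is a quick sanity check on the recursion.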

Below we plot the distribution of a two-way fixed effects estimator that uses the full sample against an estimator that uses only border cells (the GRDD estimator) to show that using only border cells produces a more efficient estimate. Both are centered on the true effect size (1).
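For reference, a two-way fixed effects slope on a balanced panel can be computed by double demeaning, as in the sketch below. This is a generic textbook within estimator with a single regressor, not necessarily the specification used here; in particular, whether the outcome enters in levels or in period-to-period changes is not stated, and the name `twfe_slope` is hypothetical.

```python
import numpy as np

def twfe_slope(x, y):
    """Two-way fixed effects slope for a balanced panel.

    x and y have shape (periods, units). Unit and period means are
    removed from both variables before a no-intercept regression.
    """
    def demean(a):
        a = np.asarray(a, dtype=float)
        return (a
                - a.mean(axis=0, keepdims=True)   # unit (cell) means
                - a.mean(axis=1, keepdims=True)   # period means
                + a.mean())                       # add back grand mean
    xd, yd = demean(x), demean(y)
    return float((xd * yd).sum() / (xd * xd).sum())
```

The border-only version applies the same function to the columns of `x` and `y` that correspond to border cells, for example `twfe_slope(x[:, border], y[:, border])` with a boolean vector `border` over units.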

Summary

We have shown that considering only points near the border between two treatment areas, the simplest form of a geographic regression discontinuity design, can generate a more efficient estimate of a treatment effect, even after it throws out more than half of the data. We find this effect in both cross-sectional and panel settings with multiple discontinuities and multiple treatments. So far we have only considered the efficiency of the estimator. In the future, we would like to generalize this research to settings where the usual OLS or panel methods are biased by the spatial autocorrelation of the data but where a geographic regression discontinuity design is still unbiased.