# Spatio-temporal modelling of property prices in Brunei Darussalam

LSE Social Statistics Seminar

October 26, 2023

## Motivation

### Residential property price index (RPPI)

RPPI measures the rate at which the prices of private residential properties purchased by households are changing over time — BDCB

Ideally, any such index should capture the underlying trend in property values, rather than changes in characteristics of the houses being transacted.

## Motivation

### Methodological challenges

Hedonic regression is commonly employed to measure the constant-quality property price changes, i.e. $y_i = f(x_i, t_i) + \epsilon_i.$

Some challenges faced:

• Data sparsity (infrequent transactions and missing covariates)
• Spatial and temporal autocorrelation
• Coverage and sample selection bias
• Data collection and processing

## Motivation

### Pretty visualisations

Source: github.com/Pecners

# Introduction

## Types of spatial data

Location of Starbucks® around London.

Data source: Kaggle

Transport for London (TFL) Tube, Overground, Elizabeth line, DLR & Tram lines (including Crossrail alignment)

Data source: github/oobrien

London Middle Super Output Area (MSOA) population density (2011).

Data source: London Datastore

• Geospatial points vs point processes
• Point processes: Locations are random variables (e.g. earthquakes, crime, diseases, etc.)
• Geospatial points: Locations are fixed
• Raster/Grid data
• Pixelated (or gridded) data where each pixel is associated with a specific geographical location
• E.g. satellite imagery, digital elevation models, etc.
• Network data.
• E.g. road networks, social networks, etc.

## Spatial data in R

• Simple features refers to a formal standard (ISO 19125-1:2004) that describes how objects in the real world can be represented in computers, with emphasis on the spatial geometry of these objects.
• The {sf} package provides support for handling spatial data in R.

## Spatial data in R (cont.)

london_sf |>  # read in using sf::st_read()
select(MSOA11CD, LAD11NM, PopDen)
Simple feature collection with 983 features and 3 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -0.5102995 ymin: 51.28676 xmax: 0.3339955 ymax: 51.69187
Geodetic CRS:  WGS 84
# A tibble: 983 × 4
<chr>     <chr>                 <dbl>                      <MULTIPOLYGON [°]>
1 E02000001 City of London         25.5 (((-0.0977199 51.52292, -0.097492 51.5…
2 E02000002 Barking and Dagenham   31.3 (((0.1475682 51.59681, 0.1481361 51.59…
3 E02000003 Barking and Dagenham   46.9 (((0.1510245 51.58589, 0.1510833 51.58…
4 E02000004 Barking and Dagenham   24.8 (((0.1851372 51.56552, 0.1833498 51.56…
5 E02000005 Barking and Dagenham   72.1 (((0.1492369 51.56738, 0.1483768 51.56…
6 E02000007 Barking and Dagenham   50.6 (((0.1561951 51.56508, 0.1543864 51.56…
7 E02000008 Barking and Dagenham   81.5 (((0.1357881 51.55903, 0.1344422 51.55…
8 E02000009 Barking and Dagenham   87.4 (((0.1162061 51.557, 0.113344 51.55732…
9 E02000010 Barking and Dagenham   76.8 (((0.1563492 51.55102, 0.1525812 51.55…
10 E02000011 Barking and Dagenham   38.8 (((0.1646052 51.55265, 0.1640759 51.55…
# ℹ 973 more rows
• E.g. Data set with ($$n=983$$) features containing spatial attributes (geometry) and feature-related variables (MSOA11CD, LAD11NM, PopDen).

• Notice the use of {dplyr} verbs! Link: Tidyverse methods for sf objects

## Spatial analysis

To assess – or at least, to account – for spatial patterns in the data.

• Crucial to acknowledge the inherent non-randomness and interdependence within spatial data, increasing the validity and accuracy.

• Numerous motivations for doing so including

• Agriculture
• Ecology
• Education
• Epidemiology
• Image analysis
• Spatial statistics are indispensable for capturing the complexity of spatial data, leading to more accurate, reliable, and nuanced insights and decisions.

## Types of spatial patterns

• Values are randomly distributed in space, independently of each other (and particularly its neighbours)

• Lack of a spatial pattern.

• Values are non-randomly distributed in space.

• Large values are “attracting” other large values, and vice versa.

• A clustering effect is present.

• Values are non-randomly distributed in space.

• Large values are “repelling” other large values, and vice versa.

• A dispersion effect is present.

## Statistical test

### Moran’s I test

• A test of “global” trends: Looks for evidence of spatial heterogeneity across the entire study area indexed by $$i=1,\dots,M$$.

• The Moran’s I test statistic (1948) is defined as: $I = \frac{M}{\sum_{i}{\sum_{j}{w_{ij}}}} \frac{\sum_{i}{\sum_{j}{w_{ij}(y_i - \bar{y})(y_j - \bar{y})}}}{\sum_{i}{(y_i - \bar{y})^2}} \in [-1, 1]$ where $$w_{ij}$$ is the “spatial weight” between locations $$i$$ and $$j$$.

• The null hypothesis is “the data are randomly distributed in space (no spatial autocorrelation)”, and is typically compared against an alternative of “>”.

• The Moran’s I value can be thought of as the ratio between the observed spatial variability and the expected variability.

## Statistical test

### Moran’s I test (cont.)

Two common approaches for $$p$$-value calculation:

#### $$Z$$-test

• Under $$H_0$$, $$\operatorname{E}(I) = -1/(M-1)$$.
• The variance $$\operatorname{Var}(I)$$ also has a closed form expression.
• So, the CLT is invoked and a $$p$$-values based on a $$Z$$ score is calculated.
• In practice, the normality assumption is not always met.

#### Permutation test

• Under $$H_0$$, the spatial pattern is random, so we can shuffle the values and recompute $$I$$.

• Compute the proportion of times $$I$$ is larger than the observed value.

• The Monte Carlo approach can be more accurate.

## Neighbour definitions

For areal data it is quite common to implement $$W=A$$ where $$A = (a_{ij})$$ is an adjacency matrix with the following definition: $a_{ij} = \begin{cases} 1 & i \sim j \text{ where } i \neq j \\ 0 & \text{otherwise.} \end{cases}$

  A B C D E
A 0 1 1 0 0
B 1 0 1 1 1
C 1 1 0 0 1
D 0 1 0 0 1
E 0 1 1 1 0

Other useful definitions:

• $$D = \operatorname{diag}(A1)$$ is a diagonal matrix containing number of neighbours.

• $$\tilde W = D^{-1}A$$ is the scaled adjacency matrix (row sums are 1).

## Example in R

library(spdep)
y <- negfc_sf\$value
Wl <- poly2nb(negfc_sf) |> nb2listw(style = "B")
moran.test(y, listw = Wl, alternative = "less")

Moran I test under randomisation

data:  y
weights: Wl

Moran I statistic standard deviate = -1.4695, p-value = 0.07084
alternative hypothesis: less
sample estimates:
Moran I statistic       Expectation          Variance
-0.11145491       -0.01234568        0.00454854 
moran.mc(y, listw = Wl, nsim = 1000, alternative = "less")

Monte-Carlo simulation of Moran I

data:  y
weights: Wl
number of simulations + 1: 1001

statistic = -0.11145, observed rank = 58, p-value = 0.05794
alternative hypothesis: less

# Spatio-temporal models

## Spatial areal unit models

The study area $$\mathcal S$$ is partitioned into $$M$$ non-overlapping areal units. To each areal unit $$i$$ is associated a response variable $$y_i$$ and a vector of covariates $$x_i$$. Let $\begin{gather} y_i \sim p(y_i \mid \mu_i, \nu^2) \\ g(\mu_i) = x_i^\top\beta + \psi_i. \end{gather}$

• Parameters of interest are $$\beta$$ (regression coefficients) and $$\nu^2$$ (response variance).

• $$g(\cdot)$$ is a link function, e.g.

• $$y_i \sim \operatorname{N}(\mu_i, \nu^2) \Leftrightarrow g(x) = x$$
• $$y_i \sim \operatorname{Bin}(n_i, \pi_i) \Leftrightarrow g(x) = \log(\frac{x}{1-x})$$
• $$y_i \sim \operatorname{Pois}(\mu_i) \Leftrightarrow g(x) = \log(x)$$
• Additionally, $$\psi_i$$ are structured random effects to be estimated. It partly contains spatial random effects which we label $$\phi_i$$.

## Conditionally autoregressive (CAR) priors

Let $$\psi_i=\phi_i$$. Then, the CAR prior is defined as $\phi_i \mid \phi_{-i},W,\rho,\tau^2 \sim \operatorname{N}\left(\frac{\rho\sum_{j} w_{ij} \phi_j}{\rho\sum_{j} w_{ij}+1-\rho}, \frac{\tau^2}{\rho\sum_{j} w_{ij}+1-\rho}\right),$ where $$\rho\in[0,1]$$ is a spatial dependence parameter; $$\tau^2$$ is a variance parameter; and $$W$$ is a known weight matrix (e.g. $$W=A \in \{0,1\}^{M\times M}$$ adjacency matrix).

Special cases:

1. $$\rho = 0$$ corresponds to independence;
2. $$\rho = 1$$ corresponds to the Intrinsic CAR;
3. $$\rho=1$$ and $$\psi_i = \phi_i + \theta_i$$, where $$\theta_i\sim\operatorname{N}(0,\sigma^2)$$ – this is the BYM (1991) model for binomial, Poisson and ZIP likelihoods.

## CAR joint distribution

Using Brook’s lemma, the joint distribution of $$\phi$$ is $(\phi_1,\dots,\phi_M)^\top \sim \operatorname{N}_M(0,\tau^2 Q^{-1}),$ where $Q = \rho[\overbrace{\operatorname{diag}(A1)}^{D}-\overbrace{A}^{DD^{-1}A}] + (1-\rho)I_M.$

• Clearly $$Q$$ is a convex combination of a fully intrinsic spatial structure $$Q=D(I_M - \tilde W)$$ and a completely independent structure $$Q=I_M$$.

• Estimating $$\rho$$ from the data allows us to quantify the extent of spatial autocorrelation.

## Aside

### Simultaneously autoregressive models

In econometrics , the simultaneous autoregressive (SAR) model is more widely used: $\begin{gathered} y = \rho W y + X\beta + \epsilon \\ \epsilon \sim N_M(0, \sigma^2 I_M) \end{gathered}$

• A direct spatial relationship between the response variables.

• Assume spatial interactions have a global effect (impacts are spread throughout the entire network of relationships).

• The $$\rho$$ parameter is interpeted as the “spillover” effect.

• More appropriate to model “directional flow” by customising the $$W$$ matrix.

## Aside

### Simultaneously autoregressive models

Now suppose data at each spatial unit is recorded for $$t=1,\dots,T$$ time periods. Extend the model: $\begin{gather} y_{it} \sim p(y_{it} \mid \mu_{it}, \nu^2) \\ g(\mu_{it}) = x_{it}^\top\beta + \psi_{it}. \end{gather}$

We’ll discuss three ways to model the temporal dependence:

1. Linear time effects
2. ANOVA decomposition
3. Time autoregressive process

### Linear time effects (Bernardinelli et al. 1995)

Estimate the linear time trends for each areal unit. The goal is to estimate areas exhibiting changing linear trends in $$y$$ over time. $\begin{gather} \psi_{it} = \phi_i + (\alpha + \delta_i)t \\ \phi_i \mid \phi_{-i} \sim \operatorname{CAR}(W, \rho, \tau^2) \\ \delta_i \mid \delta_{-i} \sim \operatorname{CAR}(W, \tilde\rho, \tilde\tau^2) \end{gather}$

• Spatially varying intercept $$\phi_i$$.
• Spatially varying time slope $$\alpha + \delta_i$$, where $$\alpha$$ is a “global” time effect.
• Assume linear temporal trend (like growth models).

### ANOVA decomposition (Knorr-Held 2000)

Decomposes the spatio-temporal variation into their constituent parts. The goal is to estimate overall time trends and spatial patterns. $\begin{gather} \psi_{it} = \phi_i + \delta_t + \gamma_{it} \\ \phi_i \mid \phi_{-i} \sim \operatorname{CAR}(W, \rho, \tau^2) \\ \delta_i \mid \delta_{-i} \sim \operatorname{CAR}(D, \tilde\rho, \tilde\tau^2) \\ \gamma_{it} \sim \operatorname{N}(0, \omega^2) \end{gather}$

• $$\phi$$ are the spatial random effects, and $$\delta$$ are the temporal random effects.
• $$D$$ is the “temporal adjacency” matrix: $$D_{tt'}=1$$ if $$|t-t'|=1$$, otherwise 0.
• The (optional) interaction terms $$\gamma_{it}$$ are not identified for normal models.

### Time autoregressive process (Rushworth, Lee, and Mitchell 2014)

Include and AR(1) or AR(2) temporal process. The goal is to estimate the evolution of spatial random effects over time. $\begin{gather} \psi_{it} = \phi_{it} \\ \phi^{(t)} \mid \phi^{(t-1)},\phi^{(t-2)} \sim \operatorname{N}_M\big(a_1\phi^{(t-1)} + a_2\phi^{(t-2)}, \tau^2 Q(W,\rho)^{-1}\big) \hspace{2em} t\geq 3 \\ \phi^{(1)} \sim \operatorname{N}_M\big(0, \tau^2 Q(W,\rho)^{-1}\big) \end{gather}$

• The $$Q$$ precision matrix is the same as the CAR Leroux model.
• Temporal autocorrelation is induced by the mean.
• Spatial autocorrelation is induced by the variance.

## Model estimation and inference

• Model estimated fully Bayesian using Markov chain Monte Carlo (MCMC) methods. Additional priors:
• $$\beta \sim \operatorname{N}(\mu_\beta,\Sigma_\beta)$$. Default choice of mean 0 and very large independent variance.
• $$\Gamma^{-1}(1,0.01)$$ for variance parameters.
• $$\operatorname{Unif}(-1,1)$$ for the $$\rho$$s.
• Use the {CARBayesST} package .
• Gibbs sampling for closed-form full conditionals (e.g. $$\beta$$, $$\phi$$, $$\tau^2$$).
• Metropolis or Metropolis-Hastings for the rest.
• Make use of sparsity of $$W$$ (and $$D$$) to speed up computation.
• Model fit criteria: Log-likelihood; Deviance information criterion (DIC) ; and Watanabe-Akaike information criterion (WAIC) .

# Application to Brunei data

## The data

• Loan information data pertaining to residential property purchases obtained from BDCB as reported from financial institutions (2015–2022).

• $$N=3,512$$ sample points spread across $$T=32$$ quarters containing:

1. price: Property price in thousands of Brunei dollars
2. type: Property type (Land, Detached, Semi-Detached, Terrace, Apartment)
3. built_up: Built up area in 1,000 square feet
4. beds: No. of beds
5. baths: No. of baths
6. land_type: Land type (Freehold, Leasehold, Strata)
7. land_size: Lot area in acres
• Model: price ~ house_characteristics + st_effects + error (Gaussian errors, identity link).

## Spatial distribution of prices

• Brunei is administratively divided into four districts with $$M=39$$ mukims (sub-districts).

• Some mukims will not have any transactions due to:

• Nature reserve areas
• Government land
• No residential properties
• The $$W$$ matrix is the binary mukim relationship ignoring the non-observed areas.