Spatio-temporal modelling of property prices in Brunei Darussalam

LSE Social Statistics Seminar

Haziq Jamil

Assistant Professor of Statistics
Universiti Brunei Darussalam

https://haziqj.ml/lse-rms-oct23/

October 26, 2023

The team

Statistical modelling of spatio-temporal data in Brunei Darussalam

Motivation

Residential property price index (RPPI)

RPPI measures the rate at which the prices of private residential properties purchased by households are changing over time — BDCB


Ideally, any such index should capture the underlying trend in property values, rather than changes in characteristics of the houses being transacted.

Motivation

Methodological challenges

Hedonic regression is commonly employed to measure the constant-quality property price changes, i.e. \[ y_i = f(x_i, t_i) + \epsilon_i. \]

Some challenges faced:

  • Data sparsity (infrequent transactions and missing covariates)
  • Spatial and temporal autocorrelation
  • Coverage and sample selection bias
  • Data collection and processing

Motivation

Pretty visualisations

Source: github.com/Pecners

Introduction

Types of spatial data

Location of Starbucks® around London.

Data source: Kaggle

Transport for London (TFL) Tube, Overground, Elizabeth line, DLR & Tram lines (including Crossrail alignment)

Data source: github/oobrien

London Middle Super Output Area (MSOA) population density (2011).

Data source: London Datastore

  • Geospatial points vs point processes
    • Point processes: Locations are random variables (e.g. earthquakes, crime, diseases, etc.)
    • Geospatial points: Locations are fixed
  • Raster/Grid data
    • Pixelated (or gridded) data where each pixel is associated with a specific geographical location
    • E.g. satellite imagery, digital elevation models, etc.
  • Network data.
    • E.g. road networks, social networks, etc.

Spatial data in R

  • Simple features refers to a formal standard (ISO 19125-1:2004) that describes how objects in the real world can be represented in computers, with emphasis on the spatial geometry of these objects.
  • The {sf} package (Pebesma and Bivand 2023) provides support for handling spatial data in R.

Spatial data in R (cont.)

london_sf |>  # read in using sf::st_read()
  select(MSOA11CD, LAD11NM, PopDen)
Simple feature collection with 983 features and 3 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -0.5102995 ymin: 51.28676 xmax: 0.3339955 ymax: 51.69187
Geodetic CRS:  WGS 84
# A tibble: 983 × 4
   MSOA11CD  LAD11NM              PopDen                                geometry
   <chr>     <chr>                 <dbl>                      <MULTIPOLYGON [°]>
 1 E02000001 City of London         25.5 (((-0.0977199 51.52292, -0.097492 51.5…
 2 E02000002 Barking and Dagenham   31.3 (((0.1475682 51.59681, 0.1481361 51.59…
 3 E02000003 Barking and Dagenham   46.9 (((0.1510245 51.58589, 0.1510833 51.58…
 4 E02000004 Barking and Dagenham   24.8 (((0.1851372 51.56552, 0.1833498 51.56…
 5 E02000005 Barking and Dagenham   72.1 (((0.1492369 51.56738, 0.1483768 51.56…
 6 E02000007 Barking and Dagenham   50.6 (((0.1561951 51.56508, 0.1543864 51.56…
 7 E02000008 Barking and Dagenham   81.5 (((0.1357881 51.55903, 0.1344422 51.55…
 8 E02000009 Barking and Dagenham   87.4 (((0.1162061 51.557, 0.113344 51.55732…
 9 E02000010 Barking and Dagenham   76.8 (((0.1563492 51.55102, 0.1525812 51.55…
10 E02000011 Barking and Dagenham   38.8 (((0.1646052 51.55265, 0.1640759 51.55…
# ℹ 973 more rows
  • E.g. Data set with (\(n=983\)) features containing spatial attributes (geometry) and feature-related variables (MSOA11CD, LAD11NM, PopDen).

  • Notice the use of {dplyr} verbs! Link: Tidyverse methods for sf objects

Spatial analysis

To assess – or at least, to account – for spatial patterns in the data.

Types of spatial patterns


  • Values are randomly distributed in space, independently of each other (and particularly its neighbours)

  • Lack of a spatial pattern.


  • Values are non-randomly distributed in space.

  • Large values are “attracting” other large values, and vice versa.

  • A clustering effect is present.


  • Values are non-randomly distributed in space.

  • Large values are “repelling” other large values, and vice versa.

  • A dispersion effect is present.

Statistical test

Moran’s I test

  • A test of “global” trends: Looks for evidence of spatial heterogeneity across the entire study area indexed by \(i=1,\dots,M\).

  • The Moran’s I test statistic (1948) is defined as: \[ I = \frac{M}{\sum_{i}{\sum_{j}{w_{ij}}}} \frac{\sum_{i}{\sum_{j}{w_{ij}(y_i - \bar{y})(y_j - \bar{y})}}}{\sum_{i}{(y_i - \bar{y})^2}} \in [-1, 1] \] where \(w_{ij}\) is the “spatial weight” between locations \(i\) and \(j\).

  • The null hypothesis is “the data are randomly distributed in space (no spatial autocorrelation)”, and is typically compared against an alternative of “>”.

  • The Moran’s I value can be thought of as the ratio between the observed spatial variability and the expected variability.

Statistical test

Moran’s I test (cont.)

Two common approaches for \(p\)-value calculation:

\(Z\)-test

  • Under \(H_0\), \(\operatorname{E}(I) = -1/(M-1)\).
  • The variance \(\operatorname{Var}(I)\) also has a closed form expression.
  • So, the CLT is invoked and a \(p\)-values based on a \(Z\) score is calculated.
  • In practice, the normality assumption is not always met.

Permutation test

  • Under \(H_0\), the spatial pattern is random, so we can shuffle the values and recompute \(I\).

  • Compute the proportion of times \(I\) is larger than the observed value.

  • The Monte Carlo approach can be more accurate.

Neighbour definitions

Adjacency matrix

For areal data it is quite common to implement \(W=A\) where \(A = (a_{ij})\) is an adjacency matrix with the following definition: \[ a_{ij} = \begin{cases} 1 & i \sim j \text{ where } i \neq j \\ 0 & \text{otherwise.} \end{cases} \]


  A B C D E
A 0 1 1 0 0
B 1 0 1 1 1
C 1 1 0 0 1
D 0 1 0 0 1
E 0 1 1 1 0

Other useful definitions:

  • \(D = \operatorname{diag}(A1)\) is a diagonal matrix containing number of neighbours.

  • \(\tilde W = D^{-1}A\) is the scaled adjacency matrix (row sums are 1).

Example in R

library(spdep)
y <- negfc_sf$value
Wl <- poly2nb(negfc_sf) |> nb2listw(style = "B")
moran.test(y, listw = Wl, alternative = "less")

    Moran I test under randomisation

data:  y  
weights: Wl    

Moran I statistic standard deviate = -1.4695, p-value = 0.07084
alternative hypothesis: less
sample estimates:
Moran I statistic       Expectation          Variance 
      -0.11145491       -0.01234568        0.00454854 
moran.mc(y, listw = Wl, nsim = 1000, alternative = "less")

    Monte-Carlo simulation of Moran I

data:  y 
weights: Wl  
number of simulations + 1: 1001 

statistic = -0.11145, observed rank = 58, p-value = 0.05794
alternative hypothesis: less

Spatio-temporal models

Spatial areal unit models

The study area \(\mathcal S\) is partitioned into \(M\) non-overlapping areal units. To each areal unit \(i\) is associated a response variable \(y_i\) and a vector of covariates \(x_i\). Let \[ \begin{gather} y_i \sim p(y_i \mid \mu_i, \nu^2) \\ g(\mu_i) = x_i^\top\beta + \psi_i. \end{gather} \]

  • Parameters of interest are \(\beta\) (regression coefficients) and \(\nu^2\) (response variance).

  • \(g(\cdot)\) is a link function, e.g.

    • \(y_i \sim \operatorname{N}(\mu_i, \nu^2) \Leftrightarrow g(x) = x\)
    • \(y_i \sim \operatorname{Bin}(n_i, \pi_i) \Leftrightarrow g(x) = \log(\frac{x}{1-x})\)
    • \(y_i \sim \operatorname{Pois}(\mu_i) \Leftrightarrow g(x) = \log(x)\)
  • Additionally, \(\psi_i\) are structured random effects to be estimated. It partly contains spatial random effects which we label \(\phi_i\).

Conditionally autoregressive (CAR) priors

Let \(\psi_i=\phi_i\). Then, the CAR prior (Leroux, Lei, and Breslow 2000) is defined as \[ \phi_i \mid \phi_{-i},W,\rho,\tau^2 \sim \operatorname{N}\left(\frac{\rho\sum_{j} w_{ij} \phi_j}{\rho\sum_{j} w_{ij}+1-\rho}, \frac{\tau^2}{\rho\sum_{j} w_{ij}+1-\rho}\right), \] where \(\rho\in[0,1]\) is a spatial dependence parameter; \(\tau^2\) is a variance parameter; and \(W\) is a known weight matrix (e.g. \(W=A \in \{0,1\}^{M\times M}\) adjacency matrix).

Special cases:

  1. \(\rho = 0\) corresponds to independence;
  2. \(\rho = 1\) corresponds to the Intrinsic CAR;
  3. \(\rho=1\) and \(\psi_i = \phi_i + \theta_i\), where \(\theta_i\sim\operatorname{N}(0,\sigma^2)\) – this is the BYM (1991) model for binomial, Poisson and ZIP likelihoods.

CAR joint distribution

Using Brook’s lemma, the joint distribution of \(\phi\) (Besag 1974) is \[ (\phi_1,\dots,\phi_M)^\top \sim \operatorname{N}_M(0,\tau^2 Q^{-1}), \] where \[ Q = \rho[\overbrace{\operatorname{diag}(A1)}^{D}-\overbrace{A}^{DD^{-1}A}] + (1-\rho)I_M. \]

  • Clearly \(Q\) is a convex combination of a fully intrinsic spatial structure \(Q=D(I_M - \tilde W)\) and a completely independent structure \(Q=I_M\).

  • Estimating \(\rho\) from the data allows us to quantify the extent of spatial autocorrelation.

Aside

Simultaneously autoregressive models

In econometrics (LeSage and Pace 2009), the simultaneous autoregressive (SAR) model is more widely used: \[ \begin{gathered} y = \rho W y + X\beta + \epsilon \\ \epsilon \sim N_M(0, \sigma^2 I_M) \end{gathered} \]

  • A direct spatial relationship between the response variables.

  • Assume spatial interactions have a global effect (impacts are spread throughout the entire network of relationships).

  • The \(\rho\) parameter is interpeted as the “spillover” effect.

  • More appropriate to model “directional flow” by customising the \(W\) matrix.

Aside

Simultaneously autoregressive models

Adding the temporal element

Now suppose data at each spatial unit is recorded for \(t=1,\dots,T\) time periods. Extend the model: \[ \begin{gather} y_{it} \sim p(y_{it} \mid \mu_{it}, \nu^2) \\ g(\mu_{it}) = x_{it}^\top\beta + \psi_{it}. \end{gather} \]

We’ll discuss three ways to model the temporal dependence:

  1. Linear time effects
  2. ANOVA decomposition
  3. Time autoregressive process

Adding the temporal element

Linear time effects (Bernardinelli et al. 1995)

Estimate the linear time trends for each areal unit. The goal is to estimate areas exhibiting changing linear trends in \(y\) over time. \[ \begin{gather} \psi_{it} = \phi_i + (\alpha + \delta_i)t \\ \phi_i \mid \phi_{-i} \sim \operatorname{CAR}(W, \rho, \tau^2) \\ \delta_i \mid \delta_{-i} \sim \operatorname{CAR}(W, \tilde\rho, \tilde\tau^2) \end{gather} \]

  • Spatially varying intercept \(\phi_i\).
  • Spatially varying time slope \(\alpha + \delta_i\), where \(\alpha\) is a “global” time effect.
  • Assume linear temporal trend (like growth models).

Adding the temporal element

ANOVA decomposition (Knorr-Held 2000)

Decomposes the spatio-temporal variation into their constituent parts. The goal is to estimate overall time trends and spatial patterns. \[ \begin{gather} \psi_{it} = \phi_i + \delta_t + \gamma_{it} \\ \phi_i \mid \phi_{-i} \sim \operatorname{CAR}(W, \rho, \tau^2) \\ \delta_i \mid \delta_{-i} \sim \operatorname{CAR}(D, \tilde\rho, \tilde\tau^2) \\ \gamma_{it} \sim \operatorname{N}(0, \omega^2) \end{gather} \]

  • \(\phi\) are the spatial random effects, and \(\delta\) are the temporal random effects.
  • \(D\) is the “temporal adjacency” matrix: \(D_{tt'}=1\) if \(|t-t'|=1\), otherwise 0.
  • The (optional) interaction terms \(\gamma_{it}\) are not identified for normal models.

Adding the temporal element

Time autoregressive process (Rushworth, Lee, and Mitchell 2014)

Include and AR(1) or AR(2) temporal process. The goal is to estimate the evolution of spatial random effects over time. \[ \begin{gather} \psi_{it} = \phi_{it} \\ \phi^{(t)} \mid \phi^{(t-1)},\phi^{(t-2)} \sim \operatorname{N}_M\big(a_1\phi^{(t-1)} + a_2\phi^{(t-2)}, \tau^2 Q(W,\rho)^{-1}\big) \hspace{2em} t\geq 3 \\ \phi^{(1)} \sim \operatorname{N}_M\big(0, \tau^2 Q(W,\rho)^{-1}\big) \end{gather} \]

  • The \(Q\) precision matrix is the same as the CAR Leroux model.
  • Temporal autocorrelation is induced by the mean.
  • Spatial autocorrelation is induced by the variance.

Model estimation and inference

  • Model estimated fully Bayesian using Markov chain Monte Carlo (MCMC) methods. Additional priors:
    • \(\beta \sim \operatorname{N}(\mu_\beta,\Sigma_\beta)\). Default choice of mean 0 and very large independent variance.
    • \(\Gamma^{-1}(1,0.01)\) for variance parameters.
    • \(\operatorname{Unif}(-1,1)\) for the \(\rho\)s.
  • Use the {CARBayesST} package (Lee, Rushworth, and Napier 2018).
    • Gibbs sampling for closed-form full conditionals (e.g. \(\beta\), \(\phi\), \(\tau^2\)).
    • Metropolis or Metropolis-Hastings for the rest.
    • Make use of sparsity of \(W\) (and \(D\)) to speed up computation.

Application to Brunei data

The data

  • Loan information data pertaining to residential property purchases obtained from BDCB as reported from financial institutions (2015–2022).

  • \(N=3,512\) sample points spread across \(T=32\) quarters containing:

    1. price: Property price in thousands of Brunei dollars
    2. type: Property type (Land, Detached, Semi-Detached, Terrace, Apartment)
    3. built_up: Built up area in 1,000 square feet
    4. beds: No. of beds
    5. baths: No. of baths
    6. land_type: Land type (Freehold, Leasehold, Strata)
    7. land_size: Lot area in acres
  • Model: price ~ house_characteristics + st_effects + error (Gaussian errors, identity link).

Spatial distribution of prices

  • Brunei is administratively divided into four districts with \(M=39\) mukims (sub-districts).

  • Some mukims will not have any transactions due to:

    • Nature reserve areas
    • Government land
    • No residential properties
  • The \(W\) matrix is the binary mukim relationship ignoring the non-observed areas.