# 1 Introduction

Property price modelling involves analysing and understanding the various factors that influence property values, be it intrinsic features (such as property characteristics) or extrinsic features (such as neighbourhood characteristics). It is crucial for guiding economic policy, informing investment decisions, and shaping urban development strategies. By serving as a key economic indicator, modelling enables policymakers to gauge market health and economic trends. For investors, it aids in optimizing investment strategies and managing risks, while urban planners rely on it to ensure balanced growth and to address housing affordability.

Brunei Darussalam, a small but affluent Southeast Asian country with a population of around 450,000, presents a unique case study in this context. Characterised by its distinct property trends–stemming from sociocultural norms (Hassan, 2023) and geographical peculiarities–Brunei’s residential market demands a nuanced analytical approach that can capture these unique spatial and temporal dynamics. This paper aims to fill this gap by quantifying the spatial and temporal effects on property prices, in addition to intrinsic property characteristics, thus contributing valuable insights into the economy and the real estate market’s role within it.

The most common technique employed for the current purpose is the linear regression model. Let \(Y\) be the price of a property, and let \(\mathbf X = \{X_1,\dots,X_p\}\) represent its characteristics such as size, property type, number of bedrooms, bathrooms, and so on. This model assumes a straightforward linear relationship between pairs of observations \(\{(Y_n,\mathbf X_n) \}_{n=1}^N\): \[
Y_n = \beta_{0} + \beta_{1} X_{1n} + \beta_{2} X_{2n} + \ldots + \beta_{p} X_{pn} + \epsilon_n.
\tag{1}\] Here, \(\epsilon\) is a term quantifying the errors or inadequacies of the model, and the least squares estimates of the coefficients \(\beta_0,\dots,\beta_p\) aim to minimise the sum of these squared errors. This simple model allows us to quantify the effect of each covariate on the price, and to even make predictions about the price of any property (hypothetical or not) given its characteristics. Breaking down the price or value of a commodity into its constituent parts is known as *hedonic regression* in the econometrics literature.

A key assumption of this linear model (Equation 1) is the independence of the errors. Plainly speaking, it is assumed that the prices of the properties themselves, given the information \(\mathbf X\), are independent of each other. In practice, however, substantial evidence supports the existence of spatial correlations within the housing market (Goodman & Thibodeau, 2003), since property prices within a specific area tend to be interconnected due to their proximity. Additionally, these prices are also likely to demonstrate correlations over time (Yao & Fotheringham, 2016). Failure to account for these spatio-temporal effects causes bias in the estimates of the coefficients \(\{\beta_1,\dots,\beta_p\}\) and underestimation of the standard errors (Troutman, 1983), each leading to potentially invalid inferences.

Central to spatial data analysis is the utilisation of Geographical Information System (GIS) data structure together with the study variables. GIS data encompasses, among others, three main types relevant for statistical analysis: point, line, and polygon (or areal) data (Bivand et al., 2008). This study focuses on areal data, specifically analysing property price data aggregated at the sub-district (*Mukim*) level.

This present study proposes enhancing the linear model to effectively capture the spatial and temporal dependencies observed in property price data. While numerous models can perform this task, the focus here will be on exploring a specific category known as Conditionally Autoregressive (CAR) models (Besag, 1974). These models provide a spatially-dependent mean structure adjustment to the linear model (Equation 1), and are particularly useful for geocoded data points at the areal level. This category of models offers various methods to account for temporal dependencies, and the exploration will cover three specific approaches: modelling time linearly, decomposing time using ANOVA, and structuring time autoregressively. These models are particularly suited to be analysed under a Bayesian framework as they can be formulated as hierarchical models, an approach that will be adopted for the analysis in this study.

This paper unfolds as follows. Section 2 reviews relevant literature, highlighting previous studies and methodologies relevant to property valuation. Section 3 introduces the study area, providing a detailed description and a preliminary analysis of the data to set the context for the ensuing investigation. In Section 4, methodologies employed in the study are outlined. Section 5 presents the results of our analysis, offering insights into the patterns, trends, and correlations identified within the property market. Finally, Section 6 concludes the paper, summarizing the main findings, discussing their implications, and suggesting directions for future research.

# 2 Literature review

The origins and history of the hedonic regression property price model can be traced back to its introduction by Rosen (1974). Hedonic price analysis primarily focuses on explaining prices based on attributes of the properties, neighbourhood characteristics, and even geographic and locational attributes. This model has since been widely used in real property valuation, with the development of Geographic Information Systems (GIS) further supporting its application (Aladwan & Ahamad, 2019). The model’s theoretical background, empirical issues, and limitations have been extensively reviewed by Chau & Chin (2003), with a focus on its applicability to housing markets. Conventional hedonic price analysis overlooks two critical spatial aspects: spatial dependence and spatial heterogeneity (Anselin & Griffith, 1988). This has spurred on a wide range of research into spatial econometrics.

The linear regression model (Equation 1) is referred to as a *global* model, in that it assumes the same relationship between the price and the covariates across all areas. One approach to account for spatial variability is to use a spatially varying coefficient model, where the coefficients are estimated using information local to a specific location \(i\) in the dataset. This is known as Geographically Weighted Regression (GWR) (Brunsdon et al., 1998), and the model (Equation 1) is extended to become \[
Y_i = \beta_{0i} + \beta_{1i} X_{1i} + \beta_{2i} X_{2i} + \ldots + \beta_{pi} X_{pi} + \epsilon_i.
\tag{2}\] Data points are weighted according to their proximity to the location \(i\), which are typically determined by an appropriate kernel function. Ultimately, the weights \(w_{in}\), \(n=1,\dots,N\) are collected into a diagonal matrix \(\mathbf W_i = \operatorname{diag}(w_{in})\), and the GWR model is estimated by solving the weighted least squares problem which has solutions \[
\hat{\boldsymbol\beta_i} = (\mathbf X^T \mathbf W_i \mathbf X)^{-1} \mathbf X^T \mathbf W_i \mathbf Y.
\] Clearly this works better when there are multiple observations related to each area of interest. However the areas \(i=1,\dots,M\) themselves need not refer to predetermined administrative boundaries, but can be defined quite arbitrarily, e.g. a point inclusion criteria using a fixed radial distance to a central location \(c_i\). It should be noted that the computational cost can be high when \(M\) is large, as the weighted least squares has to be repeated \(M\) times.

The use of GWR or GWR-similar methodologies for hedonic house price modelling has been explored by several authors for various markets: Yu (2006) and Geng et al. (2011) for Beijing, China; McCord et al. (2012) for Belfast, Ireland; and Cellmer et al. (2020) for counties in Poland, to name a few. The work of Pavlov (2000) dispenses the linearity assumption of the model, and instead uses a non-parametric approach to model the spatially varying coefficients. This is reminiscent to Generalised Additive Models (GAMs) (Wood, 2017), in which relationships between predictors and the response are modelled with smooth functions, allowing for flexible, non-linear associations without specifying a fixed parametric form.

The GWR methodology can also be extended so that it is able to forecast and interpolate local parameters over time. A key reference in this area is Huang et al. (2010), who pioneered the use of a so-called *geographically and temporally weighted regression (GTWR)* to model variation in house prices. Fotheringham et al. (2015) applied GWR to a 19-year set of house price data in London, revealing spatio-temporal variations in determinants of house prices. Soltani et al. (2021) applied this technique to identify clustering within the Metropolitan Tehran housing value data. By comparing linear regression, GWR and GTWR model fits, Wang et al. (2020) offer benchmarks for urban mass appraisal, shedding light on spatial price characteristics in dense residential zones for improved policy recommendations in Beijing’s core area. These studies collectively demonstrate that GTWR was shown to substantially improve the goodness-of-fit and reduce absolute errors more than GWR models.

When analysing spatial data, researchers might find it more suitable to aggregate individual records into larger geographic units rather than using geocoded point data. This decision is often influenced by data availability, which may only be accessible in aggregated forms due to confidentiality concerns or collection methods. Although aggregated data may result in some loss of resolution in the analysis, there are some advantages that it brings. Certainly, the underlying spatial heterogeneity remains available for analysis, but the computational demands are reduced, making analyses more manageable, especially when temporal trends are under consideration. This is particularly useful for identifying broad spatial trends relevant to policy and decision-making. It also helps in smoothing the effects of spatial outliers and providing sufficient data density in sparsely populated areas.

Spatial autocorrelation is encoded by means of a *neighbourhood matrix* \(\mathbf W \in \mathbb R^{M \times M}\), where \(M\) is the number of distinct and non-overlapping study areas. In essence, the elements \(w_{kj}\) of \(W\) represent the “closeness” between areas labelled \(k\) and \(j\) (\(k,j=1,\dots,M\)) using some weighting scheme. Arguably the simplest method, and the one used throughout this paper, is the *binary method*: Set \(w_{kj} = 1\) if areas \(k\) and \(j\) satisfy a contiguity-based neighbourhood-defining rule (i.e. areas share one or more boundary point), and \(w_{kj} = 0\) otherwise, including when \(k=j\) (areas are not their own neighbours).

Suppose one has, for analysis, non-overlapping spatial areas indexed by \(i=1,\dots,M\). In econometrics, the spatial autoregressive (SAR) model (LeSage & Pace, 2009) is a popular choice for modelling spatial dependence: \[ \begin{gathered} Y_i = \rho \sum_{j=1}^M w_{ij} Y_j + \beta_0 + \sum_{k=1}^p \beta_kX_k + \epsilon_i \\ \epsilon_i \sim N(0, \sigma^2) \end{gathered} \tag{3}\] In addition to the coefficients \(\beta_0,\dots,\beta_p\) and the error variance \(\sigma^2\), the model includes a spatial autoregressive parameter \(\rho\) that needs to be estimated. The model is said to be ‘simultaneous’ in nature because of the dependency of the dependent variable \(Y_i\) on its neighbours, which induces autocorrelation in the error term. The parameter \(\rho\) thus quantifies the degree to which spatial dependence in the dependent variable across spatial units is present.

Many other flavours of the SAR model exist. This is best explained by way of the General Nested Spatial Models (GNSM) framework (Elhorst, 2014): \[ \begin{gathered} Y_i = \rho \sum_{j=1}^M w_{ij} Y_j + \beta_0 + \sum_{k=1}^p \beta_kX_k + \sum_{j=1}^M w_{ij}\Big(\gamma_0 + \sum_{k=1}^p \gamma_kX_k \Big) + u_i \\ u_i = \lambda \sum_{j=1}^M w_{ij} u_j + \epsilon_i \\ \epsilon_i \sim N(0, \sigma^2) \end{gathered} \tag{4}\] An additional feature that may be added to (Equation 3) is the spatial lags of the covariates, as captured by the \(\gamma\) parameters in (Equation 4). This splits the overall effect of the covariates into two components: the direct effect and the indirect effect through the spatial lags. This model is known as the Spatial Durbin Model (SDM) (Anselin & Griffith, 1988). The Spatial Error Model (SEM) is another variant of the SAR model, in which the spatial dependence is captured in the error term rather than the dependent variable. Evidently, the SAR model is a special case of the GNSM model when \(\lambda=0\) and all \(\gamma\) values are set to zero.

Accounting for time-varying data points may be done by extending the GNSM family of models (Equation 4) by appropriately indexing the time element \(t=1,\dots,T\). From here, a simple fixed effects model for each time period can be estimated. A more sophisticate method however is to incorporate either time autoregressive components, time autoregressive errors, or both. Elhorst (2022) terms this the Dynamic GNSM.

These dynamic GNSM offer similar benefits to GTWR for spatio-temporal modelling of property prices. Several studies have shown to significantly improved accuracy of house price modelling in terms of error reduction and predictive power (Liu, 2013; Pace et al., 1998). The suitability of these models to analyse real life data is demonstrated by a multitude of studies, including Basu & Thibodeau (1998), Abelson et al. (2013), Helbich et al. (2014), Cohen et al. (2016), and Stamou et al. (2017). These reference studies were conducted in different countries such as Australia, the United States, Greece, Austria, which show applicability across different economic conditions.

Another popular model choice to account for spatial heterogeneity at the areal level is the Conditional Autoregressive (CAR) model (Besag, 1974). The model appends a spatially structured random effect \(\phi_i\) to the linear regression model (Equation 1), which is assumed to follow the conditionally autoregressive distribution (Leroux et al., 2000) \[ \phi_i \mid \boldsymbol \phi_{-i} \sim \operatorname{N} \left( \frac{\rho \sum_{j=1}^M w_{ij} \phi_j }{\rho \sum_{j=1}^M w_{ij} + 1 - \rho}, \frac{\tau^2}{\rho \sum_{j=1}^M w_{ij} +1 - \rho} \right). \tag{5}\] Similar to SAR models, additional free parameters \(\rho\) and \(\tau\) are introduced, which control the level of spatial dependence among the areal units and the variability of the spatial random effects \(\phi_m\), respectively. Of note, the specification \(\rho=1\) refers to the intrinsic CAR model as outlined in Besag et al. (1991), while the specification \(\rho=0\) implies no spatial dependence, and the model reduces to a vanilla random intercepts model. For convenience, we shall refer to the CAR prior (Equation 5) by \(\operatorname{CAR}(\mathbf W, \rho, \tau^2)\).

Early applications of the CAR model was found in fields such as agriculture (e.g. Besag & Higdon, 1999), ecology (e.g. Brewer & Nolan, 2007), and recently in epidemiology (e.g. Li et al., 2011; Xie et al., 2017). In the context of property price modelling however, besides textbook examples and instructional articles (e.g. Lee, 2013; Moraga, 2019) the literature is severely lacking, with an overwhelming preference for the SAR model. This presents a ground breaking opportunity to examine the applicability of CAR-based models for property price modelling, with Brunei’s case study being the first of its kind. Extensions to CAR-type models for spatio-temporal modelling will be elaborated further in Section 4.3.

# 3 Description of study area and data

In this section, we provide an overview of the study area, Brunei Darussalam, and the data used in this study. A preliminary analysis of the data is also presented to provide a better understanding of the unique distribution of property prices in Brunei.

## 3.1 Description of study area

Brunei Darussalam, a small country on Borneo Island in Southeast Asia, is bordered by the South China Sea to the north and is surrounded by the Malaysian state of Sarawak. The nation’s 5,765 square kilometres territory is divided into two non-contiguous areas: The larger western section comprising Brunei-Muara, Tutong, and Belait districts; and the smaller eastern Temburong district. The connectivity between the two sections saw a significant enhancement in 2020 with the opening of the Sultan Omar Ali Saifuddien Temburong Bridge, allowing direct road access. Before the bridge’s completion, reaching Temburong required a journey by water or a detour through Limbang, Malaysia.