This initial, observational study introduces a few of the challenges in estimating the impact of flooding expectations on real property purchase price.
Responding to the global effects of climate change — whether by mitigation or adaptation — requires that we, as humans, collectively change how we live. In many ways, how we live is governed by what policies we, or whomever made those policies, decided. Perhaps an effective way in some societies to enact change for the common good is, sadly, for those in power — e.g., those wealthy — to internalize future effects. Thus, if someone demonstrates that climate change negatively affects those in power to change policies, change is more likely.
If wealthy property owners had credible analyses that estimated the effects of climate change, like sea level rise, on property values, through expected flooding, we’d have their attention.
That task in establishing a causal link between climate change and property value is, however, complex. Property value depends on many factors, even economies. A property cannot both flood and not flood at a moment in time. And we cannot run a controlled trial, randomly choosing to flood properties in a given area and measuring the resulting sale of properties in each group. Even if we could, each land property is unique at least in its geographical location: a neighbors’ land is not an owner’s land.
Further, even if all factors were known, there is no single true value for a property; it entirely depends upon a single-person’s willingness to purchase it at a given time. Thus, in reality we are trying to link the complexities of, say, expected future flooding (due to climate change, itself having large uncertainty) with perceived value.
As an exercise, let’s work with publicly available data on both property information, sale information, historic and estimated sea-level rise, and attempt to model an association.
There are several existing studies regarding home valuation Bao and Wan (2004), Crosby et al. (2016), and Montero, Mínguez, and Fernández-Avilés (2017). And a few studies have been attempted specifically on the association between sea-level rise and home value Bernstein, Gustafson, and Lewis (2019), Keenan, Hill, and Gumber (2018), McAlpine and Porter (2018). This study aims to build on the prior work, using a Bayesian framework, and moves forward the conversation.
Let’s define the expected proportion of property flooded as based on the highest tide in a given year. Public data is available and provides an interpolation of point estimates for expected proportion of each unique property flooded.
For this exploration of flooding impact on property value, we focus on the coastal Mid-Atlantic region that includes Maryland, District of Columbia, and Delaware:
Within this region, our data represent property sale price and corresponding characteristics of coastal properties. We’ll match properties with no estimates of expected flooding with similar properties where estimated sea level is, as interpolated, at least partially above the property elevation.
For each observed sale of a property estimated to be at least partially-flooded, these data include a corresponding observed sale of a property with no estimated flooding — i.e., all property elevation is higher than estimated sea level — but which shares or matches other characteristics of the at least partially-flooded property to be matched. The matching characteristics include census tract1, distance from coast, area of the building, area of the lot, year of observed sale, and more.
Our method of matching follows Ho et al. (2007). Below, we review the location of those matches.
Across the region, we find variation in the general distances between properties with and without expected flooding as shown by closer inspection of three areas:
The area near Easton, for example, has extensive flooding and, thus, relatively fewer properties without expected flooding nearby that can be compared to those with expected flooding.
In the matched data used for modeling, some nested regions have few — even one — observation while others have many,
The majority of blocks with a single observation had no expected flooding, which is unsurprising: a block with a flooded property would likely include a matched non-flooded property, too, since the match was based on distance from coast within a census tract. And if all properties in the block had some expected flooding, the number of observations would be higher. In that case, the match may be from a neighboring block.
Next we compare the distributions and means of measurements used as parameters for properties with (blue) and without (gray) expected flooding:
From the property-level estimates of proportion flooding of all properties — not just those sold — we estimated weighted averages of proportions of residential property flooding for blocks, block groups, census tracts, and counties for each year.
For each property sale, we’ve collected the following categorical information:
Variable | Distinct | Distribution |
Block Group | 787 | |
Block | 7,426 | |
County | 22 | |
Property ID | 35,228 | |
Instrument Date | 3,425 | |
Year of sale | 13 | |
State | 3 | |
Tract | 339 |
and numerical measurements:
Variable | Min | Distribution | Max | Distinct |
Distance from coast | 0.00 | 55,951.00 | 1,612 | |
Flooding: Block Group | 0.00 | 0.93 | 21,908 | |
Flooding: Block | 0.00 | 1.00 | 25,600 | |
Flooding: County | 0.00 | 0.57 | 3,152 | |
Flooding: Property | 0.00 | 1.00 | 15,349 | |
Flooding: State | 0.00 | 0.04 | 286 | |
Flooding: Tract | 0.00 | 0.90 | 5,168 | |
Log area building | 4.72 | 10.17 | 4,125 | |
Log area lot acres | -4.78 | 9.59 | 31,760 | |
Log distance from coast | 0.00 | 10.93 | 1,612 | |
Price per squart foot | 10.03 | 544.68 | 28,011 | |
Flooding: Adjacent Road(s) | 0.00 | 1.00 | 22,528 | |
Transfer amount | 4,450.00 | 9,500,000.00 | 6,569 | |
Longitude | -77.25 | -74.77 | 35,228 | |
Latitude | 37.96 | 40.25 | 35,228 | |
Year built | 1,700.00 | 2,018.00 | 224 |
For each observed sale, These data represent the lot area and proportion of expected flooding the year of the sale, and for each enclosing boundary, the area and proportion of residential property flooding outside or beyond that interior boundary. These boundaries include property lot, and census defined block, block group, tract, and county. In some cases, a boundary only encloses one immediate interior boundary. For those, the two boundaries are of the same size, and the outer boundary will have no area, no proportion flooding beyond the enclosed boundary.
Conditional on the above data, we have parameterized a working model as,
\[\begin{equation} \begin{split} y \sim &\textrm{normal}(\mu, \sigma) \\ \mu = &\textrm{b}_1 \cdot \exp(\sum_{k=1}^{K}{\textrm{b}_{2,k}}) \end{split} \end{equation}\]
where \(y\) represents overall price per square foot \((\frac{\$}{ft^2})\), \(\textrm{b}_1\) represents property value, and \(\exp(\sum_{k=1}^{K}{\textrm{b}_{2,k}})\) a discount of that value due to impact of of flooding, on the property, on nearby roads, and within the property’s various geographic boundaries.
Focusing on parameterization for our estimate of unflooded property value, we include parameters for physical attributes: location (latitude and longitude), distance from coast, square footage of the primary building, acrage of the lot. We also attempt to account for, as mentioned, geographic boundaries including block, block group, census tract, and county. Finally, we attempt to account for year of sale. We parameterize these as,
\[\begin{equation} \begin{split} \textrm{b}_1 = &\mathcal{f}_{\textrm{latitude}, \textrm{longitude}, \textrm{year}}(\cdot) + \\ &\mathcal{f}_{\textrm{coastal distance}}(\cdot) + \\ &\mathcal{f}_{\textrm{log area building}}(\cdot) + \\ &\mathcal{f}_{\textrm{log area lot}}(\cdot) + \\ &\mathcal{f}_{\textrm{year built}}(\cdot) + \\ &\alpha_{\textrm{year}[i]} + \\ &\alpha_{\textrm{block}[i]}\\ \end{split} \end{equation}\]
Of note, \(\alpha_{\textrm{block}[i]}\) is nested within \(\alpha_{\textrm{block group}[i]}\), which is nested within \(\alpha_{\textrm{census tract}[i]}\), which is nested within \(\alpha_{\textrm{county}[i]}\). We describe these nesting relationships below.
The first several parameters depend upon smoothing functions \(\mathcal{f}(\cdot)\), generically described as,
\[\begin{equation} \begin{split} \mathcal{f}_{\textrm{x}, \textrm{z}, \textrm{v}}(\cdot) = \sum_{i=1}^{I}\sum_{l=1}^{L}\sum_{k=1}^{K} \beta_{ilk} b_k(\textrm{v})d_l(\textrm{z})a_i(\textrm{x}) \end{split} \end{equation}\]
for the exemplary three measured variables \([x_i, z_i, v_i]\), each having its own set of smoothing functions (\(b_k(\cdot), d_l(\cdot), a_i(\cdot)\) respectively). In the case of isotropic relationships among the observations such as, perhaps, latutude and longitude, the base functions \(b(\cdot)\) and \(d(\cdot)\) can be the same. We explored a first model isotropically, without the interaction of time within the smooth — \(\mathcal{f}_{\textrm{lat}, \textrm{long}}(\cdot)\) — along with a second model including this interaction. For the second, we cannot assume isotropy for the function \(\mathcal{f}_{\textrm{lat}, \textrm{long}, \textrm{year}}(\cdot)\) as it operates over time as well as space. For the univariate smoothing functions \(f(\cdot)\), like with \(\mathcal{f}_{\textrm{coastal distance}}(\cdot)\), only one base function is present.
We partially-pool information temporally — to attempt to account for economic factors, for example — and geographically. Accounting for year of sale in this way allows us to marginalize value across years.
\[\begin{equation} \begin{split} \alpha_{\textrm{year}[i]} \sim &\textrm{normal}(\mu_{\textrm{year}[i]}, \sigma_{\textrm{year}}) \\ \mu_{\textrm{year}} \sim & \textrm{normal}(0, 1) \\ \sigma_{\textrm{year}} \sim &\textrm{t}^+(3, 0, 80) \\ \end{split} \end{equation}\]
We add robustness by modeling variation as a very-weakly informative student-t distribution. As with year-of-sale, we partially-pool information geographically, and pass information from larger regions into their nested regions, parameterized as:
\[\begin{equation} \begin{split} \alpha_{\textrm{county}[i]} \sim &\textrm{normal}(\mu_{\textrm{county}[i]}, \sigma_{\textrm{county}})\\ \mu_{\textrm{county}} \sim & \textrm{normal}(0, .6) \\ \sigma_{\textrm{county}} \sim &\textrm{t}^+(3, 0, 80) \\ \\ \alpha_{\textrm{census tract}[i]} = &\alpha_{\textrm{county}[i]} + \textrm{normal}(\mu_{\textrm{census tract}[i]}, \sigma_{\textrm{census tract}})\\ \mu_{\textrm{census tract}} \sim & \textrm{normal}(0, .6) \\ \sigma_{\textrm{census tract}} \sim &\textrm{t}^+(3, 0, 80) \\ \\ \alpha_{\textrm{block group}[i]} = &\alpha_{\textrm{census tract}[i]} + \textrm{normal}(\mu_{\textrm{block group}[i]}, \sigma_{\textrm{block group}})\\ \mu_{\textrm{block group}} \sim & \textrm{normal}(0, .6) \\ \sigma_{\textrm{block group}} \sim &\textrm{t}^+(3, 0, 80) \\ \\ \alpha_{\textrm{block}[i]} = &\alpha_{\textrm{block group}[i]} + \textrm{normal}(\mu_{\textrm{block}[i]}, \sigma_{\textrm{block}})\\ \mu_{\textrm{block}} \sim & \textrm{normal}(0, .6) \\ \sigma_{\textrm{block}} \sim &\textrm{t}^+(3, 0, 80) \\ \\ \end{split} \end{equation}\]
such that the mean (\(\mu_{\textrm{subregion}}\)) of each nested region is partially-pooled toward the mean of its enclosing region (\(\mu_{\textrm{region}}\)). More specifically, for sub-regions with few observations, we are less certain of their true averages and the \(\textrm{normal}(0, 0.6)\) priors for these sub-regions pulls their means towards their enclosing regions, while sub-regions with more information increase certainty of their averages, diminishing the effect of their enclosing regions.
The parameters of interest are those related to proportional flooding of roads, property, and surrounding geography including blocks, block groups, census tracts, and counties. We parameterized these within \(\exp(\textrm{b}_2)\) such that the sum of flooding information is multiplicative of property base value \(\textrm{b}_1\). The total effect of flooding \(\textrm{b}_2\) is comprised, as mentiond, of proportion of flooding in the property, road within one tenth of a mile of the property boundary, block, block group, census tract, and county.
\[\begin{equation} \begin{split} \textrm{b}_2 = &\chi_{\textrm{lot}} \cdot \gamma_{\textrm{lot}} +\\ &\chi_{\textrm{road}} \cdot \gamma_{\textrm{road}} +\\ &\chi_{\textrm{block}} \cdot \gamma_{\textrm{block}} +\\ &\chi_{\textrm{block group}} \cdot \gamma_{\textrm{block group}} +\\ &\chi_{\textrm{census tract}} \cdot \gamma_{\textrm{census tract}} +\\ &\chi_{\textrm{county}} \cdot \gamma_{\textrm{county}}\\ \end{split} \end{equation}\]
Each of the \(\chi\), like \(\chi_{\textrm{lot}}\), represents the perceived impact as a function of proportion flooding, and we estimate their scales with \(\gamma\). We expect the relationship of \(\chi\) to proportion of flooding will be monotonically increasing — as proportion of flooding increases, so does the perceived impact on value — but we have little additional information of its shape. Should we expect percentage discount on value to be linear with increases in proportion flooding? Or might, perhaps, buyers know less until the proportion flooding reaches some threshold? And how much more should proportion flooding impact value once the lot is substantially flooded? We can estimate the shape of these functions with the Kumaraswamy cumulative distribution function, which obtains its form through two shape parameters, \(\alpha\) and \(\beta\) Jones (2009), parameters we can estimate from the data. Absent information suggesting a particular shape, we provide informative shape priors centered at one, which results in a linear relationship.
\[\begin{equation} \begin{split} \chi_{\textrm{lot}} = &F_\textrm{Kumaraswamy}(\textrm{x}_{\textrm{lot}}, \alpha_{\textrm{lot}}, \beta_{\textrm{lot}}) \\ \chi_{\textrm{road}} = &F_\textrm{Kumaraswamy}(\textrm{x}_{\textrm{road}}, \alpha_{\textrm{road}}, \beta_{\textrm{road}}) \\ \chi_{\textrm{block}} = &F_\textrm{Kumaraswamy}(\textrm{x}_{\textrm{block}}, \alpha_{\textrm{block}}, \beta_{\textrm{block}}) \\ \chi_{\textrm{block group}} = &F_\textrm{Kumaraswamy}(\textrm{x}_{\textrm{block group}}, \alpha_{\textrm{block group}}, \beta_{\textrm{block group}}) \\ \chi_{\textrm{census tract}} = &F_\textrm{Kumaraswamy}(\textrm{x}_{\textrm{census tract}}, \alpha_{\textrm{census tract}}, \beta_{\textrm{census tract}}) \\ \chi_{\textrm{county}} = &F_\textrm{Kumaraswamy}(\textrm{x}_{\textrm{county}}, \alpha_{\textrm{county}}, \beta_{\textrm{county}}) \\ \end{split} \end{equation}\]
A value of \(1\) for shape parameters \(\alpha\) and \(\beta\) would result in a linear relationship between proportion of flooding and proportion of impact. We center these priors to hold a linear relationship unless the data provides information to the contrary. And regardless, the shape parameters for a kumaraswamy CDF must be greater than \(0\). Thus, We model the shape parameters as distributed gamma, setting \(\alpha\) to, say, \(7\) in the context of this likelihood Gelman, Simpson, and Betancourt (2017), and beta with a mean equal to \(\alpha\), but having its own distribution shared among the vectors of \(\alpha\) and \(\beta\):
\[\begin{equation} \begin{split} \\ \begin{bmatrix} \alpha_{\textrm{lot}} \\ \alpha_{\textrm{road}} \\ \alpha_{\textrm{block}} \\ \alpha_{\textrm{block group}} \\ \alpha_{\textrm{county}} \end{bmatrix} \sim &\textrm{gamma}(7, \beta) \\ \\ \beta \sim &\textrm{normal}(7, 1) \\ \end{split} \end{equation}\]
and,
\[\begin{equation} \begin{split} \\ \begin{bmatrix} \beta_{\textrm{lot}} \\ \beta_{\textrm{road}} \\ \beta_{\textrm{block}} \\ \beta_{\textrm{block group}} \\ \beta_{\textrm{county}} \end{bmatrix} \sim &\textrm{gamma}(7, \beta) \\ \\ \beta \sim &\textrm{normal}(7, 1) \\ \end{split} \end{equation}\]
We then estimate the scales of this perceived impact using \(\gamma\) parameters. As we expect the proportion of impact from flooding from various geographies — e.g., of road, block, tract, county — to be correlated, we model their scales \(\gamma\) as distributed exponential and, like with the kumaraswamy shape parameters above, include a hyper-parameter for the rate \(\lambda\) to share information among the vector of \(\gamma\):
\[\begin{equation} \begin{split} \\ \begin{bmatrix} \gamma_{\textrm{lot}} \\ \gamma_{\textrm{road}} \\ \gamma_{\textrm{block}} \\ \gamma_{\textrm{block group}} \\ \gamma_{\textrm{county}} \end{bmatrix} \sim &\textrm{exponential}( \frac{1}{\lambda}) \\ \\ \lambda \sim &\textrm{gamma}(7, 7) \\ \end{split} \end{equation}\]
We coded an exploratory model, conditioned on a subset of the matches, stratified by census tract and whether any expected flooding was present on the property, of observed sales. From each strata, we included the minimum of the total observations or, if more, 25 random samples.
Next, we show the expected marginal contribution from the smooth of the spatial components for latitude and longitude to base price per square foot.
The expected marginal contribution of longitude and latitude to a property’s base price per square foot follow,
The marginal contributions of the smooth over both latitude and longitude to base price per square foot, overlain onto the region.
Remaining smooth parameters,
The model then estimates a discount on the base value of the property for flooding either on the property or in the surrounding areas.
Let’s review 1,000 draws from the posterior of the linear predictors in which we are interested:
Variable | Type | Min | Distribution | Max | Distinct |
b_b2_alpha[1] | numeric | 0.2 | 4.5 | 1,000 | |
b_b2_alpha[2] | numeric | 0.2 | 3.7 | 1,000 | |
b_b2_alpha[3] | numeric | 0.3 | 3.7 | 1,000 | |
b_b2_alpha[4] | numeric | 0.3 | 3.3 | 1,000 | |
b_b2_alpha[5] | numeric | 0.2 | 3.9 | 1,000 | |
b_b2_alpha[6] | numeric | 0.4 | 2.9 | 1,000 | |
b_b2_beta[1] | numeric | 0.2 | 2.8 | 1,000 | |
b_b2_beta[2] | numeric | 0.1 | 2.7 | 1,000 | |
b_b2_beta[3] | numeric | 0.2 | 3.8 | 1,000 | |
b_b2_beta[4] | numeric | 0.1 | 3.3 | 1,000 | |
b_b2_beta[5] | numeric | 0.1 | 2.9 | 1,000 | |
b_b2_beta[6] | numeric | 0.2 | 3.1 | 1,000 | |
b_b2[1] | numeric | -0.1 | 0.0 | 1,000 | |
b_b2[2] | numeric | -0.1 | 0.0 | 1,000 | |
b_b2[3] | numeric | -0.1 | 0.0 | 1,000 | |
b_b2[4] | numeric | -0.6 | 0.0 | 1,000 | |
b_b2[5] | numeric | -0.3 | 0.0 | 1,000 | |
b_b2[6] | numeric | -3.3 | 0.0 | 1,000 |
The \(b\_b2[\cdot]\) parameters correspond with road, lot, block, block group, census tract, and county. We show the posterior distributions of their scale, along with the Kumaraswamy CDF transform of proportion flooding to proportion of impact using 1000 draws from its parameters below.
By multiplying the proportion of impact to the scale of impact, we get the scale of impact given proportion flooding for each geographic band surrounding a property.
Diagnostic information on the Hamiltonian Monte Carlo simulations did not flag any computational issues:
Divergences:
0 of 6000 iterations ended with a divergence.
Tree depth:
0 of 6000 iterations saturated the maximum tree depth of 11.
Energy:
E-BFMI indicated no pathological behavior.
Similarly, leave-one-out, pareto-smoothed importance sampling did not indicate observations that were highly influential in fitting parameters:
The realized, expected value lost in properties sold due to flooding is the expected value of all properties sold had no flooding been present, either on the lot or within its relevant geographic boundaries, less the price sold with the expected flooding present. Of properties sold between 2005-2018, the distribution in fractional discount of price per square foot from flooding was,
The total impact on a property value then depends on the combined effect of proportion of flooding in all relevant geographic bands surrounding that property, but the model suggests that expected flooding in the surrounding areas matter more than expected flooding on the property.
The year-over-year expected discounted value from flooding due to sea level rise in this mid-Atlantic region is the expected value of all properties in the region given the proportion of flooding across the region in a given year less the expected value of those properties given the proportion of flooding across the region in the previous year after attempting to account for factors external to property characteristics, location, and flooding: e.g., economy. We attempted to account for these factors each year with \(\alpha_{year}\).
From an initial model such as this, we can continue in many directions. We might, for example, jointly model proportion of flooding due to sea-level rise with this current model to capture and propogate uncertainty in those interpolations. Secondly, we might consider incorporating a measurement error model. Third, the incorporation of additional information about the property, sale, and context seems helpful Crosby et al. (2016). Fourth, we might model the probability of a sale and, conditioned on that sale, model its perceived property value. e.g., a selection model van Hasselt (2011), Doğan and Taşpinar (2017). Fifth, we might reparameterize model using nearest neighbor Gaussian processes. Datta et al. (2016). Finally, we might incorporate information on repeat sales. All these approaches provide information on how we might model the association between see-level rise and perceived property value.
We matched at this level of region because some smaller geographic regions, namely, blocks and in a few cases block groups, did not contain non-flooded property with which to match.↩︎