This paper provides a systematic review of machine learning applications in automated property valuation. We analyze 87 studies published between 2015 and 2024, examining the evolution of methodologies from traditional hedonic pricing models to advanced deep learning approaches. Our findings indicate that ensemble methods, particularly Random Forest and Gradient Boosting, consistently outperform traditional regression models, achieving median absolute percentage errors (MdAPE) of 5-8% compared to 12-15% for conventional approaches.
We identify key factors influencing model performance, including feature engineering strategies, spatial dependencies, and market heterogeneity. The review highlights emerging trends such as the integration of computer vision for property image analysis and the use of natural language processing for extracting value-relevant information from property descriptions. We conclude with recommendations for practitioners and directions for future research.
Property valuation is a fundamental activity in real estate markets, serving critical functions in transactions, taxation, mortgage lending, and investment decision-making (Pagourtzi et al., 2003). Traditional valuation approaches rely heavily on the expertise of professional appraisers who assess property characteristics and market conditions to estimate value. However, the subjective nature of this process can lead to inconsistencies and potential biases (Crosby et al., 2018).
The emergence of big data and advances in machine learning have created opportunities to develop more objective and scalable valuation methods. Automated Valuation Models (AVMs) leverage statistical and machine learning techniques to estimate property values based on transaction data and property characteristics (Glumac & Des Rosiers, 2021). These models have gained significant traction in the industry, with major financial institutions and technology companies investing heavily in AVM development.
Despite the growing body of literature on machine learning-based property valuation, there remains a need for a comprehensive synthesis of existing research. Previous reviews have focused on specific aspects, such as particular algorithms (Abidoye & Chan, 2017) or geographic regions (Mayer et al., 2019). This paper aims to provide a holistic overview of the field, examining methodological advances, performance benchmarks, and practical implementation considerations.
The application of quantitative methods to property valuation has a long history, with the hedonic pricing model (Rosen, 1974) serving as the theoretical foundation for most approaches. Hedonic models decompose property prices into constituent attributes, allowing researchers to estimate implicit prices for individual characteristics such as location, size, and amenities.
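For reference, the hedonic decomposition described above is often written in a log-linear form. The notation here ($P_i$ for the price of property $i$, $x_{ik}$ for its $k$-th attribute, $\beta_k$ for the attribute coefficient, $\varepsilon_i$ for the error term) is our own illustrative choice and is not taken from any particular reviewed study.

```latex
\ln P_i = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik} + \varepsilon_i ,
\qquad
\frac{\partial P_i}{\partial x_{ik}} = \beta_k P_i
```

Under this specification, the implicit price of attribute $k$ is proportional to the property's price, so $\beta_k$ can be read as an approximate percentage effect of a one-unit change in $x_{ik}$.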
2.1 Traditional Statistical Approaches
Multiple regression analysis has been the dominant technique in hedonic pricing studies. Ordinary Least Squares (OLS) regression provides interpretable coefficients but assumes linear relationships and homoscedasticity, assumptions often violated in real estate data (Fik et al., 2003). Researchers have addressed these limitations through various extensions, including log-linear specifications, spatial econometric models, and quantile regression.
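To make the log-linear OLS specification concrete, the following minimal sketch fits a hedonic regression by solving the normal equations directly. The data are synthetic and noise-free, the attribute names (floor area, distance to the city centre) and the coefficient values are assumptions chosen purely for illustration, and the helper `ols_fit` is hypothetical rather than drawn from any reviewed study.

```python
import math


def ols_fit(X, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting for numerical stability
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Synthetic example (assumed values): area in square metres, distance in km
areas = [50.0, 75.0, 100.0, 120.0, 150.0, 90.0]
dists = [2.0, 5.0, 1.0, 8.0, 3.0, 6.0]

# Design matrix with an intercept column
X = [[1.0, a, d] for a, d in zip(areas, dists)]
# Log price generated from assumed "true" coefficients, without noise
log_price = [11.0 + 0.008 * a - 0.05 * d for a, d in zip(areas, dists)]

beta = ols_fit(X, log_price)

# In a log-linear model, exp(beta_k) - 1 is the proportional price change
# associated with a one-unit increase in attribute k
area_effect = math.exp(beta[1]) - 1
```

Because the dependent variable is the logarithm of price, each coefficient is interpreted as an approximate percentage effect rather than an absolute price increment.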
Spatial econometric models explicitly account for spatial dependencies in property values. The Spatial Lag Model (SLM) incorporates the influence of neighboring property prices, while the Spatial Error Model (SEM) addresses spatially correlated error terms (Anselin, 1988). These models have demonstrated improved predictive performance in many empirical applications, particularly in dense urban markets where spatial spillovers are pronounced.
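In their standard textbook forms, the two specifications can be written as follows. The notation is ours: $y$ is the vector of (log) prices, $X$ the attribute matrix, $W$ an $n \times n$ spatial weights matrix, and $\rho$ and $\lambda$ the spatial autoregressive parameters.

```latex
\text{SLM:}\quad y = \rho W y + X\beta + \varepsilon
\\[4pt]
\text{SEM:}\quad y = X\beta + u, \qquad u = \lambda W u + \varepsilon
```

In the SLM, neighboring prices enter the mean equation directly through $\rho W y$; in the SEM, spatial dependence is confined to the error term through $\lambda W u$.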
2.2 Machine Learning Methods
Machine learning methods offer several advantages over traditional regression approaches. They can automatically capture non-linear relationships and complex interactions without requiring explicit specification. Furthermore, ensemble methods combine multiple models to reduce overfitting and improve generalization performance (Breiman, 2001).
Random Forest (RF) has emerged as a particularly popular method in property valuation research. RF constructs multiple decision trees using bootstrap samples and random feature subsets, then aggregates predictions through averaging. Studies consistently report that RF outperforms linear regression models, typically reducing mean absolute error by 20-40% (Antipov & Pokryshevskaya, 2012; Hong et al., 2020).
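The three ingredients named above (bootstrap samples, random feature subsets, and averaging) can be illustrated with a deliberately small, pure-Python sketch. It is not an implementation used in any reviewed study. The data are invented, and the function names and hyperparameters (`n_trees`, `n_features`, `max_depth`) are assumptions made for illustration. Production work would normally rely on an established library.

```python
import random
from statistics import mean


def sse(values):
    """Sum of squared deviations from the mean, used as split impurity."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, targets, n_features, rng, depth=0, max_depth=4):
    """Grow a regression tree, drawing a random feature subset at each split."""
    if depth >= max_depth or len(set(targets)) == 1:
        return mean(targets)  # leaf: mean price of the samples reaching it
    best = None
    for f in rng.sample(range(len(rows[0])), n_features):
        # Skip the smallest value so both children are non-empty
        for threshold in sorted({r[f] for r in rows})[1:]:
            left = [i for i, r in enumerate(rows) if r[f] < threshold]
            right = [i for i, r in enumerate(rows) if r[f] >= threshold]
            score = sse([targets[i] for i in left]) + sse([targets[i] for i in right])
            if best is None or score < best[0]:
                best = (score, f, threshold, left, right)
    if best is None:  # no valid split among the sampled features
        return mean(targets)
    _, f, threshold, left, right = best
    children = [
        build_tree([rows[i] for i in side], [targets[i] for i in side],
                   n_features, rng, depth + 1, max_depth)
        for side in (left, right)
    ]
    return (f, threshold, children[0], children[1])


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if row[f] < threshold else right
    return node


def fit_forest(rows, targets, n_trees=25, n_features=1, seed=0):
    """Fit each tree on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], n_features, rng))
    return forest


def predict_forest(forest, row):
    """Aggregate the individual tree predictions by averaging."""
    return mean([predict_tree(tree, row) for tree in forest])


# Invented data: (floor area in m2, building age in years) -> price in thousands
rows = [(45, 30), (60, 12), (72, 25), (85, 5), (98, 40), (110, 8), (125, 18), (140, 2)]
prices = [150, 230, 240, 340, 300, 430, 450, 560]

forest = fit_forest(rows, prices, n_trees=25, n_features=1, seed=42)
estimate = predict_forest(forest, (100, 10))
```

Because every leaf is a mean of observed prices and the forest averages those leaves, the estimate always lies within the range of training prices. This illustrates why tree ensembles do not extrapolate beyond the observed price range.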
We conducted a systematic literature review following the PRISMA guidelines (Moher et al., 2009). The search strategy encompassed major academic databases including Web of Science, Scopus, and Google Scholar. We used the following search terms: ("property valuation" OR "real estate appraisal" OR "house price prediction") AND ("machine learning" OR "artificial intelligence" OR "neural network" OR "random forest" OR "deep learning").
The initial search yielded 1,247 records. After removing duplicates and screening titles and abstracts, 156 papers were selected for full-text review. We applied inclusion criteria requiring: (1) empirical application to residential or commercial property valuation, (2) use of at least one machine learning method, and (3) quantitative performance evaluation. The final sample comprised 87 studies.
Table 1 summarizes the performance of different machine learning methods across the reviewed studies. Ensemble methods demonstrated the strongest performance, with Random Forest achieving a median MdAPE of 6.2% and Gradient Boosting achieving 5.8%. Neural networks showed higher variance in performance, with MdAPE ranging from 4.5% to 12.3% depending on architecture and training approach.
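Since MdAPE is the headline metric in this comparison, a short sketch of its calculation may be useful. The sale prices and model estimates below are invented for illustration, and the function name `mdape` is our own.

```python
from statistics import median


def mdape(actual, predicted):
    """Median absolute percentage error, expressed in percent."""
    return 100 * median(abs(p - a) / a for a, p in zip(actual, predicted))


# Invented sale prices and corresponding model estimates
actual = [200_000, 350_000, 500_000, 275_000, 420_000]
predicted = [210_000, 332_500, 530_000, 280_500, 399_000]

error_pct = mdape(actual, predicted)
```

Using the median rather than the mean makes the metric less sensitive to a small number of badly mispriced properties, which is one reason it is widely reported for AVMs.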
A key finding is the importance of feature engineering in model performance. Studies that incorporated spatial features, such as distance to amenities and neighborhood characteristics, consistently achieved better results than those relying solely on property-level attributes. Geographic information systems (GIS) integration has become increasingly common, enabling the calculation of precise spatial metrics.
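A typical spatial feature of this kind is the distance from each property to its nearest amenity. The sketch below computes it with the haversine great-circle formula. The coordinates are hypothetical, the function names are our own, and a real pipeline would more likely use a GIS toolkit with network distances or projected coordinates.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distance_to_nearest(prop, amenities):
    """Distance from one property to the closest amenity, in km."""
    return min(haversine_km(prop[0], prop[1], lat, lon) for lat, lon in amenities)


# Hypothetical coordinates (latitude, longitude)
property_location = (25.0330, 121.5654)
metro_stations = [(25.0478, 121.5170), (25.0340, 121.5645), (25.0629, 121.5519)]

nearest_station_km = distance_to_nearest(property_location, metro_stations)
```

The resulting value can be appended to the property-level attributes as an additional model input.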
Recent studies have explored the integration of unstructured data sources. Computer vision techniques have been applied to property photographs to extract features related to property condition, architectural style, and view quality (Law et al., 2019). Natural language processing has been used to analyze property descriptions, capturing qualitative aspects not reflected in structured attributes (Sirmans et al., 2005).
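As a very simple illustration of deriving features from listing text, the sketch below turns a property description into binary keyword indicators. The keyword list is hypothetical and the approach is far cruder than the NLP methods used in the reviewed literature. It is intended only to show how unstructured text can be converted into structured model inputs.

```python
import re

# Hypothetical value-relevant keywords, chosen for illustration only
KEYWORDS = ("renovated", "garden", "view", "garage")


def keyword_features(description):
    """Map a free-text description to binary keyword indicator features."""
    tokens = set(re.findall(r"[a-z]+", description.lower()))
    return {f"has_{k}": int(k in tokens) for k in KEYWORDS}


features = keyword_features("Newly renovated flat with sea view.")
```

Indicators of this kind can be concatenated with structured attributes before model fitting.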
This systematic review has examined the application of machine learning methods to automated property valuation. Our analysis of 87 studies reveals substantial advances in prediction accuracy, with ensemble methods consistently outperforming traditional statistical models and deep learning approaches achieving competitive but more variable results. However, challenges remain in terms of model interpretability, data quality, and transferability across markets.
For practitioners, we recommend starting with well-established methods such as Random Forest or Gradient Boosting before exploring more complex architectures. Careful attention to feature engineering, particularly spatial features, is essential for achieving optimal performance. For researchers, promising directions include the development of interpretable machine learning methods and the integration of multimodal data sources.