Abstract
The impact of new real estate developments is strongly associated with their target population distribution, that is, the characteristics that define a population, such as household composition, income, and socio-demographics, conditioned on characteristics of the development itself, such as dwelling typology, price, location, and floor level. This paper presents a machine learning-based method for modeling the population distribution of upcoming developments of new buildings within larger neighborhood or condominium settings. We use a real data set from Ecopark Township, a real estate development project in Hanoi, Vietnam, and study two machine learning algorithms from the deep generative modeling literature to create a population of synthetic agents: the conditional variational auto-encoder (CVAE) and the conditional generative adversarial network (CGAN). A large experimental study shows that the CVAE outperforms both the empirical distribution (a non-trivial baseline model) and the CGAN in estimating the population distribution of new real estate development projects.
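As a rough illustration of the conditional-generation idea, the following minimal sketch shows how a CVAE objective is commonly assembled and how agents can be sampled for a given set of development characteristics. It assumes a Gaussian latent variable, a single one-hot categorical agent attribute, and plain linear maps in place of neural networks; all names (`cvae_negative_elbo`, `sample_agents`, `enc_w`, `dec_w`) and dimensions are hypothetical and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    # Numerically stable softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def cvae_negative_elbo(x, c, enc_w, dec_w, latent_dim):
    """Negative ELBO for one batch (illustrative, linear encoder/decoder).

    x: (n, k) one-hot agent attribute, c: (n, m) development conditions.
    """
    # Encoder q(z | x, c): map [x, c] to a mean and a log-variance.
    h = np.concatenate([x, c], axis=1) @ enc_w
    mu, log_var = h[:, :latent_dim], h[:, latent_dim:]
    # Reparameterization: z = mu + sigma * eps with eps ~ N(0, I).
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps
    # Decoder p(x | z, c): categorical distribution over the attribute.
    probs = softmax(np.concatenate([z, c], axis=1) @ dec_w)
    reconstruction = -np.sum(x * np.log(probs + 1e-12), axis=1)
    # KL divergence between N(mu, sigma^2) and the standard normal prior.
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var, axis=1)
    return float(np.mean(reconstruction + kl))

def sample_agents(c, dec_w, latent_dim):
    """Draw one synthetic attribute category per row of conditions c."""
    z = rng.standard_normal((c.shape[0], latent_dim))
    probs = softmax(np.concatenate([z, c], axis=1) @ dec_w)
    return np.array([rng.choice(probs.shape[1], p=p) for p in probs])

# Toy usage with made-up data: 3 attribute categories, 2 condition features.
x = np.eye(3)[[0, 1, 2, 1]]
c = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
latent_dim = 2
enc_w = rng.normal(scale=0.1, size=(5, 2 * latent_dim))
dec_w = rng.normal(scale=0.1, size=(latent_dim + 2, 3))
print(cvae_negative_elbo(x, c, enc_w, dec_w, latent_dim))
print(sample_agents(c, dec_w, latent_dim))
```

In a trained model the weights would be learned by minimizing this loss; at generation time only the decoder and the conditions of the new development are needed.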
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare relevant to the content of this article.
Appendix
1.1 Partial joints in the extended application data set
See Fig. 8.
Fig. 8 Performance of the partial joints in the Extended application data set. From left to right: (1) the bivariate distribution of age and nationality, (2) the trivariate distribution of age, nationality, and prior home district, and (3) the trivariate distribution of age, prior home district, and investor. Each scatter plot compares the partial joint distribution of the agents sampled from the Extended application set with that of the real agents in the Extended application set. Both the vertical and the horizontal axis show normalized bin frequencies.
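The comparison described in this caption can be sketched as follows. The snippet computes normalized bin frequencies of a partial joint distribution for real and synthetic agents and returns one scatter point per bin. It is a minimal sketch that assumes agents are stored as dictionaries of categorical attributes; the function names and the choice of placing real frequencies first are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter

import numpy as np

def partial_joint_frequencies(agents, attributes):
    """Normalized bin frequencies of the joint over the chosen attributes."""
    counts = Counter(tuple(agent[a] for a in attributes) for agent in agents)
    total = sum(counts.values())
    return {bin_: n / total for bin_, n in counts.items()}

def compare_partial_joints(real, synthetic, attributes):
    """Return the bins plus real and synthetic frequencies, aligned by bin."""
    real_f = partial_joint_frequencies(real, attributes)
    synth_f = partial_joint_frequencies(synthetic, attributes)
    # Use the union of bins so combinations missing from one side count as 0.
    bins = sorted(set(real_f) | set(synth_f))
    x = np.array([real_f.get(b, 0.0) for b in bins])
    y = np.array([synth_f.get(b, 0.0) for b in bins])
    return bins, x, y

# Toy usage with made-up agents (values are placeholders, not paper data).
real = [
    {"age": "20-29", "nationality": "A"},
    {"age": "20-29", "nationality": "A"},
    {"age": "30-39", "nationality": "B"},
]
synthetic = [
    {"age": "20-29", "nationality": "A"},
    {"age": "30-39", "nationality": "B"},
    {"age": "30-39", "nationality": "A"},
]
bins, x, y = compare_partial_joints(real, synthetic, ["age", "nationality"])
for b, xi, yi in zip(bins, x, y):
    print(b, round(xi, 3), round(yi, 3))
```

Plotting `x` against `y` gives one point per attribute combination; the closer the points lie to the diagonal, the better the synthetic population reproduces that partial joint.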
Cite this article
Johnsen, M., Brandt, O., Garrido, S. et al. Population synthesis for urban resident modeling using deep generative models. Neural Comput & Applic 34, 4677–4692 (2022). https://doi.org/10.1007/s00521-021-06622-2
DOI: https://doi.org/10.1007/s00521-021-06622-2