The importance of geographic space to minimize the error of representative samples

Authors

  • Ricardo Truffello Robledo, Pontificia Universidad Católica de Chile. Instituto de Estudios Urbanos y Territoriales
  • Monica Flores Castillo, Pontificia Universidad Católica de Chile. Deputy Director, Observatorio de Ciudades UC
  • Matías Garreton, Universidad Adolfo Ibáñez (Chile). Assistant Professor, Design Lab
  • Gonzalo Ruz, Universidad Adolfo Ibáñez (Chile). Facultad de Ingeniería y Ciencias. Center of Applied Ecology and Sustainability (CAPES), Santiago, Chile

Keywords:

Regionalization, spatial stratification, spatial sampling

Abstract

This paper discusses the importance of geographic space in building a sampling frame for surveys, questioning the traditional statistical premise that observations are random and independent. It analyzes the contribution of quantitative geography to regionalization methodologies, which can reduce the sampling error of surveys, focusing mainly on urban areas and on stratification variables that exhibit spatial autocorrelation.
Regionalization algorithms with and without heuristic optimization are tested empirically on census data. Their error levels are then estimated with a Monte Carlo procedure and compared against traditional random sampling and two-stage random sampling.
The results show a reduction in error of up to 20% relative to traditional methodologies or, alternatively, a reduction of up to 100 cases at the same level of error. The paper concludes that spatialized sampling methodologies with heuristic optimization offer advantages in urban areas where spatial autocorrelation is present.
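
The kind of Monte Carlo comparison described above can be illustrated with a minimal sketch. It is not the authors' procedure: the surface is synthetic rather than census data, the strata are fixed square blocks standing in for the output of a regionalization algorithm, and every name here (`make_surface`, `block_strata`, `monte_carlo_rmse`) is hypothetical. It only shows how repeated draws can be used to compare the error of simple random sampling with that of spatially stratified sampling when the variable is spatially autocorrelated.

```python
import numpy as np


def make_surface(side=40, seed=0):
    """Synthetic variable on a side x side grid: smooth trend plus noise.

    Assumption: the smooth trend is a stand-in for spatial autocorrelation.
    """
    rng = np.random.default_rng(seed)
    y, x = np.mgrid[0:side, 0:side]
    trend = np.sin(x / side * np.pi) + np.cos(y / side * np.pi)
    return trend + rng.normal(0.0, 0.2, (side, side))


def block_strata(side=40, blocks=4):
    """Label each cell with the id of a contiguous square block.

    Assumption: blocks replace the regions a regionalization method would build.
    """
    y, x = np.mgrid[0:side, 0:side]
    size = side // blocks
    return (y // size) * blocks + (x // size)


def srs_mean(values, n, rng):
    """Mean estimate from a simple random sample of n cells."""
    return rng.choice(values.ravel(), size=n, replace=False).mean()


def stratified_mean(values, strata, n, rng):
    """Mean estimate from an equal-allocation sample across strata,
    weighting each stratum mean by its share of the population."""
    flat, labels = values.ravel(), strata.ravel()
    ids = np.unique(labels)
    per_stratum = n // len(ids)
    estimate = 0.0
    for s in ids:
        cells = flat[labels == s]
        sample = rng.choice(cells, size=per_stratum, replace=False)
        estimate += cells.size / flat.size * sample.mean()
    return estimate


def monte_carlo_rmse(n=160, reps=2000, seed=1):
    """Return (RMSE of simple random sampling, RMSE of stratified sampling)."""
    values, strata = make_surface(), block_strata()
    rng = np.random.default_rng(seed)
    truth = values.mean()
    srs = np.array([srs_mean(values, n, rng) for _ in range(reps)])
    strat = np.array([stratified_mean(values, strata, n, rng) for _ in range(reps)])

    def rmse(estimates):
        return float(np.sqrt(np.mean((estimates - truth) ** 2)))

    return rmse(srs), rmse(strat)


if __name__ == "__main__":
    rmse_srs, rmse_strat = monte_carlo_rmse()
    print(f"simple random RMSE: {rmse_srs:.4f}")
    print(f"stratified RMSE:    {rmse_strat:.4f}")
```

On this synthetic surface the stratified estimate should show the lower error, because each block is internally more homogeneous than the grid as a whole. The size of the gain depends entirely on the assumed trend and noise level, so it says nothing about the magnitudes reported in the paper.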

Author Biographies

Ricardo Truffello Robledo, Pontificia Universidad Católica de Chile. Instituto de Estudios Urbanos y Territoriales

Adjunct Assistant Professor, Instituto de Estudios Urbanos y Territoriales; Director, Observatorio de Ciudades UC; researcher at CEDEUS

Monica Flores Castillo, Pontificia Universidad Católica de Chile. Deputy Director, Observatorio de Ciudades UC

Deputy Director, Observatorio de Ciudades UC, Pontificia Universidad Católica de Chile

Matías Garreton, Universidad Adolfo Ibáñez (Chile). Assistant Professor, Design Lab

Assistant Professor, Design Lab, Universidad Adolfo Ibáñez; Head of Research, Design Lab, UAI.

Gonzalo Ruz, Universidad Adolfo Ibáñez (Chile). Facultad de Ingeniería y Ciencias. Center of Applied Ecology and Sustainability (CAPES), Santiago, Chile

Full Professor, Facultad de Ingeniería y Ciencias, Universidad Adolfo Ibáñez, Santiago, Chile; Center of Applied Ecology and Sustainability (CAPES), Santiago, Chile

Published

2022-06-23

How to Cite

Truffello Robledo, R., Flores Castillo, M., Garreton, M., & Ruz, G. (2022). The importance of geographic space to minimize the error of representative samples. Revista De Geografía Norte Grande, (81), 137–160. Retrieved from https://ojs.uc.cl/index.php/RGNG/article/view/18249

Section

Articles