Spatial distribution analysis and driving factors of traditional villages in Henan province: a comprehensive approach via geospatial techniques and statistical models

Traditional villages are repositories for preserving human artifacts and cultural heritage. An investigation of the spatial distribution characteristics and factors influencing traditional villages in provincial administrative regions can provide new insights regarding the protection of traditional villages and rural development. This study focused on 275 traditional villages in Henan Province. Using ArcGIS and GeoDa software, we analysed the spatial autocorrelation and heterogeneity of the nearest neighbour index, Gini coefficient, Moran’s I, and kernel density of the villages. Additionally, in conjunction with the Python sklearn library and GeoDetector, 15 indicators were selected to construct a decision tree model, spatial lag regression model, and geographic detector. Then the influence and interaction mechanisms of each indicator were analysed. The results revealed that (1) the spatial distribution of traditional villages in Henan Province was clustered and uneven, with a spatial layout comprising “3 high-density areas + 1 medium-density belt”; (2) overall, the number of traditional villages was negatively correlated with altitude, slope, rainfall, population density, proportion of the minority population, and historical-cultural intensity; and (3) the decision tree model results demonstrated that the selected 15 indicators had good predictive ability and that population density was particularly important. The spatial lag regression model results showed that the spatial distribution of traditional villages was positively correlated with distance from rivers, urbanization rate, and tourism resources, and negatively correlated with population density, per capita GRP, historical-cultural intensity, and NDVI. (4) The GeoDetector results indicated that historical-cultural intensity and population density were the two factors with the most significant explanatory power for the spatial differentiation of traditional villages in Henan Province. 
In terms of interactive factors, population density ∩ population was the strongest interactive driving force, followed by population ∩ historical-cultural intensity.


Introduction
Recent years have seen rapid urbanization contributing to substantial rural population loss and ageing [1], significant fragmentation of agricultural land coupled with severe environmental pollution [2], and the emergence of manifold systemic risks in rural societies [3,4]. Globally, rural areas are experiencing a decline [5], and numerous traditional rural landscapes along with ancient villages are continuously vanishing [6,7]. To foster sustainable rural development and transform rural areas into regional complexes characterized by natural, social, and economic features [8], the Chinese government has championed the implementation of projects such as the rural revitalization strategy and the beautiful villages initiative. In light of these circumstances, in September 2012, a collaboration among the Ministry of Housing and Urban-Rural Development, the Ministry of Culture, the State Administration of Cultural Heritage, and the Ministry of Finance led to the establishment of the Chinese Traditional Village Expert Committee, tasked with reviewing the "List of Chinese Traditional Villages". As of March 2023, six selection rounds have seen a total of 8,155 villages designated as national-level traditional villages.
Despite the burgeoning social attention given to the preservation of traditional villages, the disparities in economic status, cultural heritage, natural conditions, and population distribution across various regions [9] pose significant challenges to traditional village development. As per the "Guiding Opinions on Effectively Strengthening the Protection of Traditional Villages in China", the governments of provincial administrative regions bear the responsibility of fostering traditional village development within their jurisdictions and formulating development strategies and support measures. However, research on the spatial distribution and influential factors of traditional villages within these provincial administrative regions remains sparse. To address this gap, the present study examines 275 national-level traditional villages across six batches in Henan Province. It evaluates the overall distribution pattern and characteristics of these villages by leveraging Python and ArcGIS to establish a Henan Province traditional village information database from 15 selected influencing factors encompassing natural conditions, population and economy, history and culture, and ecology. Statistical analysis along with spatial tools such as GeoDa, decision tree, spatial lag regression, and GeoDetector models are employed to scrutinize the drivers of spatial differentiation among traditional villages (Fig. 1). Multiple statistical methods and models are used to identify the patterns and characteristics of traditional villages in Henan Province that are influenced by natural, cultural, historical, and ecological factors. Such comprehensive data and theoretical foundations facilitate the differentiated protection and development of traditional villages, and offer a novel perspective for sustainable rural development.
Fig. 1 The method framework for spatial distribution analysis and driving factors of traditional villages

Research background
Traditional villages denote settlements characterized by a rich history, well-conserved historical structures, traditional layouts, unique intangible cultural heritage, and inherent vitality [10]. In September 2012, the Ministry of Housing and Urban-Rural Development, in conjunction with three other departments, promulgated the "Traditional Village Evaluation and Identification Index System (Trial)" (https://www.mohurd.gov.cn/gongkai/zhengce/zhengcefilelib/201208/20120831_211267.html). This system proposed quantitative and qualitative evaluation criteria for traditional villages, focusing on aspects such as traditional architecture, site selection, layout, and the intangible cultural heritage they encapsulate. Traditional villages encapsulate the evolution of socioeconomic, political, and cultural dimensions under specific historical and geographical conditions, embodying multifaceted social, historical, and cultural values [11,12]. Research and analysis of traditional villages not only support the preservation of cultural heritage but also contribute to sustainable rural development by leveraging the villages' historical and cultural resources.
Traditional villages have garnered global interest, prompting an increasing number of scholars to investigate factors related to their preservation from various disciplinary perspectives. In the context of rural tourism, scholars have developed indicators for the sustainable development index of rural settlements, positing that sustainable tourism can foster the sustainable development of traditional villages [13]. From an architectural culture standpoint, studies based on social surveys of local inhabitants and tourists suggest that safeguarding and developing the architectural heritage of traditional villages necessitates societal consensus [14,15]. The lineage of traditional village artistic characteristics was constructed through the study of traditional village architecture [16], and the protection strategy for traditional village architectural style was proposed through the examination of architectural style transitions [17]. Moreover, research on environmental protection [18] and ecological restoration [19] has significantly enriched the understanding of traditional village preservation.
With the advent of computer technology, the use of Geographic Information Systems (GIS) for spatial analysis in rural areas has grown to encompass a broad range of issues such as agricultural land protection [20], rural traffic equity [21], and rural spatial accessibility [22]. Many scholars have begun integrating various research methods and models with GIS to solve more complex problems. For example, Dina Statuto utilized GIS to merge geographic information from historical maps with modern digital cartography and remote sensing imagery, enabling the detection of changes in typical rural landscapes in the Basilicata region spanning 188 years [23]. Achamyeleh Birhanu Teshale combined geographically weighted regression and multilevel analysis to investigate spatial variations and factors linked with skilled birth attendant delivery in Ethiopia [24]. Ayodeji E. Iyanda employed nonspatial negative binomial Poisson regression and geographically weighted Poisson regression to evaluate the disparity in the COVID-19 case fatality rate between urban and rural areas and determined that the COVID-19 case fatality rate in rural counties surpassed the national average [25].
Research into the spatial aspects of traditional Chinese villages is conducted at varying scales. At a micro scale, the focus lies mainly on the natural landscape [26], spatial evolution [27], and landscape genes [28] of individual villages or small clusters of villages. Scholars support the preservation of traditional villages by developing spatial quantitative indicators [29] and landscape networks [30]. At a macro scale, the study of the spatial distribution of traditional villages is multiscaled. Wanxu Chen discovered that traditional villages in the Yangtze River Basin are predominantly located in areas with favourable natural conditions and unfavourable socioeconomic circumstances [31]. HeDan Ma combined ArcGIS and GeoDa and identified six spatial clusters of traditional villages in Southwest China, with spatial differentiation being mainly influenced by natural factors [32]. At the cultural district level, Guanhong Xie suggested that Hakka traditional villages are distributed in noteworthy clusters, with site selection culture as the internal driving force [33]. Chao Wu indicated that the spatial distribution of traditional villages exhibits distinct comprehensive and regional characteristics and is highly sensitive to changes in spatial scales [9]. Thus, the study of the spatial distribution of traditional villages at various scales is both necessary and meaningful. Furthermore, the driving factors influencing the distribution of traditional villages also differ according to the specific research areas. Natural environmental factors such as altitude, slope, temperature, and precipitation have the most significant impact on the distribution of traditional villages in the Qinghai-Tibet Plateau [34]. Among the factors affecting the spatial distribution of traditional villages in the Wuling Mountain Area in Hunan Province, cultivated land has the largest impact scale, and per capita GDP has the greatest impact intensity [35]. Social factors and regional culture are decisive and key factors affecting the distribution of traditional villages in Awa Mountain [36].

Study area
Henan Province (31-37° N, 110-117° E) (Fig. 2) is situated in central China and covers a total area of 167,000 km². The population is approximately 99 million, ranking third among the provinces and municipalities directly under the central government (http://www.stats.gov.cn/sj/pcsj/rkpc/d7c/) and surpassed only by Guangdong and Shandong Provinces. The province has a warm temperate monsoon climate, with an average annual temperature of approximately 15.8 °C. The average annual precipitation is 600-800 mm and is primarily concentrated during the summer. The province has 18 municipal-level administrative divisions, including 17 prefecture-level cities and 1 provincial municipality, as well as a total of 157 county-level administrative regions. As China's second-largest grain producer, Henan has played a crucial role in safeguarding China's food security [37]. However, factors such as overall economic underdevelopment, ecological vulnerability, and economic disparity between regions [38] have precipitated severe issues in Henan's rural areas, including agricultural population loss, rural hollowing, and abandoned farmland [39].

Data source and processing
As of November 2022, according to the Traditional Chinese Villages website (http://www.chuantongcunluo.com/), 275 traditional villages in Henan had been selected for inclusion on the list of traditional Chinese villages, serving as research targets. The geographical coordinates of each village were determined by searching for the names of traditional villages on AMap (https://www.amap.com), and a spatial database of traditional villages in Henan Province was constructed. Data on the administrative divisions in Henan Province were sourced from the Henan Provincial Geographic Information Public Service Platform. A digital elevation model (DEM) of Henan Province was derived from the GDEMV3 30-metre data available from the Geospatial Data Cloud (https://www.gscloud.cn/). Annual average temperature and daily precipitation data for county-level administrative regions in Henan Province were obtained from the National Earth System Science Data Center (http://www.geodata.cn/). Transportation infrastructure and hydrological system data were retrieved from OpenStreetMap (https://www.openstreetmap.org/). The normalized difference vegetation index (NDVI) for the county-level administrative regions in Henan Province was sourced from the MOD13A3 dataset of NASA Earthdata Search (https://www.earthdata.nasa.gov/), and was normalized year by year, according to the county-level administrative regions in 2022. The population, proportion of ethnic minorities, and proportion of urban populations in the county-level administrative regions of Henan Province were obtained from the seventh national census. The gross regional product (GRP) of the county-level administrative regions was extracted from the "China County Statistical Yearbook, 2021."
Data on A-level scenic spots were obtained from the Culture and Tourism Department of Henan Province (https://hct.henan.gov.cn/). National intangible cultural heritage data for Henan Province were sourced from the China Intangible Cultural Heritage Data Museum (https://www.ihchina.cn/). The data on key national cultural relic protection units in Henan Province were taken from the National Cultural Heritage Administration (http://www.ncha.gov.cn/).
Fig. 2 Location of the study area
According to the list of 8,155 traditional villages published by the Ministry of Housing and Urban-Rural Development of China, there are 275 traditional villages in Henan Province, accounting for 3.37% of the total. As shown in Table 1, traditional villages are distributed across all 18 city-level administrative regions of Henan Province; however, their distribution is uneven.

Nearest neighbour index
The nearest neighbour index (NNI) serves as a measure of the spatial distribution patterns of geographic features. By comparing the observed mean nearest neighbour distance with the nearest neighbour distance expected under a completely random distribution pattern, the spatial distribution pattern of geographic elements can be determined as clustered, uniform, or random [40]. The formula used is as follows:

$$R = \frac{\frac{1}{N}\sum_{i=1}^{N} d_i}{r_E}, \qquad r_E = \frac{1}{2\sqrt{D}} \qquad (1)$$

where $N$ is the total number of points, $d_i$ denotes the distance from each point $i$ to its nearest neighbour, $r_E$ is the theoretical nearest neighbour distance, and $D$ signifies the point density, which is calculated by dividing the total number of points by the area. The value of $R$ determines the distribution type of the point elements; if $R = 1$, the distribution is random; if $R > 1$, the distribution is uniform; and if $R < 1$, the distribution is clustered.
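As an illustration of the index described above, the following minimal sketch computes the observed mean nearest neighbour distance, the expected distance under randomness, and their ratio. It is not the ArcGIS implementation: the function name, the brute-force distance search, and the sample coordinates are hypothetical, and the study area is passed in explicitly as an assumption.

```python
import math

def nearest_neighbour_index(points, area):
    """Return (observed mean distance, expected mean distance, R).

    points: list of (x, y) tuples in a projected coordinate system.
    area:   size of the study area in the squared unit of the coordinates.
    """
    n = len(points)
    nearest = []
    for i, (xi, yi) in enumerate(points):
        # Brute-force search for the closest other point.
        nearest.append(min(
            math.hypot(xi - xj, yi - yj)
            for j, (xj, yj) in enumerate(points) if j != i
        ))
    observed = sum(nearest) / n
    density = n / area                      # points per unit area
    expected = 1.0 / (2.0 * math.sqrt(density))
    return observed, expected, observed / expected

# Hypothetical point sets inside a 4 x 4 study area.
regular = [(x + 0.5, y + 0.5) for x in range(4) for y in range(4)]
clustered = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
_, _, r_regular = nearest_neighbour_index(regular, area=16.0)
_, _, r_clustered = nearest_neighbour_index(clustered, area=16.0)
```

In this sketch the regular grid yields a ratio above 1 and the tight group a ratio below 1, matching the interpretation rule for uniform and clustered patterns.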

Gini coefficient
The Gini coefficient serves as a metric for assessing inequality or disparity. Frequently employed in geography, it facilitates the examination of inequalities in urban development, regional discrepancies, resource allocations, and other related issues [41]. The Gini coefficient ranges from 0 to 1, with 0 denoting perfect equality and 1 signifying complete inequality. A higher Gini coefficient indicates greater inequality. In the formula (Formula 2), $n$ represents the number of categories, and $p_i$ represents the proportion of the $i$-th category.

Global Moran's I
Global Moran's I is a comprehensive statistical index employed to gauge the spatial autocorrelation of spatial data, thereby determining the presence of significant spatial patterns within a dataset [42]. The formula is as follows:

$$I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \qquad (3)$$

where $n$ represents the number of observational units, $x_i$ denotes the value of the $i$-th observational unit, and $\bar{x}$ denotes the average of all observational unit values. $w_{ij}$ is an element of the spatial weight matrix that illustrates the extent of spatial connectivity between the $i$-th and $j$-th locations. A positive value of the global Moran's index indicates positive spatial autocorrelation, suggesting that similar values are spatially proximate, whereas a negative value implies negative spatial autocorrelation, in which dissimilar values tend to be adjacent. If the global Moran's index approaches zero, the spatial distribution exhibits a random pattern, and no significant spatial pattern is present.
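The global index can be sketched in a few lines of plain Python. The example below is illustrative only: the function name, the four-cell chain of neighbours, and the sample values are hypothetical, and the permutation-based significance test performed by GeoDa is not reproduced.

```python
def global_morans_i(values, weights):
    """Global Moran's I for a list of values and a square weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_sum = sum(sum(row) for row in weights)
    # Cross-products of deviations, weighted by spatial connectivity.
    numerator = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n) for j in range(n)
    )
    denominator = sum(d * d for d in dev)
    return (n / w_sum) * (numerator / denominator)

# Hypothetical example: four cells in a row, each adjacent to its neighbours.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
i_similar = global_morans_i([1, 1, 5, 5], chain)      # like values adjacent
i_alternating = global_morans_i([1, 5, 1, 5], chain)  # unlike values adjacent
```

With like values side by side the index is positive, and with alternating values it is negative, which mirrors the interpretation given above.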

Local Moran's I
The local Moran's I quantifies the local spatial autocorrelation between each cell and its adjacent cells within the spatial data [42]. This metric aids in identifying local spatial clusters and outliers:

$$I_i = \frac{y_i - \bar{y}}{\sigma^2} \sum_{j} w_{ij}\,(y_j - \bar{y}) \qquad (4)$$

where $I_i$ denotes the local Moran index of cell $i$, $y_i$ represents the variable value of cell $i$, $\bar{y}$ signifies the mean of the variable values, $\sigma^2$ is the variance of the variable values for all cells, and $w_{ij}$ is the spatial weight between cell $i$ and neighbouring cell $j$.
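A minimal sketch of the local statistic and the usual four cluster labels follows. It assumes the mean and variance are taken over all cells, uses a hypothetical chain of four cells, and omits the significance testing that GeoDa applies before mapping clusters; the function name is illustrative.

```python
def local_morans_i(values, weights):
    """Return a list of (local I, cluster label) tuples, one per cell."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    variance = sum(d * d for d in dev) / n   # population variance
    results = []
    for i in range(n):
        # Weighted sum of the neighbours' deviations (the spatial lag).
        lag = sum(weights[i][j] * dev[j] for j in range(n))
        local_i = dev[i] / variance * lag
        # Ties at exactly zero are assigned arbitrarily in this sketch.
        if dev[i] >= 0:
            label = "high-high" if lag >= 0 else "high-low"
        else:
            label = "low-low" if lag < 0 else "low-high"
        results.append((local_i, label))
    return results

# Hypothetical example: four cells in a row with binary adjacency.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
local = local_morans_i([1, 1, 5, 5], chain)
```

Here the first cell is a low value next to a low value and the last cell a high value next to a high value, so they receive the low-low and high-high labels respectively.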

Kernel density estimation
Kernel density estimation (KDE) constitutes a nonparametric approach for estimating probability density functions [43]. Frequently employed to portray the spatial distribution density of geographical occurrences, KDE can smooth the influence of individual events within the dataset, yielding density estimates for each location. Consequently, this method enables the identification of spatial clustering trends among events. The formula is as follows:

$$f(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right) \qquad (5)$$

where $f(x)$ represents the density estimate at location $x$, and $n$ denotes the total number of events. $K$ serves as a kernel function that determines the contribution of each event to the estimated density at position $x$. $x_i$ corresponds to the location of each event and $h$ controls the width of the kernel function.
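The estimator can be illustrated with a one-dimensional sketch. The Gaussian kernel, the function names, and the event locations below are assumptions chosen for brevity; the ArcGIS tool works on two-dimensional surfaces with its own kernel and is not reproduced here.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel function K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, events, h):
    """Kernel density estimate at location x with bandwidth h."""
    n = len(events)
    return sum(gaussian_kernel((x - xi) / h) for xi in events) / (n * h)

# Hypothetical event locations: three close together and one far away.
events = [0.0, 0.5, 1.0, 10.0]
dense = kde(0.5, events, h=1.0)    # inside the group of three
sparse = kde(5.0, events, h=1.0)   # between the group and the outlier

# Riemann sum over a wide range; a density should integrate to about 1.
step = 0.01
total = sum(kde(-10.0 + k * step, events, h=1.0) for k in range(3000)) * step
```

A larger bandwidth spreads each event's contribution more widely and produces a smoother surface, which is the role the 50 km search radius plays in the analysis.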

Decision tree model
A decision tree model is a supervised learning algorithm that is frequently employed to address classification and regression problems [44]. It primarily makes predictions by using a sequence of judgement rules (IF-THEN logic). The classification and regression tree (CART) algorithm is commonly applied to regression problems. For CART, the splitting nodes are determined by minimizing the mean square error (MSE). The formula for the MSE is as follows:

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \bar{y})^2 \qquad (6)$$

where $N$ represents the number of samples, $y_i$ denotes the observed value for each sample, and $\bar{y}$ denotes the average value of all sample observations. A smaller MSE value indicates a narrower gap between the model's predicted results and actual outcomes, thus reflecting improved model performance.
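To show how an MSE-minimizing split is chosen, the sketch below searches one feature for the threshold that gives the lowest size-weighted MSE of the two child nodes. It is a simplified, single-split illustration in plain Python rather than the sklearn model used in the study; the function names and the sample data are hypothetical.

```python
def mse(ys):
    """Mean squared deviation of the values from their own mean."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)

def best_split(xs, ys):
    """Return (threshold, weighted child MSE) for one feature.

    The threshold is None if no split improves on the parent node.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best_threshold, best_score = None, mse(ys)
    for k in range(1, n):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # cannot split between identical feature values
        left = [y for _, y in pairs[:k]]
        right = [y for _, y in pairs[k:]]
        score = (len(left) * mse(left) + len(right) * mse(right)) / n
        if score < best_score:
            best_threshold = (pairs[k - 1][0] + pairs[k][0]) / 2.0
            best_score = score
    return best_threshold, best_score

# Hypothetical data: population density vs. number of villages per county.
density = [100, 200, 300, 800, 900, 1000]
villages = [5, 6, 5, 1, 0, 1]
threshold, score = best_split(density, villages)
```

A full regression tree repeats this search over every feature and recursively within each child node until a stopping rule is met.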

Spatial lag regression model
The spatial lag regression model (SLRM) is a notable spatial econometric model employed to manage data that exhibit spatial dependence [45]. The SLRM primarily accounts for spatial effects by incorporating a spatial lag variable that represents the average value of the adjacent area. The formula is as follows:

$$Y = \rho W Y + X\beta + \varepsilon \qquad (7)$$

In this context, $Y$ represents an $n \times 1$ dependent variable vector, $\rho$ is a spatial lag parameter, and $W$ is an $n \times n$ spatial weight matrix that describes the relationship between spatial units. The spatial weight matrix is predefined and is often based on spatial distance or administrative divisions. $WY$ is a spatially lagged dependent variable that signifies the average value of the adjacent area. $X$ is an $n \times k$ matrix denoting the explanatory variables. $\beta$ corresponds to a $k \times 1$ parameter vector, and $\varepsilon$ is an $n \times 1$ error term.
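The spatially lagged term can be made concrete with a small sketch. It assumes a row-standardized weight matrix, so that each lagged value is the mean of the neighbouring values, and it evaluates the systematic part of the model for given parameters. Estimating the spatial lag parameter and the coefficients (typically by maximum likelihood in spatial software) is not shown, and all names and numbers are hypothetical.

```python
def row_standardise(weights):
    """Scale each row of the weight matrix so that it sums to 1."""
    out = []
    for row in weights:
        s = sum(row)
        out.append([w / s if s else 0.0 for w in row])
    return out

def spatial_lag(weights, y):
    """Compute WY: the weighted average of neighbouring y values."""
    return [sum(w * yj for w, yj in zip(row, y)) for row in weights]

def systematic_part(rho, wy, x, beta):
    """Evaluate rho * WY + X * beta for each unit (error term omitted)."""
    return [
        rho * lag + sum(b * v for b, v in zip(beta, row))
        for lag, row in zip(wy, x)
    ]

# Hypothetical example: four units in a row with binary adjacency.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
y = [2.0, 4.0, 6.0, 8.0]
wy = spatial_lag(row_standardise(chain), y)
x = [[1.0, 2.0], [1.0, 4.0], [1.0, 6.0], [1.0, 8.0]]  # intercept + one factor
fitted = systematic_part(0.5, wy, x, beta=[1.0, 0.5])
```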

Geodetector
GeoDetector is a statistical technique designed to identify the influence of explanatory variables (also known as factors or predictor variables) on explained variables (also known as outcome or response variables) [46]. As a data-mining approach grounded in spatial statistical analysis, GeoDetector is primarily employed to examine the impacts of environmental or geographical factors on a specific phenomenon. The formula is as follows:

$$q = 1 - \frac{\sum_{h=1}^{H} n_h\,\sigma_{Y,h}^2}{N\,\sigma_Y^2} \qquad (8)$$

where $H$ represents the number of values or intervals of the explanatory variable (if it is continuous). $n_h$ denotes the number of observations of the explanatory variable within the $h$-th value or interval. $\sigma_{Y,h}^2$ is the variance of the explained variable $Y$ within the $h$-th value or interval of the explanatory variable. $N$ signifies the total number of observations. $\sigma_Y^2$ refers to the population variance of explained variable $Y$. The $q$ values range from 0 to 1. A $q$-value closer to 1 indicates a stronger influence of the explanatory variable on the explained variable.
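The factor detector statistic can be sketched as one minus the ratio of within-stratum variance to total variance. The example below is a minimal illustration with hypothetical data and names; it uses population variances and does not reproduce the interaction detector or the significance tests of the GeoDetector software.

```python
def q_statistic(y, strata):
    """Share of the variance of y explained by a stratified factor."""
    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)

    # Group the explained variable by stratum of the explanatory variable.
    groups = {}
    for value, stratum in zip(y, strata):
        groups.setdefault(stratum, []).append(value)

    within = sum(len(g) * variance(g) for g in groups.values())
    return 1.0 - within / (len(y) * variance(y))

# Hypothetical example: village counts in six counties.
counts = [1, 1, 1, 9, 9, 9]
q_strong = q_statistic(counts, ["low", "low", "low", "high", "high", "high"])
q_weak = q_statistic(counts, ["a", "b", "a", "b", "a", "b"])
```

When the strata separate the values perfectly the statistic equals 1, and when the strata cut across the values it falls towards 0.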

Clustering distribution type of 275 traditional villages
The spatial structural characteristics of traditional villages were represented by point elements on a map. There are three main types of spatial distribution for point elements: clustered, random, and uniform. The NNI method in ArcGIS 10.8 (Formula 1) was used, and the NNI of traditional villages in Henan Province was calculated as R = 0.447 < 1. The observed mean distance was 11.31 km, the expected mean distance was 25.29 km, and the Z score was −17.54, the absolute value of which is large. This indicates a considerable deviation from a random distribution. The results demonstrated that the spatial distribution of traditional villages in Henan Province was clustered.

Uneven distribution of traditional villages
City-level administrative divisions were taken as units, and the distribution and quantity information of traditional villages in each prefecture-level administrative division of Henan Province was utilized (Table 1). Then, the Gini coefficient of spatial distribution (Formula 2), G = 0.43, was calculated. The Lorenz curve (Fig. 3) deviated significantly from the uniform distribution line and exhibited a large curvature, indicating an uneven distribution of the villages among the cities. Pingdingshan, Xinyang, Luoyang, Anyang, and Sanmenxia each have more than 30 traditional villages, with a total of 168 villages, which accounts for 61% of the total. In contrast, 8 cities have fewer than 10 traditional villages, indicating a substantial gap.
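The link between a Lorenz curve and a Gini coefficient can be illustrated as follows. This sketch uses the standard Lorenz-curve definition computed from sorted counts, which is an assumption and may differ from the formula applied in this study; the function name and the example counts are hypothetical and are not the values in Table 1.

```python
def lorenz_gini(counts):
    """Gini coefficient of a list of non-negative counts per spatial unit."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one unit and a positive total")
    # Rank-weighted sum over the ascending counts.
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n

# Hypothetical distributions of 40 villages over four cities.
g_even = lorenz_gini([10, 10, 10, 10])   # perfectly even
g_single = lorenz_gini([0, 0, 0, 40])    # all villages in one city
```

Under this definition a perfectly even distribution gives 0, and concentrating everything in one of n units gives (n - 1)/n.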
County-level administrative divisions were used as units; 63 of the 157 county-level administrative divisions in Henan Province have traditional villages. A spatial weight matrix was created using the rook contiguity method in the GeoDa software. The global Moran's I (Formula 3) was 0.27, with a p value of 0.001 and a Z value of 5.85 after 999 permutations, indicating positive spatial autocorrelation in the overall spatial distribution of traditional villages in Henan Province.
Further analysis of the local Moran's I (Formula 4) for traditional villages in Henan Province (Fig. 4) showed that high-high clusters are present in Anyang, Hebi, and Xinxiang in the north; Pingdingshan, Luoyang, and Sanmenxia in the centre and west; and Xinyang in the south. These administrative regions have a relatively high density of traditional villages and strong spatial connectivity. Low-low clusters were found in Xinyang, Zhengzhou, Kaifeng, Zhoukou, and Luohe, where the density and spatial connectivity of traditional villages are poor. Juyang in Shangqiu is a high-low cluster, indicating that the density of traditional villages in this area is high but spatial connectivity is poor. Hubin in Sanmenxia is a low-high cluster, indicating that the density of traditional villages in this area is low but spatial connectivity is high. In conclusion, traditional villages in Henan Province exhibit a certain degree of clustering and an imbalanced spatial distribution.

Spatial layout of traditional villages: "3 high-density areas + 1 medium-density belt"
The kernel density (Formula 5) of traditional villages in Henan Province was estimated using the KDE tool in ArcGIS 10.8 "Spatial Analysis", with a bandwidth (search radius) of 50 km, to reveal the villages' agglomeration pattern. According to the degree of agglomeration of traditional villages, the three high-density areas are Northern Henan (the junction of Anyang, Hebi, and Xinxiang), Central Henan (the junction of Zhengzhou, Xuchang, and Pingdingshan), and southern Xinyang. In northern Henan, the density of traditional villages in the core area exceeds 130 per 10,000 km². The medium-density belt is in northwestern Henan Province and connects the high-density areas in the northern and central parts.

Altitude and slope
As illustrated in Fig. 6, Henan Province has an altitude range of 30-2,400 m, displaying a terrain pattern characterized by higher elevations in the west and lower elevations in the east. The eastern and central parts of Henan consist primarily of plains and hilly areas with relatively low altitudes, generally between 30 and 500 m. The western part of Henan features mountainous terrain with major mountain ranges, such as the Taihang Mountains and Qinling Mountains, which have relatively high altitudes.
Spatial statistical analysis of traditional villages in Henan Province revealed that the average altitude of the 275 traditional villages is 367 m, and the elevation difference between the lowest (47 m) and highest (1,685 m) is 1,638 m. There are 140 villages below 300 m above sea level, accounting for 51% of the total. There are 81 villages at altitudes between 300 and 600 m, with a relatively scattered distribution. There are 54 villages above 600 m, accounting for 19.6% of the total. When Figs. 5 and 7a are combined, it is evident that the three high-density areas are all located in hilly areas transitioning from mountains to plains, whereas medium-density areas are distributed in the Qinling Mountains. Concurrently, the analysis showed that the number of traditional villages in Henan Province tends to decrease as the altitude increases (Fig. 7c).
ArcGIS 10.8 was used to transform the 30 m resolution DEM elevation data of Henan Province into slope data, and the spatial analysis tool was employed for "neighbourhood analysis". With traditional villages as centre points, a circle with a radius of 1,200 m (30 × 40) was chosen as the neighbourhood type to obtain the average slope within the area. According to the technical specifications of the third national survey, the cultivated land slope was categorized into five grades: grade 1 (≤ 2°), grade 2 (2-6°), grade 3 (6-15°), grade 4 (15-25°), and grade 5 (> 25°). When Fig. 7b, d are combined, it is evident that the average slope surrounding more than half of the traditional villages in Henan Province is relatively steep. A total of 125 villages are situated within neighbourhoods with low average slopes (< 6°), constituting 45.45%. Eighty-three villages are distributed in areas with average slopes of 6-15°. The number of villages in neighbourhoods with relatively high average slopes (above 15°) progressively diminishes, representing 24.36% of the total (Fig. 7d). Traditional villages in Henan Province are primarily located in areas with relatively flat terrain and minimal undulations. These regions are favourable for constructing houses, engaging in production and agriculture, facilitating exchange with the outside world, and carrying out daily activities. Villages whose vicinity exhibits considerable topographical variation are mainly concentrated in the high-density area of the Taihang Mountains in the north and the medium-density belt in the west (Fig. 7b). These villages are situated in the transitional zone between mountainous and plains areas, constituting an essential region for the layout of traditional villages suitable for the development of diverse agricultural types.
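The five-grade slope classification can be expressed as a small lookup. In the sketch below the treatment of boundary values (each upper bound is inclusive) is an assumption, since the text gives only the ranges, and the function name and the sample slopes are hypothetical.

```python
from collections import Counter

def slope_grade(slope_deg):
    """Map an average slope in degrees to grades 1-5.

    Upper bounds are treated as inclusive, which is an assumption.
    """
    if slope_deg <= 2:
        return 1
    if slope_deg <= 6:
        return 2
    if slope_deg <= 15:
        return 3
    if slope_deg <= 25:
        return 4
    return 5

# Hypothetical neighbourhood-average slopes for a handful of villages.
sample_slopes = [0.8, 1.9, 3.5, 5.0, 7.2, 12.0, 18.4, 27.3]
grade_counts = Counter(slope_grade(s) for s in sample_slopes)

# Shares implied by the reported counts out of 275 villages.
share_low = round(100 * 125 / 275, 2)    # average slope below 6 degrees
share_mid = round(100 * 83 / 275, 2)     # average slope of 6-15 degrees
```

The reported 45.45% for slopes below 6° corresponds to 125 of 275 villages, and the 83 villages at 6-15° amount to roughly 30% of the total.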

River system
ArcGIS 10.8 was used to analyse the buffer zone of Henan Province's main rivers, and a hierarchical division of the traditional village distribution was created using a buffer distance of 1 km (Fig. 8a). Within the 1 km buffer zone, there are 95 traditional villages. In the 0-4 km range, the number of traditional villages decreases with increasing distance from the river (Fig. 8b). This can be attributed to the fact that proximity to the river not only provides convenience for daily life and production, but also serves as a means of transportation and climate regulation.
There are 85 traditional villages located more than 4 km away (straight-line distance) from the river, primarily concentrated in the central and northern areas. This distribution pattern is due to the low river density in the area and its location within the Yellow River Basin, rendering it susceptible to floods. Consequently, most traditional villages maintain a certain distance from the main river and large water bodies.

Temperature and rainfall
The climatic environment significantly influences the site selection of traditional villages, where humans work and reside. Site selection for village construction requires consideration of climatic factors, and the design and construction must align with local climate characteristics; these factors are crucial for ensuring and enhancing living conditions. As depicted in Fig. 9a, b, the annual average temperature in Henan Province exhibits an east-west gradient. The distribution of traditional villages in Henan Province is uniform within the annual average temperature range of 12.02-16.60 °C. Excessively high or low temperatures impose certain constraints on agricultural production. A total of 163 villages are situated in areas with an average annual rainfall of 450-600 mm, constituting 59.27% of the total (Fig. 9c, d). In regions with extreme precipitation levels, the number of traditional villages is comparatively lower. Insufficient precipitation may result in water scarcity, adversely affecting agricultural production, whereas excessive precipitation increases the risk of natural disasters, such as floods.

Population
According to the seventh census, Henan has a total population exceeding 99.37 million, with a population density of 595 people/km² (the national average is 148 people/km²). As shown in Fig. 10a, b, the population distribution characteristics of Henan are closely related to its topography. Most of the population resides in the relatively flat central, eastern, western, and southern areas; the sparsely populated western and southern regions feature rugged terrain, particularly in the high-altitude Qinling Mountains. Figure 10d shows that relatively few traditional villages are located in areas with very small or very large populations: 17 villages are in areas with small populations (62,700-155,400), while 45 are in areas with large populations (over 873,700). In areas with smaller populations, the construction and maintenance of traditional villages are limited because of scarce labour and funding. Conversely, in areas with larger populations, the number of traditional villages is comparatively low because of the higher degree of modernization. The largest number of traditional villages (60) is found in the population range of 459,500-549,100, indicating that this population size is most conducive to village maintenance and development. While a sufficient population can provide necessary labour and community activities, an excessive population may exert environmental pressure and affect the sustainability of traditional villages.
In summary, the number of traditional villages in Henan Province tended to decrease as the population density increased (Fig. 10e). A total of 186 traditional villages (67.64%) were in areas with population densities below the provincial average. Furthermore, the population density at the locations of 123 villages was lower than the national average. On the one hand, areas with high population density exhibited higher levels of urbanization, such as the entire eastern region, northern part of Zhengzhou City in the central region, central parts of Xuchang City [...]

As urbanization continues to advance, ethnic minorities become more inclined to migrate to cities with better socioeconomic prospects [47]. Figure 10f indicates that the number of traditional villages in Henan Province decreased as the proportion of ethnic minorities in the region increased. Areas with relatively high ethnic minority populations in Henan Province were primarily concentrated in the more developed central region, regions with a complex geographical environment in the south, and areas with large populations in the east (Fig. 10c). These areas are not conducive to the development and protection of traditional villages.

Urbanization rate
The urbanization rate is the proportion of the urban population to the total population. China's urbanization rate growth primarily results from rural-urban migration, and the lower urbanization levels in the economically underdeveloped central and western regions make the rural economy heavily dependent on urbanization growth [48]. By 2021, the urbanization rate of permanent residents in Henan Province reached 56.4%, which is lower than the national average of 64.7%. In Henan Province, 246 traditional villages (89.45%) were in areas with urbanization rates below the national average. Additionally, 178 traditional villages (64.73%) were in areas with urbanization rates below the provincial average. These data suggest that lower urbanization rates in county-level administrative regions contributed to the better protection of traditional villages in Henan Province (Fig. 11).
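The shares quoted above follow directly from the village counts. A minimal arithmetic check, assuming the study total of 275 villages stated in the abstract (the variable names and dictionary labels are illustrative only):

```python
# Share of the 275 traditional villages located below each urbanization benchmark.
TOTAL_VILLAGES = 275  # study total, taken from the abstract

counts = {
    "below national average (64.7%)": 246,
    "below provincial average (56.4%)": 178,
}

# Percentage of all villages, rounded to two decimals as in the text.
shares = {label: round(100 * n / TOTAL_VILLAGES, 2) for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {pct}%")
```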

Economic development
Gross regional product (GRP) is an economic indicator that measures the total value of all final goods and services produced within a specific period by all residents of a region (such as a province, state, or city) and by production units within its territory. Per capita GRP is obtained by dividing the GRP of a region by the total population of that region. The per capita GRP reflects the average production level of residents in a region and is commonly used as an indicator to measure national living standards and wealth distribution.
Based on the "China County Statistical Yearbook 2021," the GRP and per capita GRP of 157 county-level administrative regions in Henan Province were mapped in ArcGIS 10.8 (Fig. 12). In the range of low GRP (7.907-13.916 billion yuan) and relatively medium GRP (31.756-45.277 billion yuan), the number of traditional villages was relatively large at 40 and 45, respectively. This suggests that there are more traditional villages in economically underdeveloped areas. Although the impact of economic development was greater in areas with relatively moderate economic development, many traditional villages remained. A total of 129 traditional villages were located in areas where the per capita GRP was significantly lower than the provincial average of 54,400 yuan. In the higher GRP range (63.590-84.959 billion yuan), the number of traditional villages was the lowest at 12. This indicates that in areas with high levels of economic development, the damage to traditional villages is relatively large. In the lower range of per capita GRP (12,000-21,900 yuan), there were only 6 traditional villages. This suggests that there are fewer traditional villages in areas with lower per capita income, likely because poor economic conditions have led to significant population loss in the region, thus destroying traditional village settlements.

Transportation conditions
Roads serve as connections between villages and are the main channels for information flow [49]. ArcGIS 10.8 was used to conduct a buffer zone analysis of the transportation network, such as highways and national highways, in Henan Province. The impact of the road traffic network on the spatial distribution of traditional villages was reflected in the number of villages within different buffer distances (Fig. 13a). Figure 13b shows the number of traditional villages in each buffer zone. There were 145 traditional villages located over 4 km away from the transportation network. Being farther away from the transportation network may be more beneficial for protecting traditional villages from the impacts of urbanization and industrialization.
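The buffer counting described above can be sketched without GIS software. The following is a minimal illustration, not the ArcGIS workflow used in the study: it assumes projected coordinates in metres, represents roads as hypothetical polylines, and uses 1 km bands with an open-ended band beyond 4 km. All data and function names are invented for illustration.

```python
import numpy as np

def point_to_segment_distance(p, a, b):
    """Euclidean distance from point p to segment ab (2-D, same projected units)."""
    p, a, b = (np.asarray(v, dtype=float) for v in (p, a, b))
    ab = b - a
    denom = ab @ ab
    # Degenerate segment: fall back to the distance to the single point a.
    t = 0.0 if denom == 0 else np.clip((p - a) @ ab / denom, 0.0, 1.0)
    return float(np.linalg.norm(p - (a + t * ab)))

def distance_to_network(p, polylines):
    """Shortest distance from p to any segment of any polyline."""
    return min(
        point_to_segment_distance(p, line[i], line[i + 1])
        for line in polylines
        for i in range(len(line) - 1)
    )

def buffer_band_counts(points, polylines, edges_m=(1000, 2000, 3000, 4000)):
    """Count points per distance band: 0-1 km, 1-2 km, 2-3 km, 3-4 km, >4 km."""
    d = np.array([distance_to_network(p, polylines) for p in points])
    bands = np.digitize(d, edges_m)  # band index 0..len(edges_m)
    return np.bincount(bands, minlength=len(edges_m) + 1)

# Hypothetical toy data in metres: one straight road along the x-axis.
roads = [[(0, 0), (10000, 0)]]
villages = [(500, 300), (2000, 1500), (4000, 2500),
            (6000, 3500), (8000, 5200), (9000, 7000)]
counts = buffer_band_counts(villages, roads)
print(counts)
```

Computing one distance per village and binning it gives the same counts as intersecting villages with nested buffer polygons, and it avoids constructing the polygons.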

Tourism resources
Over the past 40 years, China's tourism industry has gradually integrated into the national economic system and developed into a strategic pillar industry [50]. According to statistics from the Ministry of Culture and Tourism, in 2019, the comprehensive contribution of the national tourism industry to GDP reached 10,940 billion yuan, accounting for 11.05% of the total GDP. By analysing the impact of tourism resources on the distribution of traditional villages in Henan Province, we can discover the tourism development potential of traditional villages, which can help realize sustainable development and protection.
According to data from the Henan Provincial Department of Culture and Tourism from December 2021, there are 578 A-level scenic spots in Henan Province. Amap was used to obtain location information for each A-level scenic spot. ArcGIS was used to generate a kernel density raster map (Fig. 14a) of these scenic spots in Henan Province, with the raster value at each traditional village representing its tourism resource intensity. Overall, in the three high-concentration areas of traditional villages, the intensity of tourism resources was also very high. The tourism resource intensity of 156 traditional villages was medium to high (Fig. 14b).
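To make the intensity value concrete, the sketch below evaluates a point kernel density directly at village locations instead of building a raster first. It assumes a quartic kernel and a hypothetical 5 km search radius; the study's actual ArcGIS settings are not stated in this section, and the coordinates are invented.

```python
import numpy as np

def quartic_kernel_density(targets, sources, radius):
    """Kernel density of `sources` evaluated at each `targets` location.

    Uses the quartic kernel K(u) = (3 / pi) * (1 - u**2)**2 for u < 1, else 0,
    where u = distance / radius. The result is in points per squared map unit.
    """
    targets = np.asarray(targets, dtype=float)
    sources = np.asarray(sources, dtype=float)
    # Pairwise distances, shape (n_targets, n_sources).
    diff = targets[:, None, :] - sources[None, :, :]
    u = np.linalg.norm(diff, axis=2) / radius
    weights = np.where(u < 1.0, (3.0 / np.pi) * (1.0 - u**2) ** 2, 0.0)
    return weights.sum(axis=1) / radius**2

# Hypothetical coordinates in kilometres (not the study data).
scenic_spots = [(0, 0), (1, 0), (0, 1), (20, 20)]
villages = [(0.5, 0.5), (20, 21), (50, 50)]
intensity = quartic_kernel_density(villages, scenic_spots, radius=5.0)
print(intensity)
```

A village surrounded by several scenic spots receives a higher value than one near a single spot, and a village with no spot within the search radius receives zero.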

History and culture
Intangible cultural heritage refers to various traditional cultural expressions passed down from generation to generation by people of all ethnic groups and is regarded as an integral part of their cultural heritage; objects and places related to traditional cultural expression are also included. In China, 73% of intangible cultural heritage deposits are scattered across traditional villages [51]. National key cultural relic protection units include historical and cultural sites in China that feature ancient buildings, ruins, tombs, caves, and other types of cultural heritage sites.
According to the China Intangible Cultural Heritage Data Museum, there are 124 national intangible cultural heritage sites in Henan Province and 418 national key cultural relic protection units. Amap was used to obtain the location information of each site and unit. ArcGIS 10.8 was used to generate a kernel density raster map (Fig. 15a), with the raster value representing the historical and cultural intensity of traditional villages in Henan Province. As the historical and cultural intensity increased, the number of traditional villages showed a decreasing trend (Fig. 15b). It was observed that the central part of Henan Province has high historical and cultural intensity and also has the highest regional economic intensity in the province. This trend might be because local governments tend to be more proactive in declaring national-level traditional villages only after economic intensity reaches a certain level.

Fig. 13 Distribution of traditional villages influenced by road system in Henan: a traditional village distribution and road system buffer zone; b statistics on the distance between traditional villages and roads

Ecological environment
As a direct participant in and responder to environmental changes, vegetation profoundly affects the energy balance of the Earth-atmosphere system and ecological environment. The growth state of vegetation is the most direct response to changes in the climate, environment, and other factors [52]. The NDVI was higher in the western and southern regions of Henan Province and lower in the central and northern regions (Fig. 16a). As shown in Fig. 16b, the number of traditional villages has an inverted U-shaped relationship with the NDVI of the area where they are located. This can be attributed to the fact that areas with very low NDVI are not conducive [...]
After standardizing the data, the sklearn library in Python was used to build the decision tree model. In constructing the model, 80% of the data were used as the training set and 20% as the test set. The model results, together with the comparison shown in Fig. 17a, indicated that the mean squared error (MSE) (Formula 6) and R-squared of the model were 0.018 and 0.77, respectively, indicating a small prediction error and good prediction performance. In the decision tree model, the importance of population density (X7) was the highest, followed by the importance of tourism resource intensity (X13) and temperature (X4) (Fig. 17b).
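The workflow described above (standardize, split 80/20, fit, score, rank importances) can be sketched as follows. This is a minimal illustration on synthetic data, assuming a regression tree because MSE and R-squared are reported. The tree depth, random seeds, and data-generating rule are assumptions, not the study's settings.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic stand-in for the 15 indicators (X1..X15); not the study data.
n_samples, n_features = 300, 15
X = rng.normal(size=(n_samples, n_features))
# Hypothetical response driven mainly by the 7th column (playing the role of X7).
y = -2.0 * X[:, 6] + 0.5 * X[:, 12] + rng.normal(scale=0.1, size=n_samples)

# Standardize first, then split, mirroring the order given in the text.
X_std = StandardScaler().fit_transform(X)

# 80% training / 20% test split.
X_train, X_test, y_train, y_test = train_test_split(
    X_std, y, test_size=0.2, random_state=0
)

model = DecisionTreeRegressor(max_depth=5, random_state=0)  # depth is an assumption
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

# Impurity-based importances, one per indicator, summing to 1.
importances = model.feature_importances_
ranking = np.argsort(importances)[::-1]
print(f"MSE={mse:.3f}, R2={r2:.3f}, top feature=X{ranking[0] + 1}")
```

Impurity-based importances describe how much each indicator reduces prediction error within the fitted tree. They do not indicate the direction of an effect, which is why the study pairs the tree with a regression model.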

Spatial lag regression model
Considering the spatial autocorrelation of the independent variables and exploring whether each variable was significant, the data for the 15 independent variables and the dependent variable were standardized and analysed in GeoDa using a spatial lag model (Formula 7). The R² value of the spatial lag model was 0.7844, indicating that the model explained 78.44% of the variance in the dependent variable. This suggests that the model fit the observed data well.
Many factors were not significantly correlated, indicating the complexity of the factors influencing traditional village distribution. This complexity requires further analysis of the actual situation in the study area.
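As a rough illustration of what a spatial lag model estimates, the sketch below simulates data from the standard form y = ρWy + Xβ + ε and recovers ρ and β. It swaps in spatial two-stage least squares for the maximum likelihood estimator that GeoDa reports, and it assumes a toy grid with rook contiguity in place of county adjacency. All names and parameter values are hypothetical.

```python
import numpy as np

def rook_weights(n_rows, n_cols):
    """Row-standardized rook-contiguity weights for a regular grid (toy adjacency)."""
    n = n_rows * n_cols
    W = np.zeros((n, n))
    for r in range(n_rows):
        for c in range(n_cols):
            i = r * n_cols + c
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    W[i, rr * n_cols + cc] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def spatial_lag_2sls(y, X, W):
    """Estimate y = rho * W y + X beta + e by spatial two-stage least squares.

    Instruments for the endogenous lag W y are [1, X, W X, W W X].
    X should not contain a constant column; an intercept is added here.
    Returns (rho, beta).
    """
    n = len(y)
    const = np.ones((n, 1))
    Z = np.hstack([(W @ y)[:, None], const, X])        # lag, intercept, covariates
    H = np.hstack([const, X, W @ X, W @ W @ X])        # instruments
    Z_hat = H @ np.linalg.lstsq(H, Z, rcond=None)[0]   # first stage: project Z on H
    theta = np.linalg.lstsq(Z_hat, y, rcond=None)[0]   # second stage
    return theta[0], theta[2:]

rng = np.random.default_rng(42)
W = rook_weights(20, 20)
n = W.shape[0]
X = rng.normal(size=(n, 2))
rho_true, beta_true = 0.5, np.array([1.0, -1.0])
# Reduced form: y = (I - rho W)^{-1} (X beta + e)
noise = rng.normal(scale=0.1, size=n)
y = np.linalg.solve(np.eye(n) - rho_true * W, X @ beta_true + noise)

rho_hat, beta_hat = spatial_lag_2sls(y, X, W)
print(round(float(rho_hat), 2), np.round(beta_hat, 2))
```

The lag term Wy is correlated with the error, so ordinary least squares on [Wy, X] would be biased. Both maximum likelihood and the instrumented estimator used here address that endogeneity.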

Single-factor detection
The values of the 15 independent variables selected for their influence on the distribution of traditional villages in Henan Province were classified as follows:
1. The distance between traditional villages and rivers (X3) and roads (X12) was divided into five levels according to 1 km intervals.
2. The average slope within the area of traditional villages (X2) was divided into five levels, according to the technical specifications of the third national land survey.
3. The remaining 12 independent variables (X1, X4-X11, X13-X15) were divided into seven levels via the natural breaks classification method.
The spatial differentiation factors of traditional villages in Henan Province were analysed using the geographic detector factor detection method (Formula 8). The results are shown in Fig. 19a; the q values of each factor detector are arranged according to the explanatory power: historical-cultural intensity (X14) > population density (X7) > NDVI (X15) > urbanization rate (X9) > tourism resource intensity (X13) > population (X6) > per capita GRP (X11) > rainfall (X5) > proportion of ethnic minorities (X8) > GRP (X10) > elevation (X1) > temperature (X4) > slope (X2) > distance from rivers (X3) > distance from roads (X12). The p values of the slope (X2), distance from rivers (X3), and distance from roads (X12) were 0.47, 0.39, and 0.49, respectively. The p values of the other 12 influencing factors were all less than 0.05, indicating a significant impact on the spatial distribution of traditional villages. In general, the two social factors of historical-cultural intensity (X14) and population density (X7) had the most significant explanatory power for the spatial differentiation of traditional villages. The NDVI (X15), urbanization rate (X9), and tourism resource intensity (X13) had high explanatory power. The explanatory power of elevation (X1), temperature (X4), and other natural and climatic factors was weak.
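The factor detector's q value compares the variance of the dependent variable within the strata of a factor to its overall variance. A minimal sketch, assuming the usual definition q = 1 - SSW/SST and using invented data classified into five levels at 1 km intervals:

```python
import numpy as np

def q_statistic(y, strata):
    """GeoDetector factor-detector q: 1 - (within-strata SS / total SS)."""
    y = np.asarray(y, dtype=float)
    strata = np.asarray(strata)
    sst = np.sum((y - y.mean()) ** 2)
    ssw = sum(
        np.sum((y[strata == h] - y[strata == h].mean()) ** 2)
        for h in np.unique(strata)
    )
    return 1.0 - ssw / sst

# Hypothetical example: village counts per unit and distance to the nearest river (km).
village_count = np.array([9, 8, 7, 6, 5, 4, 3, 2, 1, 1])
river_distance_km = np.array([0.2, 0.7, 1.1, 1.8, 2.3, 2.9, 3.4, 3.6, 4.5, 6.0])

# Five levels at 1 km intervals: 0-1, 1-2, 2-3, 3-4, >4 km.
levels = np.digitize(river_distance_km, [1, 2, 3, 4])
q = q_statistic(village_count, levels)
print(round(float(q), 3))
```

A q near 1 means the strata separate high and low values almost completely; a q near 0 means the stratification explains almost none of the variation. The significance test that produces the p values is not reproduced here.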

Interaction detection of factors
There were strong or weak logical connections between the different factors. To evaluate the ability of pairs of factors to explain spatial differentiation, interactions between the 15 factors were detected pairwise using the interaction detector module of the geographic detector. By comparing the combined impact of two interacting factors, A and B, on traditional villages in Henan Province with their individual impacts, we determined whether they influenced the spatial distribution of traditional villages independently or interactively.
Further analysis was conducted using an interaction detector to identify the interactions between factors (Fig. 19b). Among them, the interaction between population density (X7) ∩ population (X6) (q=0.78) was the strongest and showed the most pronounced nonlinear enhancement, followed by the interactions between population (X6) ∩ historical-cultural intensity (X14) (q=0.75), proportion of ethnic minorities (X8) ∩ urbanization rate (X9) (q=0.74), and GRP (X10) ∩ NDVI (X15) (q=0.74). The results indicated that the spatial differentiation of traditional villages is influenced by the interactions of the population, economic and human resources, and complex ecological and environmental factors. The interaction effect between the factors was notably stronger than the explanatory power of a single factor. In terms of formation, the factors operated not independently but synergistically, and the variation in their interactions also reflected the main driving force of the spatial evolution of villages in Henan Province.
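The interaction detector overlays the strata of two factors and compares the resulting q value with the single-factor values. The sketch below follows the commonly cited GeoDetector classification of interaction types. The function names, the tolerance, and the XOR-style toy data are assumptions chosen to make nonlinear enhancement visible.

```python
import numpy as np

def q_statistic(y, strata):
    """GeoDetector q: 1 - (within-strata SS / total SS)."""
    y = np.asarray(y, dtype=float)
    strata = np.asarray(strata)
    sst = np.sum((y - y.mean()) ** 2)
    ssw = sum(
        np.sum((y[strata == h] - y[strata == h].mean()) ** 2)
        for h in np.unique(strata)
    )
    return 1.0 - ssw / sst

def interaction_q(y, strata_a, strata_b):
    """q for the overlay of two stratifications (each pair of levels is one stratum)."""
    combined = np.array([f"{a}|{b}" for a, b in zip(strata_a, strata_b)])
    return q_statistic(y, combined)

def interaction_type(q_a, q_b, q_ab, tol=1e-9):
    """Classify the interaction by comparing q(A∩B) with q(A), q(B), and their sum."""
    lo, hi, total = min(q_a, q_b), max(q_a, q_b), q_a + q_b
    if q_ab < lo - tol:
        return "nonlinear weakening"
    if q_ab < hi - tol:
        return "single-factor nonlinear weakening"
    if abs(q_ab - total) <= tol:
        return "independent"
    if q_ab > total:
        return "nonlinear enhancement"
    return "bi-factor enhancement"

# Toy data: y depends only on the combination of A and B (XOR pattern).
a = np.array([0, 0, 1, 1, 0, 0, 1, 1])
b = np.array([0, 1, 0, 1, 0, 1, 0, 1])
y = np.array([0, 1, 1, 0, 0, 1, 1, 0])

q_a, q_b = q_statistic(y, a), q_statistic(y, b)
q_ab = interaction_q(y, a, b)
kind = interaction_type(q_a, q_b, q_ab)
print(q_a, q_b, q_ab, kind)
```

In this toy case neither factor explains anything alone, yet their overlay explains everything. This extreme case shows why an interaction q can exceed the sum of the single-factor values.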

Discussion
This study utilized GIS and other technologies in combination with Python and GeoDa to analyse the spatial distribution pattern and characteristics of traditional villages in Henan Province. It quantified the impact of various factors influencing the spatial distribution of these villages through decision tree models, spatial lag regression, and GeoDetector. The spatial distribution of traditional villages in Henan Province is clustered, with three areas of high-density spatial aggregation [53]. A medium-density belt interconnects the high-density areas in the central and northern regions. Traditional villages in Henan Province are primarily located in areas with low altitudes, small average slopes, and moderate temperatures and precipitation. Favourable climatic conditions also contribute to traditional agricultural production [54]. Although most traditional villages in Henan Province are located in mountainous areas, unlike other regions where traditional villages are concentrated in areas with high altitudes and complex slopes [55], the number of traditional villages in Henan Province tends to decrease as altitude and slope increase. Approximately 94.18% of traditional villages are at altitudes lower than 800 m, and 75.64% are located in areas with average slopes less than 15°. Based on the results of the decision tree model, spatial lag regression model (SLRM), and GeoDetector, it can be concluded that the 15 influencing factors selected in this study sufficiently explain the spatial distribution of traditional villages in Henan Province. Historical and cultural intensity (X14) and population density (X7) are the main drivers of the spatial distribution of traditional villages in Henan Province. Apart from the strongly correlated pair of population density (X7) ∩ population (X6), the interaction of population (X6) ∩ historical and cultural intensity (X14) has the strongest influence.
The driving factors of the spatial distribution of traditional villages in Henan Province display strong regional characteristics. Traditional villages are generally established near water sources [31]; however, northern Henan is in the floodplain of the Yellow River Basin [56], and the need for better agricultural production conditions means that 30.91% of traditional villages are more than 4 km away from the river. Within a 0-4 km radius, the number of traditional villages decreases with increasing distance from the river, indicating a positive correlation between the spatial distribution of traditional villages and distance from the river (X3). The economic development of Henan is heavily reliant on its urbanization rate [48]. With increasing national support for rural tourism [12], areas with high urbanization rates and abundant tourism resources often have sufficient funds to support the preservation and development of traditional villages. This explains why urbanization rate (X9) and tourism resource intensity (X13) have a positive impact on the distribution of traditional villages. However, although the degree of coupling between the ecological environment and urbanization in Henan Province is high, the coordination is poor [57]. As a result, areas with high urbanization rates often have poor ecological environments. Thus, the NDVI (X15) is negatively correlated with the spatial distribution of traditional villages. Issues such as underdevelopment in rural areas [58] and significant rural population loss [39] result in the negative impacts of population density (X7) and per capita GRP (X11) on the spatial distribution of traditional villages. Traditional villages are significant carriers of tangible and intangible cultural heritage [59]. The negative correlation between historical and cultural intensity (X14) and the number of traditional villages reflects the underexplored historical and cultural values of traditional villages in Henan Province.
Considering the above research and regional cultural characteristics, the following recommendations are made for the protection and development of traditional villages in Henan Province:
1. Establish a Support System at the Provincial Level: A support system for traditional villages should be established at the provincial administrative region level. This involves increasing policy support and capital investment for economically underdeveloped areas, thus providing comprehensive assistance for the development of traditional villages.
2. Leverage Historical and Cultural Resources: Make full use of the historical and cultural resources of traditional villages. The local culture of traditional villages in different regions can be combined with rural tourism to form a tourism industry that is imbued with local characteristics. This will help drive the economic development of traditional villages and surrounding areas.
3. Prioritize Landscape and Ecological Protection: Most traditional villages in Henan Province are located in areas with poor ecological environments. Therefore, during the development of traditional villages, it is essential to pay attention to the protection of the original natural landscape, cultural landscape, and ecological environment of these villages.
This study does not incorporate microlevel factors such as the formation time, spatial layout, and architectural characteristics of traditional villages, which bear equal significance for the protection and development of traditional village infrastructures. Consequently, future research will leverage field investigations of traditional villages in Henan Province, integrating microlevel factors with the existing research to establish a cultural landscape evaluation system for traditional village roads in Henan Province. This approach will enable a more extensive and comprehensive evaluation of traditional villages.

Conclusions
Existing research has predominantly focused on watersheds, regions, or folk culture areas and is constrained by natural or social factors, thereby lacking a comprehensive spatial analysis of traditional villages within provincial administrative regions. Utilizing GIS and Python for statistical analysis, this research employed a variety of models to conduct a more extensive and detailed spatial analysis. The overall explanatory power of the 15 selected impact factors was quantified using a decision tree model, while the SLRM further quantified the correlation between each impact factor and the spatial distribution. Ultimately, GeoDetector was deployed to ascertain the influence of different factors and the interactive driving force between these factors, thereby more comprehensively quantifying the role of factors in driving the spatial distribution of traditional villages. This study not only enriches the research scale and area of traditional villages but also provides a more comprehensive method for spatial analysis. It carries substantial referential value for formulating policies related to the protection and development of traditional villages.

Fig. 3 Lorenz curve of traditional villages distribution in Henan

Fig. 4 Local Moran's I diagram of traditional villages in Henan

Fig. 5 Local Moran's I diagram of traditional villages in Henan

Fig. 7 Distribution of traditional villages influenced by topography in Henan: a elevation distribution of traditional villages; b slope distribution of 1200 m buffer zone in traditional villages; c elevation statistics of traditional villages; d slope statistics of 1200 m buffer zone in traditional villages

Fig. 8 Distribution of traditional villages influenced by river system in Henan: a traditional village distribution and river system buffer zone; b statistics regarding the distance between traditional villages and rivers

Fig. 9 Distribution of traditional villages influenced by climate in Henan: a the annual average temperature distribution of traditional villages; b the annual rainfall distribution of traditional villages; c the annual average temperature statistics of traditional villages; d the annual rainfall statistics of traditional villages

Fig. 10 Distribution of traditional villages influenced by population in Henan: a population distribution of traditional villages; b population density distribution of traditional villages; c ethnic minority population distribution of traditional villages; d population statistics of traditional villages; e population density statistics of traditional villages; f ethnic minority population statistics of traditional villages

Fig. 14 Distribution of traditional villages influenced by tourism intensity in Henan: a tourism intensity distribution of traditional villages; b tourism intensity statistics of traditional villages

Fig. 16 Distribution of traditional villages influenced by NDVI in Henan: a NDVI distribution of traditional villages; b NDVI statistics of traditional villages

Fig. 17 The results of the decision tree model: a comparison between decision tree model results and practice; b importance of respective variables

Fig. 19 The results of GeoDetector factor detection: a single-factor detection results; b interaction detection results

Table 1 The number of traditional villages in cities in Henan