
A vulnerability evaluation method of earthen sites based on entropy weight-TOPSIS and K-means clustering

Abstract

The degradation of earthen sites due to natural and human factors has become a pressing issue, necessitating urgent protection measures. In this context, accurate assessment of the vulnerability of earthen sites is essential for the development of effective conservation strategies. In this study, a comprehensive evaluation framework that incorporates multiple indicators is proposed. In particular, the entropy weight-TOPSIS (Technique for Order Preference by Similarity to an Ideal Solution) method is employed for quantitative vulnerability assessment and combined with K-means clustering to define vulnerability levels for earthen sites. To validate the proposed approach, the vulnerability of 29 sections of the Ming Great Wall is evaluated, and the 29 earthen sites are categorized into three levels (high, medium, and low) according to their degree of vulnerability. The results of grey relational analysis and the entropy weight-TOPSIS method are compared using the amount of earthen site loss in the original data as the validation standard. The Pearson correlation coefficient between the entropy weight-TOPSIS results and the amount of earthen site loss was 0.859, whereas that between the grey relational analysis results and the amount of earthen site loss was 0.691, indicating that the entropy weight-TOPSIS method more accurately reflects the actual vulnerability of earthen sites.

Introduction

Among the eight batches of national key cultural relic protection units announced in China, a total of 1251 earthen sites have been identified [1]. Earthen sites refer to historical sites left by ancient people, in which soil was used as the main building material for various practical activities [2]. These sites carry significant historical information, cultural significance, and high artistic and scientific value [3]. However, these earthen sites in open-air environments face serious threats and resultant damage from natural and human activities [4]. Widespread issues such as collapse, erosion, and cracks have emerged, requiring urgent protection. Earthen sites, influenced by the physical and chemical properties of soil materials, are among the most challenging objects to protect as cultural relics. Furthermore, as immovable cultural relics, earthen sites can only be protected at their original locations, further increasing the difficulty of protection [5, 6].

With the advancement of cultural relic protection in China, the concept of protection has shifted from rescue to preventative protection. Numerous studies have emphasized the importance of accurately assessing the current condition of earthen sites, efficiently allocating resources, and implementing appropriate protection measures. Lu et al. [7] point out that the quantitative analysis of the development characteristics of damages in earthen sites is one of the difficulties in cultural relics protection, but there is currently little research on this aspect. Sun et al. [8] note that the current funding for the protection of earthen sites is still limited, and it is necessary to plan conservation work based on the vulnerability of the sites. Du et al. [9] point out that assessing the damage to earthen sites is a prerequisite for further protection planning and implementation, but there is currently a lack of related research. Vulnerability serves as an indicator to evaluate the status of earthen sites, representing their resistance to external factors. Higher vulnerability values indicate a greater likelihood of damage, highlighting the need for strengthened protection measures. Therefore, evaluating the vulnerability of earthen sites is of utmost importance.

While the vulnerability assessment of earthen sites relies on damage indicators, a single indicator cannot fully capture the overall vulnerability. To enhance the value of the vulnerability assessment results, the use of multiple damage indicators is necessary. According to the different weights assigned to the indicators, multi-indicator comprehensive evaluation methods can be classified into three categories: subjective weight determination, objective weight determination, and a combination of these two weight determination approaches [10]. Subjective weight determination relies on expert opinions, which may vary due to different experiences, potentially leading to bias and limitations, making it difficult to obtain objective evaluation results. The most commonly used algorithm involving subjective weight determination is the Analytic Hierarchy Process (AHP). To overcome the limitations of subjective weight determination, scholars prefer to use a combination weight determination or objective weight determination approach.

The grey relational analysis method is a representative combination weight determination method. Yao et al. [11] used this method to establish a quantitative evaluation system for damage to earthen sites of the Ming Great Wall in northern Shaanxi. With the “amount of earthen site loss” as the dominant factor and undercutting, gully, crack, scaling off, biological destruction, and artificial destruction as the evaluation indices, the correlations between the evaluation indices and the dominant factor were obtained. Although the process considers the objective correlation degree, weight determination for the indices still relied on subjective judgment. Some scholars have also improved upon subjective methods. Lei et al. [12] took undercutting, collapse, crack, and gully as evaluation indices and combined the Data Envelopment Analysis (DEA) and AHP methods to establish a vulnerability evaluation system for the Jiayuguan pier site. The AHP–DEA method incorporates objective data from DEA and subjective judgment from AHP experts to determine the indicator weights, making it more suitable for weighting qualitative and quantitative indicators when evaluating the damage vulnerability of earthen sites in practice. Du [13] proposed a three-layer evaluation system for the Ming Great Wall in Qinghai Province, utilizing the Fuzzy Analytic Hierarchy Process (FAHP)-TOPSIS method to assess its vulnerability. The FAHP approach employed a triangular fuzzy membership function to determine the importance of indicators, reducing the subjectivity of AHP. Guo [14] conducted a risk assessment of potential hazards within the Mogao Grottoes cliffs using the FAHP method, with results indicating that the FAHP approach yields higher precision in hazard evaluation than the AHP method.
Zhang [15] has employed the Analytic Hierarchy Process (AHP), Fuzzy Analytic Hierarchy Process (FAHP), and AHP-TOPSIS methods to assess the risk of rockfall disasters at the Mogao Grottoes slopes. The findings suggested that FAHP was the most suitable method for evaluating rockfall hazards in grottoes. However, these methods still rely heavily on expert opinion, introducing certain limitations. To address this issue, this paper introduces an entropy weight-TOPSIS method that solely relies on objective weight determination. The entropy weight method calculates the weight for each indicator using actual damage data, and in combination with TOPSIS, provides a more objective evaluation of the vulnerability of earthen sites.

For soil vulnerability, empirical equations based on soil properties are widely used internationally. For Tell Helawa and Tell Aliawa in the Kurdistan Region of Iraq, Forti et al. [16] considered factors such as rainfall erosivity, soil erodibility, slope length, and steepness and applied the revised universal soil loss equation (RUSLE) to simulate soil erosion, providing a potential tool for identifying geomorphic risks to archaeological sites. Similarly, Ames et al. [17] used the RUSLE to quantify the potential for future sediment loss and to assess the erosional impacts on open-air archaeological sites along the Dorin River in South Africa. Polykretis et al. [18] considered soil erodibility factors, rainfall erodibility factors, and other factors to establish a unit stream power erosion and deposition (USPED) model, with which the archaeological sites most vulnerable to erosion hazards in the state of Chania were identified. These empirical equations evaluate vulnerability from the properties of the soil itself, but their disadvantage is that the parameters in the equations are difficult to determine. This paper instead starts from the outcomes: it evaluates the vulnerability of earthen sites from the results of damage surveys, which is both reasonable and feasible.

However, comprehensive assessments that start from outcomes face problems when a vulnerability assessment system for earthen sites is developed, such as the difficulty of selecting evaluation indicators and the complexity of the relationships among them, which are hard to clarify. With the rapid development of artificial intelligence and machine learning technologies, these techniques have been widely applied in various fields, including natural language processing [19], computer vision [20], and biomedical research [21]. Some cultural heritage conservation workers [22] have started to use machine learning to assist in the vulnerability assessment of earthen sites. For example, Du et al. [23] applied Support Vector Machines (SVM) and BP neural networks in a vulnerability assessment system for the Ming Great Wall in Qinghai, using the vulnerability obtained from comprehensive assessment as the target value. The pattern learned by their algorithm is a linear weighted relationship, which is used to replace the multi-indicator comprehensive evaluation method for vulnerability assessment. This paper adopts the entropy weight-TOPSIS method to quantitatively evaluate the vulnerability of earthen sites. Furthermore, the K-means clustering algorithm is introduced to verify the quantitative evaluation results, and the sites are divided into vulnerability levels. This provides theoretical guidance for efficient cultural heritage conservation work.

Vulnerability rating evaluation system

The vulnerability rating evaluation system proposed in this paper is shown in Fig. 1. From an evaluation perspective, standardized measurement data from actual earthen sites can be used as evaluation indicators. The system adopts a multi-indicator comprehensive evaluation method (i.e., the entropy weight-TOPSIS method) to score the vulnerability of each site, and then uses the K-means algorithm to cluster the earthen sites.

Fig. 1 Earthen site vulnerability assessment system

Multi-indicator comprehensive evaluation methods integrate multiple aspects and features to achieve a more comprehensive assessment of the evaluation object. Due to the diverse types of damage and complex degradation mechanisms of earthen sites, there are numerous factors that influence the vulnerability assessment. This comprehensive evaluation considers multiple types of damage, leading to a more thorough assessment of the overall vulnerability of an earthen site. The hierarchical structure of the multi-indicator comprehensive evaluation method is shown in Fig. 2. In this case, the Roman numerals represent the hierarchical level, while the Arabic numerals represent different indicators at the same level. For example, II 2 represents the second indicator at the second level, and III 2.1 represents the first indicator at the third level, which belongs to II 2.

Fig. 2 Hierarchical structure of the vulnerability assessment method

Typical damage types at earthen sites include crack, gully, collapse, undercutting, scaling off, biological damage, and human-induced damage, as shown in Fig. 3. These types of damage are used as indicator layers in the vulnerability assessment process for earthen sites; in particular, the data related to these types of damage are used as the data layer to obtain a comprehensive evaluation system for the vulnerability of earthen sites (see Fig. 4). This allows for assessment of the overall vulnerability of earthen sites.

Fig. 3 Typical types of deterioration in earthen sites of the Ming Great Wall: (a) crack; (b) gully; (c) collapse; (d) undercutting; and (e) scaling off [13]

Fig. 4 Comprehensive evaluation system for vulnerability of earthen sites

Entropy weight-TOPSIS method

Entropy weight method

With the promotion, application, and in-depth research of entropy theory in various sciences, the concept of entropy has been further developed throughout the mid-twentieth century [24]. In 1948, Shannon proposed the corresponding mathematical expression of entropy, which quantitatively describes the “uncertainty” of data [25]. In recent years, many studies [26, 27] have successfully applied information entropy to multi-criteria comprehensive evaluation and described this “uncertainty” as the “degree of variation”. The idea is that the smaller the calculated information entropy, the greater the degree of variation in the data, the greater the amount of information provided, and the greater the role it plays in comprehensive evaluation, thus the greater the weight. Therefore, the “uncertainty” or “degree of variation” of the data can be used as a basis for weighting the indicators in the entropy weight method.

Assuming there are m objects to be evaluated, each of which has n evaluation indicators, the calculation steps are as follows [28, 29]:

First, establish the evaluation matrix \({X}_{mn}\)

$${X}_{mn}=\left[\begin{array}{cccc}{x}_{11}& {x}_{12}& \cdots & {x}_{1n}\\ {x}_{21}& {x}_{22}& \cdots & {x}_{2n}\\ \vdots & \vdots & \ddots & \vdots \\ {x}_{m1}& {x}_{m2}& \cdots & {x}_{mn}\end{array}\right]$$
(1)

Second, perform data standardization. The original data for each indicator may have different dimensions, making them difficult to compare and analyze. Therefore, it is necessary to standardize the data. Positive and negative indicators are normalized differently as shown in Eq. (2).

$${y}_{ij}=\left\{\begin{array}{ll}\frac{{x}_{ij}-{x}_{min}}{{x}_{max}-{x}_{min}}, & \text{if column } j \text{ is a positive indicator}\\ \frac{{x}_{max}-{x}_{ij}}{{x}_{max}-{x}_{min}}, & \text{if column } j \text{ is a negative indicator}\end{array}\right.\quad i=1, 2, \dots , m;\; j=1, 2,\dots ,n$$
(2)

where \({x}_{ij}\) is the value in the ith row and jth column, representing the jth evaluation indicator of the ith object; and \({x}_{max}\) and \({x}_{min}\) represent the maximum and minimum values in the jth column, respectively.

After obtaining the dimensionless data \({y}_{ij}\), further normalization is performed using Eq. (3) to obtain \({p}_{ij}\), the feature weight of the ith evaluation object under the jth indicator:

$${p}_{ij}=\frac{{y}_{ij}}{\sum_{i=1}^{m}{y}_{ij}}; i=1, 2,\dots ,m; j=1, 2, \dots ,n$$
(3)

Third, after obtaining the feature weights, Eq. (4) is used to calculate the information entropy (\({E}_{j}\)) of each indicator, where \({p}_{ij}\mathit{ln}\left({p}_{ij}\right)=0\) is taken when \({p}_{ij}\) is equal to 0 [27].

$${E}_{j}=-\frac{1}{\mathit{ln}\left(m\right)}\sum_{i=1}^{m}{p}_{ij}\mathit{ln}\left({p}_{ij}\right); i=1, 2,\dots ,m; j=1, 2,\dots ,n$$
(4)

Finally, the weight matrix \(W=\left[{w}_{1}, {w}_{2}, \dots , {w}_{n}\right]\) is calculated based on the entropy values of each indicator. The weights of the indicators are denoted as \({w}_{1}, {w}_{2}, \dots , {w}_{n}\), and the calculation formula is shown in Eq. (5):

$${w}_{j}=\frac{1-{E}_{j}}{\sum_{k=1}^{n}\left(1-{E}_{k}\right)}; j=1, 2,\dots ,n$$
(5)
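The steps in Eqs. (1) to (5) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `entropy_weights`, the `positive` flag list, and the treatment of a constant column (assigned entropy 1, i.e. no information) are choices made here for illustration only.

```python
import math

def entropy_weights(X, positive=None):
    """Entropy weights for an m x n matrix X (rows: objects, columns: indicators)."""
    m, n = len(X), len(X[0])
    if positive is None:
        positive = [True] * n          # assumption: all indicators positive by default
    entropies = []
    for j in range(n):
        col = [row[j] for row in X]
        lo, hi = min(col), max(col)
        span = hi - lo
        if span == 0:
            # Assumption: a constant column carries no information (E_j = 1).
            entropies.append(1.0)
            continue
        # Eq. (2): min-max standardization, direction depends on indicator type.
        y = [(x - lo) / span if positive[j] else (hi - x) / span for x in col]
        total = sum(y)
        p = [v / total for v in y]     # Eq. (3)
        # Eq. (4), taking p * ln(p) = 0 when p = 0.
        e = -sum(v * math.log(v) for v in p if v > 0) / math.log(m)
        entropies.append(e)
    redundancy = [1.0 - e for e in entropies]
    total_redundancy = sum(redundancy)
    if total_redundancy == 0:
        raise ValueError("all indicators are constant; weights are undefined")
    return [d / total_redundancy for d in redundancy]   # Eq. (5)

# Toy data (hypothetical): 3 objects, 2 indicators.
weights = entropy_weights([[0.0, 1.0], [0.0, 2.0], [10.0, 3.0]])
print(weights)
```

In this toy matrix the first indicator is concentrated in a single object, so its entropy is lower and it receives the larger weight, which matches the reasoning that a greater degree of variation yields a greater weight.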

TOPSIS method

The TOPSIS method—also known as the method of distance to an ideal solution—is a classic indicator-based decision-making method first proposed by Hwang and Yoon in 1981 [30]. The basic principle is to identify the best and worst solutions among the limited alternatives from the normalized original matrix. The relative closeness is then calculated based on the distance between each objective and the best and worst solutions, serving as the comprehensive evaluation result for assessing the superiority or inferiority of the research objective. The TOPSIS method can effectively utilize the original data information and accurately reflect the distances between evaluation objects, whether it is for small or large sample data. The calculation process is detailed in the following [31, 32].

  • Step 1: Data Standardization

In the data processing step of the TOPSIS method, it is necessary to distinguish between positive indicators and negative indicators. For negative indicators, the data need to be transformed into positive values, as shown in Eq. (6). For normalization, the TOPSIS method uses the squared sum normalization (SSN), as shown in Eq. (7).

$${x}_{ij}^{*}={x}_{max}-{x}_{ij}; i=1, 2,\dots ,m; j=1, 2,\dots ,n$$
(6)
$${z}_{ij}=\left\{\begin{array}{ll}\frac{{x}_{ij}}{\sqrt{\sum_{k=1}^{m}{x}_{kj}^{2}}}, & \text{if column } j \text{ is a positive indicator}\\ \frac{{x}_{ij}^{*}}{\sqrt{\sum_{k=1}^{m}{\left({x}_{kj}^{*}\right)}^{2}}}, & \text{if column } j \text{ is a negative indicator}\end{array}\right.\quad i=1, 2,\dots ,m;\; j=1, 2,\dots ,n$$
(7)

where \({x}_{ij}^{*}\) represents the value of a negative indicator after the positive processing, \({x}_{max}\) is the maximum value of the indicator in column j, and \({z}_{ij}\) represents the values after normalization.

  • Step 2: Constructing the Ideal Solution Vectors

The combination of the maximum values in each column forms the positive ideal solution vector \({z}^{+}\), while the combination of the minimum values in each column forms the negative ideal solution vector \({z}^{-}\), as shown in Eq. (8).

$${z}^{+}=\left({z}_{1}^{+},{z}_{2}^{+},\dots ,{z}_{n}^{+}\right)=\left\{\max\left({w}_{j}{z}_{ij}\right)\mid i=1,2,\dots ,m\right\}$$
$${z}^{-}=\left({z}_{1}^{-},{z}_{2}^{-},\dots ,{z}_{n}^{-}\right)=\left\{\min\left({w}_{j}{z}_{ij}\right)\mid i=1,2,\dots ,m\right\}$$
(8)
  • Step 3: Calculating the distances to the positive and negative ideal solutions

Using Eq. (9), combined with the indicator weights obtained from the entropy weight method, the distance between each assessment object and the positive and negative ideal solutions is calculated.

$$\left\{\begin{array}{l}{D}_{i}^{+}=\sqrt{\sum_{j=1}^{n}{w}_{j}{\left({z}_{j}^{+}-{z}_{ij}\right)}^{2}}\\ {D}_{i}^{-}=\sqrt{\sum_{j=1}^{n}{w}_{j}{\left({z}_{j}^{-}-{z}_{ij}\right)}^{2}}\end{array}\right.\quad i=1, 2,\dots ,m$$
(9)
  • Step 4: Calculating the relative closeness

The relative closeness measure \({C}_{i}\) can be calculated using Eq. (10).

$${C}_{i}=\frac{{D}_{i}^{-}}{{D}_{i}^{-}+{D}_{i}^{+}}; i=1, 2,\dots ,m$$
(10)
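Steps 1 to 4 can be illustrated with a short Python sketch. It is a hedged example rather than the authors' code: the function name `topsis` and the `positive` flags are hypothetical, and one specific reading of the formulas is assumed, namely that the ideal vectors are taken from the normalized matrix and each weight \({w}_{j}\) is applied once, inside the distance of Eq. (9).

```python
import math

def topsis(X, weights, positive=None):
    """Relative closeness C_i for each row of X, given indicator weights."""
    m, n = len(X), len(X[0])
    if positive is None:
        positive = [True] * n
    columns = []
    for j in range(n):
        col = [row[j] for row in X]
        if not positive[j]:
            hi = max(col)
            col = [hi - x for x in col]            # Eq. (6): make the indicator positive
        norm = math.sqrt(sum(x * x for x in col))  # Eq. (7): squared sum normalization
        columns.append([x / norm if norm > 0 else 0.0 for x in col])
    # Ideal vectors from column maxima and minima (cf. Eq. (8)).
    z_plus = [max(c) for c in columns]
    z_minus = [min(c) for c in columns]
    closeness = []
    for i in range(m):
        # Eq. (9), assumed form: weight applied once to each squared difference.
        d_plus = math.sqrt(sum(weights[j] * (z_plus[j] - columns[j][i]) ** 2
                               for j in range(n)))
        d_minus = math.sqrt(sum(weights[j] * (z_minus[j] - columns[j][i]) ** 2
                                for j in range(n)))
        total = d_plus + d_minus
        closeness.append(d_minus / total if total > 0 else 0.0)  # Eq. (10)
    return closeness

# Toy data (hypothetical): three objects with increasing damage on two indicators.
scores = topsis([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]], weights=[0.5, 0.5])
print(scores)
```

With positive damage indicators, an object that attains the column maximum everywhere has \({D}_{i}^{+}=0\) and therefore \({C}_{i}=1\), while an object at the column minimum everywhere has \({C}_{i}=0\).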

K-means clustering algorithm

K-means clustering is a classic unsupervised learning algorithm [33], initially proposed by MacQueen in 1967 [34]. This algorithm partitions samples based on their similarity in features, grouping samples with high similarity into the same cluster. In an iterative manner, the K-means algorithm can automatically determine the coordinates of cluster centers and obtain classification results based on the obtained clusters. It has advantages such as high computational efficiency, easy interpretability, and strong operability [35]. The clustering process is shown in Fig. 5.

Fig. 5 K-means clustering algorithm flowchart

The value of K is a hyperparameter, representing the number of clusters, which is usually determined using the elbow method or based on specific objectives. Based on the distance between features of earthen sites, which reflects the severity of damage, the final clustering result groups earthen sites with similar damage severity into the same class. Additionally, this clustering algorithm is completely based on objective damage data and can reflect the potential evolution patterns of damages, to some extent. For example, when conducting field investigations on the topography, environment, and soil conditions of the same class of earthen sites, there may be some similar features or severe impact from the same type of damage. These underlying patterns help to deepen our understanding of the causes, evolution patterns, and prevention strategies of damages in earthen sites.
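The iterative procedure described above (assign each sample to its nearest center, recompute the centers, repeat until the assignment stops changing) can be sketched as follows. This is an illustrative sketch only: the source does not state how the initial centers are chosen, so a deterministic farthest-point seeding is assumed here, and the names `kmeans` and `squared_distance` are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100):
    """Lloyd-style K-means; returns (labels, centers)."""
    # Assumed initialisation: start from the first point, then repeatedly add
    # the point farthest from all centers chosen so far.
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points,
                       key=lambda p: min(squared_distance(p, c) for c in centers))
        centers.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center for every point.
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break                      # converged: assignments did not change
        labels = new_labels
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Toy data (hypothetical): three well-separated groups in two dimensions.
pts = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10], [20, 0], [20, 1], [21, 0]]
labels, centers = kmeans(pts, k=3)
print(labels)
```

The cluster indices returned here carry no ordering by severity, which is why the clusters still have to be matched to vulnerability levels using a separate score.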

Results of a case

Yao et al. [11] conducted on-site investigations of the damage status of 29 individual buildings of the Ming Great Wall in northern Shaanxi. In this assessment, larger values of the damage data indicate greater vulnerability to damage, so all damage indicators are positive. Accordingly, the damage data from their investigation were standardized using Eq. (2), and the results are shown in Table 1.

Table 1 Standardized data on the damage for the Ming Great Wall in northern Shaanxi
Table 2 Weights for each indicator, calculated by using the entropy weighting method

Once the entropy weights are obtained, the comprehensive assessment score can be calculated with the TOPSIS method. In this case, the scores are calculated according to Eqs. (9) and (10), and the results are shown in Fig. 6. The positive ideal solution takes the maximum value of each damage index as its reference point, whereas the negative ideal solution takes the minimum value of each damage index. Consequently, in Fig. 6a, points with a smaller \({D}_{i}^{+}\) and a larger \({D}_{i}^{-}\) yield a greater \({C}_{i}\) (vulnerability). Earthen sites No. 3 and No. 17 fit this pattern and, as shown in Fig. 6b, demonstrate greater vulnerability.

Fig. 6 Result of TOPSIS: (a) positive and negative ideal solutions; (b) relative closeness (also used as vulnerability results of 29 individual buildings of the Ming Great Wall in northern Shaanxi)

In this case, the aim is to use the K-means algorithm to classify the earthen sites into three vulnerability categories (high, medium, and low) based on the data of each indicator; therefore, the value of K is set to 3. Moreover, since the clustering employs six indicators, visualizing the results requires a dimensionality reduction technique. Principal Component Analysis (PCA) is an effective dimensionality reduction technique that has been widely used in the field of data analysis [36]. It transforms a high-dimensional feature space into a lower-dimensional space while preserving as much information as possible from the original data. It achieves this by calculating the covariance matrix of the data and performing an eigenvalue decomposition on this matrix. The eigenvectors with the largest eigenvalues are selected as the principal components, and the data are projected onto these components to achieve dimensionality reduction. To present the clustering results visually, PCA was used to reduce the six-dimensional data to two dimensions. The clustering results after dimensionality reduction are shown in Fig. 7.
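The PCA steps described above (covariance matrix, leading eigenvectors, projection) can be sketched without external libraries. Note the substitution: instead of a full eigenvalue decomposition, this sketch uses power iteration with deflation to find the leading eigenvectors, which is adequate for illustration. The function name `pca_components` and the start vector are assumptions made here.

```python
import math

def pca_components(rows, k=2, iters=500):
    """Project rows onto the top-k principal components; returns (projected, components)."""
    m, n = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / m for j in range(n)]
    centered = [[r[j] - means[j] for j in range(n)] for r in rows]
    # Sample covariance matrix of the centered data (n x n).
    cov = [[sum(c[a] * c[b] for c in centered) / (m - 1) for b in range(n)]
           for a in range(n)]
    components = []
    for _component in range(k):
        # Assumed asymmetric start vector; a start exactly orthogonal to the
        # target eigenvector would require a different initial guess.
        v = [float(j + 1) for j in range(n)]
        length = math.sqrt(sum(x * x for x in v))
        v = [x / length for x in v]
        for _step in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(n)) for a in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient gives the eigenvalue for the converged unit vector.
        eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(n))
                         for a in range(n))
        components.append(v)
        # Deflation: remove the found direction so the next pass finds the next one.
        cov = [[cov[a][b] - eigenvalue * v[a] * v[b] for b in range(n)]
               for a in range(n)]
    projected = [[sum(c[j] * comp[j] for j in range(n)) for comp in components]
                 for c in centered]
    return projected, components

# Toy data (hypothetical): most variance lies along the first axis.
projected, components = pca_components([[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
print(components)
```

Each returned component lists the loading of every original indicator on that principal component, which is the quantity a biplot draws as an arrow for each indicator.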

Fig. 7 Clustering results of K-means

The straight arrows represent the projection and direction of each damage feature on the plane of the two principal components. The longer the projection of an arrow on the two coordinate axes, the greater the weight of the corresponding damage feature in the principal components. The arrow for artificial damage is the longest, followed by that for gully, and this weight distribution is consistent with the results obtained from the entropy weight method. In addition, the direction of each arrow indicates whether the feature is positively or negatively correlated with the principal component.

Since clustering is an unsupervised learning method, its results do not by themselves indicate which cluster is more or less vulnerable. The results of the entropy weight-TOPSIS method are therefore needed to determine the vulnerability level of each cluster. First, Eq. (11) is used to further normalize \({C}_{i}\) so that the vulnerability value lies in the interval 0–1, giving the normalized vulnerability value \({V}_{i}\); the box plot by category is then drawn, as shown in Fig. 8.

Fig. 8 Distribution of vulnerability for different categories of earthen sites

$${V}_{i}=\frac{{C}_{i}}{\sum_{k=1}^{m}{C}_{k}}; i=1, 2,\dots ,m$$
(11)

In Fig. 8, the upper and lower boundaries of each box represent the third quartile (upper quartile, Q3) and the first quartile (lower quartile, Q1), respectively, and the horizontal line in the box represents the median of the dataset, so the box reflects the main distribution of the data. The vulnerability of Category 1 earthen sites is mainly distributed between 0.053 and 0.222, which is relatively low. The vulnerability of Category 2 earthen sites is mainly distributed between 0.308 and 0.514, which is moderate. The vulnerability of Category 3 earthen sites is mainly distributed between 0.782 and 0.943, which is high. Category 1 is therefore assigned low vulnerability, Category 2 medium vulnerability, and Category 3 high vulnerability; the final classification result and vulnerability levels are shown in Table 3. The results indicate that the vulnerabilities of sites 3, 9, and 17 are high. According to the Chinese government’s cultural relics protection policy, cultural relics with higher urgency for protection should be prioritized. Therefore, these three earthen sites are the objects of priority protection.

Table 3 Clustering results and vulnerability class assessment
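The matching of unordered clusters to vulnerability levels described above can be sketched as a small ranking step. This is an assumed rule for illustration, not a procedure stated in the source: clusters are ordered by the median of their vulnerability scores, and the function name `rank_clusters` and the level names are hypothetical.

```python
from statistics import median

def rank_clusters(labels, scores, level_names=("low", "medium", "high")):
    """Map each cluster label to a level name, ordered by median score."""
    groups = {}
    for label, score in zip(labels, scores):
        groups.setdefault(label, []).append(score)
    if len(groups) != len(level_names):
        raise ValueError("number of clusters must match number of level names")
    # Assumption: the cluster with the smallest median score is the least vulnerable.
    ordered = sorted(groups, key=lambda label: median(groups[label]))
    return {label: level_names[rank] for rank, label in enumerate(ordered)}

# Toy data (hypothetical): cluster ids from K-means and scores from TOPSIS.
levels = rank_clusters([0, 0, 1, 1, 2, 2], [0.8, 0.9, 0.1, 0.2, 0.4, 0.5])
print(levels)
```

Using the median rather than the mean keeps the ordering robust to a few sites whose scores fall outside the main box of their category.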

Discussion

Reasonableness of weights

The weighting results reflect the degree of damage to the earthen sites by each indicator; in the comprehensive evaluation system, the greater the weight, the greater the impact on the evaluation results. It is well-known that water often has a significant impact and destructive force on soil. Mileto et al. [37] pointed out that rainfall is one of the directly related factors affecting soil structure. The occurrence of undercutting and gullies is closely related to rainwater erosion. Therefore, it is reasonable that these two types of damage have a higher weight in the weighting process. As for cracks and scaling off, these two types of damage are surface deterioration of cultural relics. For murals that record texts and images, surface damage is enough to cause loss of information, making cultural relics lose their original value and causing great harm. However, for earthen sites, the impact of shallow surface deterioration is relatively small. Therefore, it is reasonable to give them a lower weight in the weighting process. It is worth noting that the weight of “human-induced damage” is the highest. From the perspective of the time characteristics of the damage, the erosion process of earthen sites takes a long time to cause serious damage, while human-induced damage may be sudden and instantaneous, causing much greater damage to earthen sites in a short period of time. Therefore, it is reasonable that its weight is the highest in this weighting process.

Advantages of entropy weights-TOPSIS vulnerability assessment methods

To facilitate comparison between the Grey Relational Analysis (GRA) method and the entropy weight-TOPSIS (ET) method, the vulnerability assessment results obtained by Yao et al. [11] using the GRA method were also normalized and are included in Table 4. In order to compare the two methods objectively, the “amount of earthen site loss,” which is the dominant factor in the GRA method, was taken as the reference standard. The rationality of the two sets of vulnerability assessment results was measured by their correlation with the normalized “amount of earthen site loss”.

Table 4 Evaluation of the vulnerability of the Ming Great Wall in northern Shaanxi

A comparative analysis was first undertaken for the extreme points of the earthen site loss data. Both evaluation methods identified Site 28, which experienced the least loss, as the least vulnerable. However, for Sites 3 and 17, which suffered the greatest loss, the GRA method assigned scores of 0.507 and 0.382, respectively, while the ET method produced scores of 1 and 0.886, respectively. At these extreme points, the ET method’s evaluation results agree more closely with the actual earthen site loss. Consequently, the ET method’s findings are better aligned with real-world conditions and offer greater practical significance.

To further analyze the complete data set of 29 sites, this study performed a regression analysis using the earthen site loss as the reference criterion. Scatter plots were generated with the normalized earthen site loss on the x-axis and the evaluation scores from the two methods on the y-axis, as illustrated in Fig. 9. The Pearson correlation coefficient of the linear fit was computed for comparison. The Pearson coefficient ranges from −1 to 1 and measures the strength and direction of a linear relationship, with values closer to 1 signifying a stronger positive correlation. The analysis revealed a Pearson correlation coefficient of 0.859 between the ET method’s evaluation results and the earthen site loss, compared to 0.691 for the GRA method. This indicates that the ET method’s evaluation results have a higher correlation with the earthen site loss and are more reflective of the earthen sites’ vulnerability.
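For reference, the Pearson correlation coefficient used in this comparison can be computed as in the sketch below. The function name `pearson_r` and the toy numbers are illustrative assumptions; the values are not taken from Table 4.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sums of products of deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Toy data (hypothetical): reference loss values versus method scores.
loss = [1.0, 2.0, 3.0, 4.0]
scores = [1.0, 3.0, 2.0, 4.0]
print(pearson_r(loss, scores))
```

In the comparison above, each method's scores for the 29 sites would take the place of `scores`, with the normalized earthen site loss as `loss`.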

Fig. 9 Results of regression analysis: (a) regression analysis results between the “earthen site loss” and the “vulnerability results by GRA”; (b) regression analysis results between the “earthen site loss” and the “vulnerability results by ET”

After clustering, the vulnerability assessment scores based on GRA and ET were plotted and arranged according to the clustering results, as shown in Fig. 10. The box plot compares the vulnerability assessment values of the two methods by K-means category in a single figure, showing the differences between GRA and ET. The bar plots show the results of each method separately, giving the vulnerability of each earthen site.

Fig. 10 The results of combining entropy weight-TOPSIS and K-means: (a) classification box plot of GRA and ET; (b) classification bar plot of GRA; (c) classification bar plot of ET

The ultimate goal of this study is to classify the earthen sites according to their vulnerability, so that the vulnerability of sites in the same category is similar while that of sites in different categories differs markedly. In the GRA method, the boxes of Categories 1 and 2 are significantly longer, indicating that the vulnerability within the same category fluctuates greatly, and the vulnerability intervals for Categories 3 and 1 overlap. Such results fail to achieve the purpose of classifying according to vulnerability. The box size of the ET method is more uniform, showing an obvious trend of “Category 1 vulnerability < Category 2 vulnerability < Category 3 vulnerability,” which achieves the purpose of classifying according to vulnerability.

From the bar plots in Fig. 10, it can be clearly seen that the vulnerability scores of the first and third categories in the GRA method do not differ much. In the ET method, except for a few points, the vulnerability scores within the same category are relatively close, while the vulnerability scores of different categories differ greatly, thus achieving the purpose of clustering the results based on vulnerability. Additionally, according to the normalized “amount of earthen site loss” data in the original paper [11], the amounts of earthen site loss for sites 3, 9, and 17 in the third category were 0.979, 0.473, and 1, respectively, which are relatively high and indicate higher vulnerability. This is consistent with the evaluation results obtained by using the ET method. Therefore, the clustering algorithm further supports the evaluation results of the ET method as being more objective and having higher reference value.

Fig. 10 also reveals shortcomings in the ET results. In Category 1, the vulnerability scores of individual earthen sites exceed 0.222, which is closer to the range of Category 2; this is a weakness of the classification. In Category 2, the minimum value is 0.307 and Q1 is 0.308, so the lower whisker of the box plot is very short and almost coincides with the minimum line, indicating that the vulnerability scores of Category 2 sites are positively (right) skewed. This is due to the inherent variability of the field measurement data: when the data set is small, fluctuations in individual values affect the results more strongly. These problems do not affect the calculation process of the method, however, and collecting more data would alleviate them to some extent.
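The link between a short lower whisker and right skew can be illustrated with a small quartile calculation. In the sketch below, only the minimum (0.307) and Q1 (0.308) follow the values quoted above; the remaining scores are hypothetical, and the helper name `box_stats` is an assumption. Whiskers are taken as simple min-to-Q1 and Q3-to-max distances, with no outlier rule applied.

```python
from statistics import quantiles


def box_stats(values):
    """Five-number summary, whisker lengths, and Bowley (quartile) skewness."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "lower_whisker": q1 - min(values),
        "upper_whisker": max(values) - q3,
        # Positive when the upper half of the box is wider than the lower half.
        "bowley_skew": ((q3 - median) - (median - q1)) / (q3 - q1),
    }


# Hypothetical Category 2 scores; only the minimum and Q1 match the text.
scores = [0.307, 0.3075, 0.308, 0.315, 0.330, 0.350, 0.380, 0.420, 0.470]
stats = box_stats(scores)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```

A lower whisker much shorter than the upper one, together with a positive quartile skewness, is the numerical counterpart of the right-skewed box described for Category 2.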

Conclusions

In this paper, earthen site erosion data are used to obtain objective vulnerability grades with the ET method, and K-means clustering is combined with it to further classify the vulnerability results. The conclusions are as follows:

  1. The weights produced by the entropy weighting method are data-driven, which overcomes the greater subjectivity of manual weighting, so that the evaluation process is not affected by subjective factors.

  2. This paper proposes combining the clustering results of the K-means algorithm with the comprehensive evaluation results of the ET method to further classify the vulnerability of each earthen site, providing a more reasonable method for decision-makers in the field of cultural heritage preservation.

  3. In the evaluation case, the Pearson correlation coefficient between the vulnerability evaluation results obtained by the GRA method and the amount of earthen site loss was 0.691, while the Pearson correlation coefficient between the vulnerability evaluation results obtained by the ET method and the amount of earthen site loss was 0.859. This shows that the ET method more accurately reflects the vulnerability of the earthen sites.

  4. This study uses K-means clustering to verify and analyze the comprehensive evaluation results. The results show that the evaluation results of the ET method are more in line with the characteristics of “similar vulnerability within the same category and significant differences in vulnerability between different categories.” The final classification results of 29 earthen sites of the Ming Great Wall in northern Shaanxi are provided.

  5. The case evaluation results indicate that sites 3, 9, and 17 have higher vulnerability. When resources are limited, protective measures should be taken for these three earthen sites first.
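The scoring and validation steps summarized in these conclusions can be sketched in a few lines. The Python example below follows the commonly used textbook form of entropy weighting and TOPSIS rather than the paper's exact equations, which are not reproduced in this section. It assumes that the indicator matrix is already normalized to [0, 1] and that larger values mean greater vulnerability. The function names, the five-site matrix, and the loss values are all hypothetical.

```python
import math


def entropy_weights(matrix):
    """Entropy weights for a non-negative (sites x indicators) matrix."""
    n = len(matrix)
    divergences = []
    for col in zip(*matrix):
        total = sum(col)  # assumed non-zero for every indicator
        shares = [x / total for x in col]
        # 0 * ln(0) is taken as 0 by convention.
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(n)
        divergences.append(1.0 - entropy)
    total_div = sum(divergences)
    return [d / total_div for d in divergences]


def topsis_scores(matrix, weights):
    """Relative closeness to the ideal point; larger means more vulnerable here."""
    weighted = [[w * x for w, x in zip(weights, row)] for row in matrix]
    columns = list(zip(*weighted))
    best = [max(col) for col in columns]
    worst = [min(col) for col in columns]
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores


def pearson(x, y):
    """Pearson correlation coefficient r (not R squared)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical normalized indicators (rows: sites, columns: indicators).
matrix = [
    [0.10, 0.20, 0.15],
    [0.30, 0.25, 0.40],
    [0.55, 0.60, 0.50],
    [0.80, 0.70, 0.90],
    [1.00, 0.95, 1.00],
]
loss = [0.05, 0.30, 0.50, 0.85, 1.00]  # hypothetical normalized loss amounts

weights = entropy_weights(matrix)
scores = topsis_scores(matrix, weights)
r = pearson(scores, loss)
print([round(w, 3) for w in weights])
print([round(s, 3) for s in scores])
print(round(r, 3))
```

Because the weights are computed from the dispersion of each indicator alone, no expert judgment enters the calculation, which is the property highlighted in the first conclusion. The correlation with an independent loss measure plays the role of the validation standard used in the third.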

Data availability

The data used to support the findings of this study are included within the article.

References

  1. Sun M, Chen Y, Shen Y, et al. New progress and prospects in research on earthen site deterioration. Dunhuang Res. 2022;02:136–48.

  2. Liu T, Zhao X, Liu J, et al. Plant-induced diseases at an earthen site, using the Epang palace site as an example. Sci Conserv Archaeol. 2019;31(1):105–10.

  3. Guo Q, Wang Y, Chen W, et al. Key issues and research progress on the deterioration processes and protection technology of earthen sites under multi-field coupling. Coatings. 2022;12(11):1677.

  4. Zhang B, Wei G, Yang F, et al. Challenges and future trends in conservation material research for immovable objects of cultural heritage. Sci Conserv Archaeol. 2010;22(4):102–9.

  5. Wang X. Exploration of conservation philosophy for earthen sites in humid environments and an outlook on future conservation technology. Dunhuang Res. 2013;137:1–6+125.

  6. Sun M. Research status and development of the conservation of earthen sites. Sci Conserv Archaeol. 2007;4:64–70.

  7. Lu J, Zhao H. Research on characteristics of diseases of Kizilgaha beacon tower based on quantitative analysis. Sci Conserv Archaeol. 2021;33(1):103–9.

  8. Sun M. Research on the evaluation system of soil site diseases. Sci Conserv Archaeol. 2012;24(3):27–32.

  9. Du Y, Chen W, Cui K, et al. Study on damage assessment of earthen sites of the Ming Great Wall in Qinghai Province based on fuzzy-AHP and AHP-TOPSIS. Int J Architect Herit. 2020;14(6):903–16.

  10. Yang Y. Evaluation of weighting methods in multi-indicator comprehensive evaluation. Statistics Decis. 2006;13:17–9.

  11. Yao X, Sun M. The quantitative evaluation of deterioration degrees of earthen sites based on gray correlation analysis. Dunhuang Res. 2016;1:128–34.

  12. Lei H. Disease Development Characteristics and Risk Assessment of the Piers in Jiayuguan. PhD Dissertation, Lanzhou University, Lanzhou, China. 2020.

  13. Du Y. Military Defense System and Vulnerability Assessment of Earthen Sites of the Ming Great Wall in Qinghai Province. Doctoral Dissertation, Lanzhou University, Lanzhou, China. 2019.

  14. Guo Z, Chen W, Zhang J, et al. Hazard assessment of potentially dangerous bodies within a cliff based on the fuzzy-AHP method: a case study of the Mogao Grottoes, China. Bull Eng Geol Env. 2017;76(3):1009–20.

  15. Zhang L, Wang Y, Zhang J, et al. Rockfall hazard assessment of the slope of Mogao Grottoes, China based on AHP, F-AHP and AHP-TOPSIS. Environ Earth Sci. 2022;81(14):1–16.

  16. Forti L, Brandolini F, Oselini V, et al. Geomorphological assessment of the preservation of archaeological tell sites. Sci Rep. 2023;13(1):7683.

  17. Ames CJH, Chambers S, Shaw M, et al. Evaluating erosional impacts on open-air archaeological sites along the Doring River, South Africa: methods and implications for research prioritization. Archaeol Anthropol Sci. 2020;12(5):103.

  18. Polykretis C, Alexakis DD, Grillakis MG, et al. Assessment of water-induced soil erosion as a threat to cultural heritage sites: the case of Chania prefecture, Crete Island, Greece. Big Earth Data. 2022;6(4):561–79.

  19. Goyal P, Pandey S, Jain K. Deep Learning for Natural Language Processing. Berlin: Springer; 2018.

  20. Wu Q, Liu Y, Li Q, et al. The Application of Deep Learning in Computer Vision. In Proceedings of the 2017 Chinese Automation Congress (CAC). 2017; 6522–7.

  21. Liu G, Niu Y, Zhao W, et al. Data anomaly detection for structural health monitoring using a combination network of GANomaly and CNN. Smart Struct Syst. 2022;39(1):195–206.

  22. Wang N, Zhao X, Wang L, et al. Novel system for rapid investigation and damage detection in cultural heritage conservation based on deep learning. J Infrastruct Syst. 2019;25(3):04019020.

  23. Du Y, Chen W, Cui K, et al. Damage assessment of earthen sites of the Ming Great Wall in Qinghai province: a comparison between support vector machine (SVM) and BP neural network. J Comput Cult Herit. 2020;13(2):1–18.

  24. Zhu Y, Tian D, Yan F. Effectiveness of entropy weight method in decision-making. Math Probl Eng. 2020; 3564835.

  25. Shannon CE. A mathematical theory of communication. Bell Syst Tech J. 1948;27(3):379–423.

  26. Luo Y, Li Y. Comprehensive decision-making of transmission network planning based on entropy weight and grey relational analysis. Power Syst Technol. 2013;37(1):77–81.

  27. Zhang B, Zhou G. Bridge safety assessment based on entropy weight method fusion of multi-source data. Bull Sci Technol. 2023;39(1):91–5.

  28. Wen Z. Study on AHP-EWM coupling model evaluation of heterogeneity in deformed coal reservoirs: taking Panguan syncline as an example. PhD Dissertation, China University of Mining and Technology, Xuzhou, China. 2023.

  29. Teng W, Zhang Q. Research on risk control of logistics supply chain finance based on entropy right method. Times of Economy & Trade 2023;10:60–4.

  30. Hwang CL, Yoon K. Methods for Multiple Attribute Decision Making. In: Multiple Attribute Decision Making. Lecture Notes in Economics and Mathematical Systems, vol 186. Berlin, Heidelberg: Springer; 1981.

  31. Yang Y. Mechanistic study on the improvement of farmland fertility and typical crop yield in saline-alkali soil after the project of gully land consolidation. PhD Dissertation, Xi’an University of Technology, China. 2023.

  32. Feng X, Li C, Wei S, et al. Comprehensive evaluation of quality of Hemerocallis citrina Baroni from different regions based on subjective assignment combined with entropy TOPSIS method. Chin J Mod Appl Pharm. 2022;39(22):2927–34.

  33. Wang F, Liu Z. Optimization method of distributed k-means algorithm based on spark. Comput Eng Design. 2019;40(6):1595–600.

  34. MacQueen J. Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. 1967;1(14):281–97.

  35. Yang J, Zhao C. Survey on K-means clustering algorithm. Comput Eng Appl. 2019;55(23):7–14+63.

  36. Bagherzadeh F, Shafighfard T, Khan RMA, et al. Prediction of maximum tensile stress in plain-weave composite laminates with interacting holes via stacked machine learning algorithms: a comparative study. Mech Syst Signal Process. 2023;195:110315.

  37. Mileto C, López-Manzanares FV, Crespo LV, et al. The influence of geographical factors in traditional earthen architecture: the case of the Iberian Peninsula. Sustainability. 2019;11(8):2369.

Acknowledgements

Thanks to Chunlei Zhang, Weidong Zhang, and Jianxiong Liu for their guidance on this article’s software technology.

Funding

This research was funded by the National Key Research and Development Program of China (No. 2019YFC1520500), and the National Natural Science Foundation of China (No. 51808246, 52078373).

Author information

Contributions

Conceptualization, N.P. and J.H.; methodology, Y.Z. (Ye Zhu) and C.Z.; software, C.Z.; validation, Y.Z. (Ye Zhu) and Y.Z. (Yue Zhang); formal analysis, C.Z., B.S. and F.W.; investigation, B.S. and F.W.; resources, B.S., F.W. and T.W.; data curation, C.Z., Y.Z. (Ye Zhu) and Y.Z. (Yue Zhang); writing—original draft preparation, N.P. and C.Z.; writing—review and editing, Y.Z. (Ye Zhu) and J.H.; visualization, Y.Z. (Yue Zhang) and C.Z.; supervision, N.P. and J.H.; funding acquisition, N.P. and J.H. All authors have read and agreed to the published version of the manuscript.

Corresponding authors

Correspondence to Ye Zhu or Jizhong Huang.

Ethics declarations

Competing interests

The authors declare no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Peng, N., Zhang, C., Zhu, Y. et al. A vulnerability evaluation method of earthen sites based on entropy weight-TOPSIS and K-means clustering. Herit Sci 12, 161 (2024). https://doi.org/10.1186/s40494-024-01273-7

Keywords