A vulnerability evaluation method of earthen sites based on entropy weight-TOPSIS and K-means clustering

Peng, Ningbo; Zhang, Chaokai; Zhu, Ye; Zhang, Yue; Sun, Bo; Wang, Fengrui; Huang, Jizhong; Wu, Tong

doi:10.1186/s40494-024-01273-7

Research
Open access
Published: 20 May 2024

Ningbo Peng^1,2,3,4,
Chaokai Zhang^2,3,
Ye Zhu^2,5,
Yue Zhang^1,4,
Bo Sun^2,5,6,
Fengrui Wang^2,5,6,
Jizhong Huang^1,4 &
…
Tong Wu⁷

Heritage Science volume 12, Article number: 161 (2024)

Abstract

The degradation of earthen sites due to natural and human factors has become a pressing issue that calls for urgent protection measures. Accurate assessment of the vulnerability of earthen sites is therefore essential for developing effective conservation strategies. In this study, a comprehensive evaluation framework that incorporates multiple indicators is proposed. In particular, the entropy weight-TOPSIS (Technique for Order Preference by Similarity to an Ideal Solution) method is employed for quantitative vulnerability assessment and is combined with K-means clustering to define vulnerability levels for earthen sites. To validate the proposed approach, the vulnerability of 29 sections of the Ming Great Wall is evaluated, and the 29 earthen sites are categorized into three levels (high, medium, and low) according to their degree of vulnerability. The results of grey relational analysis and the entropy weight-TOPSIS method are compared using the ontology missing amount (the amount of earthen site loss) in the original data as the validation standard. The Pearson correlation coefficient between the entropy weight-TOPSIS results and the ontology missing amount was 0.859, whereas that between the grey relational analysis results and the ontology missing amount was 0.691, so the results of the entropy weight-TOPSIS method more accurately reflect the actual vulnerability of earthen sites.

Introduction

Among the eight batches of national key cultural relic protection units announced in China, a total of 1251 earthen sites have been identified [1]. Earthen sites refer to historical sites left by ancient people, in which soil was used as the main building material for various practical activities [2]. These sites carry significant historical information, cultural significance, and high artistic and scientific value [3]. However, these earthen sites in open-air environments face serious threats and resultant damage from natural and human activities [4]. Widespread issues such as collapse, erosion, and cracks have emerged, requiring urgent protection. Earthen sites, influenced by the physical and chemical properties of soil materials, are among the most challenging objects to protect as cultural relics. Furthermore, as immovable cultural relics, earthen sites can only be protected at their original locations, further increasing the difficulty of protection [5, 6].

With the advancement of cultural relic protection in China, the concept of protection has shifted from rescue to preventative protection. Numerous studies have emphasized the importance of accurately assessing the current condition of earthen sites, efficiently allocating resources, and implementing appropriate protection measures. Lu et al. [7] point out that the quantitative analysis of the development characteristics of damages in earthen sites is one of the difficulties in cultural relics protection, but there is currently little research on this aspect. Sun et al. [8] note that the current funding for the protection of earthen sites is still limited, and it is necessary to plan conservation work based on the vulnerability of the sites. Du et al. [9] point out that assessing the damage to earthen sites is a prerequisite for further protection planning and implementation, but there is currently a lack of related research. Vulnerability serves as an indicator to evaluate the status of earthen sites, representing their resistance to external factors. Higher vulnerability values indicate a greater likelihood of damage, highlighting the need for strengthened protection measures. Therefore, evaluating the vulnerability of earthen sites is of utmost importance.

While the vulnerability assessment of earthen sites relies on damage indicators, a single indicator cannot fully capture the overall vulnerability. To enhance the value of the vulnerability assessment results, the use of multiple damage indicators is necessary. According to the different weights assigned to the indicators, multi-indicator comprehensive evaluation methods can be classified into three categories: subjective weight determination, objective weight determination, and a combination of these two weight determination approaches [10]. Subjective weight determination relies on expert opinions, which may vary due to different experiences, potentially leading to bias and limitations, making it difficult to obtain objective evaluation results. The most commonly used algorithm involving subjective weight determination is the Analytic Hierarchy Process (AHP). To overcome the limitations of subjective weight determination, scholars prefer to use a combination weight determination or objective weight determination approach.

The grey relational analysis method is a representative combination weight determination method. Yao et al. [11] have used this method to establish a quantitative evaluation system for damage to earthen sites of the Ming Great Wall in northern Shaanxi. With “amount of earthen site loss” as the dominant factor and undercutting, gully, crack, scaling off, biological destruction, and artificial destruction as the evaluation indices, the correlations between the evaluation indices and the dominant factor were obtained. Although the process considers the objective correlation degree, weight determination for the indices still relied on subjective judgment. Some scholars have also improved upon subjective methods. Lei et al. [12] took the undercutting, collapse, crack, and gully evaluation indices and combined the Data Envelopment Analysis (DEA) and AHP methods to establish a vulnerability evaluation system for the Jiayuguan pier site. The AHP–DEA method incorporates objective data from DEA and subjective judgment from AHP experts to determine the indicator weights, making it more suitable for weighting qualitative and quantitative indicators when evaluating the damage vulnerability of earthen sites in practice. Du [13] has proposed a three-layer structure evaluation system for the Ming Great Wall in Qinghai Province, utilizing the Fuzzy Analytic Hierarchy Process (FAHP)-TOPSIS method to assess its vulnerability. The FAHP approach employed a triangular fuzzy membership function to determine the importance of indicators, reducing the subjectivity of AHP. Guo [14] has conducted a risk assessment of potential hazards within the Mogao Grottoes cliffs using the FAHP method, with results indicating that the FAHP approach yields higher precision in hazard evaluation than the AHP method. Zhang [15] has employed the AHP, FAHP, and AHP-TOPSIS methods to assess the risk of rockfall disasters at the Mogao Grottoes slopes. The findings suggested that FAHP was the most suitable method for evaluating rockfall hazards in grottoes. However, these methods still rely heavily on expert opinion, which introduces certain limitations. To address this issue, this paper introduces an entropy weight-TOPSIS method that relies solely on objective weight determination. The entropy weight method calculates the weight for each indicator using actual damage data and, in combination with TOPSIS, provides a more objective evaluation of the vulnerability of earthen sites.

For soil vulnerability, empirical equations based on soil properties are widely used internationally. For Tell Helawa and Tell Aliawa in the Kurdistan Region of Iraq, Forti et al. [16] considered factors such as rainfall erosivity, soil erodibility, slope length, and steepness and applied the revised universal soil loss equation (RUSLE) to simulate soil erosion, providing a potential tool for identifying geomorphic risks to archaeological sites. Similarly, Ames et al. [17] used the RUSLE to quantify the potential for future sediment loss and assess the erosional impacts on open-air archaeological sites along the Dorin River in South Africa. Polykretis et al. [18] considered soil erodibility, rainfall erosivity, and other factors to build a unit stream power erosion and deposition (USPED) model, with which the archaeological sites most vulnerable to erosion hazards in the state of Chania were identified. These empirical equations evaluate vulnerability from the properties of the soil itself, but their disadvantage is that the parameters in the equations are difficult to determine. This paper instead starts from the outcomes, evaluating the vulnerability of earthen sites in a reasonable and feasible way from the results of damage investigations.

Outcome-based comprehensive assessments, however, face their own problems when a vulnerability assessment system for earthen sites is developed, such as the difficulty of selecting evaluation indicators and the complexity of the relationships among them. Artificial intelligence and machine learning technologies have developed rapidly and have been widely applied in various fields, including natural language processing [19], computer vision [20], and biomedical research [21]. Some cultural heritage conservation workers [22] have started to use machine learning to assist in the vulnerability assessment of earthen sites. For example, Du et al. [23] have applied Support Vector Machines (SVM) and BP neural networks in a vulnerability assessment system for the Ming Great Wall in Qinghai, using the vulnerability obtained from comprehensive assessment as the target value. The pattern learned by their algorithm is a linear weighted relationship, which is used to replace the multi-indicator comprehensive evaluation method for vulnerability assessment. This paper adopts the entropy weight-TOPSIS method to quantitatively evaluate the vulnerability of earthen sites. Furthermore, the K-means clustering algorithm is introduced to verify the quantitative evaluation results and to divide the sites into vulnerability levels. This provides accurate theoretical guidance for efficient cultural heritage conservation work.

Vulnerability rating evaluation system

The vulnerability rating evaluation system proposed in this paper is shown in Fig. 1. Standardized measurement data from actual earthen sites are used as the evaluation indicators. The system adopts a multi-indicator comprehensive evaluation method (i.e., the entropy weight-TOPSIS method) to score the vulnerability of each site and then uses the K-means algorithm to cluster the earthen sites.

Multi-indicator comprehensive evaluation methods integrate multiple aspects and features to achieve a more comprehensive assessment of the evaluation object. Due to the diverse types of damage and complex degradation mechanisms of earthen sites, there are numerous factors that influence the vulnerability assessment. This comprehensive evaluation considers multiple types of damage, leading to a more thorough assessment of the overall vulnerability of an earthen site. The hierarchical structure of the multi-indicator comprehensive evaluation method is shown in Fig. 2. In this case, the Roman numerals represent the hierarchical level, while the Arabic numerals represent different indicators at the same level. For example, II 2 represents the second indicator at the second level, and III 2.1 represents the first indicator at the third level, which belongs to II 2.

Typical damage types at earthen sites include crack, gully, collapse, undercutting, scaling off, biological damage, and human-induced damage, as shown in Fig. 3. These types of damage are used as indicator layers in the vulnerability assessment process for earthen sites; in particular, the data related to these types of damage are used as the data layer to obtain a comprehensive evaluation system for the vulnerability of earthen sites (see Fig. 4). This allows for assessment of the overall vulnerability of earthen sites.

Entropy weight-TOPSIS method

Entropy weight method

With the promotion, application, and in-depth research of entropy theory in various sciences, the concept of entropy has been further developed throughout the mid-twentieth century [24]. In 1948, Shannon proposed the corresponding mathematical expression of entropy, which quantitatively describes the “uncertainty” of data [25]. In recent years, many studies [26, 27] have successfully applied information entropy to multi-criteria comprehensive evaluation and described this “uncertainty” as the “degree of variation”. The idea is that the smaller the calculated information entropy, the greater the degree of variation in the data, the greater the amount of information provided, and the greater the role it plays in comprehensive evaluation, thus the greater the weight. Therefore, the “uncertainty” or “degree of variation” of the data can be used as a basis for weighting the indicators in the entropy weight method.

Assuming there are m objects to be evaluated, each of which has n evaluation indicators, the calculation steps are as follows [28, 29]:

First, establish the evaluation matrix ${X}_{mn}$

$${X}_{mn}=\left[\begin{array}{cccc}{x}_{11}& {x}_{12}& \cdots & {x}_{1n}\\ {x}_{21}& {x}_{22}& \cdots & {x}_{2n}\\ \vdots & \vdots & \ddots & \vdots \\ {x}_{m1}& {x}_{m2}& \cdots & {x}_{mn}\end{array}\right]$$

(1)

Second, perform data standardization. The original data for each indicator may have different dimensions, making them difficult to compare and analyze. Therefore, it is necessary to standardize the data. Positive and negative indicators are normalized differently as shown in Eq. (2).

$${y}_{ij}=\left\{\begin{array}{ll}\frac{{x}_{ij}-{x}_{min}}{{x}_{max}-{x}_{min}}, & \text{if column } j \text{ is a positive indicator}\\ \frac{{x}_{max}-{x}_{ij}}{{x}_{max}-{x}_{min}}, & \text{if column } j \text{ is a negative indicator}\end{array}\right.\quad i=1, 2, \dots , m; j=1, 2,\dots ,n$$

(2)

where ${x}_{ij}$ is the value in the ith row and jth column, representing the jth evaluation indicator of the ith object; and ${x}_{max}$ and ${x}_{min}$ represent the maximum and minimum values in the jth column, respectively.

After obtaining ${y}_{ij}$ (dimensionless data), further normalization is performed using Eq. (3) to obtain ${p}_{ij}$, the feature weight of the ith evaluation object under the jth indicator:

$${p}_{ij}=\frac{{y}_{ij}}{\sum_{i=1}^{m}{y}_{ij}}; i=1, 2,\dots ,m; j=1, 2, \dots ,n$$

(3)

Third, after obtaining the feature weights, Eq. (4) is used to calculate the information entropy (${E}_{j}$) of each indicator, where ${p}_{ij}\mathit{ln}\left({p}_{ij}\right)=0$ is taken when ${p}_{ij}$ is equal to 0 [27].

$${E}_{j}=-\frac{1}{\mathit{ln}\left(m\right)}\sum_{i=1}^{m}{p}_{ij}\mathit{ln}\left({p}_{ij}\right); i=1, 2,\dots ,m; j=1, 2,\dots ,n$$

(4)

Finally, the weight vector $W=\left[{w}_{1}, {w}_{2}, \dots , {w}_{n}\right]$ is calculated from the entropy values of the indicators, where ${w}_{j}$ is the weight of the jth indicator, as shown in Eq. (5):

$${w}_{j}=\frac{1-{E}_{j}}{\sum_{k=1}^{n}\left(1-{E}_{k}\right)}; j=1, 2,\dots ,n$$

(5)
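The steps in Eqs. (2) to (5) can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the authors' implementation; the function name `entropy_weights`, the `positive` flag list, and the small input matrix are hypothetical and chosen only for demonstration. It assumes that no indicator column is constant, since Eq. (2) is undefined when ${x}_{max}={x}_{min}$.

```python
import math


def entropy_weights(X, positive=None):
    """Entropy weights for an m x n matrix X (rows: objects, columns: indicators).

    `positive[j]` marks column j as a positive indicator; by default every
    column is treated as positive, as in the case study of this paper.
    """
    m, n = len(X), len(X[0])
    if positive is None:
        positive = [True] * n

    entropies = []
    for j in range(n):
        col = [row[j] for row in X]
        lo, hi = min(col), max(col)
        if hi == lo:
            raise ValueError(f"column {j} is constant; Eq. (2) is undefined")
        # Eq. (2): min-max standardization, direction depends on indicator type
        if positive[j]:
            y = [(v - lo) / (hi - lo) for v in col]
        else:
            y = [(hi - v) / (hi - lo) for v in col]
        # Eq. (3): share of each object under indicator j
        total = sum(y)
        p = [v / total for v in y]
        # Eq. (4): information entropy, taking p*ln(p) = 0 when p = 0
        e = -sum(v * math.log(v) for v in p if v > 0) / math.log(m)
        entropies.append(e)

    # Eq. (5): weights proportional to 1 - E_j
    diversity = [1.0 - e for e in entropies]
    s = sum(diversity)
    return [d / s for d in diversity]


# Illustrative (made-up) data: 3 objects, 2 indicators
weights = entropy_weights([[1.0, 2.0], [2.0, 4.0], [3.0, 9.0]])
print(weights)
```

In this toy matrix the second column is spread less evenly than the first, so it has lower entropy and receives the larger weight, which is the behaviour described in the text.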

TOPSIS method

The TOPSIS method—also known as the method of distance to an ideal solution—is a classic indicator-based decision-making method first proposed by Hwang and Yoon in 1981 [30]. The basic principle is to identify the best and worst solutions among the limited alternatives from the normalized original matrix. The relative closeness is then calculated based on the distance between each objective and the best and worst solutions, serving as the comprehensive evaluation result for assessing the superiority or inferiority of the research objective. The TOPSIS method can effectively utilize the original data information and accurately reflect the distances between evaluation objects, whether it is for small or large sample data. The calculation process is detailed in the following [31, 32].

Step 1: Data Standardization

In the data processing step of the TOPSIS method, it is necessary to distinguish between positive indicators and negative indicators. Negative indicators must first be converted into positive indicators, as shown in Eq. (6). For normalization, the TOPSIS method uses squared sum normalization (SSN), as shown in Eq. (7).

$${x}_{ij}^{*}={x}_{max}-{x}_{ij}; i=1, 2,\dots ,m; j=1, 2,\dots ,n$$

(6)

$${z}_{ij}=\left\{\begin{array}{ll}\frac{{x}_{ij}}{\sqrt{\sum_{k=1}^{m}{x}_{kj}^{2}}}, & \text{if column } j \text{ is a positive indicator}\\ \frac{{x}_{ij}^{*}}{\sqrt{\sum_{k=1}^{m}{\left({x}_{kj}^{*}\right)}^{2}}}, & \text{if column } j \text{ is a negative indicator}\end{array}\right.\quad i=1, 2,\dots ,m; j=1, 2,\dots ,n$$

(7)

where ${x}_{ij}^{*}$ represents the value of a negative indicator after the positive processing, ${x}_{max}$ is the maximum value of the indicator in column j, and ${z}_{ij}$ represents the values after normalization.

Step 2: Constructing the Ideal Solution Vectors

The combination of the maximum values in each column forms the positive ideal solution vector ${z}^{+}$, while the combination of the minimum values in each column forms the negative (worst) ideal solution vector ${z}^{-}$, as shown in Eq. (8).

$${z}^{+}=\left({z}_{1}^{+},{z}_{2}^{+},\dots ,{z}_{n}^{+}\right)=\left\{\max\left({w}_{j}{z}_{ij}\right)\mid i=1,2,\dots ,m\right\}$$

$${z}^{-}=\left({z}_{1}^{-},{z}_{2}^{-},\dots ,{z}_{n}^{-}\right)=\left\{\min\left({w}_{j}{z}_{ij}\right)\mid i=1,2,\dots ,m\right\}$$

(8)

Step 3: Calculating the distances to the positive and negative ideal solutions

Using Eq. (9), together with the indicator weights obtained from the entropy weighting method, the distance between each assessment object and the positive and negative ideal solutions is calculated.

$$\left\{\begin{array}{l}{D}_{i}^{+}=\sqrt{\sum_{j=1}^{n}{w}_{j}{\left({z}_{j}^{+}-{z}_{ij}\right)}^{2}}\\ {D}_{i}^{-}=\sqrt{\sum_{j=1}^{n}{w}_{j}{\left({z}_{j}^{-}-{z}_{ij}\right)}^{2}}\end{array}\right.\quad i=1, 2,\dots ,m$$

(9)

Step 4: Calculating the relative closeness

The relative closeness measure ${C}_{i}$ can be calculated using Eq. (10).

$${C}_{i}=\frac{{D}_{i}^{-}}{{D}_{i}^{-}+{D}_{i}^{+}}; i=1, 2,\dots ,m$$

(10)
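Steps 1 to 4 can be sketched as follows. This is a hedged, standard-library illustration rather than the authors' code; the function name `topsis_closeness` and the input data are hypothetical. It assumes that all indicators are positive (as in the case study below), that the ideal points are taken as the column-wise maximum and minimum of the normalized values (a positive weight does not change which row attains them), and that each weight enters the distance once, as ${w}_{j}{({z}_{j}^{\pm }-{z}_{ij})}^{2}$. Other weighting conventions exist and would give different scores.

```python
import math


def topsis_closeness(X, weights):
    """Relative closeness C_i for an m x n matrix X of positive indicators."""
    m, n = len(X), len(X[0])

    # Eq. (7): squared-sum normalization of each column
    norms = [math.sqrt(sum(X[i][j] ** 2 for i in range(m))) for j in range(n)]
    Z = [[X[i][j] / norms[j] for j in range(n)] for i in range(m)]

    # Eq. (8): column-wise best and worst values (assumption: unweighted extremes)
    z_best = [max(Z[i][j] for i in range(m)) for j in range(n)]
    z_worst = [min(Z[i][j] for i in range(m)) for j in range(n)]

    scores = []
    for i in range(m):
        # Eq. (9): weighted Euclidean distances to the two ideal points
        d_plus = math.sqrt(
            sum(weights[j] * (z_best[j] - Z[i][j]) ** 2 for j in range(n))
        )
        d_minus = math.sqrt(
            sum(weights[j] * (z_worst[j] - Z[i][j]) ** 2 for j in range(n))
        )
        # Eq. (10): relative closeness
        scores.append(d_minus / (d_minus + d_plus))
    return scores


# Illustrative (made-up) data: the third object is largest on both indicators
scores = topsis_closeness([[1.0, 1.0], [2.0, 3.0], [3.0, 5.0]], [0.5, 0.5])
print(scores)
```

An object that attains the maximum on every indicator has ${D}_{i}^{+}=0$ and therefore ${C}_{i}=1$, while one that attains the minimum on every indicator has ${C}_{i}=0$.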

K-means clustering algorithm

K-means clustering is a classic unsupervised learning algorithm [33], initially proposed by MacQueen in 1967 [34]. This algorithm partitions samples based on their similarity in features, grouping samples with high similarity into the same cluster. In an iterative manner, the K-means algorithm can automatically determine the coordinates of cluster centers and obtain classification results based on the obtained clusters. It has advantages such as high computational efficiency, easy interpretability, and strong operability [35]. The clustering process is shown in Fig. 5.

The value of K is a hyperparameter, representing the number of clusters, which is usually determined using the elbow method or based on specific objectives. Based on the distance between features of earthen sites, which reflects the severity of damage, the final clustering result groups earthen sites with similar damage severity into the same class. Additionally, this clustering algorithm is completely based on objective damage data and can reflect the potential evolution patterns of damages, to some extent. For example, when conducting field investigations on the topography, environment, and soil conditions of the same class of earthen sites, there may be some similar features or severe impact from the same type of damage. These underlying patterns help to deepen our understanding of the causes, evolution patterns, and prevention strategies of damages in earthen sites.
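The iterative assignment and update loop described above can be sketched in plain Python. This is only an illustrative sketch: the function names are hypothetical, and the deterministic farthest-point initialization is an assumption made here so that the example is reproducible; the paper does not state how the cluster centers were initialized.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the point farthest
    from all centers chosen so far (an assumption, not the paper's method)."""
    centers = [list(points[0])]
    while len(centers) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        centers.append(list(nxt))
    return centers


def kmeans(points, k, max_iter=100):
    """Alternate assignment and center update until labels stop changing."""
    centers = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest center
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each center moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Illustrative (made-up) 2-D data forming three well-separated groups
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (10.0, 0.0), (10.1, 0.0), (10.0, 0.1),
]
labels, centers = kmeans(points, k=3)
print(labels)
```

The cluster indices returned by such a procedure are arbitrary labels; they carry no ordering by severity, which is why the clusters are later ranked using the entropy weight-TOPSIS scores.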

Results of a case

Yao et al. [11] conducted on-site investigations of the damage status of 29 individual buildings of the Ming Great Wall in northern Shaanxi. In this assessment, larger values of the damage data indicate greater vulnerability, so all damage indicators are treated as positive indicators. Accordingly, the damage data from their investigation were standardized using Eq. (2), and the results are shown in Table 1.

Table 1 Standardized data on the damage for the Ming Great Wall in northern Shaanxi

Table 2 Weights for each indicator, calculated by using the entropy weighting method

Once the entropy weights are obtained, the comprehensive assessment score can be calculated with the TOPSIS method. In this case, the scores are calculated according to Eqs. (9) and (10), and the result is shown in Fig. 6. The positive ideal solution takes the maximum value of each damage index as its ideal point, whereas the negative ideal solution takes the minimum value of each damage index, so in Fig. 6a, points with a smaller ${D}_{i}^{+}$ and a larger ${D}_{i}^{-}$ yield a greater ${C}_{i}$ (vulnerability). Earthen sites No. 3 and No. 17 match this situation and therefore show greater vulnerability, as seen in Fig. 6b.

In this case, the aim is to use the K-means algorithm to classify the earthen sites into three categories (high, medium, and low vulnerability) based on the data of each indicator; the value of K is therefore set to 3. Because the clustering employs six indicators, a dimensionality reduction technique is needed to visualize the results. Principal Component Analysis (PCA) is an effective dimensionality reduction technique that has been widely used in the field of data analysis [36]. It transforms a high-dimensional feature space into a lower-dimensional space while preserving as much information as possible from the original data. It achieves this by calculating the covariance matrix of the data and performing an eigenvalue decomposition on this matrix. The eigenvectors with the largest eigenvalues are selected as the principal components, and the data are projected onto these components to achieve dimensionality reduction. To present the clustering results visually, PCA was used to reduce the six-dimensional data to two dimensions. The clustering results after dimensionality reduction are shown in Fig. 7.
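The PCA procedure described above (covariance matrix, leading eigenvectors, projection) can be illustrated with a small standard-library sketch. The function names are hypothetical, and power iteration with deflation is used here in place of a full eigenvalue decomposition purely to keep the example dependency-free; it is an assumption of this sketch, not the method used in the paper, and it relies on the start vector not being orthogonal to the leading eigenvector.

```python
import math


def covariance_matrix(data):
    """Center the data and return (centered rows, sample covariance matrix)."""
    m, n = len(data), len(data[0])
    means = [sum(row[j] for row in data) / m for j in range(n)]
    centered = [[row[j] - means[j] for j in range(n)] for row in data]
    cov = [
        [sum(r[a] * r[b] for r in centered) / (m - 1) for b in range(n)]
        for a in range(n)
    ]
    return centered, cov


def top_eigenvector(A, n_iter=500):
    """Leading eigenpair of a symmetric matrix by power iteration."""
    n = len(A)
    v = [1.0 / (i + 1) for i in range(n)]  # arbitrary non-symmetric start
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(n_iter):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(A[i][j] * v[j] for j in range(n)) for i in range(n))
    return eigval, v


def pca_project(data, n_components=2):
    """Project the centered data onto the leading principal components."""
    centered, cov = covariance_matrix(data)
    n = len(cov)
    components = []
    for _ in range(n_components):
        eigval, vec = top_eigenvector(cov)
        components.append(vec)
        # Deflation: remove the component just found from the matrix
        cov = [
            [cov[i][j] - eigval * vec[i] * vec[j] for j in range(n)]
            for i in range(n)
        ]
    return [[sum(r[j] * c[j] for j in range(n)) for c in components] for r in centered]


# Illustrative (made-up) 3-D data whose variance lies in the first two axes
data = [[2.0, 0.0, 0.0], [-2.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]
print(pca_project(data, n_components=2))
```

For real data, a library routine for symmetric eigenvalue decomposition would normally be preferable to power iteration.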

The straight arrows represent the projection and direction of each damage feature on the plane of the two principal components. The longer the projection of an arrow on the two coordinate axes, the greater the weight of the corresponding damage feature in the principal components. The arrow for artificial (human-induced) damage is the longest, followed by that for gully, and this weight distribution is consistent with the results obtained from the entropy weighting method. In addition, the direction of an arrow indicates whether the feature is positively or negatively correlated with each principal component.

Because clustering is an unsupervised learning method, its output alone does not indicate which cluster corresponds to which vulnerability level. The results of the entropy weight-TOPSIS method must therefore be used to determine the vulnerability level of each cluster. First, Eq. (11) is used to further normalize ${C}_{i}$ so that the vulnerability value lies in the interval 0–1, giving the normalized vulnerability value ${V}_{i}$; the box plot by category is then drawn, as shown in Fig. 8.

$${V}_{i}=\frac{{C}_{i}}{\sum_{k=1}^{m}{C}_{k}}; i=1, 2,\dots ,m$$

(11)

In Fig. 8, the upper and lower boundaries of each box represent the third quartile (upper quartile, Q3) and the first quartile (lower quartile, Q1), respectively, and the horizontal line in the box represents the median, so the box reflects the main distribution of the data. The vulnerability of Category 1 earthen sites is mainly distributed between 0.053 and 0.222, which is relatively low. The vulnerability of Category 2 earthen sites is mainly distributed between 0.308 and 0.514, which is moderate. The vulnerability of Category 3 earthen sites is mainly distributed between 0.782 and 0.943, which is high. Category 1 is therefore assigned low vulnerability, Category 2 medium vulnerability, and Category 3 high vulnerability; the final classification results and vulnerability levels are shown in Table 3. The results indicate that the vulnerabilities of sites 3, 9, and 17 are high. According to the Chinese government’s cultural relics protection policy, cultural relics with higher urgency for protection should be prioritized. Therefore, these three earthen sites are the objects of priority protection.

Table 3 Clustering results and vulnerability class assessment

Discussion

Reasonableness of weights

The weighting results reflect the degree of damage to the earthen sites by each indicator; in the comprehensive evaluation system, the greater the weight, the greater the impact on the evaluation results. It is well-known that water often has a significant impact and destructive force on soil. Mileto et al. [37] pointed out that rainfall is one of the directly related factors affecting soil structure. The occurrence of undercutting and gullies is closely related to rainwater erosion. Therefore, it is reasonable that these two types of damage have a higher weight in the weighting process. As for cracks and scaling off, these two types of damage are surface deterioration of cultural relics. For murals that record texts and images, surface damage is enough to cause loss of information, making cultural relics lose their original value and causing great harm. However, for earthen sites, the impact of shallow surface deterioration is relatively small. Therefore, it is reasonable to give them a lower weight in the weighting process. It is worth noting that the weight of “human-induced damage” is the highest. From the perspective of the time characteristics of the damage, the erosion process of earthen sites takes a long time to cause serious damage, while human-induced damage may be sudden and instantaneous, causing much greater damage to earthen sites in a short period of time. Therefore, it is reasonable that its weight is the highest in this weighting process.

Advantages of entropy weights-TOPSIS vulnerability assessment methods

To facilitate comparative analysis, the vulnerability assessment results obtained by Yao et al. [11] using the grey relational analysis method were also normalized, and are included in Table 4. In order to objectively compare the two methods, the “amount of earthen site loss,” as the dominant factor in the Grey Relational Analysis (GRA) method, was taken as the reference standard. The rationality of the two vulnerability assessment results was measured according to the correlation between the vulnerability assessment results and the normalized “amount of earthen site loss”.

Table 4 Evaluation of the vulnerability of the Ming Great Wall in northern Shaanxi

A comparative analysis was first undertaken on the extreme points of the earthen site loss data. Both evaluation methods identified Site 28, which experienced the least loss, as the least vulnerable. However, for Sites 3 and 17, which suffered the greatest loss, the GRA method assigned scores of 0.507 and 0.382, respectively, while the entropy weight-TOPSIS (ET) method produced scores of 1 and 0.886, respectively. At these extreme points, the ET method’s evaluation results therefore track the actual earthen site loss more closely. Consequently, the ET method’s findings are more aligned with real-world conditions and offer greater practical significance.

To further analyze the complete data set of 29 sites, this study performed a linear regression analysis using the earthen site loss as the reference criterion. Scatter plots were generated with the normalized earthen site loss on the x-axis and the evaluation scores from the two methods on the y-axis, as illustrated in Fig. 9. The Pearson correlation coefficient was computed for each linear fit to allow a comparative evaluation. The Pearson coefficient ranges from −1 to 1 and indicates the strength and direction of a linear correlation, with values closer to 1 signifying a stronger positive relationship. The analysis revealed a Pearson correlation coefficient of 0.859 between the ET method’s evaluation results and the earthen site loss, compared to 0.691 for the GRA method. This indicates that the ET method’s evaluation results have a higher correlation with the earthen site loss and are more reflective of the earthen sites’ vulnerability.
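For reference, the Pearson correlation coefficient used in this comparison can be computed as in the following minimal sketch. The function name `pearson_r` is hypothetical and the input values are made up for illustration; they are not the data from Table 4.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of products of deviations (numerator of the coefficient)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations (denominator)
    sx = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sy = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sx * sy)


# Made-up example: normalized loss versus two hypothetical score vectors
loss = [0.1, 0.3, 0.5, 0.9]
scores_a = [0.2, 0.35, 0.5, 0.95]   # tracks the loss closely
scores_b = [0.4, 0.2, 0.6, 0.5]     # tracks the loss loosely
print(pearson_r(loss, scores_a), pearson_r(loss, scores_b))
```

A score vector that rises with the reference quantity in a nearly linear fashion gives a coefficient close to 1, which is the criterion applied to the ET and GRA results above.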

After clustering, the vulnerability assessment scores based on GRA and ET were drawn and arranged according to the clustering results, as shown in Fig. 10. According to the K-means clustering results, the vulnerability assessment values of the two methods are compared in a single figure using a box plot to show the differences between the GRA and ET. The histograms show the results of the two methods, respectively, specifically showing the vulnerability of each earthen site.

The ultimate goal of this study is to classify the earthen sites according to their vulnerability, so that the vulnerability values of earthen sites in the same category are similar, while those of sites in different categories differ clearly. In the GRA method, the boxes of Categories 1 and 2 are significantly longer, indicating that the vulnerability values within the same category fluctuate greatly, and the vulnerability intervals for Categories 3 and 1 overlap. Such results fail to achieve the purpose of classifying according to vulnerability. The box sizes of the ET method are more uniform, showing an obvious trend of “Category 1 vulnerability < Category 2 vulnerability < Category 3 vulnerability,” which achieves the purpose of classifying according to vulnerability.

From the histogram in Fig. 10, it can be clearly seen that the vulnerability scores of the first and third categories in the GRA method do not differ much. In the ET method, except for a few points, the vulnerability scores within the same category are relatively close, while the vulnerability scores of different categories differ greatly, thus achieving the purpose of clustering the results by vulnerability. Additionally, according to the normalized “amount of earthen site loss” data in the original paper [11], the amounts of earthen site loss for sites 3, 9, and 17 in the third category were 0.979, 0.473, and 1, respectively, which are relatively high and indicate higher vulnerability. This is consistent with the evaluation results obtained by using the ET method. Therefore, the clustering algorithm provides further evidence that the evaluation results of the ET method are more objective and have higher reference value.

Fig. 10 also shows that the results of the ET method have shortcomings. In Category 1, the vulnerability scores of individual earthen sites exceed 0.222, which is closer to the range of Category 2. This is a disadvantage of the classification results. In Category 2, the minimum value is 0.307 and Q1 is 0.308, so the lower whisker of the box plot is very short and almost coincides with the minimum value line, indicating that the vulnerability of Category 2 sites is positively skewed (right-skewed). This is due to the inherent variability of the field measurement data. When the amount of data is small, the impact of individual data fluctuations on the results becomes more significant. However, these problems do not affect the calculation process of the method. If more data can be collected, these problems can be alleviated to a certain extent.

Conclusions

In this paper, earthen site erosion data are used to obtain objective vulnerability grades with the ET method, which is combined with K-means clustering to further classify the vulnerability results. The conclusions are as follows:

1. The weighted results of the entropy weighting method are data-driven, overcoming the problem of greater subjectivity in manual weighting, so that the evaluation process is not affected by subjective factors.
2. This paper proposes to combine the clustering results of the K-means algorithm with the comprehensive evaluation results of the ET method to further classify the vulnerability of each earthen site, providing a more reasonable method for decision-makers in the field of cultural heritage preservation.
3. In the evaluation case, the Pearson correlation coefficient between the vulnerability evaluation results obtained by the GRA method and the amount of earthen site loss was 0.691, while that between the vulnerability evaluation results obtained by the ET method and the amount of earthen site loss was 0.859. This shows that the ET method more accurately reflects the vulnerability of the earthen site.
4. This study uses K-means clustering to verify and analyze the comprehensive evaluation results. The results show that the evaluation results of the ET method are more in line with the characteristics of “similar vulnerability within the same category and significant differences in vulnerability between different categories.” The final classification results of 29 earthen sites of the Ming Great Wall in northern Shaanxi are provided.
5. The case evaluation results indicate that sites 3, 9, and 17 have higher vulnerability. Under the condition of limited resources, protective measures should be preferentially taken for these three earthen sites.
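The workflow summarized in these conclusions (entropy weights, TOPSIS scores, then K-means grouping) can be sketched as follows. This is a minimal textbook-style sketch, not the authors' implementation. It assumes that the indicator matrix is already normalized to [0, 1] with larger values meaning greater vulnerability, that every column has a positive sum and is not constant, and that the most vulnerable point is used as the TOPSIS reference so that a higher score means higher vulnerability. The function names, the deterministic center initialization, and the 6 x 3 example matrix are all hypothetical.

```python
import math


def entropy_weights(matrix):
    """Entropy weight per indicator (column) of a normalized matrix."""
    n = len(matrix)
    divergences = []
    for col in zip(*matrix):
        total = sum(col)
        # p * ln(p) is treated as 0 when p == 0.
        entropy = -sum((v / total) * math.log(v / total)
                       for v in col if v > 0) / math.log(n)
        divergences.append(1.0 - entropy)
    s = sum(divergences)
    return [d / s for d in divergences]


def topsis_scores(matrix, weights):
    """Relative closeness of each row to the most vulnerable reference point."""
    weighted = [[w * v for w, v in zip(weights, row)] for row in matrix]
    cols = list(zip(*weighted))
    high = [max(c) for c in cols]  # assumed reference: most vulnerable
    low = [min(c) for c in cols]   # assumed reference: least vulnerable
    scores = []
    for row in weighted:
        d_high = math.dist(row, high)
        d_low = math.dist(row, low)
        scores.append(d_low / (d_high + d_low))
    return scores


def kmeans_1d(values, k=3, iterations=100):
    """Plain Lloyd iterations on one-dimensional scores."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted scores.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: abs(v - centers[c]))
                  for v in values]
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


# Hypothetical normalized indicators: 6 sites (rows) x 3 indicators (columns).
matrix = [
    [0.05, 0.10, 0.00],
    [0.10, 0.05, 0.10],
    [0.45, 0.50, 0.40],
    [0.50, 0.40, 0.55],
    [0.95, 1.00, 0.90],
    [1.00, 0.90, 1.00],
]

weights = entropy_weights(matrix)
scores = topsis_scores(matrix, weights)
labels, centers = kmeans_1d(scores, k=3)
print(weights, scores, labels)
```

Because the weights come only from the dispersion of each indicator column, no manual weighting enters the score. The clustering step then turns the continuous scores into the low, medium, and high levels used for prioritizing protection.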

Data availability

The data used to support the findings of this study are included within the article.

References

1. Sun M, Chen Y, Shen Y, et al. New progress and prospects in research on earthen site deterioration. Dunhuang Res. 2022;02:136–48.
2. Liu T, Zhao X, Liu J, et al. Plant-induced diseases at an earthen site, using the Epang palace site as an example. Sci Conserv Archaeol. 2019;31(1):105–10.
3. Guo Q, Wang Y, Chen W, et al. Key issues and research progress on the deterioration processes and protection technology of earthen sites under multi-field coupling. Coatings. 2022;12(11):1677.
4. Zhang B, Wei G, Yang F, et al. Challenges and future trends in conservation material research for immovable objects of cultural heritage. Sci Conserv Archaeol. 2010;22(4):102–9.
5. Wang X. Exploration of conservation philosophy for earthen sites in humid environments and an outlook on future conservation technology. Dunhuang Res. 2013;137:1–6+125.
6. Sun M. Research status and development of the conservation of earthen sites. Sci Conserv Archaeol. 2007;4:64–70.
7. Lu J, Zhao H. Research on characteristics of diseases of Kizilgaha beacon tower based on quantitative analysis. Sci Conserv Archaeol. 2021;33(1):103–9.
8. Sun M. Research on the evaluation system of soil site diseases. Sci Conserv Archaeol. 2012;24(3):27–32.
9. Du Y, Chen W, Cui K, et al. Study on damage assessment of earthen sites of the Ming Great Wall in Qinghai Province based on fuzzy-AHP and AHP-TOPSIS. Int J Architect Herit. 2020;14(6):903–16.
10. Yang Y. Evaluation of weighting methods in multi-indicator comprehensive evaluation. Statistics Decis. 2006;13:17–9.
11. Yao X, Sun M. The quantitative evaluation of deterioration degrees of earthen sites based on gray correlation analysis. Dunhuang Res. 2016;1:128–34.
12. Lei H. Disease Development Characteristics and Risk Assessment of the Piers in Jiayuguan. PhD Dissertation, Lanzhou University, Lanzhou, China. 2020.
13. Du Y. Military Defense System and Vulnerability Assessment of Earthen Sites of the Ming Great Wall in Qinghai Province. Doctoral Dissertation, Lanzhou University, Lanzhou, China. 2019.
14. Guo Z, Chen W, Zhang J, et al. Hazard assessment of potentially dangerous bodies within a cliff based on the fuzzy-AHP method: a case study of the Mogao Grottoes, China. Bull Eng Geol Env. 2017;76(3):1009–20.
15. Zhang L, Wang Y, Zhang J, et al. Rockfall hazard assessment of the slope of Mogao Grottoes, China based on AHP, F-AHP and AHP-TOPSIS. Environ Earth Sci. 2022;81(14):1–16.
16. Forti L, Brandolini F, Oselini V, et al. Geomorphological assessment of the preservation of archaeological tell sites. Sci Rep. 2023;13(1):7683.
17. Ames CJH, Chambers S, Shaw M, et al. Evaluating erosional impacts on open-air archaeological sites along the Doring River, South Africa: methods and implications for research prioritization. Archaeol Anthropol Sci. 2020;12(5):103.
18. Polykretis C, Alexakis DD, Grillakis MG, et al. Assessment of water-induced soil erosion as a threat to cultural heritage sites: the case of Chania prefecture, Crete Island, Greece. Big Earth Data. 2022;6(4):561–79.
19. Goyal P, Pandey S, Jain K. Deep Learning for Natural Language Processing. Berlin: Springer; 2018.
20. Wu Q, Liu Y, Li Q, et al. The Application of Deep Learning in Computer Vision. In Proceedings of the 2017 Chinese Automation Congress (CAC). 2017; 6522–7.
21. Liu G, Niu Y, Zhao W, et al. Data anomaly detection for structural health monitoring using a combination network of GANomaly and CNN. Smart Struct Syst. 2022;39(1):195–206.
22. Wang N, Zhao X, Wang L, et al. Novel system for rapid investigation and damage detection in cultural heritage conservation based on deep learning. J Infrastruct Syst. 2019;25(3):04019020.
23. Du Y, Chen W, Cui K, et al. Damage assessment of earthen sites of the Ming Great Wall in Qinghai province: a comparison between support vector machine (SVM) and BP neural network. J Comput Cult Herit. 2020;13(2):1–18.
24. Zhu Y, Tian D, Yan F. Effectiveness of entropy weight method in decision-making. Math Probl Eng. 2020; 3564835.
25. Shannon CE. A mathematical theory of communication. The Bell Syst Technical J. 1948;27(3):379–423.
26. Luo Y, Li Y. Comprehensive decision-making of transmission network planning based on entropy weight and grey relational analysis. Power Syst Technol. 2013;37(1):77–81.
27. Zhang B, Zhou G. Bridge safety assessment based on entropy weight method fusion of multi-source data. Bull Sci Technol. 2023;39(1):91–5.
28. Wen Z. Study on AHP-EWM coupling model evaluation of heterogeneity in deformed coal reservoirs: taking Panguan syncline as an example. PhD Dissertation, China University of Mining and Technology, Xuzhou, China, 2023.
29. Teng W, Zhang Q. Research on risk control of logistics supply chain finance based on entropy right method. Times of Economy & Trade. 2023;10:60–4.
30. Hwang CL, Yoon K. Methods for Multiple Attribute Decision Making. In: Multiple Attribute Decision Making. Lecture Notes in Economics and Mathematical Systems, vol 186. 1981; Springer, Berlin, Heidelberg.
31. Yang Y. Mechanistic study on the improvement of farmland fertility and typical crop yield in saline-alkali soil after the project of gully land consolidation. PhD dissertation, Xi’an University of Technology, China, 2023.
32. Feng X, Li C, Wei S, et al. Comprehensive evaluation of quality of Hemerocallis citrina Baroni from different regions based on subjective assignment combined with entropy TOPSIS method. Chin J Mod Appl Pharm. 2022;39(22):2927–34.
33. Wang F, Liu Z. Optimization method of distributed k-means algorithm based on Spark. Comput Eng Design. 2019;40(6):1595–600.
34. MacQueen J. Some methods for classification and analysis of multivariate observations. Proceedings of the fifth Berkeley symposium on mathematical statistics and probability. 1967;1(14):281–97.
35. Yang J, Zhao C. Survey on K-means clustering algorithm. Comput Eng Appl. 2019;55(23):7–14+63.
36. Bagherzadeh F, Shafighfard T, Khan RMA, et al. Prediction of maximum tensile stress in plain-weave composite laminates with interacting holes via stacked machine learning algorithms: a comparative study. Mech Syst Signal Process. 2023;195:110315.
37. Mileto C, López-Manzanares FV, Crespo LV, et al. The influence of geographical factors in traditional earthen architecture: the case of the Iberian Peninsula. Sustainability. 2019;11(8):2369.

Acknowledgements

Thanks to Chunlei Zhang, Weidong Zhang, and Jianxiong Liu for their guidance on this article’s software technology.

Funding

This research was funded by the National Key Research and Development Program of China (No. 2019YFC1520500), and the National Natural Science Foundation of China (No. 51808246, 52078373).

Author information

Authors and Affiliations

Institute for the Conservation of Cultural Heritage, School of Cultural Heritage and Information Management, Shanghai University, Shanghai, 200444, China
Ningbo Peng, Yue Zhang & Jizhong Huang
Faculty of Architecture and Civil Engineering, Huaiyin Institute of Technology, Huaian, 223001, China
Ningbo Peng, Chaokai Zhang, Ye Zhu, Bo Sun & Fengrui Wang
College of Civil Engineering, Nanjing Tech University, Nanjing, 211816, China
Ningbo Peng & Chaokai Zhang
Key Laboratory of Silicate Cultural Relics Conservation, Ministry of Education, Shanghai University, Shanghai, 200444, China
Ningbo Peng, Yue Zhang & Jizhong Huang
Key Scientific Research Base of the State Administration of Cultural Heritage for Integrated Technology and Application of Grotto Cultural, Heritage Protection Engineering Department, Lanzhou, 730003, China
Ye Zhu, Bo Sun & Fengrui Wang
China Railway Cultural Heritage Rehabilitation Technology Innovation Co., Ltd., Chengdu, 610032, China
Bo Sun & Fengrui Wang
Huaian Urban Development Investment Holding Group Co., Ltd, Huaian, 223001, China
Tong Wu

Contributions

Conceptualization, N.P. and J.H.; methodology, Y.Z. (Ye Zhu) and C.Z.; software, C.Z.; validation, Y.Z. (Ye Zhu) and Y.Z. (Yue Zhang); formal analysis, C.Z., B.S. and F.W.; investigation, B.S. and F.W.; resources, B.S., F.W. and T.W.; data curation, C.Z., Y.Z. (Ye Zhu) and Y.Z. (Yue Zhang); writing—original draft preparation, N.P. and C.Z.; writing—review and editing, Y.Z. (Ye Zhu) and J.H.; visualization, Y.Z. (Yue Zhang) and C.Z.; supervision, N.P. and J.H.; funding acquisition, N.P. and J.H. All authors have read and agreed to the published version of the manuscript.

Corresponding authors

Correspondence to Ye Zhu or Jizhong Huang.

Ethics declarations

Competing interests

The authors declare no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Peng, N., Zhang, C., Zhu, Y. et al. A vulnerability evaluation method of earthen sites based on entropy weight-TOPSIS and K-means clustering. Herit Sci 12, 161 (2024). https://doi.org/10.1186/s40494-024-01273-7

Received: 08 February 2024
Accepted: 04 May 2024
Published: 20 May 2024
DOI: https://doi.org/10.1186/s40494-024-01273-7
