Identification of historic building “genes” based on deep learning: a case study on Chinese baroque architecture in Harbin, China
Heritage Science volume 11, Article number: 241 (2023)
The monitoring and protection of historic buildings require a highly professional team and substantial material resources, and monitoring and protecting historical architectural features is an urgent issue. According to the theory of biological gene expression, genes are the fundamental units that control and express biological traits. Similarly, the “genes” of historical architecture are the basic units that control historic features. Identifying these historical architecture “genes” involves identifying the main factors that control the historic features, and this process is important for monitoring and protecting those features. At present, the genetic identification of historic buildings suffers from qualitative subjectivity, difficult quantification, poor recognition accuracy, and low reasoning and recognition efficiency. Taking Chinese Baroque architecture in Harbin, China, as an example, this article draws on the principles of biological gene recognition and on methods of architectural gene recognition in cultural geography and architecture. An improved U-Net model that incorporates a channel attention mechanism, the traditional U-Net model, the FCN model, and the EfficientNet model are used to identify historic building genes, in order to find the optimal deep learning-based method for the intelligent recognition of historical architectural genes. This research shows that the accuracy of the improved U-Net model incorporating a channel attention mechanism is 69%, which is 4%, 7%, and 1% higher than those of the traditional U-Net, FCN, and EfficientNet, respectively. The F1 score of the improved U-Net model reaches 0.654, which is higher than the 0.619 of the traditional U-Net model, 0.645 of the EfficientNet model, and 0.501 of the FCN model. Therefore, among the four models compared, the improved U-Net model is the optimal method for identifying historical architecture genes. This research can provide new tools and methods for identifying historical architectural genes.
The International Council of Monuments and Sites (ICOMOS) promulgated the Venice Charter (hereinafter referred to as the Charter) in 1964. The Charter clarifies the protection concepts, purposes, and contents of historic buildings, emphasizing the importance of protecting historic buildings and their features. The Charter defines historic buildings as individual buildings or groups of buildings that represent the development of a certain historic civilization and a certain city . The features and characteristics of historic buildings refer to their overall appearance, including their layout, facade, decoration, and materials [2, 3]. According to the theory of biological gene expression, biological genes are the fundamental units that control and express biological traits. Similarly, the genes of historic buildings are the fundamental units that control and express the building features, and they are the decisive factors determining the characteristics of historic buildings . Historic building gene identification has the following specific functions: protecting the genes of historical architecture protects the complete features of historic buildings, and identifying historical architectural genes allows for the precise monitoring and protection of historical architectural features. Chinese Baroque architecture is a "creative" experiment conducted by Chinese craftsmen in Harbin to explore modern architecture, showcasing a unique "Chinese–Western combination" of architectural features containing rich historical architectural genes that has valuable material and cultural heritage. The term "Chinese Baroque" was proposed by Japanese scholar Takahiko Nizawa in his article "The Characteristics of Modern Architecture in Harbin". It refers to a combination of Chinese and Western architectural features, with its concept based on Western Baroque, which is famous for its pursuit of excessive decoration [5, 6]. 
Therefore, the Chinese Baroque is also characterized by inordinate decoration, simply adopting traditional Chinese patterns and attaching them to various architectural features to imitate the West, forming hybrid architectural features. The Chinese Baroque originated in modern China and has a place in the history of modern Chinese architectural development. As shown in Fig. 1, Chinese Baroque architecture in Harbin is mostly concentrated in the Daowai area, which was built in the 1920s and 1930s by Chinese craftsmen outside the Daowai area, imitating the Western architectural features of the Daoli and Nangang areas at that time and adding traditional Chinese decorative patterns . This study conducted a detailed investigation of 68 existing Chinese Baroque buildings and 41 characteristic courtyards enclosed by each building. The architectural features were mainly a combination of Western Baroque, Art Nouveau, eclecticism, and traditional Chinese architectural patterns. The facade details of the buildings along the street were meticulously decorated, while the internal courtyards of the buildings used traditional Chinese corridors, as well as traditional Chinese hanging houses, sparrows, and columns. The architectural wall colours are mostly red or grey [8, 9]. Therefore, taking the Chinese Baroque as an example, it is important and necessary to conduct research on optimal methods for the genetic identification of historic buildings.
The existing research on historical architectural genes and their identification methods covers the following aspects: first, the classification and identification methods of historical architectural genes. Liu Peilin classified historical architecture genes into six categories based on the features and characteristics of historic buildings: plane structure, roof design, roof design, gable design, local decoration, and building materials [10,11,12,13]. Based on the research results of Liu Peilin's team, scholars such as Shen Xiuying proposed methods for identifying historical architectural elements, structures, patterns, and meanings based on the expression forms of historical architectural genes . The second aspect is the gene expression patterns and recognition methods of historic buildings. Scholars such as Hu Z analysed four patterns of gene expression in historic buildings—two-dimensional representation, three-dimensional features, visual expression, and structural expression—and proposed corresponding recognition methods . Two-dimensional representation refers to the architecture of building planes and facades; three-dimensional features refer to the overall control of building architecture; visual expression refers to intuitive perceptions of building features; and structural expression refers to building layout, enclosure mode, orientation features and other traits. The third aspect is the biological characteristics and identification methods of historical architectural genes, noting that the basic unit that constitutes a biological system is the cell. Similarly, the "cells" that make up the features of historical architecture are the genes of historical architecture . Related scholars have combined morphological and cartographic semiotic theories and methods to propose a "cell chain shape" graphical analysis method for historical architectural genes, which is used to identify and analyse the morphological features of historical architectural genes . 
However, the abovementioned methods for identifying historic building genes have problems such as subjective qualitative analysis, quantification difficulties, poor recognition accuracy, and low efficiency in reasoning and recognition, resulting in weak representativeness and explanatory power of the extracted historical architectural genes. In terms of the gene recognition image features of historic buildings, scholars such as Xiao Jing proposed a "dual system" recognition and interpretation method for historic building genes to address the bottleneck of traditional historic building protection methods such as "following the form and losing the rhyme". Combined with historical literature such as local chronicles, they conducted detailed CAD manual mapping of traditional Chinese historic buildings during the Chongqing Festival and drew a gene recognition feature map for historic buildings . Similarly, scholars such as Huang Huada conducted detailed field investigations on 17 traditional red brick historic buildings in the southern Fujian region of China and combined complex methods such as CAD manual mapping to draw genetic recognition feature maps of historic buildings. However, these methods of manually identifying historic building genes need to be combined with extensive field investigations, and images are mostly artificially drawn, which inevitably leads to issues of qualitative subjectivity and incomplete data, making it difficult to achieve precise monitoring and protection of the styles and features of historic buildings [19,20,21].
The development and application of digital technologies such as deep learning provide new methods for the precise monitoring and protection of historical architectural features and can provide a reference for the genetic identification of historic buildings. Relevant scholars have used FCNs to identify damaged areas of historic buildings and improve efficiency for monitoring and repairing historic building features [22,23,24]. Casillo M and other scholars explored a historic building protection model based on a combination of a deep neural network GAN and the Internet of Things, collecting data in real time and managing and sharing it through a network cloud IoT platform . Scholars such as Wei Z used DC nets to test the colour fusion of historic buildings, providing a new method for monitoring historic building features . Scholars such as Bruno S used the Mask R-CNN deep learning model to automatically evaluate and identify the degree of decay on historic buildings or their component images, achieving the intelligent diagnosis and dynamic monitoring of historic buildings . Reinhold A and other scholars used the EfficientNet algorithm to intelligently extract historic building features from massive street view image data and introduced a transfer learning module to enhance the learning ability of the EfficientNet algorithm, providing new tools for the protection and updating of historic buildings . Nugraheni D M K and other scholars applied the DCL-NN model to classify cracks in the concrete structures of historic buildings, achieving risk assessment and dynamic monitoring of historic building structures . Scholars such as Hoła A used the random forest algorithm and support vector machine to detect the moisture content of brick walls in historic buildings to determine the quality of brick walls in historic buildings . 
Samhouri M and other scholars proposed an automatic multicategory damage detection technology based on convolutional neural network (CNN) models for image classification and feature extraction, which is used to detect damage to historical structures, such as erosion, material loss, colour changes in stones, and damage problems . Scholars such as Liu Z W have combined semantic segmentation (DeeplabV3 +) with drone photogrammetry methods for the remote monitoring of changes in historical architectural features at heritage sites, improving the efficiency of regular inspection and maintenance of historical architectural heritage sites and reducing corresponding human and material resources . Scholars such as Croce V have made full use of machine learning or deep learning, as well as network-based collaborative annotation platforms, to assist heritage experts in the mixed annotation of architectural objects, such as identifying architectural components, degradation patterns, renovations, and material mapping, and to share and access 2D/3D information through the network, providing a more automated annotation tool for public and private stakeholders responsible for restoration and protection activities . Scholars such as Sun M R have used deep convolutional neural networks (DCNNs) to recognize the ages and features of different historic buildings to understand the evolution of architectural elements and features, as well as the relationships between architectural ages and features in space and time. They used public data and deep learning to track the spatiotemporal evolution process of architectural features . Hatir M E et al. developed a method for utilizing the Mask R-CNN algorithm to automatically detect and plot degradation (biological colonization, contour scaling, cracks, higher plants, impact damage, micro karst, missing parts) and restoration interventions, achieving monitoring of heritage degradation patterns . Similarly, Hatir M E et al. 
developed a petrographic determination and damage model for building stone based on a Mask R-CNN, which improved the efficiency of cultural heritage monitoring and restoration . In addition, Hatir M E et al. used the Mask R-CNN algorithm to detect and map degradation observed at Jumueller archaeological sites and monasteries (cracks, discontinuities, contour scaling, missing parts, biological colonization, presence of higher plants, sediment, weathering, and loss of murals). The proposed algorithm enabled the mapping to quickly and automatically detect the degradation of large monuments .
However, the aforementioned deep learning algorithms still face many challenges in identifying historical architectural genes. First, because the genetic composition of historic buildings is complex, previous algorithms were unable to extract genes from large and complex sets of historic buildings. Second, historic building genes cover multiple scales, and genes at different scales place different requirements on the training accuracy of deep learning models, which makes these models poorly suited to identifying them. Third, the identification of historical architectural genes needs to adhere to the principles of regional uniqueness and overall superiority to ensure that the identified genes are representative. How to identify historical architectural genes under these constraints is a major challenge for deep learning models.
This article develops an identification method for historical architectural genes based on deep learning. The research questions are as follows: (1) How can an improved U-Net model that incorporates a channel attention mechanism, the traditional U-Net model, the FCN model, and the EfficientNet model be used to identify different types and scales of historical architectural genes? (2) How can we demonstrate that the improved U-Net model incorporating a channel attention mechanism is the most suitable for the recognition of historical architectural genes? This article takes Chinese Baroque architecture in Harbin, China, as an example, drawing on the principles of biological gene recognition and on architectural gene recognition methods from cultural geography. According to different spatial scales, and following the four principles of internal uniqueness, external uniqueness, local uniqueness, and overall superiority, the courtyard shape, facade form, local decoration, and building materials are selected as the identification indicators of historical architectural genes. The four models are then used to identify historical architectural genes, and the optimal method for their identification is analysed.
The research area is located in the Chinese Baroque Historical and Cultural Block, Daowai District, Harbin City, Heilongjiang Province, China. The research object is the Chinese Baroque architecture inside the block. The block is located in the central eastern part of Harbin City, between 45° 20ʹ–46° 20ʹ N and 126° 15ʹ–127° 30ʹ E. The block area is approximately 31.23 square kilometres (Fig. 2). The time range of this study is the modern period. The modern history of Harbin began with the construction of the Chinese Eastern Railway in 1898, and the formation and development of Chinese Baroque architecture also date from this stage. Therefore, the time range of this study is set as 1898 to 1949 [38, 39].
Beijing, Shenyang, Shanghai, Qingdao and other places also have Chinese Baroque architecture. The Chinese Baroque architecture in Harbin has unique features: first, it is numerous and concentrated; second, its detailed texture is rich and diverse; third, it remains preserved at the street and neighbourhood level, without serious damage, and has received government attention; and fourth, the addition of traditional Chinese decorations to the facades of Western Art Nouveau architecture is the most distinctive feature of Harbin. Therefore, the Chinese Baroque architecture in Harbin is a strongly representative case for research on intelligent methods for identifying the genes of historic buildings.
Principles and indicators for genetic identification of historic buildings
Relevant scholars have introduced biological genes and their identification methods into the fields of cultural geography and architecture to study architectural genes and their identification [42,43,44]. Architectural genes are the fundamental units that control the overall style of a building, and the purpose of identification is precisely to find the architectural genes that control the overall style of a building . Relevant scholars have formulated four principles for identifying architectural genes based on the unique characteristics of the architectural features within the region: (1) the principle of intrinsic uniqueness: an intrinsic cause of formation that is not present in other regional buildings; (2) the principle of external uniqueness: in terms of external causes, it is not found in buildings in other regions; (3) the principle of local uniqueness: a certain local but key element that is not present in other regional buildings; and (4) the overall superiority principle: although there are similar architectural genes in other regions, they are particularly prominent in this region [46, 47]. Based on the above principles, we selected genetic identification indicators for Chinese Baroque historic buildings. We referred to the principles and indicators of architectural gene identification in cultural geography and architecture and found that the Chinese Baroque roof design and gable design did not reflect the overall superiority principle and local uniqueness principle in terms of quantity and uniqueness. Therefore, these two indicators were deleted , and the Chinese Baroque architectural gene identification indicators were determined to be courtyard form, facade form, local decoration, and building materials . 
Courtyard form refers to the way in which Chinese Baroque buildings in Harbin enclose a courtyard in plan, which reflects the unique style and features of "external west and internal centre" (a Western exterior with a Chinese interior), "front store and rear factory", and "upper and lower stores". Facade form refers to the morphological characteristics of the various architectural facades of the Chinese Baroque in Harbin. The exterior facades along the street display Western architectural styles, mainly Baroque and Art Nouveau, while the interior facades of the courtyard are traditional Chinese corridor-style facades, reflecting the unique style and characteristics of the Chinese Baroque. Local decoration refers to the morphological characteristics of the detailed decoration on various building components of Chinese Baroque architecture. Overall, the components can be divided into Western-style exterior facade decorations along the street (mountain flowers, i.e., pediments; terraces; brackets; eaves; plaques; columns; doors; windows) and traditional Chinese-style courtyard interior decorations (stairs, railings, external corridor columns, eaves, hanging fascia, and sparrow braces, rendered literally as "bird replacement"). Building materials mainly refer to the Chinese Baroque building materials and their characteristics, which are unique in their blend of traditional Chinese blue bricks with Western plastered walls and red bricks [51,52,53,54]. Therefore, based on the above principles and indicators of historic building gene recognition, the improved U-Net model integrated with a channel attention mechanism, the traditional U-Net model, the FCN model, and the EfficientNet model are used to identify historic building genes, and the optimal method of intelligent historic building gene recognition based on deep learning is analysed.
Deep learning-based genetic intelligent identification of historical buildings
The U-Net model was proposed by Ronneberger et al. This article selects the U-Net model to identify historic building genes. The U-Net model structure has the following characteristics: (1) U-shaped architecture: with downsampling and upsampling paths, this architecture helps to preserve spatial context information and perform fine segmentation, which helps to identify historic building genes of different spatial scales and types. (2) Skip connections: the skip connections allow the decoder path to obtain features of different resolutions from the encoder path, which is conducive to restoring details and identifying building decoration genes with strong regional identification, fine patterns, and complex features. (3) Good performance on small datasets: the model can segment effectively under limited data conditions. Because the Chinese Baroque historical architecture genes in Harbin, China, are specific to one geographical area, the amount of available data is limited. The U-Net architecture performs well on small datasets and captures boundary details well. Therefore, the U-Net model is well suited to this study.
Improved U-Net model for integrating a channel attention mechanism
Based on the characteristics of Chinese Baroque historical architecture genes, we have made improvements based on the basic architecture of U-Net , including the introduction of additional convolutional layers and feature fusion steps in the decoder path. (1) Additional Convolutional Layer: Introducing additional convolutional layers into the decoder path increases the depth of the network, which helps extract higher-level feature representations and is suitable for identifying historical architectural genes at different spatial scales, such as courtyard types and architectural decoration, which have a large spatial span . (2) Feature fusion: When using skip connections, the features of the encoder path are fused in the decoder path, improving the feature representation ability. This has a good effect on identifying historical architectural genes with complex decorations such as mountain flowers, terraces, ox legs, eaves, and walls. These improvements can help the model better capture the details and semantic information of images, thereby improving segmentation performance. Our improved U-Net model has the following advantages: (1) Better feature extraction: By introducing additional convolutional layers, the model can learn image features more deeply and improve its ability to capture the genetic information of historic buildings at different scales, such as courtyard types and architectural decorations. (2) Detail capture: Feature fusion and additional convolutional layers help better capture boundary and detail information in the image. (3) Stronger generalization ability: By increasing the depth and feature fusion of the network, the model performs better on more different types of images and can recognize different types of historic building genes. 
(4) Performance improvement: These improvements bring higher segmentation performance, especially when dealing with complex images or small structures, such as architectural decoration genes such as hanging trees and sparrows. As shown in Fig. 3, on the basis of the basic U-Net architecture, the improved U-Net model introduces the channel attention mechanism and optimizes the convolutional layer and other parts to enhance the model's feature extraction and segmentation performance. The model starts from the input image and first extracts low-level features through two consecutive 3 × 3 convolutional layers (Conv2D-64) . Then, a channel attention mechanism is used to strengthen channel relationships to better capture historic building gene image information of different scales and types in subsequent feature extraction. In the convolutional layer section, the model uses multiple convolutional layers in the encoder and decoder paths, which help to gradually extract features of different scales. In the encoder path, two 3 × 3 convolutional layers (Conv2D-64) were used, while in the decoder path, two 3 × 3 convolutional layers (Conv2D-64) were used, and a channel attention mechanism was introduced to help ignore background factors such as wires in historic building images and achieve recognition of historic building genes. In the upsampling section of the decoder path, an upsampling layer (UpSampling2D) was used for scale recovery, and then two consecutive 3 × 3 convolutional layers (Conv2D-64) were used to fuse the features of the decoder and encoder and achieve intelligent recognition of historical architectural genes of different spatial scales and types . At this stage, the channel attention mechanism was reintroduced to enhance the representation ability of features and help extract complex and small historical architectural genes. 
In the final output section, a 3 × 3 convolutional layer (Conv2D-2) was used for feature fusion, and then a 1 × 1 convolutional layer (Conv2D-1) and the sigmoid activation function were applied to obtain the final segmentation result.
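The text does not state which channel attention variant is used. As a minimal sketch, assuming a squeeze-and-excitation style block (global average pooling, a small bottleneck, and a sigmoid gate per channel), the channel reweighting step could look like the following. The function name `channel_attention`, the weight matrices `w1` and `w2`, and the toy feature map are hypothetical and chosen only for illustration; they are not taken from the authors' implementation.

```python
import math


def channel_attention(feature_map, w1, w2):
    """Reweight the channels of a feature map (list of C channels, each H x W).

    Assumed squeeze-and-excitation form, not necessarily the authors' variant:
      squeeze:    per-channel global average
      excitation: hidden = relu(w1 . squeeze), gate = sigmoid(w2 . hidden)
      scale:      every value in channel c is multiplied by gate[c]
    """
    # Squeeze: one scalar per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation: bottleneck layer with ReLU, then one sigmoid gate per channel.
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]
    # Scale: multiply each channel by its gate.
    scaled = [
        [[value * gate for value in row] for row in channel]
        for channel, gate in zip(feature_map, gates)
    ]
    return scaled, gates


# Toy example: 2 channels of size 2 x 2 and a bottleneck of width 1.
features = [
    [[1.0, 1.0], [1.0, 1.0]],  # channel 0, mean 1.0
    [[0.0, 0.0], [0.0, 0.0]],  # channel 1, mean 0.0
]
w1 = [[1.0, 1.0]]     # shape (hidden=1, channels=2), illustrative values
w2 = [[2.0], [-2.0]]  # shape (channels=2, hidden=1), illustrative values
scaled, gates = channel_attention(features, w1, w2)
```

Each gate lies between 0 and 1, so a channel that mostly responds to background clutter can in principle be down-weighted while informative channels pass through nearly unchanged. In a trained network the weights would be learned rather than fixed by hand.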
Fully convolutional networks (FCNs) are a class of deep learning models that contain no fully connected classification layers and are used to predict a category for every pixel in an image. The emergence of FCNs marked the transition from convolutional neural networks (CNNs) that end in fully connected layers to networks built entirely from convolutional layers. In traditional image classification tasks, the last layer of the CNN model is usually a fully connected layer, which is used to predict the category of the entire image. However, for semantic segmentation tasks, we need to classify each pixel in the image, which means that the spatial layout of the feature maps must be preserved through to the output. Therefore, FCNs typically keep the convolution and downsampling layers of the CNN model and convert its final fully connected layers into convolutional layers. Specifically, the FCN model first extracts image features through a series of convolution and pooling layers and generates a feature map described by its spatial size and number of channels. The spatial resolution of this feature map is then restored through deconvolution layers (also known as upsampling layers), and a final convolutional layer outputs the category probability for each pixel. In this way, the FCN model can perform category prediction on each pixel in the image, achieving the task of semantic segmentation. An important feature of an FCN is that it accepts input images of any size, without requiring them to be scaled or cropped to a fixed size first. This makes FCNs very useful for processing various types of images and various tasks, such as medical image analysis and semantic segmentation in unmanned driving. However, the FCN model also has some drawbacks.
For example, due to the lack of contextual information, FCN models may not provide accurate and complete results in some semantic segmentation tasks. In addition, the computational complexity of the FCN model is also relatively high, requiring a longer training time.
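The ability of an FCN to accept inputs of any size follows from its use of convolutional operations only, since the same small set of weights is applied at every pixel position. The sketch below illustrates this with a 1 × 1 convolution followed by a softmax, applied to two feature maps of different sizes. It is a simplified illustration in plain Python, not the FCN used in the experiments; the function name `pixelwise_classifier` and the toy weights are assumptions made for the example.

```python
import math


def pixelwise_classifier(feature_map, weights, biases):
    """Apply a 1 x 1 convolution and a softmax at every pixel.

    feature_map: H x W x C nested lists
    weights:     K x C (one row of weights per class)
    biases:      K
    Returns an H x W x K nested list of class probabilities.
    """
    output = []
    for row in feature_map:
        out_row = []
        for pixel in row:
            scores = [
                sum(w * x for w, x in zip(class_weights, pixel)) + b
                for class_weights, b in zip(weights, biases)
            ]
            peak = max(scores)  # subtract the maximum for numerical stability
            exps = [math.exp(s - peak) for s in scores]
            total = sum(exps)
            out_row.append([e / total for e in exps])
        output.append(out_row)
    return output


# The same weights handle a 2 x 3 input and a 4 x 5 input without any resizing.
weights = [[2.0, 0.0], [0.0, 2.0]]  # 2 classes, 2 input channels (toy values)
biases = [0.0, 0.0]
small = [[[1.0, 0.0] for _ in range(3)] for _ in range(2)]
large = [[[0.0, 1.0] for _ in range(5)] for _ in range(4)]
probs_small = pixelwise_classifier(small, weights, biases)
probs_large = pixelwise_classifier(large, weights, biases)
```

A fully connected layer, by contrast, has a weight matrix whose size is tied to one fixed input shape, which is why classification CNNs normally require resized or cropped inputs.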
EfficientNet is a family of neural networks from Google that uses a small set of scaling coefficients to adapt a baseline network to tasks and datasets of various complexities. EfficientNet improves the efficiency and accuracy of the network by scaling its depth, width, and input resolution together. This design makes EfficientNet well suited for handling various tasks and datasets, and its smaller variants are suited to resource-constrained devices such as mobile phones and embedded devices. The core concept of EfficientNet is to view network design as a scaling problem, in which the depth, width, and resolution of the network are tied to a single compound coefficient rather than tuned independently. This means that the capacity of the network can be adjusted to meet the needs of different tasks and datasets by changing that coefficient. EfficientNet consists of two main parts: a baseline network, EfficientNet-B0, which is built mainly from mobile inverted bottleneck convolution (MBConv) blocks, and a scaling rule that guides how larger variants are constructed from this baseline. Training the larger EfficientNet variants is usually conducted on a large amount of computing resources (such as TPUs) to ensure the accuracy and efficiency of the model, whereas the smaller variants can run on devices with limited resources, which is very valuable for practical applications and edge computing. EfficientNet has achieved excellent performance in many computer vision tasks, including image classification, object detection, and semantic segmentation.
Although EfficientNet has a relatively high demand for computing resources, its flexibility and efficiency make it an effective tool in research and application fields.
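In the EfficientNet formulation by Tan and Le, depth, width, and input resolution are scaled jointly through one compound coefficient φ, using base multipliers α, β, and γ chosen so that α·β²·γ² is approximately 2. The sketch below uses the values α = 1.2, β = 1.1, and γ = 1.15 reported for the EfficientNet-B0 baseline; the function name `compound_scale` and the base resolution of 224 pixels are illustrative assumptions, and the released model variants use rounded settings, so the outputs here need not match any published configuration.

```python
# Base multipliers reported for the EfficientNet-B0 baseline.
ALPHA = 1.2   # depth multiplier base
BETA = 1.1    # width multiplier base
GAMMA = 1.15  # resolution multiplier base


def compound_scale(phi, base_depth=1.0, base_width=1.0, base_resolution=224):
    """Scale depth, width, and resolution together with one coefficient phi."""
    depth = base_depth * ALPHA ** phi
    width = base_width * BETA ** phi
    resolution = base_resolution * GAMMA ** phi
    # Approximate growth in compute: depth scales it linearly,
    # width and resolution scale it quadratically.
    flops_factor = (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi
    return depth, width, resolution, flops_factor


baseline = compound_scale(0)
one_step = compound_scale(1)
```

With φ = 0 the baseline is returned unchanged, and each unit increase in φ roughly doubles the compute cost under this rule.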
Important parameters for verifying the performance of the four models
The important parameters for verifying the performance of the four models are as follows: (1) Loss function: The loss function is used to measure the difference between the predicted results of the model and the actual labels, encouraging the model to learn the correct segmentation boundaries and details. The improved loss function is a combination of the binary cross-entropy loss function and the intersection over union (IoU) loss function, aimed at optimizing segmentation performance and reducing the error rate of model recognition of historical architectural genes.
Binary_Cross-entropy is the binary cross-entropy loss function, which measures the difference between the predicted results and the real labels. The IoU is the ratio of the intersection to the union of the predicted segmentation and the real segmentation. The calculation formula is as follows:
$$\mathrm{IoU} = \frac{\mathrm{Intersection}}{\mathrm{Union}}$$
Intersection represents the area of the overlap between the predicted segmentation area and the actual segmentation area, while Union represents the area of their union.
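The exact way in which the two loss terms are combined is not recoverable from the text. As a hedged sketch, one common choice is to add the binary cross-entropy to (1 − IoU), with the IoU computed from predicted probabilities so that it stays differentiable. The function name `bce_iou_loss`, the unweighted sum, and the smoothing constant `eps` are assumptions for illustration and may differ from the authors' implementation.

```python
import math


def bce_iou_loss(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy plus (1 - soft IoU) over flattened pixel lists.

    y_true: ground-truth labels, each 0 or 1
    y_pred: predicted probabilities in [0, 1]
    The unweighted sum of the two terms is an assumption, not the paper's formula.
    """
    # Clip probabilities so that log() is always defined.
    clipped = [min(max(p, eps), 1.0 - eps) for p in y_pred]
    bce = -sum(
        t * math.log(p) + (1 - t) * math.log(1.0 - p)
        for t, p in zip(y_true, clipped)
    ) / len(y_true)

    # Soft IoU: products and sums of probabilities replace pixel counts.
    intersection = sum(t * p for t, p in zip(y_true, y_pred))
    union = sum(y_true) + sum(y_pred) - intersection
    iou = (intersection + eps) / (union + eps)

    return bce + (1.0 - iou), bce, iou


labels = [1, 0, 1, 0]
perfect_loss, _, perfect_iou = bce_iou_loss(labels, [1.0, 0.0, 1.0, 0.0])
unsure_loss, _, unsure_iou = bce_iou_loss(labels, [0.5, 0.5, 0.5, 0.5])
```

A perfect prediction drives both terms towards zero, while an uninformative prediction of 0.5 everywhere is penalized by both the cross-entropy term and the overlap term.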
(2) Accuracy: The ratio of the number of correctly predicted samples to the total number of predicted samples.
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
True positives (TP) are the number of correctly predicted positive samples; false positives (FP) are the number of samples that are incorrectly predicted to be positive; true negatives (TN) are the number of correctly predicted negative samples; and false negatives (FN) are the number of samples incorrectly predicted to be negative.
(3) Precision: Precision is the ratio of the number of correctly classified positive samples to the number of samples predicted to be positive.
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
True positives (TP) are the number of correctly predicted positive samples; false positives (FP) are the number of samples that are incorrectly predicted to be positive.
(4) F1 score: The F1 score is the harmonic average of precision and recall, which can be used to measure the average performance of the model.
Precision is the ratio of the number of correctly classified positive samples to the number of correctly classified positive samples.
(5) Recall: The recall rate (also referred to as the true rate or sensitivity) is the ratio of the number of correctly classified positive samples to the actual number of positive samples.
The true sample (TP) is the number of correctly predicted positive samples; false-positive samples (FP) are the number of samples that are incorrectly predicted to be positive; and false-negative samples (FN) are the number of samples incorrectly predicted to be negative.
The above indicators can be used to verify the accuracy of the four models in identifying historical architectural genes.
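The loss function and the four indicators described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the combined loss is assumed to be an unweighted sum of the binary cross-entropy and (1 − IoU) because the section does not state the weighting, and the IoU is computed in a "soft" form on probabilities so that it can serve as a loss term.

```python
import math


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy over flat lists of probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(pred)


def soft_iou(pred, target, eps=1e-7):
    """Intersection over union computed on probabilities instead of hard masks."""
    intersection = sum(p * t for p, t in zip(pred, target))
    union = sum(p + t - p * t for p, t in zip(pred, target))
    return (intersection + eps) / (union + eps)


def combined_loss(pred, target):
    # ASSUMPTION: unweighted sum of BCE and (1 - IoU); the source does not
    # give the exact combination or any weighting factors.
    return binary_cross_entropy(pred, target) + (1.0 - soft_iou(pred, target))


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Illustrative values only, not data from the study.
    print(combined_loss([0.9, 0.2, 0.8], [1, 0, 1]))
    print(classification_metrics(tp=8, fp=2, tn=85, fn=5))
```

The zero-denominator guards are a design choice for the sketch; a class that never appears in a validation batch would otherwise raise a division error.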
Dataset acquisition and classification tags
The historic building data used in this experiment were collected through field research and web crawling and consisted of 2518 items. We used data augmentation to expand the 2518 items to 6425. The data were divided into a training set and a validation set at a ratio of 80% to 20%. The labelling diagram is shown in Fig. 4.
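An 80%/20% split of 6425 samples gives 5140 training samples and 1285 validation samples. The following sketch shows one way to produce such a split; the function name, the random shuffle, and the fixed seed are assumptions for illustration, and the section does not say whether the split was made before or after augmentation.

```python
import random


def split_indices(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and split them into training and validation lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility (assumption)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, val_idx = split_indices(6425)
print(len(train_idx), len(val_idx))  # 5140 1285
```

If augmented copies of the same source image can land on both sides of the split, validation scores may be optimistic, so splitting the original items first and augmenting only the training portion is a common precaution.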
Experimental environment construction and initialization settings
The experiments were run on a Dell Gamebox G15 computer running Windows 10 with 512 GB of memory, an NVIDIA GeForce RTX 3080 graphics card using CUDA 11.0, and an Intel Core i7-10700K CPU. The initial learning rate was 0.001, the momentum factor was 0.100, and the weight decay factor was 0.001.
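To show how the three hyperparameters interact, here is a minimal sketch of a single parameter update. It assumes stochastic gradient descent with momentum and L2 weight decay folded into the gradient, which is one common convention; the section does not name the optimizer, so both the optimizer choice and the function name are assumptions.

```python
def sgd_momentum_step(weights, grads, velocity,
                      lr=0.001, momentum=0.1, weight_decay=0.001):
    """One SGD update with momentum and L2 weight decay (assumed optimizer)."""
    new_weights, new_velocity = [], []
    for w, g, v in zip(weights, grads, velocity):
        g = g + weight_decay * w  # weight decay adds a pull toward zero
        v = momentum * v + g      # momentum accumulates past gradients
        new_velocity.append(v)
        new_weights.append(w - lr * v)  # step against the accumulated gradient
    return new_weights, new_velocity


if __name__ == "__main__":
    # Illustrative values only: one weight, one gradient, zero initial velocity.
    w, v = sgd_momentum_step([1.0], [0.5], [0.0])
    print(w, v)
```

With a momentum factor as low as 0.1, each step is dominated by the current gradient, so the update behaves close to plain SGD.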
Results and discussion
Model comparison test
To demonstrate the advantages of the improved U-Net model in this article, we compared it with the traditional U-Net, FCN, and EfficientNet under the same configuration environment and parameters. The comparison of the model loss values is shown in Table 1 and Fig. 5, the accuracy comparison is shown in Table 2 and Fig. 6, and a comparison of the precision, recall, and F1 scores among the four models is shown in Fig. 7.
According to Table 1 and Fig. 5, the improved U-Net model has the lowest loss value, 0.078, compared with the traditional U-Net (0.393), FCN (0.465), and EfficientNet (0.113). The improved model includes optimizations to the network hierarchy, loss function, and activation function that allow it to fit the target task better on the training data. The lower loss is attributed mainly to the channel attention mechanism, which reduces the training parameters of the model and improves its feature extraction ability. The mechanism operates on the channel dimension of the input data: it compresses global spatial information, learns the importance of each channel, and then assigns a different weight to each channel through the excitation part, so that the model makes better use of channel information. The mechanism can also lower the loss by reducing training parameters. In deep learning, more training parameters require more time and computational resources, and more errors generated during training lengthen the training time. The channel attention mechanism extracts features by adding the output and input of multiple cascaded convolutional layers through shortcut connections, which reduces the number of training parameters. This lowers the complexity of the model, improves training efficiency and, to some extent, reduces the errors generated during training.
Overall, by introducing a channel attention mechanism, the U-Net model improves its feature extraction ability and reduces its training parameters, thereby reducing the loss. This helps to limit the loss of detail when recognizing historic building genes and improves the model's anti-interference ability.
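The compress, learn, and reweight steps described above can be illustrated with a small channel attention sketch in the style of a squeeze-and-excitation block. The section does not give the authors' exact design (reduction ratio, placement in the U-Net, or layer weights), so the function name, the two-layer excitation, and the weight matrices below are assumptions chosen only to make the steps concrete.

```python
import math


def channel_attention(feature_map, w_reduce, w_expand):
    """Reweight channels of a feature map.

    feature_map: list of C channels, each an H x W list of lists.
    w_reduce:    hidden x C weight matrix (hypothetical, normally learned).
    w_expand:    C x hidden weight matrix (hypothetical, normally learned).
    """
    # Squeeze: compress global spatial information into one value per channel.
    squeezed = [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]
    # Excitation: fully connected layer + ReLU, then fully connected + sigmoid.
    hidden = [
        max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_reduce
    ]
    weights = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_expand
    ]
    # Scale: multiply every value in a channel by that channel's weight.
    scaled = [
        [[x * a for x in row] for row in channel]
        for channel, a in zip(feature_map, weights)
    ]
    return scaled, weights


if __name__ == "__main__":
    # Two 2x2 channels with hand-picked weights, for illustration only.
    fmap = [[[1.0, 1.0], [1.0, 1.0]], [[2.0, 2.0], [2.0, 2.0]]]
    scaled, weights = channel_attention(fmap, [[1.0, 0.0]], [[1.0], [-1.0]])
    print(weights)
```

In this toy example the first channel receives the larger weight and the second is suppressed, which is the "focus on important channels" behaviour the text describes. Note that the two fully connected layers in this style of block add a small number of parameters of their own.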
As shown in Table 2 and Fig. 6, the accuracy of the improved U-Net model is 0.690, which is 0.040, 0.070, and 0.010 higher than that of the traditional U-Net (0.650), FCN (0.620), and EfficientNet (0.680), respectively. The higher accuracy is attributed mainly to three effects of the channel attention mechanism. First, by focusing on the channel dimension of the input data, it lets the model make better use of channel information, which improves feature extraction. Second, it improves the model's perception of contextual information, which is crucial for accurate image segmentation: global spatial information is compressed, and features are then learned in the channel dimension. Third, by learning and using channel information, the model can better recognize and distinguish different features. Together, these effects yield higher accuracy and help to identify the complex decorative genes of historic buildings.
As shown in Fig. 7, the F1 score of the improved U-Net model reaches 0.654, which is higher than the 0.619 of the traditional U-Net model, the 0.645 of the EfficientNet model, and the 0.501 of the FCN model. The higher F1 score is attributed to the channel attention mechanism. During feature extraction, some channels are more important for a specific task than others; the mechanism lets the model focus on the important channels and suppress the unimportant ones, which improves the representation ability of the model. It does so by compressing global spatial information, learning the importance of each channel, and assigning a different weight to each channel through the excitation part. This helps to identify the genetic characteristics of historic buildings at different scales.
In addition, the recall of the improved U-Net model is the highest among the four models, reaching 0.884. The higher recall is attributed to the same effects: better use of the channel information of the input data, improved perception of contextual information, and an improved ability to recognize and distinguish different features. This benefits the genetic identification of historic buildings with strong regional identity. Therefore, the improved U-Net model is the most suitable of the four for identifying historic building genes, and it can be used for further precise monitoring and protection of historic building styles and features.
To verify that the channel attention mechanism helps improve the performance of the U-Net model in identifying historical architectural genes, we conducted ablation experiments. The experimental results are shown in Table 3.
According to Table 3, we can draw the following conclusions:
Of the three variants, the improved U-Net model with the channel attention mechanism achieved the best performance in identifying historical architectural genes, with an F1 score of 0.654, compared with 0.635 for the improved U-Net model with Squeeze-and-Excitation and 0.619 for the U-Net model without the channel attention mechanism. This is because the channel attention mechanism effectively enhances the model's feature representation ability, giving it good expressive power and robustness when identifying historical architectural genes with strong spatial and geographical identity and subtle, complex decorative features. Incorporating the channel attention mechanism therefore plays a significant role in improving the ability of the U-Net model to recognize historical architecture genes and in optimizing its performance, which is helpful for the accurate monitoring and protection of historical architecture.
Shortcomings and prospects
This study has several limitations. It is a preliminary application of an improved U-Net model incorporating a channel attention mechanism, alongside traditional U-Net, FCN, and EfficientNet models, to the identification of historical architectural genes in order to find the most suitable method, and there is still room for improvement. The genetic classification of historic buildings is relatively complex and has strong regional identity. This article covers 20 types of historic building genes, and its 6425 samples represent a relatively small dataset. This places high demands on the quality of the model, so the results apply only to preliminary research on the optimal method for identifying historical architectural genes. Future research will enhance model performance, expand the dataset, and apply the improved U-Net model incorporating a channel attention mechanism to the practical recognition of historic building genes.
Conclusions
This article takes Chinese Baroque architecture in Harbin, China, as an example and applies deep learning technology to identify historical architectural genes. The conclusions are as follows:
To address the problems of qualitative subjectivity, difficult quantification, poor recognition accuracy, and low inference and recognition efficiency in the recognition of historic building genes, we used an improved U-Net model incorporating a channel attention mechanism, together with traditional U-Net, FCN, and EfficientNet models, to identify historical architectural genes and analysed which deep learning method is suitable for this task. To demonstrate the advantages of incorporating the channel attention mechanism, we compared the improved U-Net model with the traditional U-Net, FCN, and EfficientNet under the same configuration environment and parameters. The improved U-Net model had the lowest loss value, 0.078, compared with the traditional U-Net (0.393), FCN (0.465), and EfficientNet (0.113). Its accuracy was 0.690, which is 0.040, 0.070, and 0.010 higher than that of the traditional U-Net (0.650), FCN (0.620), and EfficientNet (0.680), respectively. Its F1 score reached 0.654, which is higher than the 0.619 of the traditional U-Net model, the 0.645 of the EfficientNet model, and the 0.501 of the FCN model. In addition, its recall was the highest among the four models, reaching 0.884. Therefore, the improved U-Net model is the most suitable for the identification of historical architectural genes, including genes with complex decorations, strong regional identity, and different scales.
To verify that the channel attention mechanism helps improve the performance of the U-Net model in identifying historical architectural genes, we conducted ablation experiments. The improved U-Net model with the channel attention mechanism achieved the best performance, with an F1 score of 0.654, compared with 0.635 for the improved U-Net model with Squeeze-and-Excitation and 0.619 for the U-Net model without the channel attention mechanism. Incorporating the channel attention mechanism therefore plays a significant role in improving the ability of the U-Net model to recognize historical architecture genes and in optimizing its performance, which is helpful for the accurate monitoring and protection of historical architectural style.
Through this study, an improved U-Net model incorporating a channel attention mechanism was found to be suitable for identifying historical architectural genes. This provides a new method for identifying the genes of historic buildings, which helps to accurately monitor and protect the characteristics of historic buildings, saves manpower and resources, and improves the quality and efficiency of historic building protection. We have conducted preliminary research on the genetic identification of historic buildings in Chinese Baroque architecture and will further apply and promote it in the future. Future research will enhance model performance, expand the dataset, and develop the improved U-Net model incorporating a channel attention mechanism for the practical recognition of historic building genes.
Availability of data and materials
The datasets used and/or analysed during the current study are available from the corresponding author upon reasonable request.
Abbreviations
U-Net: Convolutional Networks for Biomedical Image Segmentation
CNN: Convolutional Neural Network
FCN: Fully Convolutional Network
GAN: Generative Adversarial Network
IoT: Internet of Things
This research was funded by the Heilongjiang Province Philosophy and Social Science Research Planning Project (No. 18SHC230).
The authors declare that they have no competing interests.
Cite this article
Shao, L., Sun, J. Identification of historic building “genes” based on deep learning: a case study on Chinese baroque architecture in Harbin, China. Herit Sci 11, 241 (2023). https://doi.org/10.1186/s40494-023-01091-3