SGRGAN: Sketch-guided restoration for traditional Chinese landscape paintings

Image restoration is a prominent research field in computer vision. Restoring damaged paintings, especially ancient Chinese artworks, remains a significant challenge for current restoration models, because the intricate and delicate textures of the originals must be reinstated realistically while the unique style and artistic characteristics of ancient Chinese paintings are preserved. To improve the restoration and preservation of traditional Chinese paintings, this paper presents a framework called the Sketch-Guided Restoration Generative Adversarial Network, termed SGRGAN. The framework uses sketch images as structural priors that provide essential information for the restoration process. In addition, a novel Focal block is proposed to enhance the fusion and interaction of textural and structural elements, and a BiSCCFormer block incorporating a Bi-level routing attention mechanism is devised to capture the structural and semantic details of the image, including its contours and layout. Extensive experiments and ablation studies on the MaskCLP and Mural datasets demonstrate the superiority of the proposed method over previous state-of-the-art methods. In particular, the model achieves outstanding visual fidelity in the restoration of landscape paintings, which underscores its efficacy and generality in cultural heritage preservation and restoration.


Introduction
Ancient Chinese paintings are a precious part of China's cultural heritage, reflecting the changes of their times and carrying rich cultural connotations. However, time and natural factors often damage or blur these artworks [1]. In the conservation of ancient paintings and calligraphy, the age of the works, the fragility of their materials, and environmental influences inevitably cause various kinds of damage during circulation, including breakage, fading, mildew, and insect infestation; some works are even missing key areas.
Because of these challenges, restoring paintings and calligraphy demands not only a high level of professional skill and artistic sensitivity but also a profound understanding of history, culture, traditional materials, and techniques. The restoration process encompasses both physical repair and complex, multifaceted operations such as chemical stabilization and artistic reproduction [2]. Likewise, the restoration of ancient wall paintings is influenced by various factors [3]. Repairing digital images of paintings with techniques such as generative adversarial networks [4] and diffusion models [5] avoids causing secondary damage to the original artworks, thereby supporting the preservation and transmission of the cultural heritage embodied in ancient Chinese paintings [6].
To exploit contextual information within images, Pathak et al. [7] introduced an unsupervised visual feature learning algorithm based on context prediction that generates content for arbitrary image regions. However, standard convolution often produces artifacts such as color inconsistency and blurring in image restoration. To address this issue, Liu et al. [8] proposed an image restoration method based on partial convolution. Li et al. [9] introduced a Recurrent Feature Reasoning Network, which uses neighboring pixels to infer the restored image iteratively and recurrently. However, the method does not sufficiently constrain the central area of the missing region. Guo et al. [10] introduced MISF, which produces high-fidelity restoration results while mitigating artifacts, but it does not account for the features and structure of the original image. Because inpainting models struggle to capture the unique painting style and intricate brushstrokes of individual artists, Xu et al. [11] proposed a Chinese landscape painting restoration method based on fine-grained style. However, this approach depends heavily on both original and imitation paintings, which makes the restoration results vulnerable to the influence of the imitations, particularly when datasets are limited. Lyu et al. [5] reconstructed Chinese landscape painting images using a diffusion probabilistic model and incorporated attention and self-attention mechanisms to improve the quality of the reconstructed images. This approach offers a new reference point for restoring ancient Chinese painting images.
In general, existing image restoration methods are well developed for real-world images, but they still suffer from the following problems when applied to ancient Chinese paintings, which carry complex semantic information and unique forms of artistic expression: (1) Most existing techniques depend heavily on large modern image datasets for training, whereas datasets of ancient paintings are sparse and difficult to label, which limits the adaptability and accuracy of these techniques for restoring ancient paintings. (2) The unique techniques and individual brushstrokes of ancient paintings challenge conventional restoration techniques, which often struggle with intricate details and cannot accurately simulate or reconstruct the fine textures and brushstrokes that characterize this nuanced form of artistic expression. (3) Unlike natural photographs, ancient paintings tend to be more abstract in their color schemes, compositional layouts, and elements. Traditional algorithms may not represent these abstractions adequately, which makes it difficult to reproduce both the details and the overall mood of the artwork. (4) Ancient paintings frequently embody profound cultural connotations and historical backgrounds. Existing techniques may fail to capture these deep semantic associations, producing restored works that lose the mood and connotations of the originals.
To address these challenges in traditional Chinese painting restoration, this paper presents a framework named the Sketch-Guided Restoration Generative Adversarial Network, termed SGRGAN. The model employs a dual encoder-decoder architecture, in which a texture encoder-decoder and a structure encoder-decoder reconstruct the texture and structure of the missing region, respectively, and it incorporates a dual discriminator.
The key contributions of this paper are fourfold:
1. We propose an Ancient Landscape Painting Restoration Dataset with Special Mask, termed MaskCLP, which is available at https://github.com/Makbaka1/Mask-CLP.
2. We introduce sketches as multimodal structural prior information to assist restoration, facilitating the reconstruction of the fine textures and stroke characteristics of ancient paintings.
3. We propose a novel Focal block that effectively fuses fine-grained local features of color matching and element abstraction with coarse-grained global features of compositional layout.
4. We propose a new BiSCCFormer block based on a Bi-level routing attention mechanism to comprehensively understand the internal structure and semantic information of the image while preserving its historical style and cultural significance.
Experiments show that our method significantly improves the accuracy and realism of image restoration for ancient Chinese paintings. The restored results not only preserve the artistic essence of the original paintings but also faithfully restore their distinctive visual structures and brushstroke textures.

Traditional Chinese painting restoration
Work on ancient Chinese painting restoration falls into two main categories. The first is the restoration of calligraphy and paintings. Chang et al. [12] divide the image restoration process into two layers, a structure layer and a texture layer, and introduce a dual-layer digital image restoration method that markedly improves restoration accuracy. Zeng et al. [13] first detect damage in the painting with a damage detection method and then reconstruct the image with a patch-based restoration approach. Luo et al. [14] propose an ancient Chinese painting restoration method based on improved generative adversarial networks. The second category is mural restoration. Wang et al. [15] propose a structure-guided global and local feature weighting method that considers both global and local image features to restore mural images. Cai et al. [16] propose a Bidirectional Feature Adaptation restoration method that incorporates a spatial attention mechanism to adaptively enhance the features of missing and known regions for mural restoration. Furthermore, Ge et al. [17] propose a virtual restoration network for ancient murals based on global-local feature extraction and structural information guidance. This approach addresses the inability of most restoration methods to fill lost mural areas with rich details and complex structures.

Traditional method
Traditional image restoration methods rely primarily on signal processing and statistical modeling; they achieve some success but have significant limitations. Chang et al. [18] propose a simple interpolation (SI) restoration strategy for large damaged areas, which improves on conventional interpolation techniques [19]. However, complex textures and nonlinear distortions remain difficult to restore accurately, and repairing large missing regions incurs high computational complexity. Dimiccoli et al. [20] propose a perceptual filter for image restoration. Although it can improve image quality to some extent, it still falls short in preserving edge details and structural consistency, particularly in images with smooth color transitions or rich high-frequency content.
Optimization methods based on partial differential equations have also been widely studied. Li et al. [21] propose combining two variational models for image restoration. Although such mathematical models can achieve a certain restoration effect, parameter selection is difficult and the computational complexity is high. Moreover, their adaptability is limited when facing the diverse image degradations found in real-world scenarios, and they cannot achieve satisfactory results on highly unstructured, randomly distributed damage. Overall, although traditional image restoration methods are practical in specific scenarios, they perform unsatisfactorily on complex image content and diverse degradation patterns and struggle to balance fidelity and naturalness. This inadequacy is a significant reason for the widespread adoption of deep learning methods.

GAN-based restoration
Image restoration methods based on Generative Adversarial Networks (GANs) have shown significant advantages in restoring cultural relic images. Because such images may suffer various forms of damage, such as wear, cracks, and fading, a GAN can use the surrounding context to generate content for the missing part that closely resembles the style of the original painting, recovering both the details and the overall structure of the relic image. Nazeri et al. [22] propose the EC model, which first uses an edge generator to predict the structural contours of the missing region and then fills in color and texture details with an image completion network. This approach is a breakthrough in visual coherence and structural plausibility. Lin et al. [23] propose PDGAN, which generates multiple high-fidelity completions for the same damaged region by co-optimizing two adversarial networks. Zheng et al. [24] introduce the Transformer into GANs and propose a pluralistic image completion framework, termed VQGAN, that achieves high quality and diversity with faster inference.
However, GAN-based image restoration techniques without structural guidance have limitations, particularly for images of culturally complex and unique relics. For such images, relying solely on the generative capability of a GAN may fail to reproduce historical elements or cultural features accurately. Moreover, GANs that lack a structure-guided strategy are prone to noise during restoration, producing content that is inconsistent with the intrinsic logic of the original painting, such as incorrect texture orientation and distorted shapes.

Structure-guided restoration
Structure-guided image restoration methods ensure that the generated restoration results maintain visual consistency and coherence with the original image.Liu et al. [25] propose a framework based on monotonic transformation structure guidance, which preserves both the neighborhood coherence around the restored region and the global structural properties of the image.Similarly, Guo et al. [26] propose the CTSDG method, which, although not specifically designed for ancient painting restoration, integrates the technical concepts of structural constraints and texture synthesis.Their approach demonstrates the effectiveness of integrating structural and texture information to guide image inpainting tasks.
Structure-guided image restoration methods make effective use of the structural information inherent in the image, and the structural priors impose strong constraints that yield semantically coherent and natural restoration results. However, for artworks such as ancient Chinese paintings, with highly intricate structures and delicate brushstroke textures, distinctive unstructured features such as painting technique, color transitions, and brush-and-ink rhythm must also be taken into account.

Transformer-based restoration
Convolutional neural networks have achieved remarkable results in image restoration, but their local receptive fields make it difficult to model long-range dependencies, which limits their performance in some complex restoration scenarios. Transformers and attention mechanisms provide new ways to address these challenges in image inpainting. Song et al. [27] propose a two-stage framework for contextual image restoration in which a GAN first roughly fills the missing regions and an attention module then learns contextual information. Liu et al. [28] propose a coherent semantic attention method based on a deep generative model, which not only preserves the contextual structure but also predicts the missing parts more effectively by modeling the semantic correlation between hole features. Li et al. [29] propose MAT, a Transformer-based model for restoring images with large holes that combines the advantages of Transformers and convolution. Wan et al. [30] similarly combine Transformers with convolution in a high-fidelity pluralistic image restoration method. Additionally, Dong et al. [31] achieve high-quality results by reasoning efficiently in a lower-resolution sketch space and using an attention-based Transformer module to progressively restore the overall structure of the image.
The attention mechanism in Transformer can effectively learn the global and structural information of colors, strokes, and compositions in ancient paintings, enabling a deeper and more comprehensive understanding of the semantic information in the images, thus preserving their cultural connotations and styles.

Methodology
This section focuses on the training and test datasets utilized in this paper, as well as the research methodology.

Dataset details
Training is conducted primarily on the MaskCLP dataset, which comprises a sketch dataset, an ancient Chinese landscape painting dataset, and a mask dataset. To assess the generalization of the model, we also use a Mural dataset that we constructed, mixed with the landscape painting data, to evaluate the impact of different image elements on the restoration results.

MaskCLP dataset
The proposed ancient landscape painting restoration dataset with masks contains 12,242 images in total, divided into three parts: 5,621 ancient landscape paintings, 1,000 masks, and 5,621 grayscale sketches of the landscape paintings.

Ancient landscape paintings dataset
The paintings were collected from collaborating institutions and digital art databases and carefully classified by professional art workers according to style, dynasty, and color characteristics to obtain rich stylistic information. We screened and categorized these paintings to ensure the diversity and representativeness of the dataset. The dataset consists of 5,621 images, with 5,061 in the training set and 560 in the test set. Fig. 2 shows paintings by Xin Hong, Boju Zhao, Zhen Wu, Qichang Dong, and Daqian Zhang from the MaskCLP dataset.

Mask dataset
The irregular mask dataset used by Liu et al. [32] was released by NVIDIA in 2018 and comprises 55,116 training masks and 24,866 test masks. However, we did not use these irregular masks. Instead, we used genuine ancient landscape paintings provided by the cultural institutions and museums with which we collaborated. We carefully digitized these damaged landscape paintings by high-precision scanning and then applied a threshold segmentation method to extract 50 masks from the scans. To further enhance the diversity of the training set and improve the model's generalization ability, these 50 masks were expanded to 1,000 masks through random cropping and rotation.

Grayscale sketch landscape painting dataset
We extract grayscale sketches from the ancient landscape painting dataset. Besides retaining part of the grayscale information, each sketch preserves the distinctive stylistic elements of the original painting. Fig. 4 shows grayscale sketches from the MaskCLP dataset, which serve as prior conditions to guide the restoration of the corresponding ancient paintings. Throughout the history of Chinese painting, many outstanding painters emerged, each distinguished by a unique style.
Xin Hong is renowned for his delicate brushwork and exquisite landscape and flower-and-bird paintings. Boju Zhao emphasized rigorous composition and magnificent colors, using the 'Outline and Filling' technique to depict rocks and trees. Zhen Wu inherited the style of Yuan Dong and Ran Ju, depicting the distinctive Jiangnan landscape with vigorous ink and a moist artistic conception. Qichang Dong innovated the 'Ku Shi' painting method, characterized by dynamic brushwork and an elegant layout that reflects the refined atmosphere of ancient literati. Daqian Zhang, a modern Chinese painter, adopted a dynamic and impactful style characterized by grandeur, powerful brushwork, and vibrant colors, blending Chinese and Western painting techniques. The artistic legacies of these painters, spanning different eras, are diverse and rich and form invaluable parts of the Chinese painting tradition.

Mural
In addition, we constructed a Mural dataset to evaluate the restoration performance of our model. It includes four kinds of murals: Thangka, Temple, Cave, and Tomb.
The dataset contains various types of mural images from different locations.The Cave dataset comprises more than 300 images from Dunhuang, Yungang, Longmen, Xinjiang Ghuzi Grottoes, and Maijishan Grottoes.The Temple dataset mainly features murals from the Shanxi region and includes more than 300 images.Additionally, the Tomb dataset contains more than 300 images, while the Thangka dataset provides more than 200 images.To verify the generalization ability of the SGRGAN network, a total of 99 murals are selected from these four types of murals for inference testing in the experiment.

Landscape painting pre-processing
We carefully screened and classified a substantial number of ancient paintings. In addition, several data augmentation techniques, including resizing, cropping, rotation, and flipping, are applied to improve the model's performance and robustness. Each painting is resized to a uniform resolution of 256 × 256 and aligned with the mask data for training. The images are then randomly cropped and rotated, in a way that preserves their semantic integrity, to improve the model's generalization ability and increase the diversity of the training data.
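The pre-processing steps above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes nearest-neighbour resizing, rotations in multiples of 90 degrees, and that each random crop is resized back to the network resolution; `augment_pair` and its `crop_size` default are hypothetical. The point it shows is that a painting and its grayscale sketch must receive the same random transform so that they stay aligned.

```python
import random

# Images are 2-D lists (rows of pixel values); a real pipeline would use an image library.

def resize_nearest(img, size):
    """Nearest-neighbour resize to size x size (the interpolation method is an assumption)."""
    h, w = len(img), len(img[0])
    return [[img[r * h // size][c * w // size] for c in range(size)] for r in range(size)]

def hflip(img):
    return [row[::-1] for row in img]

def rot90(img):
    """Rotate 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def crop(img, top, left, size):
    return [row[left:left + size] for row in img[top:top + size]]

def augment_pair(painting, sketch, out_size=256, crop_size=224, rng=random):
    """Apply one random resize/crop/flip/rotation identically to a painting and its sketch."""
    painting, sketch = resize_nearest(painting, out_size), resize_nearest(sketch, out_size)
    top = rng.randrange(out_size - crop_size + 1)
    left = rng.randrange(out_size - crop_size + 1)
    painting, sketch = crop(painting, top, left, crop_size), crop(sketch, top, left, crop_size)
    # Resize the crop back so the network always sees out_size x out_size inputs (assumption).
    painting, sketch = resize_nearest(painting, out_size), resize_nearest(sketch, out_size)
    if rng.random() < 0.5:
        painting, sketch = hflip(painting), hflip(sketch)
    for _ in range(rng.randrange(4)):
        painting, sketch = rot90(painting), rot90(sketch)
    return painting, sketch
```

Drawing every random decision once and applying it to both images is what keeps the sketch prior pixel-aligned with the painting it guides.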

Mask pre-processing
We extract regions with abrupt pixel-value changes from the scanned damaged paintings, obtaining 50 masks. Through data augmentation by random cropping and rotation, these are expanded to 1,000 mask images, which are then resized to 256 × 256 to match the landscape painting images.
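A minimal sketch of this mask pipeline, assuming that damaged areas in the scanned grayscale paintings differ from intact areas strongly enough for a single global threshold to isolate them. The threshold value, the convention that 1 marks damage, and the helper names are illustrative assumptions, not the authors' segmentation method.

```python
import random

def threshold_mask(gray, thresh):
    """Binary mask: 1 where the pixel exceeds `thresh` (assumed to mark damage), else 0."""
    return [[1 if v > thresh else 0 for v in row] for row in gray]

def rot90(mask):
    """Rotate a 2-D mask 90 degrees clockwise."""
    return [list(row) for row in zip(*mask[::-1])]

def mask_variants(mask, n, crop, rng=random):
    """Expand one extracted mask into n masks via random crops and 90-degree rotations."""
    h, w = len(mask), len(mask[0])
    variants = []
    for _ in range(n):
        top, left = rng.randrange(h - crop + 1), rng.randrange(w - crop + 1)
        m = [row[left:left + crop] for row in mask[top:top + crop]]
        for _ in range(rng.randrange(4)):
            m = rot90(m)
        variants.append(m)
    return variants

def hole_ratio(mask):
    """Fraction of pixels marked as missing."""
    return sum(map(sum, mask)) / (len(mask) * len(mask[0]))
```

For example, 20 variants per source mask would turn 50 masks into 1,000; the per-mask count is not stated in the text. `hole_ratio` is a convenient check that augmented masks keep a realistic amount of damage.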

Overall structure
The overall network framework is shown in Fig. 5. SGRGAN is a generative adversarial network comprising a generator and a discriminator. The generator adopts a dual-stream encoder-decoder structure that reconstructs the structure and texture of the missing region through separate texture and structure encoder-decoder pairs, so that the high-dimensional information of each stream complements the other during texture and structure recovery. The output features of the two decoders are then concatenated and passed through the feature fusion module, which comprises the BiSCCFormer block, the Focal block, the MHSA module, and a multi-scale module. This configuration strengthens the correlation between local image features and explores and integrates features at multiple levels, improving the quality and detail accuracy of the final reconstruction.
Finally, a dual-stream discriminator estimates texture and structure feature statistics to distinguish the real image from the restored image. (a) The restored image is fed into the texture discriminator together with the real input image. (b) The grayscale sketch of the restored image is extracted with the sketch extraction method and fed into the structure discriminator together with the input grayscale sketch. The outputs of the two branches are concatenated along the channel dimension, and the adversarial loss is computed on the result.

Generator details
Inspired by CTSDG [26], this paper introduces a novel dual-stream coupling network, tailored to the characteristics of traditional Chinese paintings, that reconstructs the texture and structure of missing regions in landscape paintings, as depicted in Fig. 5a. The image is processed and optimized simultaneously for structural information (object shape, contour, and layout) and texture features (color, texture details, and material properties). The network comprises two encoder-decoder structures.

Texture encoder-decoder
The texture encoder-decoder performs structure-constrained texture synthesis, ensuring that the generated texture is consistent with the specified structure and thereby maintaining realism and diversity. The texture encoder takes the mask and the damaged landscape painting as input and produces feature maps at different scales; this feature mapping is denoted E_s.

Structure encoder-decoder
The structure encoder-decoder performs texture-guided structure reconstruction, which infers structural details in deeper features more accurately by analyzing and exploiting texture cues from shallow features or the corrupted image. The structure encoder takes the mask and the damaged grayscale sketch as input and produces feature maps at different scales; this feature mapping is denoted E_t.
Subsequently, the final-level output of the texture encoder is fed into the structure decoder for deconvolution, and at each deconvolution level the structure encoder's feature maps of the corresponding resolution are passed to it through skip connections. Similarly, the output of the last level of the structure encoder is fed into the texture decoder, with skip connections from the texture encoder's feature maps of the corresponding resolution. In this way, each stream uses the high-dimensional information of the other as a complement when reconstructing texture and structure. The outputs of the two decoders are concatenated into E_c, which is then input to the feature fusion module shown in Fig. 5e. By exploring and combining information at multiple levels, this module strengthens the connections between local image features and improves the quality and detail accuracy of the final reconstruction. G denotes the generator and D denotes the discriminator. I_g denotes the undamaged image, and S_g denotes the grayscale sketch of the complete original image.
I_i is the damaged image after masking, S_i is the damaged grayscale sketch after masking, and M_i denotes the initial binary mask; that is, I_i and S_i are obtained by applying M_i to I_g and S_g, respectively. I_o and S_o denote the image and grayscale sketch output by the generator, i.e., (I_o, S_o) = G(I_i, S_i, M_i).
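To make the notation concrete, the sketch below assumes the common convention that M_i equals 1 at missing pixels, so that I_i = I_g ⊙ (1 − M_i) and S_i = S_g ⊙ (1 − M_i); the text does not state the convention. `toy_generator` is a hypothetical stand-in for G that only demonstrates the interface (I_o, S_o) = G(I_i, S_i, M_i): it fills each hole with the mean of the known pixels.

```python
def apply_mask(x, mask):
    """Compute x * (1 - mask): zero out pixels where mask == 1 (assumed to mark holes)."""
    return [[v * (1 - m) for v, m in zip(row, mrow)] for row, mrow in zip(x, mask)]

def toy_generator(image_in, sketch_in, mask):
    """Hypothetical stand-in for G: fill each hole with the mean of the known pixels."""
    def fill(x):
        known = [v for row, mrow in zip(x, mask) for v, m in zip(row, mrow) if m == 0]
        mean = sum(known) / len(known)
        return [[mean if m else v for v, m in zip(row, mrow)] for row, mrow in zip(x, mask)]
    return fill(image_in), fill(sketch_in)

I_g = [[10, 20], [30, 40]]  # undamaged image
S_g = [[1, 0], [0, 1]]      # its grayscale sketch
M_i = [[0, 1], [0, 0]]      # binary mask, 1 = missing (assumed convention)
I_i, S_i = apply_mask(I_g, M_i), apply_mask(S_g, M_i)
I_o, S_o = toy_generator(I_i, S_i, M_i)
```

The real generator replaces the mean fill with the dual-stream network, but its inputs and outputs line up with these symbols in the same way.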

BiSCCFormer block
In this paper, we propose a new BiSCCFormer block based on a Bi-level routing attention mechanism [33], as depicted in Fig. 5c. The attention mechanism is introduced to enable a deeper and more comprehensive understanding of the internal structure and semantic information of the image, which in turn helps preserve the cultural connotations and historical styles embedded in ancient artworks.
When modeling image information globally, the BiSCCFormer block effectively reduces redundancy in the computation of the self-attention mechanism. In the initial stage, we add the Spatial and Channel Reconstruction Convolution (SCConv) [34] module. SCConv takes a new perspective on feature extraction in conventional CNNs, reducing redundant features by reconstructing both the spatial and channel dimensions of the feature map. This not only decreases the number of model parameters but also improves the efficacy of feature representation. The Bi-level Routing Attention module, depicted in Fig. 5c, is a dynamic sparse attention mechanism. Its core idea is to filter out the least relevant key-value pairs at the coarse-grained region level and then compute token-to-token attention within the remaining regions. It is designed to dynamically learn long-range dependencies between image regions, aiming at a deeper and more comprehensive understanding of the intrinsic structure and semantics of the image.
Here, I_g denotes the original undamaged image and I_s denotes the result of the processing performed by these modules; σ denotes the Bi-level Routing Attention module, S denotes the SCConv module, ζ denotes the BiSCCFormer block, and E_o denotes the output feature matrix of the BiSCCFormer block.
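The two-level filtering described above can be illustrated with a toy, pure-Python sketch. It is not the BiSCCFormer implementation: tokens form a 1-D sequence, regions are contiguous chunks, the query, key, and value projections are identity maps, and each region is summarized by the mean of its tokens (the original Bi-level Routing Attention design also averages per region to obtain region-level queries and keys). `region_size` and `topk` are illustrative parameters.

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_vec(vecs):
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def bi_level_routing_attention(tokens, region_size, topk):
    """Toy bi-level routing attention with identity Q/K/V projections."""
    d = len(tokens[0])
    regions = [tokens[i:i + region_size] for i in range(0, len(tokens), region_size)]
    descriptors = [mean_vec(r) for r in regions]  # coarse region-level queries/keys
    out = []
    for qi, q_region in enumerate(regions):
        # Level 1: route each query region to its top-k most related key regions.
        affinity = [dot(descriptors[qi], kd) for kd in descriptors]
        routed = sorted(range(len(regions)), key=lambda j: affinity[j], reverse=True)[:topk]
        keys = [t for j in routed for t in regions[j]]  # gathered keys (= values here)
        # Level 2: fine-grained token-to-token attention inside the routed regions only.
        for q in q_region:
            w = softmax([dot(q, k) / math.sqrt(d) for k in keys])
            out.append([sum(wi * k[c] for wi, k in zip(w, keys)) for c in range(d)])
    return out
```

With `topk` equal to the number of regions this reduces to ordinary global attention; smaller values discard whole regions before any token-level scores are computed, which is where the savings come from.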

Focal block
Ancient paintings exhibit a higher degree of abstraction in color collocation, composition layout, and elements.Traditional algorithms may struggle with such abstract expression, making it challenging to achieve accurate restoration of details and the overall artistic conception.Therefore, we propose a novel feature focusing module, termed Focal block, as depicted in Fig. 5d, which effectively combines fine-grained local features of color collocation and element abstraction with coarse-grained global features of composition layout.
Drawing inspiration from ConvNeXt [35], we incorporate dilated convolution into the Focal block and employ two skip connections. Together, these expand the receptive field of the feature map, emphasize the learning of important regions, and diminish the influence of background or non-key regions. This makes the model's attention allocation more efficient, particularly for landscape paintings with intricate composition and layout, and enables the simultaneous fusion of fine-grained local features and coarse-grained global features.
In the Focal block, I denotes the module input and ⊕ denotes the feature fusion operation. δ denotes the depthwise separable convolution, whose output is E_w; E denotes the dilated convolution, whose output feature is E_J; ϕ denotes a 1 × 1 convolution; and E_m denotes the output feature of the Focal block.
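Why dilated and depthwise separable convolutions suit this block can be checked with a little arithmetic. The sketch below computes the effective extent of a dilated kernel, the receptive field of a stack of layers, and the weight counts of a standard versus a depthwise separable convolution. The kernel sizes, dilation rate, and channel counts in the usage lines are hypothetical, since the text does not specify them.

```python
def effective_kernel(k, d):
    """Extent covered by a k x k kernel with dilation d: k + (k - 1)(d - 1)."""
    return k + (k - 1) * (d - 1)

def receptive_field(layers):
    """Receptive field of stacked convolutions given as (kernel, dilation, stride)."""
    rf, jump = 1, 1
    for k, d, s in layers:
        rf += (effective_kernel(k, d) - 1) * jump
        jump *= s
    return rf

def standard_conv_params(k, c_in, c_out):
    return k * k * c_in * c_out  # weights only, biases ignored

def depthwise_separable_params(k, c_in, c_out):
    return k * k * c_in + c_in * c_out  # k x k depthwise + 1 x 1 pointwise

# Hypothetical Focal-block-like stack: 7x7 depthwise conv, then a 3x3 conv with dilation 2.
print(effective_kernel(3, 2))                    # 5
print(receptive_field([(7, 1, 1), (3, 2, 1)]))   # 11
print(standard_conv_params(7, 64, 64), depthwise_separable_params(7, 64, 64))  # 200704 7232
```

A 3 × 3 kernel with dilation 2 covers a 5 × 5 window with no extra weights, and the separable factorization uses roughly 28× fewer weights than a dense 7 × 7 convolution at 64 channels.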

MHSA module
The Multi-Head Self-Attention (MHSA) module is the multi-head self-attention mechanism of the Transformer [36]. Our network places the MHSA module at the end of the generator, so it uses Bi-level routing attention in the shallow stage and global multi-head self-attention in the deep stage. This design lets the model exploit the locality and sparsity of the attention matrix when processing low-level information at the shallow level and fully model long-range dependencies when handling high-level information, thereby improving its reconstruction ability.
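A toy version of global multi-head self-attention fits in a few lines. It assumes identity query, key, value, and output projections so that only the head splitting and the all-to-all attention remain; a real MHSA layer learns these projections.

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def multi_head_self_attention(x, num_heads):
    """Global MHSA over tokens x (n x d) with identity projections (simplification)."""
    n, d = len(x), len(x[0])
    if d % num_heads:
        raise ValueError("embedding size must be divisible by num_heads")
    hd = d // num_heads
    out = [[0.0] * d for _ in range(n)]
    for h in range(num_heads):
        cols = slice(h * hd, (h + 1) * hd)
        head = [row[cols] for row in x]  # this head's slice of every token
        for i, q in enumerate(head):
            # Every token attends to every token: no sparsity, full range.
            w = softmax([sum(a * b for a, b in zip(q, k)) / math.sqrt(hd) for k in head])
            for c in range(hd):
                out[i][h * hd + c] = sum(wi * k[c] for wi, k in zip(w, head))
    return out
```

The cost grows with the square of the number of tokens, which is why global attention is more affordable at the end of the generator, where features are more compact, than at the shallow stage.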

Discriminator details
In this paper, we introduce a two-stream discriminator to differentiate between the real image and the restored image by estimating the feature statistics of texture and structure.

Texture discriminator
The output restoration image is simultaneously fed into the texture discriminator along with the input real image.
The texture discriminator comprises three convolutional layers with a kernel size of 4 and a stride of 2, followed by two convolutional layers with a kernel size of 4 and a stride of 1. The last layer uses a Sigmoid activation, and the remaining layers use Leaky ReLU with a negative slope of 0.2.
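This layer configuration fixes the size of the discriminator's output map and the image patch that each output value judges. The sketch below works this out for a 256 × 256 input, assuming a padding of 1 in every layer (the padding is not stated). Under that assumption the texture discriminator behaves like a PatchGAN-style critic that scores overlapping 70 × 70 patches on a 30 × 30 grid.

```python
def conv_out(size, k, s, p):
    """Spatial output size of a convolution: floor((size + 2p - k) / s) + 1."""
    return (size + 2 * p - k) // s + 1

# (kernel, stride) per layer as described; padding = 1 is an assumption.
TEXTURE_D_LAYERS = [(4, 2)] * 3 + [(4, 1)] * 2

def output_size(size, layers, pad=1):
    for k, s in layers:
        size = conv_out(size, k, s, pad)
    return size

def receptive_field(layers):
    """Input patch size that influences one output value."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

print(output_size(256, TEXTURE_D_LAYERS))  # 30
print(receptive_field(TEXTURE_D_LAYERS))   # 70
```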

Structure discriminator
The restored image is passed through the sketch extraction method to obtain its grayscale sketch, and the loss is computed against the input grayscale sketch.
The convolutional layers in the structure discriminator use a kernel size of 1. The output features of the two branches are then concatenated along the channel dimension to compute the adversarial loss.
Therefore, the structure branch not only assesses the authenticity of the generated structure but also ensures its consistency with the real image. In addition, spectral normalization layers are introduced to mitigate the training instability of generative adversarial networks.

Sketch extraction
We adopt a traditional, non-learned approach to sketch extraction and use it in the structure branch to extract the grayscale sketch of the restored image, a crucial step for verifying its structure. First, the image is converted to grayscale and inverted. Next, a blur filter is applied to the inverted image to obtain a smooth, blurred version. Then, for every pixel coordinate of the grayscale image, the grayscale value a at that coordinate and the value b at the same position in the blurred inverted image are blended into a new pixel value, which is written back to the corresponding position in the grayscale image. The result is the grayscale sketch.
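These steps match the familiar "pencil sketch" recipe. The sketch below is a minimal pure-Python version under stated assumptions: BT.601 luma weights for the grayscale conversion, a box blur standing in for the unspecified blur filter, and the color-dodge blend a · 255 / (255 − b) as the blending rule, since the text does not give the exact formula.

```python
def to_gray(rgb):
    """BT.601 luma; the exact grayscale conversion used by the authors is an assumption."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row] for row in rgb]

def box_blur(img, radius):
    """Mean filter with edge clamping, standing in for the unspecified blur filter."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                      for dy in range(-radius, radius + 1)
                      for dx in range(-radius, radius + 1)]
            row.append(sum(window) / len(window))
        out.append(row)
    return out

def color_dodge(a, b):
    """Blend grayscale value a with blurred-inverted value b: a * 255 / (255 - b), capped at 255."""
    if b >= 255:
        return 255
    return min(255, round(a * 255 / (255 - b)))

def extract_sketch(rgb, radius=2):
    gray = to_gray(rgb)                                   # 1. grayscale
    inverted = [[255 - v for v in row] for row in gray]   # 2. invert
    blurred = box_blur(inverted, radius)                  # 3. blur the inverted image
    # 4. blend a (grayscale) with b (blurred inverted) pixel by pixel
    return [[color_dodge(a, b) for a, b in zip(g_row, b_row)]
            for g_row, b_row in zip(gray, blurred)]
```

In flat regions the blurred inverse equals 255 − a, so the dodge pushes the pixel to white; only near edges, where the blur differs from the local value, do dark strokes survive, which is what leaves a line drawing.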

Loss function
The network is trained with a joint loss consisting of a reconstruction loss, a perceptual loss, a style loss, an intermediate loss, and an adversarial loss.

Reconstruction loss
We take the ℓ1 distance between I_o and I_g as the reconstruction loss L_r:
$$L_r = \| I_o - I_g \|_1$$

Perceptual loss
Since the reconstruction loss struggles to capture the high-level semantics of an image, we introduce a perceptual loss L_p to evaluate the global structure of the image. It measures the ℓ1 distance between I_o and I_g in the feature space of a VGG-16 network [37] pretrained on ImageNet [38]:
$$L_p = \sum_i \| \phi_i(I_o) - \phi_i(I_g) \|_1$$
where i indexes the pooling layers and φ_i(·) denotes the activation map of the i-th pooling layer of VGG-16.

Style loss
We further introduce a style loss L_s to ensure style consistency:
$$L_s = \sum_i \| \psi_i(I_o) - \psi_i(I_g) \|_1$$
where ψ_i(·) = φ_i(·)^T φ_i(·) denotes the Gram matrix constructed from the activation map φ_i(·).
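The perceptual and style terms differ only in what is compared: raw feature maps versus their Gram matrices. The toy snippet below makes that concrete in pure Python. Each feature map is stored as an N×C list (N spatial positions, C channels), so the Gram matrix is φᵀφ as defined above. The feature values are placeholders for VGG-16 pooling activations. Averaging the absolute differences, rather than summing them or normalizing the Gram matrix by feature size, is an assumption, and the function names are hypothetical.

```python
def gram(features):
    """Gram matrix phi^T phi for an N x C feature map (N positions, C channels)."""
    n, c = len(features), len(features[0])
    return [[sum(features[k][i] * features[k][j] for k in range(n)) for j in range(c)]
            for i in range(c)]


def mean_abs_diff(a, b):
    """Mean absolute difference between two equally shaped 2D lists (averaged L1)."""
    diffs = [abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def perceptual_loss(feats_out, feats_gt):
    """Sum over layers i of the L1 distance between phi_i(I_o) and phi_i(I_g)."""
    return sum(mean_abs_diff(fo, fg) for fo, fg in zip(feats_out, feats_gt))


def style_loss(feats_out, feats_gt):
    """Sum over layers i of the L1 distance between the Gram matrices psi_i."""
    return sum(mean_abs_diff(gram(fo), gram(fg)) for fo, fg in zip(feats_out, feats_gt))


if __name__ == "__main__":
    layer_out = [[1.0, 0.0]]  # one position, two channels (toy values)
    layer_gt = [[0.0, 1.0]]
    print(perceptual_loss([layer_out], [layer_gt]), style_loss([layer_out], [layer_gt]))  # 1.0 0.5
```

Because the Gram matrix sums over spatial positions, the style term ignores where features occur and compares only which channels co-activate. This is why it captures texture and brushwork rather than layout.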

Adversarial loss
The adversarial loss L_a is designed to ensure textural and structural consistency as well as visual realism in the reconstructed image. It involves two terms: L_I, the adversarial loss between the original image and the inpainted image, and L_S, the adversarial loss between the grayscale sketch of the input image and that of the inpainted image.

Intermediate loss
To encourage the structure encoder and the texture encoder to capture structural and textural features respectively, we introduce an intermediate loss L_i. Let F_s denote the structure feature map output by the structure decoder and F_t the texture feature map output by the texture decoder. The loss is computed on projections of these feature maps, where R_s is a projection function mapping F_s to a grayscale sketch image and R_t is a projection function mapping F_t to a restored landscape painting image.

Joint loss
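In summary, the joint loss is written as:
$$L = \lambda_r L_r + \lambda_p L_p + \lambda_a L_a + \lambda_s L_s + \lambda_i L_i$$
where λ_r, λ_p, λ_a, λ_s, and λ_i are weight parameters. We set λ_r = 10, λ_p = 0.1, λ_a = 0.1, λ_s = 250, and λ_i = 1.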
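With the weights listed here, the joint loss is a weighted sum of the five terms. The snippet below shows that combination. It also shows one simple way to express a "w/o" variant from the loss-function ablation, by zeroing a single weight. The dictionary keys and helper names are hypothetical, and the individual loss values would come from the terms defined above.

```python
# Weights reported in the text; keys are hypothetical short names for the terms.
WEIGHTS = {"r": 10.0, "p": 0.1, "a": 0.1, "s": 250.0, "i": 1.0}


def joint_loss(terms, weights=WEIGHTS):
    """Weighted sum of the loss terms; `terms` maps the same keys to scalar values."""
    missing = set(weights) - set(terms)
    if missing:
        raise KeyError(f"missing loss terms: {sorted(missing)}")
    return sum(weights[k] * terms[k] for k in weights)


def without(term, weights=WEIGHTS):
    """Copy of the weights with one term disabled, e.g. without("p") for w/o L_p."""
    if term not in weights:
        raise KeyError(term)
    ablated = dict(weights)
    ablated[term] = 0.0
    return ablated


if __name__ == "__main__":
    ones = {k: 1.0 for k in WEIGHTS}
    print(joint_loss(ones), joint_loss(ones, without("p")))
```

The large style weight (250) reflects that Gram-matrix differences are typically small in magnitude. Without such scaling the term would barely affect the gradient.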

Supplementary dataset
Additionally, we performed a preliminary test on the public datasets Places2 and CelebA.
• CelebA [40]. Released by the Chinese University of Hong Kong, the dataset comprises over 180,000 labeled face images and is one of the most commonly used datasets in image restoration research.

Implementation details
Training setting
All image and mask datasets are uniformly resized to 256×256 pixels. Training is conducted on an NVIDIA RTX 4090 GPU with a batch size of 12 using the Adam optimizer. The initial training uses a learning rate of 2 × 10⁻⁴, and the subsequent fine-tuning uses a learning rate of 5 × 10⁻⁵.

Training process
Before training on MaskCLP, all images in the MaskCLP dataset are uniformly preprocessed and resized to the standard resolution of 256×256 pixels. During formal training, the landscape paintings in MaskCLP are combined with masks to simulate damaged images to be repaired. Similarly, the line drafts are combined with masks to simulate damaged line drafts.
Throughout training, the damaged images and their masks are fed into the texture encoder, while the damaged sketch images and their corresponding masks are fed into the structure encoder. The generator then reconstructs the damaged images, and the reconstructed images are passed to the discriminator for evaluation. The discriminator consists of two branches: the texture branch, which evaluates the similarity between the restored image and the original landscape paintings in MaskCLP, and the structure branch, which extracts sketch information from the restored image and compares it with the real sketch images in MaskCLP.
The network is trained with the joint loss until convergence. Additionally, to assess the model's generalization beyond MaskCLP, we also trained it on the public restoration datasets and the Mural dataset. Because these datasets do not provide line art, we used Python library functions to extract the required line art automatically. For masks, we consistently used our in-house mask dataset.
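The snippet below illustrates how a damaged painting and a damaged sketch can be simulated from a shared mask and assembled into encoder inputs. It assumes a mask convention of 1 for missing pixels and 0 for intact ones, and that the mask is appended as an extra input channel. Neither detail is specified in the text, and the function names are hypothetical.

```python
def apply_mask(channel, mask):
    """Zero out damaged pixels; mask uses 1 = missing, 0 = intact (ASSUMPTION)."""
    return [[p * (1 - m) for p, m in zip(row_p, row_m)]
            for row_p, row_m in zip(channel, mask)]


def make_encoder_inputs(painting, sketch, mask):
    """Build (texture_input, structure_input) as lists of H x W channels.

    painting: list of color channels; sketch: one grayscale channel.
    Appending the mask as an extra channel is an ASSUMPTION.
    """
    texture_input = [apply_mask(ch, mask) for ch in painting] + [mask]
    structure_input = [apply_mask(sketch, mask), mask]
    return texture_input, structure_input
```

Using the same mask for both branches keeps the damaged regions aligned. The structure encoder therefore sees exactly the sketch strokes that the texture encoder is missing.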

Testing process
During testing, we designate the test set as 10% of the training set. We also partition the masks into three groups by mask ratio: 0-15%, 15-30%, and 30-45%. Our testing procedure outputs images of size 256×512. Metrics are then computed separately for each mask-ratio group.
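To group test masks by ratio, one can compute the fraction of masked pixels and assign it to an interval. The text does not say how boundary values are assigned, so the snippet below treats the intervals as half-open, with 45% itself placed in the last group. Using 1 to mark masked pixels is also an assumption, and the function names are hypothetical.

```python
# Mask-ratio groups from the text; half-open intervals are an ASSUMPTION,
# with 45% itself placed in the last group.
GROUPS = [("0-15%", 0.00, 0.15), ("15-30%", 0.15, 0.30), ("30-45%", 0.30, 0.45)]


def mask_ratio(mask):
    """Fraction of masked pixels; mask uses 1 = masked, 0 = intact (ASSUMPTION)."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


def group_of(ratio):
    """Return the group label for a ratio, or None if it lies outside 0-45%."""
    last = len(GROUPS) - 1
    for n, (label, lo, hi) in enumerate(GROUPS):
        if lo <= ratio < hi or (n == last and ratio == hi):
            return label
    return None


def group_masks(masks):
    """Map each group label to the indices of the masks that fall into it."""
    groups = {label: [] for label, _, _ in GROUPS}
    for idx, mask in enumerate(masks):
        label = group_of(mask_ratio(mask))
        if label is not None:
            groups[label].append(idx)
    return groups
```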

Inference process
During inference, the output resolution exactly matches that of the input image, and the average inference time is about 0.027 s per image.

Evaluation metrics
In the field of AI art, we use three metrics to quantitatively measure the quality of the images restored by our method: PSNR, SSIM [41], and LPIPS [42]. These metrics are used for comparison with existing methods. Crucially, a key question, analogous to evaluating the creativity of human artists, is whether AI can transcend pure technical imitation and achieve artistic creation with independent aesthetic value. Our evaluation therefore considers not only the quality, realism, and diversity of the generated images but also their artistic beauty and visual appeal, which allows it to account for innovation and cultural background.
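Of the three metrics, PSNR is simple enough to show directly. The snippet below computes it for 8-bit images stored as nested lists. SSIM and LPIPS are more involved; in practice, library implementations are typically used, for example scikit-image's `skimage.metrics.structural_similarity` and the `lpips` package. The function name is hypothetical.

```python
import math


def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio in dB for equally shaped 2D lists."""
    sq_errs = [(p - t) ** 2
               for row_p, row_t in zip(pred, target)
               for p, t in zip(row_p, row_t)]
    mse = sum(sq_errs) / len(sq_errs)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```

Because PSNR is logarithmic in the mean squared error, small dB differences such as those between the top methods correspond to modest changes in pixel-level error.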

Baselines
• SI [18]: This method constructs a local best approximation using a radial basis function network and repairs damaged image pixels through interpolation.
• EC [22]: This method proposes a two-stage model that separates the inpainting problem into structure prediction and image completion.
• RFR [9]: This method introduces a progressive restoration approach that proceeds from easy to difficult.
• PD-GAN [23]: This image restoration method is built on a vanilla GAN that generates images from random noise.
• CTSDG [26]: This method proposes a texture and structure coupling restoration network that divides image restoration into two complementary subtasks: texture and structure.
• MAT [29]: This method is the pioneering Transformer-based model for large-hole inpainting.
• MISF [10]: This method integrates traditional methods with deep learning techniques for image restoration.
Table 1 presents the metric results of each restoration method on the MaskCLP dataset. To better illustrate the restoration performance of the proposed method, we compare each metric against the second-best result. Our model achieves the best PSNR, which measures image distortion, at 30.59 dB. This is an improvement of 0.05 dB over the second-best MAT, indicating that our model restores images with less distortion and higher quality. On SSIM, which evaluates structural similarity, our model also performs well, improving by 0.07 over the second-best MAT, a relative gain of 8.13%. This indicates that our model better preserves image structure and details. Our model also performs well on LPIPS, which decreases by 6.25% to 0.105 relative to the second-best MAT. Taken together, the metrics indicate that our model achieves better restoration than the other models in terms of overall image structure, line smoothness, color levels, and detailed textures.
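The relative figures in this comparison follow from the usual relative-change formula, (new - old) / old. The snippet below applies it. The baseline values it recovers are back-calculated from the rounded numbers in the text; for example, a 6.25% LPIPS decrease to 0.105 implies a second-best LPIPS of about 0.112. These are not values taken from Table 1, which should be consulted for the exact figures. The function names are hypothetical.

```python
def relative_change(new, old):
    """Relative change (new - old) / old; negative means a decrease."""
    return (new - old) / old


def baseline_from_relative(new, rel):
    """Recover `old` from `new` and a relative change rel = (new - old) / old."""
    return new / (1.0 + rel)


def baseline_from_gain(abs_gain, rel_gain):
    """Recover `old` from an absolute gain and the matching relative gain."""
    return abs_gain / rel_gain


if __name__ == "__main__":
    # LPIPS: a 6.25% decrease to 0.105 implies a second-best value of about 0.112.
    print(round(baseline_from_relative(0.105, -0.0625), 4))
    # SSIM: +0.07 absolute with +8.13% relative implies a baseline of about 0.861.
    print(round(baseline_from_gain(0.07, 0.0813), 3))
```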

Visualization
Qualitative comparison with existing state-of-the-art (SOTA) methods shows that our approach yields satisfactory restoration results and comprehensively captures the details of MaskCLP, a dataset of traditional Chinese landscape paintings. The restored paintings exhibit high fidelity in overall structure, line smoothness, color, and texture detail, effectively preserving and reproducing the artistic essence of the original artworks. Our method addresses limitations of existing state-of-the-art methods such as edge blurring, artifact generation, and excessive smoothing. It also fills in missing areas while preserving the unique texture and style of the original painting.

Visual comparison of traditional method
Fig. 6 shows the restoration results of the simple interpolation method [18] and SGRGAN. As illustrated, the simple interpolation algorithm fills the missing regions with excessive smoothing. In addition, the boundaries of the restored regions lack clarity and details are severely lost, especially in high-frequency regions, which appear heavily blurred.

Visual comparison on mask ratio of 0-15%
Fig. 7 shows the restoration results for two ancient Chinese landscape paintings with missing areas of 0-15%.
Specifically, for the rock in Fig. 7b, RFR, MISF, and CTSDG all lose texture details and display artifacts to varying degrees. In contrast, our method fills the texture of the missing area in closer agreement with the overall style of the painting and preserves brushstrokes consistent with those of the original, yielding a more nuanced result.
Moreover, color is a crucial medium for expressing emotion, atmosphere, and aesthetic conception in artworks, giving them vivid flavor and rich connotations. However, the restoration results for the green grass in Fig. 7a show that EC yields darker results, while MAT's results appear yellowish, deviating significantly from the

Visual comparison on mask ratio of 15-30%
Fig. 8 shows the restoration results for two ancient Chinese landscape paintings with missing areas of 15-30%. Specifically, in Fig. 8a, b, the results of EC, RFR, and CTSDG exhibit predominantly yellow hues and severe artifacts, creating a significant discrepancy with the original style of the landscape painting. Moreover, the texture of the restored portions appears unnatural relative to its surroundings. In contrast, the results of our method show no color mismatch with the overall painting. Although minor artifacts are present, they are nearly imperceptible to the human eye. Overall, our method effectively restores the unique aesthetic essence and delicate texture of Chinese landscape painting.

Visual comparison on mask ratio of 30-45%
Fig. 9 shows the restoration results for two ancient Chinese landscape paintings with missing areas of 30-45%.
Specifically, in the restored river area of Fig. 9b, the RFR results exhibit obvious artifacts and distorted lines, the PD-GAN results display noticeable blurring and slight color distortion, and the MAT results show improper texture filling and blurred edges. In

Visual comparison on mask ratio of 45%
Fig. 10 shows the restoration results for an ancient Chinese landscape painting with a 45% missing area. Specifically, serious artifacts appear in the mountain restored by RFR in Fig. 10b, causing significant inconsistencies in the structure and layout logic of the entire painting and disrupting its original compositional balance. MISF exhibits color incongruity and some blurriness. CTSDG fails to restore the clear outline of the original mountain, resulting in distorted lines. In contrast, our approach preserves the form of the various painting elements and restores the mountain outline, tree structures, and stone veins of the original landscape painting while maintaining stroke consistency with the original artwork.
Overall, the experimental results show that our algorithm produces outputs with greater semantic coherence, recovers more detailed image structures, reconstructs higher-quality images, and yields superior subjective visual effects. SGRGAN effectively restores the structure, texture, color, saturation, and brightness of the painting. This elevates the artistic expression of the image, imbuing it with richer emotion and artistic conception and bringing it closer to the artistic and stylistic characteristics of the original painting.

Ablation study
This section presents ablation analyses of the SGRGAN modules, the loss function, and the training dataset to verify the performance and effectiveness of the proposed restoration method.

Comparison analysis
In this subsection, we report quantitative ablation results for the sub-modules, loss functions, and training datasets of SGRGAN.

Comparison on module
To demonstrate the effectiveness of the proposed SGRGAN framework, ablation experiments are conducted on the sketch guidance, the Focal block, and the BiSCCFormer block, respectively. The method is evaluated quantitatively using three primary metrics: LPIPS, PSNR, and SSIM. Results are grouped by irregular mask ratio into 0-15%, 15-30%, and 30-45% intervals, and scores are computed for each interval. Table 2 presents the results on the MaskCLP dataset. For mask ratios of 30-45%, compared with SGRGAN w/o Focal, SGRGAN improves PSNR, which assesses image quality, by 0.16%; improves SSIM, which evaluates the preservation of structural information, by 1.04%; and reduces LPIPS, which measures perceptual similarity, by 2.9%. Compared with SGRGAN w/o BiSCCFormer, SGRGAN improves PSNR by 1%, improves SSIM by 1.36%, and reduces LPIPS by 7.2%. These ablation results demonstrate the effectiveness of the proposed sketch guidance, Focal block, and BiSCCFormer block for image restoration.

Comparison on loss function
Additionally, to assess the effectiveness of the joint loss, ablation experiments are conducted on SGRGAN w/o L_p, SGRGAN w/o L_s, and SGRGAN w/o L_i, respectively. As before, the method is evaluated using LPIPS, PSNR, and SSIM, with scores computed separately for the 0-15%, 15-30%, and 30-45% mask-ratio intervals.
Table 3 presents the results on the MaskCLP dataset. For GAN-based generative networks, the adversarial loss and reconstruction loss are indispensable, so we keep both and ablate only the remaining loss terms. The table shows that the model trained with the full joint loss achieves the best restoration results across all breakage ratios. In particular, at a breakage ratio of 30-45%, training SGRGAN w/o L_p decreases PSNR by 0.49%, decreases SSIM by 1%, and increases LPIPS by 5.19%.
Similarly, when SGRGAN is trained without L_s or without L_i, the model fails to adequately learn the structural information of the landscape painting, producing restored paintings whose structure is insufficiently reasonable and clear. These results demonstrate that the joint loss is effective for the proposed SGRGAN restoration method.

Comparison on dataset proportion
In addition, to examine how images of a different style in the training set affect the model, we conduct an ablation study by mixing 5%, 10%, and 15% mural data into the landscape painting training set. We then train and test the model to assess the impact of these style elements, again evaluating LPIPS, PSNR, and SSIM separately for the 0-15%, 15-30%, and 30-45% mask-ratio intervals. Table 4 presents the results on the MaskCLP dataset.
As shown in Table 4, incorporating a small amount of mural data causes only minimal fluctuation in the metrics, while the metrics change slightly as the proportion of mural data increases. Compared with training without mural data, at a mask ratio of 30-45% the PSNR decreases by 0.66%, the SSIM decreases by 2.2%, and the LPIPS decreases by 5.9%.
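A dataset-mixing ablation of this kind can be set up by sampling mural images and shuffling them into the landscape training set. The snippet below is one possible reading of the procedure. It assumes the percentage is taken relative to the size of the landscape set, which the text does not state. The function name and the fixed seed are hypothetical.

```python
import random


def mix_in_murals(landscape, mural, proportion, seed=0):
    """Return a shuffled training list of landscape items plus sampled mural items.

    ASSUMPTION: the number of mural items is `proportion` times the size of the
    landscape set (the text does not specify the reference size).
    """
    n_mural = round(proportion * len(landscape))
    if n_mural > len(mural):
        raise ValueError("not enough mural images for the requested proportion")
    rng = random.Random(seed)  # fixed seed for a reproducible split
    mixed = list(landscape) + rng.sample(list(mural), n_mural)
    rng.shuffle(mixed)
    return mixed
```

If the percentage instead refers to the share of murals in the final mixed set, n_mural would be round(p / (1 - p) * len(landscape)). The difference is small at 5-15%.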

Visualization
Through qualitative comparisons across mask ratios, sketch extraction methods, and module removals, we show that our method achieves satisfactory inpainting results and captures the details of traditional Chinese landscape paintings in the MaskCLP dataset.

Ablation analysis on mask ratio
To assess the impact of the mask ratio on the proposed method, we conducted separate experiments for different ratios. In Fig. 11, the mask ratios are 15%, 30%, and 45% from left to right.
Fig. 11 shows that the proposed method produces different results under different mask ratios. At a 15% mask ratio, the inpainting result is nearly perfect, with minimal difference from the original image. At a 30% mask ratio, the restoration is still good, and the texture and shape of the tree in the missing area are roughly restored. At a 45% mask ratio, the restored area remains generally consistent with the style and characteristics of the overall painting; however, the unique line texture of the rock and the vibrant form of the tree cannot be accurately restored. In general, our proposed method adapts well to different mask proportions and produces satisfactory inpainting results.
Fig. 11 Restoration results of SGRGAN with mask ratios of 15%, 30%, and 45% on MaskCLP (panels: GT, 15% Mask, 30% Mask, 45% Mask). As the simulated damaged area increases, the inpainting performance of SGRGAN gradually decreases, resulting in slight artifacts and distortions.

Ablation analysis on sketch-guided
To assess the impact of sketch guidance on the proposed method, we conducted additional ablation experiments using sketch images extracted by different methods. The results are presented in Fig. 12. The first row displays the grayscale sketch extracted from traditional Chinese painting, which retains grayscale information; the second row displays the edge map produced by the traditional Canny edge detector. Fig. 12 shows that, compared with SGRGAN without sketch guidance, introducing sketch guidance better preserves the structure and texture of the restored area and aligns it more closely with the surrounding context. Without sketch guidance, restored areas often exhibit blurriness and inconsistent structure and texture. Furthermore, using grayscale sketches extracted from traditional Chinese paintings as structural priors improves the texture of the restored rocks and trees, recovering mountain contours and tree and rock veins while keeping brushstrokes consistent with the original paintings. Conversely, using Canny edge maps as structural guidance leads to local structural discrepancies and distorted lines in the restored results.
The quantitative results in Table 2 show that PSNR, SSIM, and LPIPS all improve with sketch guidance compared with its absence. In summary, using sketch images as structural priors effectively mitigates the inconsistency between local and overall structure within the restored area, which contributes substantially to result quality and is reflected in better quantitative scores.

Ablation analysis on module
To evaluate the effectiveness of the proposed Focal block, BiSCCFormer block, and sketch guidance, we conducted ablation experiments in which each component is removed individually. To examine how our method reconstructs different elements of Chinese landscape paintings, we use masks to cover the mountain peaks and trees separately. As shown in Fig. 13, removing the BiSCCFormer block results in noticeable deformations and peculiar textures in the reconstructed peaks and trees, deviating from the texture and structure of the surrounding area. Similarly, removing the Focal block leads to artifacts and distorted lines in the reconstructed peaks and trees. With both the BiSCCFormer block and the Focal block, our method reconstructs the different elements of Chinese landscape paintings effectively. The quantitative results in Table 2 further corroborate the effectiveness of both blocks.

Ablation analysis on loss function
To assess the influence of the perceptual loss, style loss, and intermediate loss on training, we perform ablation experiments on the loss function. The results are shown in Fig. 14.
As shown in Fig. 14b, the result of SGRGAN w/o L_p, although consistent with its surroundings at the pixel level, appears blurry and flat at the texture level, failing to reproduce the rich detail and realistic texture of the original image. The result of SGRGAN w/o L_s (Fig. 14c) contains a few distorted lines, producing an effect inconsistent with the surrounding style. The result of SGRGAN w/o L_i (Fig. 14d) shows distorted structure and texture, undermining the structural and textural consistency of the restored object. These visualization results highlight the necessity of the joint loss employed in this paper.

Ablation analysis on dataset proportion
To verify how different proportions of images in another style affect the model, we conducted an ablation study by mixing 5%, 10%, and 15% mural data into the landscape painting training set. The model was then trained and tested, with the results shown in Fig. 15.
The visualization results in Fig. 15 reveal that as the proportion of mural data increases, the restoration results, influenced by the mural examples seen during training, exhibit characteristics inconsistent with the traditional aesthetics of landscape painting. After mixing in 15% mural data, the repaired image displays visual effects that are out of harmony with the overall style and color palette of landscape painting.

Comparison on mural
Our framework is evaluated on four distinct types of mural painting datasets, namely Tangka, Temple, Cave, and Burial, to verify its generalizability to other painting datasets. The visualization results are shown in Fig. 16.
Our model is also applicable to murals, as exemplified by the intricate details of the clothing in Fig. 16b. Our method also restores the texture details of the murals, achieving outstanding visual effects.

Comparison on CelebA and Places2
To validate the restoration capability of our model on public datasets, we conduct experiments on the CelebA and Places2 datasets. The results are illustrated in Figs. 17 and 18.
Our model performs consistently well on both faces and modern landscape photographs. For instance, the details of the nose in Fig. 17a and the floor in Fig. 18b show the model's ability to recover texture details in both scenarios. These results suggest that our approach also produces strong visual results on publicly available datasets.

Conclusion and discussion
In this paper, we introduce a series of innovative methods and contributions aimed at the complex and challenging task of restoring ancient Chinese landscape paintings. First, we establish a novel dataset tailored for ancient landscape painting restoration, with specialized mask annotations, termed MaskCLP. This dataset serves as a valuable resource for the research community, facilitating the advancement of deep-learning-based techniques for ancient painting restoration. We will release MaskCLP at https://github.com/Makbaka1/MaskCLP to encourage more researchers to explore this field and conduct experiments.
This study introduces sketches as a novel form of multimodal structural prior information. By effectively topic deserving in-depth investigation. Moreover, while the proposed scheme improves restoration considerably, integrating human expert knowledge with AI algorithms for better interaction is crucial for capturing the unique expression and personal style of the artist more accurately during restoration. Furthermore, future research may explore cross-media fusion techniques, such as incorporating textual histories or jointly processing multiple types of image data, to enhance the model's holistic understanding and generation capabilities.
Overall, the work presented in this paper furnishes a potent tool for leveraging modern technological advancements in safeguarding and perpetuating ancient Chinese cultural heritage, while also opening up new avenues for future explorations and applications of deep learning in the realm of cultural and artistic heritage preservation and revitalization.

Limitation
This paper presents a novel restoration method named SGRGAN, which restores ancient Chinese landscape paintings using sketches as guidance. Experimental results demonstrate that, compared with the other methods discussed in this paper, SGRGAN effectively accounts for the structural details and texture characteristics of landscape paintings during restoration, resulting in high-quality restoration of damaged paintings. However, this study has the following limitations. First, the number of surviving intact ancient Chinese landscape paintings is limited and unevenly distributed. The dataset we obtained from the Internet and various institutional platforms contains works from multiple historical periods, different painters, and a variety of artistic styles, but the amount of data in subcategories by dynasty, individual painter, and specific style is insufficient. Labeling the style, author, and dynasty of each painting requires expert annotation, which is labor-intensive, time-consuming, and costly. Therefore, we relax the scope of the restoration task in this paper and restore landscape paintings using only the overall screened and merged dataset, targeting general landscape painting restoration.
Second, the sketch-assisted landscape painting restoration strategy has yielded significant improvements. Compared with other inpainting methods, it not only achieves high-fidelity restoration but also renders restoration defects less perceptible to the human eye. However, relying solely on sketch images for guidance still has shortcomings. In subsequent research, we will therefore explore integrating more specific historical characteristics of landscape painting, the unique style of the painter, detailed explanations of the painting's connotations, and other background information to enhance the restoration process.
We have already begun refining the dataset, and in subsequent work we will expand the refined data and classify it more carefully by dynasty, painter identity, and painting style. This will allow us to advance accurate restoration techniques for landscape paintings in different subcategories and to explore how multimodal techniques can combine the contextual information of landscape paintings for effective restoration. These efforts will help deepen the protection and inheritance of traditional Chinese cultural heritage.
participated in several national research projects. His research interests include natural language processing, multimodal data mining, and applications. Relevant results have been published in Findings of ACL, Knowledge-Based Systems, ICANN, and MIDL.
Xianlin Peng
Xianlin Peng is currently an Associate Professor at the School of Art, Northwest University, Xi'an, China. His research interests include the digital transmission of traditional Chinese landscape paintings and the digital restoration of ancient murals, employing cutting-edge artificial intelligence technologies such as deep learning to provide intelligent solutions for the preservation, inheritance, and innovative development of ancient painting works.

Fig. 1 depicts three traditional Chinese paintings of different styles and historical periods, by Yuan Dong of the Five Dynasties and Ten Kingdoms period, Boju Zhao of the Southern Song Dynasty, and Hong Zhang of the Ming Dynasty. Throughout the history of Chinese painting, painters have employed distinct techniques, and each artwork has unique artistic characteristics. Yuan Dong of the Five Dynasties and Ten Kingdoms period specialized in the 'Pi Ma Cun' and 'Dian Zi Cun' techniques. Boju Zhao of the Southern Song Dynasty employed the meticulous 'Outline and Filling' technique to portray rocks and trees. The Ming Dynasty painter Hong Zhang focused on realism, employing dynamic and varied ink and brush techniques characterized by 'Cun, Ca, Dian, Ran', meaning 'hooking, texturing, dotting, and dyeing'.

Fig. 1
Fig. 1 Examples of paintings by different painters in MaskCLP. From left to right, (a) is 'River Dike Evening View' by Yuan Dong from the Five Dynasties and Ten Kingdoms period, (b) is 'Flying Immortals Painting' by Boju Zhao from the Southern Song Dynasty, and (c) is 'Painting Album of Landscape' by Hong Zhang from the Ming Dynasty

Fig. 2
Fig. 2 Example paintings from MaskCLP. From left to right, (a) is 'Scroll of Exquisite Colors Along Bamboo Path' by Xin Hong from the Qing Dynasty, (b) is 'Flying Immortals Painting' by Boju Zhao from the Southern Song Dynasty, (c) is 'Painting of Rain Amidst Brook and Mountain' by Zhen Wu from the Yuan Dynasty, (d) is 'Mountain Landscape Painting' by Qichang Dong from the Ming Dynasty, and (e) is 'Painting Album of Landscapes' by Daqian Zhang from the modern era

Fig. 3 Fig. 4
Fig. 3 Examples of real masks in MaskCLP. (a-e) are damage masks extracted from authentic damaged Chinese paintings. These authentic masks are applied to the ancient Chinese paintings shown in Fig. 2 to simulate damaged areas

Fig. 5
Fig. 5 The overall structure of SGRGAN. (a) Generator details: simultaneous processing and optimization of image structure information and texture features through a coupled encoder-decoder. (b) Discriminator details: a dual-stream discriminator estimates texture and structure feature statistics to distinguish restored images from real ones. (c) BiSCCFormer block: a novel module based on a Bi-level routing attention mechanism, facilitating a deeper and more comprehensive understanding of the intrinsic structure and semantic information within images. (d) Focal block: two skip-connection structures expand the receptive field of the feature maps, emphasizing feature learning in important areas while combining coarse-grained global information and fine-grained local features. (e) Feature Fusion block: deep fusion of the extracted texture and structural features

Fig. 6
Fig. 6 Restoration results of a traditional method on MaskCLP. SI denotes the simple interpolation method. Both SI and SGRGAN are used to restore simulated damaged landscape images with a mask ratio of 15%, and the figure shows the restoration achieved by each method. (a) Input images, (b) restored result of SI, (c) restored result of SGRGAN, (d) ground truth
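The caption does not specify which simple interpolation scheme SI uses. As an assumed stand-in, the sketch below fills each damaged pixel by linear interpolation along its row between the nearest known pixels, using `numpy.interp`; the function name and the mask convention (nonzero marks missing pixels) are hypothetical.

```python
import numpy as np


def simple_interpolation_inpaint(image, mask):
    """Row-wise linear interpolation inpainting (assumed SI stand-in).

    image: (H, W, C) float array; mask: (H, W), nonzero marks missing pixels.
    """
    hole = mask.astype(bool)
    out = image.astype(float).copy()
    cols = np.arange(image.shape[1])
    for r in range(image.shape[0]):
        missing = hole[r]
        known = ~missing
        if not missing.any() or not known.any():
            continue  # nothing to fill, or no known pixel in this row to interpolate from
        for c in range(image.shape[2]):
            # np.interp extends the edge values beyond the outermost known pixels
            out[r, missing, c] = np.interp(cols[missing], cols[known], out[r, known, c])
    return out
```

Because such a scheme only blends neighbouring colours, it cannot recreate brush strokes or contours inside the hole, which is the kind of gap a learned, structure-aware model is meant to close.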

Fig. 7
Fig. 7 Restoration results with a mask ratio of 0-15% on MaskCLP. Compared with other inpainting methods, SGRGAN restores the structure of trees and rivers more plausibly, produces clearer textures, and yields results that closely resemble the original image. (a) and (b) show ancient Chinese landscape painting images with a mask ratio of 0-15%
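The mask ratio used to group results here and in the following figures is the fraction of damaged pixels in the mask. A minimal sketch of computing it and assigning a mask to a group follows; the half-open bin edges and the treatment of the last group as "at least 45%" are assumptions, and the function names are hypothetical.

```python
import numpy as np

# Assumed half-open [low, high) intervals for the mask-ratio groups
BINS = [(0.00, 0.15, "0-15%"), (0.15, 0.30, "15-30%"), (0.30, 0.45, "30-45%")]


def mask_ratio(mask):
    """Fraction of pixels marked as damaged (nonzero) in the mask."""
    return float(np.count_nonzero(mask)) / mask.size


def mask_ratio_group(mask):
    ratio = mask_ratio(mask)
    for low, high, label in BINS:
        if low <= ratio < high:
            return label
    return ">=45%"  # assumed: everything from 45% upward forms the last group
```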

Fig. 8
Fig. 8 Restoration results with a mask ratio of 15-30% on MaskCLP. Compared with other inpainting methods, SGRGAN produces restored regions with fewer artifacts and results closer to the original image. (a) and (b) show ancient Chinese landscape paintings with a mask ratio of 15-30%

Fig. 9
Fig. 9 Restoration results with a mask ratio of 30-45% on MaskCLP. When the damaged area is large, the results of other methods often exhibit varying degrees of distortion and artifacts, whereas SGRGAN consistently generates high-quality restorations. (a) and (b) show ancient Chinese landscape paintings with a mask ratio of 30-45%

Fig. 10
Fig. 10 Restoration results with a mask ratio of 45% on MaskCLP. When inpainting regions with complex textures, RFR, MISF, CTSDG, and other methods exhibit significant artifacts, whereas SGRGAN produces results free of noticeable artifacts. (a) and (b) show ancient Chinese landscape paintings with a mask ratio of 45%

Based on the comprehensive experimental results, an imbalanced distribution of sample types in the training set can confuse the model's conceptual space during training. Consequently, the model struggles to learn the features of the different data distributions, and the evaluation metrics decline to varying degrees.

Fig. 12 Fig. 13 Fig. 14
Fig. 12 Restoration results of the sketch-guidance ablation experiment. (a) Input image, (b) restored sketch: the grayscale sketch predicted by our method, where the yellow regions are the restored sketch contours, (c) SGRGAN w/o sketch-guided, (d) SGRGAN, (e) ground truth
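Panel (b) combines the observed sketch with the strokes predicted inside the damaged region and marks the restored contours in yellow. One way to produce such a visualization is sketched below, assuming sketches are (H, W) arrays in [0, 1] with strokes near 1, that observed strokes are kept outside the hole, and that a fixed threshold decides what counts as a stroke. These conventions and the helper name are assumptions, not details given in the paper.

```python
import numpy as np


def compose_and_highlight(known_sketch, predicted_sketch, mask, stroke_threshold=0.5):
    """Merge sketches and colour the restored strokes yellow (hypothetical helper).

    known_sketch, predicted_sketch: (H, W) floats in [0, 1], strokes near 1 (assumed).
    mask: (H, W), nonzero marks the damaged region.
    """
    hole = mask.astype(bool)
    # Keep observed strokes outside the hole and predicted strokes inside it
    merged = np.where(hole, predicted_sketch, known_sketch).astype(float)
    vis = np.repeat(merged[..., None], 3, axis=-1)  # grayscale -> RGB
    restored_strokes = hole & (merged >= stroke_threshold)
    vis[restored_strokes] = (1.0, 1.0, 0.0)  # yellow marks strokes filled in by the model
    return merged, vis
```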

Fig. 15 Fig. 16
Fig. 15 Restoration results with varying proportions of mural images. This ablation evaluates how incorporating paintings of different styles into the training dataset affects SGRGAN's restoration performance
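The ablation varies the share of mural images in the training set. A minimal sketch of building such a mixed training list is shown below, assuming two lists of image paths and a fixed total size; the function name, the seeding, and sampling without replacement are illustrative assumptions rather than the paper's procedure.

```python
import random


def mix_training_set(landscape_paths, mural_paths, mural_fraction, total, seed=0):
    """Draw `total` training images, of which about `mural_fraction` are murals."""
    if not 0.0 <= mural_fraction <= 1.0:
        raise ValueError("mural_fraction must be between 0 and 1")
    n_mural = round(total * mural_fraction)
    n_landscape = total - n_mural
    if n_mural > len(mural_paths) or n_landscape > len(landscape_paths):
        raise ValueError("not enough images for the requested mix")
    rng = random.Random(seed)  # seeded so each proportion setting is reproducible
    mixed = rng.sample(landscape_paths, n_landscape) + rng.sample(mural_paths, n_mural)
    rng.shuffle(mixed)  # avoid long runs of a single style within an epoch
    return mixed
```

For example, `mix_training_set(landscapes, murals, 0.3, 50)` returns 50 distinct paths, 15 of them murals.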

Fig. 17 Fig. 18
Fig. 17 Visual comparison on CelebA. SGRGAN also applies to inpainting damaged modern face images and achieves favorable restoration results. (a) shows the input image with 0-15% face damage, the result restored by SGRGAN, and the original image. (b) and (c) follow the same layout and differ only in the damage ratio of the input image: 30-45% in (b) and 15-30% in (c)

Table 1
Comparison results on MaskCLP. ↑ Higher values are better; ↓ lower values are better. *Best results are shown in bold, and second-best results are underlined

Table 2
Ablation experiments on MaskCLP. We compare SGRGAN with SGRGAN w/o sketch-guided, SGRGAN w/o Focal, and SGRGAN w/o BiSCCFormer. ↑ Higher values are better; ↓ lower values are better. *Best results are shown in bold, and second-best results are underlined

Table 3
Ablation experiments on the loss function. We conduct ablation studies on the reconstruction loss, perceptual loss, style loss, and intermediate loss, respectively. ↑ Higher values are better; ↓ lower values are better. *Best results are shown in bold, and second-best results are underlined
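For readers unfamiliar with these loss terms, the sketch below shows common definitions used in inpainting work: an L1 reconstruction loss, a perceptual loss as the L1 distance between feature maps, and a style loss as the L1 distance between Gram matrices of those feature maps. The feature maps are passed in directly (in practice they usually come from a pretrained network such as VGG), the weights are placeholder values rather than the paper's, and the intermediate loss is omitted because its definition is not given here.

```python
import numpy as np


def l1(a, b):
    return float(np.mean(np.abs(a - b)))


def gram(feat):
    """(C, H, W) feature map -> (C, C) Gram matrix normalized by C*H*W."""
    c, h, w = feat.shape
    f = feat.reshape(c, h * w)
    return f @ f.T / (c * h * w)


def perceptual_loss(feats_out, feats_gt):
    # L1 distance between corresponding feature maps, summed over layers
    return sum(l1(fo, fg) for fo, fg in zip(feats_out, feats_gt))


def style_loss(feats_out, feats_gt):
    # L1 distance between Gram matrices, which capture texture statistics
    return sum(l1(gram(fo), gram(fg)) for fo, fg in zip(feats_out, feats_gt))


def total_loss(output, target, feats_out, feats_gt, weights=(1.0, 0.1, 250.0)):
    """Weighted sum of the three terms; the default weights are placeholders."""
    w_rec, w_perc, w_style = weights
    return (w_rec * l1(output, target)
            + w_perc * perceptual_loss(feats_out, feats_gt)
            + w_style * style_loss(feats_out, feats_gt))
```

Dropping one term corresponds to setting its weight to zero, which is one straightforward way to run a loss ablation of the kind reported in the table.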

Table 4
Ablation experiments on different datasets. We conduct ablation studies on the proportions of the training datasets. ↑ Higher values are better; ↓ lower values are better. *Best results are shown in bold, and second-best results are underlined

Jinye Peng received his MS degree in computer science from Northwest University, Xi'an, China, in 1996 and his PhD degree from Northwestern Polytechnical University, Xi'an, China, in 2002. He joined Northwestern Polytechnical University as a full professor in 2006. His research interests include image retrieval, face recognition, and machine learning.

Jianping Fan is a professor at Northwest University, China. He received his MS degree in theoretical physics from Northwest University, Xi'an, China, in 1994 and his PhD degree in optical storage and computer science from the Shanghai Institute of Optics and Fine Mechanics, Chinese Academy of Sciences, Shanghai, China, in 1997. He was a researcher at Fudan University, Shanghai, China, during 1997-1998. From 1998 to 1999, he was a researcher with the Japan Society for the Promotion of Science (JSPS), Osaka University, Japan. From 1999 to 2001, he was a postdoctoral researcher in the Department of Computer Science, Purdue University, West Lafayette, IN. His research interests include image/video privacy protection, automatic image/video understanding, and large-scale deep learning.