
Incomplete handwritten Dongba character image recognition by multiscale feature restoration

Abstract

Incomplete handwritten Dongba characters often appear in heritage documents, and their recognition is significant for heritage preservation and philology. However, previous methods all assume that a complete Dongba character is given as input, and thus fail to achieve satisfactory performance when applied to incomplete Dongba character recognition. In this paper, an end-to-end network (DB2RNet) is proposed for incomplete handwritten Dongba character image recognition by multiscale feature restoration. Specifically, we first develop datasets that contain different levels of incomplete Dongba characters. A restoration module is proposed to restore the input incomplete Dongba character, and a recognition module is then employed to recognize it. By introducing an inter-module residual connection between the restoration module and the recognition module, DB2RNet strengthens feature information transmission and boosts recognition performance. In addition, novel multiscale feature blocks are introduced, which provide more effective texture and contextual information transmission for Dongba character image restoration, thus yielding better restoration effects and better recognition results. Extensive experiments conducted on Dongba character, Chinese character and Oracle character datasets validate the effectiveness, superiority and robustness of our method. Experimental results demonstrate that the proposed DB2RNet achieves competitive Dongba character restoration and recognition performance and outperforms current state-of-the-art methods.

Introduction

Dongba characters are hieroglyphic characters created by the ancestors of the Naxi ethnic minority in China. The Dongba script is recognized as the only hieroglyphic writing system still in use in the world today, and it is a cultural treasure of great academic value [1]. It has been listed in the "Memory of the World" register by the United Nations Educational, Scientific and Cultural Organization [2]. However, due to the poor preservation conditions of ancient Dongba heritage documents, many documents have been damaged and many Dongba characters are broken. Figure 1 shows ten incomplete Dongba characters from an ancient heritage document. All previous Dongba character recognition methods assume that a complete Dongba character is given as input; incomplete Dongba character recognition is more difficult, and no relevant research has been reported yet. Therefore, in this paper, we focus on incomplete Dongba character recognition by multiscale feature restoration.

Fig. 1

Ten incomplete Dongba characters from an ancient heritage document. The strokes enclosed within the red rectangle constitute an example of an incomplete Dongba character

Many complete handwritten Dongba character recognition (HDCR) algorithms have been introduced to improve recognition accuracy. Traditional methods typically encompass three distinct stages: preprocessing, feature extraction and recognition. Binarization, shape normalization and the wavelet transform [3] are commonly employed preprocessing methods [4]. A combination of topological feature processing and a projection method shows good performance on HDCR [5]. Statistical features such as the coarse grid [6] are more discriminative than structural features, which mainly analyze parts of Dongba characters. The most commonly adopted classifiers include support vector machines (SVM) [7,8,9], random forests [10,11,12] and k-nearest neighbors [13, 14].

With the rise of deep learning, a variety of deep learning based methods have been designed to solve heritage recognition problems [15, 16] as well as HDCR problems [17, 18]. A multi-scale feature fusion method with three separate branch networks has been proposed for HDCR [17]. Improved ResNets [19, 20] have been proposed to handle HDCR problems. Luo et al. [21] demonstrate that features of the shallower layers include both more detailed and more spatial information, which is very helpful for recognizing similar Dongba characters.

All the above-mentioned studies concentrate on complete handwritten Dongba character recognition, and complete HDCR has achieved much progress in recent years. However, these recent algorithms are specialised for perfect/complete characters that are typically created by imitating characters in authoritative dictionaries [22, 23]. On the contrary, most real Dongba characters in ancient heritage documents exhibit incompletion. As far as we know, no recognition methods have been applied to incomplete handwritten Dongba characters. As Fig. 1 shows, these incomplete characters lack crucial information, so existing image recognition methods do not perform well on them. It is natural to expect that they would be easier to recognize if the incomplete Dongba characters were first restored.

Although neural network based complete Dongba character recognition models have made progress in the past years, no research has paid attention to incomplete Dongba character recognition, let alone incomplete Dongba character restoration. Consequently, we propose to recognize incomplete handwritten Dongba characters through an attached restoration procedure within an end-to-end network. Nevertheless, because no public incomplete Dongba character image dataset is currently available, incomplete handwritten Dongba character restoration and recognition are challenging. Therefore, this paper begins by constructing novel incomplete handwritten Dongba character image datasets from a complete handwritten Dongba character dataset [21] and the most widely used random mask dataset, which contains images with different mask ratios [24].

A natural idea is to first obtain the completed Dongba character through a restoration model, and then recognize it with a recognition model using the restored image. Although no incomplete Dongba character recognition and restoration methods have been proposed, Wan et al. [25], Liu et al. [26] and Chen et al. [27] have made progress with end-to-end networks for the sketch restoration-to-recognition and vehicle logo image restoration-to-recognition problems, which provides ideas for solving the incomplete Dongba character restoration-to-recognition problem.

SketchGAN [26] uses a cascaded encoder-decoder network to complete the input sketch in an iterative manner, and employs an auxiliary sketch recognition task to recognize the completed sketch. Joint-Caps-Net [27] proposes a joint framework that simultaneously performs vehicle logo restoration and recognition within a shared deep neural network architecture. ADFRNet [25] restores the imperfect sketch with a proposed attention-based feedback restoration loop, and then sends it to a classifier. Although [25,26,27] have made progress on the joint image restoration and recognition problem, they still ignore two significant issues that have great impact on end-to-end image restoration-to-recognition.

On the one hand, they feed only the restored images into the recognition model, leading to inadequate transmission of feature information and poor recognition performance. It has been widely demonstrated that residual connections can strengthen feature propagation [28,29,30]. To solve this problem, we propose a novel end-to-end incomplete Dongba character image recognition method by multiscale feature restoration (DB2RNet), which boosts the performance of the recognition model. As shown in Fig. 2, DB2RNet feeds not only the restored image but also the initial incomplete Dongba character into the recognition model. This residual connection between the restoration module and the recognition module (inter-module residual connection) strengthens feature information transmission and boosts recognition performance.

Fig. 2

Brief overview of the proposed DB2RNet. The incomplete input and its restored version are concatenated, forming a residual connection structure (inter-module residual connection) that boosts the performance of the recognition module. The orange blocks, green blocks and purple line represent the brief structure and position of the multiscale feature restoration module, the recognition module and the inter-module residual connection, respectively. Their detailed structures are described in Sects. "Multiscale feature restoration module" and "Inter-module residual connection and recognition module"

On the other hand, incomplete handwritten Dongba characters lack texture and contextual information, and are commonly known to be ambiguous to restore. Moreover, it has been demonstrated that Dongba characters carry more detailed information [21]. Therefore, it is necessary to extract both more detailed and more global information to restore texture and contextual information. It is experimentally validated that the features of deeper layers contain more global and semantic information, while the features of shallower layers contain more spatial and detailed information [21, 31]. Therefore, a novel multiscale feature block (MFB) is proposed in the restoration module, which makes use of the different-sized output feature maps of different encoders and fuses them into Dongba character features that contain multiscale information. The MFB provides rich fused information for the following decoder layers and enables more effective texture and contextual information transmission for Dongba character image restoration, yielding better restoration effects and better recognition results.

To evaluate the superiority of the proposed DB2RNet at recognizing and restoring incomplete Dongba characters, extensive experimental studies have been carried out. Ablation studies and comparisons with other state-of-the-art methods are conducted on the constructed datasets, and sufficient qualitative and quantitative results demonstrate the superiority of the proposed method. In summary, our novel contributions are as follows:

  1) To the best of our knowledge, we are the first to solve the problem of incomplete Dongba character recognition by restoration.

  2) A novel end-to-end network architecture named DB2RNet is proposed, which handles both incomplete Dongba character image recognition and restoration, with the restoration task benefiting the recognition task.

  3) An inter-module residual connection is proposed to boost recognition performance, and a multiscale feature restoration module is proposed to extract more texture and contextual information for restoration.

  4) Experiments are conducted on incomplete Dongba character datasets with four different mask ratios, as well as on the CASIA-HWDB1.1 and OBC306 datasets. Experimental results demonstrate the superiority and robustness of our DB2RNet in both incomplete Dongba character recognition and restoration.

Methods

In this study, we mainly focus on two significant problems. On the one hand, previous end-to-end image restoration-to-recognition methods feed only the restored image into the recognition module, overlooking the importance of the initial incomplete image, which leads to poor recognition performance. Therefore, we propose an inter-module residual connection: by concatenating the initial incomplete Dongba character and its restored version, better recognition results are accomplished. On the other hand, incomplete handwritten Dongba characters lack texture and contextual information and are commonly known to be ambiguous to restore. Therefore, a novel multiscale feature block is adopted in the restoration module, which makes use of the different-sized output feature maps of different encoders and fuses these feature maps into Dongba character features that contain multiscale information.

In general, an incomplete handwritten Dongba character image recognition method based on multiscale feature restoration (DB2RNet) is proposed, which addresses both the shortcomings of previous end-to-end restoration-to-recognition methods and the lack of texture and contextual information.

The architecture of DB2RNet

The proposed DB2RNet is mainly composed of two components, i.e., a restoration module and a recognition module, as shown in Fig. 2. DB2RNet takes an incomplete Dongba character as input, restores the corresponding complete Dongba character, and outputs its recognition result. To be specific, the restoration module restores the complete Dongba character from the original incomplete input character. The recognition module recognizes the character from the concatenation of the original input and the output of the restoration module. The detailed structures of both modules are given in the following sections.
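To make the data flow concrete, the following minimal PyTorch sketch wires the two modules together. The internal layers of `restoration` and `recognition` are placeholders here (their actual configurations are given in Tables 1 and 2); only the concatenation pattern of Eqs. (3) and (4) is taken from the paper.

```python
import torch
import torch.nn as nn

class DB2RNet(nn.Module):
    """Skeleton of DB2RNet: restoration followed by recognition, joined
    by the inter-module residual connection (channel-wise concatenation)."""

    def __init__(self, restoration: nn.Module, recognition: nn.Module):
        super().__init__()
        self.restoration = restoration  # F_res: encoder-decoder with MFBs (Table 1)
        self.recognition = recognition  # F_rec: CNN classifier with In c = 2 (Table 2)

    def forward(self, x):
        # x: (B, 1, H, W) incomplete Dongba character image
        r_res = self.restoration(x)        # Eq. (3): restored complete character
        z = torch.cat([x, r_res], dim=1)   # inter-module residual connection
        y = self.recognition(z)            # Eq. (4): class logits over 1404 classes
        return r_res, y
```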

Multiscale feature restoration module

The restoration module plays a significant role in DB2RNet: better restoration quality makes Dongba character recognition easier. The architecture of the restoration module is shown in Fig. 3.

Fig. 3

Overview of the multiscale feature restoration module of the proposed DB2RNet. \(\textrm{MFB}_\textit{i}\) (\(\textit{i} \in {\{1, 2, 3\}}\)) indicates the multiscale feature block shown in Fig. 4. IN represents instance normalization

Due to the lack of texture and contextual information in incomplete handwritten Dongba characters, their restoration is often ambiguous. Consequently, extracting more detailed and global information is crucial for restoring both texture and contextual details. Experimental evidence suggests that deeper layer features encompass more global and semantic information, while shallower layer features contain both spatial and finer details [21, 31].

Therefore, a Multiscale Feature Block (MFB) (illustrated in Fig. 4) is introduced. This block leverages the varying sizes of output feature maps from different encoders and merges these three feature maps to extract Dongba character features that encompass multiscale information. By providing decoders with rich texture and contextual information, MFB enables the transmission of more effective restoration features, ultimately leading to improved restoration effects and recognition outcomes. The various output feature maps from the three encoders can be computed as follows:

$$\begin{aligned} f_1^{en}&= F_1^{en}(x) \\ f_2^{en}&= F_2^{en}(f_1^{en}) \\ f_3^{en}&= F_3^{en}(f_2^{en}) \end{aligned}$$
(1)

where \(\textit{x}\) is the input incomplete Dongba character and \(f_i^{en}\) (\(\textit{i} \in {\{1, 2, 3\}}\)) refers to the output of \(\textrm{Encoder}_\textit{i}\) (\(\textit{i} \in {\{1, 2, 3\}}\)), and \(F_i^{en}(x)\) (\(\textit{i} \in {\{1, 2, 3\}}\)) is the function of \(\textrm{Encoder}_\textit{i}\).

Fig. 4
figure 4

Overview of the multiscale feature block in Fig. 3. “Conv-kxsy” represents a convolutional kernel size of x and a stride of y. “Upscalez” indicates that the scale of the feature map is enlarged by a factor of z compared to its current size. “InsNorm” is the Instance Normalization. “ReLU” is the ReLU activation function

Specifically, the MFB proposed in this paper integrates features of varying sizes from three encoder layers to capture rich multiscale information for Dongba characters. Larger feature maps typically encode finer details such as corners and edges, while smaller feature maps capture more global semantic information like overall shape. Each \(\textrm{MFB}_\textit{i}\) (\(\textit{i} \in {\{1, 2, 3\}}\)) combines the output features from these three encoders, resulting in multiscale fusion features that encompass both detailed and global information.

Due to the varying feature sizes outputted by the encoder layers, we have designed three distinct \(\textrm{MFB}_\textit{i}\) (\(\textit{i} \in {\{1, 2, 3\}}\)) to ensure that the fused features align with the feature size of the corresponding decoder layer. Each \(\textrm{MFB}_\textit{i}\) receives three differently-sized feature maps from the encoders as inputs and produces fused features. This process can be expressed as follows:

$$\begin{aligned} f_i^{mfb} = F_i^{MFB}(f_1^{en}, f_2^{en}, f_3^{en}) \end{aligned}$$
(2)

where \(f_i^{mfb}\) is the output of \(\textrm{MFB}_\textit{i}\) (\(\textit{i} \in {\{1, 2, 3\}}\)), and \(f_1^{en}\), \(f_2^{en}\), \(f_3^{en}\) are computed by Eq. (1).

Here, we delve into the detailed calculation process of \(\mathrm {MFB_3}\) as an illustrative example, as depicted in the left structure of Fig. 4. Initially, we adjust the sizes of all features \(f_1^{en}\), \(f_2^{en}\) and \(f_3^{en}\) to match the target size of \(f_3^{en}\). Specifically, \(f_1^{en}\) undergoes two convolutional layers with a kernel size of 3 and a stride of 2, resulting in a size reduction to 1/4. We apply instance normalization and the ReLU activation function after the last convolutional layer. Similarly, \(f_2^{en}\) passes through a single convolutional layer with a kernel size of 3 and a stride of 2, halving its size. On the other hand, \(f_3^{en}\) remains unchanged after passing through a convolutional layer with a kernel size of 3 and a stride of 1. Next, we concatenate the transformed features. Finally, we obtain the output \(f_3^{mfb}\) by applying a convolutional layer with a kernel size of 3 and a stride of 1 to the concatenated features. As with the previous layers, we employ instance normalization and the ReLU activation function. The calculation processes for \(\mathrm {MFB_2}\) and \(\mathrm {MFB_1}\) are analogous to that of \(\mathrm {MFB_3}\) but differ in the specific operations used for feature size transformation. For instance, in some cases, we might employ a combination of a convolutional layer with a kernel size of 3 and a stride of 1 followed by an upsampling operation. By following this process, we ensure that the fused features not only align with the corresponding decoder layer’s feature size but also encapsulate multiscale information crucial for effective Dongba character restoration.
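A sketch of \(\mathrm {MFB_3}\) following the description above may look as follows. The channel counts `c1`, `c2`, `c3` are assumptions for illustration, and placing instance normalization and ReLU only after the last convolution of the first branch reflects our reading of the text; the exact configuration is given in Table 1 and Fig. 4.

```python
import torch
import torch.nn as nn

class MFB3(nn.Module):
    """Sketch of MFB_3: fuses f1 (large), f2 (medium) and f3 (small)
    encoder features at the spatial size of f3."""

    def __init__(self, c1=64, c2=128, c3=256):
        super().__init__()
        # f1 -> 1/4 size: two stride-2 convs, IN + ReLU after the last one
        self.branch1 = nn.Sequential(
            nn.Conv2d(c1, c3, kernel_size=3, stride=2, padding=1),
            nn.Conv2d(c3, c3, kernel_size=3, stride=2, padding=1),
            nn.InstanceNorm2d(c3), nn.ReLU(inplace=True),
        )
        # f2 -> 1/2 size: one stride-2 conv
        self.branch2 = nn.Conv2d(c2, c3, kernel_size=3, stride=2, padding=1)
        # f3 -> unchanged size: one stride-1 conv
        self.branch3 = nn.Conv2d(c3, c3, kernel_size=3, stride=1, padding=1)
        # fuse the concatenated maps with a stride-1 conv + IN + ReLU
        self.fuse = nn.Sequential(
            nn.Conv2d(3 * c3, c3, kernel_size=3, stride=1, padding=1),
            nn.InstanceNorm2d(c3), nn.ReLU(inplace=True),
        )

    def forward(self, f1, f2, f3):
        cat = torch.cat(
            [self.branch1(f1), self.branch2(f2), self.branch3(f3)], dim=1)
        return self.fuse(cat)  # f3^mfb, matched to the decoder's feature size
```

For \(\mathrm {MFB_2}\) and \(\mathrm {MFB_1}\), the stride-2 convolutions would be replaced where necessary by stride-1 convolutions followed by upsampling, as described above.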

The multiscale feature block (MFB) proposed in this paper is designed to fuse the output features of the three encoders, enabling the extraction of both detailed and global information. This fusion process aims to restore texture and contextual information effectively, crucial for Dongba character image restoration and recognition. By leveraging multiscale features, the MFB ensures a more comprehensive understanding of the input image, thus providing more effective information transmission during the restoration process. This enhanced information flow is pivotal in facilitating accurate recognition of Dongba characters.

The detailed structure and hyperparameters of the proposed restoration module are shown in Table 1. We use Input s&c and Output s&c to represent the input and output size and channel of the feature map of the current layer. Ref p is the padding applied before the current Conv, and Up s indicates up-sampling. K, S, and P denote the kernel size, stride, and padding of operators, respectively. Norm indicates whether an instance normalization layer is used after each convolution layer. Activation indicates the nonlinear function after the layer. At the DeConv1 layer, we use a Tanh activation to map each pixel of the output feature map to [0, 1].

Table 1 Detailed structure and hyperparameters of the proposed restoration module

Inter-module residual connection and recognition module

Although previous methods [25,26,27] have made progress on the joint image restoration and recognition problem, they overlook the importance of the original incomplete image in the recognition stage: they feed only the restored image into the recognition model, leading to inadequate transmission of feature information and poor recognition performance. It has been widely demonstrated that residual connections can strengthen feature propagation [28,29,30]. Therefore, to solve this problem, we propose a novel inter-module residual connection, which assists the recognition module in achieving better performance.

Specifically, the input incomplete Dongba character is first fed into the restoration module, and a restored version of the input character is obtained, which can be computed as:

$$\begin{aligned} {R}_{res} = F_{res}(x) \end{aligned}$$
(3)

where \(\textit{x}\) is the input incomplete Dongba character, and \({R}_{res}\) refers to the restored character from the restoration module \(F_{res}(\cdot )\). Then, the original incomplete character \(\textit{x}\) is concatenated with the restored character \({R}_{res}\) along the channel dimension, and the concatenated feature map is fed to the recognition module, which can be computed as:

$$\begin{aligned} y = F_{rec}(\textrm{Concat}({x}, {{R}}_{{res}})) \end{aligned}$$
(4)

where Concat(\(\cdot\)) stands for the operation of concatenating along the channel dimension, and \(\textit{y}\) refers to the predicted label of the input incomplete Dongba character \(\textit{x}\) from the recognition module \(F_{rec}(\cdot )\).

The detailed structure and hyperparameters of the recognition module are shown in Table 2. The recognition module structure is inspired by the basic network of MAAN [21], and it should be noted that any recognition model can be employed in our DB2RNet; a recognition model with better standalone performance will also perform better within DB2RNet. Note also that In c = 2 at Conv1 in Table 2 corresponds to the concatenated feature map of the original incomplete character and the final output of the restoration module, implemented by the inter-module residual connection.
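A quick sanity check of this concatenation for 1-channel character images (the spatial size of 64 × 64 is an assumption for illustration):

```python
import torch

x = torch.rand(8, 1, 64, 64)      # batch of incomplete characters
r_res = torch.rand(8, 1, 64, 64)  # restored characters from the restoration module
z = torch.cat([x, r_res], dim=1)  # channel-wise concatenation
print(z.shape)                    # torch.Size([8, 2, 64, 64]) -> "In c = 2" at Conv1
```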

Table 2 Detailed structure and hyperparameters of the recognition module. Meanings of the abbreviations are the same as in Table 1

Loss functions

Loss functions are a significant factor in training an end-to-end model that simultaneously handles incomplete Dongba character image recognition and restoration tasks. We utilize the following loss functions for training the proposed end-to-end DB2RNet.

MAE loss: With the help of the restoration module, we restore a complete version of the Dongba character, \({R}_{res}\). To enforce the restoration module generating the expected complete Dongba character, we use the mean absolute error (MAE) between the ground truth character and the generated character as a loss function, which can be calculated as:

$$\begin{aligned} {\mathcal {L}}_{MAE}=\Vert {R}_{res}-{R}\Vert _{1} \end{aligned}$$
(5)

where \({R}_{res}\) is the restored character from the restoration module. R represents the ground truth Dongba character.

SSIM loss: MAE loss is based on pixel-by-pixel comparisons, without considering human visual perception, let alone human aesthetics, so the quality of the restored character needs to be further improved. Therefore, we also adopt an SSIM loss to encourage perceptually realistic results, which can be calculated as:

$$\begin{aligned} {\mathcal {L}}_{SSIM} = 1 - SSIM({R}_{res},{R}) \end{aligned}$$
(6)

SSIM increases with similarity and has a range of [0, 1]; therefore, 1 − SSIM can be used as a loss function.

Recognition loss: The input to the recognition module is the restored Dongba character together with its corresponding incomplete version. To be more specific, the restored and incomplete Dongba characters are first concatenated along the channel dimension, and then the concatenated feature maps are fed to the recognition module. Just as the original incomplete Dongba character input plays a significant role in the restoration module, it also has a substantial impact on the recognition module. The recognition loss is the softmax loss:

$$\begin{aligned} {\mathcal {L}}_{REC}= -\sum _{i=1}^{K} y_i \log (p_i) \end{aligned}$$
(7)

where K denotes total categories of dataset, \(\textit{y}_i\) stands for the label, and \(\textit{p}_i\) is the predicted probability.

Total loss: The three loss functions defined above play diverse roles in the whole DB2RNet. Specifically, the MAE and SSIM losses aim to enhance the quality of the restored Dongba character, and the recognition loss guarantees learning significant features for recognition. We combine the three functions in a weighted fashion to obtain the final loss function, which can be computed as:

$$\begin{aligned} {\mathcal {L}}_{TOTAL}= \alpha {\mathcal {L}}_{MAE} + \beta {\mathcal {L}}_{SSIM} + \gamma {\mathcal {L}}_{REC} \end{aligned}$$
(8)

where \(\alpha\), \(\beta\), \(\gamma\) are hyperparameters weighting the different loss functions. We empirically set \(\alpha = 0.5\), \(\beta = 0.35\), \(\gamma = 0.15\), which gives good performance.
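A minimal sketch of the total loss in Eq. (8), assuming 1-channel image tensors in [0, 1]. The SSIM here uses per-image global statistics, as written in Eq. (10) below; the actual implementation may use the standard windowed SSIM instead.

```python
import torch
import torch.nn.functional as F

def ssim_global(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    # Simplified per-image SSIM using global means/variances (cf. Eq. 10).
    mu_a, mu_b = a.mean(dim=(1, 2, 3)), b.mean(dim=(1, 2, 3))
    var_a = a.var(dim=(1, 2, 3), unbiased=False)
    var_b = b.var(dim=(1, 2, 3), unbiased=False)
    cov = ((a - mu_a[:, None, None, None]) *
           (b - mu_b[:, None, None, None])).mean(dim=(1, 2, 3))
    s = ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
        ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))
    return s.mean()

def total_loss(r_res, r_gt, logits, labels,
               alpha=0.5, beta=0.35, gamma=0.15):
    l_mae = F.l1_loss(r_res, r_gt)            # Eq. (5): MAE loss
    l_ssim = 1.0 - ssim_global(r_res, r_gt)   # Eq. (6): SSIM loss
    l_rec = F.cross_entropy(logits, labels)   # Eq. (7): softmax loss
    return alpha * l_mae + beta * l_ssim + gamma * l_rec  # Eq. (8)
```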

Datasets and evaluation metrics

Datasets: All previous methods focus on complete/perfect handwritten Dongba character recognition, and no Dongba character restoration methods have been proposed [5, 17, 18, 21]. However, many incomplete Dongba characters exist in ancient Dongba heritage documents. Therefore, we construct incomplete Dongba character datasets to enable research on incomplete Dongba character recognition and restoration.

We construct the incomplete Dongba character datasets based on the complete Dongba character database [21], which consists of 445,273 images in 1,404 classes. Dongba character instances are shown in the first row of Fig. 5. Mask images of different ratios were first proposed in [24], and most image inpainting methods adopt them. To make our method more credible and convincing, we use 4 categories of irregular masks with different hole-to-image area ratios [24]. The four mask ratio ranges, (0.01, 0.1], (0.1, 0.2], (0.2, 0.3] and (0.3, 0.4], are shown in the second, fourth, sixth and eighth rows of Fig. 5, respectively. The third, fifth, seventh and ninth rows show the corresponding four ratios of incomplete Dongba characters. The incomplete Dongba characters are generated by element-wise addition of the ground truth images and the mask images: the white area of a mask obscures the corresponding ground truth region, while the black area leaves it unchanged, as sketched below. Evidently, characters at larger incompletion levels lose more information and are more difficult to restore and recognize. Since all Dongba characters in the character database and all masks in the mask database differ from each other, the generated incomplete Dongba characters are all distinct, which makes the experimental results convincing.
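Under the stated convention (images in [0, 1] with white backgrounds and dark strokes, masks with white holes), this construction reduces to a clamped element-wise addition; a minimal sketch:

```python
import torch

def make_incomplete(gt, mask):
    """Combine a ground-truth character with an irregular mask [24].

    White mask pixels (1.0) saturate the region to white, erasing the
    strokes underneath; black mask pixels (0.0) leave the character
    unchanged. Both tensors are assumed to be in [0, 1].
    """
    return torch.clamp(gt + mask, 0.0, 1.0)
```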

Fig. 5

Examples of complete handwritten Dongba characters [21] (the first row), four different mask ratio images [24] (the second, fourth, sixth and eighth rows), and our constructed four different mask ratio incomplete Dongba characters (the third, fifth, seventh and ninth rows)

Evaluation metrics: We use recognition accuracy to evaluate recognition performance. To evaluate restoration performance, the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) index are computed for each complete/perfect Dongba character against its corresponding incomplete and restored versions. If the PSNR and SSIM of the restored Dongba character are higher than those of the original incomplete character, and the accuracy exceeds the direct recognition accuracy, the proposed DB2RNet both improves image quality and benefits recognition of incomplete Dongba characters.

On the one hand, PSNR is an error-sensitive image quality evaluation index that reflects the pixel-level differences between the images before and after restoration. It can be computed as:

$$\begin{aligned} PSNR = 10 \cdot \text {lg} \left(\frac{MAX_I^2}{MSE}\right) \end{aligned}$$
(9)

where MSE stands for mean squared error, which represents the average squared difference between two images. \(MAX_I\) indicates the maximum numerical value of the image colors.
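Eq. (9) translates directly into code; a minimal NumPy version, assuming images scaled to [0, 1]:

```python
import numpy as np

def psnr(a, b, max_i=1.0):
    # Eq. (9): PSNR = 10 * log10(MAX_I^2 / MSE)
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_i ** 2 / mse)
```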

SSIM, on the other hand, is used to measure the similarity in brightness, contrast, and structure between two images, which can be computed as:

$$\begin{aligned} SSIM = \frac{(2\mu _x \mu _y + c_1)(2\sigma _{xy}+c_2)}{(\mu _x^2 + \mu _y^2 + c_1)(\sigma _x^2 + \sigma _y^2 + c_2)} \end{aligned}$$
(10)

where \(\mu _x\) and \(\mu _y\) represent the mean values of images x and y, respectively. \(\sigma _x\) and \(\sigma _y\) stand for the standard deviations of images x and y, respectively. \(\sigma _{xy}\) is the covariance between images x and y. \(c_1\) and \(c_2\) are default constants used to avoid computational errors when the denominator in the formula approaches zero.

Experimental setup

The proposed DB2RNet model is implemented with the PyTorch framework. We set the batch size to 128 and the total number of epochs to 40. All models are trained from scratch with the Adam optimizer, whose exponential decay rates for the moment estimates are set to 0.5 and 0.9. The initial learning rate is set to 0.001 for the first 20 epochs; afterwards, the learning rate of each epoch is 0.9 times that of the previous epoch, as sketched below.
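This schedule can be reproduced, for instance, with a LambdaLR multiplier; `model` is assumed to be the DB2RNet instance sketched earlier.

```python
import torch

# Adam with betas (0.5, 0.9); lr = 1e-3 for epochs 0-19, then x0.9 per epoch.
# model: the DB2RNet instance (assumed defined as in the earlier sketch).
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.5, 0.9))
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lr_lambda=lambda epoch: 1.0 if epoch < 20 else 0.9 ** (epoch - 19),
)

for epoch in range(40):
    # ... one training pass over the loader, minimizing L_TOTAL (Eq. 8) ...
    scheduler.step()
```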

We divide the data into training and testing sets at a ratio of eight to two (see the sketch below). After training the proposed DB2RNet on the training set, the recognition accuracy over the 1,404 classes of incomplete Dongba characters is tested on the testing set; meanwhile, PSNR and SSIM are compared to evaluate the restoration module.
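A hypothetical version of this split, with `full_dataset` standing in for the constructed incomplete-character dataset:

```python
import torch
from torch.utils.data import random_split

# 8:2 split; full_dataset is the masked version of the 445,273-image
# database of [21] (placeholder name, not from the paper).
n_train = int(0.8 * len(full_dataset))
train_set, test_set = random_split(
    full_dataset, [n_train, len(full_dataset) - n_train],
    generator=torch.Generator().manual_seed(0),  # reproducible split
)
```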

Results and discussion

Ablation studies

In this section, ablation experiments are conducted on our four incomplete character datasets of different incompletion levels to validate the effectiveness of all components of DB2RNet. The performance of four baseline networks is compared by removing one component at a time.

(1) Direct recognition. This network is the plain recognition module without the restoration module; the imperfect Dongba character is directly fed into the recognition module for prediction. Since direct recognition performs only the recognition task and no restoration module is used, no PSNR and SSIM values are reported; the empty cells indicate that this network does not generate the corresponding values.

(2) No inter-module residual connection. This network feeds only the restored character from the restoration module into the recognition module; the original incomplete character is not concatenated with the restored character. In other words, the "Inter-module residual connection" in Fig. 2 is not used.

Table 3 The performance of our proposed topology with different baseline networks on the four different mask level testing datasets

(3) No multiscale feature block. This network feeds the decoder only with the output of the previous convolutional layer. In this way, no multiscale features can be obtained, and information transmission from the encoder to the decoder is lost.

(4) The whole DB2RNet (our method). This network is our proposed complete DB2RNet.

Results of the ablation studies for DB2RNet and the other baseline networks are given in Table 3, with accuracy, PSNR and SSIM as measurements. These results effectively illustrate the excellent performance of our proposed DB2RNet. Note that for the original incomplete characters, we directly calculate the PSNR and SSIM values between the original incomplete characters and their corresponding complete characters, without feeding them into the model; therefore, no accuracy appears in the corresponding cells. From Table 3, several observations can be drawn:

Fig. 6

Results of ablation studies presenting the contributions of individual components of our method. Original input, No residual, No MFB and DB2RNet correspond to Original incomplete character, No inter-module residual connection, No multiscale feature block and DB2RNet (Our method) in Table 3, respectively

  (1) Direct recognition yields lower accuracy because of the low quality of the input incomplete Dongba character and its loss of texture and contextual features.

  (2) The network without the inter-module residual connection obtains the lowest accuracy, even lower than direct recognition. It can be concluded that concatenating the restored character with the original character in the recognition module is significant for recognition, demonstrating that the inter-module residual connection is essential in an end-to-end restoration-to-recognition network.

  (3) The network without the multiscale feature block obtains the lowest PSNR and SSIM values; its PSNR is even lower than that of the original incomplete character on the Level 1 and Level 2 datasets. It may be concluded that the multiscale feature block in the restoration module has a significant impact on restored image quality.

  (4) Comparing across all mask levels, our proposed DB2RNet achieves the best values, and all evaluation indicators are higher than those of any variant with one of the strategies removed. It can be concluded that DB2RNet handles Dongba character datasets of different mask levels with good robustness.

  (5) Combining all of the above strategies, DB2RNet achieves the highest recognition accuracy, restoration PSNR and SSIM, proving that their combination obtains further improvements; their complementary ability exceeds employing either alone. The results also prove the effectiveness of all strategies in DB2RNet.

The results in Table 3 demonstrate that all strategies contribute to the final excellent performance of DB2RNet. For a qualitative visual analysis, we show the incomplete Dongba character restoration results in Fig. 6. In particular, the multiscale feature block has a much greater effect on restoration than the inter-module residual connection; accordingly, the characters restored without the multiscale feature block show more broken lines and intermediate incomplete sketches, while the characters restored without the inter-module residual connection are close to those of the full DB2RNet.

Comparison with state-of-the-art methods

Recently, many state-of-the-art recognition neural networks have been designed. To illustrate the superiority of our method, extensive experiments are conducted to compare these excellent recognition methods with our proposed DB2RNet. In addition, we compare our method with several excellent end-to-end image restoration-to-recognition models to validate the superiority of the proposed DB2RNet.

Comparison with recognition models

To reveal the superiority of the proposed DB2RNet, we conduct sufficient comparative experiments with state-of-the-art deep neural network methods in the field of image recognition, namely ResNet50 [28], MobileNetV1 [32], EfficientNet [33], RepVGG [34] and ResNeSt50 [35]. We also compare our DB2RNet with MAAN [21], the state-of-the-art model for perfect/complete Dongba character recognition. In this section, accuracy is used as the measurement, and the experimental results are listed in Table 4. Direct recognition accuracy means that the model is used to recognize the incomplete Dongba character directly. Employed in DB2RNet means that our recognition module is replaced with the given model within DB2RNet; for the DB2RNet row, the recognition module employed is DB2RNet's own.

Table 4 Evaluation results of excellent image recognition models and Dongba character recognition model on four different incomplete Dongba character testing sets.

As shown in Table 4, the DB2RNet proposed in this paper (our recognition module employed in DB2RNet) achieves competitive recognition accuracy, reaching 98.25% on Level 1 incomplete Dongba characters. For example, compared with the best representative image recognition method, ResNet50 [28] (direct recognition), DB2RNet improves accuracy by 1.46%. Our DB2RNet also exceeds the best perfect/complete Dongba character recognition model, MAAN [21] (direct recognition), by 0.9% in recognition rate. These results fully verify the superiority of the proposed DB2RNet and demonstrate that our end-to-end restoration-to-recognition approach achieves better recognition accuracy than previous image recognition methods.

Fig. 7

Confusion matrices of four different models on one hundred randomly selected Level 4 incomplete Dongba character classes

As illustrated in Table 4, the accuracy of all models decreases as the irregular mask ratio of the incomplete Dongba characters increases. This is because these characters have different types of incompletion, and larger irregular incompletion usually means less information, which is more difficult to model. Nevertheless, our proposed DB2RNet maintains state-of-the-art recognition accuracy. Moreover, the gap between the recognition accuracy of our DB2RNet (recognition module employed in DB2RNet) and that of the other models also increases. Taking ResNeSt50 [35] (direct recognition) as an example, the recognition accuracy gap increases from 2.19% (Level 1 incomplete Dongba character dataset) to 7.28% (Level 4 incomplete Dongba character dataset). These results demonstrate that DB2RNet has good robustness and applicability.

As can be seen from Table 4, the recognition module in our DB2RNet (direct recognition) obtains a slightly lower recognition accuracy than MAAN [21] (direct recognition): at minimum 0.13% lower on the Level 3 incomplete Dongba character dataset, and at maximum 0.47% lower on the Level 2 dataset. Tables 2 and 5 show the detailed structures and hyperparameters of our recognition module and the baseline network of MAAN [21], respectively. Comparing them, the number of channels of Conv5 in Table 5 is twice that of Conv5 in Table 2, which means MAAN consumes more memory. The recognition module of DB2RNet thus saves memory compared to MAAN at only a minimal cost in recognition accuracy, which is why we adopt our recognition module rather than MAAN. Note that any recognition model can replace our recognition module within DB2RNet; if memory is not a concern, it can be replaced with any CNN that offers better recognition accuracy. As Table 4 shows, models with better direct recognition accuracy also lead to better recognition accuracy when employed in our DB2RNet. In addition, to intuitively analyze the recognition results on the testing set, we also plot confusion matrices for our DB2RNet, MAAN, MobileNetV1 and RepVGG, as shown in Fig. 7. To better show the differences among the confusion matrices, we randomly selected one hundred Level 4 incomplete Dongba character classes. It is obvious that our DB2RNet has a discriminative effect for incomplete Dongba character recognition.

Table 5 Detailed structure and hyperparameters of the baseline network of MAAN [21]

All these results demonstrate that our proposed DB2RNet not only achieves state-of-the-art recognition accuracy but also handles Dongba character datasets of different incompletion levels with good robustness. They also show that recognizing incomplete handwritten Dongba characters is a challenging task that previous image recognition methods cannot handle well, and that our DB2RNet is an effective method for it.

Comparison with end-to-end restoration-to-recognition models

To certify the advantage of the proposed end-to-end restoration-to-recognition DB2RNet, we compare it with state-of-the-art restoration-to-recognition networks for vehicle logo images (Joint-Caps-Net [27]), sketch images (ADFRNet [25] and SketchGAN [26]) and ancient Chinese characters [36]. Table 6 shows the quantitative evaluation of our method.

Table 6 Evaluation results of our proposed DB2RNet and other state-of-the-art end-to-end restoration-to-recognition methods on incomplete Dongba character testing sets with four different mask ratios

It can be observed from Table 6 that the proposed DB2RNet outperforms all other state-of-the-art end-to-end image restoration-to-recognition models. Although SketchGAN obtains better results than the other compared models, our proposed DB2RNet outperforms it by 1.38% (Level 1 incomplete Dongba characters) to 2.76% (Level 4) in accuracy. Our DB2RNet is also superior to the other models in Dongba character restoration (higher PSNR and higher SSIM). Meanwhile, these compared models even obtain lower accuracy than our direct recognition module. The reason may be that our recognition module (inspired by the state-of-the-art perfect Dongba character recognition method MAAN) has better performance, and that they overlook the significance of the inter-module residual connection, so the original incomplete Dongba character input is not transmitted to the recognition module in their methods.

Fig. 8

Comparison with state-of-the-art end-to-end restoration-to-recognition models on incomplete Dongba character datasets with four different mask ratios

Furthermore, we also show the images restored by the models mentioned above in Fig. 8. Compared with the above four methods, our DB2RNet produces better results in terms of incomplete Dongba character restoration, which confirms its superiority. In conclusion, quantitative and qualitative comparisons with recent state-of-the-art end-to-end restoration-to-recognition models show that our DB2RNet outperforms them.

Performance on open sets

To further demonstrate the generalization and robustness of the proposed DB2RNet, we conduct extensive experiments on the Chinese character dataset CASIA-HWDB1.1 [4] and the Oracle character dataset OBC306 [37]. On the one hand, we make comparisons with the state-of-the-art image recognition models used in Section "Comparison with recognition models" to reveal the superiority of the proposed DB2RNet; the experimental results are shown in Table 7. On the other hand, to certify the advantage of the proposed end-to-end restoration-to-recognition DB2RNet, we compare it with the state-of-the-art restoration-to-recognition networks used in Section "Comparison with end-to-end restoration-to-recognition models"; the experimental results are shown in Table 8.

Table 7 Evaluation results of excellent image recognition models on the CASIA-HWDB1.1 [4] and OBC306 [37] open sets at four different incompletion levels

It can be seen from Table 7 that models with better direct recognition accuracy also lead to better recognition accuracy when employed in our DB2RNet, whether on the CASIA-HWDB1.1 or the OBC306 open set. Both the direct recognition accuracy of our recognition module and its accuracy when employed in DB2RNet are competitive. Although it obtains a lower recognition accuracy than MAAN, our recognition module saves memory compared to MAAN at only a minimal recognition accuracy cost, as concluded in Section "Comparison with recognition models".

Table 8 Evaluation results of our proposed DB2RNet and other state-of-the-art end-to-end restoration-to-recognition methods on the CASIA-HWDB1.1 [4] and OBC306 [37] open sets at four different mask ratios
Fig. 9

Comparison with state-of-the-art end-to-end restoration-to-recognition models on incomplete Chinese character [4] and Oracle character OBC306 [37] datasets with four different mask ratios

According to the results in Table 8, the proposed DB2RNet achieves higher accuracy, PSNR and SSIM values than the other state-of-the-art end-to-end restoration-to-recognition methods. These results demonstrate that our proposed DB2RNet outperforms state-of-the-art methods on other open sets such as CASIA-HWDB1.1 and OBC306. The results of our DB2RNet are significantly higher than those of the other methods, and the gap is even wider than that shown in Table 6.

Furthermore, we also show the Chinese characters and Oracle characters restored by these models in Fig. 9. Our DB2RNet produces better results for both incomplete Chinese character and Oracle character restoration, which confirms its superiority. In conclusion, quantitative and qualitative comparisons with these recent models on open sets prove the credibility, robustness and generality of our proposed DB2RNet.

Conclusion

In this paper, a novel end-to-end network for incomplete handwritten Dongba character recognition by multiscale feature restoration (DB2RNet) is proposed, in which the restored characters help boost the performance of Dongba character recognition. We first generate four incomplete Dongba character datasets with four different incompleteness levels. The incomplete Dongba character is first restored by the proposed restoration module and then sent to the recognition module, with the incomplete input concatenated to the restored output through an inter-module residual connection, which helps improve the performance of the proposed DB2RNet. In addition, novel multiscale feature blocks are introduced, which provide more effective texture and contextual information transmission for Dongba character image restoration, thus yielding better restoration effects and better recognition results. Extensive experimental results confirm the effectiveness of the components of the proposed DB2RNet and demonstrate that it achieves state-of-the-art performance on the incomplete Dongba character recognition task. Quantitative and qualitative comparisons with recent state-of-the-art models on the CASIA-HWDB1.1 and OBC306 open sets further prove the credibility, robustness and generality of our proposed DB2RNet.

Availability of data and materials

The datasets generated and/or analysed during the current study can be obtained from https://mzyy.muc.edu.cn. The code will be released at the same address once our paper is published. The datasets and code are also available from the corresponding author on reasonable request.

References

  1. Wang J, Wang S, Chen S. Research on the digital protection strategies of intangible cultural heritage. Softw Guide. 2011;10(8):49–51.


  2. Wu G, Ding C, Xu X, Wang N. Intelligent recognition on Dongba manuscripts hieroglyphs. J Electron Meas Instrum. 2016;30(11):1774–9.


  3. Guo H, Zhao JY, Li XN. Preprocessing method for NaXi pictographs character recognition using wavelet transform. Int J Digit Content Technol Appl. 2010;4(3):117–31.


  4. Liu CL, Yin F, Wang DH, Wang QF. Online and offline handwritten Chinese character recognition: benchmarking on new databases. Pattern Recognit. 2013;46(1):155–62.


  5. Xu X, Jiang Z, Wu G, Wang H, Wang N. Identification method of Dongba pictograph based on topological characteristic and projection method. J Electron Meas Instrum. 2017;31(1):150–4.


  6. Da M, Zhao JY, Suo G, Guo H. Online handwritten Naxi pictograph digits recognition system using coarse grid. In: International Workshop on Computer Science for Environmental Engineering and EcoInformatics. Springer; 2011. p. 390–396.

  7. Tong S, Jianjun Z, Wensi L, Yunmu W, Yifei X, Zhijian Z, et al. Research on recognition of Dongba script by a combination of HOG feature extraction and support vector machine. J Nanjing Univ. 2020;56(6):870–6.


  8. Zhang J, Lai Z, Kong H, Shen L. Robust Twin Bounded Support Vector Classifier With Manifold Regularization. IEEE Trans Cybern. 2023;53(8):5135–50.


  9. Meng T, Huang R, Lu Y, Liu H, Ren J, Zhao G, et al. Highly sensitive terahertz non-destructive testing technology for stone relics deterioration prediction using SVM-based machine learning models. Herit Sci. 2021;9:1–9.


  10. El-Askary NS, Salem MAM, Roushdy MI. Features processing for Random Forest optimization in lung nodule localization. Expert Syst Appl. 2022;193: 116489.


  11. Sun Z, Wang G, Li P, Wang H, Zhang M, Liang X. An improved random forest based on the classification accuracy and correlation measurement of decision trees. Expert Syst Appl. 2024;237: 121549.


  12. Vincent SSM, Duraipandian N. Detection and prevention of sinkhole attacks in MANETS based routing protocol using hybrid AdaBoost-Random forest algorithm. Expert Syst Appl. 2024;249: 123765.


  13. Wang Y, Pang W, Jiao Z. An adaptive mutual K-nearest neighbors clustering algorithm based on maximizing mutual information. Pattern Recognit. 2023;137: 109273.


  14. Ali A, Hamraz M, Gul N, Khan DM, Aldahmani S, Khan Z. A k nearest neighbour ensemble via extended neighbourhood rule and feature subsets. Pattern Recognit. 2023;142: 109641.


  15. Huang Y, Chen D, Wang H, Wang L. Gender recognition of Guanyin in China based on VGGNet. Herit Sci. 2022;10(1):93.


  16. Jin X, Wang X, Xue C. Nondestructive characterization and artificial intelligence recognition of acoustic identifiers of ancient ceramics. Herit Sci. 2023;11(1):144.


  17. Luo H, Xu D, Yang B, Zhang H. Multi-scale Feature Fusion Based Dongba Character Recognition. In: 2020 5th International Conference on Mechanical, Control and Computer Engineering (ICMCCE). IEEE; 2020. p. 1571–1575.

  18. Hua R, Xu X. Intelligent classification on images of Dongba ancient books. J Eng. 2019;2019(23):9039–42.


  19. Luo Y, Bi X, Wu L, Li X. Dongba pictographs recognition based on improved residual learning. CAAI Trans Intell Syst. 2022;17(1):79–87.


  20. Xie Y, Dong J. Research on Dongba hieroglyph recognition using ResNet network. Comput Era. 2021;343(1):6–10.


  21. Luo Y, Sun Y, Bi X. Multiple attentional aggregation network for handwritten Dongba character recognition. Expert Syst Appl. 2023;213: 118865.


  22. Li L. Naxi pictographs and transcription characters dictionary. 1st ed. Kunming: Yunnan Nationalities Publishing House; 2001.


  23. Fang G, He Z. Naxi pictograph character chart. 1st ed. Kunming: Yunnan People’s Publishing House; 1981.


  24. Liu G, Reda FA, Shih KJ, Wang TC, Tao A, Catanzaro B. Image inpainting for irregular holes using partial convolutions. In: Proceedings of the European conference on computer vision (ECCV); 2018. p. 85–100.

  25. Wan J, Zhang K, Li H, Chan AB. Angular-driven feedback restoration networks for imperfect sketch recognition. IEEE Trans Image Process. 2021;30:5085–95.


  26. Liu F, Deng X, Lai YK, Liu YJ, Ma C, Wang H. Sketchgan: Joint sketch completion and recognition with generative adversarial network. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; 2019. p. 5830–5839.

  27. Chen R, Mihaylova L, Zhu H, Bouaynaya NC. A deep learning framework for joint image restoration and recognition. Circuits Syst Signal Process. 2020;39:1561–80.


  28. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2016. p. 770–778.

  29. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ. Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2017. p. 4700–4708.

  30. Huang G, Liu S, Van der Maaten L, Weinberger KQ. Condensenet: An efficient densenet using learned group convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2018. p. 2752–2761.

  31. Lin TY, Dollár P, Girshick R, He K, Hariharan B, Belongie S. Feature pyramid networks for object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition; 2017. p. 2117–2125.

  32. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, et al. Mobilenets: Efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861. 2017; https://doi.org/10.48550/arXiv.1704.04861.

  33. Tan M, Le Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In: International conference on machine learning. PMLR; 2019. p. 6105–6114.

  34. Ding X, Zhang X, Ma N, Han J, Ding G, Sun J. Repvgg: Making vgg-style convnets great again. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition; 2021. p. 13733–13742.

  35. Zhang H, Wu C, Zhang Z, Zhu Y, Lin H, Zhang Z, et al. Resnest: Split-attention networks. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition; 2022. p. 2736–2746.

  36. Xu Y, Zhang XY, Zhang Z, Liu CL. Large-scale continual learning for ancient Chinese character recognition. Pattern Recognit. 2024;150: 110283.


  37. Huang S, Wang H, Liu Y, Shi X, Jin L. OBC306: A large-scale oracle bone character recognition dataset. In: 2019 International Conference on Document Analysis and Recognition (ICDAR). IEEE; 2019. p. 681–688.


Acknowledgements

Not applicable.

Funding

This work was supported in part by the National Natural Science Foundation of China (Grant No. 62236011); and in part by the National Social Science Fund of China (Grant No. 20&ZD279).

Author information

Authors and Affiliations

Authors

Contributions

X.J. Bi developed the research idea. Y.L. Luo wrote the manuscript and conducted experiments. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Yanlong Luo.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Bi, X., Luo, Y. Incomplete handwritten Dongba character image recognition by multiscale feature restoration. Herit Sci 12, 218 (2024). https://doi.org/10.1186/s40494-024-01329-8
