WuMKG: a Chinese painting and calligraphy multimodal knowledge graph

Chinese Painting and Calligraphy (ChP&C) holds significant cultural value, representing integral aspects of both Chinese culture and global art. A considerable number of ChP&C works are dispersed worldwide. With the emergence of digital humanities, a vast collection of cultural artifact data is now available online. However, the online databases of these artifacts remain decentralized and heterogeneous, posing significant challenges to their effective organization and utilization. To address this, our paper focuses on the Wu Men School of Painting and proposes a framework for constructing a multimodal knowledge graph for the ChP&C domain. We construct the domain ontology by analyzing the ChP&C knowledge schema. We then acquire knowledge from diverse data sources, including textual and visual information. To enrich knowledge of collecting history and subject matter, we propose seal extraction and subject extraction methods specific to ChP&C and validate their effectiveness on the constructed dataset. Finally, we construct the Wu Men Multimodal Knowledge Graph (WuMKG) and implement applications such as cross-modal retrieval, knowledge-based question answering, and visualization.


Introduction
Chinese Painting and Calligraphy (ChP&C) is pivotal in traditional Chinese culture, significantly contributing to world art history. Beyond its aesthetic appeal, ChP&C embodies cultural richness and historical narratives, reflecting the profound heritage of China. ChP&C has provided valuable sources for historians and archeologists to study the history and humanity of the corresponding eras [1]. The Wu Men School of Painting, representative of the mid-Ming Dynasty in China, is one of the most extensive, accomplished, and influential painting schools in Chinese art history.
ChP&C collections are preserved in numerous museums around the world. In 2009, Chinese calligraphy was officially inscribed on UNESCO's Representative List of the Intangible Cultural Heritage of Humanity. Digitizing cultural heritage facilitates reliable preservation and storage of cultural content in digital repositories [5,6]. The advancement of the knowledge graph provides innovative approaches for knowledge organization and intelligent applications in the domain of cultural heritage. A common method for constructing a knowledge graph involves linking relevant knowledge to specific topics. As Fig. 1 illustrates, this can be achieved by developing the graph around a specific topic and extending it to related knowledge. Currently, there is no specific ontology tailored for ChP&C. Consequently, utilizing a knowledge graph for researching and preserving ChP&C remains a challenge.
The research value of ChP&C is manifested in its rich and diverse visual information. As shown in Fig. 2, it includes subjects, postscripts, and seals. The subject represents the primary elements of the painting. The postscript, closely related to the content of the painting, typically includes critiques and narrative records.
Seals are applied by either the artist or collectors. These are primarily divided into three types: the treasure seal of appraisal, the creator's seal, and the collector's seal. They explicitly reveal the painting's provenance and collecting history [7]. The primary challenge remains in accurately extracting and analyzing visual elements. Analyzing traditional Chinese paintings for automated detection and recognition is challenging due to their stylized and abstract approach, which lacks realistic details and emphasizes conveying the essence of subjects. Cultural interpretations of objects and symbols in these paintings require background knowledge, posing difficulties for algorithms and viewers unfamiliar with these contexts. Moreover, variability in conditions, preservation status, and medium further complicates target detection algorithms, which are impacted by factors such as fading colors, paper deterioration, and ink spread. Additionally, there is a scarcity of publicly available, high-quality datasets specifically tailored for traditional Chinese paintings.
To address these challenges, we present a framework for constructing a multimodal knowledge graph in the cultural heritage domain. We construct the Wu Men Multimodal Knowledge Graph (WuMKG) and develop associated applications. We design the ChP&C ontology based on CIDOC CRM [8] and Shlnames [9], which expands classes, attributes, and vocabulary. We refine descriptions of persons and artworks using multi-source heterogeneous data. Based on diverse data sources, we design methods to extract textual and visual knowledge, including persons, ChP&C artworks, seals, and subjects. Then, we integrate and create associations between images and text to construct a multimodal knowledge graph, which consists of 104,374 entities and 418,450 triples related to the Wu Men School. To facilitate the public's understanding and appreciation of ChP&C, we have developed a web-based system based on the WuMKG. This system implements features including cross-modal retrieval, a question-answering system (Q&A system), and KG visualization. These capabilities greatly enhance knowledge discovery services for ChP&C and further advance its protection, inheritance, development, and utilization.
Our contributions can be summarized as follows:

• We propose a framework for constructing a multimodal knowledge graph in the ChP&C domain. We analyze the knowledge schema in this domain and design the ontology based on CIDOC CRM and Shlnames. Moreover, we introduce a method for knowledge acquisition, which includes extracting information from textual and visual data of ChP&C.

• We design distinct methods for seal extraction and Chinese painting subject detection, and construct a dataset on which we validate their effectiveness.

Related work
In this section, we review related work on metadata and ontologies within the cultural heritage domain, investigate multimodal knowledge graphs, and explore their applications in this area. Additionally, we review image processing research related to ChP&C.

Cultural heritage metadata and ontology
In the realm of cultural heritage preservation, metadata and ontology play pivotal roles in facilitating the organization, retrieval, and understanding of historical artifacts and information. Metadata serves as the backbone for cataloging and describing cultural heritage items, capturing vital details such as provenance, condition, and historical significance. Ontology, on the other hand, provides a formal representation of concepts, entities, and relationships within a given domain, enabling a more structured and semantic understanding of cultural heritage data. The early developments in metadata standards for cultural heritage occurred in the United States and Europe. The Getty Vocabulary Program [10] has categories for describing artistic works and data value standards for art, architecture, and material culture information. In addition, there are other standards like the Dublin Core Metadata Element Set [11], which describes general resources on the web, VRA Core [12] categories for visual materials, and Information Object Definitions for exchanging cultural heritage content [13].
The Categories for the Description of Works of Art (CDWA) provides a framework to ensure consistent and comprehensive descriptions, including titles, creators, dates, materials, and styles, as well as the historical and cultural context of the works [14]. Bobasheva et al. developed a pipeline using deep learning to analyze and correct metadata. Their method identifies contextual relationships between concepts in metadata by analyzing semantic models and concept store similarities [15]. The International Image Interoperability Framework (IIIF) is an innovative metadata solution that offers a standardized method for describing and delivering images on the web, facilitating the development of digital cultural heritage [16]. Xue et al. implemented the representation of the Chinese Seal Stamping Catalogs using IIIF and Serverless [17].
In the domain of cultural heritage ontology modeling, noteworthy contributions have been made by the International Committee for Documentation (CIDOC) CRM and the Europeana Data Model (EDM) [8,18]. The Getty Vocabulary Program has also been extended to accommodate Semantic Web representations. The Shanghai Library maintains the China Biographical Database Project (CBDB) ontology and has also developed a person name standard ontology (Shlnames) to support the description of Chinese individual information [19].
Matusiak et al. presented a model for creating bilingual parallel records that combines translation with controlled vocabulary mapping; it addresses the linguistic needs of the target audience, connects the digitized objects to their respective cultures, and contributes to richer descriptive records [20].
Ontologies go beyond metadata's capabilities in cultural heritage by structuring complex relationships and hierarchies, enabling advanced data integration and knowledge representation.This structured, semantic framework allows ontologies to model intricacies that metadata alone cannot, facilitating deeper and more systematic discovery and management of knowledge.
Although there are some ontology practices in the cultural heritage domain, research specifically related to ChP&C remains relatively limited, often concentrating on modeling the components of these art forms.Comprehensive modeling of ChP&C is not only crucial but also presents significant challenges.In our research, we thoroughly analyze the knowledge schema of ChP&C and design an ontology for it.Utilizing this ontology, we construct the WuMKG, which has demonstrated the effectiveness of the ontology.

Cultural heritage knowledge graph
Extensive research and practice have been conducted on constructing knowledge graphs in the cultural heritage domain.
The British Museum released linked open data on collections of cultural relics based on CIDOC CRM and developed a semantic retrieval system [21]. The Museum of the Netherlands constructed linked open data based on the EDM ontology. The Europeana project integrates data from more than 4000 museums, libraries, and archives in Europe, and its knowledge graph contains 3.7 billion triples [22]. There are also the Italian cultural heritage knowledge graph ArCo [23], the Finnish World War II knowledge graph WarSamp [24], and the Nuremberg urban cultural heritage knowledge graph [25].
Chinese researchers have also conducted research and exploration in recent years. Based on Dong Qichang's collection of image data and related digital resources, the Shanghai Museum sorted out his cultural context, built related data based on the CIDOC CRM, and displayed Dong Qichang's creative history in a visual form [26]. Related research also includes the knowledge graph of Dunhuang murals and the knowledge graph of Fan Qin's life [27].
In terms of multimodal knowledge graphs, application exploration has also been carried out in the field of cultural heritage. Thomas et al. [28] combined expert knowledge and a knowledge graph containing images to propose a method for retrieving and recommending European silk fabrics. Puren et al. [29] proposed a knowledge representation method for silk text images. Fan et al. [30] constructed a multimodal knowledge graph of Chinese intangible cultural heritage and used visual features to select representative images for entities.
Diverging from prior works, we introduce a novel framework specifically tailored for the construction of a multimodal knowledge graph for ChP&C. We construct a domain ontology by analyzing the knowledge schema. Additionally, we collect heterogeneous data from museums worldwide, extracting relevant ChP&C knowledge. Notably, addressing the unique characteristics of ChP&C, we design methods for seal extraction and subject extraction to extract knowledge from images.

Image processing on ChP&C
Chinese paintings, known for their diverse styles, have been a subject of exploration for researchers in the realm of artificial intelligence (AI). This section delves into the existing literature and highlights noteworthy contributions in multidimensional feature extraction, classification, and sentiment recognition tasks related to Chinese paintings.
Liong et al. [31] constructed a comprehensive traditional Chinese painting dataset featuring six distinct classes. Leveraging deep learning methodologies, particularly convolutional neural networks (CNNs), they pursued classification objectives. Furthermore, the researchers employed a series of instance segmentation techniques to discern discriminant characteristics in Chinese paintings. Zhang et al. [32] focused on the development of a model capable of learning aesthetic features from both global attribute maps and localized patches. The derived features, coupled with manually crafted features rooted in the expertise of art professionals, culminated in a comprehensive computational model for the beauty evaluation of Chinese ink paintings. Recognizing the challenge of CNNs underperforming in classification tasks with limited sample sizes, Li et al. [33] introduced a novel solution: ALSTM-MIL, a multi-instance learning algorithm that employs a Long Short-Term Memory neural network with an attention mechanism. It transforms Chinese painting images into multi-instance bags, and the ALSTM extracts encoded features, enabling semantic classification of Chinese paintings.
Research related to Chinese paintings primarily focuses on feature extraction and classification. However, the presence of multiple distinct objects within a painting indicates that it possesses diverse subjects. To leverage this distinctive feature, we employ an object detection approach for subject detection, which enables nuanced content representation in Chinese paintings. Differing from previous studies, we incorporate image processing technology into our framework. This involves extracting seals and subject objects from images of ChP&C artworks to facilitate knowledge acquisition, thereby providing a novel perspective for constructing multimodal knowledge graphs.

Methodology for WuMKG construction
In this section, we introduce the framework for constructing the WuMKG. As illustrated in Fig. 3, the framework consists of four parts: data sources, ontology design, knowledge acquisition, and applications.
Data sources We collect data from various sources, including museum websites, the website of the National Cultural Heritage Administration, specialized books, encyclopedias, and seal databases.

Ontology design
We analyze the domain knowledge schema. We reuse the CIDOC CRM and Shlnames ontologies, extending them by adding new classes, relationships, and terms.
Knowledge acquisition Our aim is to extract structured data from various sources of textual and image data, transforming it into knowledge. Initially, we preprocess the collected data to facilitate the extraction task. We extract entities, relationships, attributes, and events from the text. Additionally, we extract seals and subjects from images to acquire historical and content-related data. Finally, we apply knowledge fusion and data mapping techniques to construct the WuMKG.
Applications We release a SPARQL endpoint and build a web-based software platform on top of the WuMKG, enabling multimodal retrieval, a question-and-answer system, and visualization applications.

Ontology of WuMKG
In this section, we present the design methodology employed for the WuMKG ontology. To create a well-organized and domain-specific ontology, we integrate guidance from art theory, knowledge schema design, ontology design, and expert review. Historical art insights and principles guide the data structuring, ensuring that the resulting ontology authentically reflects the distinctive attributes and deep cultural contexts of the art styles. We construct a knowledge schema after analyzing ChP&C data from various resources. Afterward, we incorporate existing domain ontologies, such as CIDOC CRM and Shlnames, mapping objects, elements, and modifiers from the schema to the foundational classes, relationships, and properties to form the basic ontology for ChP&C. Subsequently, we expand the person ontology to encompass various character relationships. The final work is validated by experts.

Knowledge schema
We collect and analyze existing public data on the internet, as well as books and literature related to ChP&C, to determine the range and content of data for the ChP&C knowledge graph. After a thorough examination of current metadata sources related to cultural heritage, we construct a knowledge schema from the available data, as shown in Fig. 4. We also seek insights from experts to ensure that the schema accurately reflects domain-specific knowledge and practices.

Artworks We summarize the knowledge of ChP&C into three dimensions: basics, content, and activity. Taking the painting "The Sound of Pines on the Mountain Road" as an example, Table 1 shows its information. The basics of ChP&C, including the dynasty, materials, and dimensions, provide insights into their historical context and the nuances of their creation. These artworks primarily express the artist's ideals and vision through their visual impact.

The postscript denotes the calligraphic elements found within the artworks of ChP&C, encompassing aspects such as fonts and content.
The subjects of ChP&C are windows into the artist's emotions, thoughts, and aesthetic aspirations, while the employed techniques reveal their unique creative abilities and stylistic choices. Finally, the seals and other mark information in the artwork are content applied by the creator or collector, representing the process of creation and the collecting history.
Persons Artists' creative styles and techniques are often influenced by their interpersonal relationships, including those with family members, teachers, and others. Additionally, significant life events can also lead to the evolution of an artist's creative style. To study the creative characteristics of artists in depth, we record their relationships and life events alongside basic information as a reference for subsequent research. See Appendix A for examples of persons.
Resource records A large amount of knowledge in painting and calligraphy relies on records from multiple sources. Therefore, it is necessary to record information such as knowledge sources and bibliographies while introducing domain knowledge as auxiliary data for subsequent knowledge integration and research verification. The knowledge of ChP&C includes multimedia such as image resources and explanations of painting and calligraphy works, which should also be recorded as electronic resources. External resources store link relationships with other knowledge bases to enhance the connectivity between knowledge bases. See Appendix A for examples of resource records.

Ontology reuse
Reusing existing ontologies is an important principle in building ontologies, which helps to improve interoperability and avoid ambiguities and conflicts in expressions. We reuse the CIDOC CRM and Shlnames ontologies.
We use the CIDOC CRM as the basis of the ontology and map the concepts in painting and calligraphy to classes in the model based on semantics. The E24_Physical_Human-Made_Thing class in the model is the basic class for entities such as artworks, seals, and inscriptions, as shown in Fig. 5.
For resource records, classes such as E73_Information_Object and E31_Document are used to describe electronic resources, bibliographies, records, etc., and are related to entities such as artworks, persons, and seals in the graph. To describe person information, we reuse the Shlnames ontology, constructed by the Shanghai Library for describing basic information about Chinese personal names. Person entities are described with the crm:E21_Person and shl:Person classes. Attributes such as shl:courtesyName, shl:pseudonym, and shl:livedAt from the personal name authority describe a person's courtesy name, pseudonym, place of residence, and other information.
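As an illustration, the class and property reuse described above can be sketched as plain subject-predicate-object triples. The entity identifier below is a hypothetical IRI fragment; the class and property names follow CIDOC CRM and Shlnames as described, and the attribute values are the aliases of Tang Yin cited later in this paper:

```python
# Hypothetical triples describing a person with reused CRM and Shlnames terms.
# "person/TangYin" is an illustrative identifier, not an IRI from the source.
triples = [
    ("person/TangYin", "rdf:type", "crm:E21_Person"),
    ("person/TangYin", "rdf:type", "shl:Person"),
    ("person/TangYin", "shl:courtesyName", "Bohu"),
    ("person/TangYin", "shl:pseudonym", "Liuru Jushi"),
    ("person/TangYin", "shl:livedAt", "Suzhou"),
]

def objects_of(triples, subject, predicate):
    """Return all objects attached to a given subject via a given predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects_of(triples, "person/TangYin", "shl:courtesyName"))  # → ['Bohu']
```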

Ontology extension
The current ontology struggles to effectively represent life events, social relationships, specific terms, and connections related to works. To address these issues, we have developed the person relationship ontology, which extends the classes, relationships, and attributes within the existing ontology. This expansion is accomplished through the Web Ontology Language (OWL) and the Simple Knowledge Organization System (SKOS) to precisely define terms [34]. This provides a more comprehensive and accurate relational framework for studying and analyzing artists.
As shown in Fig. 6, the detailed depiction of person relationships, encompassing kinship, master-disciple, and cooperative connections, is facilitated by using Protégé to expand the person class E21_Person within the CRM model. This expansion introduces two subclasses and delineates 38 specific types of person relationships. Additionally, the establishment of two gender classes, EX_MALE and EX_FEMALE, allows for a more nuanced representation of specific relationships, such as sisters, brothers, husbands, wives, and more.
To ensure semantic consistency in the data mapping process of painting and calligraphy knowledge, we employ OWL and SKOS to construct a vocabulary of specialized terms. This vocabulary primarily encompasses terms related to painting and calligraphy types, materials, techniques, mounting, fonts, etc. The terms were systematically defined and categorized through expert validation and analysis of their expressions in data sources. The organization of these terms is based on their definitions and relationships. The terminology serves as a crucial tool for data mapping and maintaining consistency in painting and calligraphy knowledge.

Knowledge acquisition
In this section, we introduce the process of transforming textual and image data from various sources into knowledge within the WuMKG. Following the construction framework, knowledge acquisition is divided into three main components: collection & preprocessing, knowledge extraction, and knowledge fusion. Specifically, knowledge extraction includes text extraction, seal extraction, and subject extraction.

Data collection
Our data primarily comes from professional books, museum websites, official websites, linked open data, and Baidu Baike [19,35].
Data collection The principle of selecting data sources is based on the professionalism and reliability of the data. We collect open-access data from museum websites, National Cultural Heritage Administration websites, books and references, cultural relics dictionaries, domain vocabularies, and the China Biographical Database Project (CBDB) [19] released by research institutions. The data encompasses structured (LOD), semi-structured (museum websites, etc.), and unstructured (PDF) formats. See Appendix B for details.

Data preprocessing
We conduct data cleaning on semi-structured data, removing duplicates and addressing issues such as erroneous characters. For PDF documents, we employ OCR for automated text recognition. Additionally, text in painting and calligraphy images is automatically transcribed.

Text extraction
The information about artists typically includes biographical overviews: unstructured summaries containing extensive details about their lives, works, and geographical locations. We extract relevant information from these entries, such as the artists' names, important locations related to their lives, and significant artworks. These summaries resemble general knowledge about individuals. To uncover this knowledge, we employ LeBERT [36] as our extraction model and devise an iterative annotation strategy to complete this task at a relatively low cost.
We use a pre-trained LeBERT model to extract entities from infoboxes and content. Given a Chinese sentence with n characters $s_c = \{c_1, c_2, \ldots, c_n\}$, the process is expressed as follows:

$$\{e_1, e_2, \ldots, e_n\} = \mathrm{LeBERT}(s_c) \quad (1)$$

$$\{l_1, l_2, \ldots, l_n\} = \mathrm{Softmax}(\{e_1, e_2, \ldots, e_n\}) \quad (2)$$

where $e_i$ represents the embedding for $c_i$, and $l_i$ represents the label for $c_i$.

Domain entity extraction requires professional annotation, which incurs significant costs. To minimize labeling costs, we adopt an iterative annotation approach. First, we use the model pretrained on the Weibo dataset for rough annotation, creating a coarse-annotated dataset. Human reviewers then filter and correct the coarse annotations to obtain a high-quality refined dataset. We then fine-tune the model on the refined dataset and perform further information extraction.
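The softmax labeling step can be sketched in pure Python. The per-character scores below are toy stand-ins for LeBERT outputs, and the three-tag BIO label set is illustrative (a real model uses a richer tag set):

```python
import math

def softmax(row):
    """Numerically stable softmax over one character's label scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    s = sum(exps)
    return [e / s for e in exps]

def decode_labels(logits, label_set):
    """Assign each character the most probable label, as in the labeling step."""
    out = []
    for row in logits:
        probs = softmax(row)
        out.append(label_set[probs.index(max(probs))])
    return out

labels = ["O", "B-PER", "I-PER"]      # illustrative BIO tag set
logits = [[0.1, 2.0, 0.3],            # most likely B-PER
          [0.2, 0.1, 1.5],            # most likely I-PER
          [3.0, 0.2, 0.1]]            # most likely O
print(decode_labels(logits, labels))  # → ['B-PER', 'I-PER', 'O']
```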
We conducted experiments to validate the effectiveness of iterative annotation and examined the correlation between the quantity of finely annotated data and extraction performance.

Seal extraction
The seals on ChP&C serve as distinctive marks of both the original author and subsequent collectors. Extracting these seals enables the study of the origins and circulation history of ChP&C, and analyzing provenance involves extracting and matching them. We propose a seal extraction method consisting of two stages: candidate seal detection and seal matching, as shown in Fig. 7.

Candidate seal detection
Here, we detect and segment the potential positions of seals within paintings and calligraphy. The procedure involves super-resolution augmentation, color correction, image binarization, and morphological processing. These sequential steps collectively contribute to identifying candidate regions likely to contain seals.
Super-resolution augmentation Artworks can fade and flake due to aging, making details like seals difficult to identify. We adopt BSRGAN, a super-resolution technique [37], to enhance image details. It magnifies the details of the seal area, facilitating improved identification in subsequent steps. The result is shown in Fig. 8.
Color correction Color deviations may arise between the captured image and the actual scene due to factors such as lighting and preservation conditions. For the augmented image $V = (R, G, B)$, we employ white balance to mitigate color bias [38], using the Gray World algorithm as the primary method:

$$C_c = C \cdot \frac{(R_{avg} + G_{avg} + B_{avg})/3}{C_{avg}}, \quad C \in \{R, G, B\}$$

where $R_{avg}$, $G_{avg}$, and $B_{avg}$ represent the average values of the red, green, and blue channels, respectively. $C_c$ corresponds to the component values of the R, G, B channels in the image after white balance correction, and these components collectively form the corrected image denoted as $V_c$.
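A minimal pure-Python sketch of the Gray World correction, operating on a flat list of RGB tuples rather than full image arrays:

```python
def gray_world(pixels):
    """Gray World white balance: scale each channel so its mean equals the
    overall gray mean. `pixels` is a list of (R, G, B) tuples of floats."""
    n = len(pixels)
    avg = [sum(p[c] for p in pixels) / n for c in range(3)]  # per-channel means
    gray = sum(avg) / 3.0                                    # target gray level
    return [tuple(p[c] * gray / avg[c] for c in range(3)) for p in pixels]

# A reddish 2-pixel image: the red channel dominates before correction.
img = [(200.0, 100.0, 100.0), (200.0, 100.0, 100.0)]
balanced = gray_world(img)
# After correction, every channel mean equals the gray mean.
```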
Denoising We apply a Gaussian kernel to reduce interference and enhance the seal shape [39]. For image $V_c$, it adjusts pixel values with Gaussian-distributed weight coefficients to obtain the denoised image $V_d$. It is expressed as follows:

$$P_d(x, y) = \sum_{i=-n}^{n} \sum_{j=-n}^{n} w(i, j) \, P_c(x + i, y + j) \quad (3)$$

where $P_c(x, y)$ and $P_d(x, y)$ represent the pixel value at position $(x, y)$ in the image $V_c$ and the denoised image $V_d$, respectively. The weight coefficient $w(i, j)$ at each position is defined as:

$$w(i, j) = \frac{1}{2\pi\sigma^2} \exp\!\left(-\frac{i^2 + j^2}{2\sigma^2}\right)$$

where $\sigma$ represents the standard deviation of the Gaussian kernel, and $n$ is the radius of the Gaussian kernel.
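The Gaussian denoising step can be sketched as follows. This pure-Python version builds the normalized weight matrix w(i, j) and averages each pixel's neighborhood; clamping coordinates at the image border is a simplifying assumption of the sketch:

```python
import math

def gaussian_kernel(n, sigma):
    """Build the (2n+1)x(2n+1) weight matrix w(i, j), normalized to sum to 1."""
    w = {(i, j): math.exp(-(i * i + j * j) / (2 * sigma * sigma))
         for i in range(-n, n + 1) for j in range(-n, n + 1)}
    total = sum(w.values())
    return {k: v / total for k, v in w.items()}

def gaussian_blur(img, n, sigma):
    """Denoise a 2-D grid of pixel values by Gaussian-weighted averaging."""
    h, wd = len(img), len(img[0])
    ker = gaussian_kernel(n, sigma)
    out = [[0.0] * wd for _ in range(h)]
    for y in range(h):
        for x in range(wd):
            out[y][x] = sum(
                wgt * img[min(max(y + i, 0), h - 1)][min(max(x + j, 0), wd - 1)]
                for (i, j), wgt in ker.items())
    return out
```

Because the kernel is normalized, a uniform image passes through unchanged, which is a quick sanity check for the weights.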
Image binarization The process keeps only the potentially seal-bearing red regions. We convert the image to the HSV color space, which aligns with human color perception, to improve extraction of the red component:

$$V_b(x, y) = \begin{cases} 1, & \text{if the pixel at } (x, y) \text{ falls within the red range in HSV space} \\ 0, & \text{otherwise} \end{cases}$$

where $V_b$ denotes the binarized image. The results are illustrated in Fig. 9.
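A sketch of the red-region binarization using the standard library's colorsys for the HSV conversion; the hue, saturation, and value thresholds here are illustrative, not the ones used in the paper:

```python
import colorsys

def red_mask(pixels, s_min=0.3, v_min=0.2):
    """Binarize a list of (R, G, B) pixels (0-255), keeping sufficiently
    saturated red hues. Threshold values are illustrative."""
    mask = []
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # Red hues wrap around 0 in HSV: keep h near 0 or near 1.
        is_red = (h <= 0.04 or h >= 0.93) and s >= s_min and v >= v_min
        mask.append(1 if is_red else 0)
    return mask

pixels = [(200, 30, 30),    # seal-like red  -> 1
          (240, 240, 240),  # paper white    -> 0
          (40, 40, 40)]     # ink black      -> 0
print(red_mask(pixels))     # → [1, 0, 0]
```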
Morphological processing The binarized image displays noise holes, impacting both the content within the seal and the smoothness of the seal's edges, which impedes the effectiveness of seal extraction. To address this issue, we utilize morphological processing, specifically dilation followed by erosion, to eliminate gaps and connect fragmented regions. The results are illustrated in Fig. 10.
Finally, we perform segmentation and extraction on candidate regions $V_{cand}$ by determining the coordinates of the maximum rectangular area within connected components and isolating them from the original image.
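The final step — locating the bounding rectangles of connected components — can be sketched without image libraries as a BFS flood fill over the binary mask:

```python
from collections import deque

def component_boxes(mask):
    """Find 4-connected components of 1-pixels in a binary grid and return
    each component's bounding rectangle as (x0, y0, x1, y1)."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if mask[y][x] == 1 and not seen[y][x]:
                # BFS flood fill to collect one component.
                q = deque([(y, x)])
                seen[y][x] = True
                x0 = x1 = x
                y0 = y1 = y
                while q:
                    cy, cx = q.popleft()
                    x0, x1 = min(x0, cx), max(x1, cx)
                    y0, y1 = min(y0, cy), max(y1, cy)
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            q.append((ny, nx))
                boxes.append((x0, y0, x1, y1))
    return boxes

mask = [[1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 0, 0, 1]]
print(component_boxes(mask))  # → [(0, 0, 1, 1), (3, 2, 3, 3)]
```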

Seal matching
At this stage, we validate candidate seal regions to confirm their authenticity. Subsequently, corresponding records are generated through feature matching.
Seal validation We fine-tune a validation model to classify candidates as actual seals or not. We adopt EfficientNet-B0 as the validation model [40]; its architecture is shown in Table 2.
For each $V_{cand}$, the model assigns a score that indicates the likelihood of the image being an actual seal. When the score exceeds a predefined threshold $t$, $V_{cand}$ is considered a genuine seal $V_{real}$. It is expressed as:

$$s = \mathrm{EfficientNet}(V_{cand}), \qquad V_{cand} \rightarrow V_{real} \ \ \text{if} \ \ s > t$$

Feature extraction

We separately extract seal features using the VGG16 model pre-trained on ImageNet and SIFT (Scale-Invariant Feature Transform) [41,42]. These features are used to construct and query the seal database. Given a seal image V, the corresponding VGG16 feature $K_v$ and SIFT feature $K_s$ are expressed as follows:

$$K_v = \mathrm{VGG16}(V), \qquad K_s = \mathrm{SIFT}(V) \quad (10)$$

Seal DB For matching candidate seals, we construct a seal database (Seal DB) from VGG16 and SIFT features of pre-collected seal images. For each seal image $V_i$ and its corresponding description $I_i$, we obtain VGG16 feature $K_{v,i}$ and SIFT feature $K_{s,i}$ through feature extraction. The seal collection D is defined as:

$$D = \{(V_i, I_i, K_{v,i}, K_{s,i})\}_{i=1}^{N}$$

Similarity computation
To retrieve seals from the database that closely match the actual seal, we calculate similarity scores between the actual seal $V_{real}$ and each seal in the Seal DB, with $V_{real}$ serving as the query. The similarity score for VGG16 features is computed as the cosine similarity:

$$\mathrm{Score}_v = \frac{Q_v \cdot K_v}{\lVert Q_v \rVert \, \lVert K_v \rVert}$$

where $Q_v$ denotes the VGG16 feature of the actual seal and $K_v$ denotes the feature of each seal in the Seal DB.
The score for SIFT is computed from the feature distance:

$$\mathrm{Score}_s = \frac{1}{1 + d(Q_s, K_s)}$$

where $Q_s$ denotes the feature of the actual seal $V_{real}$ extracted using SIFT, $K_s$ denotes the feature of each seal in the library D, and $d(X, Y)$ represents the Euclidean distance between the two features. The total similarity score, $\mathrm{Score}_{total}$, is calculated as a weighted sum of the VGG16 and SIFT scores, with $\alpha$ serving as the weighting factor:

$$\mathrm{Score}_{total} = \alpha \, \mathrm{Score}_v + (1 - \alpha) \, \mathrm{Score}_s$$

Upon successful matching, relationships are established between the input image and the matched seals. Collecting-history records are generated based on the seals recorded on the artwork and are seamlessly integrated into our knowledge graph.
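The matching stage can be sketched end to end with toy low-dimensional features. Cosine similarity stands in for the VGG16 score, a distance-based similarity stands in for the SIFT score, the weighting factor alpha and the database entries are illustrative:

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors (VGG16-score stand-in)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def distance_score(a, b):
    """Map Euclidean distance into a similarity in (0, 1] (SIFT-score stand-in)."""
    d = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + d)

def best_match(query_v, query_s, seal_db, alpha=0.7):
    """Rank Seal DB entries by the weighted total score and return the best.
    Each entry is (description, vgg_feature, sift_feature)."""
    scored = [
        (alpha * cosine(query_v, kv) + (1 - alpha) * distance_score(query_s, ks), desc)
        for desc, kv, ks in seal_db
    ]
    return max(scored)[1]

# Toy database with hypothetical 2-D stand-in features.
db = [("seal of Tang Yin", [1.0, 0.0], [0.1, 0.2]),
      ("collector's seal", [0.0, 1.0], [0.9, 0.8])]
print(best_match([0.9, 0.1], [0.1, 0.25], db))  # → seal of Tang Yin
```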

Subject extraction
Chinese paintings commonly depict a variety of subjects, including people, landscapes, trees, flowers, animals, feathers, vessels, and buildings. Despite extensive documentation of these subjects in metadata, there can be significant variations in categorization accuracy and granularity. To address this, we propose a method that utilizes object detection based on EfficientDet to extract subject-related information from Chinese paintings and associate it with corresponding entity knowledge. This approach aims to provide a more detailed and systematic classification of Chinese paintings. The process is illustrated in Fig. 11.
We use super-resolution technology to enhance the input image $V_{in}$, enrich the image details, and improve the accuracy of small-target detection.
We employ EfficientDet as our subject detection model. It consists of four components: an EfficientNet backbone, a BiFPN layer, a class prediction network, and a box prediction network. For image V, the model outputs predicted results S. It is expressed as follows:

$$S = \{(l_1, s_1), (l_2, s_2), \ldots, (l_m, s_m)\} = \mathrm{EfficientDet}(V)$$

where $l_i$ and $s_i$ denote the $i$-th detected subject label and its corresponding confidence score, respectively. When the score $s_i$ exceeds a predefined threshold, the image contains the subject related to label $l_i$; that is, the artwork represented by V features subject $l_i$. The identified subjects are then converted into triples and subsequently integrated into the WuMKG.
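The thresholding-and-triple-generation step can be sketched as follows; the detector output, artwork identifier, and predicate name are illustrative:

```python
def subjects_to_triples(artwork_id, detections, threshold=0.5):
    """Keep detections whose confidence score exceeds the threshold and emit
    (artwork, predicate, subject) triples for the knowledge graph."""
    return [(artwork_id, "depicts", label)
            for label, score in detections if score > threshold]

# Hypothetical detector output (label, confidence) for one painting image.
detections = [("mountain", 0.92), ("tree", 0.81), ("building", 0.31)]
print(subjects_to_triples("artwork/SoundOfPines", detections))
# → [('artwork/SoundOfPines', 'depicts', 'mountain'), ('artwork/SoundOfPines', 'depicts', 'tree')]
```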

Knowledge fusion and data mapping
In this section, we perform knowledge fusion, which includes entity alignment and conflict resolution, to address duplication and conflicts in knowledge from different sources. Following the established ontology, we then convert the data into triples.
Entity alignment Given the diverse representations of entities across data sources, especially those representing persons, calligraphy, and paintings, alignment is essential to establish connections between entities that refer to the same object [43]. We conducted both internal alignment and alignment with external knowledge graphs.

Internal alignment The alignment process relies on key attributes such as name, author, and dynasty. We serialize entities and their respective attributes into sentences and calculate their similarities; the most similar entity pair is checked and merged into a single entity. For calligraphy and painting entities, we compute attribute similarity together with measures derived from the analysis of the artwork images. Notably, the domain vocabulary encompasses various aliases for individuals (e.g., Tang Yin [唐寅], courtesy name Bohu [字伯虎], and Liuru Jushi [号六如居士]). Therefore, aligning person entities primarily draws on the domain vocabulary, utilizing attributes such as alias, dynasty, and residence for similarity calculations.
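The serialize-and-compare step can be sketched as follows. Here `difflib.SequenceMatcher` stands in for the paper's actual similarity measure, and the attribute keys and example entities are assumptions for illustration.

```python
from difflib import SequenceMatcher

def serialize(entity: dict) -> str:
    """Serialize key attributes (name, author, dynasty) into one sentence."""
    return " ".join(str(entity.get(k, "")) for k in ("name", "author", "dynasty"))

def similarity(e1: dict, e2: dict) -> float:
    """String similarity over serialized entities; a stand-in for the
    similarity model used in the paper."""
    return SequenceMatcher(None, serialize(e1), serialize(e2)).ratio()

# Two records of the same painting from different sources, plus a distractor
a = {"name": "Sound of Pines on a Mountain Path", "author": "Tang Yin", "dynasty": "Ming"}
b = {"name": "Sound of pines on a mountain path", "author": "Tang Yin", "dynasty": "Ming"}
c = {"name": "Lofty Mount Lu", "author": "Shen Zhou", "dynasty": "Ming"}
```

The most similar pair above the acceptance threshold is manually checked and merged into one entity.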
External alignment To align with person entities in external graphs, we employ matching features such as person name, alias, and dynasty. The similarity with person entities in external knowledge graphs such as DBpedia and CBDB is calculated, and the result with the highest similarity is used as the alignment. The owl:sameAs relationship is then employed to link the aligned entity pairs.
Conflict resolution Conflict resolution is used for merging and proofreading person attributes. Using the person details in History of Wu Men Painting as the reference, the biography in CBDB is integrated, and the person data in the encyclopedia is used as a supplement. When inconsistent data is encountered, priority is given to information with higher frequency and reliability.
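The frequency-then-reliability rule can be sketched as below. The source names, priority values, and the example birth-year records are illustrative assumptions; only the priority ordering (History of Wu Men Painting > CBDB > encyclopedia) comes from the text.

```python
from collections import Counter

# Hypothetical reliability ranking reflecting the ordering in the text
PRIORITY = {"history_of_wumen": 3, "cbdb": 2, "encyclopedia": 1}

def resolve(values):
    """Pick the attribute value with the highest (frequency, source reliability).
    values: list of (value, source) pairs from different data sources."""
    freq = Counter(v for v, _ in values)
    best_value, _ = max(values, key=lambda vs: (freq[vs[0]], PRIORITY.get(vs[1], 0)))
    return best_value

# Two sources agree on 1470, one says 1469: frequency wins (example data)
birth_year = resolve([("1470", "encyclopedia"), ("1470", "cbdb"),
                      ("1469", "history_of_wumen")])
```

Ties in frequency fall back to the more reliable source, matching the proofreading order described above.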
Data mapping Semi-structured data is first converted into structured data automatically; data mapping then converts the structured data into triples. As shown in Fig. 12, the software tool Karma is used to complete this work [44]. First, we establish a mapping model between structured data samples and the ontology's classes, relationships, and attributes. Then, the model is executed to obtain knowledge graph data in the form of triples.
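Conceptually, executing the mapping model amounts to turning each structured record into triples. A minimal sketch of that idea is below; the base URI and the column-to-predicate mapping are hypothetical, whereas in practice Karma generates and executes the model.

```python
def row_to_triples(row: dict, base: str = "http://wumkg.example/"):
    """Map one structured-data row to N-Triples lines according to a
    (hypothetical) ontology mapping: title -> rdfs:label, author -> dcterms:creator."""
    subject = f"<{base}artwork/{row['id']}>"
    mapping = {
        "title": "<http://www.w3.org/2000/01/rdf-schema#label>",
        "author": "<http://purl.org/dc/terms/creator>",
    }
    return [f'{subject} {pred} "{row[col]}" .'
            for col, pred in mapping.items() if col in row]

lines = row_to_triples({"id": "001", "title": "山路松声", "author": "Tang Yin"})
```

Each row yields one subject URI plus one triple per mapped attribute, which is exactly the shape of output Karma produces from a mapping model.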

Evaluation of knowledge extraction
In this section, we conduct extensive experiments to evaluate the performance of the proposed methods and report detailed experimental results.

Text extraction experiment
In this experiment, we assess the effectiveness of iterative labeling on the text extraction performance.
Dataset We conduct experiments on the Baidu Baike dataset to assess the efficacy of iterative annotation. The dataset consists of the infobox and content text of artists in Baidu Baike. Following the BIO annotation scheme, entities were annotated with PER (person), GPE (location), and TITLE (artwork name) tags, where B and I denote the beginning and inside parts of each entity. The annotated data was partitioned into training and testing sets at a ratio of 3:7. Additionally, we randomly selected 33% (56 samples), 66%, and 100% of the samples from the training set for training, denoted as 1x, 2x, and 3x, respectively.
Implementation details Initially, we pretrained LEBERT on the Weibo NER dataset, establishing it as our baseline. The model is then fine-tuned using 33%, 66%, and 100% of the training set separately for 10 epochs, with a batch size of 32 and a learning rate of 1 × 10⁻⁴.
Evaluation metrics We use three evaluation metrics: precision (P), recall (R), and the F1 value. With TP, FP, and FN denoting true positives, false positives, and false negatives, the metrics are defined as follows:

$$P = \frac{TP}{TP + FP}, \quad R = \frac{TP}{TP + FN}, \quad F_1 = \frac{2 \cdot P \cdot R}{P + R}$$

Results
The entity extraction results are shown in Table 3. The model achieves excellent extraction performance on domain text even with small amounts of downstream task data for fine-tuning.

Seal extraction experiment
We construct a seal dataset to evaluate the classification performance of our approach.

Dataset The dataset, as detailed in Table 4, comprises 8363 seal images, divided into training, development, and test sets at a ratio of 7:1:2. The seal images, sourced from the Shanghai Library's open data platform and manual annotations of candidate seals, include various shapes, as depicted in Fig. 13. To enhance the diversity of negative samples, non-seal images were obtained through manual annotation and random cropping from larger images.
Evaluation metrics We use the same evaluation metrics as those used in the text extraction experiment.

Results
The experimental results are shown in Table 5. The F1 score for ResNet-50 is 99.54%, while our method using EfficientNet-B0 achieves an F1 score of 99.28%. This demonstrates the effectiveness of our approach in performing seal validation. Furthermore, the total number of parameters in ResNet-50 is five times greater than that in EfficientNet-B0; consequently, our method offers superior computational efficiency while maintaining high recognition accuracy. Figure 14 shows an example of seal extraction.

Subject extraction experiment
We construct a subject detection dataset and conduct a series of experiments to determine which model best suits our subject extraction task.

Dataset We developed a Chinese painting subject detection dataset using open and authorized digital resources from museums, comprising 3261 annotated images. We annotated the images with 7 labels in COCO format, using LabelImg for annotation. The dataset's statistics are presented in Table 6. Samples across all categories were randomly divided into training and test sets at an 8:2 ratio.
Evaluation metrics We use two average precision metrics: mean average precision at an IoU threshold of 0.5 (mAP@0.5) and mean average precision averaged over IoU thresholds from 0.5 to 0.95 (mAP@0.5:0.95). These metrics are widely used to assess the performance of object detection models. With $AP_i$ denoting the average precision for class $i$ over $N$ classes, $mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i$.

Results
The comparative experimental results for painting and calligraphy subject detection are presented in Table 7. Our method using EfficientDet-D1 shows superior performance, achieving an mAP@0.5 of 74.3% and an mAP@0.5:0.95 of 50.7%. These results surpass those of EfficientDet-D0 by margins of 0.5% and 0.7%, respectively. Notably, our method outperforms the YOLOv5 variants, with a particularly significant improvement over the YOLOv5l model of 2.6% in mAP@0.5 and 2.1% in mAP@0.5:0.95. DETR and YOLOS, which feature a sequence-to-sequence architecture, demonstrate lower performance on this specific task. An example of subject detection is shown in Fig. 15.

Applications
Based on the WuMKG, a web platform is built to provide multimodal retrieval, a knowledge-based question and answer system (Q&A system), and visualization.

Multimodal retrieval It allows retrieval by text and image. Image retrieval uses VGGNet16 to extract image features for entity matching. When retrieving seals, paintings, and calligraphy, we use Faiss for similarity search to achieve faster search speeds. The visualization application is developed based on ECharts, as shown in Fig. 16; entities and relationships are displayed as a force-directed diagram.
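The retrieval step can be sketched with a brute-force nearest-neighbour search over the extracted feature library; `faiss.IndexFlatL2` produces the same L2 ranking, only much faster at scale. The 2-D toy features below are assumptions (real VGGNet16 features are high-dimensional).

```python
import numpy as np

def search(index_feats: np.ndarray, query: np.ndarray, k: int = 3):
    """Brute-force L2 nearest-neighbour search over a feature library.
    Returns the indices of the k closest entries and their distances;
    equivalent in ranking to faiss.IndexFlatL2.search()."""
    dists = np.linalg.norm(index_feats - query, axis=1)
    order = np.argsort(dists)[:k]
    return order, dists[order]

# Toy feature library for three artworks and one query image feature
feats = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
idx, d = search(feats, np.array([0.9, 0.1]), k=2)
```

The returned indices map back to entity IDs in the WuMKG, so the best-matching artwork or seal entity can be displayed directly.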
Q&A system We develop a multimodal Q&A system based on the WuMKG framework. The system consists of three key modules: an intent classifier, a SPARQL parser, and an answer generator. The intent classifier takes the user's input question and calculates its similarity to intent templates to determine the corresponding intent type, as illustrated in Table 8. We construct 17 types of intents, each including multiple templates, and use a Bayesian classifier for classification. The intent is then forwarded to the parser to generate the appropriate query. The generator module employs this query to extract values from the WuMKG, transforming them into coherent and informative answer statements.
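The intent classification step can be sketched with a tiny naive Bayes classifier over character n-grams (real questions are Chinese, so character n-grams avoid the need for word segmentation). The templates, intent names, and n-gram featurization here are illustrative assumptions, not the paper's 17-intent template set.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesIntent:
    """Minimal multinomial naive Bayes with Laplace smoothing over
    character bigrams; a sketch of the paper's Bayesian intent classifier."""

    def fit(self, texts, labels):
        self.classes = set(labels)
        self.prior = Counter(labels)                 # class frequencies
        self.counts = defaultdict(Counter)           # per-class bigram counts
        for t, y in zip(texts, labels):
            self.counts[y].update(self._grams(t))
        self.vocab = {g for c in self.counts.values() for g in c}
        return self

    def predict(self, text):
        def logp(y):
            total = sum(self.counts[y].values())
            lp = math.log(self.prior[y])
            for g in self._grams(text):
                # Laplace-smoothed likelihood of each bigram under class y
                lp += math.log((self.counts[y][g] + 1) / (total + len(self.vocab)))
            return lp
        return max(self.classes, key=logp)

    @staticmethod
    def _grams(t, n=2):
        return [t[i:i + n] for i in range(len(t) - n + 1)]

# Hypothetical intent templates (two of the 17 intent types)
templates = ["Who is X's teacher", "Who is X's father",
             "Which dynasty is X from", "Which dynasty did X live in"]
intents = ["Relationships", "Relationships", "Dynasty", "Dynasty"]
clf = NaiveBayesIntent().fit(templates, intents)
```

The predicted intent type selects a SPARQL template, which the parser fills with the entities mentioned in the question.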
The knowledge Q&A uses a template matching method based on Bayesian classification, which supports 17 types of questions as well as image-based question answering, as shown in Fig.

Fig. 1
Fig. 1 Sound of pines on a mountain path [山路松声]: a Chinese painting entity in a multimodal knowledge graph. The dashed line denotes the relation extracted from the image

Fig. 7
Fig. 7 The process of seal extraction

The white balance step determines the average color of the image to estimate the global color shift, pinpointing channel-specific deviations, and uses it to calculate gain coefficients for each channel. For channel $c$, $C_{avg}$ denotes its average value, and $k$ and $k_c$ denote the identical grayscale value and the gain coefficient of channel $c$:

$$k_c = \frac{k}{C_{avg}}, \qquad P'_c(i, j) = k_c \cdot P_c(i, j)$$

where $P_c(i, j)$ and $P'_c(i, j)$ are the original and white-balanced pixel values.

Fig. 11
Fig. 11 The process of subject detection based on EfficientDet

Fig. 12
Fig. 12 Using Karma for data mapping

Fig. 13
Fig. 13 Examples of seals in various shapes: a) rectangular, b) gourd-shaped, c) oval-shaped, d) square, e) heart-shaped

Fig. 15
Fig. 15 An example of subject detection

Table 8
Category of intents (columns: Intent num, Intent name, Examples; example row: Relationships — "Who is Tang Yin's teacher/father?")
a According to the classification standards for cultural relics collections announced by the Ministry of Culture of China, cultural relics are divided into precious and general cultural relics, with precious cultural relics further classified into first, second, and third grades

Fig. 3 The framework of WuMKG construction
The painting and calligraphy knowledge schema mainly includes painting and calligraphy artworks, persons, and resource records.

Table 1
Examples of painting and calligraphy information

Table 3
Entity extraction results

Even for data not previously labeled, such as TITLE, fine-tuning with a small amount of data still yields effective extraction results. The model's performance with the 2x training set nearly matches that achieved with the 3x training set. This indicates that our iterative annotation method can efficiently extract high-quality text at a significantly reduced cost. It also showcases the method's potential applicability to text extraction tasks in other domains.

Table 4
Statistics for seal dataset

Table 5
Seal detection experimental results

Fig. 14 An example of seal extraction

Table 6
Statistics for subject detection dataset annotations