Construction and application of a knowledge graph-based question answering system for Nanjing Yunjin digital resources

Nanjing Yunjin, one of China's traditional silk weaving techniques, is renowned for its unique local characteristics and exquisite craftsmanship, and was included in the Representative List of the Intangible Cultural Heritage of Humanity by UNESCO in 2009. However, with rapid development in weaving technology, ever-changing market demands, and shifting public aesthetics, Nanjing Yunjin, as an intangible cultural heritage, faces the challenge of survival and inheritance. Addressing this issue requires efficient storage, management, and utilization of Yunjin knowledge to enhance public understanding and recognition of Yunjin culture. In this study, we have constructed an intelligent question-answering system for Nanjing Yunjin digital resources based on knowledge graph, utilizing the Neo4j graph database for efficient organization, storage, and protection of Nanjing Yunjin knowledge, thereby revealing its profound cultural connotations. Furthermore, we adopted deep learning algorithms for natural language parsing. Specifically, we adopted BERT-based intent recognition technology to categorize user queries by intent, and we employed the BERT + BiGRU + CRF model for entity recognition. By comparing with BERT + BILSTM + CRF, BERT + CRF and BILSTM + CRF models, our model demonstrated superior performance in terms of precision, recall, and F1 score, substantiating the superiority and effectiveness of this model. Finally, based on the parsed results of the question, we constructed knowledge graph query statements, executed by the Cypher language, and the processed query results were fed back to the users in natural language. Through system implementation and testing, multiple indices including system response time, stability, load condition, accuracy, and scalability were evaluated. 
The experimental results indicated that the Nanjing Yunjin intelligent question-answering system, built on the knowledge graph, is able to efficiently and accurately generate answers to user’s natural language queries, greatly facilitating the retrieval and utilization of Yunjin knowledge. This not only reinforces the transmission, promotion, and application of Yunjin culture but also provides a paradigm for constructing other intangible cultural heritage question-answering systems based on knowledge graphs. This has substantial theoretical and practical significance for deeply exploring and uncovering the knowledge structure of human intangible heritage, promoting cultural inheritance and protection.


Introduction
Nanjing Yunjin, a gem in the silk weaving craft, represents the highest level of Chinese Yunjin weaving. Officially listed in the Representative List of the Intangible Cultural Heritage (ICH) of Humanity by UNESCO in 2009, it is a precious historical and cultural heritage of the Chinese nation and the world. However, with economic transformation and social change, as well as the development of multimedia and network technology, ICH like Nanjing Yunjin is facing challenges of survival and inheritance [1]. To effectively store, manage, and utilize Yunjin knowledge and enhance the public's understanding and recognition of Yunjin culture, it is necessary to explore new solutions.
In recent years, the knowledge graph (KG), as an emerging form of digital resource knowledge organization, has provided semantic, visual, and intelligent displays, thereby achieving efficient knowledge storage and application [2]. The development of KG can be traced back to semantic networks in the 1960s. Through the evolution of a series of concepts such as ontology, semantic web, and linked data, the concept of KG was formally proposed when Google launched its KG-based search engine service in 2012 [3]. Subsequently, many large companies further developed KG, such as Facebook's social graph search, Bing's academic KG search, and eBay's product KG search, making KG increasingly widely used in various fields. In the field of ICH, KGs are mainly used for data storage. Fan et al. used China's ICH as a case study and proposed an ICH KG framework [4]. Further, taking the Inventory of China's National ICH as an example and integrating text and image entities from multiple data sources, they constructed a large-scale, comprehensive multimodal KG, providing a practical construction framework [5]. Dou et al. used natural language processing (NLP) technology to extract domain knowledge from text data, thus constructing a Chinese ICH KG based on domain ontology and instances [6]. These projects provide new avenues for the storage of ICH knowledge from the perspective of semantic links.
KG not only provides a semantic, associative, and visualized way to store knowledge, but can also be applied to tasks such as word segmentation, phrase understanding, and text processing in question-answering (Q&A) systems, helping machines better understand natural language, identify users' intentions, and improve the efficiency of Q&A systems [7].
Nevertheless, no application combining KG with a Q&A system has yet been found in the field of ICH. Nanjing Yunjin is a highly specialised vertical domain. As an ICH with a long history, it has a complicated production process, many varieties, a wide range of motifs, rich pattern content, and far-reaching auspicious symbolism, and it carries a wealth of artistic and cultural connotations. Therefore, to facilitate the public's knowledge and understanding of Nanjing Yunjin, it is necessary to clarify the intricate relationships between things, find the hidden connections between people, and deeply reveal Nanjing Yunjin and its profound cultural connotations. This study focuses on constructing a KG-based Q&A system, building a Yunjin KG to rediscover and mine the knowledge associations within Nanjing Yunjin digital resources, and to display, in an intuitive and visual form, the history of Nanjing Yunjin and its weaving process, the classification and naming of its categories, and the structure and implied meaning of its patterns. Moreover, the Q&A system based on the Yunjin graph gives users a window for retrieval in natural language, mitigating issues of traditional information retrieval such as low precision, information redundancy, and low relevance. It can also understand real user demands, greatly enriching the knowledge discovery service of Nanjing Yunjin, and further deepening the development, utilization, inheritance, and protection of Nanjing Yunjin digital resources.
The main contributions of this study are: (1) We have constructed a KG for the Nanjing Yunjin domain, including more than ten thousand entities and entity relationships. The KG is the foundation of the intelligent Q&A system. As no public KG in the Nanjing Yunjin domain is currently available, our study fills this gap. (2) We designed and implemented a complete KG-based intelligent Q&A system. [...] answering capabilities, allowing the system to recognize and infer user needs and querying intentions based on the context of user inquiries. This level of understanding enhances the system's interactive efficiency and accuracy.
The remainder of this paper is structured as follows. The "Related work" section reviews the current research status of Q&A systems. The "Methodology" section introduces the relevant algorithms of this study, the specific process of KG construction, and the design and implementation of the intelligent question-answering algorithms. The "Results and discussion" section discusses the implementation of the Q&A system and provides a detailed introduction and operation examples of each module of the Model View Controller (MVC) architecture. The "Conclusion" section summarizes the work of this study and analyzes the content and direction of future research.

Early prototypes and evolution of Q&A systems
The development of Q&A systems is inextricably linked with the advancements in Artificial Intelligence (AI) and NLP. The aim of these systems is to offer intelligent solutions to inquiries expressed in natural language, representing a significant progression in information retrieval technologies [8].
Early Q&A systems were primarily specialized, employing rule-based templates to process narrow and structured data to answer questions in specific domains. The groundbreaking Turing Test proposed by Alan Turing in 1950 is widely regarded as the earliest prototype of modern Q&A systems [9]. Subsequently, the advent of Eliza marked an important milestone in the development of interactive Q&A systems. Eliza primarily analyzed thematic relations in user input, identified keywords, and generated rule-based responses, simulating a Rogerian psychotherapist [10]. The LUNAR system went further, employing heuristic and semantic syntactic analysis to parse users' natural language input and answer domain-specific queries about lunar rock samples [11]. With the evolution of information and internet technologies, Q&A systems extended into general domains. IBM Watson realized a qualitative leap in the Q&A domain by employing neural networks to achieve advanced natural language understanding and reasoning capabilities. It could efficiently extract answers from vast information sources, thereby broadening its applicability [12]. Recently, OpenAI's GPT-3 has emerged prominently in the Q&A systems domain. GPT-3 is a large-scale language model capable of executing various tasks through text interactions without gradient updates or model fine-tuning [13]. From early specialized Q&A systems to general Q&A systems, all were confined to performing Q&A in natural language form. In recent years, with the advancement of AI, intelligent conversational systems such as Apple's Siri, Amazon's Alexa, Microsoft's Cortana, and Google Assistant have gained prominence. They employ technologies like voice recognition, knowledge bases, and Q&A recommendations to provide accurate answers to users' questions, supporting retrieval in various forms such as text, images, and voice [14]. OpenAI's latest release, GPT-4, a Transformer-based model, supports not only text inputs but also image-based Q&A, marking the entry of Q&A systems into the developmental stage of intelligent interactive questioning [15].

Application of KG in the field of ICH
Q&A systems possess extensive application prospects. Currently, intelligent voice assistants primarily rely on information retrieval technology to perform similarity matching of Q&A pairs on existing web information or community Q&A websites, overlooking the utilization of background knowledge to achieve a deep semantic understanding of natural language questions and the information itself. KG can offer substantial support for the organization, storage, and display of knowledge in ICH projects, further unveiling the semantic associations between pieces of information [16]. For example, the Europeana project enables the retrieval and utilization of European cultural heritage resources by linking elements like themes, times, and institutions in digital heritage, such as music, books, and artworks [17]. The Ichpedia project constructs an encyclopedia system for ICH based on web data, allowing users to search for ICH elements and associations through simple search, semantic search, and map search [18]. The I-Treasures project utilizes digital technology to capture, analyze, and model ICH, providing users with an open and extensible retrieval platform [19]. These projects facilitate better interpretation of the phenomena and essence in comprehensive ICH data by semantically linking ICH knowledge, enabling audiences to understand and acknowledge ICH more profoundly.

Implementation methods of Q&A system based on KG
KG, by providing semantically enriched structured data representation, introduces a paradigm shift in the architecture of Q&A systems. Such features render KG invaluable in enhancing information retrieval, handling complex queries, and improving accuracy [20][21][22][23]. There are mainly three types of implementation methods for Q&A systems based on KG.
Template Matching was an early method in Q&A systems based on KG. Despite its acclaim for accuracy, it often faced criticism for its rigidity and the demands of manual maintenance. Existing works, such as those by Bast et al., have delved into optimizing this method to better handle queries [24].
Semantic Parsing Based Methods primarily involve translating users' natural language questions into a semantic form understandable by computers, displaying stronger adaptability and scalability. Yih proposed a novel semantic parsing framework for Q&A that utilizes knowledge bases [25]. Song presented a method for understanding the semantics of questions in Chinese Q&A systems through semantic element analysis and combination [26].
Deep learning (DL)-based KG embedding Q&A systems use DL models to perform entity and intent recognition on users' natural language questions and return answers. Although expensive to train, this method requires little manual rule definition and is highly automated. Earlier, LSTM was introduced for entity recognition, and Conditional Random Fields (CRF) combined with LSTM became a typical DL model for named entity recognition (NER) [27]. To solve the issue of the same embedding for a word in different semantic contexts, some scholars have combined BiLSTM with CRF to acquire bidirectional semantic information [28]. For instance, Liu et al. constructed a KG and Q&A system in the field of Liao Dynasty history and culture based on the BiLSTM-CRF model [29]. Many researchers improved this model. Qiu et al. proposed an ATT-BiLSTM + CRF model, using global information learned from the attention mechanism to enforce consistency among the same labels in multiple instances within documents [30]. Chen et al. introduced the Lexical Feature based BiLSTM-CRF (LF-BiLSTM + CRF) model to further enhance the reliability of predicting labels [31]. Zhao et al. improved the internal structure of LSTM and proposed Lattice-LSTM, enhancing the model's stability [32].
However, the aforementioned models share a common issue of lower recognition accuracy for polysemous words. Subsequently, the BERT model released by Google was integrated into the BiLSTM + CRF model, where the bidirectional encoder based on the Transformer neural network resolves word ambiguity by referencing contextual semantics. Models based on BERT were then widely applied. For example, Liu built a DL model based on BERT to recognize the intent and entities/attributes of input questions, querying the constructed mineral KG and returning answers [33]. Aurpa applied a deep neural network model based on Transformer to accurately and swiftly obtain answers for reading comprehension in the Bengali language [34]. Zhou et al. introduced the Albert-BiLSTM-MHA-CRF model for extracting and constructing KG related to ancient poetry entities, exploring the connections between ancient poems to inherit Chinese traditional culture [35].
To implement these theoretical methods in practice, we have custom-designed a Q&A system for Nanjing Yunjin, a specific domain within ICH. This system leverages a DL model based on BERT, integrating BiGRU and CRF for entity recognition, and exhibits strong performance in precision, recall, and F1 score. Given that the GRU employs fewer gates than LSTM and does not require the maintenance of additional state vectors, it has a lower computational intensity and faster training speed. The Q&A system adopts an intent recognition model based on BERT, capable of identifying and inferring user needs and questioning intentions according to the context of user queries. This understanding enhances the system's interactive efficiency and accuracy. Experimental results indicate that this model exhibits high accuracy and few omissions in classification tasks.
The first step involves data collection, preprocessing, and knowledge extraction to derive the Nanjing Yunjin graph. The Nanjing Yunjin DKG is stored using the Neo4j graph database, completing the construction of the knowledge base. Subsequently, the PaddlePaddle DL framework is utilized to parse natural language. The BERT + BiGRU + CRF model is used to recognize entities within the questions, and intent recognition based on BERT is applied to categorize the intentions of user queries. Finally, the recognized intent types and entity data are input into predefined query templates. After the templates are transformed into matching Cypher expressions, the queries are executed within the constructed KG database, and the processed query results are returned to the users in natural language.
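The final step of this pipeline, filling a predefined query template from the recognized intent and entity, might be sketched as follows. This is only an illustration: the intent names, the `name` property, the relationship types used in the patterns, and the answer wording are assumptions, since the actual templates are not listed here.

```python
# Hypothetical intent -> Cypher template table. Intent names, the `name`
# property, and the relationship types are illustrative assumptions.
CYPHER_TEMPLATES = {
    "inheritor_of": "MATCH (w {name: $entity})-[:`carries`]-(p) RETURN p.name AS answer",
    "type_of": "MATCH (w {name: $entity})-[:`P2 has type`]->(t) RETURN t.name AS answer",
}

# Hypothetical natural-language answer templates, one per intent.
ANSWER_TEMPLATES = {
    "inheritor_of": "{entity} is carried by {answer}.",
    "type_of": "{entity} belongs to the category {answer}.",
}


def build_query(intent, entities):
    """Map a recognized intent and entity list to (cypher, parameters)."""
    if intent not in CYPHER_TEMPLATES or not entities:
        return None  # the caller can fall back to a "cannot answer" reply
    # The entity is passed as a query parameter rather than formatted into
    # the string, so user text cannot change the structure of the query.
    return CYPHER_TEMPLATES[intent], {"entity": entities[0]}


def render_answer(intent, entity, rows):
    """Turn the raw result values into a short natural-language reply."""
    if not rows:
        return f"No information about {entity} was found."
    return ANSWER_TEMPLATES[intent].format(entity=entity, answer=", ".join(rows))
```

Keeping the templates in a table keyed by intent means that supporting a new question type only requires adding one query template and one answer template.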

Models and algorithms used

Bidirectional encoder representations from transformers
BERT is a language model that pretrains deep bidirectional representations by jointly conditioning on both left and right context in all layers [36]. The pretraining process of BERT includes two tasks: Masked Language Model (MLM) and Next Sentence Prediction (NSP). The MLM task randomly masks some words in the input text and trains the model to predict them, while the NSP task presents the model with a pair of sentences and trains it to determine whether the second follows the first. The BERT model first embeds the input text and then encodes it with the deeply pretrained network, achieving effective feature extraction. The input information representation scheme of BERT is designed by first constructing the NSP task and then implementing the MLM task based on it. The network structure diagram of the BERT model is shown in Fig. 1.

Bidirectional gated recurrent unit
GRU [37] and LSTM [38] are both enhanced models of Recurrent Neural Networks (RNN) [39], with the former being a simplified version of the latter. While LSTM contains three gating units (the input gate, the output gate, and the forget gate), GRU consists only of a reset gate and an update gate. The reset gate controls the degree to which previous information is forgotten, and the update gate dictates how much past information gets updated. The reset gate $r_t$ and the update gate $z_t$ are given in Eqs. (1) and (2), respectively.
\[ r_t = \sigma(w_r x_t + u_r h_{t-1}) \tag{1} \]
\[ z_t = \sigma(w_z x_t + u_z h_{t-1}) \tag{2} \]
In these equations, $\sigma$ denotes the sigmoid activation function, which serves as a gating signal by confining the value within the [0, 1] range. The terms $w_r$ and $u_r$ refer to the input weight matrix and the recurrent weight matrix of the reset gate, respectively. $x_t$ signifies the input information of the current node, whereas $w_z$ and $u_z$ denote the input weight matrix and the recurrent weight matrix of the update gate, respectively. Lastly, $h_{t-1}$ represents the hidden layer state of the previous moment. The structure of the GRU is illustrated in Fig. 2.
A new candidate hidden state $\tilde{h}_t$ is computed with the hyperbolic tangent (tanh) function, where $\odot$ denotes the Hadamard product. The reset gate activation is multiplied element-wise with the previous hidden state before the recurrent weight matrix is applied, as shown in Eq. (3), where $w$ and $u$ denote the input weight matrix and the recurrent weight matrix of the unit state, respectively.
\[ \tilde{h}_t = \tanh(w x_t + u (r_t \odot h_{t-1})) \tag{3} \]
Let $h_t$ be the current hidden layer state. When updating information, $h_t$ is as shown in Eq. (4).
\[ h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t \tag{4} \]
The BiGRU model takes the word vectors extracted by the BERT layer and feeds them into a forward GRU and a backward GRU for bidirectional feature extraction. In this way, the model can make full use of the contextual information on both sides of each position in the sentence. This method has good modeling and processing capabilities and has a wide range of application prospects in the field of NLP.
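To make the gate computations concrete, the following is a minimal pure-Python sketch of one GRU step and of the bidirectional pass. It assumes the common formulation in which the update gate interpolates as $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$ (some references swap the roles of $z_t$ and $1 - z_t$), omits bias terms, and uses a hypothetical parameter dictionary `p`; it is not the PaddlePaddle implementation used in the experiments.

```python
import math


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def gru_step(x_t, h_prev, p):
    """One GRU step; p holds the weight matrices w_r, u_r, w_z, u_z, w, u."""
    r = sigmoid(add(matvec(p["w_r"], x_t), matvec(p["u_r"], h_prev)))  # reset gate
    z = sigmoid(add(matvec(p["w_z"], x_t), matvec(p["u_z"], h_prev)))  # update gate
    gated = [a * b for a, b in zip(r, h_prev)]  # Hadamard product r_t * h_{t-1}
    h_cand = [math.tanh(a) for a in add(matvec(p["w"], x_t), matvec(p["u"], gated))]
    # Interpolate between the previous state and the candidate state.
    return [(1 - zi) * hp + zi * hc for zi, hp, hc in zip(z, h_prev, h_cand)]


def bigru(xs, fwd, bwd, hidden):
    """Run a forward and a backward GRU and concatenate their states per position."""
    def run(seq, p):
        h, out = [0.0] * hidden, []
        for x in seq:
            h = gru_step(x, h, p)
            out.append(h)
        return out

    forward = run(xs, fwd)
    backward = run(xs[::-1], bwd)[::-1]  # re-align backward states with positions
    return [hf + hb for hf, hb in zip(forward, backward)]
```

Each output vector has twice the hidden size, because the forward state summarizes the tokens to the left of a position and the backward state summarizes those to its right.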

Conditional random field
The BERT model has addressed the issue of correlations between inputs and outputs, but the dependency problem between tags remains unresolved. For example, according to the BIOES annotation system, in a correct sequence, B is always before E, and E will not appear between B and I. Both RNN and LSTM can only try to avoid sequences that do not comply with the annotation system but cannot fundamentally rule them out. The CRF described below solves this problem well [40]. A classic CRF is shown in Fig. 3.
The CRF is essentially an undirected graph, where blue dots represent inputs and yellow dots represent outputs. Edges between points can be divided into two categories: lines between X and Y indicating their correlation, and dependencies between neighboring tags Y. The CRF model maintains a transition probability matrix and, during decoding, judges the label corresponding to the current token according to this matrix, thereby avoiding the generation of entity segments that do not comply with the sequence ordering requirements.
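The effect of the transition matrix during decoding can be illustrated with a small Viterbi sketch. It is a simplification under stated assumptions: a single entity type, hand-written transition scores (0 for legal BIOES transitions and a large negative value for illegal ones) instead of the scores a trained CRF would learn, and hypothetical emission scores standing in for the BiGRU outputs.

```python
NEG = -1e9  # score used to forbid a transition

TAGS = ["O", "B", "I", "E", "S"]
# Legal successor tags under the BIOES scheme for a single entity type.
ALLOWED = {
    "O": {"O", "B", "S"},
    "B": {"I", "E"},
    "I": {"I", "E"},
    "E": {"O", "B", "S"},
    "S": {"O", "B", "S"},
}
START_OK = {"O", "B", "S"}  # tags that may open a sentence
END_OK = {"O", "E", "S"}    # tags that may close a sentence


def transition(prev, cur):
    return 0.0 if cur in ALLOWED[prev] else NEG


def viterbi(emissions):
    """emissions: one {tag: score} dict per token; returns the best legal path."""
    # best[t][tag] = (score of the best path ending in tag at step t, previous tag)
    best = [{t: (emissions[0][t] + (0.0 if t in START_OK else NEG), None) for t in TAGS}]
    for em in emissions[1:]:
        row = {}
        for cur in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p][0] + transition(p, cur))
            row[cur] = (best[-1][prev][0] + transition(prev, cur) + em[cur], prev)
        best.append(row)
    # Choose the best final tag that may end a sentence, then follow back-pointers.
    last = max(TAGS, key=lambda t: best[-1][t][0] + (0.0 if t in END_OK else NEG))
    path = [last]
    for row in reversed(best[1:]):
        path.append(row[path[-1]][1])
    return path[::-1]
```

When the per-token argmax would produce an illegal sequence such as B, O, E, the decoder instead returns the highest-scoring sequence that respects the constraints, which is the behaviour attributed to the CRF layer above.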

Construction of the Nanjing Yunjin domain knowledge graph
A KG is a large knowledge network that saves structured data in the form of nodes and edges. It has the advantages of being intuitive, efficient, and visualizable. It can graphically display the relationships between entities and can be queried quickly. In this study, Neo4j was selected as the storage and visualization carrier for the Nanjing Yunjin KG.

Data collection
The data related to the works used in this study mainly originate from official internal materials provided by the Nanjing Yunjin Museum and Nanjing Yunjin Research Institute. The relevant data about the inheritors of ICH mostly come from official data on the Chinese ICH website. The data concerning weaving materials and machinery are primarily obtained through field investigations and collections. This information is parsed and scraped using the Houyi Collector and is saved as TXT text files after collection.

Data preprocessing
In the performance evaluation of the Q&A system, the accuracy of the raw data is one of the key factors, so raw data from different sources and with different structures need to be preprocessed. This includes removing useless symbols, deleting data without text content, and removing duplicates and advertisements. Useless symbols, including links on web pages and irrelevant characters, are removed using regular expressions. Data without text content, such as entries that consist purely of images or contain very little text, are deleted. Duplicates produced by forwarding or quoting, and irrelevant advertisements embedded in web pages, are also removed.
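A minimal sketch of these cleaning steps is given below. The regular expressions and the `MIN_CHARS` threshold are illustrative assumptions rather than the rules actually used, and advertisement removal is left out because it depends on site-specific patterns.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Keep word characters (which include CJK characters in Python 3), whitespace,
# and common Western and Chinese punctuation; everything else is treated as junk.
JUNK_RE = re.compile(r"[^\w\s.,;:!?'\"()\-，。；：！？、（）]")
MIN_CHARS = 10  # assumed threshold for "very little text"


def clean(text):
    """Strip links and irrelevant symbols, then normalize whitespace."""
    text = URL_RE.sub(" ", text)
    text = JUNK_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def preprocess(docs):
    """Clean each document, drop near-empty ones, and remove exact duplicates."""
    seen, kept = set(), []
    for doc in docs:
        text = clean(doc)
        if len(text) < MIN_CHARS:
            continue  # image-only entries or entries with very little text
        if text in seen:
            continue  # duplicate produced by forwarding or quoting
        seen.add(text)
        kept.append(text)
    return kept
```

Deduplicating after cleaning, rather than before, lets two copies of the same passage that differ only in an embedded link collapse into one record.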

Knowledge extraction
Knowledge extraction is an information processing technique for extracting key information from data with different sources and structures. It mainly includes entity recognition, attribute extraction, and relationship extraction [41]. The KG in this study is constructed based on the ontology framework described in previously published works [42]. This framework chose CIDOC CRM as the main ontology for construction and, based on the knowledge characteristics and intrinsic traits of the Yunjin ontology, reused core concepts from other ontology models such as Time and Ma-ontology. This ontology framework specifically defines seven core classes: E12 Production, FOAF:Agent, Time:Temporal Entity, E44 Place Appellation, E5 Event, E70 Thing, and MA:MediaResource. Through Domain and Range, constraints are applied to the object properties of core classes in the defined ontology model, developing a knowledge network topology around the concept of "productive protection", comprising 33 sets of object property relations such as Has value, Apply to, and Participate in. Based on this, the construction of the KG was completed under the guidance of an expert team and passed quality assessment tests in accuracy, consistency, completeness, and timeliness. To meet the specific needs of the Nanjing Yunjin Q&A system, we have expanded this ontology framework, adding some specific properties and relationships. Specifically, we have conducted in-depth descriptions and analyses of specific instances such as inheritors and works.
Nanjing Yunjin belongs to the E1 CRM Entity type. Because Nanjing Yunjin is an ICH project with a long history, many inheritor families have been engaged in this field for generations. Their members are influenced by their families from an early age and inherit this traditional craft. To capture this, we specifically defined the object property 'Father of'.
Based on the data provided by experts from the Nanjing Yunjin Museum, works of Nanjing Yunjin are also categorized differently, adding E55 Type to describe the categories of the works and introducing P2 has type to describe the relationships between E1 CRM Entity and E55 Type.The high recognition and value in the market are the driving forces for the sustainable inheritance of Nanjing Yunjin craftsmanship.According to the intrinsic characteristics of the ICH project and craftsmanship, two subclasses, YJWK:market intelligence, and YJWK:product, have been customized.The details of these main relationships and properties are outlined in Table 1.
To implement this framework, we employed the ontology modeling tool Protege [43] to establish the hierarchy of categories step by step. In the ontology relationship graph, solid lines are used to represent the relationships between subclasses and instances, while dashed lines represent attribute relationships. This relationship graph provides readers with a clear view, presenting the overall framework of the constructed ontology, as illustrated in Fig. 4.

Graph storage
The extracted entity-attribute-attribute value and entity-relationship-entity triples must then be stored to form the KG. First, a connection to the database is established using py2neo; the triples in the Excel file are then read row by row using xlrd2 and imported into the Neo4j graph database.
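The import step can be sketched as follows. To keep the example runnable without a database, it only builds parameterized Cypher `MERGE` statements and hands them to a caller-supplied `run` function; with py2neo, that function could wrap `graph.run`. The `Entity` label, the `name` property, and the helper names are assumptions for illustration, and reading the rows from the Excel file is left to the caller.

```python
import re


def safe_rel_type(name):
    """Relationship types cannot be passed as Cypher parameters, so restrict
    them to a conservative character set before quoting them with backticks."""
    if not re.fullmatch(r"[A-Za-z0-9_ :]+", name):
        raise ValueError(f"unsupported relationship type: {name!r}")
    return f"`{name}`"


def triple_to_cypher(head, relation, tail):
    """Build an idempotent MERGE statement for one entity-relationship-entity triple."""
    query = (
        "MERGE (h:Entity {name: $head}) "
        "MERGE (t:Entity {name: $tail}) "
        f"MERGE (h)-[:{safe_rel_type(relation)}]->(t)"
    )
    return query, {"head": head, "tail": tail}


def load_triples(rows, run):
    """rows: iterable of (head, relation, tail); run: callable executing one query.

    With py2neo, `run` could be `lambda q, p: graph.run(q, **p)`.
    """
    count = 0
    for head, relation, tail in rows:
        query, params = triple_to_cypher(head.strip(), relation.strip(), tail.strip())
        run(query, params)
        count += 1
    return count
```

Using `MERGE` rather than `CREATE` means that re-running the import does not duplicate nodes or relationships that already exist.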
After importing the data, entities and their relationships, as well as entities and their attribute values can be queried and displayed in the Neo4j graph database.
The Neo4j graph database contains both node and relationship elements. Nodes represent the entities in a triple, and relationships represent the connections between entities. The KG constructed in this study was designed with eight different types of nodes: E44 Place Appellation, E5 Event, E70 Thing, FOAF:Agent, TIME:Temporal Entity, E12 Production, E1 CRM Entity, and E55 Type. The main relationships between the nodes include Reflect to, Mentor of, curated, and carries. Part of the KG centered on the "Nanjing Yunjin" node is shown in Fig. 5. Different node colors in the graph represent different entity types: green represents 'E1 CRM Entity', blue represents 'E70 Thing', orange represents 'FOAF:person', cyan represents 'FOAF:Organization', light gray represents 'E55 Type', and so on. The directed arrows between the nodes represent the relationships between entities. For example, the node "Nanjing Yunjin" is green, which means it belongs to E1 CRM Entity, and the node "Zhu Feng" is orange, which means it belongs to 'FOAF:person'; the arrow between these two nodes represents their relationship, namely that Nanjing Yunjin is carried by Zhu Feng.

Design and implementation of intelligent question-answering module
The intelligent question-answering module serves as the core module of this system, which encompasses NER and intent recognition. NER is a task in information extraction that involves identifying specific types of information elements [44]. Intent recognition, on the other hand, involves identifying potential intentions within a user's discourse, a critical part of a Q&A system [45].
In this study, we employ DL algorithms [46] to process the natural language text input by the users.This allows us to identify the query entities and intent categories accurately, thereby matching the Nanjing Yunjin KG to fulfill the users' detailed query requirements.The system, therefore, provides an interactive and efficient interface for users to access the rich resources within the Nanjing Yunjin KG.

Named entity recognition experiment
(1) Introduction to the Dataset: In the NER experiment for the Nanjing Yunjin Q&A system, the data related to the original corpus mainly come from official internal materials provided by the Nanjing [...] For example, in the sentence "Zhang Fuyong, who was born into a family known for cross-stitch," the entity type of "Zhang Fuyong" is "Person," belonging to the FOAF:Agent type. The specific data are shown in Table 1.
[...] training phase, the loss rate is set to 0.1, the learning rate to 5e-5, and the number of training rounds to 12.
(4) Experimental Results: To assess the performance of the entity recognition model in this experiment, the study uses a confusion matrix to calculate evaluation parameters, including precision (P), recall (R), and F1 score (F1). Precision, also known as positive predictive value, is the proportion of correctly predicted samples among those predicted to be positive, reflecting the accuracy of the experimental results. The calculation formula is as follows:
\[ P = \frac{TP}{TP + FP} \tag{5} \]
Here TP is the number of positive samples correctly predicted as positive, FP is the number of negative samples incorrectly predicted as positive, and FN is the number of positive samples incorrectly predicted as negative.
Recall, also known as sensitivity, is the proportion of correctly predicted samples among the actual positive samples, reflecting the coverage of the experimental results. The calculation formula is as follows:
\[ R = \frac{TP}{TP + FN} \tag{6} \]
In general, P and R influence and constrain each other. Comparing only precision and recall could lead to a one-sided assessment of experimental results. Therefore, the F1 score is required. The F1 score takes into account both precision and recall, balances them, and can comprehensively evaluate the experimental results, rendering the evaluation more convincing. Its calculation method is as follows:
\[ F1 = \frac{2 \times P \times R}{P + R} \tag{7} \]
During the model training process, evaluation parameters are recorded upon the completion of each round of training.
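As a worked illustration of the three measures, the sketch below counts TP, FP, and FN at the entity level and derives P, R, and F1 from them. Representing entities as `(start, end, type)` spans and counting only exact matches as correct are assumptions made for the example; apart from "Person", the type names are hypothetical.

```python
def count_entities(gold, pred):
    """gold, pred: sets of (start, end, type) spans; only exact matches count as TP."""
    tp = len(gold & pred)
    fp = len(pred - gold)  # predicted entities that are not in the gold annotation
    fn = len(gold - pred)  # gold entities that the model missed
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1, returning 0.0 where a ratio is undefined."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

For instance, with three gold entities and two predictions of which one is correct, TP = 1, FP = 1, and FN = 2, giving P = 0.5, R = 1/3, and F1 = 0.4.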
To verify the entity recognition performance of the BERT + BiGRU + CRF model, control experiments with BERT + BiLSTM + CRF, BERT + CRF, and BiLSTM + CRF were set up. The experimental results of each model are shown in Table 2.
Experiments show that the F1 value is only 0.906 when using BiLSTM + CRF for entity recognition, which indicates that introducing the BERT pre-trained model can improve model accuracy in the entity recognition task. When using only BERT as a feature extractor and combining it with CRF for entity recognition, the performance of the model is slightly lower than that of the models using an RNN (GRU or LSTM). This is because RNNs help capture sequential and contextual information, which improves the accuracy of entity recognition. With BERT as the feature extractor, the model combining BiGRU and CRF performs best in terms of precision, recall, and F1 score, with an F1 score of 0.9552, higher than the other combinations. This indicates that BiGRU provides better performance when dealing with entity recognition tasks.

Intent recognition experiment
Intent recognition refers to identifying and understanding the type of intent expressed by users based on their natural language input. This study identifies user intents based on BERT.
(1) Construction of Query Set: The term "question-answer pairs" refers to the matching relationship between the questions posed by users and the answers provided by the Q&A system. For this experiment, Nanjing Yunjin question data was extracted from the large corpus amassed in the field of Nanjing Yunjin. First, we conducted strict manual analysis and annotation on the original corpus that had been reviewed by experts. Each entry was marked and annotated in detail, following Gruber's five criteria, namely, clarity, coherence, extendability, minimal ontological commitment, and minimal encoding bias [47], to ensure the granularity and accuracy of the data. The analysis process involves multiple aspects in the field of ICH, including but not limited to related data on works, inheritors of ICH, and related data on weaving materials and machinery. All data entries underwent rigorous review by academic experts in the field of ICH. These experts, with their extensive experience and professional knowledge, can ensure the accuracy and reliability of the data. With the support and review of domain experts, considering the characteristics of Nanjing Yunjin and user needs, we conducted effective screening and sequencing of [...] After the model training is completed, the trained model is loaded to predict the validation dataset. Based on all prediction results, a confusion matrix is constructed, and the evaluation parameters of the eight types of user intents are shown in Table 4.
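Deriving per-intent evaluation parameters from a multi-class confusion matrix can be sketched as follows. The row and column convention (rows are true intents, columns are predicted intents) and the use of an unweighted macro average are assumptions; the averaging scheme behind the reported figures is not specified.

```python
def per_class_metrics(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n = len(confusion)
    results = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # column total minus TP
        fn = sum(confusion[k]) - tp                       # row total minus TP
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        results.append((p, r, f1))
    return results


def macro_average(results):
    """Unweighted mean of precision, recall, and F1 over all classes."""
    n = len(results)
    return tuple(sum(m[i] for m in results) / n for i in range(3))
```

Each class is treated in turn as the positive class, so the same precision and recall definitions used for entity recognition apply to every intent type.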
The experimental results show that the BERT-based intent recognition model reaches an average accuracy of 95.33%, a recall of 95.28%, and an F1 value of 0.953, which reflects a good classification effect. To verify this result, a reference intent recognition experiment based on ELECTRA was carried out in the same environment, using the same dataset and the same hyperparameter settings. ELECTRA stands for "Efficiently Learning an Encoder that Classifies Token Replacements Accurately". It is a pre-training model in which a discriminator learns to detect tokens that have been replaced by a small generator network. The control results for intent recognition are shown in Table 5. From the data, BERT and ELECTRA are very close in terms of precision, recall, and F1 score. Both show high accuracy and few missed cases in the classification task.
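The step from a confusion matrix to per-intent evaluation parameters can be sketched as follows. This is an illustrative sketch only: it assumes rows are true intents and columns are predicted intents, and it assumes the reported averages are unweighted (macro) averages, which the text does not specify. The two-class matrix is made-up data, not the study's results.

```python
from typing import List, Tuple

Metrics = Tuple[float, float, float]  # (precision, recall, F1)


def per_class_metrics(cm: List[List[int]]) -> List[Metrics]:
    """cm[i][j] = number of samples with true class i predicted as class j."""
    n = len(cm)
    results: List[Metrics] = []
    for k in range(n):
        tp = cm[k][k]
        predicted_k = sum(cm[i][k] for i in range(n))  # column sum
        actual_k = sum(cm[k])                          # row sum
        p = tp / predicted_k if predicted_k else 0.0
        r = tp / actual_k if actual_k else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        results.append((p, r, f1))
    return results


def macro_average(metrics: List[Metrics]) -> Metrics:
    """Unweighted mean of each metric over all classes."""
    n = len(metrics)
    return tuple(sum(m[i] for m in metrics) / n for i in range(3))


# Made-up two-intent confusion matrix, for illustration only.
toy_cm = [[8, 2],
          [1, 9]]
toy_metrics = per_class_metrics(toy_cm)
toy_macro = macro_average(toy_metrics)
```

The same functions apply unchanged to an eight-by-eight matrix such as the one built from the eight intent types.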

Results and discussion
Based on the characteristics of Nanjing Yunjin's digital resources and the methods of KG construction, this study establishes a Q&A system for Nanjing Yunjin's digital resources.

System development environment
The Q&A system constructed in this project is platform-independent and can operate on common systems such as Windows, Linux, and Mac. The system employs the Django web development framework in Python. Python is an easy-to-learn programming language with high code readability. It is simple to use, supports rapid development and multiple programming paradigms, and excels in big data processing, AI, and web development, among other fields. Django is a comprehensive open-source web framework that is commonly used to build web applications. The Neo4j graph database supports most mainstream browsers without the need to install any plugins or software.

Overall system framework
The Nanjing Yunjin DKG Q&A system adopts the MVC architecture, which is divided into the presentation layer, logic layer, and data layer [48]. The overall system architecture is shown in Fig. 8.

System implementation and display
The interface of the KG-based Nanjing Yunjin Q&A system is accessed through a browser; it includes a search box, a send button, and an answer display box.
The system supports the types of questions described in the previous section. For instance, for the first question, "What is Nanjing Yunjin?", the system recognizes through NER that Nanjing Yunjin belongs to the defined E1 CRM Entity type, determines through intent recognition that the question belongs to the "Ask for definition" intent, and finds the corresponding attributes and relationships of the entity in the triples, for example, the "Has Type" relationship between E1 CRM Entity and E55 Type entities. The system then returns the natural language answer, namely that Nanjing Yunjin is inscribed on the Representative List of the ICH of Humanity.
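The mapping from a recognized intent and entity to a graph query can be illustrated with a small template lookup. This is a hedged sketch rather than the system's actual templates: the intent keys, the `name` property, and the relationship types `HAS_TYPE` and `HAS_INHERITOR` are hypothetical stand-ins for whatever schema the Nanjing Yunjin KG uses. The entity is passed as a Cypher parameter (`$name`) instead of being concatenated into the query string.

```python
from typing import Dict, Tuple

# Hypothetical intent names and graph schema, for illustration only.
CYPHER_TEMPLATES: Dict[str, str] = {
    "ask_definition": (
        "MATCH (e {name: $name})-[:HAS_TYPE]->(t) "
        "RETURN t.name AS answer"
    ),
    "ask_inheritor": (
        "MATCH (e {name: $name})-[:HAS_INHERITOR]->(p) "
        "RETURN p.name AS answer"
    ),
}


def build_query(intent: str, entity: str) -> Tuple[str, Dict[str, str]]:
    """Select the Cypher template for an intent and bind the entity as a parameter."""
    try:
        template = CYPHER_TEMPLATES[intent]
    except KeyError:
        raise ValueError(f"no query template for intent {intent!r}")
    return template, {"name": entity}


query, params = build_query("ask_definition", "Nanjing Yunjin")
```

The resulting query text and parameter dictionary could then be handed to a Neo4j client, for example through the official Python driver's `session.run(query, params)`, and the returned rows rendered into a natural language answer.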
The second question, "I want to know the definition of Nanjing Yunjin, can you check it for me?", asks for the definition of Nanjing Yunjin in a different way, and the system still retrieves the answer. This is because, when the intent recognition dataset was created, KG-based question expansion was applied to each question, so that each intent type has at least 20 different phrasings. This makes the trained model robust and improves the scalability of the Q&A system.
The third question is "Do you know who is the successor of it?". Since the system has a multi-round question and answer function, it records the context, and when no entity is recognized, it defaults to the previous subject, thereby recognizing and inferring the user's needs and question intent. For example, if the subject is omitted from the question, the system deduces from the context of the user's previous question that the question entity is Nanjing Yunjin and gives an accurate answer. This understanding improves the interaction efficiency and accuracy of the system. The question and answer example is shown in Fig. 9.
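The fallback described here, where the previous subject is reused when no entity is recognized, can be sketched in a few lines. The class and method names are hypothetical, and the sketch keeps only the single most recent entity, which is an assumption about how much context the system records.

```python
from typing import Optional


class DialogueContext:
    """Remembers the most recent entity so that follow-up questions can omit it."""

    def __init__(self) -> None:
        self.last_entity: Optional[str] = None

    def resolve(self, recognized: Optional[str]) -> Optional[str]:
        """Return the entity to query: the recognized one, else the previous subject."""
        if recognized:
            self.last_entity = recognized  # update context with the new subject
        return self.last_entity


ctx = DialogueContext()
first = ctx.resolve("Nanjing Yunjin")  # "What is Nanjing Yunjin?"
third = ctx.resolve(None)              # "Do you know who is the successor of it?"
```

Here `third` resolves to the entity from the earlier turn, so the intent of the follow-up question can be answered against the same subject.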
The tests of the several types of natural language questions mentioned above demonstrate the feasibility of the system process design and algorithm operation. Through various browsers, the system provides Nanjing Yunjin knowledge services to users, achieving the expected function of the Q&A system.

System testing
Stress testing is a testing methodology used to evaluate the performance and reliability of a system, network, or application under real-world load conditions. The test simulates a large number of concurrent user requests or high-load situations to measure the system's response time, throughput, resource utilization, and other indicators under high load. The stress test tool used in this experiment is Apache JMeter. The test simulates 100 users sending requests to the interface simultaneously, with a loop count of 20, for a cumulative total of 2000 requests. The average response time was 22.51 s, the maximum response time was 33.017 s, the minimum response time was 0.614 s, and the error rate was 0. The results of the stress test are shown in Table 6.
The results of the stress test show that the KG-based Nanjing Yunjin Q&A system still works normally at a concurrency of 100 and can withstand a certain amount of concurrent load, providing users with a fast and stable Q&A platform.
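The shape of this test (100 simulated users, 20 loops each, 2000 requests in total) and the reported summary statistics can be illustrated with a small script. This is not Apache JMeter and does not reproduce the reported timings; it is a simplified stand-in using Python threads, and the `ask` callable is a hypothetical placeholder for a request to the Q&A interface.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable, Dict, List, Tuple


def run_user(ask: Callable[[str], str], question: str, loops: int) -> List[Tuple[float, bool]]:
    """One simulated user: send `loops` requests in sequence and time each one."""
    samples = []
    for _ in range(loops):
        start = time.perf_counter()
        try:
            ask(question)
            ok = True
        except Exception:
            ok = False  # any exception counts as a failed request
        samples.append((time.perf_counter() - start, ok))
    return samples


def stress_test(ask: Callable[[str], str], question: str,
                users: int = 100, loops: int = 20) -> Dict[str, float]:
    """Run `users` simulated users concurrently and summarize the samples."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(run_user, ask, question, loops) for _ in range(users)]
        samples = [s for f in futures for s in f.result()]
    times = [t for t, _ in samples]
    errors = sum(1 for _, ok in samples if not ok)
    return {
        "requests": len(samples),
        "avg_s": mean(times),
        "min_s": min(times),
        "max_s": max(times),
        "error_rate": errors / len(samples),
    }


# Hypothetical stub standing in for a call to the Q&A interface.
report = stress_test(lambda q: "answer", "What is Nanjing Yunjin?")
```

Replacing the stub with a function that issues a real request to the deployed interface would yield figures comparable in kind to those in Table 6: total requests, average, minimum, and maximum response time, and error rate.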
Regarding the accuracy of the system's answers, this study automatically generated 300 questions related to Nanjing Yunjin through code and ran them through the Q&A system. After several rounds of testing, the system ran well: 285 questions were answered objectively and accurately, while the remaining 15 were not answered accurately enough. The resulting answer accuracy of 95% (285/300) indicates that there is still room for improvement in the model.
In this research, a domain-specific Q&A system was constructed to cater to Nanjing Yunjin's digital resources by leveraging KG technology. A comprehensive evaluation of the system was carried out, focusing on response time, stability, load capacity, accuracy, and scalability. The empirical assessments suggest that the system is well-equipped to accommodate moderate concurrency and traffic, thus meeting the specific demands within the realm of Nanjing Yunjin.
The Q&A system stands as a pivotal interface for the utilization, preservation, and propagation of Yunjin culture. It offers a nuanced approach to query resolution by tapping into a KG, which enhances its answering capabilities and semantic understanding of the queries.
This work marks a significant step in the development of intelligent systems in the domain of ICH. While the current implementation exhibits promising performance metrics, future research avenues include continual improvements in system architecture, answer formulation techniques, and adaptation to evolving digital resources. This would augment the system's role as a robust platform for engaging with and preserving the intricate cultural narratives embedded in Yunjin.

Conclusion
The construction of a KG-based Q&A system in this study represents an innovative exploration of intelligent services for Nanjing Yunjin digital resources. The system integrates a large amount of Nanjing Yunjin-related data and stores it in a visual graph form, effectively addressing the problem of Nanjing Yunjin's digital resources being relatively isolated and scattered, which is beneficial for the organization, management, and protection of Yunjin knowledge. Moreover, the graph-based Q&A system can swiftly respond to natural language questions and efficiently generate accurate answers, greatly facilitating user retrieval and utilization of Yunjin knowledge. This promotes the inheritance, promotion, and application of Yunjin culture, enhancing the expressiveness, communicative power, and influence of ICH. The principal research tasks are as follows: ( Lastly, a Q&A system is built using the Django web development framework. Implementation and testing of system response time, stability, load conditions, and other indicators show that, in normal operation, the system recognizes user query intents with high accuracy and responds accurately to user needs. (5) This research features multi-turn Q&A functionality, capable of recognizing and inferring user needs and question intents based on the context of user inquiries, thereby enhancing the interactive efficiency and accuracy of the system.
Although this study has realized the Nanjing Yunjin DKG Q&A system, several areas still require exploration and improvement:
(1) Concerning the knowledge extraction of Nanjing Yunjin, relying solely on machines remains impracticable; semi-supervised involvement of domain experts is necessary. Future research should delve deeper into knowledge extraction in specialized domains.
(2) The knowledge sources of the KG are relatively limited, mainly involving the processing of text data. In the future, information could be extracted from multimodal data such as images and videos to enrich the diversity and comprehensiveness of the graph.
(3) The system primarily showcases the KG Q&A system via web pages. In the future, the application channels for Nanjing Yunjin knowledge services could further expand to other application terminals, such as mini-programs and apps.
(4) The migration of the Nanjing Yunjin DKG and the question-answering module to other ICH can provide critical technical support for the construction of intelligent Q&A systems for the KGs of other ICH.
(5) Testing at the content level of the system is not thorough enough, lacking actual user research to assess the system's effectiveness and reliability. Future endeavors should constantly update and expand the Nanjing Yunjin KG and further invite domain experts and users to conduct detailed tests on the reliability of the system content, to reflect more comprehensively and accurately the diversity and richness of the culture.

Fig. 1 Network structure diagram of BERT model

Fig. 4 Ontology model of semantic organization for Nanjing Yunjin

Fig. 5 Localized map of the knowledge graph

Fig. 7 Model architecture diagram
model. The intent types and entity recognition data are input into predefined query templates and, after matching, converted into Cypher expressions for querying in the Neo4j graph database. Finally, the query results are transformed into natural language answers.
(3) Presentation Layer: It primarily enables interaction with users and page display. The presentation layer is the front-end page, mainly based on the Django framework to build a Web-based Nanjing Yunjin intelligent Q&A system. Users can ask questions through this layer. The presentation layer submits user data to the logic layer for processing, uses Python to connect to and query the Neo4j graph database, and ultimately responds to user questions.

Fig. 9 Question and answer example

Table 1
Main relationships and attributes of ontology for Nanjing Yunjin

Table 2
Experimental results
During the text input phase, the maximum sentence truncation length is set to 256, and the number of sentences in each training batch is 16. At the word vector representation stage, the pre-trained model BERT is used, with the default vector dimension of 1024 in Bert-large-cased. In the semantic encoding phase, the default 24-layer Transformer encoder of Bert-large-cased is utilized. During the model training phase, the loss rate is set to 0.1, the learning rate to 5e-5, and the number of training epochs to 5.

Table 3
Question intent classification

Table 5
Results of controlled experiments

Table 6
Stress test results
(1) This study has constructed a Nanjing Yunjin DKG. Yunjin's source data were collected through official channels, and preprocessing was applied to multisource data to complete knowledge extraction, forming structured triplets. These triplets are then stored in the Neo4j graph database, realizing the construction of the Nanjing Yunjin DKG.
(2) This study employs the BERT + BiGRU + CRF model for NER. Compared to LSTM, the GRU used in BiGRU employs fewer gates, has a smaller computational load, and trains faster. The model is compared with the BERT + BILSTM + CRF, BERT + CRF, and BILSTM + CRF models, and it outperforms them in precision, recall, and F1 score, demonstrating its superiority and effectiveness as validated by experimental results.
(3) This study identifies question intents based on BERT. A large corpus is accumulated through official channels and, after being reviewed by Nanjing Yunjin experts and checked against the five criteria proposed by Gruber, the Nanjing Yunjin question set is effectively sorted and filtered, eventually yielding eight classes of question intents. The DL model calculates the probability of each intent category to which a user inquiry belongs, thus capturing the actual needs of users. Comparative experiments with the ELECTRA model support the accuracy and effectiveness of this model, as both exhibit high accuracy and few missed cases in classification tasks.
(4) This study realizes the KG-based Q&A system for Nanjing Yunjin's digital resources. Firstly, a Nanjing Yunjin Q&A corpus is constructed to train the DL model. Then, the BERT + BiGRU + CRF model is used for sentence entity recognition. After entity information is acquired, question intent categories are identified based on BERT. The parsed results are then translated into the Cypher language, queried in the Neo4j graph database, and the results are returned.