An ontological data model for points of interest (POI) in a cultural heritage site

Ranjgar, Babak; Sadeghi-Niaraki, Abolghasem; Shakeri, Maryam; Choi, Soo-Mi

doi:10.1186/s40494-021-00635-9

Research article
Open access
Published: 04 March 2022


Babak Ranjgar¹,
Abolghasem Sadeghi-Niaraki ORCID: orcid.org/0000-0002-0048-8216²,
Maryam Shakeri¹ &
…
Soo-Mi Choi²

Heritage Science volume 10, Article number: 13 (2022)


Abstract

Cultural heritage (CH) reflects the history and traditions of a society and is regarded as a nation’s memory and identity. Digitization and the web, despite their benefits, have brought challenges in disseminating and retrieving CH information, whose content is heterogeneous, varies widely in type and properties, and yet contains rich semantic links. Semantic web technologies, especially ontologies, provide a common understanding within a domain, which supports knowledge sharing and interoperability. Compared with relational databases, they can improve data modeling for information retrieval because they take the semantics of information into account, support reusability, and make information machine-readable, which offers more flexibility to intelligent services and applications. The CH community was among the first to adopt semantic web technologies to address this issue. CIDOC CRM, an ISO standard since 2006, is the best-known and most widely used ontology in the CH domain. Heritage sites are composed of many points of interest (POIs) that attract visitors. However, information about a particular POI is complex and interconnected with people, events, and objects. In this paper, we aim to develop a POI-based data model for heritage sites in Iran using concepts from CIDOC CRM integrated with GeoSPARQL, the standard ontology in the geospatial field, to combine spatial semantics with heritage information. In this way, users can freely explore the information they prefer about the places they choose, which makes it possible to use the data model for location-based services and applications in heritage sites.

Introduction

After World War II, following the destruction of valuable cultural heritage, there was a substantial need to protect and preserve these monuments. To this end, the UNESCO World Heritage Convention (WHC) was created in 1972 to identify, register, and protect both natural and cultural heritage.^{Footnote 1} Iran is one of the countries with a large number of entries on the UNESCO World Heritage List (WHL), ranking 11th with more than 20 registered sites and a number of others on the waiting list [1]. This reflects Iran’s rich culture and the need for more attention and action. Cultural heritage has high socio-economic potential, with tourism as the most prominent example [2, 3]. The problem of cultural resource management cannot be addressed by one organization because of the complexity of the tasks and the vastness of the information. Therefore, numerous organizations take charge of different parts of the work. Because these organizations differ in how they provide, update, and maintain data, and follow different data standards and data policies, information on cultural heritage is distributed across them, and heterogeneity problems arise among the trustees managing cultural heritage [4,5,6]. Information about a particular heritage site is also complex and interconnected with people, events, and objects; that is, cultural and historical heritage are intertwined. This requires a structure in which all information about a place or point of interest (POI) is provided in an integrated and interconnected manner.

Despite the massive amount of information available on cultural heritage, the multitude of trustee organizations, the discrepancies in their tasks and policies, the differences in documentation, data collection, and storage, and the distribution of the data have resulted in low interoperability. This makes it difficult to manage cultural resources, share knowledge among the community, and provide the services required by various users. Semantic web (SW) technologies, and ontologies in particular, are capable of establishing a common understanding within a domain [7]. Ontologies formally represent a thorough understanding of a domain by connecting its entities and pieces of information with respect to the functions and content of the domain, and therefore eliminate informal, partial, and personal terms and viewpoints [8]. This brings a shared understanding of the domain, which makes it easier to share knowledge and increases interoperability among the community [9]. The benefits of using ontologies for knowledge management, compared with traditional methods, are as follows:

Although relational databases (RDBs) are capable of dealing with large amounts of data, they are not designed to preserve the semantics of that data. This makes it difficult to exchange the information and integrate it with other sources, and therefore results in low interoperability [10, 11].
As discussed, ontologies provide a shared vision in a particular domain of interest, which is the key element for sharing knowledge [12].
An important advantage promised by ontologies is their reusability [13]. Case studies have shown that once an ontology is developed for a domain, it can be used many times, since it captures the mechanism and content of that domain, and its problems can be solved through continuous evaluation [14, 15]. Possible shortcomings can be addressed by extending the domain ontology with further specifications. This reduces the cost of reimplementing a knowledge management system from scratch [16].
When domain knowledge is represented formally by means of a common and shared language, it becomes understandable not only to humans but also to automated computer systems and web agents [17]. As a result, web services and search engines can exploit the semantically enriched information to retrieve information faster and more accurately and to provide smart, context-aware applications.

Generally, location-based services (LBSs) are designed for special uses, and therefore the data and the service are tightly coupled through predefined and non-extendable schemas [18]. Another problem is the concept of place within these services. A location is often defined manually beforehand by only its geographic coordinates [19]. However, a place is much more than a pair of latitude and longitude values. The topology of a place and its semantics should be considered in LBSs so that such applications are not limited to predefined POIs and can harvest and integrate various information [20, 21]. In this study, we propose a data model that integrates both CH information and location semantics to support an intelligent location-based user guide for tourists in a heritage site. This enables users to explore places they are interested in based on spatial semantics. For example, a user entering a complex of heritage sites and museums might ask, “Where is the nearest place that has oil paintings from 1600?” A system that uses the proposed model could answer such semantic questions.

Using semantic web technologies and the knowledge management systems built on them, large national and cross-country initiatives have been established to collect distributed heritage data and to preserve and present history in a broader sense and as a whole. This issue is also important in our country, Iran, which has many tangible and intangible cultural resources at the international and national levels. This study attempts to design a spatial POI-based data model for heritage sites by reusing two standard ontologies, CIDOC CRM and GeoSPARQL. We use concepts from CIDOC CRM to model the cultural heritage information related to a POI and connect it to GeoSPARQL through a mediating class to incorporate spatial semantics. The goal of this study is therefore to create a knowledge base for heritage sites that enables the ontological data model to be used in semantic location-based service (SLBS) applications, such as user guides and recommender systems, and makes the information ready for use in a linked data platform.

Related work

In recent decades, there has been a worldwide effort to collect, harmonize, and integrate cultural heritage information through SW technologies, especially ontologies, and this is still an active field. In fact, cultural heritage was one of the first domains to adopt SW tools and recommendations and has evolved along with them [22,23,24]. It started with simple knowledge organization systems^{Footnote 2} (SKOS), such as vocabularies and thesauri, for example, the Getty vocabularies^{Footnote 3} (AAT, TGN, ULAN, and CONA), which contain structured terminology for art, architecture, decorative arts, cultural and archival materials, visual proxies, the names of geographic places, the names of artists, and bibliographies. Then there are metadata schemas, such as VRA Core and CDWA. The Visual Resources Association (VRA) Core Categories,^{Footnote 4} developed based on Dublin Core (DC^{Footnote 5}), describe visual cultural material as well as the pictorial surrogates that represent and document it. The Categories for the Description of Works of Art^{Footnote 6} (CDWA) is a set of procedures and a metadata schema for the description and classification of works of art, architecture, groups and collections of works, and related images.

Perhaps the CIDOC Conceptual Reference Model^{Footnote 7} is the most widely acknowledged ontology in the CH domain, which provides descriptions and a formal structure for defining the implicit and explicit concepts and relationships used in CH documentation. The CIDOC CRM [25] is a top-level ontology intended to promote a shared understanding of CH information by providing a common and extensible semantic framework that facilitates the integration, mediation, and exchange of heterogeneous cultural heritage information. It can provide the “semantic glue” necessary to mediate between different sources of CH information, such as items published by galleries, libraries, archives and museums (also called GLAMs). Currently, CIDOC CRM is the only data model that is an ISO standard (ISO 21127:2006) in the CH area.

Several countries have developed data models based on the CRM. CRM-EH was developed by English Heritage with the intention of capturing detailed excavation and analysis procedures [26]. In Korea, the Korean Cultural Heritage Data Model (KCHDM) was developed mainly based on CIDOC CRM. It is an ontological model for integrating heterogeneous heritage data from different institutions in Korea, and it serves as a mediating means for collecting and connecting various database systems [27]. For the CultureSampo (Finnish culture on the semantic web) project, Hyvönen et al. developed a national ontology based on Finnish thesauri in the FinnONTO [28] project. They employed content-independent W3C recommendations, such as RDF, SKOS, and OWL, converted their national ISO-compliant thesauri into lightweight ontologies, and created the national KOKO ontology infrastructure, which consists of one high-level mediating ontology called YSO and 14 other field-specific ontologies [29]. In the Europeana^{Footnote 8} project, which aimed to collect, enrich, and provide access to cultural heritage information from institutions all over Europe, a data model called the Europeana Data Model (EDM) was developed. This top-level ontological model was created to replace the older flat Europeana Semantic Elements (ESE) metadata because of the general shortcomings of metadata schemas. The model reuses constructs from other standards, such as DC and FOAF,^{Footnote 9} to which institutions can map their data [30]. The MONument Damage Information System (MONDIS) is an ontological framework developed to capture and reason over built heritage documentation of damage, interventions, changes, and natural disaster occurrences in order to diagnose the current condition of buildings, which can help in their conservation [31]. More recently, the HEritage Resilience Against CLimate Events on Site (HERACLES) ontology has been under development in the course of a project with the same name. It aims for better management and monitoring of built heritage health by modeling climate change effects and the different types of damage they can cause to various types of materials through specific mechanisms. It is still in the early stages, going through tests and awaiting acceptance by experts and stakeholders [32].

Nowadays, with the advent of smartphones equipped with various sensors (e.g. GPS, accelerometer, gyroscope, compass, and light sensor), context-aware services, especially location-based services (LBSs), have attracted a great deal of interest [33]. LBSs offer customized information based on the location of the user, which gives the information added value [18]. They have also been of interest in the CH field for providing recommender systems and user guides. For example, the SMARTMUSEUM project developed a mobile recommender system for users interested in cultural heritage in three scenarios: outdoor, indoor, and web-based [34]. The system is built on top of the Finnish KOKO ontology described above, which resulted in a better user experience through accurate recommendations. In [35], a mobile augmented reality (AR) application is presented that is based on linked open data (LOD) published within the project and that captures CH information for POIs near the user. Kim et al. developed an AR mobile application based on the previously described KCHDM ontology they proposed [36]. They provided multimedia information for three POIs inside a palace using a visual location detection method.

Methodology

Having discussed the needs and aims of this study and introduced the fundamentals of data modelling and the trends in the CH domain, we dedicate this section to the methodology and the first design steps of this research. First, the chosen study area and the selected POIs are discussed. Then the methodology used to develop the spatially enabled, POI-based ontological data model, together with the required classes and properties, is presented. The section ends with a discussion of how the cultural heritage data were extracted and applied to the developed data model.

The study area and POIs

The Sa’dabad complex is a historical and cultural complex of palaces built by the Qajar dynasty at the beginning of the nineteenth century. After the Qajars, the kings of the Pahlavi dynasty resided there and added more palaces. The complex covers an area of 110 ha with 180 ha of natural forests, gardens, springs, and rivers in the north of Tehran, Iran. It contains 18 palaces, which belonged to the royal families. After the 1979 revolution, the place was turned into a set of museums and galleries for public exhibition, and it is run under the responsibility of the Cultural Heritage Organization of Iran. With its many museums and galleries, the complex holds a vast number of cultural heritage objects, along with varied information associated with events that occurred during the monarchy period, which makes it an appropriate case for this research. Among the many palaces in the complex, the Mellat museum and the museum of fine arts were chosen as the POIs for this study. They are located on the south side of the complex area, as shown in Fig. 1.

Mellat palace, also called the White palace for its color, is the largest building in the complex, with 54 units. It used to be the summer residence of Mohammad Reza Shah, the second king of the Pahlavi dynasty, and was also used for official affairs and meetings. The museum of fine arts is another of the magnificent buildings in the complex. It was used as a royal court from 1968 to 1979, but after the revolution it was named the museum of fine arts because of the great collection of paintings from the Safavid, Afshar, Zand, and Qajar periods collected by Mohammad Reza’s last wife, Farah.

Ontological data model development approach

There are numerous approaches to ontology design and development. In this study, we followed the steps outlined in [37] to create our data model. Fig. 2 shows an overview of these steps, which are discussed in the following subsections.

Step one: domain and scope

The goal of this study is to create a POI-based data model for heritage sites. The largest part of the domain is clearly CH, but it is not the only part. As stated before, we want to give the model a capability for spatial reasoning so that it can be used in an LBS system; therefore, we have to extend the domain and give it a geospatial scope too. As a result, this data model should involve a mediation between the two domains.

Step two: reusing existing ontologies

As discussed in the previous part, the scope of our data model involves two domains. Therefore, we reuse an ontology from each domain. We selected CIDOC CRM from the CH domain and GeoSPARQL from the geospatial domain, since both are established standards that have been verified, validated, used many times in other projects, and have matured over time. The CRMgeo extension has also combined the GeoSPARQL and CIDOC CRM ontologies [38]. However, there are differences between this extension and the data model intended in this study. CIDOC CRM is an event-centric ontology in which events take place at a specific place and a specific time, and thus in a specific spacetime volume [39]. CRMgeo uses the geospatial standard GeoSPARQL to define the spacetime necessary for historical events. The intention of this study, on the other hand, is to develop a data model for historical POIs. It therefore combines the object-centric and event-centric modeling views, with the POI at the center. CIDOC CRM is used to model historical data related to the POIs, and GeoSPARQL is used to add the spatial semantics necessary for LBSs.

There are a number of ontologies in the CH domain, but probably the most trusted one is CIDOC CRM [25]. This data model has been through a long and intensive development process since its first version. By adapting itself to the various needs and functions of the CH community, CIDOC CRM has been used in projects ranging from large-scale to local and small-scale ones. It has become an ISO standard and is currently the only ISO data model in the CH and archaeology fields; it has therefore gained an appropriate level of credibility. In this research, we use concepts from this ontology to develop our data model. At the time of writing, CIDOC CRM is at version 6.2.3^{Footnote 10} and has 99 classes and 188 properties, which is quite large for this study, as it covers all aspects of archaeology and cultural heritage. Since we want to develop a data model for presenting multimedia information related to the events, people, and objects of a heritage POI to its visitors, we use a lightweight ontology containing only the classes and properties needed for the available data. The major concepts of the CIDOC CRM ontology are shown in Fig. 3, which is a qualitative schema of the overall model. CIDOC CRM is based on event-centric information modeling, which means that other classes, such as persons, concepts, and places, are connected to each other via events [25, 40,41,42]. Events are temporal entities that connect the other major concepts, as can be seen in Fig. 3.

Temporal entity is a top class that contains concepts such as event and activity. It sits at the center of this data model and acts as the glue that holds everything together. Event is a general concept referring to historical happenings, whereas activity is a subclass of event that refers to actions carried out by humans, such as construction, creation, and production. Actors are people, historic and influential figures, or groups that participated in the events. Physical and conceptual objects have specific locations, and they witnessed or were influenced by temporal entities. There is a type concept that can be applied to all classes in order to refine the kind of subject in that class. There is also a similar concept, appellation, which covers all sorts of names that are or were given to a particular entity to refer to or identify it. This is the overall schema of CIDOC CRM; the other subclasses elaborate further on these concepts and add more specific details. The data model used in this study is developed from the classes and properties of this ontology according to the information gathered.

GeoSPARQL, on the other hand, is an Open Geospatial Consortium (OGC)^{Footnote 11} standard for modelling, representing, querying and accessing spatial data on the SW [43, 44]. This ontology has three main classes shown in Fig. 4.

Spatial object is the top concept, and feature and geometry are its subclasses. Any entity in the world that has a spatial location, such as a school, park, police station, or museum, can be an instance of feature. The geometric characteristics of these features are stored in geometry, which can be further defined using simple feature vocabularies, such as Point, LineString, Polygon, and Surface. Spatial entities are related to each other through relations such as overlaps, within, and covers, which are called topological relations in spatial science. GeoSPARQL incorporates the eight 2D topological relations for spatial objects, also called RCC8 [45] or Egenhofer relations [46], which are essential for spatial semantics.
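To make the idea of a topological relation concrete, the following minimal Python sketch tests whether a point geometry lies within a polygon geometry, both given as WKT strings. It is a plain stand-in for illustration only, not how a GeoSPARQL engine evaluates such relations. The coordinates are hypothetical, the longitude-latitude order is an assumption, and the function names are invented for this example.

```python
import re

def parse_wkt_point(wkt):
    """Parse 'POINT(x y)' into an (x, y) tuple of floats."""
    match = re.fullmatch(r"\s*POINT\s*\(\s*(\S+)\s+(\S+)\s*\)\s*", wkt)
    if match is None:
        raise ValueError(f"not a WKT point: {wkt!r}")
    return float(match.group(1)), float(match.group(2))

def parse_wkt_polygon(wkt):
    """Parse a single-ring 'POLYGON((x y, ...))' into a list of (x, y) tuples."""
    match = re.fullmatch(r"\s*POLYGON\s*\(\(\s*(.*?)\s*\)\)\s*", wkt)
    if match is None:
        raise ValueError(f"not a single-ring WKT polygon: {wkt!r}")
    ring = []
    for pair in match.group(1).split(","):
        x, y = pair.split()
        ring.append((float(x), float(y)))
    return ring

def point_within_polygon(point, ring):
    """Ray-casting test: True if the point lies inside the ring."""
    px, py = point
    inside = False
    # Pair every vertex with its successor, wrapping around at the end.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        if (y1 > py) != (y2 > py):  # edge crosses the horizontal ray
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside

# Hypothetical geometries: a POI point and the outline of a complex.
COMPLEX = parse_wkt_polygon(
    "POLYGON((51.41 35.80, 51.43 35.80, 51.43 35.82, 51.41 35.82, 51.41 35.80))"
)
POI = parse_wkt_point("POINT(51.42 35.81)")
ELSEWHERE = parse_wkt_point("POINT(51.50 35.81)")
```

Here `point_within_polygon(POI, COMPLEX)` holds while the same test for `ELSEWHERE` does not, which is the kind of distinction a "within" relation captures. Points exactly on the boundary are not treated specially in this sketch.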

Step three: enumerate important terms

In this step, we identify the prominent terms used in information related to POIs in order to decide which classes are needed from the ontologies, especially CIDOC CRM. CH data in Iran has several problems. First, there has been no effort to make the data machine-readable, to link the data, or to provide SPARQL endpoints or raw data dumps. In addition, there is no dedicated portal providing a large amount of data about heritage sites in Iran. The heritage information is mostly at a preliminary stage and is mostly kept privately by the heritage organization. Fortunately, there is one online database for the Sa’dabad complex^{Footnote 12} that lists the objects at each museum with images and textual descriptions (Fig. 5). However, there is no uniform and well-structured metadata for the object descriptions, and the information is not rich enough. Therefore, we used manual corpus extraction to extract information and identify the entities in the textual descriptions, so that we could select the classes and properties needed from the CIDOC CRM ontology to model this data. In addition, using the identified keywords, we searched the web for other related information to aggregate and integrate into our data model.

The data discovery, extraction, and gathering were done manually, as there was no standard database or structured data in any form. Manual data discovery has certain limitations: it takes a lot of time and effort to extract the information needed. Nevertheless, it was the only way to collect information, given the lack of structured databases, portals, or endpoints to ingest from.

Step four: define classes and the class hierarchy

After generating the terms in the information, we have to define and select appropriate classes for them. Fig. 6 shows the selected classes from CIDOC CRM and GeoSPARQL, the mediating class, and the class hierarchy.

As can be seen, E1 CRM Entity is the superclass of all the classes from CIDOC CRM. It is needed so that a property that can be attached to any of the other classes has a class to use as its domain. E5 Event holds the historical events as its individuals. E7 Activity is a subclass of E5 Event. The distinction between these two classes is that events bring instantaneous changes of state, whereas activities are human actions, and are therefore caused by instances of the class E39 Actor, that bring changes to an object. In CIDOC CRM, E5 Event has three subclasses; of these, we did not use E64 End of Existence, based on the available information. The other subclass used here is E63 Beginning of Existence, whose instances are the events that bring something into existence. It has its own subclasses, such as birth, but they are not used here. Among the subclasses of E7 Activity, E10 Transfer of Custody and E12 Production are used. E24 Physical Man-Made Thing includes all persistent physical things that are created by humans for a purpose. Here we use this class for the buildings of the POIs. Its subclass, E22 Man-Made Object, is used for the physical and conceptual objects that are held in the heritage places. E36 Visual Item comprises all visual things that are recognizable intellectual or conceptual signs, marks, and images. E38 Image, a subclass of E36 Visual Item, is for visual objects with form, tone, and color on the surface of photos, paintings, prints, sculptures, or even directly on electronic media. E39 Actor is the class that holds human beings, individually or in groups, who have intentionally performed actions for which they can be held responsible. Its subclass, E21 Person, is for individual real persons who lived or are at least assumed to have lived. E52 Time-span is used to define the temporal extent of instances of E5 Event and any of its subclasses that are valid for a certain time. E53 Place comprises extents in space on the surface of the earth. It does not have to be given as exact coordinates; it is a more general notion of place that can be the position of any physical reference. E55 Type is used to categorize and classify instances of all CRM classes. Terms from other thesauri and controlled vocabularies can be used in this sense, which makes this class an interface that connects CIDOC CRM to other knowledge organization systems. E57 Material is a specialization of the class E55 Type and comprises the concepts of materials. On the GeoSPARQL side, Spatial Object is the superclass of all the classes. Feature and Geometry are the two main classes of GeoSPARQL. Feature is for any entity that has some spatial location, which is the POI in our case. These features have geometric characteristics, which are stored through the class Geometry. Geometry has sixteen subclasses, two of which, point and polygon, are used here. POIs can be represented both by a single point and by a polygon for finer spatial reference. Place feature, a subclass of Feature, is the mediating class that we developed to create a link between the two ontologies; it will be explained later in detail.
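The hierarchy described above can be sketched as a simple parent lookup in Python. This is only an illustration of the subclass relations named in the text, not the authors' OWL file. Where the text does not state an intermediate parent, the class is attached directly to its top class, which is a simplifying assumption of this sketch, and the helper names are hypothetical.

```python
# Direct superclass of each class named in the text. Attaching a class
# straight to E1 CRM Entity when no intermediate parent is stated is an
# assumption of this sketch, not a claim about the full CIDOC CRM.
SUPERCLASS = {
    "E5 Event": "E1 CRM Entity",
    "E7 Activity": "E5 Event",
    "E63 Beginning of Existence": "E5 Event",
    "E10 Transfer of Custody": "E7 Activity",
    "E12 Production": "E7 Activity",
    "E24 Physical Man-Made Thing": "E1 CRM Entity",
    "E22 Man-Made Object": "E24 Physical Man-Made Thing",
    "E36 Visual Item": "E1 CRM Entity",
    "E38 Image": "E36 Visual Item",
    "E39 Actor": "E1 CRM Entity",
    "E21 Person": "E39 Actor",
    "E52 Time-span": "E1 CRM Entity",
    "E53 Place": "E1 CRM Entity",
    "E55 Type": "E1 CRM Entity",
    "E57 Material": "E55 Type",
    "Feature": "Spatial Object",
    "Geometry": "Spatial Object",
    "Place feature": "Feature",
    "Point": "Geometry",
    "Polygon": "Geometry",
}

def ancestors(cls):
    """Return the chain of superclasses from the direct parent up to the top class."""
    chain = []
    while cls in SUPERCLASS:
        cls = SUPERCLASS[cls]
        chain.append(cls)
    return chain

def is_subclass_of(cls, other):
    """True if `other` appears anywhere above `cls` in the hierarchy."""
    return other in ancestors(cls)
```

For instance, `ancestors("E12 Production")` walks up through E7 Activity and E5 Event to E1 CRM Entity, while E53 Place and Place feature sit in separate trees, which is why a mediating link between them is needed.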

Step five: define the properties (slots) of classes and step six: define the facets of the slots

Since defining properties involves defining their facets (domain and range), we discuss both steps in one part. However, before defining the properties, we need to decide how to design our data model. One feature of the CIDOC CRM ontology is that it has an inverse property for each property; that is, for each property there is another one with the domain and range swapped. This gives us flexibility in designing our data model.
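The effect of inverse properties can be sketched in a few lines of Python: every statement written in one direction implies the mirrored statement with subject and object swapped. The three property pairs below use CIDOC CRM labels, but whether each of them appears in the paper's Table 1 is an assumption, and the instance identifiers are hypothetical.

```python
# Forward property -> inverse property (CIDOC CRM labels; this selection
# is an assumption made for illustration).
INVERSE = {
    "P14 carried out by": "P14i performed",
    "P108 has produced": "P108i was produced by",
    "P4 has time-span": "P4i is time-span of",
}

def with_inverses(triples):
    """Return the triples plus the mirrored statement for each known property."""
    # Make the lookup work in both directions.
    both = dict(INVERSE)
    both.update({inv: fwd for fwd, inv in INVERSE.items()})
    result = set(triples)
    for subject, prop, obj in triples:
        if prop in both:
            result.add((obj, both[prop], subject))
    return result

# Hypothetical statement: a production event carried out by a painter.
STATED = {("production_1", "P14 carried out by", "painter_1")}
EXPANDED = with_inverses(STATED)
```

After expansion, the model can be traversed from the actor to the event as well as from the event to the actor, which is the design flexibility the inverse properties provide.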

There are two main data modeling approaches in the CH domain: event-centric and object-centric [47]. When all the information is attached to the object to describe it, the modeling method is object-centric. In event-centric data modeling, the information is connected through events. This allows all the events and activities that an object was involved in to be modeled in a machine-readable way rather than only as a textual description of the object, and other entities, such as actors, time periods, locations, and other related details, can be linked to the object via events. The chain of activities and changes of the object can also be modeled and thus reasoned over by machines, which is not possible in object-centric modeling. Therefore, event-centric modeling is more expressive than the object-centric approach [47]. Since we use concepts from the CIDOC CRM ontology, our data model has the event-centric characteristic. However, as the objective of this study is to integrate the information related to a POI as a whole to form a knowledge graph of the place, object-centricity is also necessary. As a result, the data model in this study combines both modeling approaches (Fig. 6).
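The contrast between the two approaches can be illustrated with a small Python sketch that stores the same facts both ways. All identifiers are hypothetical, and the property labels follow CIDOC CRM naming without any claim that they match the properties in the paper's Table 1.

```python
# Hypothetical event identifiers used in this sketch.
EVENTS = {"production_1", "transfer_1"}

# Object-centric: every fact hangs directly off the object as a flat field.
OBJECT_CENTRIC = {
    "painting_1": {"creator": "painter_1", "date": "1600", "holder": "museum_1"},
}

# Event-centric: the same facts are routed through explicit events.
EVENT_CENTRIC = [
    ("painting_1", "P108i was produced by", "production_1"),
    ("production_1", "P14 carried out by", "painter_1"),
    ("production_1", "P4 has time-span", "timespan_1600"),
    ("transfer_1", "P30 transferred custody of", "painting_1"),
    ("transfer_1", "P29 custody received by", "museum_1"),
]

def events_involving(entity, triples):
    """Return the events linked to an entity, whichever way the triple points."""
    events = set()
    for subject, _prop, obj in triples:
        if subject == entity and obj in EVENTS:
            events.add(obj)
        elif obj == entity and subject in EVENTS:
            events.add(subject)
    return events

def history_of(entity, triples):
    """Map each event involving the entity to the other facts stated about it."""
    history = {}
    for event in events_involving(entity, triples):
        history[event] = sorted(
            (prop, obj)
            for subject, prop, obj in triples
            if subject == event and obj != entity
        )
    return history
```

In the flat record, "creator" and "holder" are unrelated fields, whereas `history_of("painting_1", EVENT_CENTRIC)` groups the painter and the time-span under the production event and the receiving museum under the custody transfer, so the chain of events can be followed by a program.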

As can be seen, the POI is at the center of the model and the other concepts are connected to it, which shows the object-centricity of the model. In addition, the POI is linked to GeoSPARQL concepts via Place feature. The Place concept in CIDOC CRM refers to “immobile” objects such as cities, rivers, buildings, ships, etc. Places can also be related to spatial features, which have geometry; therefore, we defined the property isFeature to connect E53 Place to the class Place feature. Geometry can be defined in two ways in GeoSPARQL: Well-Known Text (WKT) or Geography Markup Language (GML). We used WKT to store the geometry of the POIs. Moreover, the POIs are encoded as both point and polygon features. Although LBS applications mostly use simple point geometry, places are polygons in the real world. Furthermore, there are complexes, such as our study area (Sa’dabad), that include many POIs, and polygon geometry can be useful in such cases.
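As a rough illustration of how the mediating link supports a location-based question such as the one posed in the Introduction, the Python sketch below follows the chain from a place through isFeature to a geometry and its WKT literal, then picks the nearest POI that holds a matching object. It is a plain stand-in for what a GeoSPARQL-enabled triple store would do through queries. The identifiers, coordinates, and holdings are hypothetical, the POI nodes stand in for E53 Place instances, the longitude-latitude order is an assumption, and great-circle distance is used as a simple proxy for "nearest".

```python
import math

# Hypothetical triples: place -> isFeature -> feature -> hasGeometry -> asWKT.
TRIPLES = [
    ("poi_a", "isFeature", "feature_a"),
    ("feature_a", "hasGeometry", "geom_a"),
    ("geom_a", "asWKT", "POINT(51.4230 35.8160)"),
    ("poi_b", "isFeature", "feature_b"),
    ("feature_b", "hasGeometry", "geom_b"),
    ("geom_b", "asWKT", "POINT(51.4250 35.8120)"),
]

# Hypothetical holdings per POI as (kind of object, production year).
HOLDINGS = {
    "poi_a": {("oil painting", 1600)},
    "poi_b": {("oil painting", 1850)},
}

def objects_of(subject, predicate):
    """Return all objects of triples with the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]

def poi_lon_lat(poi):
    """Follow isFeature, hasGeometry, and asWKT to get (lon, lat) for a POI."""
    feature = objects_of(poi, "isFeature")[0]
    geometry = objects_of(feature, "hasGeometry")[0]
    wkt = objects_of(geometry, "asWKT")[0]
    lon, lat = wkt[len("POINT("):-1].split()
    return float(lon), float(lat)

def haversine_m(a, b):
    """Great-circle distance in metres between two (lon, lat) pairs."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(h))

def nearest_poi_with(user_lon_lat, kind, year):
    """Return the closest POI holding an object of this kind and year, or None."""
    matches = [poi for poi, items in HOLDINGS.items() if (kind, year) in items]
    if not matches:
        return None
    return min(matches, key=lambda poi: haversine_m(user_lon_lat, poi_lon_lat(poi)))

# A hypothetical visitor standing right next to poi_b.
USER = (51.4250, 35.8125)
```

With these sample values, `nearest_poi_with(USER, "oil painting", 1600)` selects `poi_a` even though `poi_b` is physically closer, because only `poi_a` satisfies the heritage condition. This is the combination of spatial and heritage semantics the model aims for.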

In Table 1, all the properties used in the model are shown with their domain, range, and description. The classes of CRM are denoted by “E” and its properties are denoted by “P”. The inverse properties of CIDOC CRM have an “i” as an indicator. The classes and properties of GeoSPARQL are denoted by “ogc”.

Table 1 The properties of the data model
