A study of spatial cognition in the rural heritage based on VR 3D eye-tracking experiments

Traditional Chinese villages are essential carriers of China’s rural heritage. Such villages are considered to retain living practices of intangible heritage through know-how, artistic and folk customs, as well as a well-preserved architectural appearance. However, their conservation is at stake due to degradation, abandonment, and urbanization. Spatial cognition is one way of engaging people with the question of what is important to conserve (attributes) and why it is important (values), by analysing how people perceive and value rural heritage. Traditional methods of gathering spatial cognition data, such as laborious fieldwork or subjective qualitative analyses, often fall short of providing a holistic representation of real-world experiences. This article presents an innovative method for analysing cognitive features in traditional village spaces using immersive virtual reality equipped with an eye-tracking system. We selected Cheng and Jitou, two traditional Chinese villages, as our case studies. In two virtual reality environments, we captured accurate 3D cognitive data, including participants’ locomotion, gaze point, and sightline. The results indicate that the accessibility of village roads affects the distribution of people’s locomotion, which in turn affects the distribution of people’s areas of interest; the changes in the rhythm of the sightline scale reflect the rhythm of spatial changes in the villages. By broadening the understanding of spatial cognition in traditional Chinese villages in relation to people’s values, this paper sheds light on an alternative approach to assessing the cultural significance of such villages for better conservation and sustainability.


Introduction
In 2017, ICOMOS proposed that all rural areas can be read as heritage, both outstanding and ordinary [1]. In order to better protect China's rural heritage, the Ministry of Housing and Urban-Rural Development, the Ministry of Culture and the Ministry of Finance jointly launched a survey and listing process for traditional Chinese villages (TCV) in April 2012 [2]. From 2012 to 2023, the Ministry of Housing and Urban-Rural Development organised experts in the fields of architecture, folklore and economics to recommend six batches of traditional village lists based on the 'Traditional Village Evaluation and Recognition Index System (Trial)', which combines qualitative and quantitative approaches in three aspects: traditional architecture, location and layout, and intangible cultural heritage [3].
TCV are existing settlements in China that boast a long history and unique regional features. Examples include the ancient villages of Xidi and Hongcun in southern Anhui, as well as the Fujian Tulou, which have been recognised as UNESCO World Heritage Sites [4]. These sites serve as tangible carriers of both natural and cultural heritage [5]. While the value of these villages might not be as immediately striking or centralised as that of individual heritage sites in cities, TCV embody the historical and cultural significance of rural heritage [6]. The preamble of the Faro Convention states that people and human values are to be placed at the centre of an enlarged and cross-disciplinary concept of cultural heritage [7]. Building on this perspective, the attributes and values of cultural heritage manifest through people's cognition. Thus, it is imperative to research not just the tangible aspects of rural heritage but also people's cognition of it.
Chinese scholars have extensively researched traditional dwellings, villages, and cultural landscapes, emphasising aspects such as architectural space, village layout, clusters, and conservation. This body of work has yielded significant findings that enrich our understanding of TCV. For instance, they have analysed the spatial distribution characteristics and summarised the influencing factors, including natural environmental, socio-economic, historical and cultural factors [8,9]. They have also presented comprehensive conservation strategies, drawing comparisons with and summarising concepts and policies from European architectural heritage conservation [10]. Additionally, these scholars laid a theoretical foundation for tourism by analysing tourists' behavioural intentions and decision-making processes [11]. The value of TCV, in terms of historical significance, social relevance, age and aesthetics, is manifested in their architectural clusters and in the harmonious and unified living environment created by the integration of mountains, rivers, trees, and buildings, in which humans and nature coexist [12,13].
In the evolution of urban and village spaces, the primary element impacting people's lives and behaviour is the alteration in spatial configuration. This change is illuminated through spatial cognition [14]. Bill Hillier pointed out that the concept of space should not be regarded merely as a space-creating entity. Indeed, space has the subjectivity and relevance of materials and is an essential part of human activities rather than their background [15]. The Space Syntax theory emphasises that space has a significant impact on people, not just through the enclosures created by buildings. Instead, space serves as a container for movement and acts as a central driver of such movement [16]. The spatial configuration can shape social interactions and human behavior. The arrangement of spaces and their accessibility can determine where people gather, interact, and engage with others. The study of spatial cognition is dedicated to making spatial configuration meaningful for people, thus providing better theoretical support for urban design, village heritage renewal and the transformation of historic districts [17]. Kim proposed a methodological framework that integrates three spatial information sources: morphology, cognition, and behavior. The reliability of this method was validated in the suburban areas of northern London, United Kingdom, demonstrating the existence of correlations among these three spatial information types [18]. Lin et al. employed space syntax analysis to investigate the characteristics of spatial structure and spatial cognition in traditional villages [19]. Nan et al. combined semi-structured interviews, space syntax, and Wi-Fi positioning tracking data to analyse the locations within villages that people perceived, remembered, and imagined [20]. Li et al. utilized Schema theory to construct cognitive maps for residents and tourists and analysed the primary landmarks in traditional village spaces as remembered by individuals [21].
The earliest studies of spatial cognition can be traced back to the five elements of cognitive maps in the urban context proposed by Kevin Lynch, i.e., paths, edges, districts, nodes, and landmarks [22]. However, research related to spatial cognition has been dominated by qualitative analysis due to the limitations of technical conditions and technological tools. In recent years, the development of computer technology and the consequent increase in the experiential and practical nature of Virtual Reality (VR) have made VR experiments an important tool for spatial cognition research. Compared with traditional research methods, such as the cognitive mapping and surveys that researchers have used to examine how individuals engage with rural heritage spaces [23,24], VR can greatly reduce the cost of experiments, improve their efficiency, and make it easier to obtain relevant data on human cognition, thus providing quantitative data support for advancing spatial cognition research [25]. For instance, traditional field investigations require researchers to physically visit the research site for data collection, incurring substantial expenses such as transportation costs, accommodation fees, and data collection expenditures. These costs can far exceed those associated with acquiring a suitable set of VR equipment, which can be reused for multiple experiments.
The reliability of using VR for cognitive and behavioural studies of the built and urban environment was first demonstrated by several experiments conducted in the 1990s, followed by a series of studies that focused on human cognition in space [26,27]. For example, Ruddle used VR experiments to demonstrate that body movement and rotation are as important as vision in acquiring spatial cognitive information: participants who physically moved their bodies chose paths more accurately during the virtual space experience than those who used a joystick to control movement [28].
Spatial information is acquired to a large extent through visual cognition. People perceive space through their eyes, reason about the task, and if they think they need more information, they perform a visual search [29]. Eye tracking allows researchers to measure people's visual attention, resulting in a rich source of information about when, where, how often and in what order people are viewing certain information in or about space [29]. Current mainstream eye-tracking devices utilize Pupil Center Corneal Reflection (PCCR) technology. The basic principle of this technology can be summarized as follows: (1) Infrared light is emitted onto the eyes. (2) A camera captures the infrared light reflected from the cornea and retina. (3) Due to the physiological structure and physical properties of the eyeball, the reflection spot formed by the corneal reflection remains stationary when the position of the light source and the head are fixed. (4) The direction of the light reflected from the retina indicates the orientation of the pupil (the light enters the pupil from the light source and exits the pupil through retinal reflection). Finally, (5) the direction of eye movement can be calculated based on the angle between the corneal and pupil reflection rays [30]. In short, the direction of gaze is determined by the position of the pupil center relative to the corneal reflection [30].
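To make the core PCCR idea concrete, the following is a minimal sketch rather than the algorithm of any particular device. It assumes 2D image coordinates for the pupil center and the corneal reflection, and a simple one-axis linear calibration from the pupil-minus-reflection vector to a horizontal gaze angle. The function names and calibration values are hypothetical, and commercial trackers typically rely on more elaborate eye models.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def pupil_glint_vector(pupil_center: Point, corneal_reflection: Point) -> Point:
    """Vector from the corneal reflection (glint) to the pupil center, in pixels."""
    return (pupil_center[0] - corneal_reflection[0],
            pupil_center[1] - corneal_reflection[1])


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit ys ~ slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical calibration data: the participant looks at targets with known
# horizontal gaze angles (degrees) while the glint stays fixed in the image.
calib_pupils = [(95.0, 60.0), (100.0, 60.0), (105.0, 60.0)]
calib_glints = [(100.0, 60.0)] * 3
calib_angles = [-10.0, 0.0, 10.0]

vectors_x = [pupil_glint_vector(p, g)[0]
             for p, g in zip(calib_pupils, calib_glints)]
slope, intercept = fit_line(vectors_x, calib_angles)


def estimate_horizontal_gaze(pupil_center: Point, corneal_reflection: Point) -> float:
    """Map a new pupil-glint vector to a horizontal gaze angle (degrees)."""
    return slope * pupil_glint_vector(pupil_center, corneal_reflection)[0] + intercept


estimated_angle = estimate_horizontal_gaze((102.5, 60.0), (100.0, 60.0))
```

The sketch only illustrates why the pupil position relative to the corneal reflection is the quantity of interest: the reflection serves as a fixed reference, so changes in the vector track eye rotation.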
In experimental research in the psychology of vision and perception, Wiener et al. investigated the influence of visual perception on motion during spatial navigation. They found that participants tend to select longer linear paths during motion, and the direction of gaze also aligns with the final behavioral choice [31]. Emo, through studying eye movement characteristics, discovered that individuals focus on spatial structural elements in architectural environments, and that sky area, ground area, and the longest line of sight are three important factors in spatial cognition experiments [32]. Furthermore, findings from cognitive neuroscience experiments have provided a deeper understanding of the perceptual mechanisms underlying spatial exploration. Some experiments have demonstrated that during motion, hippocampal neurons control the alignment of head direction with visual gaze direction, both of which tend to align with the direction of walking [33,34].
With the advancement of spatial cognition research in built environments, eye-tracking techniques have been introduced. For example, Hollander used eye-tracking to measure human responses to varying urban scenes [35]. Sun Cheng conducted an eye-tracking experiment in a shopping mall using a wearable eye tracker. The objective was to analyse the interrelationship between the visual saliency of complex architectural interior landmarks and users' wayfinding behaviours in real-world scenarios [36]. Spatial cognition experiments that rely heavily on eye-tracking often encounter problems such as a strong dependence on field environments, limited control over experimental conditions, and high costs associated with task failures [37]. In response, researchers are turning to a combination of VR and eye-tracking for studying spatial cognitive behaviours. For example, Yan Feng introduced WayR, integrating VR and eye-tracking to study pedestrian wayfinding behaviour [38]. Similarly, Helmut evaluated a guidance system for a sizable public transport infrastructure using a VR setting combined with mobile eye-tracking [39]. However, analysing eye-tracking data from 3D dynamic stimuli presents its own set of challenges. Improvements regarding gaze data processing and visualization for dynamic stimuli are still in the early stages. Most techniques are based on frame-by-frame analyses of video data. This process can be time-consuming and computationally intensive [40]. This is particularly true since most standard software packages from vendors are not equipped for automated analysis of visual attention on moving Areas of Interest (AOI), making the analysis more complicated than for static stimuli [29,41].
To tackle the aforementioned issue of analyzing eye-tracking data for 3D dynamic stimuli, scholars have attempted various approaches. Duchowski suggested binocular scanning paths in 3D VR, where 2D gaze positions and depths (gaze depth paths) are described [42]. Ramloll proposed a fixation net for dynamic 3D non-stereoscopic content, in which gaze positions are mapped onto flattened objects [40]. Helmut developed a program based on his immersive VR environment called DAVE, which enables the capture of 3D gaze vectors and intersection data with 3D models. Subsequently, the intersection data was subjected to density analysis using CloudCompare [39]. Campanaro utilized the Vive Pro Eye to collect participants' visual experience data in the Pompeian house and conducted 3D visualization analysis in GIS. This workflow introduces a more dynamic approach to overcome the significant limitations posed by most traditional GIS-based studies. However, the article did not provide a detailed description of the experimental procedure for eye-tracking or of the methods used to process eye-tracking data [43].
This research uses immersive VR combined with eye-tracking to focus on human visual cognitive processes in the space of TCV. In the virtual environment (VE), for each gaze point generated by the participant, a 3D gaze point marker is output in real time in the background, along with the corresponding positional marker of the participant. Consequently, every gaze point marker and its corresponding positional marker of the participant are recorded one by one. Subsequently, Rhino, in conjunction with Grasshopper, is employed to visualize and analyse the 3D experimental data within the 3D model. This method therefore not only overcomes the challenge of handling eye-tracking data involving dynamic stimuli but also allows for the analysis of sightline scale by calculating the distance between the participant's positional markers and gaze point markers. The study aims to provide an appropriate answer to the main question: How do the spatial attributes of the TCV influence spatial cognition?
To thoroughly address the primary question above, it can be further divided into the following sub-questions:
1. What attributes within the village space most often capture attention, and why?
2. What is the range of visual cognition scales in TCV?
By delving deeply into the physical form and cognitive characteristics of TCV, we can gain a better understanding of their spatial attributes. For example, analysing the locomotion distribution reveals the trajectories and chosen stopping points within village spaces; examining the gaze point distribution allows for the analysis of people's AOI and indicates visual interest preferences within these spaces; and assessing the sightline provides insights into the visual scale range and its variations within the village context. By grasping people's visual cognitions and behavioural spatial patterns, architects and planners can be better equipped with insights into both the potentials and challenges in heritage conservation and development [44,45].

Immersive VR 3D eye-tracking experiment
This research presents an innovative approach that combines a VR headset with eye-tracking and a motion platform for the analysis and visualisation of 3D gaze points and sightlines. The platform mainly consists of two parts: the software and hardware systems. The software system consists of Unity3D, Steam VR, ArcGIS, and Rhino Grasshopper. Unity3D is used to construct the VE and write the experiment scripts; Steam VR is used to build the VR environment and establish the interaction between the participant and the VE. ArcGIS and Rhino Grasshopper visualise and analyse the cognitive data in a 3D model. The hardware system consists of a graphics workstation, the HTC Vive Pro Eye immersive VR headset, and the Omni VR motion platform, which are used to run the VR scenarios and to acquire and export the participant's behavioural data.

The virtual environment
The workflow for creating the virtual experimental environment is shown in Fig. 2. The construction of the VE for the experiment consisted of two main parts: constructing a model of the two villages and constructing the VE. Firstly, a drone (Aircraft: DJI Mavic Pro, Camera: 1/2.3" CMOS, Effective pixels: 12.35 M (Total pixels: 12.71 M), Image size: 4000 × 3000, ISO: Automatic, Photo: JPEG) was used to conduct a low-altitude scan during the field research, and handheld camera equipment was used to photograph the architectural details. The aerial photographs taken by the drone were used to generate a 3D point cloud model in ContextCapture software (Fig. 3), and then SketchUp was used to build the base 3D model of the village. The architectural details of the street-level interface were refined using the detailed photographs taken with handheld camera equipment. Building surface textures were captured on-site and later processed into seamless maps to produce the corresponding building material textures. The completed 3D model was imported into Unity3D for adjustment and refinement, adjusting the texture properties of the model material map to make the material as close to the natural environment as possible. At the same time, natural elements such as sunlight, sky, rivers, and plants were added to the village's VE to enrich the spatial experience. Fig. 4 shows the VR environments.

Physical environment
The VE was constructed based on two TCV, Cheng and Jitou, both located in the northern part of Fujian Province, China. Northern Fujian, in southeastern China, is mountainous. It was an essential gateway for communication between the rest of Fujian and the Central Plains, with the Tea Road being the most famous route. After the Qin and Han Dynasties (221 B.C.-220 A.D.), human activities such as population migration, cultural exchange, and transportation formed rural settlements with regional characteristics in northern Fujian. These villages represent one type of TCV.
In this study, two villages, Cheng and Jitou, were selected as case studies for analysis. Both villages, emblematic of northern Fujian, retain well-preserved historical village patterns. They were included in the first list of TCV in 2012 [46]. Despite both being situated in flat areas within mountainous regions, their spatial characteristics differ owing to their unique development histories. Cheng (Fig. 5) was established during the Sui and Tang dynasties (581-907 A.D.). It experienced significant growth during the Ming and Qing dynasties, largely due to tea transportation. Located 35 km south of Wuyishan City in Fujian Province, Cheng spans 30 hectares and is home to approximately 600 residents. The village is predominantly inhabited by people with the surnames Lin, Li, and Zhao, all esteemed families originally from the Central Plains. Ancestral halls dedicated to the Zhao, Lin, and Li families are scattered throughout the village, and a plethora of monuments and edifices further enrich its historical ambience. Notably, crossing pavilions are strategically placed at vital street intersections, serving as communal spaces for villagers to interact. Conversely, Jitou (Fig. 6) dates back to the Tang Dynasty (618-907 A.D.). Situated 4.5 km east of Pingnan County in Ningde City, Fujian Province, the village covers 23 hectares and houses roughly 500 residents. Its geography is notable, nestled amidst hills with the Bamboo Creek meandering through. This creek, coupled with the encompassing hills, has directed its spontaneous architectural development. The majority of the village's edifices boast unadorned yellow rammed earth walls, topped with black tiles. Jitou's primary thoroughfare maintains a strong continuity and an expansive scale, and is predominantly lined with shops. Intermittent corridors punctuate this street, facilitating movement for its inhabitants.

Data acquisition

Locomotion
In conventional VR experiences, the movement of the virtual character is usually controlled using a joystick, which does not achieve the sensation of real body movement. In addition, motion sickness often occurs during a VR experience. There have been many studies on motion sickness in VR environments [47-49]. Sickness occurs when the signals from the vestibular and visual systems are not coordinated during the perception of self-motion [50]. The most common symptoms of VR sickness are dizziness, nausea, sweating, and disorientation [51]. The likelihood and circumstances of motion sickness vary with the individual circumstances of VR users. When motion sickness occurs, it can cause a large margin of error in experimental results. One potential solution for addressing motion sickness is the use of an omnidirectional treadmill [50]. Therefore, the Omni VR motion platform was used to enable autonomous roaming in a VE in a manner that approximates movement in a real environment (Fig. 7).
The Virtuix Omni uses a platform to simulate walking motion, utilises matching shoes or shoe covers to reduce surface friction, and is used in conjunction with the HTC Vive Pro Eye for a complete VR experience. The Virtuix Omni platform has a bowl-shaped surface and uses inertial sensors to track a person's position, stride length, and walking speed. It returns the data to a workstation, which translates it into locomotion in the virtual scene. With the Omni VR motion platform, the way a person walks in the real world can be recreated as closely as possible, increasing the feeling of real movement while relieving participants' motion sickness during the experiment [51]. Although the Omni VR motion platform provided some relief from motion sickness, some participants still suffered from it owing to differing physical conditions. Therefore, we flagged participants with motion sickness and excluded them from the data to ensure the accuracy of the experiment. Participants who experienced any discomfort during the experiment were free to stop at any time, and their wishes were fully respected.

3D gaze point
Eye-tracking in a real environment, or desktop eye-tracking, cannot accurately capture the 3D gaze point of the participant in the experimental setting [29,40,41]. Owing to the unified 3D world coordinate system in the immersive VE, any movement behaviour of the participant in the VE can be accurately recorded. Combining an immersive VR headset with eye-tracking technically satisfies the conditions for obtaining 3D gaze point data [50]. In this study, a script developed in Unity3D used the eye-tracking module of the HTC Vive Pro Eye to capture the vector data of the participant's gaze direction (Fig. 1), calculate the 3D coordinates of the intersection of the gaze direction vector with the object in the VE, and output these coordinates $(x_1, y_1, z_1)$ as the participant's 3D gaze point. At the same moment, the spatial position of the participant's head $(x_2, y_2, z_2)$ was output and used as the participant's head position. This headset has a display resolution of 2448 × 2448 pixels per eye, a 90 Hz refresh rate and a FOV of 120°, as stated by the manufacturer [52]. The HMD is equipped with an eye-tracking system running at a 120 Hz sampling rate and reported to achieve a spatial accuracy of 0.5°-1.1° [53]. Schuetz conducted experiments and concluded that the HTC Vive Pro Eye eye-tracker meets the accuracy requirements for eye-tracking experiments in VEs [53].
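The intersection step described above can be illustrated geometrically. The following is a minimal sketch, assuming the head position is the ray origin, the gaze direction is the ray direction, and the scene surface is a single triangle. It uses the standard Möller-Trumbore ray-triangle test in plain Python and is not the Unity3D script used in the study; the function name `gaze_hit` and the sample coordinates are hypothetical.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def gaze_hit(origin: Vec3, direction: Vec3,
             v0: Vec3, v1: Vec3, v2: Vec3,
             eps: float = 1e-9) -> Optional[Vec3]:
    """Return the 3D point where the gaze ray hits triangle (v0, v1, v2), or None."""
    edge1 = sub(v1, v0)
    edge2 = sub(v2, v0)
    pvec = cross(direction, edge2)
    det = dot(edge1, pvec)
    if abs(det) < eps:
        return None  # gaze ray is parallel to the triangle plane
    tvec = sub(origin, v0)
    u = dot(tvec, pvec) / det
    if u < 0.0 or u > 1.0:
        return None  # hit lies outside the triangle
    qvec = cross(tvec, edge1)
    v = dot(direction, qvec) / det
    if v < 0.0 or u + v > 1.0:
        return None  # hit lies outside the triangle
    t = dot(edge2, qvec) / det
    if t <= eps:
        return None  # triangle is behind the viewer
    return (origin[0] + t * direction[0],
            origin[1] + t * direction[1],
            origin[2] + t * direction[2])


# Hypothetical example: head at eye height looking straight ahead at a wall 5 m away.
head = (0.0, 1.6, 0.0)
wall = ((-1.0, 0.0, 5.0), (1.0, 0.0, 5.0), (0.0, 3.0, 5.0))
gaze_point = gaze_hit(head, (0.0, 0.0, 1.0), *wall)
no_hit = gaze_hit(head, (0.0, 0.0, -1.0), *wall)
```

In a game engine the same query would normally be delegated to the engine's own ray-casting facilities against the full scene geometry; the sketch only shows how a gaze point and a head position end up in the same world coordinate system.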

Sightline
The line of sight is an important indicator for analyzing visual distance. Currently, research on visual distance primarily focuses on the field of transportation. For instance, studies have used LiDAR data for 3D virtual intersection visibility analysis [54], as well as GIS platforms and numerical computing environments to estimate the available visual distance for typical urban roads [55]. In the field of built environments, scholars also show significant interest in the topics of spatial scale and visual distance. Notably, Kevin Lynch and Frederick Gibberd have pointed out that a scale of approximately 30 m is considered the most comfortable and appropriate in urban spaces [56,57]. Furthermore, Jan Gehl and Allan B. Jacobs have discussed the relationship between spatial scale and the scale of sightlines in their respective works [58-60].
However, the analysis of visual distance has so far been limited by the constraints of eye-tracking technology. The combination of immersive VR with eye-tracking technology has the potential to overcome this limitation. Based on the 3D gaze point coordinates and 3D head coordinates of the participant in the VE, it is possible to redraw the participant's sightline (Fig. 8). The distance formula between two points then gives the distance between the head position coordinates $(x_2, y_2, z_2)$ and the gaze point coordinates $(x_1, y_1, z_1)$, i.e., the length of the sightline $d$:
$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}$$
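As a minimal sketch of this calculation, the sightline length is the Euclidean distance between the recorded 3D gaze point and the 3D head position. The sample coordinates below are hypothetical and assumed to be in metres; the function name is illustrative only.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def sightline_length(gaze_point: Vec3, head_position: Vec3) -> float:
    """Euclidean distance between the gaze point and the head position."""
    return math.dist(gaze_point, head_position)


# Hypothetical (gaze point, head position) pairs, in metres.
samples: List[Tuple[Vec3, Vec3]] = [
    ((3.0, 1.5, 4.0), (0.0, 1.5, 0.0)),
    ((10.0, 2.0, 0.0), (0.0, 2.0, 0.0)),
]

lengths = [sightline_length(gaze, head) for gaze, head in samples]
mean_length = sum(lengths) / len(lengths)
```

Applied to every recorded pair, this yields one sightline length per gaze sample, from which the range and rhythm of the sightline scale can be summarised.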

Participants and experimental procedures

Participants
A total of 60 participants, consisting of 36 males and 24 females, took part in this experiment. The age range of the participants was between 18 and 50 years. All participants had normal or corrected-to-normal vision. No prior information regarding the village samples used in the experiment was provided to the participants.

Fig. 1 The hardware system
Fig. 2 The workflow of creating the VE

Experimental procedures
Prior to the start of the experiment, participants were given instructions on how to use the Omni VR motion platform and were allowed to practice freely for 5-10 min. They were then equipped with HTC Vive Pro Eye headsets. As is common for VR headsets with eye-tracking, eye-tracking calibration was necessary before using the headset [50]. Each participant was asked to ensure that the headset was comfortably and securely worn before the experiment began. The interocular distance was adjusted for each participant to ensure accurate eye-tracking measurements.
The initial position of the participants upon entering the VE was set at the main entrance of the village, which also serves as the primary entrance for non-local residents in the real world. Subsequently, participants were given the freedom to explore the village space, with the expectation that they would explore the village as much as possible, but without the requirement to visit the entire village. If participants felt that they had completed the experience within the first 10 min, they were allowed to stop the exploration at any time. Alternatively, if the duration reached 15 min, participants were informed that the experiment had concluded. The duration of each participant's exploration varied slightly but was controlled within the range of 10-15 min. At the end of the experiment, the data file corresponding to […] As the object of the experimental study is the village space, collision detection was set in Unity3D for regions around the village streets, building boundaries, and the river to prevent participants from falling or going through walls due to incorrect operations in the VE. We extracted only the gaze point data that was focused on solid objects during the experiments, excluding data from participants looking towards the sky or the natural environment outside the village.
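The extraction of gaze points on solid objects can be sketched as a simple filter. The record layout, the `hit_tag` field and the tag names below are hypothetical, introduced only for illustration; the original data files may be organised differently.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class GazeSample:
    participant: str
    gaze_point: Vec3
    head_position: Vec3
    hit_tag: str  # hypothetical label of what the gaze ray intersected


# Hypothetical tags for targets that are not solid objects inside the village.
EXCLUDED_TAGS = {"sky", "outside_environment"}


def keep_solid_hits(samples: List[GazeSample]) -> List[GazeSample]:
    """Keep only gaze samples that landed on solid objects within the village."""
    return [s for s in samples if s.hit_tag not in EXCLUDED_TAGS]


samples = [
    GazeSample("P01", (3.0, 1.5, 4.0), (0.0, 1.6, 0.0), "building"),
    GazeSample("P01", (0.0, 500.0, 0.0), (0.0, 1.6, 0.0), "sky"),
    GazeSample("P01", (1.0, 0.0, 2.0), (0.0, 1.6, 0.0), "ground"),
]
solid_samples = keep_solid_hits(samples)
```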

Results
A total of 51 valid data sets were collected in the experiment, while the remaining 9 individuals did not complete the experiment due to early termination.The data collected to reflect the participants' cognitive processes were divided into two categories: data on the participants' motion behaviour and visual behaviour while exploring the village space.

Locomotion behavior perception
We integrated all participants' locomotion tracking points recorded during their experience in the two villages. In ArcGIS, we performed a point density analysis to visualise the density of participants' locomotion distribution in the two villages. Warm colours represent a more concentrated locomotion distribution of participants in a region, while cool colours represent a sparse locomotion distribution. The hotspot map (Fig. 9a) showed that the distribution of people in Cheng was mainly concentrated in the main streets such as Zhu Street, Xia Street, and Heng Street. In Jitou (Fig. 9b), the high-density locomotion regions were mainly concentrated along the roads on both sides of Bamboo Creek. The two streets adjacent to it were the next most crowded; the surrounding fringe regions were less crowded. Although the participants were not asked to complete any specific tasks during the experiment and only visited the village space based on their visual cognition, the spatial distribution of participants' locomotion in the two virtual villages still showed a structural pattern that matched the street structure and public space of the real-world villages. For example, in Cheng, the ancestral shrines of the three clans were located on Heng Street, Zhu Street, and Xia Street, the village's main streets. In Jitou, the commercial shops used to be located on both sides of Bamboo Creek Street; later, due to the expansion of the village, new multi-story buildings were built on the street north of the village entrance square, with ground floors mostly used for catering shops, and this is now the main area of commercial activity.
The spatial structure of the village topology (Fig. 10a, b), as revealed by space syntax segment analysis of integration at the 800 m walking scale, makes it easier to understand the mechanisms underlying the distribution of participant locomotion in the experiment. The red streets with a higher degree of integration represent regional connectivity in the village network. The clustering of participants' locomotion in the two villages corresponds strongly to the spatial structure of the highly integrated streets in the villages. Participants are guided by the spatial topology in spatial exploration and eventually gather in parts of high accessibility. In real villages, this congregation of locomotion subsequently catalyses the emergence of public activities and attracts the establishment of public functions. The experimental results also differed locally from the space syntax measurements, for example in the local point-like highlights in the distribution of movement trajectories and the extension of crowd trajectories along the main street. A review of the participants' experimental video recordings shows that elements such as streams and landmarks in the village can guide and attract the movement of the crowd.
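The idea behind the locomotion density map can be sketched with a simple grid count. This is a simplified stand-in, not the ArcGIS point density tool used in the study: it assumes 2D ground-plane tracking points in metres, and the cell size and sample points are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

Cell = Tuple[int, int]


def point_density(points: Iterable[Tuple[float, float]],
                  cell_size: float) -> Dict[Cell, float]:
    """Count tracking points per square grid cell and divide by the cell area."""
    counts: Counter = Counter()
    for x, y in points:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        counts[cell] += 1
    area = cell_size ** 2
    return {cell: n / area for cell, n in counts.items()}


# Hypothetical locomotion tracking points pooled over all participants.
tracking_points = [(0.5, 0.5), (1.0, 1.5), (1.9, 0.1), (5.0, 5.0)]
density = point_density(tracking_points, cell_size=2.0)
busiest_cell = max(density, key=density.get)
```

Cells with high values correspond to the warm-coloured regions of a hotspot map, and cells with low values to the cool-coloured ones.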

Gaze points distribution analysis
The processing of 3D experimental data presents challenges and complexities, and the standard software packages provided by existing suppliers fail to achieve our desired level of visual analysis. Therefore, we developed a customized program based on the […]. […], it is evident that in both villages, most AOI coincide with areas of clustered locomotion. This suggests that areas with a higher concentration of people also offer more opportunities for visual observation and perception.
In Cheng, the concentration of gaze points was predominantly observed at street exits, alleyways, intersections, ancestral halls, and architectural details such as doors and windows. This distribution can be attributed to the more linear road configuration in Cheng compared to Jitou. Notably, street pavilions constructed at key intersections within the village emerged as primary regions where participants demonstrated AOI during locomotion. The attention these pavilions garnered was markedly greater than that for other public edifices, even those with more intricate architectural features and details, such as the ancestral halls and temples in the village. In real-world contexts, these pavilions serve as pivotal landscape elements and crucial hubs for communal activities. Locals frequently congregate at these pavilions for various activities, including casual conversations, vegetable preparation, cooking, and communal dining, thereby offering a vibrant tableau of social interaction. Ancestral halls are the highest-ranking buildings in the village, and they are the places where the villagers' clan beliefs are represented, so they take on the role of "landmarks" in the village. In addition, the facades of ancestral halls and some of the better-decorated houses have exquisite decorative patterns and motifs, which catch the eye when one approaches these buildings and takes a closer look.
In Jitou, a predominant AOI was observed at the entrance square and along the streets flanking Bamboo Creek. The entrance square serves as the starting point for the VR experiment, functioning as the primary entrance to the village from the county road. Additionally, it serves as a central location for various recreational activities of the local residents, including traditional holiday gatherings, daily exercises, and markets. Consequently, the square boasts a spacious area, featuring a large public pavilion and a water pool, providing resting and viewing spots for both locals and tourists. As a result, participants of the VR experiment spent a considerable amount of time either stationary or moving within the plaza during their spatial exploration. Concurrently, the main street also attracted a substantial number of gaze points, likely due to its role as a convergence point for participants. Analysis of gaze points in both villages indicates that areas of heightened AOI predominantly encompass street entrances, significant landmarks, street corners, areas directly opposite street exits, intricate architectural features such as windows and doors, and locations at ground level (Figs. 13, 14).

The sightlines analysis
We obtained the participants' sightline data by connecting the coordinates of the participants' gaze points and head positions. In this study, we first compiled the sightline length data of all participants during the VR experiments in the two villages, together with the sightline length data for the AOI, to analyse the scale range and characteristics of human visual behaviour during spatial locomotion.
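As a minimal sketch of the construction just described, a sightline can be treated as the straight segment from the head position to the gaze point, with its length given by the Euclidean distance between the two 3D coordinates. The function name, the coordinate order (x, y, z in metres), and the sample values are assumptions for illustration, not details taken from the experimental software.

```python
import math


def sightline_length(head, gaze):
    """Length of the sightline from a head position to a gaze point.

    head, gaze: (x, y, z) coordinates in metres (assumed convention).
    """
    if len(head) != 3 or len(gaze) != 3:
        raise ValueError("expected 3D coordinates")
    # math.dist computes the Euclidean distance between two points.
    return math.dist(head, gaze)


# Hypothetical sample: eye height 1.6 m, gaze point on a facade at the same height.
head_position = (0.0, 0.0, 1.6)
gaze_point = (3.0, 4.0, 1.6)
length = sightline_length(head_position, gaze_point)
print(f"sightline length: {length:.1f} m")
```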
We compiled the sightline length data for Cheng and Jitou separately and divided them into different length ranges. Figs. 15 and 16 show the scale range of the sightlines. The vertical axis of Figs. 15 and 16 is the number of sightlines, and the horizontal axis is the sightline length (m). As shown in Figs. 15 and 16 and Table 1, the average sightline length in Cheng is around 25 m, with the longest sightline not exceeding 400 m, and the proportion of sightlines in the 0-50 m range reaches nearly 87.3%. Only 0.3% of the sightlines exceed 200 m. In Jitou, the average sightline length is around 20 m, and all the sightlines are within 200 m, with 92% of the sightlines within 50 m and 0.6% between 150 and 200 m. The standard deviation of sightline length was 16 m in Cheng and 10 m in Jitou. On the one hand, the above data reveal that different environmental scales influence sightlines. On the other hand, they reflect a physiological characteristic of human vision: the human visual observation distance lies mainly within the 50 m scale.
Fig. 15 Frequency of sightlines at different length ranges in VR experiments of Cheng
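The statistics reported above (mean, standard deviation, and the share of sightlines in each length range) can be sketched as follows. This is an illustration under stated assumptions: the 50 m bin width mirrors the ranges discussed in the text, the population standard deviation is used because the source does not say which estimator was applied, and the sample lengths are invented.

```python
import statistics


def summarise_sightlines(lengths, bin_width=50.0):
    """Summarise sightline lengths (metres) by mean, SD, and share per length bin.

    Bin index 0 covers [0, bin_width), index 1 covers [bin_width, 2*bin_width), etc.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    counts = {}
    for length in lengths:
        index = int(length // bin_width)
        counts[index] = counts.get(index, 0) + 1
    return {
        "mean": statistics.fmean(lengths),
        # Assumption: population SD; the source does not state the estimator.
        "sd": statistics.pstdev(lengths),
        "share_by_bin": {i: counts[i] / len(lengths) for i in sorted(counts)},
    }


# Invented sightline lengths in metres, for illustration only.
sample_lengths = [5.0, 12.0, 20.0, 30.0, 45.0, 60.0, 120.0, 210.0]
summary = summarise_sightlines(sample_lengths)
print(f"mean = {summary['mean']:.2f} m, sd = {summary['sd']:.2f} m")
print(f"share within 50 m = {summary['share_by_bin'][0]:.1%}")
```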
We then extracted sightline length data for the AOI in both villages (Figs. 17, 18). The sightline data were arranged in chronological order, with the horizontal axis showing the chronological sequence and the vertical axis showing the sightline length (m). The AOI sightlines were within 140 m in both Cheng and Jitou. The chronological sequence of sightline data presents the change in sightline distance during the participant's locomotion in the VE. According to the moving average curve in the graph, the overall pattern of sightline length fluctuates over time, with a sudden increase, then a gradual decrease, and then a sudden increase again. The data for both villages show similar fluctuating characteristics: when a person perceives an AOI, they usually start at a distance, and as they gradually get closer to the AOI, the number of viewings starts to increase. Therefore, the sightline gradually becomes shorter. Observation of the next AOI then begins, and so on, which creates the fluctuating pattern shown in the diagrams.
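The moving average curve mentioned above can be sketched with a simple trailing window over the chronological sequence of sightline lengths. The window size and the sample sequence are hypothetical; the source does not state which window or smoothing variant was used.

```python
def moving_average(values, window):
    """Simple moving average over a chronological sequence.

    Returns one value per full window, so the result has
    len(values) - window + 1 entries.
    """
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest.
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages


# Invented sequence: the length drops while approaching one AOI,
# then jumps when the gaze moves to the next, more distant AOI.
lengths_over_time = [80, 60, 40, 20, 90, 70, 50, 30]
smoothed = moving_average(lengths_over_time, window=2)
print(smoothed)
```

In this made-up sequence the smoothed curve falls, rises abruptly, and falls again, which is the kind of sawtooth pattern the text describes.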

Discussion
The main goal of this study was to analyse how spatial attributes in TCV influence spatial cognition. We propose a method for analysing the spatial cognitive characteristics of rural heritage, taking two TCV as cases, by capturing and analysing visual and motor behaviour. This method integrates an immersive VR environment with an eye-tracking system. Leveraging the technical features of immersive VR headsets, we could capture the coordinates of a participant's 3D gaze point and obtain 3D sightline data. A significant advantage of this method is its ability to locate the AOI more precisely within the 3D environment, allowing the sightline to be redrawn and subjected to statistical analysis.
For the first sub-question, we found that the street structure of the village influences visual behaviour by affecting people's locomotion. The experimental data show that participants tend to gather in street spaces with high accessibility while exploring the space, which agrees closely with the accessibility measurements from space syntax. The AOI also tend to be distributed in street spaces with high accessibility. This can be explained by the fact that during purposeless spatial exploration, the spatial structure influences the distribution of locomotion, which in turn affects the distribution of gaze points. These metrics are usually relevant to the study of spatial comprehension. However, previous research methods based on cognitive sketching and 2D eye-tracking cannot quantify and precisely locate cognitive attributes in 3D space. The combined use of VR and eye-tracking allows for experimentation in 3D environments, thus adding distance data to the 2D plane. Sightline is the third dimension of the 3D heat map and provides further insight into the visual coverage of the 3D environment [17,50]. Therefore, this experimental method can provide a more convenient way to quantify spatial cognition. Furthermore, areas that attract relatively higher attention in the TCV context include street interfaces or spatial elements located in the direction of movement, locations with spatial turns and changes, areas with prominent features and architectural details, and road surfaces. This is because people are constantly looking for places to occupy their field of vision as they walk, and these places of change and highly recognisable objects are more likely to be noticed. On the one hand, these results are supported by the conclusions of some previous cognitive neuroscience experiments, which have demonstrated that hippocampal nerve cells control the direction of the head during locomotion to align with the direction of visual gaze and that both tend to remain in the direction in which the person is walking [33,34]. On the other hand, they also confirm Kevin Lynch's five elements of the city [22], which are the main elements to focus on in urban design, village regeneration and other efforts. This study can also provide theoretical and quantitative support for relevant design.
For the second sub-question, within a range of 50 m, particularly at a scale of 20-30 m, the sightline statistics of the two villages show a similarly high frequency of occurrence. Regarding the scale of visual cognition, the two villages have different street scales, resulting in different maximum sightline lengths for participants in the experiment. It can be argued that the visual scale in the residential environment is triggered by specific environmental features and is also related to the physiological characteristics of human vision. Periodic fluctuations of the sightline in the AOI reveal features of exploratory behaviour. This result also confirms, in 3D, earlier findings on people's visual exploratory behaviour when viewing a 2D plane. There, the eye first makes a series of large viewing movements, and then the viewing activities are reduced and become slower in a particular way [51]. In 3D space, a large sweep at a distance is performed first. The number and frequency of gazes increase gradually as one gets closer to the AOI, and then one begins to look at the next place in a similar way.
In line with the research findings, this study also verified the correlations among the three spatial information sources (morphology, cognition, and behavior). These findings are consistent with those of Kim [18], Lin [19], and Nan et al. [20], further confirming the significance of village spatial structure. Compared with previous studies, this research has identified more specific and in-depth visual AOIs and visual scales in TCV. In terms of experimental design, this study is similar to the research conducted by Campanaro [43] and Helmut [39], as both utilized VR to collect eye-tracking data and employed software for 3D visualization analysis. However, a key difference lies in the utilization of Rhino & Grasshopper in this study. In addition to the analysis of visual and motion behavior data, this study enabled the visualization of gaze direction and the statistical analysis of sightline length, which could not be achieved in the experiments based on GIS statistical analysis and four-sided immersive stereoscopic projection environments.
In the formation of most TCV, a self-organised, non-planned development model has been predominantly adhered to, distinguishing it from urban development paradigms. In the real world, commercial functions and major public activity spaces are often located on main streets, leading to the formation of social interactions in these areas. However, in the VE, participants were not influenced by the context of the real world, yet their locomotion and AOI still tended to concentrate on main streets. This phenomenon can be explained by the self-organizing pattern of bottom-up formation in TCV, which serves as the backdrop for the formation of social functional spaces in villages. For instance, the integration of spatial structures influences the accessibility of different areas, making highly accessible regions important for commercial and social activities. In contrast, areas with lower accessibility gradually decline or relocate due to decreasing usage frequency. Within this developmental trajectory, TCV manifest attributes that are the culmination of longstanding interactions between humans and nature [6]. It is precisely these attributes that elucidate the value of rural heritage [13]. Analysing how people (re)cognise these tangible attributes can enhance our understanding of how they (re)cognise the values of heritage. For instance, the scale of sightlines can reveal the spatial morphology of villages, AOI indicate which architectural elements are more readily observed by people, and patterns of locomotion reflect the village's spatial structure [61]. Collectively, these attributes underscore the distinctive value of rural heritage inherent to TCV.
Motion sickness has been an issue in VR, which can affect the accuracy and credibility of experimental data [50]. Realistic walking patterns are another major factor in improving the accuracy of VR experimental data [62]. In this study, a VR motion platform was used during the experiments to provide a VR environment with locomotion patterns closer to those found in natural environments and to reduce motion sickness to a certain extent. This study is a preliminary attempt to use such a VR motion platform, whereas previous VR experiments were constrained by technical limitations.
Moreover, it is important to mention the limitations of this study. One limitation pertains to the group of participants. Given the intricate nature of the experimental setup, the time commitment for each participant increased considerably. This made achieving a balanced gender and age representation challenging, with the study predominantly focusing on university students and faculty. In future work, the participant sample should be more diverse so that comparative analyses can be performed. Moreover, the villages of Cheng and Jitou, while illustrative, are but a subset of the myriad TCV across China. The capacity to generalise the findings to TCV in other provinces necessitates further exploration. Subsequent research should aspire to bring together methodologies such as semi-structured interviews and questionnaires conducted on-site to amass subjective spatial cognition insights. Marrying these subjective inputs with objective experimental findings could offer a promising avenue for a broader understanding of human cognition related to the cultural significance of TCV.

Conclusions
This study employed a spatial cognition experiment in rural heritage, integrating VR with eye-tracking methodologies, to analyse data on locomotion distribution, gaze point distribution, and sightline scale. The results indicate that the spatial structure of a village influences human locomotion patterns, subsequently affecting the distribution of gaze points. Locations in the village with marked spatial shifts, objects with pronounced features or intricate architectural details, and road surfaces are more likely to be observed by people. Additionally, the scale of streets impacts the sightline scale. The aforementioned outcomes highlight attributes that make village heritage more perceptible objectively. To a certain extent, these attributes represent the spatial cognitive characteristics of TCV. Similarly, consistent with the Faro Convention, these attributes also constitute the local cultural heritage, collectively forming the foundation for memory, understanding, identity, cohesion, and creativity.
The combined VR and eye-tracking experimental approach, along with the results derived, offers valuable insights into the conservation of rural heritage and village spatial planning.In the conservation and renovation of rural heritage, efforts should be made to preserve these attributes.Besides, in crafting new rural plans, these attributes can be extracted and integrated into rural designs, recreating spaces with traditional village characteristics.Especially in the context of China's current rural revitalisation efforts, applying the exemplary features of traditional village spaces to enhance the quality of rural spaces becomes particularly crucial.

Fig. 9 Density map of the distribution of movement tracks

Fig. 10 The spatial structure of the villages presented by the analysis of the integration of Space Syntax at the 800 m scale

Fig. 16 Frequency of sightlines at different length ranges in VR experiments of Jitou

Fig. 17 Length statistics of sightlines in AOI in VR experiments of Cheng

Table 1 The mean and standard deviation of the length of the sightline