A valid and reliable explanatory model of learning processes in heritage education

Background: The main challenge in heritage education is to identify the verbs (and their hierarchical relations) that explain heritage learning on the basis of empirical evidence. The Heritage Learning Sequence (HLS) selects seven verbs (Knowing-Understanding-Respecting-Valuing-Caring-Enjoying-Transmitting) on the basis of (a) theoretical studies, (b) analyses of international standards, and (c) evaluation of heritage education programs. The study has the following objectives: (a) to clarify the heritage learning process; (b) to test a theoretical model that groups the verbs that make up the Heritage Learning Sequence (HLS), as well as the relationships between them; and (c) to identify possible sub-models that explain the different heritage learning itineraries. Methods: The Q-Herilearn


Research problem and review of the relevant literature
Cultural heritage is a collective legacy that is passed from generation to generation [1] and its very existence depends on this intergenerational continuum [2,3]. This transmission process derives, in turn, from a set of learnings (verbs) that succeed one another following a structural logic and, therefore, exhibit some kind of order [4]. The main current research problem for all heritage sciences is to identify the verbs that explain heritage learning and the possible underlying structural logics (models). Solving this problem is important because all heritage sciences are directly or indirectly affected by the need to transmit heritage; in particular, this is critical for the educational sciences, since they are concerned with planning and measuring such learnings [5].
The verbs (or main actions that sustain heritage learning) have usually been classified into distinct categories within the spectrum that encompasses the unique process of heritage learning: i.e., conceptual, procedural and attitudinal learning [6-8]. In most designs and implementations (at least until the 2010s), the first two have been predominant, while, more recently, the attitudinal dimension has been prioritized [9,10]. To these three categories we can add a fourth, which has become known as "experiential learning" [11-15]; strictly speaking, this is a cross-sectional category that includes and integrates some of the above learnings and moreover incorporates the experience and presence of people in heritage spaces and settings [5,16]. It is therefore not a subtype of learning, but a sequence of learnings conditioned by their in situ acquisition and the consequent prevalence of participatory and experiential processes.
Other studies have focused on identifying some of the content of these learnings (what we learn), i.e., the direct complements of these verbs [17-20]. For the most part, they have come to focus on the teaching-learning processes (how we learn), that is, on the instructional strategies and implementation processes that accomplish, through mediation, communication, or education, the acquisition of these learnings [21,22]. Special attention has been paid to digital environments as mediators [23] or educational technology as a new priority [24-26]. For this reason, a significant proportion of studies have focused on learning evaluation [27-30]. Others, fewer in number, have explored possible relationships and interactions between some learnings and others by means of sequential approaches [31,32].
The hypotheses are based on the theoretical model proposed by Fontal (2003), which is the result of research focused on the observation of behaviour, and which made it possible to identify these preferential relationships. Subsequently, this theoretical model has been confirmed in different empirical studies [33].
These hierarchical and structural relationships between the verbs that articulate heritage learning lead to the construction of theoretical models [34,35]. However, until the publication of the Q-Herilearn scale [36], no measurement models had been generated to identify the main dimensions of heritage learning.
Ordering (prioritizing, hierarchizing) these learnings has been part of a number of theoretical research studies framed within the postmodern understanding of heritage learning [37], the 'ethical' dimension [38,39], the incidental or natural dimension of these learnings [40] or, following a similar approach, intuitive learning [41]. As a result, instruments to measure heritage learning are very recent [42,43] and focus on specific dimensions such as motivation [44], sensory-motor learning [45], enjoyment [46] or attitudes [47]. They are therefore partial in terms of the learnings they measure and, for the most part, designed ad hoc to evaluate a particular educational design. Consequently, they cannot be used to measure other designs, they often display a lack of rigor in the calibration phase, and they are not grounded in a theoretical model that would justify their own structure.
Therefore, moving from theoretical models (processes, sequences, clusters) to structural models and measurement models involves a gap in heritage education research which we intend to address in this article. The so-called Heritage Learning Sequence (HLS) identifies seven verbs that have been selected from previous studies targeted at international normative analysis [48,49], at the evaluation of heritage education programs [42,43,50-53] and at the construction of theoretical models [33,36,54]: (1) Knowing: To acquire conceptual, procedural, attitudinal, and experiential knowledge related to heritage.
(2) Understanding: To understand the contextual keys to heritage that explain its origin, meaning, and evolution. Consequently, a sequential and hierarchical order for these seven verbs will be proposed. This sequence of verbs operates independently of formal, non-formal, and informal educational contexts, as it encompasses interrelated cognitive processes involved in transitioning from heritage knowledge to heritageization. In short, it represents a sequential logic based on the observation of human behavior that identifies basic cognitive operations. Thus, overcoming the shortcomings of previous studies, and in parallel to the very process of calibration of the scale [36] from which this study derives, we will be able to (a) understand the heritage learning process and identify all the verbs involved therein, as well as the influence relations that take place among them; (b) demonstrate that a theoretical model that has been accepted by the scientific and educational community (HLS) is, in addition, validated through a structural model and several measurement models; and (c) identify possible submodels or subtypes within this organizational structure that explain the different heritage learning itineraries tailored to different groups, geopolitical contexts, or educational environments.

Research model and hypotheses
The exploratory study starts with the Heritage Learning Sequence (HLS), which identifies the seven main verbs in heritage learning. These verbs constitute the seven dimensions of the Heritage Process Model (HPM). Each of the latent variables is evaluated using seven indicators. These measurement models underpin the Q-Herilearn scale (see Fig. 1).
Consequently, we propose the following hypotheses (see Fig. 2):
• H1: Knowing has a positive influence on Understanding. In order to understand heritage, it is necessary to know it beforehand; knowledge of heritage promotes its understanding.
• H2: Knowing has a positive influence on Caring. We will only care for such heritage as is known; the unknown cannot be cared for.
• H3: Knowing has a positive influence on Enjoying. A positive experience of knowledge can directly lead to enjoyment.
• H4: Knowing has a positive influence on Valuing. We cannot value what we do not know, or know insufficiently or inadequately.
• H5: Understanding has a positive influence on Respecting. Accessing the meaning and significance of heritage assets provides arguments for respecting them.
• H6: Understanding has a positive influence on Valuing. The keys to heritage significance provide arguments or criteria to identify why it is valuable.
• H7: Respecting has a positive influence on Valuing. In order to value a heritage asset, we must first respect it; we cannot value cultural assets that do not deserve our respect.
• H8: Valuing has a positive influence on Caring. We tend to care only for the assets that we consider valuable. If we perceive that a cultural property lacks value, we will not tend to protect or preserve it; we will ignore it or, at worst, destroy it.
• H9: Valuing has a positive influence on Enjoying. We tend to enjoy those assets that we consider valuable and possess positive qualities.
• H10: Caring has a positive influence on Enjoying. In order to enjoy a cultural asset, it must have been previously cared for, either by former generations or by the people who bequeath it to us.
• H11: Caring has a positive influence on Transmitting. In order to be able to transmit heritage, it must have been cared for. We will not pass on heritage that does not deserve to be cared for.
• H12: Enjoying has a positive influence on Transmitting. We tend to pass on such heritage as we have enjoyed, with the intention of making others enjoy it too.
• H13: None of the sociodemographic variables considered (age, gender, country, number of countries visited, area of residence, mother tongue, level of education) has a significant influence on the relationships between the latent variables in the model.

Methods
In this study we have used PLS-SEM (Partial Least Squares Structural Equation Modeling) as the general analysis strategy, because the approach is primarily exploratory. Accuracy and statistical power have been guaranteed by taking the sample size into account. Data were collected using the Q-Herilearn scale, whose scores have demonstrated sufficient validity and reliability [36]. Both the reliability (internal consistency) and the validity (convergent, discriminant) of the scores in the sample used have been analysed.

Sampling procedures
A convenience sample of N = 1454 people was used. All participants completed an online survey (https://oepe.es/escala-herilearn/) after providing the corresponding informed consent, between May 2022 and April 2023. A total of 1454 responses were obtained, of which 1403 contained complete sociodemographic information. The participants were predominantly under 30 years of age (85.6%), female (69.8%), and resident in Spain (90.1%), more particularly in urban areas (79.4%); they had Spanish as their mother tongue (80.3%) and held higher education qualifications (85.6%).

Sample size, power, and precision
In establishing the minimum sample size, consideration was given to (a) the statistical power (at least 80%); (b) the effect size (f² ≥ 0.35); and (c) the significance level (α = 0.05). Despite the widespread practice of applying the 10-times rule to determine the minimum sample size in analyses using PLS-SEM [55, pp. 24-35], [56], various studies have shown that this strategy leads to a substantial underestimation of the minimum sample size.
Thus, to assess the accuracy and statistical power obtained from the analysis given the sample size used, we performed a Monte Carlo analysis (10,000 replicates), using the results shown in Fig. 3 as population parameters, following the recommendations of Muthén & Muthén (2002) [57].
The analysis in Mplus, v. 8.10 [58], converged successfully in 100% of replications. Population and estimated model parameter means were highly similar, indicating negligible estimation bias. Standard error estimation also showed minimal bias across parameters. Mean squared error (MSE) values were consistently near zero, confirming unbiased parameter estimation. Between 94% and 96% of replicates included the population value within a 95% confidence interval. Power was maximized (1.000) for population parameters above zero. For parameters at zero, the significance rate (0.05) was consistently maintained. In summary, the Monte Carlo analysis suggests precise parameter estimates with high power and low Type I error probability at this sample size.
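The logic of such a Monte Carlo check can be sketched outside Mplus. The Python fragment below is an illustrative sketch, not the study's actual Mplus setup: the population path of 0.5, the simple bivariate regression, and the 2,000 replicates are our own simplifying assumptions. It estimates coverage of the 95% interval and power for a single path at N = 1454:

```python
import numpy as np

rng = np.random.default_rng(42)
n, beta, reps = 1454, 0.5, 2000  # hypothetical population path of 0.5

covered, significant = 0, 0
for _ in range(reps):
    x = rng.standard_normal(n)
    y = beta * x + rng.standard_normal(n)
    # OLS slope and its standard error (no-intercept model on centered data)
    b = np.dot(x, y) / np.dot(x, x)
    resid = y - b * x
    se = np.sqrt(np.dot(resid, resid) / (n - 2) / np.dot(x, x))
    lo, hi = b - 1.96 * se, b + 1.96 * se
    covered += lo <= beta <= hi           # does the 95% CI cover the population value?
    significant += not (lo <= 0.0 <= hi)  # power: does the CI exclude zero?

coverage = covered / reps
power = significant / reps
```

With this sample size, coverage stays close to the nominal 95% and power is effectively 1, mirroring the pattern reported for the full Mplus analysis.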

Quality of measures and psychometric properties
Q-Herilearn [36] is a probabilistic summated rating scale designed to measure seven dimensions related to the learning process in heritage education. It is structured into seven factors: Knowing, Understanding, Respecting, Valuing, Caring, Enjoying, and Transmitting. Each dimension is measured by means of seven indicators scored on a 4-point frequency response scale (1 = Never or almost never; 2 = Sometimes; 3 = Quite often; 4 = Always or almost always). The wording of the items can be found in Tables 1-7 (Supplementary Materials, hereinafter referred to as SM). Sufficient evidence of content validity has been obtained through a concordance analysis, which employed multi-facet logistic models (Many-Facet Rasch Model, MFRM), of the scores of 40 judges, who estimated the relevance, adequacy, and clarity of each item. The metric properties of the scores were determined using ESEM (Exploratory Structural Equation Modeling) [59], EGA (Exploratory Graph Analysis) [60] and Network Analysis [61]. The scale was calibrated using Item Response Theory models: the Nominal Response Model [62] and the Graded Response Model [63].
The results summarized in the previous paragraphs provide sufficient evidence of the reliability and validity of the scores obtained with the Q-Herilearn scale.

Procedure
Data were extracted from the LimeSurvey platform, transferred to R, and cleansed using three strategies: abnormal response filtering, multivariate outlier detection, and missing data processing. Abnormal response patterns, even in small proportions, can significantly distort data analysis results [64,65]. To address this, we excluded cases with identical responses to all 49 items (straight-lining) and calculated the polytomous version (l_z^p) of the standardized log-likelihood person-fit statistic for each response vector [66,67]. We then eliminated observations beyond the 97.5th percentile of the chi-square distribution, identifying them as multivariate outliers [36].
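Two of these filters (straight-lining removal and the chi-square multivariate-outlier cutoff; the person-fit step is omitted) can be sketched as follows. This is an illustrative Python sketch on simulated responses, not the R pipeline used in the study; `clean_responses` is a hypothetical helper name:

```python
import numpy as np
from scipy.stats import chi2

def clean_responses(X, pct=0.975):
    """Drop straight-liners, then flag multivariate outliers via the
    squared Mahalanobis distance against a chi-square percentile cutoff."""
    X = np.asarray(X, dtype=float)
    # (1) Straight-lining: identical answers to every item -> zero row variance
    X = X[X.std(axis=1) > 0]
    # (2) Multivariate outliers: D^2 beyond the 97.5th chi-square percentile
    mu = X.mean(axis=0)
    S_inv = np.linalg.pinv(np.cov(X, rowvar=False))
    d2 = np.einsum('ij,jk,ik->i', X - mu, S_inv, X - mu)
    cutoff = chi2.ppf(pct, df=X.shape[1])
    return X[d2 <= cutoff]

# Toy data: 200 ordinary respondents on 49 items, plus one straight-liner
rng = np.random.default_rng(0)
data = np.vstack([rng.integers(1, 5, size=(200, 49)),
                  np.full((1, 49), 3)])
cleaned = clean_responses(data)
```

The straight-liner is removed by the first filter; the second drops only the small fraction of rows beyond the chi-square cutoff.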

Data analysis
Analysis procedures and criteria
Data were analyzed according to the criteria specified in Table 1. The analysis procedures indicated in the "Procedure" column (global model fitting, measurement model evaluation, structural model evaluation, alternative model comparison and robustness checks) were carried out, and the criteria indicated were applied to ensure the adequacy of the results obtained.

Partial Least Squares Structural Equation Modeling (PLS-SEM) was used as the general analytical strategy instead of alternatives such as Covariance-Based Structural Equation Modeling (CB-SEM). The reasons for using PLS-SEM [73-76] were as follows: (a) the analyses were aimed at testing a theoretical framework from an exploratory and predictive perspective; in this area of study, well-defined and tested models do not yet exist, so the emphasis has been laid on understanding the relationships between the latent variables rather than on obtaining precise estimates of the parameters; (b) the model to be tested is complex, since it consists of a total of 49 indicators and 7 latent variables; (c) given the absence of previous research on this topic, the goal of our research is mainly focused on the construction of theories through a primarily exploratory approach; (d) it may happen that some of the constructs used are formative; (e) PLS-SEM provides the possibility of obtaining factor scores, which could serve as a basis for further analysis in this field of research; and (f) our data did not meet the requirements of the CB-SEM methodology, especially as regards multivariate normality: multivariate normality tests obtained values of T_β(d) = 2.030, p < 0.001 (Henze-Zirkler); H = 417.475, p < 0.001 (Royston); skew = 690.219, p < 0.001 (Mardia skewness); and kurt = 7.871, p < 0.001 (Mardia kurtosis).

Reflective measurement models
In the first place, the evaluation of the seven reflective measurement models was carried out. The adequacy of these models was determined by following the usual empirical rules in the PLS-SEM literature [55,77,78]. To ensure the stability of the results obtained, several robustness checks were performed to enrich the evaluation of the measurement models. In summary, the following analyses were completed: (1) indicator reliability; (2) internal consistency reliability; (3) convergent validity evidence; (4) discriminant validity evidence; and (5) loadings and cross-loadings.

Reliability of the indicators
The first step in the evaluation of a reflective measurement model is to examine to what extent the variance of each indicator is explained by its construct, which is indicative of the reliability of the indicator itself. As can be seen in Table 2, most of the loadings were greater than 0.7071, indicating that the construct explains more than half of the variance of the indicator. This translates into acceptable reliability of the item.
Indicator res36 ("I respect all heritage assets even if their origin and meaning do not agree with my ideology", λ = 0.648) has the lowest explained variance, with a value of 0.420 (= 0.648²), while indicator tra86 ("I share my interest in the heritage sites of my city / town in SN", λ = 0.830) has the highest explained variance, with a value of 0.689 (= 0.830²).

Internal consistency reliability
Internal consistency reliability is the degree to which indicators measuring the same construct are associated with each other [80]. As seen in Table 3 and Figure 2 (SM), the reliability values obtained lie between Cronbach's α and the composite reliability (ρ_C).
We used bootstrap confidence intervals (BCa method: bias-corrected and accelerated bootstrap [55]) to test whether the construct reliability (CR) values were significantly higher than the recommended minimum threshold. As can be seen in Table 3, the lower CI limits are greater than 0.7071 for α as well as for CR(ρ_A) and CR(ρ_C). It can also be observed that the upper limit of the CI is in all cases lower than 0.95, which supports the stability of the results.
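For reference, the two reliability coefficients reported in Table 3 can be computed from raw item scores and from standardized loadings. The sketch below is illustrative; the uniform loadings of 0.75 are hypothetical, not taken from the scale:

```python
import numpy as np

def cronbach_alpha(X):
    """Cronbach's alpha from an items matrix (rows = respondents)."""
    k = X.shape[1]
    item_vars = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

def composite_reliability(loadings):
    """rho_C from standardized loadings:
    (sum lambda)^2 / ((sum lambda)^2 + sum(1 - lambda^2))."""
    lam = np.asarray(loadings)
    num = lam.sum() ** 2
    return num / (num + (1 - lam ** 2).sum())

# Sanity check: perfectly redundant items give alpha = 1
alpha_perfect = cronbach_alpha(np.tile(np.arange(10.0).reshape(-1, 1), (1, 3)))
# Hypothetical construct with seven indicators all loading at 0.75
rho_c = composite_reliability([0.75] * 7)
```

With seven loadings of 0.75, ρ_C works out to exactly 0.90, comfortably inside the 0.7071-0.95 band discussed above.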

Evidence for convergent validity
Convergent validity (i.e., the degree to which each of the seven constructs converges to explain the variance of its component items) was assessed using the average variance extracted (AVE). The minimum acceptable AVE is 0.50: an AVE of 0.50 or higher indicates that the construct explains 50% or more of the variance of its component items. Table 3 shows that all factors have an AVE greater than 0.50, with the only exception of Respecting (AVE = 0.491). However, the CI includes the value of 0.50 (95% CI = 0.466; 0.516). Therefore, we can state that we have sufficient evidence of convergent validity.
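The AVE computation itself reduces to averaging squared standardized loadings. A sketch using loadings at the two extremes reported in Table 2 (the assumption that all seven loadings of a construct sit at one extreme is ours, for illustration only):

```python
def ave(loadings):
    """Average variance extracted: the mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)

# Hypothetical constructs whose loadings sit at the extremes reported in Table 2
ave_weak = ave([0.648] * 7)    # ~0.42, below the 0.50 benchmark
ave_strong = ave([0.830] * 7)  # ~0.69, comfortably above it
```

This makes the 0.50 benchmark concrete: it is crossed exactly when the typical loading exceeds 0.7071.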

Evidence for discriminant validity
Discriminant validity refers to the degree to which a construct or latent variable is distinct from other constructs in the structural model, which means that it measures a single, separate concept [73]. The evidence of discriminant validity has been evaluated using the following two procedures: the Fornell-Larcker criterion [81] and the Heterotrait-Monotrait ratio (HTMT) [82].
The Fornell-Larcker criterion is a measure of discriminant validity that compares the square root of the average variance extracted (AVE) of each construct with its correlations with all other constructs in the model [73]. We expect the square root of the AVE of each factor to be greater than all the correlations between factors (that is, the values in the same row and column).
In Table 4 we see that the Fornell-Larcker criterion is indeed met, since the values on the diagonal are greater than the values in the same row and column. However, some research [83] shows that the Fornell-Larcker criterion does not work well, especially when the loadings of the indicators in a construct differ only slightly (in our study, all the indicator loadings ranged between 0.648 and 0.830).
To solve this problem, Henseler et al. (2015) [83] proposed the Heterotrait-Monotrait correlation ratio (HTMT). HTMT is estimated as the ratio between the mean covariance across different constructs (heterotrait covariance) and the mean covariance within each construct (monotrait covariance). It is used to evaluate the magnitude of the covariance between constructs in relation to the variability within each construct.
The HTMT ratio is mathematically defined as HTMT = AHC / √(AMC₁ · AMC₂), where the average heterotrait covariance (AHC) is the average covariance between pairs of different constructs, calculated by averaging the covariance estimates between different constructs in the PLS path model; and the average monotrait covariance (AMC) is the average covariance within each construct, calculated by averaging the covariance estimates among the indicators of each construct in the PLS path model.
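A minimal sketch of this computation on an item-level correlation matrix (illustrative Python with a toy two-construct matrix rather than the Q-Herilearn data; `htmt` is our own helper name):

```python
import numpy as np

def htmt(R, idx_a, idx_b):
    """HTMT for two constructs from an item-level correlation matrix:
    mean heterotrait correlation over the geometric mean of the
    mean monotrait (within-construct) correlations."""
    R = np.asarray(R)
    hetero = R[np.ix_(idx_a, idx_b)].mean()
    def mono(idx):
        sub = R[np.ix_(idx, idx)]
        return sub[np.triu_indices(len(idx), k=1)].mean()
    return hetero / np.sqrt(mono(idx_a) * mono(idx_b))

# Toy matrix: two constructs with three items each,
# within-construct correlations 0.6, between-construct correlations 0.3
R = np.full((6, 6), 0.3)
R[:3, :3] = 0.6
R[3:, 3:] = 0.6
np.fill_diagonal(R, 1.0)
ratio = htmt(R, [0, 1, 2], [3, 4, 5])  # 0.3 / 0.6 = 0.5
```

In this toy case the ratio is 0.5, well below the 0.85 threshold discussed next.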
As can be seen in Table 4, the HTMT values range from 0.302 (95% CI = 0.264; 0.246) between Respecting and Caring to 0.793 (95% CI = 0.762; 0.825) between Knowing and Understanding. These values do not reach the 0.85 threshold (values greater than 0.85 indicate a lack of evidence of discriminant validity) [83], given that the upper limits of the 95% confidence intervals of the HTMT are in all cases lower than 0.90 or 0.85. Following these results, we have sufficient evidence of discriminant validity in all seven measurement models.

Loadings and cross-loadings
Finally, the differences between loadings and cross-loadings were analyzed, leading to the following conclusions: (a) all loadings are significantly higher than the cross-loadings, which is evidence of discriminant validity; (b) the magnitude of most loadings exceeded the value of 0.707 (M = 0.748; SD = 0.047), while the cross-loadings were much lower (M = 0.410; SD = 0.110); (c) all loadings were statistically significant (p ≤ 0.01); and (d) all loadings measuring each factor were high, with the exception of four items corresponding to the "Respecting" factor, which is evidence of convergent validity. In summary, the magnitude, significance, discriminant validity, and convergent validity of the loadings and cross-loadings support the quality of the measurement models.

Structural model
Once we had ensured that the measurement of the seven constructs presents sufficient evidence of reliability and validity, we moved on to a systematic evaluation of the structural model. We used seven procedures:
(1) Assessment of collinearity (VIF).
(2) Assessment of relevance (β) and statistical significance (t) of the structural model relationships.

Assessment of collinearity
First, we examined the possible presence of collinearity between the predictor constructs to determine whether its existence could bias the results. The evaluation of the significance and relevance of the paths in the structural model only makes sense if the absence of collinearity is guaranteed.
The VIF was calculated for both indicators (outer model) and paths (inner model). As can be seen in Table 5, all values were lower than 5, which implies the absence of collinearity [55].
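The VIF computation can be sketched as follows (illustrative Python on simulated predictors, not the inner/outer model matrices of the study; `vif` is our own helper name):

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column: VIF_j = 1 / (1 - R2_j),
    where R2_j comes from regressing column j on the remaining columns."""
    X = np.asarray(X, dtype=float)
    X = X - X.mean(axis=0)
    vifs = []
    for j in range(X.shape[1]):
        y = X[:, j]
        Z = np.delete(X, j, axis=1)
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        resid = y - Z @ beta
        r2 = 1.0 - resid.var() / y.var()
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

rng = np.random.default_rng(1)
X_indep = rng.standard_normal((500, 3))  # nearly orthogonal predictors -> VIF ~ 1
X_coll = np.column_stack(               # add a near-duplicate column -> huge VIF
    [X_indep, X_indep[:, 0] + 0.01 * rng.standard_normal(500)])
```

Independent predictors yield VIFs near 1, while a near-duplicate column pushes the VIF far beyond the cutoff of 5 used in Table 5.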

Significance and relevance of the structural model relations
Statistical significance was assessed using bootstrap standard errors to calculate the t-values of the path coefficients, together with their corresponding confidence intervals. A path coefficient is significant at the 5% level if the value of zero does not fall within the 95% confidence interval.
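The bootstrap significance test can be sketched for a single standardized path. This is an illustrative Python sketch: in the study the coefficients come from the full PLS path model, whereas here the path is reduced to a simple correlation between two simulated variables, with a population value chosen by us near the reported Knowing ⇒ Understanding coefficient:

```python
import numpy as np

def bootstrap_path_ci(x, y, n_boot=2000, seed=7):
    """Percentile bootstrap 95% CI for a standardized bivariate path
    (here reduced to the correlation between x and y)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    coefs = np.empty(n_boot)
    for b in range(n_boot):
        i = rng.integers(0, n, n)             # resample cases with replacement
        coefs[b] = np.corrcoef(x[i], y[i])[0, 1]
    return np.percentile(coefs, [2.5, 97.5])

rng = np.random.default_rng(3)
know = rng.standard_normal(1454)
under = 0.7 * know + 0.7 * rng.standard_normal(1454)  # population r ~ 0.707
lo, hi = bootstrap_path_ci(know, under)
significant = not (lo <= 0.0 <= hi)  # significant at the 5% level if CI excludes 0
```

At N = 1454 the interval is narrow and clearly excludes zero, which is exactly the decision rule stated above.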
As for the largest magnitudes, they confirm the order proposed for the first part of the HLS (Knowing-Understanding-Respecting), which indicates that respect for heritage necessarily depends on previous processes of understanding and knowledge. The binomial Knowing-Understanding is particularly solid, thus confirming that these are the most relevant links in the sequence (highest values) and that their position at the beginning of the sequence is relevant; it is not by chance that these verbs are the most frequently used in heritage teaching-learning processes. As for the weakest impact relations, although they confirm a direct match between the sequence's binomials (e.g., Knowing-Valuing; Knowing-Caring), they suggest that, between both learnings, there may be others that function as mediators (in this case, at least, Understanding-Respecting; and Understanding-Respecting-Valuing-Enjoying) and feature prominently in the sequence. In sum, none of the values contradict the order proposed in the HLS, but rather confirm it; in addition, they suggest other direct sequential orders between pairs of the HLS verbs that, without contradicting the initial order proposed in the theoretical models, point to secondary learning itineraries that should be explored (see Fig. 3).
The influence across most of the dimensions of the sequence is moderate (β between 0.3 and 0.7), with the exceptions of Knowing ⇒ Valuing (β = 0.187) and Caring ⇒ Enjoying (β = 0.210). In the case of the dimension termed Knowing, we found a larger influence on the verbs occupying the first part of the HLS sequence, Knowing-Understanding-Respecting (that is, Knowing ⇒ Understanding with β = 0.698; Understanding ⇒ Respecting with β = 0.591; and Respecting ⇒ Valuing with β = 0.370). This confirms the sequential order articulated in the theoretical model [33], although it introduces direct relations with verbs in the second part of the sequence that had not been anticipated (e.g., Knowing ⇒ Caring with β = 0.327; Knowing ⇒ Enjoying with β = 0.370; and Understanding ⇒ Valuing with β = 0.331). This clearly indicates that knowledge can lead to valuing, caring for, and enjoying heritage in a straightforward way (without the mediation of understanding and respect). Such a direct influence can be conditioned in turn by the nature of that knowledge (i.e., conceptual, procedural, attitudinal, experiential) and by the impact it produces on the learning subjects, which will be greater the more experiential the knowledge is and lesser the more conceptual it becomes; we could regard this subsequence or learning itinerary (Knowing ⇒ Enjoying) as experiential or non-comprehensive knowledge. Regarding the second part of the HLS (Valuing-Caring-Enjoying-Transmitting), we find that the influence of Valuing ⇒ Enjoying (β = 0.367) is slightly higher than that of Valuing ⇒ Caring (β = 0.321), suggesting that we tend to enjoy and care for, in a very similar way, what was previously valued, either by others or by ourselves. If we look at the mutual relation between these two dimensions (Caring ⇒ Enjoying), it appears to be quite low (β = 0.210), which suggests either that this is a very closely coupled pair (we only take care of the heritage we enjoy, or we enjoy the heritage that has been taken care of) or that, in the theoretical model, the second part of the sequence could be altered so as to become Valuing-Enjoying-Caring-Transmitting.
In terms of statistical significance, all paths were found to be statistically significant at α ≤ 0.05, as the estimated bootstrap t-values exceed 1.96. None of the confidence intervals include the value 0.

Explanatory power assessment
Next, the R² (in-sample explanatory power) values of the endogenous constructs were examined. R² measures the variance that is explained in each of the endogenous constructs and is therefore a measure of the explanatory power of the model itself [84]. R² ranges from 0 to 1, with higher values indicating greater explanatory power. As a rule of thumb, R² values of 0.75, 0.50, and 0.25 can be considered substantial, moderate, and weak, respectively [16,85]. As can be seen in Table 5, three of the values (corresponding to Enjoying, Transmitting, and Valuing) greatly exceed the value of 0.5, while the other three (Understanding, Caring, Respecting) obtain lower magnitudes. In general, it can be stated that the explanatory power of the model is satisfactory.
The R² analysis was complemented with the calculation of the effect size f² of the predictor constructs, to check how the removal of a predictor construct affects the R² values. This metric allows us to analyze the relevance of the constructs in the explanation of the endogenous factors; in other words, the point is to determine the extent to which a predictor contributes to the R² value of any endogenous factor within the structural model. According to the literature [86], values of 0.02 are interpreted as small, values around 0.15 as medium, and values of 0.35 or higher as large. As can be seen in Table 5, of the 12 paths analyzed, five are small, four are medium-sized, and three are large, particularly Understanding ⇒ Respecting (f² = 0.558) and Knowing ⇒ Understanding (f² = 0.917).
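The f² computation reduces to one line; a sketch in which the R² values are hypothetical, chosen only to illustrate the arithmetic:

```python
def f_squared(r2_included, r2_excluded):
    """Cohen's f2: change in R2 when a predictor is removed,
    scaled by the unexplained variance of the full model."""
    return (r2_included - r2_excluded) / (1 - r2_included)

# Hypothetical example: dropping the predictor lowers R2 from 0.50 to 0.40
effect = f_squared(0.5, 0.4)  # 0.10 / 0.50 = 0.20, a small-to-medium effect
```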

Predictive power assessment
The predictive power of the model was evaluated using the predict_pls function of the SEMinR package.
Predictions were generated with k = 10 folds and 10 rounds. The technique used was predict_DA (Direct Antecedents). First, the prediction error distributions were evaluated to determine the most appropriate metric to assess the predictive power. As an example, the results for the endogenous variable Transmitting show that the distributions are in all cases approximately symmetric (skewness ranges between 0.26 and 0.77 and kurtosis between −0.03 and 0.89).
Secondly, we applied the Q² blindfolding algorithm [87,88] to assess the predictive relevance of the model. Q² values must be greater than zero to indicate predictive relevance; Q² ranges from −∞ to 1, with values closer to 1 indicating better predictive relevance. As a general rule, Q² values greater than 0.00, 0.25 and 0.50 represent small, medium, and substantial predictive relevance of the PLS path model, respectively [77,89,90].
As can be seen in Table 7, the Q² values of the six endogenous constructs are considerably higher than zero. More specifically, Enjoying has the highest Q² value (0.393), followed by Transmitting (0.329), Valuing (0.314), Understanding (0.255), and finally Caring (0.189) and Respecting (0.162). These results support the predictive relevance of the six endogenous latent variables used in the model.
The effect size values q² were calculated manually using the formula q² = (Q²_incl − Q²_excl) / (1 − Q²_incl). For each endogenous variable, two estimations of the model were made: the first one without the antecedent latent variable (Q²_excl) and the second one with the antecedent latent variable (Q²_incl). The results can be seen in Table 7. The q² effect size was large for Knowing ⇒ Understanding (q² = 0.342); and medium for Knowing ⇒ Enjoying (q² = 0.295), Understanding ⇒ Respecting (q² = 0.1936), Caring ⇒ Enjoying (q² = 0.157), and Caring ⇒ Transmitting (q² = 0.125). The rest of the q² values were small [86].
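The same arithmetic in code form, mirroring f² (illustrative; the Q² values in the example are hypothetical, not those of Table 7):

```python
def q_squared_effect(q2_included, q2_excluded):
    """q2 effect size for predictive relevance: relative change in Q2
    when the antecedent latent variable is omitted (analogous to f2 for R2)."""
    return (q2_included - q2_excluded) / (1 - q2_included)

# Hypothetical example: omitting the antecedent drops Q2 from 0.50 to 0.25
effect_q = q_squared_effect(0.5, 0.25)  # 0.25 / 0.50 = 0.50, a large effect
```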

Overall model fit assessment
Regarding the fit indices, the following values were obtained: SRMR = 0.067; d_ULS = 5.491; d_G = 0.792 and NFI = 0.828. These values can be described as modest. However, it should be noted that, unlike covariance-based SEM, PLS-SEM does not rely on model fit to assess model quality; instead, it uses a combination of predictive performance measures.
The reason why fit indices are not essential in PLS-SEM is that the technique is designed primarily for predictive rather than confirmatory purposes. PLS-SEM is used to identify the most important relationships between the component variables and to estimate the strength and direction of these relationships, as shown in the previous paragraphs.
In summary, following the results on collinearity, the relevance and significance of the path coefficients β, the explanatory power R², the predictive power coefficient Q², the effect sizes f² and q², and the overall fit, the proposed structural model has proven capable of providing sufficient guarantees to explain the relations among the latent variables of the model.

Comparison of models
The original model was compared with a simpler model (see Fig. 4), which reproduces the relations between the latent variables specified in Fontal (2003) [33].
Each of the two models was estimated separately. The results for the BIC values (Model 1: BIC = −1197.197; Model 2: BIC = −831.844) suggest that Model 1 is superior to Model 2 in terms of fit. This conclusion is corroborated by the relative probabilities of the models. The BIC-based Akaike weights resulted in 0.993 for Model 1 and 0.001 for Model 2. Model 1 has a very strong weighting, so we can conclude that it is superior to Model 2.
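The BIC-based weighting used in this comparison can be reproduced with a short computation; the two BIC values below are hypothetical, not those reported for Models 1 and 2:

```python
import math

def bic_weights(bics):
    """Akaike-style weights from BIC values: lower BIC -> higher relative probability."""
    best = min(bics)
    raw = [math.exp(-0.5 * (b - best)) for b in bics]
    total = sum(raw)
    return [r / total for r in raw]

# Hypothetical BIC values for two competing models:
w1, w2 = bic_weights([-105.2, -101.8])
```

The weights sum to one and can be read as the relative probability of each model being the best of the candidate set, given the data.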

Robustness checks: IPMA
Finally, we supplemented the analyses described in previous paragraphs with the IPMA (Importance-Performance Map Analysis), following the recommendations of Hair et al. (2018) [91].
We first checked whether the IPMA requirements were met. The items have a 4-point Likert format, and all have positive valence (i.e., a higher score means a higher trait level). Additionally, the polychoric correlations between the items were positive in all cases, ranging from 0.308 to 0.793. The signs of the outer weights were all positive, ranging between 0.143 (car58) and 0.250 (res32). Therefore, we retained all indicators for the analysis.
As can be seen in Figure 1 (SM), Caring (TE = 0.547) is especially important for the prediction of Transmitting, as are, to a lesser extent, Knowing (TE = 0.484) and Enjoying (TE = 0.389). Valuing (TE = 0.314), Understanding (TE = 0.176) and Respecting (TE = 0.121) have a much lower relative importance. However, it should be noted that Caring is the latent variable with the lowest performance level (the performance score for this latent variable is 23.909).
Extension of IPMA to the indicator level

We extended the IPMA analysis to the indicator level in order to identify specific areas relevant for improvement.
In this way, the relative importance of the indicators in a specific measurement model can be traced. This importance equals the total effects of the indicators on the target construct. The indicators that obtained the highest importance scores were car59, "I share news on the web about heritage conservation that can help others to learn about ways of caring" (TE = 0.104); car64, "I feel the need to protect the heritage of my environment through digital environments" (TE = 0.124); and car63, "I use digital environments so that heritage in my environment is not lost or forgotten" (TE = 0.125).
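The performance side of the indicator-level IPMA can be illustrated with a minimal sketch. The rescaling of mean scores to a 0-100 metric follows the standard IPMA convention; the response data below are invented:

```python
from statistics import mean

def ipma_performance(scores, scale_min, scale_max):
    """Mean indicator score rescaled to the 0-100 performance metric used in IPMA."""
    return (mean(scores) - scale_min) / (scale_max - scale_min) * 100.0

# Hypothetical responses to one 4-point Likert indicator (1 = lowest, 4 = highest):
perf = ipma_performance([1, 2, 2, 3, 4, 2, 3], scale_min=1, scale_max=4)
```

Plotting each indicator's performance against its total effect on the target construct yields the importance-performance map: high-importance, low-performance indicators are the priority areas for improvement.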

Table 7 Construct cross-validated redundancy and communality, and q² effect size

Fig. 4 Model comparison
The indicators with the highest performance scores were res33, "People can express different opinions about heritage in digital environments" (P = 66.009); res36, "I respect all heritage assets even if their origin and meaning do not match my ideology" (P = 76.166); and res30, "I have a respectful attitude towards the diversity of personal heritages" (P = 76.846) (see Figure 1 SM).

Invariance assessment and multigroup analysis
A final analysis was performed to test whether the effects of the proposed model differed significantly as a function of the gender of the participants (Women, N = 976; Men, N = 410). The analysis was carried out using the MICOM procedure for measurement invariance of composite models [92], which is based on the permutation method [93]. This method randomly exchanges observations between groups and re-estimates the model after each permutation. Calculating the differences between the path coefficients of each group in each permutation (N = 1,000) allows us to check whether such differences also occur in the population. The procedure was followed by a multigroup analysis. As shown in the results (Tables 8 and 9 SM), configural invariance has been established, since the same indicators have been used for each measurement model, and the data treatment, algorithm settings, and optimization criteria have been identical.
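The permutation logic underlying this procedure can be sketched in a few lines. In the actual MICOM/MGA analysis the statistic re-estimated after each permutation is a path coefficient of the PLS model; here we use the group mean purely for illustration, with invented data:

```python
import random
from statistics import mean

def permutation_pvalue(group_a, group_b, stat=mean, n_perm=1000, seed=1):
    """Two-sided permutation test for a group difference in a statistic.

    Observations are randomly reassigned to the two groups n_perm times,
    and the p-value is the share of permutations whose absolute difference
    is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(stat(group_a) - stat(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(stat(pooled[:n_a]) - stat(pooled[n_a:])) >= observed:
            extreme += 1
    return extreme / n_perm

# Hypothetical composite scores for two groups:
p = permutation_pvalue([2.1, 2.3, 2.2, 2.4], [2.2, 2.3, 2.1, 2.5])
```

A large p-value indicates that the observed between-group difference is compatible with random assignment, i.e., that the effect does not differ significantly across groups.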
The results for compositional invariance (i.e., the hypothesis that the composites are formed in the same way in both groups) indicated that there were no significant differences between the groups. Therefore, the results supported partial measurement invariance.
Once partial measurement invariance has been established and, in addition, the composite scores have equal mean values and variances in all groups, we can claim that full measurement invariance has been confirmed.
These results have been corroborated by consistent bootstrap multigroup analysis. As shown in Table 8 (SM), the apparent differences in the paths between men and women were not statistically significant according to the results of the Permutation Test [93], the PLS-MGA [85], the Parametric Test [94] and the Welch-Satterthwaite t-test [91].
The same analysis was carried out for the rest of the sociodemographic variables (country, number of countries visited, area of residence, mother tongue and educational level).In all cases, the results were similar to those found in the first analysis.

Support of the original hypotheses
The 13 hypotheses of the study, which deploy the Heritage Learning Sequence (HLS) and describe the interactions that make up the Heritage Process Model (HPM), are confirmed: the values obtained in the different reflective measurement models (indicator reliability, internal-consistency reliability, evidence of convergent validity, evidence of discriminant validity, and loadings and cross-loadings) corroborate the positive influence of the verbs that make up the sequence and confirm that the measurement of the seven constructs presents sufficient evidence of reliability and validity.
On the other hand, systematic evaluation of the structural model confirms that the 12 influence relations described between the verbs (dimensions) have adequate values for collinearity (VIF), relevance (β) and statistical significance (t) within the structural model (HPM). Furthermore, the HPM exhibits adequate explanatory power (R²), predictive power (Q²), and overall fit (SRMR, d_ULS, d_G, χ², NFI). These analyses provide sufficient guarantees to validate the influence across the main verbs (latent variables) defining heritage learning, as well as its sequenced order (i.e., the HLS): a set of relations together with the hierarchy that underpins them. Our findings have enabled us to fulfill the first goal of the present study: (a) to understand the heritage learning process and to identify all the verbs involved, as well as the influence relations between them.
Nevertheless, the statistical significance values, particularly those obtained using the bootstrap method, support the existence of other direct relationships among the verbs in the sequence that match specific theoretical approaches (e.g., Knowing ⇒ Enjoying evidences the presence and relevance of experiential heritage learning with regard to conceptual learning). This finding points to the importance of identifying effective heritage learning subsequences that can be measured in studies focusing specifically on each of them and/or on the comparison between several of these subsequences. This outcome addresses the third research goal: (c) to identify possible submodels or subtypes within this organizational structure that explain the various heritage learning itineraries, adjusted to groups, geopolitical contexts, or educational environments.
Regarding the proposed structural model (HPM), the comparison with alternative, more complex models, as well as the subsequent robustness checks, make it possible to go beyond the basic theoretical framework advocated by the standard theoretical studies since 2003. The results show that the basic theoretical model (Model 1, see Fig. 4) is sufficiently reliable insofar as its values support the predictive relevance of its six endogenous latent variables. This model should be completed on the basis of the results obtained from the proposed structural model (HPM) because (a) the latter shows a better fit and (b) it is more comprehensive, since it has the capacity to respond to a greater variability of heritage teaching-learning situations (an aspect that should be the subject of specific theoretical reflection and empirical analysis in further exploratory studies). This part of our research has made it possible to meet the second objective of the study: (b) to demonstrate that a theoretical model accepted by the scientific and educational community (HLS) is also validated through a structural model and by using different measurement models.
In short, the results presented here seem fundamental in determining the basic framework for the design of tools aimed at evaluating heritage learning outcomes. The combination of the structural model and the different measurement models produces an Overall Model (i.e., the Heritage Process Model, HPM) that provides (a) the basic structure for heritage learning (i.e., the Heritage Learning Sequence, HLS), (b) a sequential order, endorsed by empirical data, for the verbs making up that sequence and their influence relations, which measure heritage learning, and (c) distinct measurement models for every dimension of these learnings.

Implications
The results of this study, in addition to enabling the measurement of learning outcomes, outline an avenue of knowledge transfer and therefore have a direct application in the design of comprehensive, effective and efficient heritage learning assessment tools. This study leads to the generation of a model (i.e., the Heritage Process Model, HPM) that can be operationalized in all heritage teaching-learning processes, regardless of the context or the people to whom the heritage education programs are addressed, since it is based on a basic framework for heritage learning (i.e., the Heritage Learning Sequence, HLS). Undoubtedly, having a theoretical model that is adjusted to a structural model (HPM) which, in turn, has been tested on several measurement models (Q-Herilearn) will make it possible to design and implement heritage education proposals around a regular structure (HPM) and to provide measurement instruments that are accurate and whose output data are valid and reliable.

Limitations
This study has some limitations. First, an incidental (non-probabilistic) sample was used. It should be emphasized that the non-probabilistic nature of the sample may affect the external validity of the results, even though Monte Carlo analysis demonstrated that the chosen sample size (N) ensures suitable statistical power. In this regard, the primary limitations of the study include: (a) restricted generalizability, due to the potential lack of representation of population traits, diversity, or demographics; (b) selection bias, since using an online survey as the data collection tool may lead to overrepresentation of specific population segments within the sample; and (c) limited variability, as the observed lack of diversity within the sample may constrain response ranges and compromise the generalizability of the findings. Given these constraints, it is advisable for future research to adopt probability sampling techniques, incorporating random selection processes to ensure a representative sample.
Secondly, the use of PLS-SEM instead of CB-SEM could imply some kind of constraint (e.g., less robust methods to evaluate model fit, nonparametric approximation for parameter estimation, a possible lower efficiency of parameter estimates, or less rigorous statistical inferences when estimating model parameters and performing significance tests).We have tried to respond to these limitations in the section dedicated to the justification of our analytical strategy (see Sect. 2.4.2).In any case, it would be interesting to compare the results of the PLS-SEM analysis with those obtained by means of the CB-SEM analysis.
Thirdly, we did not assess whether participants had prior training in heritage education.Since this variable could influence the results, it would be advisable to consider it in future research on this subject.

Relevance of the results and future research avenues
It has been possible to generate an explanatory model of learning processes in heritage education that presents three differential features with regard to other studies: (1) it stems from a consolidated theoretical model; (2) it is comprehensive, in that it does not focus on partial dimensions but encompasses the entire cycle of heritage learning; and (3) it possesses structuring power, allowing for broad applicability and tailored adaptation to diverse educational frameworks.
In addition, unlike other instruments currently available, it exhibits sufficiently robust guarantees of rigor in the calibration phase, which provide sufficient metric evidence of validity and reliability.All of this, in turn, provides a structure capable of ordering the practical sequences in heritage education programs and of generating specific instruments to evaluate the learning outcomes of heritage-related teaching in each of these programs.
We would like to highlight several key points concerning the relevance of the results: (a) the development of an explanatory model represents a significant advancement in heritage education research by providing a robust foundation for understanding learning processes within this domain; (b) the model is comprehensive in nature, covering the entire heritage learning cycle rather than focusing on isolated dimensions, and this holistic perspective enables a more nuanced understanding of how heritage education unfolds in different contexts; (c) the emphasis on structuring power facilitates the generalizability and adaptability of the model to different educational contexts, which underscores its potential for wider application to heritage education programs and initiatives; (d) the study addresses a gap in the field, as no measurement instruments with sufficient metric guarantees had been developed to date; and (e) the findings have practical applications, since the results can serve as a guide for the design and implementation of heritage education programs.
Future lines of research derived from the results of the present study are geared towards: (2) the design and evaluation of programs based on a hierarchical order and, therefore, on the prioritization of the main verbs featured in the design (what to teach-learn in the field of heritage and in which order); (3) the construction of essential indicators focused on heritage learnings (how to teach-learn heritage contents following a structuring order) that are supported by the measurement models analyzed in this study; (4) the exploration of different submodels of heritage learning by means of specific assessment instruments derived from the proposed structural model; and (5) the theoretical exploration of relationships between concepts that shape the idea of heritage, following the structural and measurement models generated in this study.

(3) Respecting: the result of a behavior or set of positive behaviors towards the physical and conceptual integrity of heritage. (4) Valuing: to identify and project positive qualities onto cultural heritage. (5) Caring: to conserve and preserve cultural heritage, both physically and conceptually. (6) Enjoying: to appreciate the values of cultural heritage as part of the individual's experience. (7) Transmitting: to bequeath our cultural heritage, together with its values, to other persons and/or generations.

Table 6 Path coefficients and total effects in the structural model. O = Original sample; M = Sample mean; SD = Standard deviation; t = t statistic (|O / SD|); CI = Confidence interval; Q² = predictive precision of the structural model for a specific endogenous construct

(1) The measurement of the model's specific capability for adequacy at different scales of measurement, focused on (a) the several dimensions of the model on an individual level (i.e., Knowing, Understanding, Respecting...) and (b) the adaptation of the base structure to specific educational designs and its potential to generate new items.
Table 3 contains the ordinal Cronbach's α values, the Composite Reliability and the Average Variance Extracted. Composite Reliability (ρ_C) ranged from 0.871 (Respecting) to 0.917 (Enjoying). Therefore, no problematic values were obtained that would indicate item redundancy negatively affecting the evidence of construct validity [79]. Cronbach's ordinal α values ranged from 0.828 to 0.894. Column (ρ_A) represents the approximately exact measure of Composite Reliability proposed by Dijkstra & Henseler (2015).
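The Composite Reliability values reported in Table 3 follow the usual formula based on standardized indicator loadings; a minimal sketch with invented loadings:

```python
def composite_reliability(loadings):
    """Composite Reliability: rho_C = (sum lam)^2 / ((sum lam)^2 + sum(1 - lam^2))."""
    squared_sum = sum(loadings) ** 2
    error_variance = sum(1.0 - l ** 2 for l in loadings)
    return squared_sum / (squared_sum + error_variance)

# Hypothetical standardized loadings for one reflective construct:
rho_c = composite_reliability([0.78, 0.81, 0.74, 0.80])
```

Values of ρ_C between roughly 0.70 and 0.95 are conventionally taken as satisfactory; values above 0.95 would suggest the item redundancy mentioned above.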

Table 2
Indicator loadings

Table 5 Collinearity analysis, effect size f² and R² values. VIF = Variance Inflation Factor; End. Var. = Endogenous Variable