Prediction model of the hardness of waterlogged archaeological wood based on NIR spectroscopy

The significance of waterlogged archaeological wood (WAW) lies in its informational value, which encompasses historical, cultural, artistic, and scientific aspects of human civilization; it therefore needs to be properly studied and preserved. In this study, near-infrared (NIR) spectroscopy is employed as a predictive tool for assessing the hardness of WAW. Because of submerged burial conditions, waterlogged wooden heritage frequently undergoes substantial degradation of its physical and mechanical properties. These mechanical properties are essential for evaluating the state of preservation and devising appropriate conservation and restoration strategies. However, conventional methods for testing mechanical properties are limited by the availability of samples of adequate size and quantity, adherence to the “principle of minimum intervention,” and cost. NIR spectroscopy is a non-destructive, rapid, sensitive, and low-cost analytical technique with great potential for application in this area. In this study, two large and significant ancient Chinese shipwrecks were investigated. One hundred ninety-seven samples were collected and analyzed using NIR spectroscopy and a portable C-type Shore hardness tester. A partial least squares (PLS) regression model was developed to predict the hardness of the WAW. The model was optimized and validated using different preprocessing methods and spectral ranges. The best models were obtained with first derivative + multiplicative scatter correction (MSC) and first derivative + standard normal variate (SNV) preprocessing in the 1000–2100 nm spectral range, both with an R²c of 0.97, a root mean squared error of calibration (RMSEC) of 2.39 and 2.40, and a standard error of calibration (SEC) of 2.40 and 2.41, respectively. Furthermore, they exhibited an R²v of 0.89 and 0.87, a root mean squared error of cross-validation (RMSECV) of 4.43 and 4.67, a standard error of cross-validation (SECV) of 4.45 and 4.68, and RPD values of 3.02 and 2.88, respectively. When the established model was applied to the samples of the prediction set, a coefficient of determination of prediction (R²p) of 0.89 and a relative standard deviation for prediction (RSD) of 6.9% (below 10%) were obtained. These results demonstrate that NIR spectroscopy could enable a rapid, non-destructive, and accurate estimation of the hardness of WAW. Moreover, by carefully choosing appropriate preprocessing techniques and spectral ranges, the predictive capability and accuracy of the model can be further enhanced. This research also contributes to the development of a theoretical framework and a methodological approach for future studies in this field. Furthermore, the data obtained from this study are crucial for determining effective preservation strategies for waterlogged archaeological wood.


Introduction
Wood, a natural polymeric organic material, plays a significant role in human activities and the development of civilization because it is environmentally friendly, resource-rich, and biocompatible [1,2]. Furthermore, wood was the main construction material of ancient ships, which are of great conservation and research importance as part of the historical and cultural heritage of mankind. However, the cell morphology and chemical structure of wood can be affected by biological hazards and natural degradation during long-term water burial [3], leading to the decomposition of the main components of wood cells such as hemicelluloses, cellulose, and lignin [4,5]. Degradation increases the porosity and hygroscopicity of the wood tissue, decreases its density, and ultimately causes a significant reduction in its mechanical properties [6,7]. Because the mechanical properties directly determine the service life of wood, they are an important index and essential data for evaluating the state of preservation of wooden heritage and formulating protection and restoration schemes [8]. However, because WAW is scarce, destructive testing should be avoided when evaluating its mechanical properties [9].
Conventional mechanical testing of wood is typically performed using a universal mechanical testing machine; however, the large sample sizes and destructive tests required by national standards consume large amounts of archaeological wood [10]. To reduce the consumption of relics during the evaluation of mechanical properties, researchers have experimented with various micro-destructive methods for mechanical testing, such as nanoindentation (NI) [1] and static thermomechanical analysis (TMA) [11]. Although these methods reduce the number of cultural relic samples consumed, some material is still lost in the testing and processing stages. Additionally, samples prepared according to specific test specifications are often difficult to reuse, and preventing damage to the integrity of wooden heritage proves challenging. To maximize the value of these mechanical property data, researchers have attempted to combine mechanical property results with nondestructive assessment methods to construct mechanical property prediction models. This approach can enhance the speed and efficiency of mechanical property predictions in future studies while minimizing the amount of wood relic material consumed.
Near-infrared (NIR) spectroscopy is a rapid and nondestructive technique for evaluating the properties of organic materials. It relies on the characteristic absorption of hydrogen-containing groups such as CH, OH, and NH in the near-infrared region [12,13]. Since it was first proposed in the 1960s [14], NIR technology has been widely applied in various fields, including food, medicine, tobacco, petrochemicals, and agriculture [15,16]. At present, this technology is widely used in the performance evaluation of sound wood [17,18]. It can quickly and accurately predict the chemical composition [19,20], wood species [21], density [22], moisture content [23][24][25], microfibril angle [26], and mechanical properties [27][28][29][30] of wood. However, different wood species and analytical methods may lead to different prediction results [31]. In addition, the overlap of several spectral ranges can make it difficult to identify differences [32], and the roughness and moisture of the sample surface can affect the reflection and absorption of the NIR spectra [33]. To address these issues, researchers have used multiplicative scatter correction (MSC) and standard normal variate (SNV) to reduce the effects of particle size, surface scattering, and optical path variations in NIR spectra [34,35]. Furthermore, derivative preprocessing, such as first and second derivatives, removes baseline drift and background interference and improves the resolution of the raw spectra [36]. Although preprocessing can effectively reduce noise and interference in the spectrum and thereby improve the accuracy of the prediction model [37,38], only a few studies on NIR prediction models for WAW exist, mainly focusing on wood aging [39][40][41][42], fungal degradation [43][44][45] and heat treatment [46,47]. Using NIR combined with chemometrics, Chen et al. successfully developed an orthogonal partial least squares discriminant analysis (OPLS-DA) model for archaeological hardwoods and softwoods and a predictive model for their degrees of degradation [32]. Yonenobu et al. found that a 1300-year-old archaeological wood contained less hemicellulose and cellulose and more lignin than sound wood [48]. Pecoraro et al. used NIR spectroscopy to analyze decayed wood that had been stored under waterlogged conditions for long periods of time [49]. These studies confirmed that NIR spectroscopy is reliable for detecting the relative chemical composition of decayed wood. However, because archaeological wood is difficult to obtain and the data analysis is demanding, this technique has yet to be widely applied to predict the mechanical properties of wooden heritage [50].
The present study employed NIR spectroscopy and chemometric methods to develop a hardness prediction model for WAW, using one hundred ninety-seven samples from two significant ancient Chinese shipwrecks. The effects of preprocessing methods and band selection on the NIR prediction model were also examined.

Wood samples
In this study, 30 wooden shell platings from the Chinese cruiser Chih Yuen shipwreck of the Qing Dynasty (1885-1894 CE) [51] and the Nanhai No.1 shipwreck of the Southern Song Dynasty (1127-1279 CE) [52,53] were selected as research subjects. All samples were identified as Pinus spp. according to anatomical microscopic features [32,52]. The shell platings were retrieved from the "Nanhai No. 1 Shipwreck Site" (located near Zhanjiang City, Guangdong Province, China) and the "Chih Yuen Shipwreck Site" (located near Dandong City, Liaoning Province, China) (Table 1), and were irregular blocks with relatively flat surfaces [53]. The samples were encapsulated in plastic bags in a waterlogged state and vacuum-sealed. Finally, they were placed in a refrigerator freezer at 10 ℃.

Maximum water content and basic density
Maximum water content (MWC) and basic density (BD) were selected as two representative physical properties for the degree of degradation of waterlogged archaeological wood (WAW). There are five classes of the state of preservation (in the following text, "class" is equivalent to state of preservation): "less than 135% for the class 0-MWC; 135-225% for the class 1-MWC; 225-350% for the class 2-MWC; 350-500% for the class 3-MWC; more than 500% for the class 4-MWC" [54]. Shell platings Nos. 10 to 30 were tested for MWC and BD as follows: four samples were taken from each shell plating for the MWC and BD analysis, and their average value was used to characterise the degree of degradation of each shell plating. They were tested in the same way as shell platings Nos. 1 to 9. The calculation methods are based on Eqs. (1) [55,56] and (2) [57]:
$$\mathrm{MWC} = \frac{m_{\max} - m_0}{m_0} \times 100\% \qquad (1)$$
$$\mathrm{BD} = \frac{m_0}{V_{\max}} \qquad (2)$$
where $m_0$ is the constant weight of the sample at 102 ± 3 ℃ (g), $V_{\max}$ is the volume of the waterlogged sample measured by the water displacement method (cm³), and $m_{\max}$ is the mass of the waterlogged sample (g).
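The calculation and grading described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: it assumes the standard definitions MWC = (m_max - m_0)/m_0 × 100% and BD = m_0/V_max, the function names and example masses are hypothetical, and the handling of values that fall exactly on a class boundary is an assumption because the quoted ranges share their end points.

```python
def max_water_content(m_max: float, m_0: float) -> float:
    """MWC in percent: mass of water relative to the oven-dry mass."""
    return (m_max - m_0) / m_0 * 100.0


def basic_density(m_0: float, v_max: float) -> float:
    """BD in g/cm^3: oven-dry mass divided by waterlogged volume."""
    return m_0 / v_max


def preservation_class(mwc: float) -> int:
    """Map MWC (%) to class 0-4 using the thresholds quoted from [54].

    Assumption: a value exactly on a boundary goes to the higher class.
    """
    for cls, upper in enumerate((135.0, 225.0, 350.0, 500.0)):
        if mwc < upper:
            return cls
    return 4


# Hypothetical measurements for one sample (g, g, cm^3).
m_max, m_0, v_max = 12.0, 3.0, 11.0
mwc = max_water_content(m_max, m_0)
bd = basic_density(m_0, v_max)
print(f"MWC = {mwc:.0f}%, BD = {bd:.2f} g/cm^3, class {preservation_class(mwc)}")
```

In practice the four replicate values per shell plating would be averaged before the class is assigned, as described in the text.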

Hardness test
The Shore hardness tester is commonly used to determine the hardness of rubber and plastic materials. In this study, the hardness of WAW was lower than that of sound wood, and its surface felt similar to sponge or rubber. Therefore, a C-type Shore hardness tester (Mitutoyo, Japan) was used to determine the surface hardness of the samples, in line with the scope of application of this type of instrument. This tester has a hemispherical indenter with a diameter of 5 mm and works by measuring the depth to which the indenter is pressed into the WAW, which is converted into a hardness unit (HC). The force applied should be just enough to bring the anvil and sample into complete contact, and the reading must be taken within 1 s after the anvil and sample have been fully pressed together. To follow "the principle of minimal intervention" in heritage conservation, the samples were not cut, so that they could also be used for other studies.
The tests were performed by fixing the waterlogged shell plating sample on the experimental bench and placing a flexible cotton grid on its surface (each grid cell was a 2 × 2 cm square) to define the testing area. Five to eight hardness measurements were taken within each grid cell and averaged, resulting in a total of 197 grid cells on radial sections of 30 shell platings. Care was taken to avoid areas with obvious signs of decay, such as discoloration and the presence of sulfur-iron compounds. The samples remained waterlogged during the test.

NIR spectra acquisition
NIR spectral data were collected on the same grid cell surface area used for the hardness test with a portable NIR spectrometer (TerraSpec4 Hi-Res, Malvern PANalytical, UK) with a spectral range of 350-2500 nm and a scan rate of 100 ms. The device has a resolution of 3 nm at 700 nm and 6 nm at 1400/2100 nm. During spectral acquisition, the ambient temperature was maintained at 25 ± 2 °C and the average relative humidity was around 20%. During the test, a wet towel was used to wipe away any running water from the sample surface, and the NIR spectra were then taken immediately after the hardness test.
The NIR spectra acquisition time within each grid cell was limited to 2 min to ensure that the samples remained waterlogged throughout the acquisition process. Ten NIR spectra collected at different locations within the same grid cell were averaged into one spectrum.

Data analysis
Principal component analysis (PCA) and orthogonal partial least squares discriminant analysis (OPLS-DA) were employed for qualitative analysis, while PLS was used for quantitative analysis. The software Origin2021 (Origin Lab, USA) was utilized for principal component analysis, spectrum drawing, and significant difference analysis. The discriminant models were established using SIMCA-14.1 software (Umetrics, Umeå, Sweden). The collected near-infrared spectral data were preprocessed and quantitative models were established using the chemometrics software The Unscrambler X 10.4 (CAMO, Norway).

PCA and OPLS-DA
The purpose of performing PCA on the spectral data is to visualize samples with varying degrees of degradation in multiple dimensions, and to better understand them through an OPLS-DA discriminant model. This allows the correlation between hardness and the degree of degradation to be explored, providing a basis for establishing a quantitative prediction model. The quality of the PCA model is evaluated using R²X and Q². R²X reflects the explained variability of X and the degree of optimization of the model. Q² represents the cumulative contribution rate, describing the cumulative predictive ability of the model and the accuracy of the predictions. The closer R²X and Q² are to 1, the better the model. The goodness of fit and reliability of the OPLS-DA model were evaluated using R²Xcum and R²Ycum, while the predictive ability was evaluated using Q²cum. To check for potential overfitting, a permutation test was performed 200 times. The outcomes are represented by the R² and Q² intercept values on the Y-axis. For a valid model, the intercept of R² should be below 0.30, and that of Q² should be below 0.05.

PLS regression analysis
In this study, the model was optimised using a selection of preprocessing methods, namely first derivative, second derivative, multiplicative scatter correction and standard normal variate, combined with different spectral ranges. The 197 WAW NIR spectra and hardness values were randomly divided into a calibration set and a prediction set in a ratio of 4:1. The calibration set data were used to build a NIR prediction model for the hardness of the WAW, and the calibration model was validated using full cross validation. Finally, the model was externally tested using the prediction set data. The data were imported into Unscrambler X 10.4 (CAMO, Norway) for preprocessing and modeling. Based on the results of the calibration set, validation set and prediction set models, the effects of different preprocessing methods and spectral ranges on model quality were analyzed. Partial least squares (PLS) regression was used for modeling, as it can handle the large amount of information in NIR spectra, eliminate the influence of external noise to a certain extent, improve data accuracy, and correlate the independent and dependent variable matrices to obtain the best model [56]. Evaluating whether the predictive performance and accuracy of the PLS-based NIR prediction model reach the standard required for practical application calls for suitable evaluation indexes.
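The modeling workflow described above (random 4:1 split, PLS calibration, full cross validation, external test) can be illustrated with a small pure-Python sketch. It is not the Unscrambler implementation used in the study: the NIPALS variant of PLS1 shown here, the interpretation of "full cross validation" as leave-one-out, the function names, and the synthetic data are all assumptions made for illustration.

```python
import math
import random


def pls1_fit(X, y, n_factors):
    """Fit a single-response PLS model with the NIPALS algorithm."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # centered spectra
    f = [v - y_mean for v in y]                                # centered response
    factors = []
    for _ in range(n_factors):
        # Weight vector: direction in X most covariant with the residual y.
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:  # nothing left to explain
            break
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        load = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(f[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next factor.
        E = [[E[i][j] - t[i] * load[j] for j in range(p)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        factors.append((w, load, q))
    return x_mean, y_mean, factors


def pls1_predict(model, x):
    """Predict one response by replaying the deflation steps on a new sample."""
    x_mean, y_mean, factors = model
    e = [x[j] - x_mean[j] for j in range(len(x))]
    y_hat = y_mean
    for w, load, q in factors:
        t = sum(e[j] * w[j] for j in range(len(e)))
        y_hat += q * t
        e = [e[j] - t * load[j] for j in range(len(e))]
    return y_hat


def full_cross_validation(X, y, n_factors):
    """Leave-one-out predictions: each sample is predicted by a model built without it."""
    preds = []
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_factors)
        preds.append(pls1_predict(model, X[i]))
    return preds


# Synthetic stand-in data (hypothetical): 25 "spectra" with 6 variables each.
random.seed(0)
n_samples, n_vars = 25, 6
true_coef = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5]
X = [[random.random() for _ in range(n_vars)] for _ in range(n_samples)]
y = [70 + sum(c * v for c, v in zip(true_coef, row)) + random.gauss(0, 0.05) for row in X]

# Random 4:1 split into calibration and prediction sets.
idx = list(range(n_samples))
random.shuffle(idx)
n_cal = round(0.8 * n_samples)
cal_idx, pred_idx = idx[:n_cal], idx[n_cal:]
X_cal, y_cal = [X[i] for i in cal_idx], [y[i] for i in cal_idx]

model = pls1_fit(X_cal, y_cal, n_factors=3)
cv_preds = full_cross_validation(X_cal, y_cal, n_factors=3)
ext_preds = [pls1_predict(model, X[i]) for i in pred_idx]
print(len(cal_idx), "calibration samples,", len(pred_idx), "prediction samples")
```

With real spectra the number of factors would be chosen from the cross-validation error rather than fixed in advance, and the prediction set would be touched only once, for the external test.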
In this paper, the coefficient of determination of calibration (R²c), coefficient of determination of cross-validation (R²v), coefficient of determination of prediction (R²p), root mean square error of calibration (RMSEC), root mean square error of cross validation (RMSECV), root mean square error of prediction (RMSEP), standard error of calibration (SEC), standard error of cross validation (SECV), standard error of prediction (SEP), ratio of performance to deviation for cross validation (RPD) and relative standard deviation for prediction (RSD) are used as the evaluation indexes of model performance. When the RPD value is greater than 2.5, the predictive performance of the model is satisfactory; if the RPD value reaches about 1.5, NIR can be used as a preliminary assessment tool [58,59]. When the RSD value is less than 10%, the established model can be used for actual estimation [60].
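The evaluation indexes listed above can be written out as a short Python sketch. The source does not give the formulas, so the definitions below are assumptions based on common chemometric usage: the standard error is taken as the bias-corrected residual standard deviation, RPD as the standard deviation of the reference values divided by SECV, and RSD as RMSEP divided by the mean reference value. The RPD definition at least reproduces the value reported for the raw-spectra model (standard deviation 13.5, SECV 7.41).

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error (RMSEC, RMSECV or RMSEP depending on the data set)."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)


def standard_error(y_true, y_pred):
    """Bias-corrected standard error (SEC, SECV or SEP); assumed definition."""
    n = len(y_true)
    bias = sum(p - t for t, p in zip(y_true, y_pred)) / n
    return math.sqrt(sum((p - t - bias) ** 2 for t, p in zip(y_true, y_pred)) / (n - 1))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rpd(sd_reference, secv):
    """Ratio of performance to deviation; assumed as SD of reference values / SECV."""
    return sd_reference / secv


def rsd_percent(rmsep, mean_reference):
    """Relative standard deviation for prediction in percent; assumed as RMSEP / mean."""
    return 100.0 * rmsep / mean_reference


# Values reported in the text for the raw-spectra model.
print(round(rpd(13.5, 7.41), 2))  # → 1.82
```

Under these definitions SEC and RMSEC differ only by the bias term and the n - 1 divisor, which is consistent with the nearly identical values reported for the two indexes.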

Hardness of waterlogged archaeological wood (WAW) and its variation in different degradation classes
As shown in Table 2, the MWC values of samples Nos. 10 to 30 in this study ranged from 218 to 631%, and the BD values ranged from 0.15 to 0.36 g/cm³. Combining these with the data from the study by Chen et al. (samples Nos. 1 to 9) in Table 1 and classifying the state of preservation of samples Nos. 1 to 30 according to the method described in Sect. "Maximum water content and basic density" shows that the state of preservation of the samples ranges between classes 0 and 4. Furthermore, the variability between groups was analyzed for the hardness data of the different states of preservation. Figure 1 shows the graded box plots with normal distribution curves and the ANOVA results for the degrees of degradation of the WAW samples, with mean hardness values of 85.4 HC, 80.6 HC, 75.4 HC, 61.8 HC, and 53.1 HC for classes 0, 1, 2, 3 and 4, respectively. Among the five classes, the hardness data differed significantly between all classes except classes 0 and 1 (both are labeled with the letter A; significance level p < 0.05). The distribution of the hardness data for each class indicates that, except for Class 4, in which the data were more concentrated, the classes presented relatively scattered data points, suggesting a nonuniform degradation of the samples from Class 0 to Class 3. Overall, there was a correlation between the degree of degradation of the samples and their hardness; the hardness decreased as the degree of degradation increased. On the other hand, the sample data for each class span a wide range, i.e., there are high standard deviations (STDV), which can be attributed to several factors related to the natural variability of the wood (earlywood/latewood, sapwood/heartwood, etc.). Even with the influence of factors unrelated to the degree of degradation, the analysis of the hardness data showed that hardness is a reliable indicator for assessing the degradation of WAW.
Figure 2 presents the raw spectra obtained by averaging the spectra for each class, visually illustrating the differences between the five classes of samples in different spectral ranges. The results are similar to those of the variance analysis for hardness. The NIR spectra of classes 0 and 1, and those of classes 2 and 3, have similar peak shapes, characteristic peak positions, and peak heights. Although the hardness analysis showed four categories (0 + 1, 2, 3 and 4) and the raw spectra reflected only three categories (0 + 1, 2 + 3 and 4), these results still indicate that the NIR spectra of WAW with the same mechanical properties are similar [61]. On the other hand, NIR spectra are rich in information but also complex, which necessitates further analysis by chemometrics.
Table 3 shows the results of the hardness data, with a range of 44.1-92.6 HC for the calibration set and 45.7-91.4 HC for the prediction set. The standard deviations for the calibration and prediction sets were 13.5 HC and 13.3 HC, respectively. The data in both sets were relatively evenly distributed, and a wide range of hardness values was observed. This is beneficial for establishing a NIR prediction model that can accurately predict the hardness of a wide range of WAW samples.

Table 2 Results of MWC, BD and state of preservation of samples
The MWC and BD data for the samples marked with asterisks (*) are cited from the study by Chen et al., in which they were tested in the same way as in this study

Preliminary classification of degree of degradation of waterlogged archaeological wood (PCA and OPLS-DA)
To investigate the effectiveness of NIR spectroscopy in predicting the hardness of WAW, raw spectral data from samples with different degrees of degradation were analyzed using an unsupervised PCA model. The results showed model fit parameters of R²X = 0.996 and Q² = 0.995, with PC1, PC2, and PC3 explaining 89.3%, 6.2%, and 3.3% of the variance, respectively. The cumulative contribution reached 98.8%, covering essentially all the information in the samples. The PCA score plot (Fig. 3a) shows that the samples with different degrees of degradation were grouped into five clusters. However, there were overlaps between the clusters. Most Class 3 and 4 samples were located in different regions of the PCA space, and although there were some overlaps, they could still be distinguished from the samples of the other classes. Furthermore, a supervised OPLS-DA discriminant model was used to analyze the spectral data. The model had an R²Xcum of 0.59, an R²Ycum of 0.57 and a Q² of 0.52, indicating that the predictability of the OPLS-DA model is acceptable. The results (Fig. 3b) showed that Class 0, Class 1, and Class 2 samples still overlapped significantly. However, Class 3 and Class 4 samples could be distinguished from the samples of the other classes, corresponding to the results of the hardness variability analysis. In addition, the applicability of the OPLS-DA model was tested using 200 random permutations (Fig. 3c). The intercepts of R² and Q² were 0.0098 and -0.118, below the limits of 0.30 and 0.05, indicating that the model was not over-fitted and was reliable for classification [62]. These results confirm that NIR spectroscopy can effectively perform a preliminary discriminant analysis of WAW with different degrees of degradation. Furthermore, this analysis serves as a foundation for quantitatively assessing the mechanical properties of waterlogged archaeological wood using NIR spectroscopy.

Quantification
Among the 158 calibration set samples used to establish the prediction model of WAW hardness, the maximum hardness is 92.6 HC, the minimum is 44.1 HC, the average is 73.2 HC, and the standard deviation is 13.5 HC (Table 3). The effects of different spectral preprocessing methods on the quality of the models are discussed first for the full spectral range (350-2500 nm). Data processed with MSC, SNV, first derivative, second derivative, and combinations of two preprocessing methods were used to build predictive models. A control model was built from the raw spectra (Table 4). Table 4 shows that the model established from the raw NIR spectra reached an R²c and R²v of 0.74 and 0.70, an RMSEC and RMSECV of 6.83 and 7.39, and an SEC and SECV of 6.85 and 7.41. Meanwhile, the RPD value of 1.82 is larger than the comparison value of 1.5 given in previous publications [58,59], indicating that the model developed from raw NIR spectra can preliminarily assess the hardness of WAW.
When only one preprocessing method was used, applying either MSC or SNV did not lead to a significant improvement in model quality. In contrast, the performance and accuracy of the model improved significantly after derivative processing. The R²c and R²v after the first and second derivatives exceeded 0.90 and 0.80, respectively. The RPD values were greater than 2.5, indicating that the derivative-processed prediction models have satisfactory predictive performance [63]. The model achieved the highest quality when the first derivative was combined with either MSC or SNV as the preprocessing method. The R²c and R²v were as high as 0.95 and 0.88, and the RMSEC and SEC were reduced by approximately 50% compared to the raw spectra. The RMSECV and SECV were reduced by approximately 30% compared to the raw spectra, while the RPD values reached 2.87 and 2.78. Hence, preprocessing the NIR spectra with the first derivative combined with MSC or SNV significantly improved the prediction accuracy of the WAW hardness model.
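For readers unfamiliar with the preprocessing methods compared above, the following Python sketch shows one common formulation of each. It is illustrative only: the study used The Unscrambler X, whose exact settings are not reported, so the 5-point Savitzky-Golay window for the first derivative, the use of the mean spectrum as the MSC reference, and the order "derivative first, then scatter correction" are all assumptions.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def snv(spectrum):
    """Standard normal variate: center and scale each spectrum individually."""
    m = _mean(spectrum)
    sd = math.sqrt(sum((v - m) ** 2 for v in spectrum) / (len(spectrum) - 1))
    return [(v - m) / sd for v in spectrum]


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum (assumed reference)."""
    n_vars = len(spectra[0])
    ref = [_mean([s[j] for s in spectra]) for j in range(n_vars)]
    ref_mean = _mean(ref)
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = _mean(s)
        # Least-squares fit s = intercept + slope * ref, then undo both effects.
        slope = sum((r - ref_mean) * (v - s_mean) for r, v in zip(ref, s)) / sxx
        intercept = s_mean - slope * ref_mean
        corrected.append([(v - intercept) / slope for v in s])
    return corrected


def first_derivative(spectrum, step=1.0):
    """First derivative with a 5-point Savitzky-Golay filter (window size assumed).

    The two points at each end are dropped, so the output is 4 values shorter.
    """
    coeffs = (-2, -1, 0, 1, 2)
    out = []
    for i in range(2, len(spectrum) - 2):
        window = spectrum[i - 2:i + 3]
        out.append(sum(c * v for c, v in zip(coeffs, window)) / (10.0 * step))
    return out


# Hypothetical spectra: the second is the first with a scatter-like gain and offset.
base = [0.10, 0.15, 0.30, 0.55, 0.60, 0.40, 0.25, 0.20]
scattered = [1.2 * v + 0.3 for v in base]

# "First derivative + SNV" and "first derivative + MSC", in that assumed order.
d_snv = [snv(first_derivative(s)) for s in (base, scattered)]
d_msc = msc([first_derivative(s) for s in (base, scattered)])
```

In this toy case both SNV and MSC map the scattered copy back onto the same curve as the original, which is the behaviour these corrections are meant to provide when spectra differ only by additive and multiplicative scatter effects.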
Modeling with full-band spectral data involves a large amount of computation, overlapping information, and noise, which ultimately affect the predictive performance and accuracy of the model. In this study, the spectra were divided into three ranges (350-1000 nm, 1000-1800 nm, and 1000-2100 nm) according to the distribution of the spectral information of the characteristic wood compounds, and the effects of different preprocessing methods on the models generated for the three spectral ranges were analyzed (Tables 5-7).
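Restricting a model to one of these spectral ranges amounts to keeping only the variables whose wavelength falls inside the range. A minimal Python sketch is given below; the 1 nm wavelength grid, the placeholder spectrum, and the function name are assumptions for illustration and are not taken from the study.

```python
def select_range(wavelengths, spectrum, low_nm, high_nm):
    """Keep only the points with low_nm <= wavelength <= high_nm."""
    pairs = [(w, v) for w, v in zip(wavelengths, spectrum) if low_nm <= w <= high_nm]
    return [w for w, _ in pairs], [v for _, v in pairs]


# Assumed 1 nm sampling over the instrument range of 350-2500 nm.
wavelengths = list(range(350, 2501))
spectrum = [0.0] * len(wavelengths)  # placeholder absorbance values

RANGES = {
    "350-2500 nm": (350, 2500),
    "350-1000 nm": (350, 1000),
    "1000-1800 nm": (1000, 1800),
    "1000-2100 nm": (1000, 2100),
}

for name, (low, high) in RANGES.items():
    w, v = select_range(wavelengths, spectrum, low, high)
    print(name, len(w), "variables")
```

If a derivative is part of the preprocessing, a reasonable choice is to compute it before cutting the range, so that the filter window near the range edges can still use neighbouring points.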
As shown in Table 5, modeling in the 350-1000 nm spectral range was optimal when the first derivative was combined with MSC or SNV. The R²c and R²v of the model based on first derivative + MSC preprocessing were 0.93 and 0.84, the RMSEC and RMSECV were 3.55 and 5.40, the SEC and SECV were 3.56 and 5.42, and the RPD value was 2.48. The R²c and R²v of the model based on first derivative + SNV preprocessing were 0.94 and 0.85, the RMSEC and RMSECV were 3.33 and 5.30, the SEC and SECV were 3.34 and 5.32, and the RPD value was 2.53. Table 6 displays the outcomes of the optimal models established for the 1000-1800 nm spectral range, which likewise employed the first derivative in combination with MSC or SNV as the preprocessing method. The R²c and R²v of the model based on first derivative + MSC preprocessing were 0.94 and 0.87, the RMSEC and RMSECV were 3.42 and 4.83, the SEC and SECV were 3.42 and 4.84, and the RPD value was 2.78. The R²c and R²v of the model based on first derivative + SNV preprocessing were 0.96 and 0.88. In the 1000-2100 nm spectral range (Table 7), the models based on first derivative + MSC and first derivative + SNV preprocessing both reached an R²c of 0.97, with R²v of 0.89 and 0.87, RMSEC of 2.39 and 2.40, RMSECV of 4.43 and 4.67, SEC of 2.40 and 2.41, SECV of 4.45 and 4.68, and RPD values of 3.02 and 2.88, respectively.
In summary, the optimal models developed for the four spectral ranges (350-2500 nm, 350-1000 nm, 1000-1800 nm, and 1000-2100 nm) were based on first derivative + MSC and first derivative + SNV preprocessing. For the WAW, the best quality model was developed for the 1000-2100 nm spectral range (Fig. 4a, b). Similarly, prior research on the physical and mechanical properties of sound wood has reported that models developed using the 1000-2100 nm spectral range demonstrate improved predictive performance [64]. This may be because this range contains most of the characteristic information of WAW, and the spectral profiles in this range are clearer and less noisy (Fig. 5a, b).
In this study, a model for predicting the hardness of WAW based on NIR spectroscopy was developed and demonstrated higher prediction accuracy and stability than those reported in similar studies. For example, Raul et al. employed NIR spectroscopy to predict the hardness of charcoal after heat treatment at varying temperatures; the model R²c at different temperatures was only 0.507, and R²v was only 0.427 [65]. Tetsuya et al. developed a NIR spectroscopy prediction model for the compression modulus of archaeological wood with a coefficient of determination of 0.90 [47]. However, when the technique was applied to predict the bending modulus of archaeological wood, the coefficient of determination of the model was reduced to 0.82 [51]. Wang et al. integrated two preprocessing techniques to establish a NIR spectroscopy predictive model for Catalpa bungei wood, achieving optimal coefficients of determination of 0.843 and 0.846 for the bending strength and modulus of elasticity [66]. The lower coefficients of determination may be related to the larger coefficients of variation in compression modulus, flexural modulus, and flexural strength in the studies cited above. In this study, by contrast, the large gradient in degradation and the uniform distribution of the hardness dataset between light and heavy degradation facilitated the mathematical modeling of the NIR spectroscopy prediction model. In addition, this study optimized the NIR hardness prediction model for WAW using different preprocessing methods and spectral ranges, resulting in a model with high predictive performance and accuracy. However, an external validation of the model is required to test its accuracy and its actual prediction results.

External validation
The model obtained with first derivative + MSC preprocessing in the 1000-2100 nm spectral range was selected for external testing and applied to the 39 prediction set samples (randomly selected from the total sample). The predictions were compared with the measured results (Fig. 6). Figure 6 shows that the model has a prediction set coefficient of determination R²p of 0.89, an RMSEP of 5.04, and an RSD value of 6.9% (below 10%), indicating that the model has high predictive performance and accuracy for initial application in practice [67]. However, the results of grading the degree of degradation based on the maximum water content do not correspond to the results predicted by hardness, which may be related to the non-uniform degradation of the samples. In contrast, the hardness data correspond to the NIR spectra (Fig. 6). For example, even though some samples belong to the same class based on the maximum water content method and the error between the model-derived hardness predictions and the measured hardness values is small, their hardness results are distributed over different intervals. Hence, relying solely on maximum water content or basic density for grading the degradation of WAW is unreliable, and the inclusion of mechanical properties such as hardness becomes necessary.

Conclusions
This study developed a hardness prediction model for waterlogged archaeological wood (WAW) by combining NIR spectroscopy with chemometrics. It also examined the impact of preprocessing methods and spectral ranges. The optimal prediction models were established when the preprocessing method was either first derivative + MSC or first derivative + SNV for a spectral range of 1000-2100 nm. Both models achieved an R²c of 0.97, with R²v of 0.89 and 0.87 and RPD values of 3.02 and 2.88, respectively. In external validation, the prediction model achieved a coefficient of determination (R²p) of 0.89, an RMSEP of 5.04, and an RSD value of 6.9% (below 10%), indicating high predictive performance and accuracy. Furthermore, the hardness data and the results from the inverse prediction model indicated that the degradation of WAW was non-uniform. The classification of degradation based on maximum water content did not precisely align with the hardness prediction results. Additionally, the hardness data exhibited a stronger correlation with the NIR spectra than the maximum water content did. Therefore, it is important to include hardness as an indicator for assessing the state of preservation of WAW.
The hardness and NIR spectroscopy data of WAW obtained under waterlogged conditions provide a theoretical foundation for the rapid, nondestructive, and accurate evaluation of the mechanical properties of wooden heritage. The preliminary near-infrared prediction model for the hardness of WAW established in this study offers a new means for such assessment. At the same time, this study has limitations that should be addressed, such as how to overcome the high STDV of the hardness data caused by the natural variability of the wood (earlywood/latewood, sapwood/heartwood) and other factors, and how to improve the prediction accuracy of the model. In the future, this research will expand to include important mechanical properties such as bending modulus, bending strength, and compressive strength. This will establish a theoretical basis for predictive models that nondestructively, quickly, and accurately assess the mechanical properties of WAW. Additionally, it may provide important data for nondestructively assessing the state of preservation of WAW and thus assist in the development of an appropriate conservation strategy.

Fig. 1 Box plot, normal distribution, and scatter diagram of WAW hardness by grade. Groups marked with the same letter are not significantly different at the 95% confidence level

Fig. 4 Prediction model for the 1000–2100 nm spectral range. Note: the diagonal straight line (y = x) indicates the perfect correlation trend

Fig. 6 Inverse prediction model. Note: the diagonal straight line (y = x) indicates the perfect correlation trend

Table 1 List of waterlogged archaeological wood samples [32]. Chih Yuen refers to the Chinese cruiser Chih Yuen shipwreck; Nanhai No.1 refers to the Nanhai No.1 shipwreck. *These samples (Nos. 1 to 9) were also included in the study by Chen et al., where their MWC and BD were tested [32]. Samples Nos. 10 to 30 are new samples of unknown state of preservation, independent of the Chen et al. study; their MWC and BD were tested in this study. MWC: maximum water content; BD: basic density

Table 3 Hardness statistics of the calibration and prediction sets

Table 4 Hardness model results for WAW based on raw NIR spectra. Factors: optimal number of factors; MSC: Multiple Scattering Correction; SNV: Standard Normal Variate; 1st: First Derivative; 2nd: Second Derivative; R²c: coefficient of determination of calibration; RMSEC: Root Mean Squared Error of Calibration; SEC: Standard Error of Calibration; R²v: coefficient of determination of cross-validation; RMSECV: Root Mean Squared Error of Cross-Validation; SECV: Standard Error of Cross-Validation; RPD: Ratio of Performance to Deviation for cross-validation

Table 5 Hardness model results for WAW based on NIR spectra in the 350–1000 nm range. Factors: optimal number of factors; MSC: Multiple Scattering Correction; SNV: Standard Normal Variate; 1st: First Derivative; 2nd: Second Derivative; R²c: coefficient of determination of calibration; RMSEC: Root Mean Squared Error of Calibration; SEC: Standard Error of Calibration; R²v: coefficient of determination of cross-validation; RMSECV: Root Mean Squared Error of Cross-Validation; SECV: Standard Error of Cross-Validation; RPD: Ratio of Performance to Deviation for cross-validation

Table 6 Hardness model results for WAW based on NIR spectra in the 1000–1800 nm range. Factors: optimal number of factors; MSC: Multiple Scattering Correction; SNV: Standard Normal Variate; 1st: First Derivative; 2nd: Second Derivative; R²c: coefficient of determination of calibration; RMSEC: Root Mean Squared Error of Calibration; SEC: Standard Error of Calibration; R²v: coefficient of determination of cross-validation; RMSECV: Root Mean Squared Error of Cross-Validation; SECV: Standard Error of Cross-Validation; RPD: Ratio of Performance to Deviation for cross-validation

Table 7 Hardness model results for WAW based on NIR spectra in the 1000–2100 nm range. Factors: optimal number of factors; MSC: Multiple Scattering Correction; SNV: Standard Normal Variate; 1st: First Derivative; 2nd: Second Derivative; R²c: coefficient of determination of calibration; RMSEC: Root Mean Squared Error of Calibration; SEC: Standard Error of Calibration; R²v: coefficient of determination of cross-validation; RMSECV: Root Mean Squared Error of Cross-Validation; SECV: Standard Error of Cross-Validation; RPD: Ratio of Performance to Deviation for cross-validation

[…] 2.77 and 4.62, the SEC and SECV were 2.78 and 4.63, and the RPD value was 2.91. Table 7 presents the outcomes of the optimal models established for the 1000–2100 nm spectral range, which used the first derivative combined with MSC or SNV as the preprocessing method. For the model based on first derivative + MSC preprocessing, the R²c and R²v were 0.97 and 0.89, the RMSEC and RMSECV were 2.39 and 4.43, the SEC and SECV were 2.40 and 4.45, and the RPD value was 3.02. For the model based on first derivative + SNV preprocessing, the R²c and R²v were 0.97 and 0.87, the RMSEC and RMSECV were 2.40 and 4.67, the SEC and SECV were 2.41 and 4.68, and the RPD value was 2.88.
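The two preprocessing combinations named here, first derivative with MSC and first derivative with SNV, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the derivative is a plain finite difference (Savitzky-Golay smoothing is a common alternative), the derivative is applied before the scatter correction, and MSC uses the mean spectrum as its reference. All function names are hypothetical.

```python
import numpy as np


def select_range(spectra, wavelengths, low=1000.0, high=2100.0):
    """Keep only the columns whose wavelength (nm) lies in [low, high]."""
    mask = (wavelengths >= low) & (wavelengths <= high)
    return spectra[:, mask], wavelengths[mask]


def first_derivative(spectra, wavelengths):
    """Finite-difference first derivative along the wavelength axis.

    Assumption: the section does not state the derivative algorithm.
    """
    return np.gradient(spectra, wavelengths, axis=1)


def snv(spectra):
    """Standard normal variate: centre and scale each spectrum (row)."""
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / std


def msc(spectra, reference=None):
    """Scatter correction: regress each spectrum on a reference spectrum
    (the mean spectrum by default), then remove the fitted offset and slope."""
    if reference is None:
        reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra, dtype=float)
    for i, row in enumerate(spectra):
        slope, intercept = np.polyfit(reference, row, 1)
        corrected[i] = (row - intercept) / slope
    return corrected


def preprocess(spectra, wavelengths, scatter="msc"):
    """Range selection, first derivative, then MSC or SNV (assumed order)."""
    x, wl = select_range(spectra, wavelengths)
    x = first_derivative(x, wl)
    return (msc(x) if scatter == "msc" else snv(x)), wl
```

The preprocessed matrix would then be passed to a PLS regression with cross-validation to choose the number of factors.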