### Ultrasonic testing

Based on the distribution of weathering damage on the external walls of the Yungang Grottoes, most of the damaged cross sections are rectangular or square. Grid-based ultrasonic testing, a method commonly used in the conservation of cultural relics, was therefore used to measure the weathering layers of the external walls (Fig. 1c). Two probe configurations were used separately: "fixed excitation probe and moving receive probe" and "synchronous movement of excitation and receive probes". Each test area contained at least three survey lines, and each survey line had at least six measuring points. Adjacent measuring points were spaced 5 cm apart both horizontally and vertically. The results were recorded by the grotto number of the test area. For example, the ultrasonic testing results for Grotto No. 20 were recorded under the code number 20. The THz spectral measurements were likewise recorded by the grotto number of each sampling site.

A wide-band sonic detector was used for ultrasonic testing, with a 50-kHz excitation probe as the excitation transducer. The emission voltage, transmission gain, number of sampling points, and sampling interval were set to 250 V, 36 dB, 2000, and 1 μs, respectively. To reduce the air gap between each probe (emitting or receiving) and the outer walls of the grottoes, and thereby increase detection sensitivity, ordinary glue (a wheat-flour paste) was chosen as the coupling agent. The glue is prepared by heating wheat flour and water at a ratio of 1:6 at 353 K. It is readily available, can be applied directly, and has a quasi-fluid consistency. It does not readily penetrate the stone, causes no secondary pollution, and is easy to clean off. Because of the unevenness of the test surfaces, uneven application of the coupling agent, and variation in transducer placement during field testing, the results may contain some errors. Outliers in the raw ultrasonic travel-time (t) data were therefore eliminated with Grubbs' test before analysis. The critical value \(G(n_0)\) was obtained from Grubbs' tables, and any value for which \(G > G(n_0)\) was excluded. G is defined as follows:

$$G = \left| \frac{\overline{t} - t_{n}}{s} \right| = \left| \frac{\frac{1}{n}\sum_{i=1}^{n} t_{i} - t_{n}}{\sqrt{\frac{1}{n-1}\sum_{i=1}^{n} \left( t_{i} - \frac{1}{n}\sum_{i=1}^{n} t_{i} \right)^{2}}} \right| \tag{1}$$

where \(\overline{t}\) is the mean of the n measured ultrasonic travel times \(t_i\), \(t_n\) is the suspected outlier (the value being tested), s is the sample standard deviation, and \(n_0\) is the significance level, which was taken as 5%.
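
The screening step can be illustrated with a minimal, stdlib-only Python sketch. It assumes that the test is applied iteratively (one suspect value removed per pass, with n recomputed each time) and that \(n_0 = 5\%\) is a two-sided level. The critical values are the commonly tabulated two-sided 5% Grubbs values for n = 3–10 and should be checked against the table actually used. `grubbs_filter` and the example readings are hypothetical.

```python
import statistics

# Two-sided Grubbs critical values G(n) at significance level 0.05 (n = 3..10),
# as found in standard published Grubbs' tables. ASSUMPTION: the paper's
# n0 = 5% is read as a two-sided level; check against the table actually used.
GRUBBS_CRIT_005 = {3: 1.155, 4: 1.481, 5: 1.715, 6: 1.887,
                   7: 2.020, 8: 2.126, 9: 2.215, 10: 2.290}


def grubbs_filter(times, crit=GRUBBS_CRIT_005):
    """Hypothetical helper: drop outliers one at a time while G > G(n).

    Returns (kept, removed). Each pass tests the travel time farthest from
    the mean, i.e. t_n in Eq. (1), using the sample standard deviation s.
    """
    kept, removed = list(times), []
    while len(kept) >= 3:
        n = len(kept)
        if n not in crit:
            raise ValueError(f"no tabulated critical value for n = {n}")
        mean = statistics.mean(kept)
        s = statistics.stdev(kept)  # 1/(n - 1) normalization, as in Eq. (1)
        if s == 0:
            break  # all values identical: nothing to test
        suspect = max(kept, key=lambda t: abs(mean - t))
        if abs(mean - suspect) / s <= crit[n]:
            break  # G <= G(n): no outlier at this level
        kept.remove(suspect)
        removed.append(suspect)
    return kept, removed


# Example: travel times (microseconds) along one survey line, one suspicious reading.
kept, removed = grubbs_filter([10.0, 10.1, 9.9, 10.0, 10.2, 9.8, 15.0])
print(removed)  # [15.0]
```

Here the reading of 15.0 gives G ≈ 2.26 > 2.020 (n = 7) and is removed. The remaining six values give G ≈ 1.41 < 1.887, so the loop stops.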

The longitudinal ultrasonic velocity (V) of each grid unit of the tested grotto walls was calculated from the travel time t measured with the portable ultrasonic instrument and the transmit–receive distance given by the grid spacing of the corresponding interval, that is, as this distance divided by t.
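
The velocity calculation can be sketched as follows. This sketch assumes travel times in microseconds and two simplified geometries. For "fixed excitation probe and moving receive probe", the k-th receive point is taken to lie k grid spacings (5 cm each) from the exciter along a straight survey line. For "synchronous movement", the probe separation is taken as one grid spacing. Both geometries, the function names, and the example times are illustrative assumptions rather than details given in the text.

```python
GRID_SPACING_M = 0.05  # 5 cm between adjacent measuring points (from the text)


def velocities_fixed_source(times_us, spacing_m=GRID_SPACING_M):
    """Fixed exciter, receiver moved point by point along a survey line.

    ASSUMPTION: the k-th receive point (k = 1, 2, ...) lies k grid spacings
    from the exciter, so the transmit-receive distance is k * spacing_m.
    Returns velocities in m/s for travel times given in microseconds.
    """
    return [k * spacing_m / (t * 1e-6) for k, t in enumerate(times_us, start=1)]


def velocities_synchronous(times_us, separation_m=GRID_SPACING_M):
    """Exciter and receiver moved together at a fixed separation.

    ASSUMPTION: the separation equals one grid spacing unless stated otherwise.
    """
    return [separation_m / (t * 1e-6) for t in times_us]


# Hypothetical travel times (microseconds), e.g. after Grubbs screening.
print(velocities_fixed_source([25.0, 50.0, 75.0]))  # each ~2000 m/s
print(velocities_synchronous([20.0, 25.0]))         # ~2500 and ~2000 m/s
```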

### THz spectral measurement

#### Sample preparation

To minimize damage to the relics, three small weathered samples were collected from the ultrasonic testing regions (Fig. 1c) and ground evenly in an agate mortar. Each powder sample was labeled with its grotto number. To reduce scattering by the test sample, 0.1 g of the ground sample was mixed with 0.1 g of polyethylene powder (1:1 by mass), and the mixture was passed through a 200-mesh sieve. Each sample was then pressed into a round tablet (diameter: 1.3 cm; thickness: approximately 1 mm) with smooth, parallel surfaces using an infrared tablet press under a pressure of 5 tons.

#### Experimental setup

The spectral data of the weathered samples were collected by transient THz time-domain spectroscopy (THz-TDS) [24]. A titanium sapphire femtosecond pulsed laser generated the laser beam, with an 800-nm central wavelength, 80-MHz repetition frequency, 100-fs pulse width, and 960-mW output power. A p-InAs crystal was used to generate the THz electromagnetic pulses, and a ZnTe crystal served as the detection crystal. The samples were placed at the focal point between two parabolic mirrors. The setup was enclosed in a chamber filled with \(\mathrm{N_2}\) to eliminate the influence of water vapor in the air. The relative humidity was below 4%, and the temperature was 293 K.

The experimental data were processed according to the physical model developed by Dorney and Duvillaret for extracting the THz optical parameters of materials [25, 26]. THz-TDS measures the time-domain waveforms of the reference and sample signals. The corresponding frequency-domain spectra are then obtained by fast Fourier transform. Comparing the sample and reference spectra gives the complex transmissivity (T):

$$T(\omega) = \frac{A_{\text{sample}}(\omega)}{A_{\text{reference}}(\omega)} = \left| t_{N}(\omega) \right| \exp\left[ -i\varphi_{N}(\omega) \right] \tag{2}$$

where \(A_{\text{reference}}\) and \(A_{\text{sample}}\) are the complex amplitudes of the reference and sample spectra, respectively, N is the complex refractive index of the sample, ω is the angular frequency of the radiation, \(|t_{N}(\omega)|\) is the amplitude transmission coefficient, and \(\varphi_{N}(\omega)\) is the phase difference between the sample and reference spectra.
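
The extraction of optical constants from the two waveforms can be sketched in stdlib-only Python. The sketch assumes the common thick-slab approximation used with this framework: normal incidence, a homogeneous sample, Fabry–Pérot echoes excluded from the time window, and weak absorption. Under these assumptions and with \(T(\omega) = |T(\omega)|\exp[-i\varphi(\omega)]\), the refractive index is \(n(\omega) = 1 + c\varphi(\omega)/(\omega d)\) and the absorption coefficient is \(\alpha(\omega) = (2/d)\ln[4n(\omega)/((n(\omega)+1)^2|T(\omega)|)]\), where d is the tablet thickness. A direct DFT sum at a single frequency replaces the FFT so that no third-party library is needed. The waveforms are synthetic, and `spectrum` and `optical_constants` are hypothetical names.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def spectrum(x, dt, f):
    """Discrete-time Fourier transform of samples x (step dt, s) at frequency f (Hz)."""
    w = 2 * math.pi * f
    return sum(xk * cmath.exp(-1j * w * k * dt) for k, xk in enumerate(x))


def optical_constants(ref, sam, dt, f, d):
    """Hypothetical helper: return (T, n, alpha_per_cm) at frequency f.

    ASSUMPTIONS: thick homogeneous slab of thickness d (m), normal incidence,
    echoes excluded from the time window, weak absorption, |phi| < pi so no
    phase unwrapping is needed.
    """
    T = spectrum(sam, dt, f) / spectrum(ref, dt, f)  # complex transmission
    rho, phi = abs(T), -cmath.phase(T)               # T = rho * exp(-i*phi)
    w = 2 * math.pi * f
    n = 1 + C * phi / (w * d)                        # refractive index
    alpha = (2 / d) * math.log(4 * n / ((n + 1) ** 2 * rho))  # absorption, 1/m
    return T, n, alpha / 100                         # convert 1/m -> 1/cm


# Synthetic check: the "sample" pulse is the reference delayed by 1 ps and halved.
dt = 50e-15                          # 50 fs sampling step
t0, width = 3e-12, 0.3e-12           # toy Gaussian reference pulse
ref = [math.exp(-0.5 * ((k * dt - t0) / width) ** 2) for k in range(400)]
shift = 20                           # 20 samples * 50 fs = 1 ps delay
sam = [0.0] * shift + [0.5 * v for v in ref[:-shift]]
T, n, alpha = optical_constants(ref, sam, dt, f=0.4e12, d=1e-3)
# Expected: n = 1 + c * (1 ps) / (1 mm), about 1.30
print(f"|T| = {abs(T):.3f}, n = {n:.3f}, alpha = {alpha:.1f} cm^-1")
```

In practice, φ(ω) would be unwrapped over the full spectrum before applying the formula. The single-frequency form above avoids that step only because the toy delay keeps the phase below π.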

#### LS-SVM

Because it can model nonlinear relationships, the SVM is particularly well suited to distinguishing samples with similar spectral profiles. The key to LS-SVM modeling is the selection of the kernel function and its parameters, which directly affect prediction accuracy. After comparing the kernel functions available for the SVM, a radial basis function (RBF) kernel was chosen for training in this study [27, 28]. The RBF kernel is nonlinear and reduces the computational complexity of training. Three parameters must be optimized to improve the performance of the algorithm: the penalty factor (c), the insensitive loss parameter (ε), and the RBF kernel coefficient (γ). Their effective ranges were \(2^{-10}\)–\(2^{15}\), \(2^{-10}\)–\(2^{3}\), and \(2^{-10}\)–\(2^{10}\), respectively. The optimal values of c, ε, and γ were searched for using double cross-validation (D-CV). This procedure produced a regression model that predicts the degree of weathering from the THz spectra of the grotto-wall samples. The prediction accuracy of the model was evaluated by the relative error (RE) of the predictions.
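
The parameter search can be illustrated with a minimal, stdlib-only Python sketch under stated assumptions. It is not the authors' implementation. It uses the standard LS-SVM regression formulation with a squared-error loss, so the ε-insensitive parameter of ε-SVR does not appear and only c and γ are tuned. The kernel is the RBF \(K(x, z) = \exp(-\gamma\lVert x - z\rVert^2)\). A plain k-fold grid search over powers of 2 stands in for the paper's D-CV. RE is taken as |predicted − measured|/|measured|. The helper names and the one-feature toy data are hypothetical; real inputs would be THz spectral features.

```python
import itertools
import math


def rbf(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def solve(A, rhs):
    """Solve A u = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [v] for row, v in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    u = [0.0] * n
    for r in range(n - 1, -1, -1):
        u[r] = (M[r][n] - sum(M[r][k] * u[k] for k in range(r + 1, n))) / M[r][r]
    return u


def lssvm_fit(X, y, c, gamma):
    """LS-SVM regression: solve [[0, 1^T], [1, K + I/c]] [b; alpha] = [0; y]."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [rbf(X[i], X[j], gamma) + (1.0 / c if i == j else 0.0)
                          for j in range(n)])
    b, *alpha = solve(A, [0.0] + list(y))
    return lambda x: b + sum(a * rbf(xi, x, gamma) for a, xi in zip(alpha, X))


def relative_error(pred, true):
    """RE = |predicted - measured| / |measured|."""
    return abs(pred - true) / abs(true)


def grid_search(X, y, c_grid, gamma_grid, k=3):
    """Pick (c, gamma) with the lowest mean RE over k interleaved folds."""
    best = None
    for c, gamma in itertools.product(c_grid, gamma_grid):
        errors = []
        for fold in range(k):
            train = [i for i in range(len(X)) if i % k != fold]
            test = [i for i in range(len(X)) if i % k == fold]
            model = lssvm_fit([X[i] for i in train], [y[i] for i in train], c, gamma)
            errors += [relative_error(model(X[i]), y[i]) for i in test]
        score = sum(errors) / len(errors)
        if best is None or score < best[0]:
            best = (score, c, gamma)
    return best


# Toy data (not from the paper): one feature, smooth nonlinear target.
X = [[0.5 * i] for i in range(12)]
y = [3.0 + math.sin(x[0]) for x in X]
c_grid = [2.0 ** p for p in range(-10, 16, 5)]      # 2^-10 ... 2^15
gamma_grid = [2.0 ** p for p in range(-10, 11, 5)]  # 2^-10 ... 2^10
score, c_best, gamma_best = grid_search(X, y, c_grid, gamma_grid)
print(f"mean RE = {score:.2%}, c = {c_best}, gamma = {gamma_best}")
```

A coarse exponent step of 5 keeps the example fast. A finer grid, or a nested outer loop to mirror D-CV, would follow the same pattern.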