Comparison of three 3D scanning techniques for paintings, as applied to Vermeer’s ‘Girl with a Pearl Earring’

A seventeenth-century canvas painting usually comprises varnish and (translucent) paint layers on a substrate. A viewer’s perception of a work of art can be affected by changes in and damage to these layers. Crack formation in the multi-layered stratigraphy of the painting is visible in the surface topology. Furthermore, mechanical abrasion, (photo)chemical processes and treatments can affect the topography of the surface and thereby its appearance. New technological advancements in non-invasive imaging allow for the documentation and visualisation of a painting’s 3D shape across larger segments or even the complete surface. In this manuscript we compare three 3D scanning techniques, which have been used to capture the surface topology of Girl with a Pearl Earring by Johannes Vermeer (c. 1665): a painting in the collection of the Mauritshuis, The Hague. These three techniques are: multi-scale optical coherence tomography, 3D scanning based on fringe-encoded stereo imaging (at two resolutions), and 3D digital microscopy. Additionally, scans were made of a reference target and compared to 3D data obtained with white-light confocal profilometry. The 3D data sets were aligned using a scale-invariant template matching algorithm, and compared on their ability to visualise topographical details of interest. The merits and limitations of the individual imaging techniques are also discussed in depth. We find that the 3D digital microscopy and the multi-scale optical coherence tomography offer the highest measurement accuracy and precision. However, the small field-of-view of these techniques makes them relatively slow and thereby less viable solutions for capturing larger (areas of) paintings.
For Girl with a Pearl Earring we find that the 3D data provides an unparalleled insight into the surface features of this painting, specifically related to ‘moating’ around impasto, the effects of paint consolidation in earlier restoration campaigns and aging, through visualisation of the crack pattern. Furthermore, the data sets provide a starting point for future documentation and monitoring of the surface topology changes over time. These scans were carried out as part of the research project ‘The Girl in the Spotlight’.


Introduction
The three-dimensional landscape of paintings
Paintings are generally considered in terms of their (2D) depiction, but the physical artwork also has a third dimension. The substrate is rarely completely flat, and subsequent paint and varnish layers also influence the surface topography. This effect can be intentional (using the paint to create a 3D effect) or the consequence of drying, hardening, or degradation. Artists, including Vermeer, deliberately created 3D textural effects on the surface. For instance, they used impasto to create additional reflections for highlights, or used 3D effects to emphasise the textural appearance of the material they were depicting. Alternatively, three-dimensional brushstrokes can be the consequence of a fast-paced, expressive style.
The topography of a painting will change under the influence of internal and external factors. Natural aging and (photo)chemical changes (e.g. the formation of metal soaps [1]) that occur within the different layers can result in cracking, protrusions and/or changes in gloss. The layers respond to environmental influences: for example, an increase or decrease in temperature or relative humidity can cause the support to expand or contract, resulting in cracking or deformations. Conservation treatments can also cause changes in topography. Linings, especially those that employ heat and pressure, can flatten the paint. Efforts to locally soften and flatten raised cracks using heat and/or pressure can also cause irreversible changes to the 3D surface structure. Mechanical damages during handling, transport or by accident can result in cracked, tenting, or flaking paint. A paintings conservator is compelled to document and address these issues, but until recently, possibilities to record these changes, also over the long term, have been limited.

Painting documentation
During technical examination(s), it has become general practice to use a wide range of imaging techniques to visualise and document the condition and chemical composition of a painting. Imaging techniques like infrared imaging, visible light photography, ultraviolet (UV) imaging, X-radiography, macro X-ray fluorescence imaging, and multispectral and hyperspectral imaging all provide information about different modalities of a painting [2,3]. Photographs in raking light conditions (commonly made during technical examination(s)) reveal undulations in the surface and visualise features like cracks (see for example Fig. 1a). Although this technique emphasises the effect of surface topography on a painting's appearance, it is not an exact measurement of the topography, and such photographs are often not (well) documented.
Moreover, the effect of craquelure on a painting's appearance has been studied by Bucklow [5] describing and classifying the types of cracks commonly found on paintings, as well as exploring the pictorial effect of crack patterns [6]. Automated crack detection in digital images has been employed to classify paintings into geographical regions of origin [7] and for digital crack removal [8].
A limitation of these studies is that they only consider craquelure as a two-dimensional pictorial feature, and not its three-dimensional shape.
Three 3D scanning methods have been demonstrated for capturing (and reproducing) the fine topographical details of painted surfaces, namely 3D laser triangulation, structured light 3D scanning, and focus variation microscopy. Blais et al. [9,10] captured the topography of paintings using a 'white' laser spot (composed of a red, green and blue laser source). This scanner was later commercialised and is currently used to create full-colour 3D printed reproductions [11], using an additional (2D) reference photograph for faithful colour rendition [12]. Factum Arte [13,14] and the Van Gogh Museum [15] both (likely) use a combination of laser-based scanning and digital photography to create facsimiles of paintings. However, an extensive literature study did not reveal scientific publications on these approaches.
Fig. 1 (caption, partially recovered): b The crack pattern of the painting was documented by manually tracing the cracks onto a transparent polyester film, placed on the painting (Courtesy of Jørgen Wadum [4]). The red inserts show an enlargement of the area around the pearl earring.
Structured light 3D scanning, using a projector combined with either one or two cameras has also been proposed to measure the topography of painted surfaces (e.g. [16][17][18][19][20][21]). The latter approaches [19][20][21] use a de-focused projection pattern, whereby the sampling resolution only depends on the camera resolution, rather than the (lower) projector resolution.
Focus variation microscopy has been demonstrated, for instance to analyse punchmarks on medieval panel paintings [22], and to evaluate its usefulness in supporting painting cleaning [23]. Both studies have only used this technique to capture small areas, rather than 3D imaging larger regions or complete paintings.
Besides capturing micro-scale features, studies have been conducted to measure and monitor global shape variations of paintings, for example to monitor the effect of tensioning tests conducted on a canvas painting and gilt leather artefact [24], or monitor deformation of panel paintings over time [25]. Although these techniques are used in relation to painting conservation treatments, their resolution does not allow for capturing fine surface details like craquelure.
An overview of the above-mentioned scanners and their specifications can be found in Table 1.
Although optical coherence tomography (OCT) is conventionally used to image the stratigraphy (layering) of semi-transparent layers [27,28], the uppermost boundary also represents the topography of the surface. Its application for imaging the stratigraphy of cultural heritage artifacts was demonstrated in various studies (e.g. [29][30][31][32][33]). To reach sufficient lateral and axial resolution for these applications, the OCT field-of-view is limited to a scanning area of approximately 15 mm × 15 mm. By combining a high resolution spectral domain OCT (SD-OCT) setup (a specific sub-type of OCT using a broadband light source, see [33]) with automated scanning stages, it is possible to scan much larger areas (as demonstrated in this paper, reaching a single automated scan area of 0.04 m²).

Case study: Girl with a Pearl Earring by Johannes Vermeer
From archival documentation and research it is known that Johannes Vermeer's Girl with a Pearl Earring (c. 1665), from the collection of the Mauritshuis, has undergone various conservation and restoration treatments in its lifetime. Some treatments, along with the degradation effects that can be expected of a seventeenth-century painting, have affected its surface topography [34]. It is at least certain that the painting had larger height variations in its painted surface when it left Johannes Vermeer's studio. He applied small details, like the highlight on the Girl's earring and dots on her clothing, with more impasto than the surrounding paint. In 1882, the Antwerp restorer Van der Haeghen lined the canvas support with a starch-based paste. As part of a 1915 or 1922 treatment, restorer De Wild 'regenerated' the varnish by subjecting it to alcohol vapours and copaiba balsam. He presumably used the so-called 'Pettenkofer' process to help soften flaking, brittle paint [35]. These treatments, and/or a consolidation treatment with an aqueous adhesive (date unknown), may have caused the starch-based lining to shrink. Although the paint already had some adhesion issues prior to the 1882 lining, the shrinkage may have caused more flaking and cupping of the paint. In 1960, another restorer, Traas, relined the painting using a wax-resin adhesive, applying heat and pressure. This wax-resin lining flattened some impasto details that Vermeer had applied more thickly than the surrounding paint, but ensured that the painting has remained structurally stable.
The conservation history was documented as part of the 1994 restoration campaign, conducted as part of the project Vermeer Illuminated [4]. During this restoration, the crack pattern was traced onto a transparent polyester film that was carefully placed on the surface of the painting (see Fig. 1b). This was a way of documenting the surface condition, in addition to raking light photographs. The 2018 technical examination as part of the project The Girl in the Spotlight provided a new opportunity to examine Vermeer's Girl with a Pearl Earring using traditional examination methods as well as state-of-the-art scientific techniques. High-resolution digital photographs were made of the painting in different lighting conditions, including raking light (see Fig. 1a). A new and innovative part of this examination was the 3D documentation of the painting's surface using the following means: multiscale optical coherence tomography (MS-OCT), 3D scanning based on fringe-encoded stereo imaging (at two resolutions), and 3D digital microscopy.
In this paper we compare these three imaging technologies. We find that the 3D digital microscopy and the MS-OCT offer the highest measurement accuracy and precision. However, the small field-of-view of these techniques makes them relatively slow and thereby less viable solutions for capturing larger (areas of) paintings. For Girl with a Pearl Earring we find that the 3D data provide an unparalleled insight into the surface features of this painting. This specifically relates to 'moating' around impasto, the effects of paint consolidation in earlier restoration campaigns and aging, and visualisation of the crack pattern. Furthermore, the data sets provide a starting point for future documentation and monitoring of the surface topology changes over time.

Multi-scale optical coherence tomography (MS-OCT)
The spectral-domain OCT system has a source spectrum centered at 900 nm, spanning a bandwidth of 195 nm. The specified axial bandwidth-limited resolution (Z) is 3.0 µm (in air). The OCT scan probe is mounted on a 3-axis stage consisting of two 200 mm (X and Y-axis) scan range stages and a 100 mm (Z-axis) scan range stage. The OCT probe head is mounted to the setup with the aid of a customised mount, which has 2 degrees of freedom for tilt around the X axis and Y axis, enabling imaging of a surface under a small angle. This is necessary to reduce the amount of over-saturation on the detector, caused by surface specular reflections reflecting back into the lens, which are then for the most part avoided. The scans were carried out overnight, with constant temperature and relative humidity, to minimise the influence of environmental conditions on the scan results. Figure 2a shows the schematic depiction of the spectral-domain OCT setup, 2b the probe head and scan system, and 2c the instrument set up in front of the painting. More information about the system specifications and raw OCT data processing can be found in Table 2 and [33].

Calibration and data processing
For the MS-OCT setup, the desired scan area was first aligned by means of a built-in video camera. The focal plane of the OCT sample arm optics was set to 0.4 mm below the zero delay position, in order to reduce auto-correlation artefacts in the measured spectra. Then, sample regions in the scanning area were captured, to ensure that there was no height variation larger than 1.89 mm (depth-of-field) within the dimensions of a single tile. Next, a self-developed program in LabVIEW [36] was activated, which fully automatically scans the artwork and keeps it in focus. MS-OCT tile stitching was performed off-line, based on segmented surface data in volume scans. This image stitching algorithm is not dependent on tile overlap (due to high precision translation stages), but image segmentation artefacts can in rare cases result in stitching errors.

3D scanning based on fringe-encoded stereo imaging
Two 3D scanning systems, both based on fringe-encoded stereo vision, were used to scan the complete painting. Both imaging systems consisted of an XY-frame able to move the imaging module horizontally and vertically in a plane parallel to the surface of the painting. Figure 3a depicts the schematic layout for both imaging modules. The imaging modules consisted of two cameras positioned on either side of a projector, all fitted with polarisation filters. This cross-polarised setup removes unwanted reflections like highlights in the images. The cameras were positioned at an angle (the imaging angle β, see Fig. 3a). Camera lenses utilising the Scheimpflug principle (also known as tilt-shift lenses) were used to align the focal plane to the painting surface, rotated by angle α (see Fig. 3a).
Table 2 (caption): Specifications of 3D scanning systems used in this case study, including: system design, scan modalities, system components, computational hardware used, resolution, scan area, settings used, and the configuration of the setup. The latter specifies the orientation of the painting during scanning, the imaging distance l, which is the closest point of the system to the painting's surface, and the imaging angle β, describing the angle between the painting's surface normal and the optical axis of the imaging components. a Stated depth accuracy by manufacturer.
The Std-Res 3D Scan (described in [20], based on the original design by [19]) sampled at a lateral resolution (XY) of 25 µm, and is shown in Fig. 3b, c. The scan was carried out overnight, under constant temperature and RH conditions and in complete darkness, to minimise the influence of environmental conditions on the scan results. Based on the geometric layout of the system (imaging angle β and lateral imaging resolution), the theoretical lower boundary of the axial resolution (Z) is 27.5 µm.
The second system, hereafter denoted as High-Res 3D Scan (described in [21]), sampled at a lateral resolution of 7 µm, and is shown in Fig. 3d, e. Two 25 mm lens extenders were used in this setup to extend the focal length (of the commercially available tilt-shift lenses) from 90 to 105 mm, allowing focusing at closer range, and thereby higher resolution imaging. The scan was carried out in a glass enclosure, with constant temperature and relative humidity, to minimise the influence of environmental conditions on the scan results. Based on the geometric layout of the system (imaging angle β and lateral imaging resolution), the theoretical lower boundary of the axial resolution (Z) is 17.5 µm. An overview of the specifications of both scanners can be found in Table 2.

Calibration and data processing
For both 3D scanning systems, the lens distortion of cameras was calibrated following a multi-view geometry calibration procedure [37], using a checkerboard pattern. The white balance and illumination non-uniformity were calibrated using a 300 mm × 300 mm Spectralon panel.
The colour and 3D topography of the surface were captured using a hybrid solution of fringe projection and stereo imaging. To measure the topography, a 6-phase shifting sinusoidal grey-scale pattern (fringe) was projected horizontally and vertically on the painting's surface, capturing in total 24 images for a single scan area. One additional image was captured with uniform illumination from the projector, which is used as the colour image. The imaging module was then moved to the next position and the procedure was repeated. Offline, fringe unwrapping was applied and a sparse stereo matching was carried out to match the fringes of both camera images. A look-up table was generated for both cameras, encoding both images. Taking into account the camera calibration, a dense stereo matching was performed, using the principle of ray-tracing. The RGB values of the uniformly illuminated image were then mapped onto the XYZ datapoints. Next, a plane was fitted through the datapoints and the data was sampled in a regular XY-grid at 25 µm or 7 µm intervals respectively, which approximates the reconstructed resolution. If a single pixel contained more than one value (which can occur due to our non-parallel configuration), the average of the values was taken. Missing values were interpolated based on surrounding pixels. A detailed description of the image processing can be found in [19,20]. The output of these systems is an aligned colour and height map.
Fig. 3 (caption, partially recovered): (4) the horizontal stage, β the imaging angle, α the tilt angle of the lenses and l the working distance to the painting.
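The phase-shifting step described above can be illustrated with a minimal sketch. It assumes the standard N-step model I_k = A + B cos(φ + 2πk/N) with equally spaced shifts; this model, the function names and the simulated fringe parameters are illustrative assumptions and are not taken from [19,20]. The image count of 24 is consistent with 6 phase steps × 2 fringe directions × 2 cameras, which is our reading of the text rather than something it states explicitly.

```python
import numpy as np

# Assumed breakdown of the 24 images per scan area (our reading of the text).
N_PHASES, N_DIRECTIONS, N_CAMERAS = 6, 2, 2
IMAGES_PER_SCAN_AREA = N_PHASES * N_DIRECTIONS * N_CAMERAS


def wrapped_phase(images):
    """Wrapped phase (radians, in (-pi, pi]) from N phase-shifted images.

    Assumes I_k = A + B*cos(phi + 2*pi*k/N) for k = 0..N-1 and N >= 3.
    """
    stack = np.asarray(images, dtype=float)
    n = stack.shape[0]
    shifts = 2.0 * np.pi * np.arange(n) / n
    # Projections of the intensity series on sine and cosine of the shifts.
    s = np.tensordot(np.sin(shifts), stack, axes=(0, 0))
    c = np.tensordot(np.cos(shifts), stack, axes=(0, 0))
    # sum(I_k*sin) = -(N/2)*B*sin(phi), sum(I_k*cos) = (N/2)*B*cos(phi)
    return np.arctan2(-s, c)


def simulate_fringes(phase, n_steps=N_PHASES, offset=0.5, amplitude=0.4):
    """Hypothetical helper: synthesise the N fringe images for a known phase."""
    shifts = 2.0 * np.pi * np.arange(n_steps) / n_steps
    return np.stack([offset + amplitude * np.cos(phase + d) for d in shifts])
```

The wrapped phase still has to be unwrapped and matched between the two cameras before heights can be triangulated; those steps depend on the calibration and are not sketched here.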

3D digital microscopy
The painting was examined using a 3D digital microscope (Hirox RH-2000), using the principle of focus stacking for 3D imaging, at a magnification of 140×. A custom bridge stand was built to accommodate the size of the painting, with a 500 mm × 500 mm automatic motorised XY-stage, combined with the existing automated Z-axis of the microscope. In this configuration the painting was placed horizontally on a vibration damping table and the microscope moved on a motorised stand above it (see Fig. 4). The scans were carried out in a glass enclosure, with constant temperature and relative humidity, to minimise the influence of environmental conditions on the scan results. A combination of (built-in, LED) directional and ring lighting was used to, respectively, penetrate the transparent varnish layer and create the contrast in the shadowed regions needed for the 3D imaging. The Z-axis focus-stack range (bottom-to-top) was set to 450 µm, making sure that the height variation of the complete scan area was captured within this range. An overview of the specifications can be found in Table 2.

Calibration and data processing
Prior to starting the scanning, the alignment of the axis and the planarity of movement relative to the platform (and painting) was checked using Hirox-supplied reference targets. A Hirox calibration grey card was used to set the white-balance of the microscope. XY-scale was calibrated using the Hirox-supplied, certified glass scale.
The Hirox software was used for programming the automatic acquisition of the individual tiles. The microscope created a series of images in the Z-direction, capturing each focus layer individually between the predefined top and bottom set points. These were then combined into one single all-in-focus image (multi-focus or extended depth-of-field). The microscope then moved to the next position and the multi-focus capturing procedure was automatically started again. The result for each tile is a TDR file (Hirox 3D file format), which includes an all-in-focus colour image (JPG) as well as the altitude in the Z-axis for each pixel. The raw data (the individual focus images) was not saved due to size limitations. Hirox e-Tiling software [38] was used to stitch the individual 3D tiles offline (providing identical functionality to the online stitching of the microscope), resulting in large 3D-stitched files.
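The principle of focus stacking can be illustrated with a minimal sketch. The Hirox implementation is proprietary and is not described in the text, so the focus measure used here (locally averaged squared Laplacian), the window size and the function name are all assumptions chosen for illustration only.

```python
import numpy as np
from scipy import ndimage


def focus_stack(stack, z_step_um, window=5):
    """Depth-from-focus on a (n_slices, H, W) greyscale stack.

    Slices are assumed to be ordered bottom-to-top with a constant z step.
    Returns the all-in-focus image and a height map in micrometres.
    """
    stack = np.asarray(stack, dtype=float)
    # Assumed focus measure: squared Laplacian averaged over a small window.
    sharpness = np.stack([
        ndimage.uniform_filter(ndimage.laplace(img) ** 2, size=window)
        for img in stack
    ])
    # For every pixel, keep the slice in which it appears sharpest.
    best = np.argmax(sharpness, axis=0)
    all_in_focus = np.take_along_axis(stack, best[None, ...], axis=0)[0]
    height_um = best * z_step_um
    return all_in_focus, height_um
```

In this simple form the height is quantised to the z step; commercial systems typically refine it, but how the Hirox software does so is not stated in the source.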

Pre-processing and alignment of data sets
The height data measured by the researchers was stored in an array that is sampled in µm (rather than pixels). This enables interchangeable dataset comparison. Subsequently, the data was processed by the alignment algorithm, making use of the (open-source) implementations of SciPy [39] (scientific computational libraries) and OpenCV [40] for Python. The data comparison flowchart is shown in Fig. 5. First, missing data is replaced by a linear interpolation result of the respective height map (Fig. 5a). Secondly, the data is de-trended in order to minimise mismatching due to angled scanning, orientation of the painting during measurements and bending of the painting (which is a non-rigid object). After these initial preparation steps, a region of interest (ROI) is selected in which every imaging method has data and which contains structures that show clear topology variations. Subsequently, the height arrays are aligned with a scale-invariant template matching algorithm. The sizes of the individual height arrays do not have to match with respect to the X/Y pixel dimensions (scale invariance), but the aspect ratio (AR) of the surface map must be geometrically correct. This scale invariance is a requirement, since a given area will be sampled with different resolutions by every method. We find the resampled matched template (Fig. 5d) by computing the cross-correlation coefficient and locating its maximum value, so that the overlapping areas are positioned on top of each other.
The AR with the highest cross-correlation coefficient is deemed the best match by the system and will become the overlapping area that is visualised. Finally, we again apply a de-trending step (Fig. 5e), in order to remove some local height variations in the datasets. For visualisation purposes we align the mean height value of the datasets.
Fig. 4 (caption, partially recovered): Annotated picture of the 3D digital microscope setup in lab conditions. c 3D digital microscope during imaging Girl with a Pearl Earring. In all images (1) denotes the Hirox microscope lens MXB-5000REZ, mounted on the motorised Z-axis focus block, (2) the LED lighting provided through optical fiber(s), (3) the motorised X-axis which moves the microscope, (4) the motorised Y-axis used to move the painting platform, and l the working distance to the painting.
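The pre-processing and alignment steps can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: de-trending is reduced to a least-squares plane, the normalised cross-correlation is written in plain NumPy as a stand-in for the OpenCV template matching mentioned above, and only isotropic scale factors from a user-supplied list are tried. All function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view
from scipy import ndimage
from scipy.interpolate import griddata


def fill_missing(height):
    """Replace NaN samples by linear interpolation of the valid samples.

    Samples outside the convex hull of the valid data remain NaN.
    """
    height = np.asarray(height, dtype=float)
    missing = np.isnan(height)
    filled = height.copy()
    if missing.any():
        filled[missing] = griddata(np.argwhere(~missing), height[~missing],
                                   np.argwhere(missing), method="linear")
    return filled


def detrend_plane(height):
    """Subtract the least-squares plane z = a*x + b*y + c (assumed trend model)."""
    rows, cols = np.indices(height.shape)
    design = np.column_stack([cols.ravel(), rows.ravel(), np.ones(height.size)])
    coeffs, *_ = np.linalg.lstsq(design, height.ravel(), rcond=None)
    return height - (design @ coeffs).reshape(height.shape)


def ncc_map(image, template):
    """Zero-mean normalised cross-correlation for every template position."""
    windows = sliding_window_view(image, template.shape)
    win_zero = windows - windows.mean(axis=(-2, -1), keepdims=True)
    tpl_zero = template - template.mean()
    numerator = np.einsum("ijkl,kl->ij", win_zero, tpl_zero)
    denominator = np.sqrt((win_zero ** 2).sum(axis=(-2, -1)) * (tpl_zero ** 2).sum())
    return numerator / np.maximum(denominator, 1e-12)


def match_scale_invariant(image, template, scales):
    """Resample the template at each scale; return (score, scale, (row, col))."""
    best = (-np.inf, None, None)
    for scale in scales:
        resampled = ndimage.zoom(template, scale, order=1)
        if resampled.shape[0] > image.shape[0] or resampled.shape[1] > image.shape[1]:
            continue  # template larger than the search image at this scale
        scores = ncc_map(image, resampled)
        position = np.unravel_index(np.argmax(scores), scores.shape)
        if scores[position] > best[0]:
            best = (float(scores[position]), scale, tuple(int(p) for p in position))
    return best
```

The brute-force correlation is only practical for small arrays; for full-resolution height maps an FFT-based or library implementation would be needed.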

Reference target measurement comparison
In order to compare each imaging technique's ability to faithfully reproduce the height profiles of the objects, we compared the measurement results of a reference target (Rubert&Co, sample no. 513E [41]). The reference target was an electro-formed nickel specimen, with a hard protective top layer of nickel-boron. This was one of the few measurement samples that could be found with features in the mm range, rather than the much more common µm range used for a variety of (contact) roughness measurements. The sample has four milled grooves, with expected milled depths of 1000, 500, 200 and 30 µm, and widths of 3, 2, 2 and 0.5 mm (see Fig. 6a). These dimensions are not certified, and were found to be only rough estimates of the feature sizes.
As the target was not certified, we did not compare our measurements to the provided dimensions, but rather compared them to measurements made using white light confocal profilometry (WLCP), using the Hirox Nano Point Scanner (NPS). The JYFEL NP3 measurement unit was mounted on and connected to the Hirox RH-2000 system, in a similar fashion to the 3D digital microscopy setup. As the height accuracy is declared at 150 nm, it is suitable as a gold standard for the other measurements, which are in the micron range. An overview of the specifications can be found in Table 2. The output of this system is multiple profile measurements across the grooves, resulting in a 3D surface.
The sample was scanned completely or in parts by the different measurement instruments (depending on the tile size), and local height variations were compared for the four grooves. The rectangles in Fig. 6b show the sampling locations of the comparison areas. We compare 2D areas for every technique. For the NPS scanner there are only 10 adjacent line-scans present, but this is offset by the high lateral sampling and low standard deviations found for these measurements. The data of the reference target was pre-processed and aligned in a similar fashion as the painting data.

Reference target scan results
The reference target (depicted in Fig. 6) was scanned by all four imaging systems. Generally, we found that all techniques are capable of measuring the grooves in the sample, although the 3D scanning systems need a non-cross-polarised setting to capture the projected fringes on this metallic artefact. For the four comparison regions, we calculated the standard deviation and the mean height differences between the respective region pairs. We found that the top surface of the reference target was not completely flat (also in the NPS measurement). Therefore, we fitted a curve through the NPS measurements, and corrected all the other measurements for this curvature (see Fig. 6b). Table 4 provides an overview of the (corrected) measurement results for every scanning system.
The (corrected) mean height differences for every groove, relative to the NPS measurements, are plotted in Fig. 7. In this figure the dotted lines denote the theoretical Z-resolution of the scanning systems. Note that there is no theoretical Z-resolution available for the 3D digital microscopy. We found the smallest relative errors (absolute) for the 3D digital microscopy (between 0 and 3 µm) and MS-OCT (between 0 and 6 µm), as compared to the High-Res 3D scan (between 0 and 11 µm) and Std-Res 3D scan (between 8 and 24 µm). All errors except one fall within the resolution boundaries. The exception is the mean absolute error of 6 µm found for the MS-OCT at the 500 µm groove, which lies outside the resolution boundaries of the MS-OCT.
As expected, we found that the NPS measurements have the smallest standard deviation (σ) (between 0.05 and 0.42 µm), which was in all cases close to, or more than, an order of magnitude smaller than that of the other measurements. For the 3D digital microscopy the σ lies between 3.1 and 6.3 µm, for the MS-OCT between 1.4 and 7.4 µm, for the High-Res 3D scan between 7.2 and 12.9 µm, and for the Std-Res 3D scan between 9.2 and 15.0 µm.
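The comparison quantities reported above (mean height difference between region pairs, standard deviation per region, error relative to the NPS reference, and the curvature correction) can be sketched as follows. The region layout, the second-degree polynomial for the curvature and all function names are assumptions made for illustration; the text only states that a curve was fitted through the NPS measurements.

```python
import numpy as np


def region_stats(height_um, region):
    """Mean and standard deviation of a rectangular region (tuple of slices)."""
    patch = np.asarray(height_um, dtype=float)[region]
    return float(np.mean(patch)), float(np.std(patch))


def groove_depth(height_um, top_region, groove_region):
    """Mean height difference between a top-surface region and a groove region."""
    top_mean, _ = region_stats(height_um, top_region)
    groove_mean, _ = region_stats(height_um, groove_region)
    return top_mean - groove_mean


def error_vs_reference(depth_um, reference_depth_um):
    """Signed error of a measured depth relative to the reference (NPS) depth."""
    return depth_um - reference_depth_um


def curvature_model(x_mm, top_height_um, degree=2):
    """Fit a polynomial (assumed degree 2) through reference top-surface heights."""
    return np.poly1d(np.polyfit(x_mm, top_height_um, degree))


def correct_curvature(x_mm, height_um, model):
    """Subtract the fitted reference curvature from a profile sampled at x_mm."""
    return np.asarray(height_um, dtype=float) - model(np.asarray(x_mm, dtype=float))
```

For example, `groove_depth(height_map, (slice(0, 20), slice(0, 30)), (slice(0, 20), slice(40, 60)))` would compare a top-surface patch with a groove patch of a hypothetical 2D height map in µm.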

Painting scan results
Using the MS-OCT, the painting was scanned in four sessions overnight, each time scanning an area of 200 mm × 200 mm. The four sets of 41 × 41 tiles resulted in a total scanned area of 350 mm × 400 mm (excluding overlaps). The Std-Res 3D scanner imaged the painting overnight in a completely dark environment, and was able to capture the complete painting in 4 × 8 tiles, in a little over 2.5 h. The High-Res 3D scanner captured the complete painting in 8 × 15 tiles, with a total capturing time of approximately 4.5 h. This scan was carried out during open hours of the museum, with environmental lighting. Note that the scan time did not scale linearly with the resolution between these scanners. This is due to limitations in data transfer and various differences between the two systems, including the extent of scan speed optimisation and the capturing of gloss as an additional modality (for the Std-Res 3D scan).
Table 3 (footnotes, partially recovered): [...] Fig. 8b and c. b For the OCT scanning and 3D scanners, this denotes the number of tiles captured to scan the complete painting; for the 3D digital microscopy this denotes the number of tiles within the scan areas that were compared in the results section. c Note that the tile sizes are very different between systems. d The individual images that constitute a focus stack are not saved, due to storage limitations. e This is the total data size per tile, including raw and processed image files (i.e. conversion of camera RAW format to TIFF). f This excludes any overlap between tiles. g This includes the time needed for de-trending and stitching the tiles, which was integrated in the processing of the individual tiles.

Scanned areas
Due to time restrictions, the 3D digital microscope scanned only ten regions of interest (ROI) on the painting at a high enough magnification to be relevant for 3D data comparison. As an indication, the Left Eye region was scanned in 16 × 23 tiles, and took roughly 0.5 h to scan. Details on the scan time and data size for every imaging system can be found in Table 3.
We compared the results of the four imaging systems by studying three ROIs scanned by the 3D digital microscope. The locations of the comparison areas are shown in Fig. 8. The results per scanner for these three regions are depicted in Fig. 9, where the top row shows the stitched colour image obtained by the 3D digital microscope, and the following rows show plots of the 3D data of the respective imaging devices. Figure 9a shows a small detail, the Pearl Highlight, measuring 7.0 mm × 6.4 mm, 9b a part of the yellow/green Jacket measuring 22.0 mm × 23.2 mm, and 9c the Left Eye measuring 21.6 mm × 21.0 mm. All height maps are plotted on a scale between 0 and 200 µm. All height maps depicted in Fig. 9 show relevant topographical details like (flattened) impasto, although this effect is more difficult to distinguish in 9b. Figures 9b and 9c, which cover larger areas than 9a, also clearly show the craquelure pattern of their particular regions. Figure 9b also shows an indentation, several of which can be found across the surface of the painting.

Scanning the metallic reference target
The measurements of the reference target show reasonably small errors, as compared to the highly accurate NPS measurements, and acceptable standard deviations for the features we try to capture. The results are in line with the expected measurement uncertainty of the different measurement devices. We find that the 3D digital microscopy and MS-OCT have the highest accuracy and precision, compared to the 3D scanning systems based on fringe-encoded stereo imaging. We believe that the larger error of MS-OCT at the 500 µm groove, lying outside the measurement uncertainty boundary of this device, might be attributed to a segmentation error in the data processing [33].
If this segmentation fails (layers cannot be clearly distinguished), it will introduce height map artifacts. Typically, if this effect occurs it leads to local artefacts that can be classified as outlier data. However, if this occurs along the boundary of a tile, this could lead to a stitching artefact, that propagates over multiple tiles in the height map.
A limitation of using a metallic reference target is that some imaging artefacts are introduced in the data that are rare or non-existent in the painting scan data. The reflections caused by the metal surface frequently oversaturate the MS-OCT spectrometer and this results in measurement errors. In case of scanning a (larger) metal object (automatically), the oversaturation effect might also hamper the auto-focus algorithm. Another consequence of its metallic surface is that the target does not reflect light diffusely. For this reason, both 3D scanning systems were used in a non-cross-polarised setup, which led to extra noise in the height data (stereo mis-matching features, due to specular reflections). Furthermore, with the High-Res 3D scan we experienced what looked like thin-film interference patterns, leading to further stereo mis-matching. This mis-match can be explained by the (thin) protective coating on the reference target. Finally, the measured results are only indicative of the depth determination within a single tile of the measurement technique (specifically for the 3D digital microscopy and MS-OCT, as they cannot capture the complete target within one tile).

Challenges with (in-situ) painting scanning
When we compare the scan results of the case study painting (see Fig. 9), we find that all three scanning techniques are capable of measuring its topography, capturing details at the level of individual cracks. The data sets of the three imaging techniques show broad similarities in spatial layout (XY) and height values (Z). However, the Std-Res 3D scan misses the fine craquelure details, for instance those seen on the Pearl Highlight (see Fig. 9a). Another difference between the painting scan results is the subtle variation in global shape for the small ROI plotted in Fig. 9 (e.g. relatively lighter or darker features between the maps), despite the de-trending. (Potential) reasons for these and other differences, and the subsequent difficulties with comparing the height data, are discussed in the following sub-sections.

Scanning non-rigid paintings
One difficulty with scanning and comparing data, specifically for canvas paintings, is the fact that they have a flexible substrate. Actions like unframing, handling, and clamping a canvas painting onto an easel (all typical steps in a technical examination campaign) influence the overall shape of a painting's canvas. In this case study, due to limitations imposed by the project, the painting was also moved between the different scans, inevitably leading to variations in global canvas shape. Additionally, the painting's orientation differed between the scans, which will almost certainly also have had an effect on its shape. This means that there were differences in how gravitational forces and the forces of the (easel) fixture influenced the global shape of the painting. For example, it is very likely that the central part of the painting will sag more in the horizontal orientation (of the 3D digital microscopy setup) compared to the vertical orientations (of the other three scans). These differences might be avoided in a more controlled experiment. However, a similar situation will arise when two data sets are compared over time (i.e. for monitoring purposes). For these reasons, we will have to deal with these variations to be able to make a meaningful comparison between the data sets. In both situations (comparison between different scanning techniques and comparison of scans over time) care should be taken in accounting for the global shape. If this global shape removal is done too rigorously, valuable information might be lost about the painting's shape (comparison or changes). On the other hand, if the removal is not rigorous enough, comparison might not lead to any valuable insights.
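The trade-off in global shape removal can be made concrete with a minimal sketch. It assumes the global shape is modelled as a low-order polynomial surface fitted by least squares; this is an illustrative choice, not necessarily the de-trending applied to the case study data. The function name `remove_global_shape` and the parameter `order` are hypothetical. A higher `order` removes more of the canvas shape (more rigorous), at the risk of also removing shape information of interest.

```python
import numpy as np

def remove_global_shape(height, order=2):
    """Subtract a least-squares polynomial surface from a height map.

    Illustrative de-trending only (assumed model: polynomial of total
    degree <= order). Returns (detrended, trend).
    """
    rows, cols = height.shape
    y, x = np.mgrid[0:rows, 0:cols]
    # Normalised coordinates in [-1, 1] keep the fit well conditioned.
    x = x / max(cols - 1, 1) * 2.0 - 1.0
    y = y / max(rows - 1, 1) * 2.0 - 1.0
    # All monomials x^i * y^j with i + j <= order.
    terms = [(x ** i) * (y ** j)
             for i in range(order + 1)
             for j in range(order + 1 - i)]
    design = np.stack([t.ravel() for t in terms], axis=1)
    valid = np.isfinite(height).ravel()  # skip missing data (NaN)
    coeffs, *_ = np.linalg.lstsq(design[valid], height.ravel()[valid], rcond=None)
    trend = (design @ coeffs).reshape(height.shape)
    return height - trend, trend
```

Comparing the result for `order=1` (plane removal) with `order=2` or higher on the same ROI gives a quick impression of how much of the difference between two scans is explained by global shape alone.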

Alignment and stitching (of very large datasets)
The alignment between data sets was performed using only the 3D data, aligned at the pixel level, applying only rigid transformations. A consequence of such an alignment approach is that, if a data set has lateral distortions (e.g. caused by calibration errors or imaging system imperfections), the features of the height maps will not align perfectly. Whether such systematic errors occur remains to be investigated. It also remains to be investigated whether a sub-pixel, non-rigid alignment can be achieved to improve comparison results (e.g. following the 2D alignment approach applied to other types of multi-modal painting scan data in the Bosch Research and Conservation Project [42]). Additionally, as the painting was measured at different resolutions, and the data therefore need re-sampling, sampling errors can also influence the comparison between height data.
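To illustrate what a pixel-level, translation-only alignment of two height maps involves, the sketch below uses phase correlation. This is a standard technique swapped in for illustration; it is not the scale-invariant template matching algorithm used for the case study data. It assumes both maps are already re-sampled to the same grid, differ only by a translation, and can be treated as periodic. The function name `estimate_shift` is hypothetical.

```python
import numpy as np

def estimate_shift(reference, moving):
    """Integer-pixel (row, col) shift that aligns `moving` to `reference`.

    Applying the result with np.roll(moving, shift, axis=(0, 1))
    reproduces `reference` when the two differ by a circular shift.
    """
    ref = reference - reference.mean()
    mov = moving - moving.mean()
    cross = np.fft.fft2(ref) * np.conj(np.fft.fft2(mov))
    cross /= np.abs(cross) + 1e-12  # keep phase only (phase correlation)
    corr = np.fft.ifft2(cross).real
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks in the upper half of an axis correspond to negative shifts.
    return tuple(int(p - n) if p > n // 2 else int(p)
                 for p, n in zip(peak, corr.shape))
```

Because the estimate is a single global translation, any lateral distortion in one of the data sets remains as a residual misalignment, which is the limitation discussed above.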
Furthermore, as all systems offer more than one imaging modality (such as colour information), these might also be included in the alignment algorithm, potentially leading to more robust alignment results. However, care should be taken with this, as it will almost certainly lead to a different result, which then raises the issue of determining which result is better: all are merely based on optimisation. For instance, in the case of using colour data, robustness of the algorithm to illumination differences (leading to differences in shadowing and shading) needs to be assured.
On the scale of the ROI selected from our case study painting, only the MS-OCT and 3D digital microscopy data are made up of merged tiles. For these stitched data sets, where the stitching was either based on XY-axis displacement (MS-OCT, no blending) or done by the Hirox e-tiling software (3D digital microscopy, method unknown), no obvious tiling artefacts were observed. It remains to be determined which strategy is the most suitable for tile stitching and/or blending, especially in the case when data in the overlapping regions are not in agreement with each other.
With the stitching of (very) large datasets, we also run into the issue of error propagation, in the lateral (XY) as well as axial (Z) direction. If tiles are stitched in a sequential manner, small misalignment errors (XYZ) can lead to much larger errors across the complete surface. Currently no topographical scans were made at the scale of the complete painting, or at intermediate resolutions, which might be used for a global-to-local optimisation step in alignment, (potentially) minimising the influence of local errors and the potential for error propagation. We envision that an approach like the one used in the Bosch Research and Conservation Project [42] might be extended to topographical data, also dealing with the height (Z) data.
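The growth of error in sequential stitching can be illustrated with a small simulation. It is a sketch under stated assumptions: tiles form a single chain, every seam contributes an independent, zero-mean, normally distributed alignment error, and the first tile is the fixed anchor. The function name and the value of `seam_sigma` are hypothetical and not measured values from the case study.

```python
import numpy as np

def accumulated_stitch_error(n_tiles, seam_sigma, n_trials=2000, seed=0):
    """RMS position error per tile in a sequentially stitched chain."""
    rng = np.random.default_rng(seed)
    seam_errors = rng.normal(0.0, seam_sigma, size=(n_trials, n_tiles - 1))
    # Tile k inherits the sum of the k seam errors that precede it.
    drift = np.cumsum(seam_errors, axis=1)
    rms = np.sqrt((drift ** 2).mean(axis=0))
    return np.concatenate([[0.0], rms])  # tile 0 is the anchor
```

Under these assumptions the RMS error grows with the square root of the number of seams, so a chain of 100 seams accumulates roughly ten times the single-seam error. A systematic (non-zero-mean) seam error would instead grow linearly, which is one reason a global-to-local optimisation step is attractive.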
Furthermore, we encountered issues with stitching the High-Res 3D data set, as our current approach to stitching (as used in, for instance, [20]) is limited by its memory requirements for such large data sets. Also the Hirox e-tiling software did not allow stitching the larger ROI at full resolution.
Given the currently unsolved challenges with the non-rigid nature of canvas paintings (also encountered in the case study data sets), and the various challenges related to alignment and stitching (i.e. sub-pixel, non-rigid alignment, (multi-)modal alignment considerations, stitching and blending of overlapping regions, error propagation, memory issues), we deem large-scale comparison of painting height data to be beyond the scope of this paper.

Measurement occlusions and specular reflections
All techniques have limitations in terms of measurement occlusions. Firstly, none of the methods are (currently) able to capture 'overhangs' or undercuts on a surface, meaning they cannot image underneath flaking or cupping paint, or capture the underside of overhanging impasto (not typically present in Golden Age paintings). Although MS-OCT would be able to detect these layer transitions, the segmentation for this is currently not incorporated in the data processing. It could, however, be made visible in a virtual cross-section using this data. Also the current method of data representation, as a 2D image, does not allow for representing these types of 3D structures. Furthermore, there are some limitations imposed by the imaging and illumination angles. Both the MS-OCT and the 3D scan systems image the surface under an angle (indicated by imaging angle β in Figs. 2a and 3a), whereby deep cracks or the sides of highly sloped areas (i.e. of an impasto) might be occluded from view, and therefore not measured. In the case of the 3D scan systems this might be alleviated by adding additional cameras, providing additional imaging angles. In the 3D digital microscopy, occlusions can occur in the areas where the shading or shadows created by the raking light illumination are too dark (i.e. under-exposure). This can be (partially) alleviated by additional low-intensity ring lighting. However, both the raking light illumination and the ring lighting can cause light to reflect directly into the lens, leading to over-exposure. For these pixels the 3D digital microscope will not be able to reconstruct the 3D topography, leading to missing data. Careful positioning of the raking light and a low-level ring lighting minimise this effect.

Measuring dark regions
As all imaging techniques rely on optics, they all have difficulty scanning areas where (almost) all light is absorbed, and therefore the spectral reflectance is low (i.e. dark/ black areas). In scanning very dark areas, the 3D scan systems have scanning artifacts, showing more sporadic and noisy data. Although not specifically the case in this painting, another limitation of the 3D scanning algorithm is that it relies on (some) salient features to reconstruct the topography. If scanning tiles completely lack any features (completely uniform in colour), the algorithm can fail to reconstruct the 3D shape.

Vibrations and illumination conditions during scanning
Other in-situ conditions in which Girl with a Pearl Earring was scanned also potentially influence the scanning results. For instance, vibrations were not measured during the scans and only to a limited extent controlled for. The High-Res 3D scanner and 3D digital microscope were used during opening hours with museum visitors present in the same room as the technical examination, offering multiple potential sources of vibrations affecting the scanner and/or painting. For the case of the 3D digital microscope system we expect that this will have had a limited influence, as the system (including painting) was placed on a low-vibration table and features active vibration compensation for the focus stacking. The MS-OCT and Std-Res 3D scans were both captured during the night, limiting the vibrations induced by the environment.
The fact that visitors were able to view the examination also introduced another potential source of measurement error, namely external illumination. Specifically for the High-Res 3D scan, the external lighting (reflecting off the surface) potentially influenced the height measurement, and might explain some measurement artefacts (i.e. a repeating fringe pattern showing up in some tiles).

Differences between 3D scanning techniques

Lateral and axial resolution differences
Each of the techniques and their implementation have quite different lateral (XY) and axial (Z) resolutions, where the relationship between these resolutions and the level of flexibility in choosing a resolution are also different. For the MS-OCT system the anisotropic lateral resolution is 8.53 µm × 67.56 µm/pixel and the axial sampling resolution is 3.0 µm. These resolutions are limited by the system optics for the lateral resolution and by the broadband source for the axial resolution, and are consequently decoupled. The technique therefore does offer the capability to measure at different lateral resolutions.
For the 3D scanning the lateral and axial resolution of the system is more flexible (in this case the High-Res 3D scan has a lateral resolution of 7 µm and an axial resolution of 17.5 µm, and the Std-Res 3D scan 23.1 µm and 27.5 µm respectively). The axial resolution (Z) is determined by the imaging angle (β in Fig. 3) and the lateral resolution (XY), which in turn is determined by the imaging distance and camera/lens combination. The axial resolution can be increased by either imaging at a higher magnification, or by increasing the imaging angle. However, with increasing magnification, the total depth-of-field will become smaller. This should, however, remain large enough to capture the full height variation of the surface within a single tile. The imaging angle is limited by the maximum tilt angle of the lenses. In short, the 3D scanning technique offers quite some flexibility in terms of resolution, but is limited by the minimum depth-of-field needed and by the camera and lens parameters.
For the 3D digital microscopy the ability to calculate the topography is dependent on a very small depth-of-field of all images in the focal stack. An axial resolution suitable for capturing details of a painting like Girl with a Pearl Earring can therefore only be achieved at relatively high magnifications. This means that reaching a reasonable axial resolution also results in small scanning tiles (2.1 mm × 1.31 mm here).

Scalability
The scalability of the different techniques is in part governed by the limitations in resolutions and the (related) tile size (for relative scale between the imaging systems see Fig. 10). Given that the microscope tiles are only 2.1 mm × 1.31 mm, it becomes very time consuming and data intensive to scan complete paintings. Even for a relatively small painting like Girl with a Pearl Earring, a scan at a lateral resolution of 1.1 µm would lead to an image of roughly 355 × 10³ by 404 × 10³ pixels (a 144 gigapixel image). Based on the settings as used in the case study this would take roughly 200 h to scan the complete painting. Dealing with the sheer size of data at such resolutions (also in terms of alignment and stitching) also remains challenging to this day. Both the MS-OCT and 3D scan systems can acquire the data at a faster rate (mainly related to the relative tile size), where the MS-OCT system took four nights and the High-Res 3D scan system only several hours to scan the complete surface. Here we should note that these are not commercial systems, and optimisation might still lead to even faster acquisition.
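The arithmetic behind these figures can be reproduced with a short calculation. The sketch below uses only numbers stated in this paper (painting size 39 cm × 44.5 cm, lateral resolution 1.1 µm, tile size 2.1 mm × 1.31 mm, roughly 200 h total). It assumes zero tile overlap and that the 2.1 mm tile side runs along the 39 cm side of the painting; both are simplifications, so the tile count is a lower bound. The function name `scan_budget` is hypothetical.

```python
import math

PAINTING_MM = (390.0, 445.0)  # 39 cm x 44.5 cm
PIXEL_UM = 1.1                # lateral resolution of the microscope scan
TILE_MM = (2.1, 1.31)         # microscope tile size (orientation assumed)
TOTAL_HOURS = 200.0           # rough full-painting estimate from the text

def scan_budget(painting_mm, pixel_um, tile_mm, total_hours):
    """Image size, tile count and time per tile for a full-painting scan."""
    pixels = tuple(round(side * 1000.0 / pixel_um) for side in painting_mm)
    # Tiles per side, rounded up; overlap between tiles is ignored here.
    per_side = [math.ceil(side / tile) for side, tile in zip(painting_mm, tile_mm)]
    n_tiles = per_side[0] * per_side[1]
    return {
        "pixels": pixels,
        "gigapixels": pixels[0] * pixels[1] / 1e9,
        "tiles": n_tiles,
        "seconds_per_tile": total_hours * 3600.0 / n_tiles,
    }

budget = scan_budget(PAINTING_MM, PIXEL_UM, TILE_MM, TOTAL_HOURS)
```

With these inputs the image is about 143 gigapixels, in line with the rounded figure quoted above, spread over roughly 63 thousand tiles at about 11 s per tile. Any realistic tile overlap increases the tile count and the scanning time accordingly.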

Automated scanning range
Although all systems are portable (allowing in-situ scanning), their automated scanning range differs substantially. These differences are not directly technically limited, as all imaging techniques might be fitted on larger movement axes. Of all of the techniques, the 3D digital microscope would be most demanding in terms of step size accuracy (sub-millimeter accuracy in XY), whereas for the MS-OCT and 3D scan systems a lateral stepping in the millimeter range would suffice (if relying on other tile alignment strategies in the case of MS-OCT).
An additional limitation of the current 3D digital microscope configuration is that the size of the painting to be scanned is limited by the dimensions of the surrounding frame. This could be alleviated by mounting the microscope on a vertical frame. This would also limit the amount of dust particles and fibres that settle on the painting during scanning, which are currently visible in the 3D digital microscopy data.

Aligning and focusing scanning systems
One of the difficulties with scanning paintings is the lack of a-priori knowledge of the surface shape. This is relevant on the scale of the complete painting (i.e. warping) and at a smaller scale (i.e. impasto). For the artefact to be scanned accurately, it has to be captured within the focused range (i.e. a large enough depth-of-field). The MS-OCT system currently features an auto-focusing functionality with a range of 100 mm, assuring alignment for every tile; however, due to the imaging angle, the total height difference within a single tile cannot exceed 1.87 mm. With the High-Res 3D scan the depth-of-field is approximately 1.0 mm. A careful alignment is therefore needed to assure the complete painting is in focus during scanning. Automated axial alignment might be needed to overcome larger height variations in paintings with more extensive warping. For the 3D digital microscopy the focal range needs to be set for the complete scanning area, assuring every point lies within this focal range. This is also important for the total scanning time. Setting the boundaries too wide leads to an unnecessary increase of the scanning time, while setting the boundaries too narrowly runs the risk of not capturing certain areas as they remain out of focus. Also here automated axial alignment could help to efficiently scan paintings with extensive warping.
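If a coarse height map of the painting is available beforehand (for instance from a lower-resolution scan, which is an assumption and not part of the workflow described here), the per-tile height limits mentioned above can be checked before a high-resolution scan is started. The sketch below flags tiles whose peak-to-valley height exceeds a given limit, such as 1.87 mm for MS-OCT or about 1.0 mm for the High-Res 3D scan. The function name and the tile shape in pixels are hypothetical.

```python
import numpy as np

def tiles_exceeding_range(height_mm, tile_shape, max_range_mm):
    """Return (row, col) indices of tiles whose height range is too large."""
    tile_rows, tile_cols = tile_shape
    flagged = []
    for r in range(0, height_mm.shape[0], tile_rows):
        for c in range(0, height_mm.shape[1], tile_cols):
            tile = height_mm[r:r + tile_rows, c:c + tile_cols]
            # Peak-to-valley height within this tile, ignoring missing data.
            if np.nanmax(tile) - np.nanmin(tile) > max_range_mm:
                flagged.append((r // tile_rows, c // tile_cols))
    return flagged
```

Flagged tiles could then be scanned with a wider focal range or an adjusted axial position, while the remaining tiles keep the narrower and faster settings.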

Scanning additional modalities
All three techniques offer the possibility to provide other modalities in addition to scanning the topography. MS-OCT, as is actually its main application, is capable of mapping the stratigraphy of the (semi-)transparent paint layers, providing information about varnish- and glaze-layer thickness as well as the 3D topographies of these layers. The 3D scanners and 3D digital microscopy both offer colour information in addition to topography, which for both techniques are intrinsically aligned, one-to-one data sets. MS-OCT, however, does not provide such colour information. The lack of this information can be overcome by mapping the topographical (and stratigraphy) data to a colour image, allowing enhanced data interpretation and ease of localisation on the artwork. Furthermore, the Std-Res 3D scan system is capable of measuring an additional modality of a painting's appearance, namely the spatially-varying gloss of the surface [20]. This has the potential to offer an even richer documentation of the appearance and state of a painting.

Interpretation of Girl with a Pearl Earring data
The scanning techniques described in this case study have the potential to help a conservator evaluate the topography, condition, authenticity and conservation history of an object. They provide a documentation of the topography of Girl with a Pearl Earring at one specific moment in time. This data already provides information about Vermeer's technique, and the condition of the painting.
For example, the three ROI compared as part of this case study (the Pearl Highlight, Jacket and Left Eye, see Fig. 9) contain small details that Vermeer painted more thickly than the surrounding paint (i.e. impasto). Visualising their height and topography in 3D reveals how much paint Vermeer loaded onto his brush, the rheological properties of the materials he used, and the subsequent effects of past treatments and aging. Figure 9a, c shows that the top of the white Pearl Highlight, and also of the white highlight in the Left Eye (which presumably would have been rounded or slightly pointed when Vermeer painted it), is now planar and almost level with the surrounding paint. Directly around the impasto is a 'moat', for the Pearl Highlight approximately 200 µm deep (as measured using all three techniques), which formed when the paint surrounding the highlight was displaced as the impasto was flattened. This suggests that Vermeer's original impasto would have matched or exceeded the current depth of the moat to be able to make such an impression. It appears that lining had less of an effect on the impasto with lower topography (for example, in the Jacket), as there is no perceptible moating (see Fig. 9b). This data is useful for understanding Vermeer's painting technique, but also for assessing the consequences of wax-resin lining.
The effects of using the so-called 'Pettenkofer' process (to flatten flaking paint) can also be identified, for instance in Fig. 11e, using a colour map for visual enhancement. The affected area of the forehead is deeper than the surrounding paint. We can also see in these visualisations that the cracks in the forehead appear to have soft, rounded edges, and that they appear wider than the sharp-edged cracks throughout most of the painting. These scanning techniques could therefore potentially be used to discover other regions that might have been treated using the Pettenkofer method. It should be mentioned that determining the exact shape (roundness/sharpness) of cracks is limited by the scanning resolution and the measurement occlusions imposed by the scanning systems.
Visualisations that can be created using 3D data are an improvement on the ways that conservators have traditionally been able to document and view craquelure. Figure 11b shows the cracks that were visible with the naked eye in 1994, which the conservator traced onto a transparent polyester film. In the dark background, the dark cracks were more difficult to see, which gives the impression that the cracking might be less pronounced or severe in the background than in the face. In comparison, Fig. 11d, e shows that the relative amount of cracking is much more similar when comparing the face to the background. Although raking light photographs (segments shown in Figs. 11a and 12a) are still useful for visualising the topography, they can be difficult to interpret because the colour of the paint can affect the visibility of the craquelure. With 3D scanning, the topography can be rendered as an image without the colour information, only showing the height variations (see for comparison Fig. 11d, e and detailed views in Fig. 12c-f). Virtual relighting of the topography, rendering it as a matte, white surface, allows a conservator to see the topography more clearly than on a colour image. It could be used to recognise vulnerable areas (e.g. where the paint is lifting, flaking or tenting), and these areas could be earmarked to be checked regularly.

Fig. 11 [...] (see Fig. 1a), and (b) crack tracing image (see Fig. 1b). Bottom row: rendering of 3D data from Std-Res 3D scan (c) using colour and topography data, (d) rendered as a matte, white surface, and (e) rendered using a colour map, to enhance the height variations. Note that for the renderings the height variations are exaggerated relative to the lateral scale, to increase their visibility.
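The matte, white rendering can be approximated from a height map with simple Lambertian shading. The sketch below is an illustration under stated assumptions, not the rendering pipeline used for the figures: it uses a single distant light source, ignores cast shadows and inter-reflections, and the function name, `light_dir` and `z_scale` are hypothetical. The `z_scale` parameter exaggerates the height variations relative to the lateral scale, as was done for the renderings in the figures.

```python
import numpy as np

def render_matte_white(height, pixel_size, light_dir=(-1.0, -1.0, 1.0), z_scale=1.0):
    """Shade a height map as a matte (Lambertian), white surface.

    `height` and `pixel_size` must share the same length unit.
    Returns an image with values between 0 (dark) and 1 (fully lit).
    """
    # Surface slopes along rows (y) and columns (x).
    dz_dy, dz_dx = np.gradient(height * z_scale, pixel_size)
    # Unit surface normals of the height field z = f(x, y).
    normals = np.dstack([-dz_dx, -dz_dy, np.ones_like(dz_dx)])
    normals /= np.linalg.norm(normals, axis=2, keepdims=True)
    light = np.asarray(light_dir, dtype=float)
    light /= np.linalg.norm(light)
    # Lambert's cosine law; surfaces facing away from the light are black.
    return np.clip(normals @ light, 0.0, 1.0)
```

Changing `light_dir` mimics moving a raking light around the painting, which cannot be done after the fact with a conventional raking light photograph.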

Conclusions
Based on the case study presented in this paper (3D scanning Girl with a Pearl Earring by Johannes Vermeer), we conclude that all techniques are capable of capturing the spatially-varying topographical features of such a painting at a relevant scale (i.e. capable of visualising individual cracks). However, detailed investigation of the scan results shows that the resolution of the Std-Res 3D scan is not sufficient to capture the finer cracks, which are of interest (see for instance Fig. 9a).
Measurements of a reference target also show that these techniques have sufficient accuracy and precision to measure these fine details, although possibly not enough for capturing the finest details with the Std-Res 3D scan. From these measurements we can also conclude, as expected, that multi-scale optical coherence tomography (MS-OCT) and 3D digital microscopy offer the highest accuracy and precision, as compared to the 3D scanning systems (based on fringe-encoded stereo imaging). We find that the standard deviations on the measurements of the reference target all lie within the range that would be expected based on their respective measurement uncertainty. Although the MS-OCT and 3D digital microscopy offer higher measurement accuracy and precision (in the height measurement), the single measurement areas (tiles) of these systems are very small. This affects the scanning speed and thereby limits their suitability to scan complete paintings, even at the size of Girl with a Pearl Earring (39 cm × 44.5 cm). Also the stitching of these tiles to create larger topographical maps potentially increases the measurement error.

Fig. 12 [...] (see Fig. 1a), and (b) crack tracing image (see Fig. 1b). Middle row: rendering of 3D data from 3D digital microscope scan (c) using colour and topography data, and (d) rendered as a matte, white surface. Bottom row: rendering of 3D data from High-Res 3D scan (e) using colour and topography data, and (f) rendered as a matte, white surface. Note that for the renderings the height variations are exaggerated relative to the lateral scale, to increase their visibility.
Moreover, the results show the capabilities of all four systems to combine high-resolution capture of the surface topography with a large planar scale, surpassing existing painting scanner capabilities, either on resolution or scale.
Given the level of detail captured (for instance shown in Fig. 9), we argue that, for the description of topography, the 3D data provides much richer and more complete information than conventional documentation techniques, at a level that could not be reached with existing 3D scanning systems. This moves the analysis of the topography from a subjective and (in the case of a tracing image) binary visualisation of the craquelure to an objective measurement of these fine details. We believe this to be true for all 3D scanning techniques shown in this paper. We show how such (objective) 3D data of a painting can be interpreted and used (e.g. for conservation purposes) based on case study data, exemplified by three ROI and rendered visualisations of a larger region and a detail of the painting's surface, showing topographical features like cracks, moating and effects of paint consolidation.

Future work
The development of approaches to deal with high-resolution 3D scanning data of non-rigid artefacts, like paintings, is considered a crucial next step to enable data comparison of paintings (over time). Further investigation into alignment and stitching strategies capable of large-scale, non-rigid (sub-pixel) alignment in XY and Z is also considered a pre-condition for meaningful comparison of painting data sets, such as those collected in this case study. Also the use of multi-modal scan data should be given further consideration, to potentially improve data alignment.
The applicability of 3D scanning systems for scanning complete paintings, such as discussed in this paper, might be further improved by increasing the scanning speed and automated scanning range. Furthermore, the implementation of an auto-focus or auto-alignment functionality of the focal plane for the 3D digital microscopy and 3D scanning systems will be needed to ensure accurate measurement of larger (and potentially more warped) paintings. As 3D scanning based on fringe-encoded stereo imaging appears to be the most promising in terms of scan speed to capture complete paintings, we suggest further investigation into increasing its scanning accuracy, precision and robustness, for instance by improving the calibration and imaging strategy in combination with further improving the data processing algorithms.
It is envisioned that 3D scans made at periodic intervals have the potential to monitor long-term changes that occur in an artwork: for instance, changes that might occur if there is a rapid change in temperature or relative humidity, or as a result of frequent handling or transport. Depending on the frequency of monitoring, this could provide information about the speed at which, and the conditions in which, these changes can occur. A 3D scan of an artwork created before and after a conservation treatment (or of a reconstruction made using historically appropriate materials) can also be used to reveal the effect of such treatments. These could involve methods that are still practiced (like locally flattening lifting paint with heat), and those (like wax-resin lining) that are seldom used by conservators nowadays [43]. As the 3D imaging systems used were built with standard equipment (lamps, cameras, projectors, microscope) that is well documented by the supplier, and with software based on standardised image processing and calibration procedures (using either Matlab, LabView or Hirox-supplied software), the scanners can be rebuilt in the future with the same quality. However, currently no (dedicated) software exists for data comparison (over time) that can adequately deal with the global shape variations or the data size. It also remains to be investigated if the accuracy of these systems is sufficient for monitoring paintings over time, as currently no data exists on the magnitudes of topographical changes occurring in paintings, including those kept under museum conditions. Topographical data of paintings might also be used for (more accurate) crack detection (e.g. extending studies like [7]), analysis of a painter's brushstrokes, or (automatic) identification of treatment effects/degradation issues (e.g. recognising effects of the 'Pettenkofer' process).
Documenting the painting's appearance and condition is important, as the artwork itself will inevitably degrade further in the centuries to come, unfortunately probably even up to the point of total disintegration, no matter the conservation efforts. 3D data might be used to create 3D (printed) reproductions, and could also serve as a starting point for (digital) reconstructions, showing past (and future) states of an artwork (i.e. removing or extrapolating craquelure effects).