The problem of image inpainting can be described as the task of filling in damaged (or occluded) areas of an image f defined on a rectangular domain \(\Omega\) by transferring the information available in the intact areas of the image to the damaged ones. Over the last 30 years a large variety of mathematical models solving the image inpainting problem have been proposed, see, e.g., [28, 34] for a review. In some of them, image information is transferred into the damaged areas (the so-called inpainting domain, denoted by D in the following) by using local information only, i.e. by means of suitable diffusion and transport processes which interpolate image structures from the immediate vicinity of the boundary of D into the occluded region. Such techniques have been shown to be effective for the transfer of geometric image structures, even in the presence of large damaged areas [28]. However, because of their local nature, such methods do not make use of the entire information contained in the intact image regions. In particular, they take into account neither non-local image information in terms of patterns and textures nor image contents located far away from D. For this reason, non-local mathematical models exploiting self-similarities in the whole image have been proposed [29, 30, 35, 36]. Such models operate on image patches rather than single pixels. Small patches inside D are iteratively reconstructed by comparison, with respect to a suitable distance, with patches outside D: missing patches are reconstructed by copying and pasting a closest patch (or its centre pixel) from the intact part of the image. These models have proven impressively effective in a very large variety of applications and have been rendered computationally feasible in recent years by the well-known PatchMatch algorithm [37].
The first step of any inpainting algorithm is the decomposition of the image domain into damaged and undamaged areas. This is an image segmentation problem, i.e. the decomposition of a given image into its constituent regions, cf. for instance [34]. Its solution may become very hard in the presence of fuzzy and irregular region boundaries and small-scale objects.
In the following we describe an algorithm which detects damaged areas in images with possibly large and non-homogeneous missing regions using only a few examples provided by the user. This detection is the necessary initial step for the subsequent application of a two-stage inpainting procedure, based on total variation (TV) inpainting [38] and on the exemplar-based image inpainting proposed in [36], for the reconstruction of image contents in the illuminated manuscripts in Fig. 1. Our proposed segmentation is semi-supervised, since user input is required for training, while the inpainting procedure is fully automated.
Description of the dataset
Our dataset is composed of two manuscripts made by William de Brailes in 1230-1250 and now part of the collection of the Fitzwilliam Museum in Cambridge (UK), see Fig. 1: Last Judgement in Fig. 1a and Christ in Majesty with King David playing the harp in Fig. 1b, of dimensions \(196\times 123\) mm and \(213\times 135\) mm, respectively. The images were acquired with a Leaf Valeo 22 digital back on a Mamiya RB67 body, and the resulting RAW files were processed using Leaf's own proprietary software, in which distortions and aberrations are corrected. Colour accuracy is ensured by using a customised Kodak colour separation guide with grey scale (Q13 equivalent), and the images are exported in the Adobe 98 colour space. The final output consists of very large .tif images (about \(4008\times 5344\) pixels and 47 MB each).
A semi-supervised algorithm for the detection of the damaged areas
To identify the damaged areas in the image (mainly missing gold leaf) we propose in the following a two-step semi-supervised algorithm. First, a classical binary segmentation model is used to extract a small training region, as described in “Chan-Vese segmentation” section; this region subsequently serves as input for a labelling algorithm which segments the whole inpainting domain based on appropriate intensity-based image features, as described in “Image descriptors: feature extraction” and “A clustering algorithm with training” sections.
Chan-Vese segmentation
In binary image segmentation one seeks to partition an image into two disjoint regions, each characterised by distinctive features. Typically, RGB intensity values are used to describe image contents, and mathematical image segmentation methods often compute the segmented image as the minimiser of an appropriate functional.
Let f be the given image. We seek a binary image u so that
$$u(x) = \begin{cases} c_1, & \text{if } x \text{ is inside } C, \\ c_2, & \text{if } x \text{ is outside } C, \end{cases} \tag{1}$$
where C is a closed curve. In this work, we consider the Chan-Vese segmentation functional for binary image segmentation [39], that is
$$\begin{aligned} \mathcal{F}(c_1,c_2,C) := {} & \mu\,\text{Length}(C) + \nu\,\text{Area}\left(int(C)\right) \\ & + \lambda_1 \sum_{x\in int(C)} |f(x)-c_1|^2 + \lambda_2 \sum_{x\in ext(C)} |f(x)-c_2|^2. \end{aligned} \tag{2}$$
The functional \(\mathcal{F}\) is minimised over the constants \(c_1\) and \(c_2\) and the contour C, yielding the optimal u of the form (1). Here, \(\mu,\,\nu,\,\lambda_1,\,\lambda_2>0\) are positive parameters and \(int(C),\, ext(C)\) denote the interior and the exterior of C, respectively. In (2) the first and second terms penalise the length of C and the area of the region inside C, respectively, giving control over the smoothness of C and the size of the regions. The other two terms penalise the discrepancy between the piecewise constant fit u in (1) and the given image f in the interior and exterior of C, respectively. By computing a minimiser of (2) one retrieves a binary approximation u of f.
Despite being very popular and widely used in applications, the Chan-Vese model and its extensions present intrinsic limitations. Firstly, the segmentation result depends strongly on the initialisation: in order to obtain a good result, the initial condition needs to be chosen within (or sufficiently close to) the region one aims to segment. Secondly, due to the modelling assumption (1), the Chan-Vese model works well for images whose intensity is locally homogeneous. If this is not the case, the contour C may evolve along image information different from the one we want to detect; images with a significant amount of texture, for instance, can exhibit such problems. Furthermore, the model is very sensitive to the length and area parameters \(\mu\) and \(\nu\), which may make the segmentation of very small objects in the image difficult.
For our application, we make use of the Chan-Vese model to segment a sub-region \(D_1\) of D that will serve as a training set for the classification described in the following two subsections. To do that, we simply ask the user (typically, an expert in the field) to click on a few pixels inside the inpainting domain D to identify a candidate initial condition for the segmentation model (2), which is then run to segment the sub-region \(D_1\). In Fig. 2 we show the results of this approach, with a superimposed mask of the computed region \(D_1\), for some details cropped from the original images.
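As an illustration, the training-region extraction can be sketched in a few lines of MATLAB using the Image Processing Toolbox implementation of the Chan-Vese active contour; the file name, seed coordinates, disk radius and iteration count below are hypothetical placeholders, not the exact settings of our pipeline.

```matlab
% Minimal sketch: extract the training region D1 from user clicks
% (file name, seed positions and radius are illustrative only).
f    = imread('manuscript_detail.tif');
gray = rgb2gray(f);

% Build an initial mask of small disks around the clicked pixels.
seeds = [120 340; 410 95];            % example (row, col) click positions
mask  = false(size(gray));
for k = 1:size(seeds, 1)
    mask(seeds(k,1), seeds(k,2)) = true;
end
mask = imdilate(mask, strel('disk', 10));

% Evolve the contour with the built-in Chan-Vese active contour.
D1 = activecontour(gray, mask, 1000, 'Chan-Vese');
```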
Because of the intrinsic limitations of the Chan-Vese approach, we observe that the segmentation result is not satisfactory (see, for instance, the example in the first row of Fig. 2) since it generally detects with high precision only the largest uniform region around the user selection. To detect the whole inpainting domain D in this manner, the user should in principle give many initialisation points, which may be very demanding in the presence of several disconnected and possibly tiny inpainting regions.
For this reason, we proceed differently and make use of a feature-based approach to use the area \(D_1\) as a training region for a clustering algorithm running over the whole set of image pixels. This procedure is described in the next two sections.
Image descriptors: feature extraction
In order to describe the different regions in the image in a distinctive way, we consider intensity-type features. Namely, for every pixel x in the image we apply non-linear colour transformations to compute the HSV (Hue, Saturation, Value), the geometric mean chromaticity GMCR [40], the CIELAB and the CMYK (Cyan, Magenta, Yellow, Key) values (see [41] for more details). We then concatenate all these values and store them in a feature vector \(\varvec{\psi}\) of the form
$$\varvec{\psi}(x) = \left[\, \text{HSV}(x),\ \text{GMCR}(x),\ \text{CIELAB}(x),\ \text{CMYK}(x) \,\right]. \tag{3}$$
For our purpose the feature vector (3), essentially based on RGB intensities, yielded precise segmentations. For more general segmentation purposes, one could add texture-based features and, if available, multi-spectral measurements such as infrared (IR) or ultraviolet (UV) images.
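A minimal MATLAB sketch of the feature construction (3) is given below. Note that the GMCR computation follows one common definition of geometric mean chromaticity (each channel divided by the geometric mean of the RGB channels), which may differ in detail from the definition used in [40]; the file name is a placeholder.

```matlab
% Sketch of the per-pixel feature vector (3); file name illustrative.
f    = im2double(imread('manuscript_detail.tif'));
hsv  = rgb2hsv(f);
lab  = rgb2lab(f);
cmyk = applycform(f, makecform('srgb2cmyk'));

% Geometric mean chromaticity (assumed definition): each channel is
% divided by the geometric mean of the three RGB channels.
gm   = (f(:,:,1) .* f(:,:,2) .* f(:,:,3) + eps).^(1/3);
gmcr = f ./ gm;

% Stack everything into an N-by-13 matrix Psi, one row per pixel.
N   = size(f,1) * size(f,2);
Psi = [reshape(hsv,  N, 3), reshape(gmcr, N, 3), ...
       reshape(lab,  N, 3), reshape(cmyk, N, 4)];
```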
A clustering algorithm with training
Once the feature vectors are built for every pixel in the image, we use the training region \(D_1\) detected as described in “Chan-Vese segmentation” section as a dictionary to drive the segmentation over the whole image domain. We proceed as follows (see also the sketch below). First, we run a clustering algorithm over the whole image domain, comparing the features defined in (3), in order to partition the image into a fixed number K of clusters. To do that, we use the well-known k-means algorithm. After this preliminary step, we check which cluster has been assigned to the training region \(D_1\) and simply identify which pixels of the clustered image lie in the same cluster. By construction, this corresponds to finding the regions in the image ‘best fitting’ the training region in terms of the features defined in “Image descriptors: feature extraction” section, which is our objective. After a refinement step based on erosion/dilation of the extracted regions, so as to remove or fill in possibly misclassified pixels, we can finally extract the whole area D to inpaint. We report the results corresponding to Fig. 2 in Fig. 3a, b.
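The following MATLAB sketch summarises this clustering-with-training step, assuming Psi is the feature matrix built above and D1 the logical training mask; the majority-vote label selection and the structuring-element sizes are illustrative choices.

```matlab
% Sketch of the clustering step driven by the training region D1.
K   = 35;                                  % number of clusters
idx = kmeans(Psi, K, 'Replicates', 5);     % repeated to improve accuracy
labels = reshape(idx, size(D1));

% Cluster represented in the training region: take a majority vote.
trainLabel = mode(labels(D1));

% Select all pixels in the same cluster, then clean up the mask by
% morphological opening/closing to remove or fill in stray pixels.
D = (labels == trainLabel);
D = imclose(imopen(D, strel('disk', 2)), strel('disk', 2));
```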
Inpainting models
Once an accurate segmentation of the damaged areas is available, the task becomes the actual restoration of the image contents in D by means of the information available in the region \(\Omega \setminus D\). A standard mathematical approach to such an inpainting problem consists in minimising an appropriate functional \(\mathcal{E}\) defined over the image domain \(\Omega\), i.e. in
$$\text{finding} \quad u \quad \text{s.t.} \quad u \in \operatorname*{argmin}_v\, \mathcal{E}(v). \tag{4}$$
A standard choice for \(\mathcal {E}\) in the case of local inpainting models is the functional
$$\mathcal{E}(v) = R(v) + \lambda\, \Vert \chi_{\Omega\setminus D}\,(f - v) \Vert_2^2, \tag{5}$$
where f denotes the given image to restore, \(\Vert\cdot\Vert_2\) is the Euclidean norm, \(\lambda\) an appropriately chosen positive parameter and \(\chi_{\Omega\setminus D}\) denotes the characteristic function of the non-occluded image areas, so that for every pixel \(x\in\Omega\):
$$\chi_{\Omega\setminus D}(x) = \begin{cases} 1 & \text{if } x \in \Omega\setminus D, \\ 0 & \text{if } x \in D. \end{cases}$$
The second term in (5) acts as a distance function between the given image f and the sought-after restored image u in the intact part of the image. The multiplication of \(f-u\) by the characteristic function \(\chi_{\Omega\setminus D}\) means that this term vanishes at the points of D, where no information is available, while \(f-u\) has to be as small as possible at all points of \(\Omega\setminus D\). The term R typically encodes local information (such as the gradient magnitude) and is responsible for the transfer of information inside D by means of possibly non-linear models [28, 34]. The transfer process is balanced against the trust in the data by the positive parameter \(\lambda\). A classical choice of a gradient-based inpainting model consists in choosing
$$R(v) = \Vert \nabla v \Vert_1 = \sum_{x\in\Omega} |\nabla v(x)|, \tag{6}$$
i.e. the Total Variation (TV) of v [38]. As mentioned above, such an image inpainting technique is not designed to transfer texture information; furthermore, it fails in the inpainting of large missing areas. For our purposes we use the model (5) with (6) to compute a ‘good’ initial guess with which we initialise a different approach based on a non-local inpainting procedure, as described in the following section.
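As a rough illustration, a smoothed-TV gradient descent for (5)-(6) can be sketched in a few lines of MATLAB; the step size, smoothing parameter and iteration count below are illustrative choices, not the settings used in our experiments.

```matlab
% Gradient-descent sketch for TV inpainting (5)-(6) on a grayscale
% image f with logical inpainting mask D (parameters illustrative).
u = f;  chi = ~D;                 % fidelity term active outside D only
lambda = 1000;  tau = 1e-4;  epsl = 1e-3;
for it = 1:1000
    [ux, uy] = gradient(u);
    mag  = sqrt(ux.^2 + uy.^2 + epsl^2);      % smoothed gradient norm
    curv = divergence(ux ./ mag, uy ./ mag);  % TV (curvature) term
    u = u + tau * (curv + 2 * lambda * chi .* (f - u));
end
```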
Exemplar-based inpainting
We describe here the non-local patch-based inpainting procedure studied in [30, 36] and carefully described, from an implementation point of view, in [42]. In the following, we define for any point \(x\in\Omega\) the patch neighbourhood \(\mathcal{N}_x\) as the set of points in \(\Omega\) in a neighbourhood of x. Assuming that the patch neighbourhood has cardinality n, by the patch around x we denote the 3n-dimensional vector \(P_x = (u(x_1), u(x_2), \ldots, u(x_n))\), where the points \(x_i,\ i=1,\ldots,n\), belong to the patch neighbourhood \(\mathcal{N}_x\) (each \(u(x_i)\) being a three-dimensional RGB colour vector). In order to measure the ‘distance’ between patches, a suitable patch measure d can be defined, so that \(d(P_x, P_y)\) stands for the dissimilarity between the patches around the two points x and y. We then define the Nearest Neighbour (NN) of \(P_x\) as the patch \(P_y\) around some point y minimising d.
For an inpainting application the task then consists in finding, for each point x in the inpainting domain D, the best-matching patch \(P_y\) outside D. Assuming that each NN patch can be characterised in terms of a shift vector \(\phi\) defined at every point of \(\Omega\) (i.e. assuming there exists a translation \(\phi\) which shifts any patch to its NN), the problem can be formulated as the minimisation problem
$$\min_{u,\phi}\ \mathcal{E}(u,\phi) = \sum_{x\in D} d^2\left(P_x, P_{x+\phi(x)}\right). \tag{7}$$
Heuristically, in the solution of the problem above every patch in the damaged region D is constructed so as to correspond (in the sense of the measure d) to its NN patch in the intact region \(\Omega \setminus D\). Following [42], we use the following distance:
$$d^2\left(P_x, P_{x+\phi(x)}\right) = \sum_{y\in\mathcal{N}_x} \left(u(y) - u(y+\phi(x))\right)^2. \tag{8}$$
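For square patches, the distance (8) can be computed directly as in the following MATLAB sketch, written for a grayscale image and assuming both patches lie entirely inside the image (padding is omitted); the function name and signature are hypothetical.

```matlab
function d2 = patchDist(u, x, shift, half)
% Squared patch distance (8) between the patch around x = [row col]
% and the patch around x + shift, for a (2*half+1)-sided square patch.
% Assumes both patches lie entirely inside the image u.
    r = x(1);           c = x(2);
    rs = r + shift(1);  cs = c + shift(2);
    P1 = u(r-half:r+half,   c-half:c+half);
    P2 = u(rs-half:rs+half, cs-half:cs+half);
    d2 = sum((P1(:) - P2(:)).^2);
end
```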
From an algorithmic point of view, solving the model involves two steps. The first consists in computing (approximately) the NN patch for each point in D, so as to provide a complete representation of the shift map \(\phi\); this can be computationally expensive for large images, and a PatchMatch [37] strategy can be applied to solve it efficiently. Afterwards, an image reconstruction step is performed, in which the actual corresponding patch is computed for every point in D. We refer the reader to [42] for full algorithmic details.
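To give an idea of the PatchMatch strategy, the sketch below shows a single propagation sweep over the inpainting domain, in which each pixel tries to inherit the shifts of its already-visited neighbours. The random-search step of [37], boundary padding and the check that candidate patches lie in the intact region are all omitted for brevity, and the variable names (with phi stored as an H-by-W-by-2 array of shifts) are hypothetical.

```matlab
% One propagation sweep of a PatchMatch-style NN search (illustrative).
% Assumes phi is initialised with valid shifts and that patches around
% the visited pixels stay inside the image (padding omitted).
half = 2;                                 % 5x5 patches
[H, W] = size(u);
[rows, cols] = find(D);
for k = 1:numel(rows)
    x = [rows(k), cols(k)];
    best  = squeeze(phi(x(1), x(2), :))';
    bestD = patchDist(u, x, best, half);
    for nb = {[0 -1], [-1 0]}             % left and upper neighbours
        n = x + nb{1};
        if any(n < 1), continue; end
        cand = squeeze(phi(n(1), n(2), :))';
        y = x + cand;                     % candidate NN patch centre
        if any(y - half < 1) || y(1) + half > H || y(2) + half > W
            continue;                     % candidate patch out of bounds
        end
        d2 = patchDist(u, x, cand, half);
        if d2 < bestD, best = cand; bestD = d2; end
    end
    phi(x(1), x(2), :) = best;
end
```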
A crucial ingredient for a good performance of the exemplar-based inpainting algorithm [30, 36] is its initialisation. In particular, once the inpainting domain is known, a pre-processing step in which a local inpainting model, such as the TV inpainting model (5) with (6), is run provides a rough but reliable initialisation of the algorithm.
We report the results of the combined procedure in Fig. 4 and the overall work-flow of the algorithm in the diagram in Fig. 5.
Model parameters
For the segmentation of the training region \(D_1\) within the inpainting domain D we use the MATLAB function activecontour, through which the Chan-Vese algorithm can be called. For this we fixed the maximum number of iterations to maxiter \(=1000\) and used, as stopping criterion, the default tolerance on the relative error between iterates. We used the default values for the parameters \(\mu\) and \(\nu\) in (2). The subsequent clustering phase was performed by means of the standard MATLAB function kmeans, specifying a total of \(K=35\) labels to assign; the use of such a large value of K turned out to be crucial for an accurate discrimination, and the automatic choice of K for this type of application is a matter of future research. The clustering was repeated 5 times to improve accuracy. Once the detection of the inpainting domain was completed, in order to provide a good initialisation to the exemplar-based model we ran the TV inpainting model (5) with (6) with the value \(\lambda=1000\), a maximum number of iterations maxiter2 \(=1000\) and a stopping criterion on the relative error between iterates with the default tolerance. Finally, we followed [42] for the implementation of the exemplar-based inpainting model: for this we specified 12 propagation iterations and tested different patch sizes. In order to avoid memory shortage, we restricted ourselves to patches of size \(5\times 5\), \(7\times 7\) and \(9\times 9\).
The numerical tests were performed on a standard MacBook Pro (Retina, 13-inch, Early 2015), 2.9 GHz Intel Core i5, 8 GB 1867 MHz DDR3 using MATLAB 2016b.
Discussion and outlook
We proposed in this section a combined algorithm to retrieve image contents from the two images of illuminated manuscripts shown in Fig. 1, where very large regions have been damaged. First, our algorithm computes an accurate segmentation of the inpainting domain by means of a semi-supervised method exploiting distinctive intensity features in the image. Then, taking the segmentation result as input, the procedure continues with an exemplar-based inpainting strategy (upon suitable initialisation) by which the damaged regions are filled.
The results reported in Figs. 4 and 6 confirm the effectiveness of the proposed combined method. In particular, when comparing standard local (TV) image inpainting with the exemplar-based method, we immediately appreciate the higher reconstruction quality of the latter in the damaged regions, especially in terms of texture information. The method has been validated on several image details extracted from the entire images and has been shown to be effective also for very large image portions with highly damaged regions.
In terms of computational time, the segmentations in Fig. 3 are obtained in approximately 15 min. The inpainting results in Fig. 4 are obtained in about 3 min for patches of size \(5\times 5\) and about 7 min for patches of size \(7\times 7\). Overall, the whole task of segmenting and inpainting the occluded regions takes approximately 20 min for an image of size \(690\times 690\). These timings, however, depend strongly on the size of the image, the size of the inpainting domain and the size of the patches chosen.
Future work could address the use of different features for the segmentation of the inpainting domain with similar methodologies, such as, for instance, texture features [43]. Furthermore, at the inpainting level, we observe that the reconstruction of fine details in very large damaged regions (such as the strings of the harp in Fig. 6) is very challenging due to the lack of corresponding training patches in the undamaged region. To solve this problem, a combination of exemplar-based and local structure-preserving inpainting models could be used.