The geometry of colors in van Gogh’s Sunflowers

“Paintings fade like flowers”: van Gogh’s prediction on the impact of age on paintings came true for most of his paintings. We have studied the consequences of this aging on the Sunflowers in a vase with a yellow background series, namely its original, F454, currently in London, and two replicates, F457, in Tokyo, and F458, in Amsterdam, which van Gogh painted using the original as a model. The background and flower renditions in those paintings have faded and turned brown, making them less vibrant than van Gogh had most likely intended. We have attempted to restore van Gogh’s intent using a computational approach based on data science. After identifying regions of interest (ROIs) within the three paintings F454, F457, and F458 that capture the flowers, the stems of the flowers, and the background, respectively, we studied the geometry of the color space (in RGB representation) occupied by those ROIs. By comparing those color spaces with those occupied by similar ROIs in photographs of real sunflowers, we identified shifts in all three color coordinates, R, G, and B, with the positive shift in the blue coordinate being the most salient. We propose two algorithms, PCR-1 and PCR-2, that correct this shift in blue and generate representations of the paintings that aim to restore their original condition. Reducing the blue component in the yellow hues leads to more vibrant and less brownish digital renditions of the three Sunflowers in a vase with a yellow background.


Introduction
Vincent van Gogh's series of paintings of Sunflowers are among his most famous creations. There are two such series: the first, executed in Paris in 1887, depicts sunflowers lying on the ground, while the second was made later in Arles and represents sunflowers in a vase. Van Gogh made four versions of the Arles series in 1888, with different flower arrangements and different backgrounds: turquoise for the first version, F453, royal blue for the second version, F459, blue-green for the third version, F456, and yellow for the fourth version, F454. Note that we use here the recognized classification of van Gogh's paintings, where F stands for the De La Faille catalogue of these paintings. A year later van Gogh made one repeat of the third version (F455), and two repeats of the 4th version (F458 and F457). The authenticity of the second repetition of the 4th version, F457, has been questioned, although experts now believe it is authentic [1]. These three repetitions and their two originals are usually referred to as the Sunflowers in a vase, where the number of sunflowers varies even in van Gogh's own recollection as described in his correspondence [2]. They are all currently housed in different museums: the third version, F456, is in the Neue Pinakothek museum in Munich, Germany, and its repetition, F455, is in the Philadelphia Museum of Art, United States; the fourth version, F454, is at the National Gallery in London, England, while its repetitions, F457 and F458, are at the Seiji Togo Memorial Sompo Japan Museum of Art, Tokyo, Japan, and at the Van Gogh Museum, Amsterdam, respectively. Those paintings are rarely reunited, with possibly the latest occurrence being an exhibit at the National Gallery in London in 2014 where the original and one repetition of the 4th version were shown side by side. Interestingly, however, it is possible to see them virtually in the same room in a virtual 360° exhibition commented by Vincent van Gogh's grandson (https://www.facebook.com/VanGoghMuseum/videos/10159187334010597/).
In this paper we focus on the fourth version, F454, and its two repetitions, F457 and F458, in which the vase with sunflowers is portrayed against a yellow background. This version is considered a study of variations on the theme of yellow, with the aim of achieving a light-on-light effect [1,3]. Pictures of those three paintings are shown in Fig. 1.
Van Gogh's Sunflowers appeal to many. The fascination exerted by van Gogh's stay in the south of France in 1888-1889, his friendship and fallout with Paul Gauguin, his mental health, and the seven sunflower canvases he painted during that time remain a tremendous source of inspiration in many forms of popular entertainment as well as for artists, museum curators, and scientists in general. Biologists have studied how bees interact with those paintings [4] and the genetic fabric of the sunflowers [5], while doctors have attempted to associate his medical conditions and related treatments with his perception of colors [6][7][8]. Of direct relevance to the paintings themselves, scientists and museum curators have used a broad array of traditional and state-of-the-art techniques to look at and below the paint surface itself [9][10][11][12][13][14][15][16][17]. Of special interest is the recent book Van Gogh's Sunflowers Illuminated, published by the Van Gogh Museum in Amsterdam, which summarizes the results of the research undertaken by an international team of scientists, curators, and art historians aimed at comparing the F454 Sunflowers from the National Gallery in London with one of its repetitions, F458 from the Van Gogh Museum [18]. In this paper, we approach the same problem of comparing F454 with its repetitions F457 and F458 from a very different perspective. Instead of analyzing the canvases and the paint surfaces directly, we study high resolution images of the paintings using a data science approach.
The availability of large collections of digital images of paintings has opened the door to the use of state-of-the-art supervised machine learning techniques in art, for understanding the artist's style [19], for classifying paintings [20], for detecting forgeries [21], and possibly even for creating art [22]. Our approach to analyzing one of the Sunflowers paintings and two of its repeats differs as it is unsupervised. We consider a painting based on a high-resolution image of it. This image is a collection of pixels, with each pixel characterized by its location and color. The color is quantified based on the RGB color model. This is an additive color model in which red, green, and blue lights are added together to reproduce a color. In a digital image, the amount of each of R, G, and B is discrete, usually an integer in the range [0, 255]. As such, a pixel in an image belongs to the discrete color space [0, 255]^3. Our analysis of F454, F457, and F458 amounts to comparing their representations in this discrete color space. We are particularly interested in the sunflowers themselves, with their yellow crowns and stems, as they are at the heart of the three paintings.
Fig. 1 Pictures of the three paintings of Sunflowers in a vase with a yellow background from Vincent van Gogh: A the fourth version, F454, painted in 1888 in Arles and currently at the National Gallery in London, and its two repetitions, also painted in Arles in 1889, B F457, currently in the Seiji Togo Memorial Sompo Japan Museum of Art in Tokyo, and C F458, at the Van Gogh Museum in Amsterdam
We acknowledge that our analyses may be somewhat subjective. We only use indirect representations of the three paintings by van Gogh, namely digital images of those paintings. While those images are high resolution, it is not impossible that there are some color distortions, although those are expected to be small with modern digital cameras (this will be addressed briefly in the following). More importantly, the paintings themselves are more than 130 years old and the modern images we have of those paintings certainly differ from their original renderings. It is interesting that van Gogh himself was well aware that paintings age, as he wrote to his brother on April 30th 1889 that "paintings fade like flowers" [2]. Many studies have focused on the chemistry of aging of van Gogh's paintings [9][10][11][12][13][14][15][16], as well as on attempts to predict the evolution of the appearance of those paintings [23][24][25]. With this as a background, there are two main directions that we have followed in our analyses of the digital images of F454, F457, and F458:
i) Identify markers of aging from the color distributions in the digital images. We were able to identify shifts in the red, green, and blue components of the images that are most likely a result of aging. The most significant shift is observed for the blue component of yellow hues, leading to a browning and fading of those colors. Using this shift, we propose pseudo-color reconstruction schemes that enable us to generate model images of the original paintings. We note that the same procedure can be used to extrapolate to models of the paintings in the future.
ii) Compare and contrast the use of colors in the original version, F454, and the two repetitions, F457 and F458, of the Sunflowers in a vase with a yellow background. In particular, we identify differences between the repetitions that most likely reflect different intents by van Gogh.
We organize the paper as follows. Major findings are shown in the "Results and discussion" section. In particular, we provide an in-depth analysis of the RGB color spaces of the digital images corresponding to the three paintings, F454, F457, and F458. For clarity, we will refer to F454 as the "London version", F457 as the "Tokyo version", and F458 as the "Amsterdam version". By comparison with recent images of actual sunflowers, we identify significant shifts in the blue components, B, of the images of those paintings and use those shifts to derive pseudo-color reconstruction (PCR) schemes that attempt to restore the original images. We also propose a color transfer scheme that enables us to compare the three paintings. In the "Conclusion" section we briefly discuss the benefits of our approach as well as future directions of research. Finally, in the "Materials and methods" section we provide technical details on the images we have used and on the statistical methods and algorithms that we have implemented.

Results and discussion
Basic statistics on the color content of the three digital images of the Sunflowers paintings
A digital image is composed of basic image elements, i.e. pixels. Each pixel is characterized by its position in the image and its color. In the RGB model, a color is described by 3 components, also called coordinates, representing the amounts r, g, and b, of red, green, and blue that need to be combined to generate that specific color. Each coordinate is an integer value taken in the range [0, 255]. There are therefore 256^3 (approx. 17 million) possible colors. We have used high resolution images for all three Sunflowers paintings, namely the London version, the Tokyo version, and the Amsterdam version. We are aware that there might be some color distortion in those images, as it is likely that they were taken with different cameras. In addition, we cannot exclude an effect of the resolution. While it is difficult to assess the importance of the cameras, we can at least assess the importance of the resolution. We generated the distributions of the values for the red, green, and blue coordinates over the three high resolution digital images of the three paintings, as well as for corresponding low resolution images (see "Materials and methods" section for details on how to generate those distributions). Results are shown in Fig. 2. The low resolution images were taken from the museum websites directly. For the London version, the high resolution image has 3349 × 4226 pixels, while its corresponding low resolution image has 629 × 800 pixels. Similarly, the high resolution image of the Tokyo version has 3626 × 4829 pixels, while its corresponding low resolution image has 450 × 602 pixels, and the high resolution image of the Amsterdam version has 3324 × 4226 pixels, while its corresponding low resolution image has 609 × 800 pixels.
It is interesting that despite those significant differences in resolution, the distributions of red, green, and blue are very similar (although admittedly not identical) for the high and low resolution images of each of the three paintings, while they differ when comparing the paintings themselves. This gives us some confidence that we can perform such comparisons. In the remainder of the paper, we will use only the high resolution images.
As mentioned above, the distributions of red, green, and blue differ between the three paintings (see Fig. 2). While the distributions of red and green values are relatively similar over the three images, with values that concentrate in the range [100, 255], the distributions of values of the blue coordinate differ significantly. In the Tokyo version there are essentially no pixels whose blue intensity is below 60; in other words, nearly all its pixels have some blue component. In contrast, in the Amsterdam version more than 5 % of the pixels have a blue coordinate close to zero, that is, no blue component. Clearly, the contribution of blue differs between the paintings. This will be discussed in detail below.
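The per-channel distributions discussed above can be illustrated with a minimal Python sketch. It assumes that an image is already available as a non-empty list of (r, g, b) integer tuples; the function names `channel_distributions` and `fraction_below` and the toy pixel values are hypothetical and are not taken from the "Materials and methods" section.

```python
from collections import Counter


def channel_distributions(pixels):
    """Return normalized histograms [hist_r, hist_g, hist_b] over values 0..255.

    `pixels` is assumed to be a non-empty list of (r, g, b) integer tuples.
    """
    n = len(pixels)
    hists = []
    for c in range(3):
        counts = Counter(p[c] for p in pixels)
        hists.append([counts.get(v, 0) / n for v in range(256)])
    return hists


def fraction_below(hist, threshold):
    """Fraction of pixels whose coordinate is strictly below `threshold`."""
    return sum(hist[:threshold])


# Toy data (made up for illustration): four yellowish pixels.
toy_pixels = [(255, 200, 0), (250, 210, 60), (240, 190, 0), (200, 180, 90)]
hist_r, hist_g, hist_b = channel_distributions(toy_pixels)
low_blue = fraction_below(hist_b, 60)  # share of pixels with blue < 60
```

A statement such as "essentially no pixels have a blue intensity below 60" then corresponds to `fraction_below(hist_b, 60)` being close to zero for the image considered.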
Interestingly, in the high resolution digital images of the London, Tokyo, and Amsterdam paintings, we observe only 2 %, 5 %, and 3 % of those 256^3 colors, respectively, indicative of a relatively low diversity of colors in the three paintings. While this may not be surprising as all three paintings are predominantly yellow (see Fig. 1), it is noteworthy that the color diversity differs significantly between the paintings. Indeed, the Tokyo version and the Amsterdam version are defined as repetitions of the London version and therefore we could expect more similarities.
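The color diversity quoted above is the fraction of the 256^3 possible RGB colors that actually occur in an image. A minimal sketch of that count is given below, again assuming the image is a list of (r, g, b) tuples; `color_diversity` is a hypothetical helper name and the toy pixels are made up.

```python
def color_diversity(pixels):
    """Return (number of distinct colors, fraction of the 256**3 possible colors)."""
    distinct = len(set(pixels))
    return distinct, distinct / 256**3


# Toy data: three pixels, two of which share the same color.
n_distinct, fraction = color_diversity([(1, 2, 3), (1, 2, 3), (4, 5, 6)])
```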
We looked also at the association between pairs of fundamental colors in all three paintings, by computing their relative conditional entropy HR (see "Materials and methods" section for details). The conditional entropy HR measures the amount of information shared between two colors. Our definition of HR places it in the interval [0, 1]; it is equal to zero if the two colors are fully determined by each other, while conversely it is equal to 1 if the two colors are completely independent. In Fig. 3 we display the values for the nine pairs of colors in the form of tables for the three images corresponding to the London, Tokyo, and Amsterdam versions. As expected, a color compared to itself leads to a conditional entropy of zero. In contrast, the HR values for pairs of different colors reveal very low dependence between them. There are, however, noticeable differences between the three images.
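The exact definition of HR is given in the "Materials and methods" section and is not reproduced here. As a hedged illustration only, the sketch below assumes one normalization that has the stated properties, namely H(X|Y) / H(X) with H(X|Y) = H(X, Y) - H(Y): it equals 0 when X is fully determined by Y and 1 when X and Y are independent. The function names are hypothetical, and the convention of returning 0 for a constant channel is also an assumption.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def relative_conditional_entropy(x, y):
    """Assumed normalization: H(x | y) / H(x), a value in [0, 1].

    `x` and `y` are equal-length lists of channel values (e.g. the red and the
    blue coordinates of the same pixels).
    """
    h_x = entropy(x)
    if h_x == 0.0:
        return 0.0  # assumed convention: a constant channel carries no uncertainty
    h_xy = entropy(list(zip(x, y)))  # joint entropy H(x, y)
    return (h_xy - entropy(y)) / h_x


# Toy channels: `a` and `b` are independent by construction.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
hr_independent = relative_conditional_entropy(a, b)
hr_self = relative_conditional_entropy(a, a)
```

Under this assumed definition, a channel compared with itself gives 0, in line with the diagonal entries described above.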

Color distributions in the background, the sunflower crowns, and the sunflower stems of the Sunflowers in a vase with a yellow background
Visual inspections show differences between the London version and the Tokyo and Amsterdam versions, respectively, especially when it comes to their colors. We see that the green color of the stems is similar to the expected green for stems of real sunflowers. On the other hand, the yellow flowers are darker than the expected vibrant yellow of the petals of a real sunflower, especially in the London version (see Fig. 1). There are many possible reasons for this darker color of the flowers. First, it could have been Van Gogh's intent. It is known that Van Gogh painted his series of paintings on sunflowers using real sunflowers as models. He painted the London version in late August 1888, when sunflowers usually start to fade [26]. Second, van Gogh used the chrome yellow pigment to color the sunflowers and it is known that this pigment darkens as it ages [9,10,27]. Finally, we analyze the colors of the paintings using digital images of those paintings. The images are influenced by the lighting that was used at the time they were taken, which may influence our perception of their colors (see discussion above on the importance of resolution, as well as reference [25]). It is therefore difficult to isolate a specific origin, especially as the reasons mentioned above are not mutually exclusive. We propose a data-driven approach based on the analysis of the color space of the three paintings to provide some quantitative elements that describe the differences between the three paintings, and their differences with real-life sunflowers.
The Sunflowers in a vase with a yellow background paintings include 15 sunflowers with their stems in a vase standing on a table, with a yellowish background. For each painting, we identify three regions of interest (ROIs). An ROI is defined according to its position in the image as well as its color consistency. The first two ROIs relate to the flower crowns of the sunflowers (mainly yellow, yf-ROI) and the stems of the sunflowers (green, g-ROI), while the third ROI refers to the background (bg-ROI), which includes the table, the vase, and the region behind the sunflowers. In parallel, we analyzed two recent photographs of sunflowers (see Fig. 12), from which we extracted the corresponding flower ROIs and stem ROIs. Extractions of the ROIs were performed as described in the "Materials and methods" section. Each ROI is defined as a set of pixels, characterized by their locations and colors, with the latter given in the RGB space with three discrete coordinates with values in the range [0, 255].

The yellow flowers of the Sunflowers in a vase with a yellow background
We first compared the ROIs associated with the yellow parts of the sunflowers of the three paintings. Fig. 4 shows the distribution of colors of the pixels associated with those ROIs in the red-green, red-blue, and green-blue planes. In the red-green plane, most pixels are found to follow the first diagonal, consistent with the general yellow color of those ROIs. Interestingly, the original, namely the London version, and the Amsterdam repetition are found to be very similar. Both differ from the Tokyo repetition. The differences with the Tokyo version are striking in the red-blue and green-blue planes, as blue appears with a wide range of values in the London and Amsterdam versions, from 0 to 120, while it is always present in the Tokyo version, appearing in a much smaller range centered around 60.
While there are differences between the three paintings in the diversity of color of their yellow sunflowers, there are even bigger differences when we compare those sunflowers with real sunflowers, as observed by comparing Fig. 4 with the corresponding color distributions for the photographs SF1 and SF2:
i) The yellow of real sunflowers contains a minimal amount of blue, as illustrated in the blue-red and blue-green planes for the two photographs SF1 and SF2. In contrast, as indicated above, the yellow observed in the sunflowers of the paintings contains a significant amount of blue.
ii) The yellow of the flowers of real sunflowers contains a wide range of green covering nearly the whole spectrum of possible values from 0 to 250. In contrast, the green in the yellow of the flowers in the paintings is limited to the range [50, 250] in all three paintings.
iii) Similarly, the yellow of the flowers of real sunflowers contains a wide range of red with values between 20 and 255. In contrast, the red in the yellow of the flowers in the paintings is limited to the range [100, 250] in all three paintings.

The stems of the Sunflowers in a vase with a yellow background
We repeated the analysis described above on the ROIs corresponding to the green stems of the sunflowers, both for the three paintings of sunflowers and for the two photographs of real sunflowers. Results are shown in Figs. 6 and 7, respectively. Comparisons of those two figures lead to the following observations:
i) Among the two copies, namely the Tokyo version and the Amsterdam version, the latter is the closest to the original (the London version). This result is the same as what was observed for the yellow parts of the petals and crowns of the sunflowers. Differences associated with the Tokyo version again come from a smaller range of blue coordinates for the pixels representing the stems.
ii) On average, the pixels in the green ROI of the Tokyo version contain more blue than the corresponding pixels in the London version and the Amsterdam version. This is visible in the paintings themselves, as the stems in the Tokyo version appear with a darker green (a consequence of the addition of blue) than those of the London and Amsterdam versions.
iii) There is a stronger linear relationship between red and green coordinates for the stems of the real sunflowers, compared to the stems in the three paintings.
iv) Apart from the stronger linear relationship indicated above, the colors of the stems of the painted sunflowers qualitatively resemble the colors of the stems of real sunflowers, as captured by photographs.
[Figure caption fragment] Each dot represents one pixel and its color (grayscale) suggests the intensity of its red or green or blue coordinate, ranging from low (white) to high (black). In each panel, from left to right, we display red vs green (blue in grayscale), red vs blue (green in grayscale), and green vs blue (red in grayscale)

The backgrounds of the Sunflowers in a vase with a yellow background
Finally, we compared the backgrounds of the three paintings in Fig. 8. The background ROIs are expected to be the most diverse as they include multiple parts of the paintings. In addition, visual inspections of those paintings clearly indicate significant differences (see Fig. 1): the background of the London version is globally pale yellow, while those of the Tokyo and Amsterdam versions are more green-yellow and solid yellow, respectively. Those differences are reflected in the differences between the RGB space occupied by the pixels of the background, as illustrated in Fig. 8. The background ROI within the London version shows a large number of pixels with large R, G, and B coordinates; those pixels will show as close to white. In contrast, a significant number of pixels in the background of the Tokyo version have large G coordinates, consistent with a green coloration. Finally, the background of the Amsterdam version includes many pixels with high R and G coordinates, and low B coordinate, consistent with a solid yellow.

Why those differences between the paintings, and between the painted Sunflowers and real sunflowers?
As mentioned in the introduction of this section, there are possibly three main reasons for differences between the Sunflowers paintings themselves, and between the paintings and real sunflowers. Those reasons are associated with the painter's intent, with the aging of the paintings, and with artifacts of the digital images we consider. We exclude the last of these as we believe that such artifacts are minor compared to the two other reasons [25]. The analyses we have presented above provide elements that highlight the importance of aging. The three Sunflowers paintings currently in London, Tokyo, and Amsterdam are more than 130 years old and as such they have been aging. They have been subject to color degradation and deterioration as the color pigments are constantly exposed to light, humidity, pollution, and microbial contamination [13,28]. It is known that Van Gogh used commercial oil paints for his paintings. In a letter to Arnold Koning, dated January 1889 and believed to refer to the London version of the Sunflowers, van Gogh described them as being "painted with the three chrome yellows, yellow ochre and Veronese green and nothing else" (letter 740 [2]). Recent X-ray fluorescence spectrometry analyses of the London version [29] and of the Tokyo version [14] revealed that van Gogh was indeed using chrome yellows to render yellow in those paintings; it is very likely that he was using the same pigments for the Amsterdam version. The aging of chrome yellow pigments used in paintings has been studied in detail [9-11, 13, 27], including studies on van Gogh's Sunflowers [15]. Those studies have highlighted that the lightest hues in the chrome yellow family contain sulfate groups, which reduce the pigments' stability under light: bright yellow on canvases then turns to brownish green. Our comparisons of the yellow observed for real sunflowers and the yellow visible in the sunflowers in the three paintings are consistent with those observations.
In particular, we observe an increase in the amount of blue for the yellow pixels associated with the flowers in those paintings, leading to a less vibrant yellow that may even look brown. In addition, we observe a shift to an increased amount of green for those pixels (as none of them have green coordinates below 50), while the distribution of green in the yellow of real sunflowers covers the whole spectrum. This increase in blue and green is consistent with the yellow colors appearing more brownish green. While the differences in the yellow hues of the painted Sunflowers compared to the yellow hues in real sunflowers are explicit for all three paintings, highlighting aging of those paintings, we have also observed significant differences between the Tokyo repetition and the two other paintings, the London version (the original) and the Amsterdam version (the other repetition). This is most likely the intent of van Gogh, as proposed by Bakker and Riopelle [26]. Indeed, van Gogh painted the Tokyo version not based on real sunflowers but based on another work of art, the London version (van Gogh painted the Tokyo version in January when there were no sunflowers available). In a letter to his brother Theo (letter 736, [2]), van Gogh mentioned that this repetition was meant to be "equivalent and identical", although it was clear that this referred to the subject (the vase and sunflowers), and not to details and colors.
Van Gogh pushed chromatic intensity even further, with the aim of achieving a radical light-on-light effect, such that the green stalks of the flowers contrast even more strikingly with the various yellows than in the original, the London version. Those differences remain despite the aging of the paintings. In contrast, our analyses show that the Amsterdam version appears closer to the original, the London version, than the other repetition, the Tokyo version.
Can we remediate aging of colors in paintings digitally?
Paintings age and consequently do not look today as they were originally designed by the artists. This is most evident at the level of the colors, which undergo fading and sometimes changes in hue, such as yellow turning into brown. While this aging is inevitable, there is great interest among curators in recreating the artists' original colors to enhance the experience of museum visitors. Paintings, however, cannot be physically restored to their original colors and only reconstructions offer the possibility of recreating their appearance as intended by the artist.
Over the past few years, digital reconstructions of several paintings by van Gogh and other post-impressionist artists have been published, including investigations of van Gogh's series of The Bedroom [30], The Starry Night [31], Undergrowth with Two Figures [32], Irises [33], Roses [33], and Fields with Irises near Arles [25]. Some of those reconstructions rely on identifications of regions within the paintings that contain pigments that may have aged, using X-ray fluorescence, followed by digital reconstruction using software that can manipulate images (see for example the digital reconstruction of van Gogh's "Undergrowth with Two Figures" [32], or applications of software for optimizing the display of images on mobile devices [34]). The analyses of the RGB space occupied by the pixels of the paintings Sunflowers in a vase with a yellow background provided above suggest two other methods for digital reconstruction, namely color transfer and color correction, which we describe below.

Restoring colors in van Gogh's Sunflowers using color transfer
One approach to correcting aging in a painting is to transfer color from a recent representation of an object onto the region representing that object in the painting. For example, we can transfer the yellow color from live sunflowers observed in photographs to the regions representing sunflowers in one of the paintings. We illustrate this process in Fig. 9 for the London version, namely the original Sunflowers in a vase with a yellow background.
Fig. 7 The RGB space occupied by the pixels of the green stem ROIs of the photographs SF1 (A) and SF2 (B) of real sunflowers. Each dot represents one pixel and its color (grayscale) suggests the intensity of its red or green or blue coordinate, ranging from low (white) to high (black). In each panel, from left to right, we display red vs green (blue in grayscale), red vs blue (green in grayscale), and green vs blue (red in grayscale)
We start with the ROI corresponding to the yellow flowers of the London version as well as the corresponding ROI of the yellow flowers in the photograph SF1. We apply hierarchical clustering on the pixels of each of those ROIs, using the Euclidean distance between their RGB coordinates as a metric, and complete linkage (see "Materials and methods" section). The hierarchical clustering generates a tree; we represent the leaves of that tree with vertical bars whose colors are the colors of the corresponding pixels. Fig. 9A and B show the corresponding complete color bars for the yellow flower ROI of the London version, and for the yellow flower ROI of SF1, respectively. In the color bar associated with the London version, we identify the cluster with the darkest hues of yellow and label the corresponding pixels as F454_A.
The positions of those pixels within the London version are highlighted in red in Fig. 9C. Similarly, we isolate the region with the brightest hues of yellow in SF1 and label the corresponding pixels as SF1_A. Finally we transfer the color associated with SF1_A onto F454_A using the transfer algorithm described in the "Materials and methods" section. The result of the transfer is shown in Fig. 9E, to be compared with the original, the London version, shown in Fig. 9D. As expected, the sunflowers in the modified London version are much brighter.
[Figure caption fragment] ... includes the table, the vase, and the region behind the flowers. Each dot represents one pixel and its color (grayscale) suggests the intensity of its red or green or blue coordinate, ranging from low (white) to high (black). In each panel, from left to right, we display red vs green (blue in grayscale), red vs blue (green in grayscale), and green vs blue (red in grayscale)
The color transfer strategy presented above can be expanded to play with color contrasts within the painting. We illustrate this concept by modifying the background of the flowers in the London version, as illustrated in Fig. 10. Van Gogh painted many variations of the sunflowers, with backgrounds varying from pale to deep blue and yellow as he explored chromatic effects in the juxtaposition of those backgrounds with the yellow flowers of the sunflowers [26]. We decided to modify at least part of this background to see how it visually impacts our perception of the paintings, using the London version as a support. We first identified pixels in the background of the London version that are located mostly behind the sunflowers and have light yellow hues. Those pixels are referred to as F454_B (see Fig. 10C). We then selected pixels in the real sunflowers depicted in the photograph SF1, which correspond to the dark yellow part of the flower crown. Those pixels are referred to as SF1_B.
Finally we transfer the color associated with SF1_B onto F454_B using the transfer algorithm described in the "Materials and methods" section. The result of the transfer is shown in Fig. 10E, to be compared with the original, namely the London version shown in Fig. 10D. As expected, the sunflowers in the modified London version appear darker than in the original, as they are now placed in the context of a darker background.
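The two steps of this procedure, clustering the colors of an ROI and transferring color from one cluster to another, can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used for Figs. 9 and 10. The clustering is a naive agglomerative scheme with Euclidean distance and complete linkage, practical only for a small palette of colors. The transfer step is a stand-in that shifts the target pixels so that their mean color matches the source mean; the actual transfer algorithm is the one described in the "Materials and methods" section. All function names and the toy palette are hypothetical.

```python
import math


def complete_linkage(colors, n_clusters):
    """Naive agglomerative clustering of RGB tuples (Euclidean, complete linkage)."""
    clusters = [[c] for c in colors]

    def dist(a, b):
        # Complete linkage: largest pairwise distance between the two clusters.
        return max(math.dist(p, q) for p in a for q in b)

    while len(clusters) > n_clusters:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]  # merge the two closest clusters
        del clusters[j]
    return clusters


def mean_color(pixels):
    """Per-channel mean of a list of (r, g, b) tuples."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))


def transfer_mean(target, source):
    """Stand-in transfer: shift `target` pixels so their mean matches `source`."""
    mt, ms = mean_color(target), mean_color(source)
    return [
        tuple(min(255, max(0, round(p[c] - mt[c] + ms[c]))) for c in range(3))
        for p in target
    ]


# Toy palette (made up): two brownish yellows and two bright yellows.
palette = [(200, 160, 60), (205, 165, 65), (250, 220, 10), (255, 225, 5)]
clusters = complete_linkage(palette, 2)
dark = min(clusters, key=lambda c: sum(mean_color(c)))    # darkest cluster
bright = max(clusters, key=lambda c: sum(mean_color(c)))  # brightest cluster
brightened = transfer_mean(dark, bright)
```

In this toy example the darkest cluster plays the role of F454_A and the brightest cluster the role of SF1_A; the shifted colors keep their internal variation while losing most of their blue.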

Restoring colors in van Gogh's Sunflowers using color correction
Possibly the most striking difference we observed when comparing the hues of yellow in the flower depictions of the three versions of Sunflowers in a vase with a yellow background with the hues of yellow in real sunflowers observed in modern photographs is an increase in the blue coordinate of those yellow hues in the paintings, leading to a less vibrant and brownish yellow, in agreement with chemical analyses of the aging of chrome yellow [9-11, 13, 27]. This observation hints at an opportunity to restore algorithmically the original colors of the Sunflowers: reducing the amount of blue for all pixels in the images of the paintings. We implemented two versions of such an algorithm as follows. We first note that the images of the paintings have been divided into three ROIs, namely the flowers, yf-ROI, the stems of the flowers, g-ROI, and the background, bg-ROI. The first two ROIs are relatively homogeneous in color, yellow and green, respectively, while the latter includes a more diverse spectrum of colors, as it includes the table, vase, and flower background. Each ROI is then processed independently.
Fig. 9 Color transfer from a real sunflower to the sunflowers in the London version. A The color bar representing all hues of yellow in the ROI corresponding to the flowers in the London version. This color bar corresponds to the leaves of the HC tree computed from the colors of all pixels in this ROI. We select the region (cluster) 454_yE in this color bar which contains some of the darkest yellow pixels. The pixels associated with this region are referred to as F454_A. B The color bar representing all hues of yellow in the flower ROI of SF1. We identify the cluster SF1_yG in this color bar that contains some of the brightest yellow. The pixels associated with this region are referred to as SF1_A. C Pixels in the London version that belong to F454_A are highlighted in red. D London version: before color transfer. E London version: after color transfer.
Fig. 10 (partial caption) A This color bar corresponds to the leaves of the HC tree computed from the colors of all pixels in this ROI. We select the region (cluster) 454_bG in this color bar which contains some of the pixels that are directly located behind the sunflowers in the vase. The pixels associated with this region are referred to as F454_B. B The color bar representing all hues of yellow in the flower ROI of SF1. We identify the cluster SF1_yK in this color bar that contains some of the darkest yellow. The pixels associated with this region are referred to as SF1_B. C Pixels in the London version that belong to F454_B are highlighted in red. D London version: before color transfer. E London version: after color transfer.
All pixels within an ROI are first clustered by hierarchical clustering, using the differences in their colors as a metric ("Materials and methods" section). The leaves of the corresponding tree are represented as vertical color lines, where the color of the line is the color of the leaf. Those color lines are organized as a color bar, which is then segmented into clusters. Figs. 9A and 10A provide illustrations of such a color bar with its clusters for the yf-ROI and bg-ROI of the London version, respectively. Each cluster regroups pixels with similar color within an ROI. Those clusters are then processed separately. For a given cluster $k$ within an ROI, we first compute the minimal blue coordinate, $m_k$, over all pixels in the cluster. Let $i$ be one such pixel, and let $b_i$ be its blue coordinate. We correct this blue coordinate using one of two pseudo color restoration (PCR) schemes, PCR-1 and PCR-2, both of which deduct from $b_i$ an amount of blue derived from $m_k$. PCR-1 relies on the observation that the yellow colors of real sunflowers contain nearly no blue, as seen in Fig. 5. It is expected to work well for all yellow hues that have been tarnished. PCR-2 is a more gentle correction as it limits the amount of blue that can be deducted.
The upper limit of 60 for this deduction comes from the observation that the blue shift detected in the flower regions of the three paintings is close to this value (this is especially clear for the Tokyo version, see Fig. 4B).
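To make the two schemes concrete, the following is a minimal Python sketch of one possible reading of them. The exact formulas are not reproduced in the text above, so the sketch assumes that PCR-1 subtracts the cluster minimum $m_k$ from the blue coordinate of every pixel in cluster $k$, and that PCR-2 subtracts $\min(m_k, 60)$. The function name `pcr_correct`, its parameters, and the list-of-tuples pixel representation are illustrative choices, not part of the original method description.

```python
def pcr_correct(pixels, labels, scheme="PCR-1", cap=60):
    """Reduce the blue coordinate of each pixel, cluster by cluster.

    pixels : list of (r, g, b) integer tuples
    labels : list of cluster identifiers, one per pixel
    scheme : "PCR-1" (assumed: subtract m_k) or
             "PCR-2" (assumed: subtract min(m_k, cap))
    """
    if scheme not in ("PCR-1", "PCR-2"):
        raise ValueError("scheme must be 'PCR-1' or 'PCR-2'")

    # m_k: minimal blue coordinate over all pixels of cluster k
    min_blue = {}
    for (_, _, b), k in zip(pixels, labels):
        min_blue[k] = min(b, min_blue.get(k, 255))

    corrected = []
    for (r, g, b), k in zip(pixels, labels):
        deduction = min_blue[k]
        if scheme == "PCR-2":
            # Assumed reading of the "upper limit of 60" on the deduction
            deduction = min(deduction, cap)
        # b >= m_k >= deduction, so the result never goes below 0
        corrected.append((r, g, b - deduction))
    return corrected


pixels = [(200, 180, 70), (210, 190, 90), (100, 150, 30)]
labels = [0, 0, 1]
pcr1 = pcr_correct(pixels, labels, scheme="PCR-1")
pcr2 = pcr_correct(pixels, labels, scheme="PCR-2")
```

Under these assumptions the two schemes coincide for any cluster whose minimal blue coordinate is at most 60, which would be consistent with the small visual differences reported between PCR-1 and PCR-2.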
In Fig. 11 we show the results of applying the two strategies PCR-1 and PCR-2 described above. Several observations can be made about those reconstructions. First, the visual differences between PCR-1 and PCR-2 are small. This is expected, as the two schemes act similarly on yellow hues, and yellow dominates in the three paintings. Second, there is a clear difference between the restored images and the original images, as the colors appear much brighter in the reconstructed images, especially the yellow in the background (see for example the effects on the London version). While we need to be cautious in asserting that the reconstructions represent the paintings as originally intended by van Gogh, we can safely say that they are most likely closer to his intent, as van Gogh was playing with the contrasts between the sunflowers and the background (as he was experimenting with different colors and intensities for those backgrounds) and darker colors reduce this contrast (see Fig. 10D, E). Finally, the reconstructed models of the paintings highlight differences in the contrast between flowers and background between the original, namely the London version, and the two repetitions, the Tokyo and Amsterdam versions, with the original based on a brighter background. This observation reemphasizes the importance given by van Gogh to the capture of light, colors, and contrast for the Sunflowers in a vase.

Conclusion
Vincent van Gogh was by no means a chemist by his own admission (letter 889 to his brother Theo [2]). However, he was well aware that colors in paintings evolve due to changes in the chemical nature of their pigments: "... paintings fade like flowers" (letter 765 to Theo, [2]). In the case of the Sunflowers in a vase with a yellow background, namely an original, the London version, and two replicates, the Tokyo and Amsterdam versions that van Gogh painted using the original as a model, this sentiment came true, as the background and flower renditions in those paintings have faded and turned brown, making them less vibrant than van Gogh had most likely aimed for. His intent with those paintings was to "push chromatic intensity ... with the aim of achieving a radical light-on-light effect" [26]. We have attempted to restore van Gogh's intent using a computational approach based on data science. After identifying regions of interest (ROI) within the three paintings that capture the flowers, stems of the flowers, and background, respectively, we studied the geometry of the color space (in RGB representation) occupied by those ROIs. By comparing those color spaces with those occupied by similar ROIs in photographs of real sunflowers, we identified shifts in all three color coordinates, R, G, and B, with the shift in the blue coordinate being the most salient. This shift in blue, which leads to hues of yellow that are faded and even brownish, is consistent with the fading of chrome yellow, the pigment used by van Gogh for representing yellow in Sunflowers, as observed by chemical spectroscopic methods [9-11, 13, 27]. We have proposed two algorithms, PCR-1 and PCR-2, for correcting that shift in blue and generating representations of the paintings that aim to restore their original conditions (see Fig. 11).
While we acknowledge that these are models, the reduction of the blue component in the yellow hues has led to more vibrant and less brownish digital renditions of the three Sunflowers in a vase with a yellow background. While we believe that the models we have generated for the Sunflowers take the viewers closer to the paintings created by van Gogh, we acknowledge that the current state-of-the-art techniques for digital reconstructions, including our own, are still limited in number and far from actually restoring the painter's original version. Progress will ultimately come from combinations of techniques. We did not have access to spectroscopic fluorescence data on the Sunflowers; we believe that such data would have helped us delineate the regions of interest in the paintings, in particular those regions where chrome yellow dominates, as those regions are more susceptible to fading due to aging of the chrome yellow pigments. In return, the computational methods proposed here for analyzing the RGB space occupied by those regions should help identify the impact of aging in color space, as well as methods for reversing those effects. In addition, those methods enable manipulations of the images of the paintings of interest and therefore the assessment of hypotheses on how the painter was experimenting with color juxtapositions within a painting. Based on our preliminary studies of the Sunflowers, it is our intent to develop a general computational framework for digital restoration and manipulation of paintings and to make this framework available as a tool for enhancing the painting viewing experience.

Material: the digital images used in this study
F454 is the fourth version of the Sunflowers painted by van Gogh while he was living in Arles, with the specificity of having a yellow background in contrast with the turquoise (F453), royal-blue (F459), and blue-green (F456) for the other versions. It is currently owned by the National Gallery in London, UK, and stored under the inventory number NG3863. The National Gallery only provides a low resolution image of this painting on its website; we found a high resolution version on Wikimedia Commons, from the URL https://upload.wikimedia.org/wikipedia/commons/4/46/Vincent_Willem_van_Gogh_127.jpg. This high resolution image is provided in jpeg format, with a resolution of 3349 × 4226 pixels.
F458 is one of the two repetitions of F454 painted by van Gogh, owned by and displayed at the Van Gogh Museum, Amsterdam. The Van Gogh Museum only provides a low resolution image of this painting on its website (https://www.vangoghmuseum.nl/en/collection/s0031V1962). However, just like for the London version, we found a high resolution version on Wikimedia Commons, from the URL https://upload.wikimedia.org/wikipedia/commons/9/9d/Vincent_van_Gogh_-_Sunflowers_-_VGM_F458.jpg. This high resolution image is provided in jpeg format, with a resolution of 3224 × 4226 pixels.
The high resolution image of F457 was generously provided by the Seiji Togo memorial Sompo Japan Museum of Art, Tokyo, Japan. It was also provided in jpeg format, with a resolution of 3626 × 4829 pixels.
In addition to the three high resolution images of the three paintings F454, F457, and F458, dubbed the London, Tokyo, and Amsterdam versions, respectively, we used two recent digital images of sunflowers in a field to capture actual colors of sunflowers. These two pictures, which we label as SF1 and SF2, were obtained as follows: SF1) A picture of sunflowers in Provence (Shuttersandsunflowers.com), downloaded with permission of the author. SF2) A picture of sunflowers in a field, by Leo Adamchuk, available within Adobe Stock and downloaded as part of their free trial.

Color statistics in digital images
A digital image is an image composed of picture elements, or pixels. Each pixel is characterized by spatial coordinates, (x, y), that define its positions along the x-axis and y-axis within the image, as well as by color components, (r, g, b), that define the amount of red (R), green (G), and blue (B) which, when combined, describe the color at the pixel. Note that we rely here on the additive RGB color model; this model is not universally accepted and other models such as CMYK (a subtractive model) or Lab are possible. We have used RGB as it is the color model that was available with the images we recovered. In this RGB model, each color component r, g, or b is an integer in the range [0, 255]. Namely, each fundamental color R, G, or B is described by a discrete variable that can take 256 distinct values, while a composite color is described by its 3 coordinates along those fundamental colors, and therefore belongs to a discrete space of possible values, in this case the cube $[0, 255]^3$. Let us define a digital image I as the set S(I) of its pixels. Let N be the cardinality of that set, namely the total number of pixels in the image. Let C be one of the fundamental colors, namely C can be R, G, or B. As we have seen above, C can take 256 possible discrete values, each in the range [0, 255]. We can compute the distributions of these values over a given image as follows. Let S(c) be the set of pixels whose color coordinate for the color C is c:
$$S(c) = \{\, p \in S(I) \mid C(p) = c \,\},$$
where $C(p)$ denotes the value of the color C at pixel $p$. The probability of observing color C with intensity c within the image is then given by:
$$P(C = c) = \frac{|S(c)|}{N},$$
where |S(c)| stands for the number of elements of set S(c). The entropy of the color C within the image is then given by:
$$H(C) = -\sum_{c=0}^{255} P(C = c)\,\log P(C = c),$$
with the convention that terms with $P(C = c) = 0$ contribute zero. The entropy measures the amount of "information" associated with the color C in the image. If the color C is always represented with the same value over each pixel, the entropy is zero, while if the possible values for the color are evenly distributed, the entropy is at its maximum with a value of log(256).
We can also measure the association of fundamental colors within the image. Let C and D be two of the three fundamental colors. Let P(C = c, D = d) be the joint probability of those colors C and D taking the values c and d, respectively. If we define the set S(c, d) as
$$S(c, d) = \{\, p \in S(I) \mid C(p) = c \ \text{and}\ D(p) = d \,\},$$
where $C(p)$ and $D(p)$ denote the values of the colors C and D at pixel $p$, then
$$P(C = c, D = d) = \frac{|S(c, d)|}{N}.$$
From this joint probability, we can compute the conditional probability that C = c, knowing that D = d, as
$$P(C = c \mid D = d) = \frac{P(C = c, D = d)}{P(D = d)}.$$
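The quantities defined in this section can be illustrated with a short Python sketch that uses only the standard library. The function names and the representation of an image as a flat list of (r, g, b) tuples are assumptions made for the example; the natural logarithm is used, so the maximal entropy is log(256).

```python
import math
from collections import Counter

CHANNEL = {"R": 0, "G": 1, "B": 2}


def channel_probabilities(pixels, channel):
    """P(C = c) = |S(c)| / N for every value c observed in the image."""
    idx = CHANNEL[channel]
    counts = Counter(p[idx] for p in pixels)
    n = len(pixels)
    return {c: k / n for c, k in counts.items()}


def channel_entropy(pixels, channel):
    """H(C) = -sum_c P(C = c) log P(C = c); unobserved values contribute 0."""
    probs = channel_probabilities(pixels, channel)
    return -sum(p * math.log(p) for p in probs.values())


def conditional_probability(pixels, ch_c, c, ch_d, d):
    """P(C = c | D = d) = |S(c, d)| / |S(d)|, i.e. P(C=c, D=d) / P(D=d)."""
    ic, i_d = CHANNEL[ch_c], CHANNEL[ch_d]
    n_d = sum(1 for p in pixels if p[i_d] == d)
    if n_d == 0:
        raise ValueError("P(D = d) is zero; the conditional is undefined")
    n_cd = sum(1 for p in pixels if p[ic] == c and p[i_d] == d)
    return n_cd / n_d


# A constant red channel carries no information ...
flat = [(255, 200, 0)] * 4
# ... while a red channel covering all 256 values evenly is maximally spread.
ramp = [(v, 0, 0) for v in range(256)]
sample = [(1, 2, 3), (1, 2, 4), (5, 2, 3), (1, 9, 3)]
```

In this toy data, three pixels of `sample` have G = 2 and two of those also have R = 1, so the conditional probability of R = 1 given G = 2 is 2/3.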

Hierarchical clustering analysis (HCA)
Clustering is the task of regrouping objects such that those that belong to the same group, called a cluster, are more similar to each other than to those in other groups. In our analyses, the objects are pixels within an image. A pixel P(i) is characterized by its color, c(i), given as a combination of its components along the fundamental colors R, G, and B, i.e. c(i) = (r(i), g(i), b(i)). As described above, c(i) belongs to the 3D discrete space $[0, 255]^3$. The similarity between two pixels P(i) and P(j) is set to be the Euclidean distance between their colors:
$$d(P(i), P(j)) = \sqrt{(r(i) - r(j))^2 + (g(i) - g(j))^2 + (b(i) - b(j))^2}. \tag{3}$$
The clustering of the pixels is then performed using agglomerative hierarchical clustering analysis, or HCA. This is a bottom-up approach in which each pixel starts in its own cluster, and pairs of clusters are merged iteratively until all pixels belong to the same cluster. The whole procedure defines a hierarchy of clusters, also referred to as a clustering tree. A key element of this procedure is the definition of the distance between two clusters, also referred to as the linkage criterion. When the two clusters contain a single element, this distance is simply the distance between those elements, as defined in Eq. 3. When the two clusters A and B are sets of elements, the distance is then defined as a function of the pairwise distances between those elements. Two common choices are the single linkage:
$$d(A, B) = \min\{\, d(a, b),\ a \in A,\ b \in B \,\}$$
and the complete linkage:
$$d(A, B) = \max\{\, d(a, b),\ a \in A,\ b \in B \,\}.$$
We have used the complete linkage in all our analyses.
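As an illustration of the procedure, here is a deliberately naive Python sketch of agglomerative clustering with complete linkage on a handful of colors. It is not the implementation used in the study. It recomputes all cluster distances at every merge, which is only practical for very small inputs, and it stops at a requested number of clusters (`n_clusters`, a parameter introduced for the example) instead of building the full tree.

```python
import math


def complete_linkage(cluster_a, cluster_b):
    """d(A, B) = max over all pairs of the Euclidean distance between colors."""
    return max(math.dist(a, b) for a in cluster_a for b in cluster_b)


def agglomerate(colors, n_clusters):
    """Merge the two closest clusters until only n_clusters remain."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    # Bottom-up: every color starts in its own cluster
    clusters = [[c] for c in colors]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = complete_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # j > i, so removing cluster j leaves index i valid
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


colors = [(250, 220, 10), (245, 215, 20), (40, 120, 30),
          (50, 130, 35), (250, 225, 15)]
groups = agglomerate(colors, n_clusters=2)
```

With these five colors, the three yellows end up in one cluster and the two greens in the other.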

Identifying regions of interest in the paintings Sunflowers in a vase with a yellow background
A region of interest (ROI) is defined according to its position in the image as well as its color consistency.
In the Sunflowers in a vase with a yellow background, we identify three such ROIs, namely the flower crowns of the sunflowers (yellow), yf-ROI, the stems of the sunflowers (green), g-ROI, and the background, bg-ROI. Automated segmentation is one approach for selecting the pixels that belong to each ROI. We have used instead a combination of manual and automated processing, using the algorithm described below. This algorithm is based on the fact that the yf-ROI and g-ROI are chosen based on color consistency. We first divide the whole discrete RGB space [0, 255] × [0, 255] × [0, 255] into a collection of 5 × 5 × 5 color cubes. All pixels of an image are then attached to those cubes based on their colors. Cubes that are occupied are then assigned a representative, whose color is identified as the center of the cube. All those representatives are then clustered using the HC technique described above, using the Euclidean distance between their colors as a similarity measure, and complete linkage. At the bottom of the HC tree, we draw a color bar with each leaf of the tree represented by a vertical line whose color corresponds to the cube associated with that leaf. The HC tree is then cut to form 70 clusters; we chose 70 somewhat arbitrarily, such that we would have enough clusters. Most of those clusters have consistent colors along the color bar described above. Two groups of clusters are selected visually: those that are predominantly yellow, with a yellow that resembles the yellow of the flowers in the paintings, and those that are predominantly green. Each of these groups is then processed separately.
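The binning step can be sketched in a few lines of Python. The sketch assumes that "5 × 5 × 5 color cubes" means cubes whose edge spans 5 consecutive intensity levels along each axis. The text does not settle this, and the alternative reading (5 cubes per axis) would only change the `edge` value. The function names are hypothetical.

```python
from collections import defaultdict


def bin_into_cubes(pixels, edge=5):
    """Attach each pixel (by index) to the color cube that contains it.

    A cube is identified by the integer triple (r // edge, g // edge, b // edge).
    Only occupied cubes appear in the result.
    """
    cubes = defaultdict(list)
    for idx, (r, g, b) in enumerate(pixels):
        cubes[(r // edge, g // edge, b // edge)].append(idx)
    return dict(cubes)


def cube_center(key, edge=5):
    """Color of the representative of a cube, taken at the center of the cube."""
    return tuple(k * edge + (edge - 1) / 2 for k in key)


pixels = [(0, 0, 0), (4, 4, 4), (5, 0, 0), (255, 255, 255)]
occupied = bin_into_cubes(pixels)
representatives = {key: cube_center(key) for key in occupied}
```

The representatives, one per occupied cube, are the objects that would then be passed to the hierarchical clustering. Because 256 is not a multiple of 5, the last cube along each axis is only partially inside the RGB range under this reading.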
Let G be the group of clusters of green colors identified above. Collect all the pixels in G, and repeat the procedure described above, but with the RGB space divided into a collection of 4 × 4 × 4 cubes. Again, identify all cubes that are not empty, cluster their representatives, and select only those clusters that contain pixels whose colors are consistently green. The remaining pixels are processed one last time, with the RGB space divided into a collection of 3 × 3 × 3 cubes. All the corresponding occupied cubes are ordered with respect to the number of pixels they contain. Cubes are retained by going down the ordered list, starting with the most populated cube, until 90% of the remaining pixels are selected. Finally, those pixels are identified in the paintings, and those that do not correspond to a flower stem are discarded. The remaining pixels form the green ROI, g-ROI, associated with the stems of the sunflowers.
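The last filtering step, retaining the most populated cubes until 90% of the pixels are covered, can be sketched as follows. The input is assumed to be a dictionary mapping each occupied cube to the list of its pixels. The function name and the `coverage` parameter are illustrative.

```python
def select_top_cubes(cubes, coverage=0.9):
    """Keep the most populated cubes until `coverage` of all pixels is reached.

    cubes : dict mapping a cube key to the list of pixel indices it contains
    Returns the retained cube keys, most populated first.
    """
    total = sum(len(members) for members in cubes.values())
    ranked = sorted(cubes.items(), key=lambda kv: len(kv[1]), reverse=True)
    kept, covered = [], 0
    for key, members in ranked:
        if covered >= coverage * total:
            break
        kept.append(key)
        covered += len(members)
    return kept


cubes = {
    (60, 40, 10): list(range(60)),
    (61, 40, 10): list(range(35)),
    (20, 30, 5): list(range(4)),
    (2, 2, 2): [0],
}
kept = select_top_cubes(cubes, coverage=0.9)
```

Here the two largest cubes already hold 95 of the 100 pixels, so the two sparsely populated cubes are dropped.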
Let Y be the group of clusters of yellow colors identified in the first step of the algorithm. This group of clusters is processed in the same way as G was processed, leading to a group of pixels that are mostly yellow and that belong to the crowns of the sunflowers. This group of pixels forms the yf-ROI, associated with the crowns of the sunflowers.
Finally, all pixels that do not belong to the g-ROI and yf-ROI are deemed to belong to the background (this includes the table in the front, the vase, and the background behind the flowers). Those pixels form the bg-ROI.

Region specific color transfer between two images
Assume that we have identified a set of pixels A in one picture and that those pixels have similar (although not necessarily equal) colors. Similarly, we have identified a set of pixels B in another picture, with similar colors. A and B may be of different sizes and may contain different colors. Our goal is to transfer the colors from A to B. Let $A = \{a_1, \ldots, a_i, \ldots, a_N\}$ and $B = \{b_1, \ldots, b_j, \ldots, b_M\}$, where the $a_i$ and $b_j$ are 3D vectors containing the R, G, and B coordinates of the colors of the pixels in A and B, respectively. We apply the following algorithm:
i) Compute the centers of mass of A and B, $\bar{a}$ and $\bar{b}$;
ii) Translate B to $B^*$ so that A and $B^*$ have the same center of mass: $B^* = \{b^*_1, \ldots, b^*_j, \ldots, b^*_M\}$ with $b^*_j = b_j - \bar{b} + \bar{a}$;
iii) Build an HC tree on the union of A and $B^*$, using the Euclidean distance as a metric and complete linkage, and cut at a tree level to define clusters. If a cluster only contains elements originally from $B^*$, merge it with its closest cluster (in terms of tree distance) that contains elements of A. At the end of this procedure, we have a set of clusters, with each cluster containing either elements of both A and $B^*$, or elements of A only. The latter are discarded.
iv) Let C be one of the clusters from step iii), and let $A_C = \{a_1, \ldots, a_i, \ldots, a_{N_C}\}$ and $B_C = \{b^*_1, \ldots, b^*_j, \ldots, b^*_{M_C}\}$ be the subsets of A and $B^*$ that belong to C, where $N_C$ and $M_C$ are their sizes, respectively. For each j in $[1, M_C]$, replace $b^*_j$ with a color picked randomly in $A_C$. Repeat over all clusters C.
At the end of this procedure, each element j of B has been assigned a new color $b^*_j$ that is inherited from A. The pixels in the second picture are then assigned those new colors.
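Steps i) to iv) can be sketched in Python as shown below. This is a simplified illustration under stated assumptions, not the code used for the figures. The tree is cut at a fixed number of clusters (`n_clusters`, a hypothetical parameter, since the cut level is not specified above), and a cluster containing only elements of $B^*$ is merged with the nearest cluster containing elements of A according to the complete-linkage distance, which stands in for the "tree distance" of step iii). The clustering is naive and only suitable for small sets.

```python
import math
import random


def mean_color(colors):
    """Center of mass of a list of (r, g, b) colors."""
    n = len(colors)
    return tuple(sum(c[k] for c in colors) / n for k in range(3))


def linkage(ca, cb):
    """Complete linkage between clusters of (color, origin, index) items."""
    return max(math.dist(a[0], b[0]) for a in ca for b in cb)


def transfer_colors(A, B, n_clusters=2, seed=0):
    """Return one new color per element of B, each picked from A."""
    if not A or not B or n_clusters < 1:
        raise ValueError("A and B must be non-empty and n_clusters >= 1")
    rng = random.Random(seed)
    # Steps i) and ii): translate B so that it shares the center of mass of A
    a_bar, b_bar = mean_color(A), mean_color(B)
    B_star = [tuple(b[k] - b_bar[k] + a_bar[k] for k in range(3)) for b in B]
    # Step iii): naive complete-linkage clustering of the union of A and B*
    clusters = [[(a, "A", i)] for i, a in enumerate(A)]
    clusters += [[(b, "B", j)] for j, b in enumerate(B_star)]
    while len(clusters) > n_clusters:
        _, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                      for i in range(len(clusters))
                      for j in range(i + 1, len(clusters)))
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    with_a = [c for c in clusters if any(o == "A" for _, o, _ in c)]
    for c in clusters:
        if all(o == "B" for _, o, _ in c):
            # Assumed stand-in for "closest cluster in terms of tree distance"
            min(with_a, key=lambda h: linkage(h, c)).extend(c)
    # Step iv): each element of B* takes a random color of A from its cluster
    new_B = [None] * len(B)
    for c in with_a:
        palette = [color for color, origin, _ in c if origin == "A"]
        for _, origin, j in c:
            if origin == "B":
                new_B[j] = rng.choice(palette)
    return new_B


A = [(250, 220, 10), (240, 210, 20), (230, 200, 30)]
B = [(180, 150, 90), (170, 140, 100)]
new_B = transfer_colors(A, B)
```

Every returned color is an actual color of A, so the target region inherits the palette of the source region, while the clustering keeps the relative ordering of lighter and darker hues approximately intact. Clusters that contain only elements of A receive no assignment, which matches the discarding described in step iii).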