Review and interlaboratory comparison of the Oddy test methodology

Since the introduction of the Oddy test in 1973, many museums and cultural institutions have put the method into use, developing their own versions and protocols. Currently, the 3‑in‑1 version, a temperature of 60 °C and 2 g of tested material are established as common practice; however, other variables of the test are not standardized. The purpose of this study is to examine current versions of the Oddy test, to identify differences in the results derived from variations in the procedures, and ultimately to raise awareness within the conservation community of the need to work together towards a standardized protocol. In this article, we review the available information on the methodological differences in Oddy test protocols published in the literature related to glassware cleaning, coupon preparation, reaction vessel setup and rating of materials. Based on the review, and to highlight the many variables that could affect the results of the test, seven European cultural institutions working under the H2020 IPERION HS project performed a comparative 3‑in‑1 Oddy test by blindly evaluating the same ten materials. Each institution used its own test methodology, but some guidelines were advised: (1) detergents as a cleaning procedure for glassware, (2) P600 sandpaper or a micromesh pad close to 1500 to prepare metal coupons and (3) 1:100 as the water–air ratio. Despite this, differences between the institutions' results were still observed. Some of them are due to differences in coupon preparation, either in the sanding pattern or in the edge area. In order to separate the contributions of the experimental setup and of the subjectivity of the evaluation to the discrepancies, the coupons from all institutions were rated by a single team of judges with experience in the Oddy test. The results show that differences in the evaluation criteria play a relevant role in the discrepancies, especially for institutions with less experience in the test.
These results highlight the need to further standardize the methodology and criteria for visual assessment. Nevertheless, the Oddy test has been found to be reliable for the identification of materials that produce emissions hazardous for the conservation of cultural assets.


Introduction
The Oddy test is an accelerated corrosion test widely implemented in cultural institutions as a tool for preventive conservation. Over time, it has proven to be a reliable test for the rejection of potentially hazardous materials (woods, adhesives, textiles, etc.) and, therefore, for the selection of suitable materials for the construction of showcases, museum areas and storage facilities. W.A. Oddy published the original article on the test that bears his name in 1973 [1]. Since then, many museums and conservation institutions have integrated this test into their practice, developing their own versions and protocols. Silver and lead coupons were initially placed in individual closed glass flasks together with the testing material, avoiding direct contact. Carbon dioxide was added exclusively for lead testing and replenished periodically to avoid its depletion. After 28 days at 60 °C, a visual inspection was performed and any testing material that did not cause corrosion was considered safe for the respective metal. A blank test, i.e. with no testing material, was used as a reference for comparison. Before the Oddy test, similar methods for evaluating confined metals of industrial interest were already in use [2,3]. The great achievement of W.A. Oddy was to be the first to propose a method focused on metals sensitive to the contaminants typical of indoor museum environments and to apply it to materials other than wood [4] as a source of danger on display.
Over its long lifetime, the Oddy test has been revised and improved on multiple occasions. The first modification of the Oddy test was made by W.A. Oddy himself in 1975 [5]. A third metal coupon, copper, was included in the test. In addition, water was added to the reaction vessel, as the tarnishing of silver in the presence of hydrogen sulfide is accelerated by moisture. Initially, a few drops of water were added to the bottom of each flask; however, copper and lead coupons showed corrosion marks in the areas in direct contact with water. Hence, a small vial with moistened absorbent cotton was finally introduced. Blackshaw made further refinements to the test methodology in 1979 [6] to minimize differences between operators. The test was performed in a 250 mL flask or boiling tube. The mass of the testing material was 2 g and 1 mL of distilled water was added to the cotton-filled vial at the beginning of the test. In addition, 0.5 mL of distilled water was added weekly during the course of the test. In 1993, scientists from the British Museum (BM) carried out the first interlaboratory Oddy test [7], in which each participating institution followed its own protocol. The differences observed in the results prompted the proposal of a standardized protocol, which was assessed in a second interlaboratory comparison [4]. The purity (99.5%), size (10 × 15 mm) and cleanliness (acetone) of the metal coupons were standardized, as well as the mass of the testing material (2 g) and the water-air ratio (1:100). Distilled water was placed into a vial instead of wetting the cotton wool. The addition of carbon dioxide was also eliminated, as it was not necessary to cause lead corrosion. The vessel setup was slightly modified: the coupon no longer rested on the bottom of the flask, but hung on a nylon thread trapped by the stopper inserted in a test tube. Finally, reference photographs were prepared to assist in the evaluation of the coupons after the test. After the visual evaluation, the materials were classified as suitable for permanent (P) or temporary (T) use, or as unsuitable (U).
In 1999, scientists from the Metropolitan Museum of Art (MMA) [8] published a more practical version of the test: the 3-in-1 Oddy test (Fig. 1a). This version consisted of simultaneously testing the three metal coupons (silver, lead and copper) in the same vessel instead of testing them individually. In addition, an alternative vessel setup was proposed: a threaded glass jar into which an inner beaker was inserted, with the metal coupons bent over its rim in a V-shape. A polypropylene lid together with high-vacuum silicone grease ensured a better seal for moisture and the emitted gases.
In 2003, British Museum scientists [9] implemented the 3-in-1 version of the test, but with a different vessel setup (Fig. 1b). A new positioning of the metal coupons was proposed to improve its reproducibility. The three metal coupons were inserted in parallel into the slots cut in a silicone plug, placing the lead in the middle to avoid condensation from contact with the vessel. This was a limitation of the previous setup, so a test tube was used instead of a jar with a beaker inside.
Therefore, in the 30 years following Oddy's first publication and after numerous modifications of the test, a unified protocol has not yet been achieved. Only the use of three coupons in one vessel, a temperature of 60 °C and a testing material mass of 2 g have been widely adopted as common parameters. Other important factors regarding the reproducibility of the test are the volumes and types of vessels (flask, test tube or jar), the positioning of the coupons (inserted in plugs, hung by thread or placed with the use of a beaker), as well as the addition of water and its ratio (directly into the vessel, with moistened absorbent cotton or added in a vial).
New alternatives for vessel setup have recently been published. For example, the Oddy test protocol of the MMA [10] initially incorporated into its threaded glass jar either a platinum-cured silicone stopper with coupons inserted in a triangular formation or a 3D printed nylon holder with coupons wrapped around a strip of the holder. Finally, a stainless steel coupon holder similar to the 3D printed nylon holder is used in the current protocol [11] to reduce costs and waste (Fig. 1c). Other researchers [12] made a complete redesign of the vessel (Fig. 1d): a glass cylinder with a glass top insert from which hang glass hooks for the three metal coupons and two small vials, one for distilled water and the other as an option for contaminant-absorbing material. Others have proposed minor modifications [13], such as inserting glass hooks in a triangular formation to replace the parallel arrangement of the coupons inserted in the silicone plug.
To date, no fixed value for the water-air ratio has been established. The volume of water varies from 0.17 to 5 mL and the volume of the test vessels ranges from 45 to 125 mL. In addition to the above variations, there are also different alternatives for cleaning the glassware, sanding and cleaning the coupons, sealing the vessel, sealing material, etc. [14]. All these modifications are reflected in a survey carried out in 2014, which indicated that at least 19 variations of the Oddy test were used by different U.S. cultural institutions [15]. A selection of these protocols has been tested, including some other Oddy test methodologies for testing materials in contact [16]. The contact test caused more corrosion to the coupons than the non-contact Oddy test. Therefore, another unified protocol should be standardized in order to be able to test materials that will be in direct contact with works of art. More recently, since 2018, the Materials Testing & Standards Committee of the AIC Materials Selection and Specification Working Group (MWG) has focused its efforts on promoting the use of protocols with the highest reproducibility between institutions [17].
In this article, we review the available information on the methodological differences of the Oddy test published to date. Four categories have been established to organize and discuss such information: glassware cleaning, coupon preparation, reaction vessel setup and rating of materials. The latter includes alternatives to reduce the subjectivity of the visual evaluation. This allows for a quick and detailed comparison of the different methodologies and helps to set variables towards a unified protocol for the Oddy test. To this end, an interlaboratory test has been carried out with the participation of seven European institutions within the framework of IPERION HS (EU H2020, grant agreement No 871034), Work package 5.1, Project 7 (Interlaboratory comparison of Oddy test). The purpose of this study is to examine current and useful good practices for the Oddy test, to identify differences in the results derived from variations in the procedures, and ultimately to create awareness within the conservation community of the need to work together towards a standardized protocol.

Literature review of Oddy test methodologies
Additional file 1: Tables S1, S2, S3 and S4 show, respectively, the differences in the existing protocols for glassware cleaning, coupon preparation, reaction vessel setup and rating of materials. Most of the information regarding the methodology was provided by the protocols of the European institutions with previous experience in the Oddy test that participated in the IPERION HS interlaboratory comparison. Additional protocols were obtained from the open access AIC Wiki platform [18]. The two institutions that originally contributed the most to the development of the Oddy test, the British Museum (BM) and the Metropolitan Museum of Art (MMA), head the tables in deference. The BM-2004 protocol [19] is included for reference, although this was partially refined by the BM-2017 protocol [14].

Glassware cleaning
There are two general groups among the different cleaning procedures: those that use detergents and those that do not (see Additional file 1: Table S1). The detergents used, such as Decon 90, Mr. Clean, Alconox, Micro-90, PCC-54 enzymatic and Extran AP17, are alkaline in nature, with a pH range between 9 and 13. Theoretically, they are suitable for removing organic residues. Some institutions prefer to avoid them due to the risk of leaving residues after the cleaning process. However, some of the above detergent formulations indicate that they are free of organic surfactants and emulsifiers. The alternative to alkaline detergents is to use alkaline aqueous solutions, as in the MMA cleaning procedure. The subsequent treatment can be more or less exhaustive, i.e. applying acid neutralization baths with intermediate aqueous rinses in which the temperature and time vary, or simply multiple aqueous rinses. Other methods, instead of using basic aqueous solutions, use diluted solutions of hydrochloric or nitric acid. Although these solutions are more suitable for removing inorganic residues, they can also be used for the removal of organic residues, either because they act on the glass interface or because of their oxidizing nature, as in the case of nitric acid. Finally, among the procedures that do not use detergents, there are also those that avoid the use of both acidic and basic baths. Instead, distilled water is used as a solvent, with or without varying the temperature to increase solubility, as well as consecutive rinses in organic solvents such as ethanol or acetone. These cleaning procedures are simpler because the reuse of glassware depends on the rating of the previous testing material: some institutions do not reuse glassware suspected of being contaminated by materials that have failed the test, especially for materials such as liquid coatings, adhesives and adhesive tape samples from unsuitable tests.

Coupon preparation
The summary presented in Additional file 1: Table S2 shows that the different protocols use metal coupons with areas ranging from 50 mm² (Green challenge, GC) to 350 mm² (BM). Economic reasons would be a possible explanation for the decrease in size with respect to the BM protocol. In fact, W.A. Oddy [1] initially tested larger coupons of approximately 500 mm². The coupon area does not really affect the thermodynamic tendency of metals to corrode, but it may exert a visual corrosion dilution effect for large area coupons when the released contaminant is the limiting reactant in the corrosion reaction. A standard area would not only aid in the comparison with reference photographs [4], but would also eliminate possible visual effects. The thickness of the coupons also varies from one protocol to another; even within the same protocol, the thickness of the three metal coupons may be different. In most cases, the thickness of the coupons is around 0.1 mm. Handling the soft lead is easier if the coupons are thicker, and a thickness of up to 0.5 mm is found in existing protocols. Thickness does not seem to affect reproducibility, and processing the coupons to reduce thickness should be avoided [4], especially for lead.
The purity of the metal coupons varies between 99.5 and 99.998%; only the Auckland War Memorial Museum (AWMM) protocol indicates a slightly lower purity, i.e. ≥ 99%. Therefore, most protocols meet the established standard of 99.5% proposed in the 1995 Oddy interlaboratory test [4]. The purpose of high purity coupons is to reduce impurities that can affect the corrosion behavior of the metals, as happened with sterling silver and low purity lead. Nowadays, higher purities can be easily achieved, thus a purity of at least 99.9% should be advised for the Oddy test. The preparation of metal coupons in terms of surface finishing also varies. Most protocols employ fiberglass brushes, following the initial British Museum protocol (BM-2004), although the most recent version (BM-2017) replaced them due to health and safety problems and concerns about their capability to remove all contaminants from the surface [14]. There are two alternatives to fiberglass brushes: micromesh pads and sandpaper. The former are used by the current BM-2017 and MMA protocols, although with different grits, 1800 and 3200 respectively. Sandpaper is only used in the protocols of the Munch Museum (MUM) and the Swedish National Heritage Board (RAA). To ensure good reproducibility, surface finishes should be equivalent regardless of the methodology applied. On the other hand, although most protocols abrade all three metal coupons, some institutions avoid abrading the lead coupon due to health concerns (Getty Conservation Institute (GCI) and Autry Museum of the American West (AMAW)). In contrast, in the MMA protocol only the lead coupon is sanded to remove the native corrosion layer, while copper and silver coupons are not sanded and therefore cannot be reused. This surface finishing step might be equalized between institutions if there were a single supplier of the coupons. Regarding the cleaning of the metal coupons, most protocols use very high purity acetone (> 99.9%) and the MMA protocol also performs a subsequent cleaning with HPLC-grade isopropanol. In contrast, the GCI protocol submerges silver and copper coupons in Mr. Clean liquid cleaning solution (a commercial mixture of water, surfactants, solvents and preservatives) prior to cleaning with acetone; its lead coupons are neither abraded nor washed. Finally, a quick drying with a lint- and acid-free tissue is also common, although some protocols avoid this step and the coupons are air-dried instead. In the latter case, it is necessary to avoid prolonged drying times because they favor the appearance of corrosion [14] and acetone runoffs that will hinder the evaluation stage.

Reaction vessel setup
Additional file 1: Table S3 groups the data from the different protocols on which the following sections are based.

Temperature
The temperature in most protocols is set to 60 °C; only the AWMM and Cultural Restoration & Preservation (CRP) protocols apply lower temperatures, between 50 and 60 °C and 40 °C respectively. The latter also performs the test at room temperature to determine if off-gassing of contaminants occurs at this temperature. Raising the temperature up to 60 °C accelerates the processes that potentially contribute to the emission of VOCs [20], such as diffusion within a material, desorption, evaporation and chemical reactions, and it especially affects the release of less volatile compounds [21]. For example, the vapor pressure of acetic acid at 60 °C increases by about 8 times with respect to 25 °C (from 1117 to 9115 Pa), while that of formic acid, which is more volatile than the former, increases at 60 °C by about 6 times (from 3358 to 21,082 Pa) [22]. Temperature also affects the corrosion rate kinetics in aqueous media, which are 15.6 times faster at 60 °C compared to 25 °C for a typical activation energy of 65 kJ/mol [23].
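The acceleration figures above can be checked with a short calculation. The following is a minimal sketch, assuming a simple Arrhenius dependence of the corrosion rate on temperature; the function name `arrhenius_factor` is illustrative and not part of any protocol, and the vapor pressures are the values quoted in the text.

```python
import math

R = 8.314  # universal gas constant, J/(mol*K)


def arrhenius_factor(ea_j_per_mol, t_low_c, t_high_c):
    """Ratio k(t_high) / k(t_low) for a rate constant that follows
    the Arrhenius equation (assumed here for the corrosion rate)."""
    t_low = t_low_c + 273.15
    t_high = t_high_c + 273.15
    return math.exp(ea_j_per_mol / R * (1.0 / t_low - 1.0 / t_high))


# Vapor pressures quoted in the text (Pa) at 25 and 60 degrees C
acetic_ratio = 9115 / 1117
formic_ratio = 21082 / 3358

# Activation energy of 65 kJ/mol, as quoted in the text
rate_ratio = arrhenius_factor(65_000, 25.0, 60.0)

print(f"acetic acid vapor pressure ratio: {acetic_ratio:.1f}")
print(f"formic acid vapor pressure ratio: {formic_ratio:.1f}")
print(f"corrosion rate ratio at 60 vs 25 C: {rate_ratio:.1f}")
```

With these inputs the rate ratio comes out at roughly 16, in line with the factor quoted above; the small difference depends on the rounding of the gas constant and of the absolute temperatures.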

Volume of water
Relative humidity is the other parameter, along with temperature, that makes the Oddy test an accelerated corrosion test. Since relative humidity falls as temperature rises for a fixed water content, a higher absolute humidity is required to reach 100% RH at 60 °C compared to room temperature. For this purpose, water is added to the Oddy test. However, there is no fixed ratio of water to vessel volume, which mainly affects lead due to its higher sensitivity to moisture. The lead coupons in the blank test may corrode slightly if aqueous condensation occurs [4,14]. Hence, the water volume should be the minimum necessary to reach 100% RH without causing condensation on the metal coupons. Four protocols (BM-2017, MUM, GCI and Heritage Conservation Centre Singapore (HCC)) differentiate between the water volume added for blank tests and non-hygroscopic materials and that added for moisture-absorbing materials. The water volume is lower in the first case, between 0.17 mL (BM-2017) and 1.5 mL (GCI), with vessel volumes that also differ (50 and 60 mL respectively). The water volume is increased for hygroscopic testing materials, although sometimes only in a general way, e.g., greater than 1.5 mL in the GCI protocol. In the rest of the protocols, the water volume ranges from 0.5 mL to 5 mL regardless of the absorption of moisture by the testing material. Overall, the water volume added is usually less than 1 mL. Since the vessel volume varies between 50 and 125 mL, the water-air ratio differs from the ratio initially proposed by Green and Thickett as standard, i.e. 1/100 [4]. Representative examples are the BM-2017 protocol with a water-air ratio of 1/62.5 (water: 0.8 mL; vessel: 50 mL) and the MMA protocol with a ratio of 1/200 (water: 0.5 mL; vessel: 100 mL). On the other hand, relative humidity might also affect the emission rate of mainly polar VOCs from testing materials due to their possible interaction with water molecules [21]. Field studies have shown that the indoor formaldehyde vapor pressure can increase with increased RH [24].
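The ratios quoted above, and the order of magnitude of the water actually needed to saturate the vessel, can be reproduced with a few lines. This is a rough sketch under stated assumptions: the saturation vapor pressure is estimated with a Magnus-type approximation (not taken from any of the reviewed protocols), water vapor is treated as an ideal gas, and uptake of moisture by the tested material is ignored. All function names are illustrative.

```python
import math


def water_air_ratio(water_ml, vessel_ml):
    """Return N such that the water-to-vessel volume ratio is 1/N."""
    return vessel_ml / water_ml


def saturation_vapor_pressure_pa(temp_c):
    """Magnus-type approximation over liquid water (an assumption of this sketch)."""
    return 610.94 * math.exp(17.625 * temp_c / (temp_c + 243.04))


def water_to_saturate_mg(vessel_ml, temp_c):
    """Mass of water vapor (mg) that fills the vessel at 100% RH, ideal gas."""
    molar_mass_g = 18.015  # g/mol
    gas_constant = 8.314   # J/(mol*K)
    volume_m3 = vessel_ml * 1e-6
    moles = (saturation_vapor_pressure_pa(temp_c) * volume_m3
             / (gas_constant * (temp_c + 273.15)))
    return moles * molar_mass_g * 1000.0


# Examples quoted in the text
print(f"BM-2017: 1/{water_air_ratio(0.8, 50):g}")
print(f"MMA:     1/{water_air_ratio(0.5, 100):g}")

# Vapor needed to saturate a 50 mL vessel at room temperature and at 60 C
for temp_c in (25.0, 60.0):
    print(f"{temp_c:.0f} C: {water_to_saturate_mg(50, temp_c):.1f} mg of water vapor")
```

Under these assumptions, saturating a 50 mL vessel at 60 °C takes only a few milligrams of water vapor, several times more than at 25 °C but far less than the volumes added in the protocols; the excess acts as a reservoir against uptake by the tested material and small leaks.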

Type of vessel and mass of testing materials
Approximately half of the protocols use a test tube, while the other half use a glass jar, i.e., the existing vessel setups in the BM-2017 and MMA protocols respectively. Only the Rijksmuseum (RIJKS) protocol uses a flask as the vessel. The volume of the test tubes is around 50 mL; only the IMA protocol uses substantially larger, 75 mL tubes. Heine and Jeberien recently published an alternative vessel, i.e., a flat-bottomed test tube called MAT-CH [12], although its volume is not indicated. On the other hand, the glass jars have three possible volumes, around 50 mL, 100 mL and 125 mL, i.e. generally larger than those of the test tubes. A smaller vessel volume is preferable as it increases the concentration of the volatiles emitted and facilitates a uniform distribution within the vessel. In addition, the vessel type and its volume determine the closure, the disposition of the coupons and the mass of the tested material, especially if it is of low density. The test tubes are closed with silicone stoppers and the coupons are inserted directly into the stopper in parallel slots. Most protocols specify that the lead be placed in the middle. The National Center for Metallurgical Research (CENIM) protocol inserts glass hooks in a triangular formation into the stopper, from which the three coupons are hung. This is to avoid capillary condensation in the crevices formed by inserting the coupons directly into the silicone stopper. The MAT-CH reaction vessel is the only one that replaces the silicone stopper with a glass insert with a double gasket that ensures a tight seal. The closure of glass jars follows the guidelines of the MMA protocol, i.e. through a screw cap, although some protocols could also use silicone stoppers, since they do not specify it. Most use earlier versions of the MMA protocol, where the metal coupons hang in a U- or V-shape from the rim of a beaker that is inserted into the jar. However, the current MMA protocol inserts a stainless steel coupon holder into the mouth of a 100 mL jar, and the coupons are bent 5 to 7 mm from one end and crimped into the holder.
The type of glass used is reported only by some protocols, as a commercial brand of borosilicate glass. It is necessary to avoid compositions that are less chemically stable under Oddy test conditions, such as soda-lime glass. Smith [25] showed that the latter caused a passing result of the Oddy test by neutralizing possible organic acids released from the Delrin plastic (polyoxymethylene) through leaching alkalis. When borosilicate glass was used, the plastic material was rated as unsuitable.
Regarding the mass of the testing material, most protocols use 2 g. Only two protocols (AMAW and AWMM) report testing different masses, 1 g and 1-2 g respectively. Although only one of the two protocols reports the volume of its vessel (AMAW, 45 mL), it is likely that both refer to the limitation that low-density materials pose for low volume vessels. This limitation is more typical for protocols that use a test tube as a vessel, since its volume is smaller than that of glass jars.
Finally, note that some protocols also report an in-contact version of the Oddy test. In the MMA protocol, this can be performed independently of the non-contact Oddy test or simultaneously, but different metal coupons are used. However, the AMAW protocol and an earlier version of the National Museum of the American Indian (NMAI) protocol use the same coupons, in such a way that half of the coupon touches the testing material and the other half does not.

Rating of testing materials
The rating of the testing materials is performed indirectly by visually evaluating their corrosive effect on the three metal coupons that act as corrosion dosimeters (Cu, Ag and Pb). Most protocols perform naked eye inspection, although some protocols, such as those of the Field Museum of Natural History (FMNH) and MUM, rely on optical microscopy for the evaluation (see Additional file 1: Table S4). The procedure is the same: to rate the suitability of the testing material for each metal coupon based on the level of corrosion in comparison to the control coupons. Following the original BM protocol, three categories are established by most protocols: suitable for permanent use (P: no change compared with control), temporary use (T: slight corrosion) and unsuitable (U: obvious corrosion). The overall rating of the tested material is the same as that of the most affected coupon. Some protocols may use slightly different, but equivalent, terminology, for example: pass/suitable or fail/unsuitable. There is some controversy, based on subjectivity, regarding the period of use of a material rated as temporary, which ranges from 3 months for the Brooklyn Museum (BKM) and AMAW protocols to 6 months for the BM and MMA protocols. More conservative protocols, such as that of the GCI, virtually eliminate this category, since materials rated as temporary are hardly ever used. On the other hand, the FMNH protocol adds an additional category to the three previous ones, called limited use, indicating that objects composed of lead or calcite, such as shells, which are very sensitive to corrosion by volatile organic acids, should not be exposed to these materials. The Silver Nanofilm Sensor (SNS) protocol establishes a numerical alternative by ranking materials from 1 to 5, i.e., from the least to the greatest change in color associated with corrosion. However, this is finally reduced to the previous rating of three categories, i.e. permanent, temporary and unsuitable. Numerical ranking is useful to address reproducibility studies, as previously proposed by Green and Thickett [4], who also established five categories: 0 (permanent use), 1 (permanent/temporary), 2 (temporary), 3 (temporary/unsuitable) and 4 (unsuitable). Several operators usually perform the evaluation, but the corrosion of the control coupons is evaluated first. If it is significant, the test is considered not valid. Few protocols adopt the additional measure of the MMA protocol to validate the test, i.e. weighing the assembled jar before and after the test. A loss greater than 25% of the water mass is considered a failed test due to poor sealing. The number of replicates tested for each testing material is generally accepted to be at least two, although sometimes only one replicate is tested. The BM protocol advises occasionally testing duplicates and the AWMM protocol only if possible. The lack of resources, limited time or the scarcity of staff might condition the number of replicates tested, as specified by the FMNH and BKM protocols.
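The two rules described above, taking the rating of the most affected coupon as the overall rating and rejecting a test when more than 25% of the water mass is lost, can be expressed compactly. The sketch below is illustrative only: the function names, the dictionary layout and the example masses are assumptions and are not taken from any of the reviewed protocols.

```python
# Severity order of the three-category rating: P < T < U
SEVERITY = {"P": 0, "T": 1, "U": 2}


def overall_rating(coupon_ratings):
    """Return the rating of the most affected coupon.

    coupon_ratings maps a metal symbol to "P", "T" or "U",
    e.g. {"Ag": "P", "Cu": "T", "Pb": "U"}.
    """
    return max(coupon_ratings.values(), key=SEVERITY.__getitem__)


def seal_is_valid(jar_before_g, jar_after_g, water_added_g,
                  max_loss_fraction=0.25):
    """Check the water-loss criterion: the test is considered failed
    when more than 25% of the added water mass has been lost."""
    loss_fraction = (jar_before_g - jar_after_g) / water_added_g
    return loss_fraction <= max_loss_fraction


# Hypothetical example: silver unaffected, copper slightly corroded,
# lead obviously corroded, and a jar that lost 0.1 g of 0.5 g of water.
ratings = {"Ag": "P", "Cu": "T", "Pb": "U"}
print(overall_rating(ratings))
print(seal_is_valid(150.0, 149.9, 0.5))
```

Keeping the severity order in a single mapping makes it straightforward to extend the same logic to a finer scale, such as the five numerical categories of Green and Thickett.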
Different evaluation methods have been proposed to reduce the subjectivity of the visual inspection of the Oddy test. One of them is to use artificial intelligence that simulates the behavior of human operators. These are algorithms that can be trained to recognize corrosion patterns associated with the classification of materials for permanent, temporary and unsuitable use [12,26]. Other algorithms, such as k-means clustering, have previously been used in digital image processing to quantify the extent of corrosion as a percentage of the total area. Wang et al. proposed the following grading using silver and copper metal nanofilms [27]: P < 20%, T 20-55%, U > 55% for silver and P < 35%, T 35-70%, U > 70% for copper. The SNS protocol, based on the above study, proposed a further validation for materials that are used to store daguerreotype images consisting mainly of silver, even if they pass the Oddy test [28]. An additional Oddy test was performed with silver in the form of a nanofilm deposited on glass slides. The thickness of 7 nm reproduced the behavior of daguerreotype images and was highly sensitive to corrosive contaminants; therefore, the test duration could be reduced to two weeks.
As an alternative to visual assessment, there have been some proposals of direct methods for corrosion measurement. Thickett quantified oxygen depletion during the Oddy test [29]: the higher the oxygen consumption, the higher the corrosion rate. This was in agreement with visual evaluation and mass loss measurements for lead and copper, but not for silver. The CENIM protocol [13] performs standardized electrochemical reduction measurements according to the ISO 11844-2 methodology [30] for silver and copper coupons: the longer the reduction time, the higher the corrosion rate. Sometimes, the visual evaluation is confusing and does not agree with the electrochemically quantified corrosion of the coupons. Previously, Reedy et al. [31] and Bischoff et al. [32] proposed to replace the Oddy test with electrochemical tests instead of supplementing it. Aqueous extracts were obtained from testing materials and used as the electrolyte for electrochemical measurements, such as corrosion potential, polarization resistance or its conversion into corrosion current. This allows the corrosion rate of metal coupons to be quantified; however, the aqueous extract does not reproduce the time condition of the Oddy test. Another alternative is to use analytical techniques, such as different gas chromatography-mass spectrometry (GC-MS) methods, to detect VOCs released from tested materials [33]. However, their corrosive effect on silver, copper and lead is not always known. In addition, the few minutes usually taken for the entire experiment hinder the detection of secondary VOCs, i.e., those generated by the degradation of tested materials that may be emitted after several weeks under Oddy test conditions. More limited is the use of classical tests based on wet chemistry analysis for the specific detection of certain corrosive volatiles, for example, the Beilstein test for chlorides and the Purpald and chromotropic acid tests for aldehydes [19]. pH measurements of tested materials, either from aqueous extracts, on the surface or with A-D strips, have also been performed in protocols such as the British Museum's to quickly discard materials. Since few materials failed the A-D strip test, this was abandoned as a routine test [14]. Finally, other methods focus on the characterization of corrosion products formed on Oddy test coupons, aiding in the possible identification of corrosive volatiles emitted by the tested materials. The different characterization techniques include X-ray diffraction [34], µRaman spectroscopy [35], Fourier transform infrared spectroscopy [36], as well as quartz crystal microbalance [37].
Despite the numerous options for scientific analysis of corrosion products and emissions formed during the Oddy test, it is worth remembering that the Oddy test is a tool widely used by institutions without access to that type of instrumentation. Even large institutions have limited resources in terms of staff and time to routinely conduct this type of analysis. Finding ways of standardizing visual inspection therefore remains important.
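As an illustration of how an image-based measurement could be mapped back to the three-category rating, the area thresholds attributed above to Wang et al. [27] can be written as a small lookup. This is a sketch only: the function name is hypothetical, the corroded-area percentage is assumed to come from a separate image analysis step, and the treatment of the exact boundary values (20, 55, 35 and 70%) as temporary follows the ranges as quoted in the text.

```python
# (upper limit for P, upper limit for T) in percent of corroded area,
# as quoted in the text for silver and copper nanofilms
AREA_THRESHOLDS = {
    "Ag": (20.0, 55.0),
    "Cu": (35.0, 70.0),
}


def rate_from_corroded_area(metal, corroded_percent):
    """Map a corroded-area percentage to P, T or U for the given metal."""
    if not 0.0 <= corroded_percent <= 100.0:
        raise ValueError("corroded_percent must be between 0 and 100")
    p_limit, t_limit = AREA_THRESHOLDS[metal]
    if corroded_percent < p_limit:
        return "P"
    if corroded_percent <= t_limit:
        return "T"
    return "U"


# Hypothetical measurements for one tested material
for metal, percent in (("Ag", 12.0), ("Cu", 48.0)):
    print(metal, percent, rate_from_corroded_area(metal, percent))
```

No thresholds are given above for lead, so the lookup deliberately raises a `KeyError` for any metal other than silver and copper rather than guessing a value.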

Interlaboratory test
Seven European cultural institutions or research centers (see Table 1) performed a comparative 3-in-1 Oddy test (28 days, 60 °C) under the umbrella of IPERION HS. Five of them had years of experience with the Oddy test, as did the judges who evaluated the coupons. The other two institutions (II and V), although without previous experience in the Oddy test, were experts in heritage science. Results are presented here anonymously, using Roman numerals from one to seven that do not correspond to the order of Table 1. Ten materials were blind tested in duplicate, as well as the blank test. The materials were selected and sent by CENIM to each institution, identified only with numbers from one to ten. The tested materials and their general composition are shown in Table 2. Material 9 is the only non-solid product before application. Due mainly to the lack of space in the vessels of some institutions (50 mL test tube: 3.4 × 10 cm), it was not applied as a standard 6 × 12 cm area coating [14]. Instead, 2 g of the material were applied on aluminum with approximate dimensions of 3 × 1 × 0.5 cm and allowed to cure into a solid. The time elapsed from application to receipt by the participants in the interlaboratory test was at least four weeks. Each institution prepared and weighed 2 g of each solid material for testing, with the exception of material 1 (Ethafoam). Due to its low density, the institutions with reaction vessel volumes of less than 135 mL weighed 1 g of this material.
Although each institution applied its own Oddy test methodology, certain constraints were established to limit variables:
(1) Detergents should be used as a cleaning procedure for glassware, either manually or with a dishwasher.
(2) To prepare coupons, micromesh pads or sandpaper should be used with grit sizes that provide similar surface finishes (P600 sandpaper or micromesh pad close to 1500). Although micromesh grit does not follow the codification established by the Federation of European Producers of Abrasives (FEPA) for sandpaper, the two scales can be converted for fine abrasive grits.
(3) The ratio of water volume to vessel volume was set to 1:100.
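As a small worked example of constraint (3), the water volume for a given reaction vessel follows directly from the ratio. The sketch below is illustrative only: the function name is hypothetical, and the vessel volumes are simply the 50 mL test tube and the 135 mL threshold mentioned in the text, not a list of the vessels actually used by each institution.

```python
def water_volume_ml(vessel_volume_ml: float, parts: float = 100.0) -> float:
    """Water volume (mL) for a water:vessel volume ratio of 1:parts.

    The default of 100 reflects the 1:100 ratio agreed for the comparison.
    """
    if vessel_volume_ml <= 0 or parts <= 0:
        raise ValueError("volumes and ratio must be positive")
    return vessel_volume_ml / parts


if __name__ == "__main__":
    # Example vessel volumes taken from the text (50 mL test tube, 135 mL threshold).
    for volume in (50.0, 135.0):
        print(f"{volume:.0f} mL vessel -> {water_volume_ml(volume):.2f} mL water")
```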
Institution III could not meet constraints 1 and 2 due to its internal operating policy. Instead, new glassware was rinsed with deionized water and dried in the oven overnight. The metal coupons were abraded as originally proposed by the British Museum protocol, i.e., with glass brushes.
Regarding the cleaning procedure of metal coupons, all institutions used ultra-high purity acetone as well as a lint-free cloth for drying. Details on glassware cleaning, coupon preparation and reaction vessel setup for each of the participating institutions are shown in Tables 3, 4 and 5, respectively. Finally, each institution evaluated the three metal coupons (silver, copper and lead) by visual inspection, and their level of corrosion was compared to the blank test to rate the materials as suitable for permanent (P) use, suitable for temporary (T) use or unsuitable (U) for use. Institutions were asked to avoid assessing the area close to the insertion of the coupon into the plug when it is the only area affected, as corrosion usually starts from the lower edge of the coupon. The overall rating for each material was taken from the most corroded coupon.
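The rule for the overall rating (the most corroded of the three coupons decides) can be written down in a few lines. This is only a minimal sketch of that rule as described above; the function and variable names are hypothetical and the ratings in the example are invented for illustration.

```python
# Severity order of the three ratings: permanent < temporary < unsuitable.
SEVERITY = {"P": 0, "T": 1, "U": 2}


def overall_rating(silver: str, copper: str, lead: str) -> str:
    """Return the overall rating of a material from its three coupon ratings.

    The overall rating is the rating of the most corroded coupon,
    i.e. the most severe of the three individual ratings.
    """
    ratings = (silver, copper, lead)
    for rating in ratings:
        if rating not in SEVERITY:
            raise ValueError(f"unknown rating: {rating!r}")
    return max(ratings, key=SEVERITY.__getitem__)


if __name__ == "__main__":
    # Invented example: clean silver, slightly corroded copper, clean lead.
    print(overall_rating("P", "T", "P"))
```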

Results and discussion
Table 6 shows the images of the silver, copper and lead coupons from the participant institutions after the Oddy test. Coupons were returned to CENIM and photographs were immediately taken, keeping the same illumination and exposure in all cases by using a light box and adjusting the white balance. The rating of each coupon by the respective institutions is also included. Table 7 shows in addition the overall rating of the 10 materials tested by each institution from the individual evaluation of the three metal coupons. It should be noted that, although photographs are helpful as a reference, they cannot replace direct visual evaluation. Direct visual examination makes it possible to assess features such as the loss of shine or the thickness of the corrosion layer, and to distinguish between reflections of the surface and actual degradation layers. All ratings presented in this work are based on direct visual examination.
Differences between institutions were observed in the surface finish of the metal coupons even after completion of the test (see Table 6). This is expected for coupons sanded with abrasive grit sizes different from those initially proposed as equivalent (P600 sandpaper and 1500 micromesh pad), although it could also be attributed to how much pressure is applied during sanding and to the overall accuracy of the operator. Unsanded horizontal lines can be observed on the copper coupons from institution VII, which come from the forming of the copper sheet in the as-received condition (Fig. 2). Therefore, the 1800 grit size of the micromesh used by this institution may be too fine to uniformly prepare this type of copper coupon. A more consistent alternative could be to adopt the standard method from metallographic preparation laboratories: sanding is done progressively from lower to higher grit, rotating the sanding direction 90 degrees when passing from one grit to another. The use of glass bristle brushes also does not remove pre-existing deformations on lead coupons, at least within a reasonable sanding time (Fig. 3). Less control over the desired surface finish is also observed, with sanding lines appearing in directions other than longitudinal (especially on copper and lead). The limited roughing capacity of the glass bristle brush, together with the breakage of the glass bristles into small pieces, could explain both facts.
As for the rest of the institutions that did use equivalent abrasive grit, differences in surface finish were also observed due to the sanding procedure. Not all institutions selected a preferred sanding direction. This can be seen in the silver coupons of institutions such as II or VI compared to the silver coupon from institution I (Fig. 4).
Random sanding causes shiny spots as a result of the different pressure exerted on the coupon during sanding. A different sanding pattern is also observed on the edges of the silver coupons of institutions V and VII compared to the rest of the coupon, with different shades of grey appearing that are not associated with tarnishing during the Oddy test (Fig. 5). This issue could be avoided by abrading a large surface of metal and then cutting coupons, rather than abrading each coupon individually.
It would be advisable not only to use equivalent sanding grits, but also to establish a preferred sanding direction to avoid possible confusion during coupon rating. Sanding is performed to remove the existing native corrosion layer on the metal coupons and thus activate their surface before starting the Oddy test. Differences in surface finish could affect the reproducibility of the test between institutions if they were misinterpreted as deterioration during the rating of the metal coupons.
Regarding the evaluation of the testing materials from the participating institutions (see Table 7), the most notable differences came from institution II, which rated materials 2, 3 and 6 as unsuitable, in contrast to the other institutions. These differences could be associated with a more conservative evaluation criterion, perhaps due to its lack of experience and training with the Oddy test and/or with the institutional protocol itself. A frequency table, such as Table 8, representing the sum of permanent, temporary and unsuitable ratings per institution could help to differentiate between these.
Institution II rated silver coupons as permanent 6 times, the average for all institutions being 8, while the copper and lead coupons only received temporary or unsuitable ratings by this institution (24 times in total). These values differ from the average obtained, departing from it by at least one standard deviation, a parameter indicating the dispersion of the set of ratings. The copper and lead coupons are specifically responsible for the rating of materials 2, 3 and 6 as unsuitable. Therefore, the discrepancies seem to be due to a more conservative evaluation criterion for these coupons. The principal component analysis (PCA) performed with the overall data in Table 8 would indicate that Institution II is an outlier (Fig. 6). For this analysis, each variable in the dataset was centered by subtracting its mean, and then scaled by dividing by its standard deviation. This ensured that each variable had a mean of 0 and a standard deviation of 1. PC1 predominantly separates Institution II from the rest, while PC2 separates Institution V. The arrows indicate the loadings of the different variables. It can be seen that Institution II is defined by more classifications as U and T. These differences are more pronounced in the PCA conducted using the raw data of Table 7 (Fig. 7). To be able to conduct the PCA with categorical data with three categories, the data were encoded following the one-hot method, which is used to represent categorical data numerically [38]. In this encoding scheme, each category (P, T or U) is represented by a binary vector, where all elements are zero except for the one corresponding to the index of the category. This was necessary in order to turn the categorical variables into numerical variables that could be used with PCA. Figure 7 shows only one axis, rather than a complete biplot with variable loadings, because the one-hot encoding method increases the number of variables, rendering the plot too busy for visualization of their individual loadings.
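The two preprocessing steps described above (one-hot encoding of the P, T and U ratings, and centering and scaling of each variable) can be sketched as follows. This is an illustrative sketch, not the code used in the study: the function names are hypothetical, the ratings are invented, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used. Columns with zero variance, which appear when all institutions give the same rating, are set to zero here as a further assumption. The resulting matrix is what would then be passed to a PCA routine.

```python
from statistics import mean, stdev

CATEGORIES = ("P", "T", "U")


def one_hot(rating):
    """Encode one rating as a binary vector, e.g. 'T' -> [0, 1, 0]."""
    if rating not in CATEGORIES:
        raise ValueError(f"unknown rating: {rating!r}")
    return [1 if rating == category else 0 for category in CATEGORIES]


def encode_row(ratings):
    """Concatenate the one-hot vectors of all ratings given by one institution."""
    row = []
    for rating in ratings:
        row.extend(one_hot(rating))
    return row


def standardize(matrix):
    """Center each column on its mean and divide by its standard deviation.

    Assumption: sample standard deviation (n - 1 in the denominator).
    Constant columns are returned as zeros to avoid division by zero.
    """
    scaled_columns = []
    for column in zip(*matrix):
        m = mean(column)
        s = stdev(column)
        if s == 0:
            scaled_columns.append([0.0] * len(column))
        else:
            scaled_columns.append([(value - m) / s for value in column])
    return [list(row) for row in zip(*scaled_columns)]


if __name__ == "__main__":
    # Invented ratings: three institutions (rows), two coupons each (columns).
    raw = [["P", "T"], ["P", "U"], ["T", "U"]]
    encoded = [encode_row(row) for row in raw]
    for row in standardize(encoded):
        print([round(value, 2) for value in row])
```

Each rating becomes three binary variables, which is why the number of variables grows quickly and a full biplot becomes hard to read.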
To further explore what makes Institution II different, the contribution of each variable to PC1 can be observed in Figure S1 (supplementary information: file 1). No materials contribute more strongly to PC1 than others; the separation is due to small differences across all materials. PCA has been used extensively in heritage science as a technique for dimensionality reduction. In the case of metals, it has been used to group bronze artefacts according to the color of their patina [39] or to study relationships between objects according to their elemental composition [40]. PCA is also commonly used to evaluate behavioral patterns, for example, how museum visitors rank certain aspects of their experience [41].
The inter-institutional agreement has been statistically evaluated through Fleiss's kappa. This measure considers the agreement between institution ratings while accounting for the agreement that could occur by chance, providing a measure of agreement adjusted for random agreement (0 indicates no agreement between institutions beyond what would be expected by chance and 1 indicates perfect agreement). It is commonly used in interobserver comparison studies; for example, it has been used to see whether policymakers agree on digitization priorities for heritage [42].
Table 9 shows Fleiss's kappa obtained from the evaluations in Table 8. It can be seen that agreement between institutions increases when institution II is removed. The highest agreement occurs in the evaluations rated as unsuitable, followed by permanent. This would show the reliability of the Oddy test for rejecting hazardous materials and accepting those that are safe. It is in the intermediate or temporary rating where there is the least agreement between institutions.
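For readers who want to reproduce this kind of agreement measure on their own ratings, the following is a minimal sketch of Fleiss's kappa, both overall and per category, using the standard formulas. It is not the code used to produce Table 9; the function names are hypothetical and the ratings in the example are invented. Each inner list holds the ratings given to one item (for example, one material) by all raters (for example, the institutions).

```python
from collections import Counter

CATEGORIES = ("P", "T", "U")


def _count_table(ratings, categories):
    """Turn per-item rating lists into per-item category counts."""
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    table = []
    for row in ratings:
        tally = Counter(row)
        table.append([tally[category] for category in categories])
    return table, n_raters


def fleiss_kappa(ratings, categories=CATEGORIES):
    """Overall Fleiss's kappa: (P_bar - P_e) / (1 - P_e)."""
    table, n = _count_table(ratings, categories)
    n_items = len(table)
    # Proportion of all ratings that fall in each category.
    p = [sum(row[j] for row in table) / (n_items * n) for j in range(len(categories))]
    # Mean observed agreement over items.
    p_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1)) for row in table) / n_items
    # Agreement expected by chance.
    p_e = sum(x * x for x in p)
    if p_e == 1.0:
        return float("nan")  # a single category was used throughout: kappa is undefined
    return (p_bar - p_e) / (1.0 - p_e)


def category_kappa(ratings, category, categories=CATEGORIES):
    """Fleiss's kappa for one category against all the others."""
    table, n = _count_table(ratings, categories)
    j = categories.index(category)
    n_items = len(table)
    p_j = sum(row[j] for row in table) / (n_items * n)
    if p_j in (0.0, 1.0):
        return float("nan")
    disagreement = sum(row[j] * (n - row[j]) for row in table)
    return 1.0 - disagreement / (n_items * n * (n - 1) * p_j * (1.0 - p_j))


if __name__ == "__main__":
    # Invented example: four materials, each rated by seven institutions.
    example = [
        list("PPPPPPP"),
        list("UUUUUUT"),
        list("PTTPTUP"),
        list("TTUTPTT"),
    ]
    print("overall:", round(fleiss_kappa(example), 3))
    for cat in CATEGORIES:
        print(cat, round(category_kappa(example, cat), 3))
```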
Discrepancies in the ratings of the institutions could be of two types: (1) those associated with a differential or subjective evaluation criterion (which might be affected by the inexperience of the evaluator); and (2) those associated with real differences in the degree of corrosion of the metal coupons evaluated by the institutions. Keeping the same evaluator would help to distinguish one from the other. This has been addressed through independent evaluation of the coupons from all institutions by two judges with extensive experience in the Oddy test. To support a uniform and coherent evaluation of the different types and extents of corrosion observed in the coupons, the reference photographs of scored coupons from the Metropolitan Museum of Art were used [43][44][45]. These documents include reference photographs, along with detailed descriptions of the morphology, color and extent of corrosion observed, that help in assigning consistent ratings to the different degradation phenomena observed in the coupons. This is especially helpful for the borderline situations (between P and T, and between T and U) in which the simpler description by Thickett and Lee [19] leaves room for a more subjective decision by the evaluator.
The results of the single evaluation are observed to be more reproducible (Table 10). The discrepancies of Institution II disappear, thus showing that these were not due to differences in the degree of corrosion compared to other institutions, but to an overly conservative evaluation criterion. However, other discrepancies remain due to differences in the extent of corrosion. For example, the greatest disagreement obtained for silver is due to material 5. Silver was mostly rated as unsuitable and yet two institutions (V and VII) did not detect tarnish on their respective silver coupons, rating it as permanent. Both had poor sanding on the edges of the silver coupon, a general example of which was shown earlier in Fig. 5. This might explain why no corrosion was detected. In spite of this, the assessment of silver coupons shows the lowest dispersion between institutions. The noble nature of silver limits its rating as unsuitable, since fewer materials were able to corrode it (see Table 10), and the dark and distinctive color of its corrosion products facilitates better reproducibility with respect to copper and lead.
Another discrepancy that remains is that of material 9, rated as unsuitable by institution I, in contrast to the permanent rating of most of the institutions. The black corrosion on the lower edge of the copper coupon differs from the dark red of the other institutions, suggesting that the chemical nature of the deterioration would be different. Since the cleaning procedure, surface finish and water-air ratio were restricted, the reason could lie in the preparation of the testing material. Material 9 is a sealant that was applied to a sheet of aluminum foil and left to air cure for several days before being shipped to the institutions. Institution I cut the material as finely as possible with scissors, i.e. between 2 and 3 mm per dimension. This increases the material surface area from which contaminants would diffuse, and therefore their diffusion rate would increase. Another reason could be the time elapsed between receipt of the material and its testing and possible changes of the material during this period, including the evaporation of solvents.
On the other hand, interesting information is obtained from the testing of material 10. The volume of the reaction vessel could have controlled the extent of the deterioration on the copper coupon surface (Fig. 8), although all coupons were rated as U, except by institution II, which rated it as T.
210 coupons evaluated shows a relevant difference (from P to U), which would disappear (Fig. 10) excluding the contribution of institution II (outlier from the principal component analysis) for the silver coupons. Figure 10 also shows that the discrepancies with respect to the temporary rating are concentrated in the copper and mainly lead coupons. However, these appear to be confined to the P-T range, excluding Institution II. Regardless of the discrepancies observed, and although efforts to reduce the subjectivity of the evaluation by means of detailed references [43][44][45] and/or instrumental measurements [13] are always welcome, these results show that the Oddy test is a valid test to detect materials that can emit harmful pollutants and should be avoided in the environment of cultural heritage assets.

Conclusions
Although numerous cultural institutions and museums across the world rely on the Oddy test, no consensus has been reached on the protocol to be followed. This article aims to raise awareness within the conservation community by highlighting the discrepancies that may arise when different non-standardized protocols are used to analyse the same batch of materials. Firstly, we present a review of the available information on the methodological differences in Oddy test protocols published in the literature to date, focusing on the different variables that can affect the results. The review of current practices showed that, although some parameters have been widely adopted (the 3-in-1 procedure, temperature at 60 °C and 2 g of material), others are not yet standardized.
The second part of the study presents an interlaboratory comparison performed by seven European institutions under the umbrella of the IPERION HS project. Some guidelines were advised, covering glassware cleaning, coupon preparation and water-air ratio, to constrain the variables that can affect the results.
The interlaboratory comparison has shown some discrepancies in the ratings assigned to the same material by different institutions. The main discrepancies are found in materials rated as "Temporary", especially for copper and lead coupons. Some discrepancies between institutions might be due to non-standardized methodological differences in the protocols. Surface inhomogeneities, arising from poor edge preparation or differences in sanding pattern, might introduce confounding factors in the evaluation step. Obtaining a uniform surface finish over the whole surface of the coupon, with equivalent grit size and a preferred sanding direction, and applying the methodology of metallographic preparation laboratories (i.e., sanding progressively from lower to higher grit), could ensure a consistent reaction of the metal and help to avoid possible confusion during coupon rating. Establishing a standardized methodology and providing thorough user training would help to reduce possible discrepancies in the results.
Notwithstanding this, our results show that the main differences arise from the evaluation of the coupons. When the evaluation is performed by a single team of experienced evaluators using detailed reference photographs and aspect descriptions, the discrepancies in the ratings are largely reduced, showing the validity of the Oddy test for identifying materials that are harmful for the conservation of cultural heritage assets.

Fig. 5 Silver coupons of Institutions V and VII after testing material 4

Fig. 6 PCA performed from the overall assessments in columns 4-12 of Table 8. The bottom and left axes of this biplot correspond to the institutions (I-VII). The top and right axes correspond to the variables (evaluations of P, T and U for each metal). These axes indicate the direction and strength of the variables in the space defined by PC1 and PC2. The variable axes help indicate which classifications explain the observed differences between institutions

Fig. 7 PCA result with one-hot encoded variables P, T and U

Fig. 9 Paired counts of the ratings of all metal coupons obtained after the single and multiple evaluation. P means permanent use (no corrosion), T, temporary use (slight corrosion) and U, unsuitable use (large amount of corrosion). Green bubbles indicate agreement, orange bubbles indicate disagreement by one category and red bubbles by two categories

Table 1
Institutions participating in the interlaboratory comparison of the 3-in-1 Oddy test, in alphabetical order

Table 2
Materials tested in the interlaboratory comparison of the 3-in-1 Oddy test

Tested materials in the Oddy test
Number 7: Medium-density fiberboard (MDF, made up of 82% wood fiber, 9% urea-formaldehyde resin glue, 8% water, and 1% paraffin wax)
Number 8: Plywood board (made of birch wood sheets glued with phenolic resin)
Number 9: MS-35 Sealing (hybrid modified silane; MS, polymer of one component. Although no further information is provided by the manufacturer, two possible "chemical skeletons" for modified silanes would be: MS-polyester or MS-polyurethane)
Number 10: Polylactic acid filament for 3D printing

Table 3
Glassware cleaning procedures of metal coupons used by each participating institution in the interlaboratory comparison of the 3-in-1 Oddy test

solution of Decon 90 in deionized water. The test tubes remained submerged in the solution for several hours, ensuring that no air bubbles were trapped in the tubes. Then rinsed in distilled water overnight and subsequently rinsed again in distilled water several more times. Finally washed with technical grade ethanol
II: New glassware was used. It was rinsed with a 0.1% solution of ZVG Washing up liquid HS 50420-010 in tap water. Then rinsed again several times (until no bubbles were formed) with demineralized water, air-dried, rinsed with pure ethanol and finally air-dried
III: New glassware was used for this experiment; all Erlenmeyer flasks were rinsed with deionized water and dried in the oven overnight
IV: Schott glassware and caps were washed with detergent in warm water (Deconex 20 NS-x, 250 mL per 10 L water, submerged in a tub), rinsed with tap water, deionized water and MQ water. All pieces were ultrasonicated for 15 min in MQ water and dried in an oven at 110 °C for 16 h. Silicone stoppers were rinsed with deionized water and left to air dry. Three incisions were cut into each stopper with a scalpel
V: New glassware was used and glassware was manually cleaned with Alconox (1%) detergent. A sponge was dedicated for washing Oddy testing glassware. Glassware was rinsed with deionized water several times and finally washed with ACS acetone. All containers were air-dried completely
VI: Schott glasses, caps, glass vials and silicone stoppers were left to soak in a 5% Decon 90/DI water solution for 4 h, manually washed with a clean sponge, and subsequently rinsed with DI water 3-5 times. The Schott glasses, caps, glass vials and silicone stoppers were then wrapped in aluminum foil and placed in an oven at 105 °C to dry

Table 4
Surface preparation of coupons performed by the institutions participating in the interlaboratory comparison of the 3-in-1 Oddy test
*The policy of this institution prevented the use of sandpaper and micromesh during the surface preparation of the metal coupons

Table 5
Information on the reaction vessel setup used by each institution participating in the interlaboratory comparison of the 3-in-1 Oddy test
After 15 min at 60 °C, the silicone stoppers and screw caps were checked for tightness to ensure an airtight seal.
Glass lid
The coupons were hung one at a time from an inert nylon wire taped to the outside of the Erlenmeyer flask. The tape is removed when the setup is placed in the oven.
A 10 mL inert beaker was used to hold the vials containing water

Table 7
Rating of the suitability of the ten test materials, individually for the three metal coupons and overall, according to the 3-in-1 Oddy test of each institution

Table 8
Table of frequencies (total and individual per metal coupon) for permanent, temporary and unsuitable ratings made by each institution irrespective of the testing material

Table 9
Fleiss's kappa statistical measures regarding the agreement of the institutions' overall ratings (P, permanent; T, temporary and U, unsuitable) shown in Table 8