The European Centre for Mathematics and Statistics in Metrology

Analysis of key comparisons


Key comparisons (KCs) are interlaboratory comparisons between National Metrology Institutes (NMIs) carried out within the framework of the CIPM Mutual Recognition Arrangement (MRA) [MRA]. The MRA has since been signed by more than 98 institutes. It enables the mutual recognition of calibration, measurement and test certificates and marks a major step in supporting international trade, commerce and regulatory affairs. To ensure the compatibility of the measurement capabilities provided by NMIs, the MRA prescribes that key comparisons be carried out on a regular basis. The calibration and measurement capabilities (CMCs) of the NMIs are then validated on the basis of the analysis of the key comparison data. The final report and the supporting technical data of each key comparison are stored and made publicly available in the key comparison database (KCDB) of the Bureau International des Poids et Mesures (BIPM). Figure 1 shows a typical example of key comparison data.

The goal of the analysis of KC data is to assess the results reported by the participating laboratories. Following the MRA, a key comparison reference value (KCRV) is usually calculated. The KCRV is then used to calculate the degrees of equivalence (DoEs), i.e. the differences between the results reported by the laboratories and the KCRV, together with the uncertainties associated with these differences. The DoEs quantify the extent to which the laboratories' results are compatible, and they can also be viewed as a measure of whether the laboratories measure as well as they claim. When a DoE differs significantly from zero, the result (and hence the CMC) of the corresponding laboratory is regarded as not confirmed.

Figure 1: Example of a key comparison together with the key comparison reference value (KCRV). The blue results indicate control measurements made by the so-called pilot laboratory.
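For a weighted-mean KCRV, one of the common choices, the DoEs and their uncertainties can be computed in a few lines. The following Python sketch assumes independent laboratory results with Gaussian standard uncertainties, a coverage factor k = 2 and hypothetical data; the function names are illustrative only. The minus sign in the DoE uncertainty reflects the correlation between each laboratory's result and a KCRV to which that result contributes.

```python
import math

def weighted_mean_kcrv(values, uncertainties):
    """Weighted-mean KCRV and its standard uncertainty (independent results assumed)."""
    weights = [1.0 / u**2 for u in uncertainties]
    total = sum(weights)
    kcrv = sum(w * x for w, x in zip(weights, values)) / total
    return kcrv, math.sqrt(1.0 / total)

def degrees_of_equivalence(values, uncertainties, k=2.0):
    """Return the KCRV, u(KCRV) and a list of (d_i, U(d_i), significant) tuples."""
    kcrv, u_ref = weighted_mean_kcrv(values, uncertainties)
    results = []
    for x, u in zip(values, uncertainties):
        d = x - kcrv
        # x_i enters the weighted mean, so x_i and the KCRV are correlated;
        # for the weighted mean this gives u^2(d_i) = u_i^2 - u^2(KCRV).
        u_d = math.sqrt(u**2 - u_ref**2)
        expanded = k * u_d
        # Flag the DoE as significant if |d_i| exceeds its expanded uncertainty
        results.append((d, expanded, abs(d) > expanded))
    return kcrv, u_ref, results

# Hypothetical data for five laboratories (values and standard uncertainties)
values = [10.02, 9.98, 10.05, 10.00, 9.90]
uncertainties = [0.02, 0.03, 0.02, 0.01, 0.03]

kcrv, u_ref, results = degrees_of_equivalence(values, uncertainties)
print(f"KCRV = {kcrv:.4f}, u(KCRV) = {u_ref:.4f}")
for i, (d, U, flag) in enumerate(results, start=1):
    print(f"lab {i}: d = {d:+.4f}, U(d) = {U:.4f}, significant = {flag}")
```

With these hypothetical numbers, laboratories 3 and 5 are flagged. Other KCRV choices (median, random-effects estimates) lead to different uncertainty expressions for the DoEs.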

More generally, the analysis of KCs can be seen as a meta-analysis in which the results reported by the participating laboratories are assessed. Methods employed in meta-analysis, such as fixed-effects [Leandro 2008] or random-effects models [DerSimonian et al. 2007], have also been proposed for the analysis of key comparisons, cf. [Elster et al. 2010, Kacker 2004, Sutton et al. 2004, Toman et al. 2009 and White et al. 2004]. Simpler estimators such as the mean, the median or the weighted mean [Cox 2002] have also been employed to calculate a KCRV. Methods applied to the analysis of KCs further include approaches based on the explicit or implicit removal of outliers [Cox 2007, Steele et al. 2005].
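To illustrate how the choice of estimator matters, the following Python sketch compares the mean, the median, the weighted mean and a random-effects estimate for the same hypothetical data. For the random-effects model it uses the DerSimonian-Laird moment estimator of the between-laboratory variance τ², which is one common choice in meta-analysis; the cited papers also discuss other estimators. Function names and data are hypothetical.

```python
import math
import statistics

def dersimonian_laird(values, uncertainties):
    """Random-effects estimate of the common value (DerSimonian-Laird).

    Returns (estimate, standard uncertainty, tau^2), where tau^2 is the
    moment estimate of the between-laboratory variance.
    """
    w = [1.0 / u**2 for u in uncertainties]
    sw = sum(w)
    x_w = sum(wi * xi for wi, xi in zip(w, values)) / sw  # weighted mean
    # Cochran's Q (observed chi-squared value about the weighted mean)
    q = sum(wi * (xi - x_w) ** 2 for wi, xi in zip(w, values))
    n = len(values)
    tau2 = max(0.0, (q - (n - 1)) / (sw - sum(wi**2 for wi in w) / sw))
    # Re-weight with the between-laboratory variance added to each u_i^2
    w_re = [1.0 / (u**2 + tau2) for u in uncertainties]
    sw_re = sum(w_re)
    mu = sum(wi * xi for wi, xi in zip(w_re, values)) / sw_re
    return mu, math.sqrt(1.0 / sw_re), tau2

values = [10.02, 9.98, 10.05, 10.00, 9.90]        # hypothetical results
uncertainties = [0.02, 0.03, 0.02, 0.01, 0.03]    # hypothetical u_i

w = [1.0 / u**2 for u in uncertainties]
estimates = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "weighted mean": sum(wi * xi for wi, xi in zip(w, values)) / sum(w),
    "DerSimonian-Laird": dersimonian_laird(values, uncertainties)[0],
}
for name, value in estimates.items():
    print(f"{name:>18}: {value:.4f}")
```

Here Q exceeds n − 1, i.e. the results scatter more than the stated uncertainties explain, so τ² > 0 and the random-effects estimate gives the most precise laboratory much less weight than the weighted mean does.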
The Guide to the Expression of Uncertainty in Measurement (GUM) is the main guideline for uncertainty evaluation in metrology, and its more recent supplements lean towards a Bayesian point of view. Bayesian methods have also been suggested for the analysis of KCs, for example [Bodnar et al. 2014, Bodnar et al. 2015, Bodnar et al. 2013, Elster et al. 2010, Rukhin et al. 2013, Toman et al. 2007], including Bayesian model averaging [Bodnar et al. 2013 and Elster et al. 2010]. In a Bayesian approach, a (posterior) distribution is derived for the unknown quantities such as the DoEs, cf. Figure 2.

Figure 2: Example posterior distributions for the degrees of equivalence (DoEs) obtained by Bayesian inference from the data in Figure 1.
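The Bayesian methods cited above differ in their models and priors. As a minimal sketch of how a posterior distribution arises, the following Python code assumes a random-effects model x_i ~ N(μ, u_i² + τ²) with a flat prior for μ and a uniform prior for τ on [0, τ_max]; τ_max, the grid size, the function name and the data are hypothetical choices. μ is integrated out analytically and τ is treated on a grid. The code returns the posterior mean and standard deviation of μ only, not the DoE posteriors shown in Figure 2.

```python
import math

def posterior_common_value(values, uncertainties, tau_max, n_grid=2001):
    """Posterior mean and SD of mu in the model x_i ~ N(mu, u_i^2 + tau^2).

    Assumed priors: flat for mu, uniform for tau on [0, tau_max].
    mu is integrated out analytically; tau is handled on an equally spaced grid.
    Returns (posterior mean of mu, posterior SD of mu, tau grid, tau weights).
    """
    taus = [tau_max * j / (n_grid - 1) for j in range(n_grid)]
    log_post, mu_hats, cond_vars = [], [], []
    for tau in taus:
        w = [1.0 / (u**2 + tau**2) for u in uncertainties]
        sw = sum(w)
        mu_hat = sum(wi * xi for wi, xi in zip(w, values)) / sw
        q = sum(wi * (xi - mu_hat) ** 2 for wi, xi in zip(w, values))
        # log p(tau | x) up to a constant, after integrating out mu
        log_post.append(0.5 * sum(math.log(wi) for wi in w) - 0.5 * math.log(sw) - 0.5 * q)
        mu_hats.append(mu_hat)      # conditional posterior mean of mu given tau
        cond_vars.append(1.0 / sw)  # conditional posterior variance of mu given tau
    top = max(log_post)
    p = [math.exp(lp - top) for lp in log_post]
    total = sum(p)
    p = [pi / total for pi in p]
    # The posterior of mu is a mixture of normals over the tau grid
    mean = sum(pi * m for pi, m in zip(p, mu_hats))
    second = sum(pi * (v + m**2) for pi, v, m in zip(p, cond_vars, mu_hats))
    return mean, math.sqrt(second - mean**2), taus, p

values = [10.02, 9.98, 10.05, 10.00, 9.90]        # hypothetical results
uncertainties = [0.02, 0.03, 0.02, 0.01, 0.03]    # hypothetical u_i

mean, sd, taus, p = posterior_common_value(values, uncertainties, tau_max=0.2)
tau_mean = sum(t * pi for t, pi in zip(taus, p))
print(f"posterior of mu: mean {mean:.4f}, SD {sd:.4f}; posterior mean of tau {tau_mean:.4f}")
```

A grid is used here only because τ is one-dimensional; richer models such as those in the cited papers typically call for other numerical schemes, and the result depends on the prior chosen for τ.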

Ideally, all laboratories participating in a KC measure the same measurand, which makes a comparison of the reported results most meaningful. This is not always possible, however, because the value of the common measurand, which is sent in turn to all laboratories, may change over time. If the drift is deterministic, it can be accounted for in the analysis by an appropriate model [Bergoglio et al. 2011, Elster et al. 2005, Zhang et al. 2006, Zhang et al. 2009]. If, on the other hand, the common measurand fluctuates randomly, this is not possible, and the question arises whether an analysis can still meaningfully assess the results reported by the participating laboratories. [Wübbeler et al. 2015] recently suggested using the concept of the power of statistical hypothesis tests to assess the explanatory power of a key comparison in the presence of random fluctuations of the common measurand.
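For a deterministic drift, the simplest model is a linear trend estimated from the pilot laboratory's control measurements. The Python sketch below fits such a trend by ordinary least squares and refers every participant's result to a common reference time. It assumes a linear drift, equal uncertainties for the pilot measurements and hypothetical data, and it ignores the uncertainty of the fitted slope and the resulting correlations, which the cited papers treat properly; the function names are illustrative.

```python
def linear_fit(t, y):
    """Ordinary least-squares fit of y = a + b*t; returns (a, b)."""
    n = len(t)
    t_bar, y_bar = sum(t) / n, sum(y) / n
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    b = sxy / sxx
    return y_bar - b * t_bar, b

def refer_to_time(times, results, slope, t_ref):
    """Shift each result by the estimated drift so that it refers to time t_ref."""
    return [x - slope * (t - t_ref) for t, x in zip(times, results)]

# Hypothetical pilot-laboratory control measurements (time in days)
pilot_times = [0, 30, 60, 90, 120]
pilot_values = [10.000, 10.003, 10.006, 10.009, 10.012]
# Hypothetical results of four participants and their measurement dates
lab_times = [15, 45, 75, 105]
lab_values = [10.0015, 10.0050, 10.0070, 10.0110]

intercept, slope = linear_fit(pilot_times, pilot_values)
corrected = refer_to_time(lab_times, lab_values, slope, t_ref=0)
print(f"estimated drift: {slope:.2e} per day")
print("results referred to t = 0:", [round(c, 4) for c in corrected])
```

After the correction the participants' results can be compared as if they had measured the same value; a random instability, in contrast, cannot be removed this way.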



Current and future research on the analysis of KC data includes the appropriate choice of a prior distribution when Bayesian inference is employed. This covers the elicitation of available prior knowledge [Garthwaite2005] as well as the choice of adequate noninformative priors [Bodnar2014_3]. Further directions include the use of more flexible underlying distributions in the determination of a reference value (cf. [Bodnar2013_2] for elliptically contoured distributions in a generalized marginal random effects model, or [Rukhin2011] for Laplace distributions in a random effects model).
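One way to see what a heavier-tailed distribution changes is a simplified fixed-effects case: if each result follows a Laplace distribution with known scale b_i around a common value, the maximum-likelihood estimate is a weighted median with weights 1/b_i. The Python sketch below is only this simplified illustration, not the Laplace random effects model of [Rukhin2011], which additionally includes a between-laboratory effect. The conversion b_i = u_i/√2 (a Laplace distribution with standard deviation u_i), the function name and the data are assumptions.

```python
import math

def laplace_location_mle(values, scales):
    """ML estimate of a common location for independent Laplace observations.

    With known scales b_i the log-likelihood is -sum(|x_i - mu| / b_i) + const,
    so the maximiser is a weighted median with weights 1/b_i.
    """
    pairs = sorted(zip(values, (1.0 / b for b in scales)))
    half = 0.5 * sum(w for _, w in pairs)
    cumulative = 0.0
    for x, w in pairs:
        cumulative += w
        if cumulative >= half:
            return x
    return pairs[-1][0]  # not reached for positive weights

values = [10.02, 9.98, 10.05, 10.00, 9.90]        # hypothetical results
uncertainties = [0.02, 0.03, 0.02, 0.01, 0.03]    # hypothetical u_i
# A Laplace distribution with standard deviation u has scale b = u / sqrt(2)
scales = [u / math.sqrt(2.0) for u in uncertainties]
print("Laplace ML estimate:", laplace_location_mle(values, scales))
```

Unlike the weighted mean, this estimate is not pulled towards a single outlying result such as 9.90; the price is that it always coincides with one of the reported values.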
Other interesting questions concern the design of key comparisons so that a subsequent analysis is as informative as possible. To this end, the power of a statistical test could be utilized, or other concepts from the optimal design of experiments.
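As a simple example of using power in design, the following Python sketch computes the probability that the usual check |d| > k·u(d) flags a laboratory with a given bias. It is a generic two-sided z-test calculation under assumed Gaussian behaviour, not the specific procedure of [Wübbeler et al. 2015]: the DoE is taken as d ~ N(bias, u(d)² + σ_s²), where σ_s is a hypothetical random instability of the common measurand that is not included in u(d). All inputs and names are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def detection_probability(bias, u_d, sigma_instability=0.0, k=2.0):
    """Probability that the test |d| > k * u_d rejects 'no bias'.

    Assumes d ~ N(bias, u_d^2 + sigma_instability^2): u_d is the stated
    uncertainty of the DoE, sigma_instability an unmodelled random
    fluctuation of the common measurand (both hypothetical inputs).
    """
    sigma = math.sqrt(u_d**2 + sigma_instability**2)
    threshold = k * u_d
    return (normal_cdf((-threshold - bias) / sigma)
            + 1.0 - normal_cdf((threshold - bias) / sigma))

# Hypothetical design question: how well does a comparison with u_d = 1
# (arbitrary units) reveal a laboratory bias of a given size?
for bias in (0.0, 1.0, 2.0, 3.0, 4.0):
    p_stable = detection_probability(bias, u_d=1.0)
    p_unstable = detection_probability(bias, u_d=1.0, sigma_instability=1.0)
    print(f"bias {bias:.0f}: stable {p_stable:.3f}, unstable {p_unstable:.3f}")
```

At zero bias the value is the false-alarm rate, about 4.6 % for k = 2 without instability; an unmodelled instability raises it, so that flagged DoEs no longer reliably indicate a laboratory bias.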

Related journal papers

A. G. Steele and R. J. Douglas, Chi-squared statistics for KCRV candidates, Metrologia 42, 253 (2005)
A. L. Rukhin, Estimating heterogeneity variance in meta-analysis, Journal of the Royal Statistical Society: Series B 75, 451–469 (2013)
A. L. Rukhin and A. Possolo, Laplace random effects models for interlaboratory studies, Computational Statistics & Data Analysis 55, 1815–1827 (2011)
B. Toman, Bayesian approaches to calculating a reference value in key comparisons, Technometrics 49, 81–87 (2007)
B. Toman and A. Possolo, Laboratory effects models for interlaboratory comparisons, Accreditation and Quality Assurance 14, 553–563 (2009)
C. Elster and B. Toman, Analysis of key comparison data: critical assessment of elements of current practice with suggested improvements, Metrologia 50, 549–555 (2013)
C. Elster and B. Toman, Analysis of key comparisons: estimating laboratories' biases by a fixed effects model using Bayesian model averaging, Metrologia 47, 113–119 (2010)
C. Elster, W. Wöger and M. G. Cox, Analysis of key comparison data: unstable travelling standards, Measurement Techniques 48, 883–893 (2005)
C. M. Sutton, Analysis and linking of international measurement comparisons, Metrologia 41, 272–277 (2004)
D. R. White, On the analysis of measurement comparisons, Metrologia 41, 122–131 (2004)
G. Wübbeler, O. Bodnar, B. Mickan and C. Elster, Explanatory power of degrees of equivalence in the presence of a random instability of the common measurand, Metrologia 52, 400 (2015)
I. Lira, A. Chunovkina, C. Elster and W. Wöger, Analysis of key comparisons incorporating knowledge about bias, IEEE Trans. Instrum. Meas. 61, 2079–2084 (2012)
K. Jousten, K. Arai, U. Becker, O. Bodnar, F. Boineau, J. A. Fedchak, V. Gorobey, Wu Jian, D. Mari, P. Mohan, J. Setina, B. Toman, M. Vicar and Yu Hong Yan, Final report of key comparison CCM.P-K12 for very low helium flow rates (leak rates), Metrologia 50, Tech. Suppl., 07001 (50 pp) (2013)
K. Weise and W. Wöger, Removing model and data non-conformity in measurement evaluation, Measurement Science and Technology 11, 1649–1658 (2000)
L. Spinelli, M. Botwicz, N. Zolek, M. Kacprzak, D. Milej, P. Sawosz, A. Liebert, U. Weigel, T. Durduran, F. Foschum, A. Kienle, F. Baribeau, S. Leclair, J.-P. Bouchard, I. Noiseux, P. Gallant, O. Mermut, A. Farina, A. Pifferi, A. Torricelli, R. Cubeddu, H., Determination of reference values for optical properties of liquid phantoms based on Intralipid and India ink, Biomed. Opt. Express 5, 2037–2053 (2014)
M. Bergoglio, A. Malengo and D. Mari, Analysis of interlaboratory comparisons affected by correlations of the reference standards and drift of the travelling standards, Measurement 44, 1461–1467 (2011)
M. G. Cox and P. M. Harris, The evaluation of key comparison data using key comparison reference curves, Metrologia 49, 164–172 (2012)
M. G. Cox, The evaluation of key comparison data, Metrologia 39, 589–595 (2002)
M. G. Cox, The evaluation of key comparison data: determining the largest consistent subset, Metrologia 44, 187–200 (2007)
N. F. Zhang, W. Strawderman, H. Liu and N. Sedransk, Statistical analysis for multiple artefact problem in key comparisons with linear trends, Metrologia 43, 21–26 (2006)
O. Bodnar and C. Elster, On the adjustment of inconsistent data using the Birge ratio, Metrologia 51, 516–521 (2014)
R. DerSimonian and R. Kacker, Random-effects model for meta-analysis of clinical trials: an update, Contemporary Clinical Trials 28, 105–114 (2007)
R. N. Kacker, Combining information from interlaboratory evaluations using a random effects model, Metrologia 41, 132–136 (2004)
W. Zhang, N. F. Zhang and H. Liu, A generalized method for the multiple artefacts problem in interlaboratory comparisons with linear trends, Metrologia 46, 345–350 (2009)

