# Analysis of key comparisons

### Description

Key comparisons are interlaboratory comparisons between National Metrology Institutes (NMIs) within the framework of the CIPM Mutual Recognition Arrangement (MRA) [MRA]. The MRA has by now been signed by more than 98 institutes. It enables the mutual recognition of calibrations, measurements, and test certificates and marks a major step in supporting international trade, commerce and regulatory affairs. In order to ensure the compatibility of the measurement capabilities provided by NMIs, the MRA prescribes that key comparisons are carried out on a regular basis. Based on the analysis of the data from a key comparison, the corresponding calibration and measurement capabilities (CMCs) of the NMIs are validated. The final report and the supporting technical data of each key comparison are stored and made publicly available in the key comparison database (KCDB) of the Bureau International des Poids et Mesures (BIPM). Figure 1 shows a typical example of key comparison data.

The goal of the analysis of KC data is to assess the results reported by the participating laboratories. According to the MRA, a so-called key comparison reference value (KCRV) is usually calculated. The KCRV is then used to calculate the degrees of equivalence (DoEs) as the differences between the results reported by the laboratories and the KCRV, along with the uncertainties associated with these differences. The DoEs quantify the extent to which the laboratories are compatible, and they can also be viewed as a measure of whether the laboratories measure as well as they claim. When a DoE is significantly different from zero, the (CMC of the) corresponding laboratory is considered not to be approved.
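As an illustration, the calculation of a weighted-mean KCRV and the resulting DoEs can be sketched in a few lines. The reported values `x` and standard uncertainties `u` below are purely hypothetical, and the coverage factor 2 used for the expanded uncertainty is an assumption; the weighted mean is only one common choice of KCRV:

```python
import numpy as np

# hypothetical reported values x_i and standard uncertainties u_i
x = np.array([10.12, 10.05, 10.21, 9.98, 10.09])
u = np.array([0.05, 0.08, 0.10, 0.06, 0.07])

w = 1.0 / u**2
kcrv = np.sum(w * x) / np.sum(w)      # weighted-mean KCRV
u_kcrv = np.sqrt(1.0 / np.sum(w))     # standard uncertainty of the KCRV

d = x - kcrv                          # degrees of equivalence
# each lab contributes to the KCRV, so the covariance is accounted for
# by subtracting u(KCRV)^2 from the lab variance
u_d = np.sqrt(u**2 - u_kcrv**2)

for i, (di, udi) in enumerate(zip(d, u_d)):
    ok = abs(di) <= 2 * udi           # |d_i| within the expanded uncertainty?
    print(f"lab {i}: d = {di:+.3f}, U(d) = {2*udi:.3f}, consistent: {ok}")
```

A laboratory whose DoE exceeds its expanded uncertainty would be flagged for further investigation in this simple scheme.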

**Figure 1** Example of a key comparison along with key comparison reference value (KCRV). The blue results indicate control measurements made by the so-called pilot laboratory.

More generally, the analysis of KCs can be seen as a meta-analysis in which the results reported by the participating laboratories are assessed. Methods employed for meta-analyses such as fixed effects [Leandro 2008] or random effects models [DerSimonian et al. 2007] have also been proposed for the analysis of key comparisons, cf. [Elster et al. 2010, Kacker 2004, Sutton et al. 2004, Toman et al. 2009 and White et al. 2004]. Simpler methods such as the mean, the median or the weighted mean [Cox 2002] have also been employed for the calculation of a KCRV. Methods that have been applied for the analysis of KCs also include approaches based on the explicit or implicit removal of outliers [Cox 2007, Steele et al. 2005].
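A random effects model in the spirit of [DerSimonian et al. 2007] can be sketched as follows; the laboratory data are again hypothetical, and the DerSimonian-Laird moment estimator shown here is only one of several ways to estimate the between-laboratory variance:

```python
import numpy as np

x = np.array([10.12, 10.05, 10.21, 9.98, 10.09])   # hypothetical lab results
u = np.array([0.05, 0.08, 0.10, 0.06, 0.07])       # standard uncertainties

w = 1.0 / u**2
x_fixed = np.sum(w * x) / np.sum(w)                 # fixed-effects (weighted) mean

# DerSimonian-Laird moment estimate of the between-laboratory variance tau^2
q = np.sum(w * (x - x_fixed)**2)                    # Cochran's Q statistic
c = np.sum(w) - np.sum(w**2) / np.sum(w)
tau2 = max(0.0, (q - (len(x) - 1)) / c)             # truncated at zero

# random-effects weighted mean: each lab variance inflated by tau^2
w_re = 1.0 / (u**2 + tau2)
x_random = np.sum(w_re * x) / np.sum(w_re)
print(f"fixed-effects mean: {x_fixed:.4f}, tau^2: {tau2:.5f}, "
      f"random-effects mean: {x_random:.4f}")
```

When the data are mutually consistent, `tau2` is estimated close to zero and the two means essentially coincide; with heterogeneous data the random-effects mean down-weights the most precise laboratories less aggressively.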

The GUM constitutes the main guideline for uncertainty evaluation in metrology, and its recent supplements follow the Bayesian point of view. Bayesian methods have also been suggested for the analysis of KCs, for example [Bodnar et al. 2014, Bodnar et al. 2015, Bodnar et al. 2013, Elster et al. 2010, Rukhin et al. 2013, Toman et al. 2007], including Bayesian model averaging [Bodnar et al. 2013 and Elster et al. 2010]. When applying a Bayesian approach, a (posterior) distribution is derived for the unknown quantities such as the DoEs, cf. Figure 2.
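A minimal sketch of such a Bayesian analysis is given below, assuming Gaussian likelihoods and a flat (noninformative) prior on the reference value, under which the posterior of the reference value is itself Gaussian around the weighted mean; the data are hypothetical and the Monte Carlo step only serves to illustrate how posterior distributions of the DoEs are obtained:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.array([10.12, 10.05, 10.21, 9.98, 10.09])   # hypothetical lab results
u = np.array([0.05, 0.08, 0.10, 0.06, 0.07])       # standard uncertainties

w = 1.0 / u**2
# with Gaussian likelihoods and a flat prior, the posterior of the
# reference value mu is normal around the weighted mean
mu_hat = np.sum(w * x) / np.sum(w)
sd_mu = np.sqrt(1.0 / np.sum(w))

mu_samples = rng.normal(mu_hat, sd_mu, size=100_000)
doe_samples = x[:, None] - mu_samples[None, :]      # posterior draws of d_i

for i in range(len(x)):
    lo, hi = np.percentile(doe_samples[i], [2.5, 97.5])
    print(f"lab {i}: 95% credible interval for DoE: [{lo:+.3f}, {hi:+.3f}]")
```

A laboratory whose 95% credible interval excludes zero would be judged inconsistent with the reference value under this simple model; the Bayesian approaches cited above use considerably richer models (random effects, model averaging).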

**Figure 2** Example posterior distributions for the degrees of equivalence (DoEs) obtained by a Bayesian inference of the data from Figure 1.

Ideally, all laboratories participating in a KC measure the same measurand, which makes a comparison of the reported results most meaningful. However, this may not always be possible, because the travelling standard that is sent in turn to all laboratories may change its value over time. If the drift is deterministic, it can be accounted for in the analysis by an appropriate model [Bergoglio et al. 2011, Elster et al. 2005, Zhang et al. 2006, Zhang et al. 2009]. If the fluctuation of the common measurand is random, on the other hand, this is not possible, and the question arises whether an analysis is still meaningful for assessing the results reported by the participating laboratories. In [Wübbeler et al. 2015] the concept of the power of statistical hypothesis tests has recently been suggested to assess the explanatory power of a key comparison in the presence of random fluctuations of the common measurand.
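The idea behind using test power here can be illustrated by a small simulation. All numbers below are hypothetical: a laboratory with standard uncertainty `u_lab` has a true bias `delta`, and the common measurand fluctuates randomly with standard deviation `tau`; the probability of detecting the bias (the power of a two-sided test at coverage factor 2) shrinks as `tau` grows:

```python
import numpy as np

rng = np.random.default_rng(7)
u_lab = 0.05      # hypothetical lab standard uncertainty
delta = 0.10      # hypothetical true lab bias to be detected
n_sim = 20_000

def power(tau):
    """Estimated probability that |d| > 2*u(d) for fluctuation std tau."""
    # the random fluctuation of the measurand adds variance tau^2
    # to both the observed difference and its uncertainty
    u_d = np.sqrt(u_lab**2 + tau**2)
    d = rng.normal(delta, u_d, n_sim)
    return np.mean(np.abs(d) > 2 * u_d)

for tau in (0.0, 0.05, 0.10):
    print(f"tau = {tau:.2f}: power = {power(tau):.2f}")
```

Even a moderate random instability of the common measurand can thus mask a laboratory bias that would be clearly detectable with a stable travelling standard, which is the issue analyzed in [Wübbeler et al. 2015].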

### References

- M. Bergoglio, A. Malengo and D. Mari, *Analysis of interlaboratory comparisons affected by correlations of the reference standards and drift of the travelling standards*, Measurement 44, 1461-1467, 2011
- O. Bodnar, A. Link, K. Klauenberg, K. Jousten, and C. Elster, *Application of Bayesian model averaging using a fixed effects model with linear drift for the analysis of key comparison CCM.P-K12*, Meas. Tech. 56, 584-590, 2013
- O. Bodnar and C. Elster, *On the adjustment of inconsistent data using the Birge ratio*, Metrologia 51, 516-521, 2014
- O. Bodnar, A. Link and C. Elster, *Bayesian treatment of a random effects model for the analysis of key comparisons*, Talk at (MATHMET) International Workshop on Mathematics and Statistics for Metrology, March 24-26, 2014, Berlin, 2014
- M.G. Cox, *The evaluation of key comparison data*, Metrologia 39, 589-595, 2002
- M.G. Cox, *The evaluation of key comparison data: determining the largest consistent subset*, Metrologia 44, 187-200, 2007
- R. DerSimonian and R. Kacker, *Random-effects model for meta-analysis of clinical trials: an update*, Contemporary Clinical Trials 28, 105-114, 2007
- C. Elster, W. Wöger and M.G. Cox, *Analysis of Key Comparison Data: Unstable Travelling Standards*, Measurement Techniques 48, 883-893, 2005
- C. Elster and B. Toman, *Analysis of key comparisons: estimating laboratories' biases by a fixed effects model using Bayesian model averaging*, Metrologia 47, 113-119, 2010
- P.H. Garthwaite, J.B. Kadane and A. O'Hagan, *Statistical Methods for Eliciting Probability Distributions*, Journal of the American Statistical Association 100, 680-701, 2005
- R.N. Kacker, *Combining information from interlaboratory evaluations using a random effects model*, Metrologia 41, 132-136, 2004
- G. Leandro, *Meta-analysis in Medical Research: The handbook for the understanding and practice of meta-analysis*, John Wiley & Sons, 2008
- A.L. Rukhin and A. Possolo, *Laplace random effects models for interlaboratory studies*, Computational Statistics & Data Analysis 55, 1815-1827, 2011
- A.L. Rukhin, *Estimating heterogeneity variance in meta-analysis*, Journal of the Royal Statistical Society: Series B 75, 451-469, 2013
- A.G. Steele and R.J. Douglas, *Chi-squared statistics for KCRV candidates*, Metrologia 42, 253, 2005
- B. Toman, *Bayesian approaches to calculating a reference value in key comparisons*, Technometrics 49, 81-87, 2007
- B. Toman and A. Possolo, *Laboratory effects models for interlaboratory comparisons*, Accreditation and Quality Assurance 14, 553-563, 2009
- K. Weise and W. Wöger, *Removing model and data non-conformity in measurement evaluation*, Measurement Science and Technology 11, 1649-1658, 2000
- D.R. White, *On the analysis of measurement comparisons*, Metrologia 41, 122-131, 2004
- G. Wübbeler, O. Bodnar, B. Mickan and C. Elster, *Explanatory power of degrees of equivalence in the presence of a random instability of the common measurand*, Metrologia 52, 400, 2015
- N.F. Zhang, W. Strawderman, H. Liu, and N. Sedransk, *Statistical analysis for multiple artefact problem in key comparisons with linear trends*, Metrologia 43, 21-26, 2006
- W. Zhang, N.F. Zhang and H. Liu, *A generalized method for the multiple artefacts problem in interlaboratory comparisons with linear trends*, Metrologia 46, 345-350, 2009

### Research

Current and future research in the analysis of KC data comprises the adequate selection of a prior distribution when employing Bayesian inference. This includes the elicitation of available prior knowledge [Garthwaite et al. 2005], but also the choice of adequate noninformative priors [Bodnar2014_3]. Further directions are the use of more flexible underlying distributions in the determination of a reference value (cf. [Bodnar2013_2] for the use of elliptically contoured distributions in a generalized marginal random effects model, or [Rukhin and Possolo 2011] for the use of Laplace distributions in a random effects model).

Other interesting questions include the design of key comparisons so as to benefit optimally from the subsequent analysis. To this end, the power of a statistical test could be utilized, as well as other concepts from the optimal design of experiments.

### Links

- BIPM Key Comparison Database (KCDB)
- Mutual Recognition Arrangement (MRA)
- Calibration and Measurement Capabilities (CMCs)