# Regression and Inverse Problems

### Description

Regression problems occur in many metrological applications, e.g. in everyday calibration tasks (as illustrated in Annex H.3 of the GUM), in the evaluation of inter-laboratory comparisons [Toman et al. 2012], the characterization of sensors [Matthews et al. 2014], determination of fundamental constants [Bodnar et al. 2014], interpolation or prediction tasks [Wübbeler et al. 2012] and many more. Such problems arise when the quantity of interest cannot be measured directly, but has to be inferred from measurement data (and their uncertainties) using a mathematical model that relates the quantity of interest to the data.

**Figure 1**: Illustration of a Normal straight line regression problem. Shown are the mean regression curve (solid line) and pointwise 95% credible intervals (dashed lines). The thin vertical line shows a prediction at a new value $x$ and its 95% credible interval. The dots represent the data.

### Definition and Examples

Regression problems often take the form

$$
y_i = f_{\boldsymbol{\theta}}(x_i) + \varepsilon_i , \quad i=1, \ldots, n \,,
$$

where the measurements $\boldsymbol{y}=(y_1, \ldots, y_n)^\top$ are explained by a function $f_{\boldsymbol{\theta}}$ evaluated at values $\boldsymbol{x}=(x_1, \ldots, x_n)^\top$ and depending on unknown parameters $\boldsymbol{\theta}=(\theta_1, \ldots, \theta_p)^\top$. The measurement errors $\boldsymbol{\varepsilon}=(\varepsilon_1, \ldots, \varepsilon_n)^\top$ follow a specified distribution $p(\boldsymbol{\varepsilon} | \boldsymbol{\theta}, \boldsymbol{\delta})$, where $\boldsymbol{\delta}$ denotes the (possibly unknown) parameters of the error distribution.

Regression may be used to describe the relationship between a traceable, highly accurate reference device with values denoted by $x$ and a device to be calibrated with values denoted by $y$. The pairs $(x_i,y_i)$ then denote simultaneous measurements of the same measurand (for example, temperature) made by the two devices.

A simple example is the Normal straight line regression model (as illustrated in Figure 1)

$$
y_i = \theta_1 + \theta_2 x_i + \varepsilon_i , \quad \varepsilon_i \stackrel{iid}{\sim} \text{N}(0, \sigma^2), \quad i=1, \ldots, n \,.
$$

The basic goal of regression tasks is to estimate the unknown parameters $\boldsymbol{\theta}$ of the regression function, and possibly also the unknown parameters $\boldsymbol{\delta}$ of the error distribution. The estimated regression model may then be used to assess the shape of the regression function, to predict or interpolate the response at intermediate or extrapolated $x$-values, or to invert the regression function in order to predict $x$-values for new measurements.
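
As a minimal sketch of these three uses (estimation, prediction and inversion), the following Python snippet fits the Normal straight line model to simulated data by ordinary least squares. The function names `fit_line`, `predict` and `invert`, the simulated values and the choice of least squares are illustrative assumptions and are not taken from the cited work. The snippet returns point estimates only and says nothing about their uncertainty.

```python
import random


def fit_line(x, y):
    """Ordinary least-squares estimates of theta1, theta2 and sigma^2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    theta2 = sxy / sxx                      # slope
    theta1 = y_bar - theta2 * x_bar         # intercept
    rss = sum((yi - theta1 - theta2 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / (n - 2)                  # residual variance estimate
    return theta1, theta2, sigma2


def predict(theta1, theta2, x_new):
    """Value of the fitted regression function at a new x."""
    return theta1 + theta2 * x_new


def invert(theta1, theta2, y_new):
    """x-value at which the fitted line takes the value y_new."""
    return (y_new - theta1) / theta2


# Simulated calibration data (all numbers are assumptions for illustration).
rng = random.Random(1)
x = [float(i) for i in range(11)]
y = [0.5 + 2.0 * xi + rng.gauss(0.0, 0.1) for xi in x]

theta1, theta2, sigma2 = fit_line(x, y)
print("estimates:", theta1, theta2, sigma2)
print("prediction at x = 4.5:", predict(theta1, theta2, 4.5))
print("inverse prediction for y = 7.0:", invert(theta1, theta2, 7.0))
```

Inverting the fitted line is only sensible when the estimated slope is clearly different from zero; a full treatment would also propagate the uncertainty of the estimates into the predicted values.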

### Uncertainty evaluation

Decisions based on regression analyses require a reliable evaluation of measurement uncertainty. The current state of the art in uncertainty evaluation in metrology (i.e. the GUM and its supplements) provides little guidance, however. One reason is that the GUM guidelines are based on a measurement function that relates the quantity of interest (the measurand) to the input quantities, yet regression models cannot be uniquely formulated as such a measurement function. Annex H.3 of the GUM nevertheless suggests, by way of example, one possibility for analysing regression problems. However, this analysis mixes elements of classical (least-squares) and Bayesian statistics, so that its results are not derived as state-of-knowledge distributions and usually differ from those of a purely classical or a purely Bayesian approach.

Consequently, there is a need for guidance and research in metrology for uncertainty evaluation in regression problems. The Joint Committee for Guides in Metrology (JCGM) identified this need. The EMRP project NEW04 developed template solutions for specific regression problems with known values $x$ (cf. [Elster et al., 2015]). These solutions are based on Bayesian inference and consider (1) a simple, analytically solvable, Normal linear regression [Klauenberg et al., 2015], (2) a similar problem with additional constraints on the values of the regression curve [Kok et al., 2015], (3) a problem where the regression function is not known explicitly but needs to be determined through the numerical solution of a partial differential equation [Allard et al., 2015], (4) a problem where the variances of the observations are not constant and the information gained in the regression is used completely for a subsequent prediction of values of $x$ [Klauenberg et al., 2015] and (5) a regression function which is computationally expensive to evaluate [Heidenreich et al., 2014]. Other Bayesian research on metrological regression problems includes [Rocha et al., 2004, Toman, 2006, Grientschnig et al., 2011, Willink, 2008, Wübbeler et al., 2012, Toman et al., 2012-2, Elster et al., 2011 and Possolo et al., 2007].
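
To illustrate what a Bayesian treatment of the simplest case can look like, the sketch below draws Monte Carlo samples from the posterior of the Normal straight line model and computes 95% credible intervals of the kind shown in Figure 1. It assumes the standard noninformative prior $p(\boldsymbol{\theta}, \sigma^2) \propto 1/\sigma^2$, for which $\sigma^2$ given the data follows a scaled inverse chi-square distribution and $\boldsymbol{\theta}$ given $\sigma^2$ is Normal. The prior, the data values and the names `posterior_samples` and `credible_interval` are assumptions made for this example; they are not the template solutions of the cited guide, which may use different (e.g. informative) priors.

```python
import math
import random


def posterior_samples(x, y, n_draws=20000, seed=0):
    """Draw (theta1, theta2, sigma2) from the posterior of the straight line
    model under the assumed prior p(theta, sigma2) proportional to 1/sigma2."""
    rng = random.Random(seed)
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    rss = sum((yi - y_bar - slope_hat * (xi - x_bar)) ** 2
              for xi, yi in zip(x, y))
    nu = n - 2  # degrees of freedom
    draws = []
    for _ in range(n_draws):
        # sigma2 | data: residual sum of squares divided by a chi-square(nu) draw
        chi2 = rng.gammavariate(nu / 2.0, 2.0)
        sigma2 = rss / chi2
        # Given sigma2, the line's value at x_bar and the slope are
        # independent Normal variables (centred parametrization).
        centre = rng.gauss(y_bar, math.sqrt(sigma2 / n))
        theta2 = rng.gauss(slope_hat, math.sqrt(sigma2 / sxx))
        theta1 = centre - theta2 * x_bar
        draws.append((theta1, theta2, sigma2))
    return draws


def credible_interval(values, level=0.95):
    """Equal-tailed credible interval from Monte Carlo samples."""
    s = sorted(values)
    lower = s[int((1.0 - level) / 2.0 * (len(s) - 1))]
    upper = s[int((1.0 + level) / 2.0 * (len(s) - 1))]
    return lower, upper


# Illustrative data (assumed values, not taken from the cited work).
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [0.52, 2.45, 4.61, 6.48, 8.55, 10.47]

draws = posterior_samples(x, y)
slope_lo, slope_hi = credible_interval([d[1] for d in draws])

# Pointwise interval for the regression curve at a new value x_new.
x_new = 2.5
curve_lo, curve_hi = credible_interval([d[0] + d[1] * x_new for d in draws])

print("95% credible interval for the slope:", slope_lo, slope_hi)
print("95% credible interval for the curve at x_new:", curve_lo, curve_hi)
```

For this particular prior the posterior of $\boldsymbol{\theta}$ is also available in closed form (a scaled and shifted $t$-distribution), so sampling is not strictly needed here; the Monte Carlo form is shown because it carries over to quantities such as predictions that are nonlinear functions of the parameters.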

### References

- A. Allard, N. Fischer, G. Ebrard, B. Hay, P. M. Harris, L. Wright, D. Rochais, J. Mattout. *A multi-thermogram based Bayesian model for the determination of the thermal diffusivity of a material*. **Metrologia**, 2016
- O. Bodnar and C. Elster. *On the adjustment of inconsistent data using the Birge ratio*. **Metrologia 51, 516-521**, 2014
- C. Elster and B. Toman. *Bayesian uncertainty analysis for a regression model versus application of GUM supplement 1 to the least-squares estimate*. **Metrologia 48 (5), 233**, 2011
- C. Elster, K. Klauenberg, M. Walzel, G. Wübbeler, P. Harris, M. Cox, C. Matthews, I. Smith, L. Wright, A. Allard, N. Fischer, S. Cowen, S. Ellison, P. Wilson, F. Pennecchi, G. Kok, A. van der Veen, and L. Pendrill. *A Guide to Bayesian Inference for Regression Problems*. **Deliverable of EMRP project NEW04 “Novel mathematical and statistical approaches to uncertainty evaluation”**, 2015
- D. Grientschnig and I. Lira. *Reassessment of a calibration model by Bayesian reference analysis*. **Metrologia 48 (1), L7**, 2011
- S. Heidenreich, H. Gross, M.-A. Henn, C. Elster, and M. Bär. *A surrogate model enables a Bayesian approach to the inverse problem of scatterometry*. **J. Phys.: Conf. Ser. 490, 012007**, 2014
- K. Klauenberg, M. Walzel, B. Ebert and C. Elster. *Informative prior distributions for ELISA analyses*. **Biostatistics**, 2015
- K. Klauenberg, G. Wübbeler, B. Mickan, P. M. Harris, and C. Elster. *A Tutorial on Bayesian Normal Linear Regression*. **Metrologia 52(6)**, 2015
- G. J. P. Kok, A. M. H. van der Veen, P. M. Harris, I. M. Smith, C. Elster. *Bayesian analysis of a flow meter calibration problem*. **Metrologia 52, 392-399**, 2015
- C. Matthews, F. Pennecchi, S. Eichstädt, A. Malengo, T. Esward, I. Smith, C. Elster, A. Knott, F. Arrhén and A. Lakka. *Mathematical modelling to support traceable dynamic calibration of pressure sensors*. **Metrologia 51, 326-338**, 2014
- A. Possolo and B. Toman. *Assessment of measurement uncertainty via observation equations*. **Metrologia 44(6), 464**, 2007
- G. M. Rocha and G. A. Kyriazis. *A software for the evaluation of the stability of measuring standards using Bayesian statistics*. **In Proceedings of the 13th International Symposium on Measurements for Industry Applications, 386-391**, 2004
- B. Toman. *Linear statistical models in the presence of systematic effects requiring a Type B evaluation of uncertainty*. **Metrologia 43(1), 27**, 2006
- B. Toman, D. L. Duewer, H. G. Aragon, F. R. Guenther and G. C. Rhoderick. *A Bayesian approach to the evaluation of comparisons of individually value-assigned reference materials*. **Analytical and Bioanalytical Chemistry 403(2), 537-548**, 2012
- R. Willink. *Estimation and uncertainty in fitting straight lines to data: different techniques*. **Metrologia 45(3), 290**, 2008
- G. Wübbeler, F. Schmähling, J. Beyer, J. Engert, and C. Elster. *Analysis of magnetic field fluctuation thermometry using Bayesian inference*. **Meas. Sci. Technol. 23, 125004 (9pp)**, 2012

### Research

To improve reliability and comparability in many fields of metrology, a consistent evaluation of regression problems is indispensable. Important issues in achieving this goal are

- the proper treatment of the error structure associated with measured data (including errors in both stimulus and response variables),
- the thoughtful inclusion of all available information such as prior knowledge from previous measurements or physical constraints,
- the availability of reliable numerical methods (such as Monte Carlo methods),
- the quantification of the sensitivity of the results obtained to the assumptions made, and
- the consideration of model uncertainty and model validation.

Further aims are to

- develop tutorials, guides and template solutions for typical regression problems,
- implement these in easy-to-use software,
- define conditions under which simple (approximate) methods are applicable, and
- bridge the gap to statisticians (especially at smaller metrology institutes) so that more complex problems can also be tackled.