# Physics 212, 2019: Lecture 17


## Avoid blind fits

Follow the Optimization notebook for this section.

As we have stressed again and again, there is no guarantee that a nonlinear optimization routine will find the global minimum, or even a "good" minimum, where, for example, the curve that we are trying to fit passes through the data points. More commonly, it will find a local minimum, where the parameter values produce a slightly better fit than nearby parameter values, but the overall fit is bad and makes little sense. To avoid this problem, one must again put one's thinking hat on and figure out what reasonable parameter values are. Remember that every optimization algorithm requires an initial guess from which to start searching, and if the initial guess is close to a good optimum, the chances of finding that optimum are much higher.

How do we find such good initial guesses? The main idea is, again, to explore special cases. The curves that you fit to data are generated by models, and it is frequently the case that different parts of the curve are affected differently by different model parameters. The trick is to find ranges of the fitted curve that look nearly linear, with the parameters of the full model corresponding to the parameters of the linear fit. The beauty of linear models is that there is just one global minimum, which is easy to find. So one can do linear least-squares fitting on pieces of the full curve, extract parameter guesses from them, and then use these guesses as the starting point for the full fit.

The Module 3 Python script shows how we can do this using the example of Michaelis-Menten enzyme kinetics, where the rate of change of the substrate concentration is given by ${\displaystyle {\frac {dS}{dt}}=-{\frac {VS}{K_{M}+S}}}$. There are two unknown parameters: the maximum reaction velocity ${\displaystyle V}$ and the Michaelis constant ${\displaystyle K_{M}}$. We notice that if the substrate concentration ${\displaystyle S}$ is large (${\displaystyle S\gg K_{M}}$), then ${\displaystyle {\frac {dS}{dt}}\approx -V}$. Thus one can fit a straight line to the initial part of the ${\displaystyle S}$ vs. ${\displaystyle t}$ data, and the slope of the linear fit, which is ${\displaystyle -V}$, will give us a good estimate of ${\displaystyle V}$. If the substrate concentration ${\displaystyle S}$ is small (${\displaystyle S\ll K_{M}}$), then ${\displaystyle {\frac {dS}{dt}}\approx -{\frac {VS}{K_{M}}}}$. This has the usual exponential solution ${\displaystyle S\propto e^{-(V/K_{M})t}}$. This exponential curve becomes linear in semi-logarithmic coordinates: ${\displaystyle \log S=C-(V/K_{M})t}$, where ${\displaystyle C}$ is some constant. One can do a linear fit of the late part of the data in these coordinates and get a good initial guess for ${\displaystyle V/K_{M}}$. Then, knowing ${\displaystyle V}$ from the first part of the curve, one gets an estimate of ${\displaystyle K_{M}}$ too. Finally, starting from these estimates, one can fit both parameters together to the full curve to find really good parameter values.
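The three steps above can be sketched in a few lines. This is a minimal illustration rather than the Module 3 script itself: the data are synthetic, and the values of `S0`, the true parameters, the noise level, and the thresholds that separate the "early" and "late" parts of the curve are all assumptions made for this example.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import curve_fit

S0 = 10.0  # initial substrate concentration, assumed known for this sketch


def substrate(t, V, K_M):
    """Solve dS/dt = -V*S/(K_M + S) with S(0) = S0; return S at the times t."""
    sol = solve_ivp(lambda _t, S: -V * S / (K_M + S), (0.0, t[-1]), [S0],
                    t_eval=t, rtol=1e-10, atol=1e-12)
    return sol.y[0]


# Synthetic data (illustrative values): V = 1, K_M = 1, 2% multiplicative noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 20.0, 81)
S_obs = substrate(t, 1.0, 1.0) * (1.0 + 0.02 * rng.standard_normal(t.size))

# Step 1: while S >> K_M the curve is nearly a straight line with slope -V.
early = S_obs > 0.5 * S0          # threshold is an arbitrary choice
slope_early, _ = np.polyfit(t[early], S_obs[early], 1)
V_guess = -slope_early

# Step 2: once S << K_M, log S is nearly linear in t with slope -V/K_M.
late = S_obs < 0.01 * S0          # threshold is an arbitrary choice
slope_late, _ = np.polyfit(t[late], np.log(S_obs[late]), 1)
K_M_guess = V_guess / (-slope_late)

# Step 3: nonlinear fit of both parameters, started from the linear guesses.
(V_fit, K_M_fit), _ = curve_fit(substrate, t, S_obs, p0=[V_guess, K_M_guess],
                                bounds=(1e-6, np.inf))

print(f"linear guesses: V = {V_guess:.2f}, K_M = {K_M_guess:.2f}")
print(f"full fit:       V = {V_fit:.2f}, K_M = {K_M_fit:.2f}")
```

The linear guesses are not expected to be exact, because the early part of the curve is only approximately linear. They only need to land close enough to the optimum for the full fit to converge.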

## Which data to fit?

The Optimization notebook also shows that fits often depend on what exactly is being fitted. If data comes as ${\displaystyle (x_{i},y_{i})}$ pairs, one can fit these data, or their transformed versions, such as ${\displaystyle (\log x_{i},y_{i})}$, ${\displaystyle (\log x_{i},\log y_{i})}$, ${\displaystyle (x_{i}^{2},y_{i})}$, ${\displaystyle (x_{i}^{2},y_{i}^{3})}$, or any other transformed combination. Which choice should we make? The sum-of-squares objective function assumes that the noise is of the same scale for every ${\displaystyle x}$, so you should transform your data (typically the ${\displaystyle y}$ coordinate) to satisfy this property. For example, for exponentially decaying data with multiplicative noise, the appropriate transformation is ${\displaystyle (x_{i},\log y_{i})}$, which turns the multiplicative noise into additive noise of constant scale and produces much better fits, as the script shows. Similarly, sum-of-squares algorithms usually work much better when the distribution of ${\displaystyle x}$ points is not skewed and there are no outliers, and one can often achieve this by transforming the ${\displaystyle x}$ variable as well.
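The exponential-decay case can be sketched as follows. This is an illustration on synthetic data, not the notebook's own code. The amplitude `A_true`, the rate `k_true`, and the noise level are assumed values, and `decay` is a hypothetical helper name.

```python
import numpy as np
from scipy.optimize import curve_fit

# Synthetic data (illustrative values): y = A*exp(-k*x) with multiplicative noise.
rng = np.random.default_rng(1)
A_true, k_true = 5.0, 0.7
x = np.linspace(0.0, 10.0, 50)
y = A_true * np.exp(-k_true * x) * np.exp(0.2 * rng.standard_normal(x.size))


def decay(x, A, k):
    """Exponential decay model in the original coordinates."""
    return A * np.exp(-k * x)


# Fit the raw (x, y) pairs: the few large-y points dominate the sum of squares.
(A_raw, k_raw), _ = curve_fit(decay, x, y, p0=[y[0], 1.0])

# Fit (x, log y): the noise is now additive with the same scale at every x,
# and the model log y = log A - k*x is linear in its parameters.
slope, intercept = np.polyfit(x, np.log(y), 1)
A_log, k_log = np.exp(intercept), -slope

print(f"true:    A = {A_true:.2f}, k = {k_true:.2f}")
print(f"raw fit: A = {A_raw:.2f}, k = {k_raw:.2f}")
print(f"log fit: A = {A_log:.2f}, k = {k_log:.2f}")
```

Plotting both fitted curves on semi-logarithmic axes is a quick way to see how each fit treats the small-${\displaystyle y}$ tail of the data.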

Generate data using a model ${\displaystyle y={\rm {sin}}(ax)+{\rm {noise}}}$ for some fixed value of ${\displaystyle a}$. Then use the curve_fit function to fit the model ${\displaystyle y={\rm {sin}}(ax)}$ to this data. Explore how the fitted value depends on the initial guess for ${\displaystyle a}$. This should illustrate for you how important it is to start close to the correct value of the fitted parameter.
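One possible starting point for this exercise is sketched below. The true value `a_true`, the noise level, the range of ${\displaystyle x}$, and the list of initial guesses are all assumptions chosen for illustration, so change them and see what happens.

```python
import numpy as np
from scipy.optimize import curve_fit

# Synthetic data (illustrative values): a = 3, Gaussian noise of width 0.2.
rng = np.random.default_rng(2)
a_true = 3.0
x = np.linspace(0.0, 10.0, 200)
y = np.sin(a_true * x) + 0.2 * rng.standard_normal(x.size)


def model(x, a):
    """The curve being fitted, y = sin(a*x)."""
    return np.sin(a * x)


# Repeat the fit from several initial guesses and record where each one ends up.
fits = {}
for a0 in [0.5, 2.0, 2.9, 3.1, 4.0, 8.0]:
    try:
        (a_fit,), _ = curve_fit(model, x, y, p0=[a0])
    except RuntimeError:  # raised if the optimizer does not converge
        a_fit = np.nan
    fits[a0] = a_fit
    print(f"initial guess a0 = {a0:4.1f} -> fitted a = {a_fit:.4f}")
```

Compare the fitted values for guesses near `a_true` with those for guesses far from it, and try plotting the sum of squared residuals as a function of ${\displaystyle a}$ to see the local minima directly.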