Physics 212, 2019: Lecture 15 - Revision history

Ilya: /* Maximum accuracy of optimization */

2019-03-20T13:58:27Z

‎Maximum accuracy of optimization

Ilya: /* Maximum accuracy of optimization */

2019-03-06T14:55:21Z

‎Maximum accuracy of optimization

Ilya: /* Maximum accuracy of optimization */

2019-03-06T14:54:58Z

‎Maximum accuracy of optimization

Ilya: /* 1-d optimization */

2019-03-06T14:54:35Z

‎1-d optimization

Ilya at 20:53, 5 March 2019

2019-03-05T20:53:52Z

Ilya at 20:43, 5 March 2019

2019-03-05T20:43:38Z

Ilya: Created page with "{{PHYS212-2019}} ==Optimization is hard== While a lot more can be added here (see the appropriate section of the Numerical Recipes book), it suffices to say that optimization..."

2019-03-05T20:23:39Z

Created page with "{{PHYS212-2019}} ==Optimization is hard== While a lot more can be added here (see the appropriate section of the Numerical Recipes book), it suffices to say that optimization..."

New page

{{PHYS212-2019}}

==Optimization is hard==
While a lot more can be added here (see the appropriate section of the Numerical Recipes book), it suffices to say that optimization is hard. And the higher the dimensionality of the optimization problem, the harder it generally is. Overall, this is because
*In large dimensions, the volume of the parameter space grows quickly. If one needs to find the minimum to the accuracy <math>\Delta x</math>, and the typical size of the system along each dimension is <math>L</math>, then there are <math>(L/\Delta x)^d</math> spaces to explore in <math>d</math> dimensions, which very quickly becomes hard even for relatively small <math>d</math>.
*There's usually just one global minimum, but many local minima that we are not interested in. The optimizers get stuck in the local minima and can't escape them to find the global one. This is more likely in high-dimensional problems.
*There are also regions of space where the loss function <math>{\cal L}</math> is nearly flat, so that the optimizer does not know which way to go towards the minimum. Again, this is more likely in higher dimensions.
*In higher dimensions there are a lot of other exotic situations, such as valleys, narrow canyons, funnels, discontinuous walls -- and all those make optimization harder.
There are some exceptions -- problems like the multivariate linear regression one -- that are known as "convex", which means that there's just one minimum (global or local), and there are no flat regions, and so one can easily find the minimum. Nonetheless, even in non-convex problems, one often can find at least a local minimum by simply following the gradient, and sliding downhill. We will explore this in the later lectures.
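
To get a feel for the <math>(L/\Delta x)^d</math> count from the first bullet above, here is a minimal Python sketch. The values <math>L=1</math> and <math>\Delta x=0.01</math>, and the function name <code>grid_cells</code>, are illustrative assumptions and not part of the lecture.

```python
# Illustrative numbers only (assumed, not from the lecture): L = 1, dx = 0.01.
def grid_cells(L, dx, d):
    """Number of cells of side dx needed to tile a d-dimensional box of side L."""
    per_axis = round(L / dx)  # cells along one dimension
    return per_axis ** d      # cells in d dimensions


for d in (1, 2, 3, 6, 10):
    print(f"d = {d:2d}: {grid_cells(1.0, 0.01, d):.3e} cells")
```

With these assumed numbers there are 100 cells per axis, so three dimensions already require <math>10^6</math> cells, and each additional dimension multiplies the count by another factor of 100.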

Why is fitting hard: Large dimensionality, multiple minima, flat regions.
What is the maximum accuracy of optimization?

For future lectures, start reading:
*http://docs.scipy.org/doc/scipy/reference/optimize.html -- Optimization with Python/SciPy
In the class lectures for this module, we will talk about the following methods:
*Multi-dimensional linear regression (no derivations) using numpy.linalg.lstsq http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.linalg.lstsq.html
*General considerations for nonlinear minimization (with and without derivatives; with and without second derivatives)
*Golden mean 1-d search
*Parabolic interpolation
*Newton method
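
As a preview of the golden mean 1-d search listed above, here is a minimal Python sketch. It is one common interval-shrinking form of the method, not necessarily the version that will be shown in class. It assumes that the function has a single minimum inside the starting interval <code>[a, b]</code>; the function name, the default tolerance, and the test function <math>(x-2)^2</math> are all illustrative choices.

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio, about 0.618


def golden_section_minimize(f, a, b, tol=1e-6, max_iter=200):
    """Shrink [a, b] around a minimum of f.

    Assumes f has a single minimum inside [a, b] (an assumption of this sketch).
    """
    # Two interior points placed symmetrically at golden-ratio fractions.
    c = b - INV_PHI * (b - a)
    d = a + INV_PHI * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(max_iter):
        if abs(b - a) < tol:
            break
        if fc < fd:
            # The minimum lies in [a, d]; the old c is reused as the new d.
            b, d, fd = d, c, fc
            c = b - INV_PHI * (b - a)
            fc = f(c)
        else:
            # The minimum lies in [c, b]; the old d is reused as the new c.
            a, c, fc = c, d, fd
            d = a + INV_PHI * (b - a)
            fd = f(d)
    return 0.5 * (a + b)


# Hypothetical test function with a known minimum at x = 2.
x_min = golden_section_minimize(lambda x: (x - 2.0) ** 2, 0.0, 5.0)
print(x_min)
```

Each iteration needs only one new function evaluation, because one of the two interior points is reused, and the interval shrinks by the same factor of about 0.618 every time.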

;Your turn: Write down a parabolic interpolation or Newton method minimizer of 1-d functions. Minimize a <math>{\rm cosh} x</math> using your minimizer. Do you get the expected result?

*[https://docs.google.com/forms/d/e/1FAIpQLSeNvrPddCgvMfHdlSA8uYDddBFqLK81Rrr4l7jfozud6J7iww/viewform Submit your work].

@@ Line 22: / Line 22: @@
 ==Maximum accuracy of optimization==
-What is the maximum precision that we should request of an optimizer looking for the optimal point? Remember that floating point numbers at double precision are stored in the memory to the relative error of <math>\epsilon \approx 10^{-16}</math>. Also recall that most functions near an optimum behave like <math>f(\theta)\approx \frac{1}{2}\left.\frac{\partial^2 f}{\partial \theta^2}\right|_{\hat{\theta}}(\theta-\hat{\theta})^2</math>. ... more to be added...
+What is the maximum precision that we should request of an optimizer looking for the optimal point? Remember that floating point numbers at double precision are stored in the memory to the relative error of <math>\epsilon \approx 10^{-16}</math>. Also recall that most functions near an optimum behave like <math>f(\theta)\approx \frac{1}{2}\left.\frac{\partial^2 f}{\partial \theta^2}\right|_{\hat{\theta}}(\theta-\hat{\theta})^2</math>. To find the position of a minimum of a function, we must be able to compare the function values to each other near the minimum: indeed, we need to know, which of the function values is smaller than the others. However, comparisons of floating point numbers are never perfect, and can only be performed up to a certain relative precision. That is, we can only know that <math> f(\theta_0+\Delta \theta)</math> is greater than <math>f(\theta_0)</math> (and hence <math>\theta_0</math> is the position of the minimum) if ... more to be added...
 ==Your turn==
 Write down a parabolic interpolation or Newton method minimizer of 1-d functions. Minimize a  <math>{\rm cosh} x</math> using your minimizer. Do you get the expected result?

@@ Line 22: / Line 22: @@
 ==Maximum accuracy of optimization==
-What is the maximum precision that we should request of an optimizer looking for the optimal point? Remember that floating point numbers at double precision are stored in the memory to the relative error of <math>\epsilon \approx 10^{-16}</math>. Also recall that most functions near an optimum behave like <math>f(\theta)\approx \frac{1}{2}\left.\frac{\partial^2 f}{\partial \theta^2}\right|_{\hat{\theta}}(\theta-\hat{\theta})^2</math>.
+What is the maximum precision that we should request of an optimizer looking for the optimal point? Remember that floating point numbers at double precision are stored in the memory to the relative error of <math>\epsilon \approx 10^{-16}</math>. Also recall that most functions near an optimum behave like <math>f(\theta)\approx \frac{1}{2}\left.\frac{\partial^2 f}{\partial \theta^2}\right|_{\hat{\theta}}(\theta-\hat{\theta})^2</math>. ... more to be added...
 ==Your turn==
 Write down a parabolic interpolation or Newton method minimizer of 1-d functions. Minimize a  <math>{\rm cosh} x</math> using your minimizer. Do you get the expected result?

@@ Line 22: / Line 22: @@
 ==Maximum accuracy of optimization==
-What is the maximum precision that we should request of an optimizer looking for the optimal point? Remember that floating point numbers at double precision are stored in the memory to the relative error of <math>\epsilon \approx 10^{-16}</math>. Also recall that most functions near an optimum behave like <math>f(\theta)\approx \frac{1}{2}\left.\frac{\partial^2 f}{\partial \theta^2}\right_{\hat{\theta}}(\theta-\hat{\theta})^2</math>.
+What is the maximum precision that we should request of an optimizer looking for the optimal point? Remember that floating point numbers at double precision are stored in the memory to the relative error of <math>\epsilon \approx 10^{-16}</math>. Also recall that most functions near an optimum behave like <math>f(\theta)\approx \frac{1}{2}\left.\frac{\partial^2 f}{\partial \theta^2}\right|_{\hat{\theta}}(\theta-\hat{\theta})^2</math>.
 ==Your turn==
 Write down a parabolic interpolation or Newton method minimizer of 1-d functions. Minimize a  <math>{\rm cosh} x</math> using your minimizer. Do you get the expected result?

@@ Line 20: / Line 20: @@
 *Parabolic interpolation -- approximating the function based on three points as a parabola, and jumping to the putative minimum of the parabola. See page 402 in the [http://apps.nrbook.com/c/index.html Numerical Recipes] book.
 *Newton's method. Since the minimum corresponds to the zero of the derivative of a function, one can solve for the zero of the derivative similarly to how we did this for the Newton-Raphson method.
 ==Your turn==
 Write down a parabolic interpolation or Newton method minimizer of 1-d functions. Minimize a  <math>{\rm cosh} x</math> using your minimizer. Do you get the expected result?

@@ Line 8: / Line 8: @@
 *In higher dimensions there are a lot of other exotic situations, such as valleys, narrow canyons, funnels, discontinuous walls -- and all those make optimization harder.
 There are some exceptions -- problems like the multivariate linear regression one -- that are known as "convex", which means that there's just one minimum (global or local), and there are no flat regions, and so one can easily find the minimum. Nonetheless, even in non-convex problems, one often can find at least a local minimum by simply following the gradient, and sliding downhill. We will explore this in the later lectures.
 ==1-d optimization==
-1-d optimization is special. While for larger-dimensional problems finding a minimum is never guaranteed, in 1-d we are guaranteed to find a minimum if we bracketed it (that is, know the value of the function in three points, and the value is higher for the two exterior points than the interior one). This can be done with the golden mean 1-d search
+1-d optimization is special. While for larger-dimensional problems finding a minimum is never guaranteed, in 1-d we are guaranteed to find a minimum if we bracketed it (that is, know the value of the function in three points, and the value is higher for the two exterior points than the interior one). This can be done with the golden section 1-d search -- see page 397 in [http://apps.nrbook.com/c/index.html Numerical Recipes] book. There's no equivalent method in higher dimensions.
-;Your turn: Write down a parabolic interpolation or Newton method minimizer of 1-d functions. Minimize a  <math>{\rm cosh} x</math> using your minimizer. Do you get the expected result?
+Additional methods for 1-d optimization, which actually have equivalents in higher dimensions, are
-*[https://docs.google.com/forms/d/e/1FAIpQLSeNvrPddCgvMfHdlSA8uYDddBFqLK81Rrr4l7jfozud6J7iww/viewform Submit your work].
+==Your turn==

@@ Line 9: / Line 9: @@
 There are some exceptions -- problems like the multivariate linear regression one -- that are known as "convex", which means that there's just one minimum (global or local), and there are no flat regions, and so one can easily find the minimum. Nonetheless, even in non-convex problems, one often can find at least a local minimum by simply following the gradient, and sliding downhill. We will explore this in the later lectures.
-Why is fitting hard: Large dimensionality, multiple minima, flat regions.
+==1-d optimization==
-What is the maximum accuracy of optimization?
+1-d optimization is special. While for larger-dimensional problems finding a minimum is never guaranteed, in 1-d we are guaranteed to find a minimum if we bracketed it (that is, know the value of the function in three points, and the value is higher for the two exterior points than the interior one). This can be done with the golden mean 1-d search
 *Multi-dimensional linear regression (no derivations) using numpy.linalg.lstsq http://docs.scipy.org/doc/numpy-1.10.0/reference/generated/numpy.linalg.lstsq.html
 *General considerations for nonlinear minimization (with and without derivatives; with and without second derivatives)