Answers to: Convergence Rates
http://www.or-exchange.com/questions/2712/convergence-rates
<p>For convex functions, the steepest descent method converges linearly, and Newton's method converges quadratically. For example, |x(k+1) - x*| < L |x(k) - x*|^2 for Newton's method implies quadratic convergence.
What does it mean when it is claimed that the number of iterations of a certain algorithm is of the order O(1/e) or O(1/e^2), where e is the desired absolute accuracy? Is this statement equivalent to saying something about the convergence rate?</p>

Answer by Brian Borchers (Tue, 26 Apr 2011 02:47:15 -0400)
http://www.or-exchange.com/questions/2712/convergence-rates/2714
<p>No, it's really addressing a slightly different question.<br>
</p>
<p>Asymptotic linear or quadratic convergence is a property of the algorithm only in the limit as k goes to infinity; it doesn't tell you anything about the number of iterations needed to reach a particular accuracy level. In particular, it takes a while to get to a point where the quadratic convergence behavior is apparent. If you only want an answer good to 2 digits, then you might never see that asymptotic behavior.<br>
</p>
<p>In practice, we typically run a quadratically or superlinearly convergent algorithm to the limits of our floating point precision. The reason is that we get huge improvements in accuracy in the last few iterations (for a quadratically convergent algorithm like Newton's method, the number of correct digits doubles in each iteration once you've hit the quadratic convergence regime), so there's no point in stopping early: if 10 iterations gets you 4 digits of accuracy and 12 iterations gets you 16 digits, then why stop early?<br>
</p>
<p>In comparison, the first-order methods that have attracted so much interest lately are no better than linearly convergent, and in practice they are stopped as soon as a sufficiently accurate answer is obtained, perhaps even before the asymptotic behavior of the algorithm has become apparent.<br>
</p>
<p>In this situation, you're most interested in how many iterations it will take to get to that desired accuracy, and you don't really care about the asymptotic convergence behavior (remember, it's linear anyway for these first-order methods). If you can get an epsilon-approximate solution in O(1/epsilon) iterations using an accelerated scheme instead of O(1/epsilon^2) iterations using (projected) subgradient descent, then you're obviously much better off. <br>
</p>
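<p>A minimal sketch of the kind of gap at stake (my own toy example, not from the answer): on a smooth convex quadratic, plain gradient descent has an O(1/k) error guarantee, while Nesterov's accelerated gradient method improves this to O(1/k^2), so for the same iteration budget the accelerated method reaches a much smaller error.</p>

```python
# Toy problem: f(x) = 0.5 * sum(a_i * x_i^2), gradient Lipschitz with L = max(a).
def f(x, a):
    return 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))

def grad(x, a):
    return [ai * xi for ai, xi in zip(a, x)]

def gradient_descent(x0, a, L, iters):
    """Plain gradient descent with step 1/L: f(x_k) - f* = O(1/k)."""
    x = list(x0)
    for _ in range(iters):
        x = [xi - gi / L for xi, gi in zip(x, grad(x, a))]
    return x

def nesterov(x0, a, L, iters):
    """Accelerated gradient method (FISTA-style momentum): f(x_k) - f* = O(1/k^2)."""
    x = list(x0)
    y = list(x0)
    t = 1.0
    for _ in range(iters):
        g = grad(y, a)
        x_new = [yi - gi / L for yi, gi in zip(y, g)]
        t_new = 0.5 * (1.0 + (1.0 + 4.0 * t * t) ** 0.5)
        beta = (t - 1.0) / t_new
        y = [xn + beta * (xn - xo) for xn, xo in zip(x_new, x)]
        x, t = x_new, t_new
    return x

if __name__ == "__main__":
    a = [1.0, 0.01]          # ill-conditioned quadratic
    x0 = [1.0, 1.0]
    L = max(a)
    xg = gradient_descent(x0, a, L, 100)
    xn = nesterov(x0, a, L, 100)
    print("gradient descent:", f(xg, a))
    print("accelerated:     ", f(xn, a))
```

<p>Note this toy problem is smooth; the O(1/epsilon) versus O(1/epsilon^2) comparison in the answer is for nonsmooth problems, where the subgradient method is the slower baseline, but the flavor of the gap is the same.</p>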
<p>A particular issue with the Nesterov 2005 scheme for smoothing nonsmooth convex problems is that you have to pick a smoothing parameter so as to control the Lipschitz constant of the gradient of the smoothed problem. Nesterov's scheme takes O(sqrt(L/epsilon)) iterations, and you can set up the smoothing so that L = O(1/epsilon), so you end up with an O(1/epsilon) iteration algorithm for finding an epsilon-approximate solution. Thus the algorithm depends in a fundamental way on the desired accuracy of the solution. <br>
</p>