<p><strong>Convergence Rates</strong> (<a href="http://www.or-exchange.com/questions/2712/convergence-rates">or-exchange.com</a>)</p>
<p>For convex functions, the steepest descent method converges linearly and Newton's method converges quadratically. For example, $|x^{(k+1)} - x^{*}| \le L\,|x^{(k)} - x^{*}|^{2}$ for Newton's method implies quadratic convergence.
What does it mean when it is claimed that the number of iterations of a certain algorithm is of the order $O(1/\epsilon)$ or $O(1/\epsilon^{2})$, where $\epsilon$ is the desired absolute accuracy? Is this statement equivalent to saying something about the convergence rate?</p>
<p><em>Comment by Brian Borchers:</em> Let me also mention that by "epsilon approximate solution" we mean that the function value at $x^{k}$ is within $\epsilon$ of the optimal function value, not that $x^{k}$ is within $\epsilon$ of an optimal solution $x^{*}$.</p>
<p><strong>Answer by Brian Borchers</strong> (<a href="http://www.or-exchange.com/questions/2712/convergence-rates/2714">link</a>)</p>
<p>No, it's really addressing a slightly different question.<br>
</p>
<p>Asymptotic linear or quadratic convergence is a property of the algorithm only in the limit as k goes to infinity; it doesn't tell you anything about the number of iterations needed to reach a particular accuracy level. In particular, it can take many iterations to reach the region where the quadratic convergence behavior becomes apparent. If you only want an answer good to 2 digits, then you might never see that asymptotic behavior.<br>
</p>
<p>In practice, we typically run a quadratically or superlinearly convergent algorithm to the limits of our floating point precision. The reason is that we get huge improvements in accuracy in the last few iterations (for a quadratically convergent algorithm like Newton's method, the number of correct digits doubles in each iteration once you've hit the quadratic convergence), so there's no point in stopping early: if 10 iterations gets you 4 digits of accuracy and 12 iterations gets you 16 digits, why stop early?<br>
</p>
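<p>The digit-doubling tail, and the slower approach phase that precedes it, are easy to see numerically. Here is a minimal sketch (my own illustration; the choice of $f(x) = x^{2} - 2$ and the distant starting point are assumptions, not from the answer above):</p>

```python
import math

# Newton's method on f(x) = x^2 - 2, whose positive root is sqrt(2).
# The Newton step simplifies to x - (x^2 - 2)/(2x) = x/2 + 1/x, so from a
# distant start the early iterations only roughly halve the error; once
# the iterate is near the root, the error squares at each step and the
# number of correct digits doubles per iteration.
x = 1000.0
errors = []
for k in range(15):
    x = x / 2.0 + 1.0 / x  # Newton step for f(x) = x^2 - 2
    errors.append(abs(x - math.sqrt(2.0)))

for k, e in enumerate(errors, start=1):
    print(f"iter {k:2d}: error = {e:.2e}")
```

<p>Running this, the early error ratios hover around 1/2 (linear-looking), and only in the last few iterations does the error collapse from roughly $10^{-2}$ to machine precision, which is why stopping such a method early buys almost nothing.</p>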
<p>By contrast, the first-order methods that have attracted so much interest lately are no better than linearly convergent, and in practice they are stopped as soon as a sufficiently accurate answer is obtained, perhaps even before the asymptotic behavior of the algorithm has become apparent.<br>
</p>
<p>In this situation, you're most interested in how many iterations it will take to get to that desired accuracy, and you don't really care about the asymptotic convergence behavior (remember, it's linear anyway for these first-order methods). If you can get an epsilon approximate solution in $O(1/\epsilon)$ iterations using an accelerated scheme instead of $O(1/\epsilon^{2})$ using (projected) subgradient descent, then you're obviously much better off. <br>
</p>
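<p>To make the iteration counts concrete, here is a small sketch (my own construction, using the classic diminishing step size) of the subgradient method on the nonsmooth convex function $f(x) = |x|$. The best function value found improves only like $O(1/\sqrt{k})$, so reaching accuracy $\epsilon$ costs on the order of $1/\epsilon^{2}$ steps:</p>

```python
def subgradient_best(x0, iters):
    """Subgradient method on f(x) = |x| with step t_k = 1/sqrt(k+1).

    Returns the best (smallest) function value seen along the way, which
    is the quantity the O(1/sqrt(k)) convergence guarantee bounds.
    """
    x = x0
    best = abs(x0)
    for k in range(iters):
        g = 1.0 if x > 0 else (-1.0 if x < 0 else 0.0)  # a subgradient of |x|
        x -= g / (k + 1) ** 0.5  # diminishing step size
        best = min(best, abs(x))
    return best

# The best value shrinks only like 1/sqrt(k): 100x more iterations buys
# roughly one extra digit of accuracy.
print(subgradient_best(0.7, 100))
print(subgradient_best(0.7, 10_000))
```

<p>Compare the slow improvement here with the Newton behavior above: this is exactly the regime where an accelerated $O(1/\epsilon)$ scheme pays off.</p>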
<p>A particular issue with the Nesterov 2005 scheme for smoothing nonsmooth convex problems is that you have to pick a smoothing parameter so as to control the Lipschitz constant of the gradient of the smoothed problem. Nesterov's scheme takes $O(\sqrt{L/\epsilon})$ iterations, and you can set up the smoothing so that $L = O(1/\epsilon)$, so you end up with an $O(1/\epsilon)$-iteration algorithm for finding an $\epsilon$ approximate solution. Thus the algorithm depends in a fundamental way on the desired accuracy of the solution. <br>
</p>
<p><em>Brian Borchers, Tue, 26 Apr 2011 02:47:15 -0400</em></p>
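<p>As a concrete (hypothetical) instance of this smoothing trade-off, the Huber function is a standard smooth approximation to $|x|$: it differs from $|x|$ by at most $\mu/2$, and its gradient is Lipschitz with constant $L = 1/\mu$, so driving the smoothing error down forces $L$ up. This is an illustration of the idea only, not Nesterov's exact scheme. A minimal sketch:</p>

```python
# Huber smoothing of the nonsmooth f(x) = |x|:
#
#   f_mu(x) = x^2 / (2*mu)   if |x| <= mu
#           = |x| - mu/2     otherwise
#
# Properties: 0 <= |x| - f_mu(x) <= mu/2 for all x, and f_mu' is
# Lipschitz with constant L = 1/mu, so tying mu to the target accuracy
# ties the Lipschitz constant to that accuracy as well.
def huber(x, mu):
    return x * x / (2.0 * mu) if abs(x) <= mu else abs(x) - mu / 2.0

def huber_grad(x, mu):
    return x / mu if abs(x) <= mu else (1.0 if x > 0 else -1.0)

mu = 1e-2  # smoothing parameter, chosen proportional to the accuracy target
print(huber(5.0, mu))         # close to |5.0| = 5.0 (within mu/2)
print(huber_grad(0.005, mu))  # gradient interpolates between -1 and 1 near 0
```

<p>The design choice to notice: halving the acceptable smoothing error $\mu/2$ doubles $L = 1/\mu$, which is exactly why the overall iteration count of the smoothed scheme depends on the desired accuracy.</p>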