For convex functions, the steepest descent method converges linearly, and Newton's method converges quadratically. For example, $\|x^{(k+1)} - x^*\| \le L \|x^{(k)} - x^*\|^2$ for Newton's method implies quadratic convergence. What does it mean when it is claimed that the number of iterations of a certain algorithm is of the order $O(1/\epsilon)$ or $O(1/\epsilon^2)$, where $\epsilon$ is the desired absolute accuracy? Is this statement equivalent to saying something about the convergence rate? asked 25 Apr '11, 19:33 sks 
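As an illustration of the quadratic-convergence inequality in the question, here is a minimal sketch (not from the original thread) using Newton's method on $f(x) = x^2 - 2$; the error is roughly squared at each step, so the number of correct digits doubles once the iterate is near the root:

```python
# Newton's method for f(x) = x^2 - 2, converging to sqrt(2).
# Quadratic convergence: err_{k+1} <= L * err_k^2 near the root.
import math

x = 1.0                      # starting guess
root = math.sqrt(2.0)
errors = []
for k in range(6):
    x = 0.5 * (x + 2.0 / x)  # Newton step for x^2 - 2 = 0
    errors.append(abs(x - root))
print(errors)                # errors shrink roughly quadratically
```

After three or four steps the error drops from about $10^{-1}$ to the limits of double precision, which is the "digits doubling" behavior discussed in the answer below.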
No, it's really addressing a slightly different question. Asymptotic linear or quadratic convergence is a property of the algorithm only in the limit as $k$ goes to infinity; it doesn't tell you anything about the number of iterations needed to reach a particular accuracy level. In particular, it takes a while to get to a point where the quadratic convergence behavior is apparent. If you only want an answer good to 2 digits, then you might not ever see that asymptotic behavior.

In practice, we typically run a quadratically or superlinearly convergent algorithm to the limits of our floating point precision. The reason is that we get huge improvements in accuracy in the last few iterations (for a quadratically convergent algorithm like Newton's method, the number of correct digits doubles in each iteration once you've hit the quadratic convergence regime), so there's no point in stopping early: if 10 iterations get you 4 digits of accuracy and 12 iterations get you 16 digits, then why stop early?

In comparison, the first order methods that have attracted so much interest lately are no better than linearly convergent, and in practice they are stopped as soon as a sufficiently accurate answer is obtained, perhaps even before the asymptotic behavior of the algorithm has become apparent. In this situation, you're most interested in how many iterations it will take to get to the desired accuracy, and you don't really care about the asymptotic convergence behavior (remember, it's linear anyway for these first order methods.) If you can get an $\epsilon$-approximate solution in $O(1/\epsilon)$ iterations using an accelerated scheme instead of $O(1/\epsilon^2)$ using (projected) subgradient descent, then you're obviously much better off.

A particular issue with the Nesterov '05 scheme for smoothing nonsmooth convex problems is that you have to pick a smoothing parameter so as to control the Lipschitz constant of the gradient of the smoothed problem. 
Nesterov's scheme takes $O(\sqrt{L/\epsilon})$ iterations, and you can set up the smoothing so that $L = O(1/\epsilon)$, so you end up with an $O(1/\epsilon)$ iteration algorithm for finding an $\epsilon$-approximate solution. Thus the algorithm depends in a fundamental way on the desired accuracy of the solution. answered 26 Apr '11, 02:47 Brian Borchers I'm not sure why MathJax didn't work correctly in the above answer...
(26 Apr '11, 02:52)
Brian Borchers
Let me also mention that by "$\epsilon$-approximate solution", we mean that the function value at $x^{k}$ is within $\epsilon$ of the optimal function value, not that $x^{k}$ is within $\epsilon$ of an optimal solution $x^{*}$.
(26 Apr '11, 03:07)
Brian Borchers
@Brian You need to use double dollar marks before and after the equations instead of only one in the notation (not sure why)
(26 Apr '11, 05:50)
Buxley
It would be nice if we got mathjax preview, I included a link in another thread hinting a possible solution.
(26 Apr '11, 05:53)
Bo Jensen ♦
I'll check it out when I get a chance; in the meantime, I mathjaxed Brian's comments, so it looks better now.
(26 Apr '11, 08:38)
Michael Trick ♦♦
Thanks a lot!
(26 Apr '11, 14:18)
sks
