Some optimization book says that the steepest descent algorithm has a linear rate of convergence, and writes \(|x_{n+1} - L| = O(|x_n - L|)\). I was wondering whether the use of big O notation here broadens the meaning of linear rate of convergence? From the perspective of real analysis, we have from Wikipedia about the big O notation: \(x_n = O(y_n)\) means there exist a constant \(M > 0\) and an index \(N\) such that \(|x_n| \le M |y_n|\) for all \(n \ge N\).
Also from Wikipedia about rates of convergence of a real sequence: suppose \(x_n \to L\) and \(\lim_{n \to \infty} |x_{n+1} - L| / |x_n - L| = \mu\); the convergence is called linear if \(\mu \in (0, 1)\), superlinear if \(\mu = 0\), and sublinear if \(\mu = 1\).
My understanding is that if \(\{x_n\}\) converges to \(L\) linearly, superlinearly, or sublinearly, then \(|x_{n+1} - L| = O(|x_n - L|)\). This follows from the definitions, viewing \(\{|x_{n+1} - L|\}\) and \(\{|x_n - L|\}\) as functions of \(n\). Note that the part after "then" need not imply the part after "if", since \(\mu\) may lie outside \([0, 1]\) and \(\{x_n\}\) may not converge. For more detail, please see this thread. So I suspect the big O notation used in the book mentioned at the beginning does not mean the same as linear rate of convergence, but broadens it. Or maybe the optimization community attaches a different meaning to the same notation and I am not aware of it? Thanks and regards!

asked 16 Feb '12, 20:36 Timo
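To make that claim concrete, here is a small Python sketch (my own illustration, not from the book or the thread): it computes the error ratios \(|x_{n+1} - L| / |x_n - L|\) for a linearly, a sublinearly, and a superlinearly convergent example sequence with limit \(L = 0\). In each case the ratios stay bounded, which is exactly the big O statement.

```python
# Illustration with made-up example sequences, all converging to L = 0:
# the error ratio |x_{n+1} - L| / |x_n - L| stays bounded in every case,
# i.e. |x_{n+1} - L| = O(|x_n - L|).

def error_ratios(seq, L, n_terms=10):
    errs = [abs(x - L) for x in seq[:n_terms]]
    return [errs[i + 1] / errs[i] for i in range(len(errs) - 1)]

linear    = [0.5 ** n for n in range(12)]        # ratio = 0.5:       linear
sublinear = [1.0 / (n + 1) for n in range(12)]   # ratio -> 1:        sublinear
superlin  = [0.5 ** (n * n) for n in range(12)]  # ratio -> 0:        superlinear

for name, seq in [("linear", linear), ("sublinear", sublinear), ("superlinear", superlin)]:
    print(name, max(error_ratios(seq, L=0.0)))  # bounded above in every case
```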

Having seen @Brian's comment I'd better edit my answer.
Thanks, Marco! Do you mean that the big O notation has a different meaning in the optimization community than in pure mathematics? In real analysis, the big O notation does not imply convergence, and even when the sequence converges, the big O notation does not require the rate \(\mu\) to be in \((0, 1)\), but allows it to be any finite positive number. For how people in pure mathematics use the big O notation, please see http://math.stackexchange.com/questions/109686/rate-of-convergence-of-a-sequence-in-mathbb-r-and-big-o-notation. Thanks!
To your edit: Regarding "linear convergence implies sublinear convergence", according to Wikipedia (the third link and quote), linear convergence requires \(\mu\) to be in \((0, 1)\), and sublinear convergence requires \(\mu\) to equal 1, so I think the two cases do not overlap, and neither implies the other.
Many authors use the convention that if \(\mu < 1\), the sequence converges linearly, and if \(\mu = 0\) the convergence is superlinear. With that definition, any sequence that converges superlinearly also converges linearly.
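A quick Python sketch of that convention (my own illustration, using a made-up sequence): a superlinearly convergent sequence has error ratios tending to 0, so the ratios are eventually below any threshold \(\mu < 1\), and the sequence also meets the relaxed definition of linear convergence.

```python
# Hypothetical sequence, not from the discussion above:
# errs[n] = 0.5**(n*n) converges to 0 superlinearly, since the error ratio
# errs[n+1] / errs[n] = 0.5**(2n+1) tends to 0; in particular the ratios
# drop below any mu < 1, so the sequence also converges linearly under the
# relaxed convention.

errs = [0.5 ** (n * n) for n in range(8)]
ratios = [errs[i + 1] / errs[i] for i in range(len(errs) - 1)]
print(ratios)  # 0.5, 0.125, 0.03125, ... decreasing toward 0
```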
It is certainly true that if the sequence converges linearly, then
\[ |x_{n+1} - L| = O(|x_{n} - L|) \]
However, the converse is not true.
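A concrete counterexample (my own sketch, not from the answer above): \(x_n = 1/n\) converges to 0 and satisfies \(|x_{n+1} - 0| = O(|x_n - 0|)\) with constant 1, yet the error ratio tends to 1, so under the strict definition (\(\mu \in (0, 1)\)) the convergence is sublinear, not linear.

```python
# x_n = 1/n: the error ratio |x_{n+1}| / |x_n| = n / (n + 1) is bounded by 1,
# so |x_{n+1} - 0| = O(|x_n - 0|) holds, but the ratio tends to 1 (mu = 1),
# which is sublinear convergence, not linear.

ratios = [n / (n + 1) for n in range(1, 1000)]
print(max(ratios) < 1.0)  # True: the big-O bound holds with constant 1
print(ratios[-1])         # 0.999: the ratio approaches mu = 1
```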
The particular quote you've linked to isn't incorrect on this point, since the author isn't making the converse statement.
@Brian: Thanks! (1) Regarding the quote from the book, I think it is imprecise in the sense that the big O notation can cover linear, superlinear, and sublinear convergence as defined in the Wikipedia article. (2) As you mentioned, "many authors use the convention that if \(\mu < 1\), the sequence converges linearly, and if \(\mu = 0\) the convergence is superlinear"; I wonder whether that convention appears in some references? Wikipedia cites numerical computation books as its references.
With respect to (1), you're missing my point. The author never made the converse statement that if \(|x_{n+1} - L| = O(|x_n - L|)\) then the convergence is linear. Rather, the author said that if the sequence is linearly convergent, then \(|x_{n+1} - L| = O(|x_n - L|)\). This is a completely correct statement.
You seem to be having trouble recognizing that "If A then B" is a different statement than "If B then A" or perhaps you're misreading "If A then B" in the book as "If B then A." Is English your native language?
@Brian: No. I believe I didn't miss your point. I knew clearly what you said.