Consider a stochastic gradient iteration $\theta_{k+1} = \theta_k - \gamma_k F(\theta_k)$, where $F(\theta_k)$ is a noisy estimate of the gradient $\nabla f(\theta_k)$. A book states that this iteration converges in the following sense: $f(\theta_k)$ converges and $\nabla f(\theta_k)$ converges to zero, and it then claims that this is the strongest possible result for gradient-related stochastic approximation. What does this mean? Why does it not show convergence of the iterates $\theta_k$ themselves?
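To make the setup concrete, here is a minimal sketch of the iteration (my own toy example, not from the book): minimizing $f(\theta) = \theta^2/2$, whose gradient is $\nabla f(\theta) = \theta$, with additive Gaussian noise on the gradient and Robbins–Monro step sizes $\gamma_k = 1/k$ (so $\sum_k \gamma_k = \infty$ and $\sum_k \gamma_k^2 < \infty$):

```python
import random

def grad_f(theta):
    # true gradient of f(theta) = theta**2 / 2
    return theta

random.seed(0)
theta = 5.0
for k in range(1, 10001):
    gamma = 1.0 / k                     # step sizes: sum diverges, sum of squares converges
    noise = random.gauss(0.0, 1.0)      # zero-mean gradient noise
    F = grad_f(theta) + noise           # noisy gradient estimate F(theta_k)
    theta -= gamma * F                  # stochastic gradient step

# After many steps the gradient norm is small, illustrating
# grad f(theta_k) -> 0 despite the noise in each update.
print(abs(grad_f(theta)))
```

In this strongly convex toy case the iterates themselves do converge; the book's point, as I understand the question, is that for a general nonconvex $f$ only the weaker guarantee on $f(\theta_k)$ and $\nabla f(\theta_k)$ holds.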
asked by sosha