It seems that one way to minimize a convex function of one variable would be to start at a point x where the derivative is negative and then check a second point x'. If the derivative has changed sign at x', we choose the midpoint of x and x' and continue. However, I've never seen such a method used anywhere. Does it perform poorly in practice compared to ordinary gradient descent? asked 26 Oct '11, 15:41 Opt 
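The scheme described above can be sketched in a few lines (an illustrative sketch, not production code; the function name is made up, and it assumes the derivative is available and changes sign on the starting interval):

```python
def minimize_by_derivative_bisection(df, lo, hi, tol=1e-8):
    """Minimize a convex function on [lo, hi], given its derivative df,
    assuming df(lo) < 0 < df(hi)."""
    assert df(lo) < 0 < df(hi)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if df(mid) < 0:
            lo = mid   # minimizer lies to the right of mid
        else:
            hi = mid   # minimizer lies to the left of mid
    return (lo + hi) / 2.0

# Example: f(x) = (x - 3)**2 has derivative 2*(x - 3); minimizer is x = 3
x_star = minimize_by_derivative_bisection(lambda x: 2 * (x - 3), 0.0, 10.0)
```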
Bisection search is certainly taught as a "standard" method of line search (along with golden section search and some others that I'm forgetting at the moment). I don't know what's popular with vendors of NLP solvers. I used to favor golden section when I wrote my own code because (a) derivative computations are sometimes expensive and (b) not every convex function is differentiable; but that's just a personal preference. answered 26 Oct '11, 18:12 Paul Rubin ♦♦ Thanks for the answer. Are there multivariate generalizations of the bisection and/or golden section methods?
(26 Oct '11, 18:28)
Opt
Not that I know of. The closest I can come are some "hillclimbing" methods, one of which is due to Nelder and Mead and (rather confusingly) called the "simplex" algorithm. (Search using the authors' names to avoid many, many irrelevant results.)
(26 Oct '11, 18:43)
Paul Rubin ♦♦
Perhaps it's a matter of naming conventions, but some references, such as Bazaraa et al.'s Nonlinear Programming, refer to the derivative-based counterpart of binary search as the bisection search method.
(27 Oct '11, 11:55)
Ehsan ♦

Bisection search usually applies to the problem of finding a root: start with a point where the function is positive and a point where it is negative, take the midpoint, evaluate the function there, and replace the endpoint whose value has the same sign. If the function is continuous, you will eventually converge on a root.

One could apply bisection search to the derivative of the function of interest, which is essentially what Opt proposes, but I think other methods are generally considered superior. (I'd have to go back to an NLP text to recover the convergence rate proofs, but they are in any standard text.) The usual line searches for differentiable functions are based on Newton's method (which has quadratic local convergence), modified to guarantee global convergence. Paul's suggestion is relevant for continuous functions that may not be differentiable. IIRC, golden section search minimizes the number of function evaluations compared to other interval search techniques like the midpoint method. (I haven't taught NLP in a little while, so take this all with a grain of salt.)

answered 26 Oct '11, 22:06 Matthew Salt... ♦

Once you've restricted the search to a finite interval, bisection reduces the interval by 50% per iteration, compared to roughly 40% with golden section.
(26 Oct '11, 22:23)
Paul Rubin ♦♦
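As a quick check of the two reduction rates mentioned above (an illustrative aside, not from the thread): bisection keeps 1/2 of the interval each iteration, while golden section keeps 1/phi ≈ 0.618, i.e. it discards about 38.2% ("roughly 40%"):

```python
import math

phi = (1 + 5 ** 0.5) / 2
bisection_keep = 0.5
golden_keep = 1 / phi          # ≈ 0.618, so ≈ 38.2% discarded per iteration

# Iterations needed to shrink the bracketing interval by a factor of 10**6:
n_bisect = math.ceil(math.log(1e-6) / math.log(bisection_keep))
n_golden = math.ceil(math.log(1e-6) / math.log(golden_keep))
```

Both rates are linear; the difference is a constant factor in iteration count (here 20 vs. 29 iterations), which golden section compensates for by doing fewer function evaluations per iteration.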
So both converge linearly. But golden section reuses some function evaluations that cannot be reused in bisection, so the overall number of function evaluations is smaller. That matters when function evaluations are costly. (I guess I really should look this up...)
(26 Oct '11, 22:32)
Matthew Salt... ♦
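The evaluation reuse mentioned above can be sketched as follows (an illustrative sketch; the function name is made up). After the first iteration, each step costs only one new call to f, because one of the two interior points carries over:

```python
def golden_section_min(f, a, b, tol=1e-8):
    """Minimize a unimodal f on [a, b], reusing one interior evaluation
    per iteration."""
    invphi = (5 ** 0.5 - 1) / 2          # 1/phi ≈ 0.618
    c = b - invphi * (b - a)             # left interior point
    d = a + invphi * (b - a)             # right interior point
    fc, fd = f(c), f(d)                  # two evaluations to start
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc          # shrink right; reuse f(c) as new f(d)
            c = b - invphi * (b - a)
            fc = f(c)                    # the only new evaluation this step
        else:
            a, c, fc = c, d, fd          # shrink left; reuse f(d) as new f(c)
            d = a + invphi * (b - a)
            fd = f(d)
    return (a + b) / 2.0

x_star = golden_section_min(lambda x: (x - 3) ** 2, 0.0, 10.0)
```

Plain bisection on the derivative cannot reuse its midpoint evaluation the same way, since the midpoint of the new interval is a fresh point every iteration.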
The required numbers of observations for the binary search and golden section methods are given on page 355 of Bazaraa et al.'s Nonlinear Programming: Theory and Algorithms, 3rd edition.
(27 Oct '11, 11:57)
Ehsan ♦
