I've got an optimization problem that I'd love a little help with. I'm trying to model a binomial process, where the free parameters are (n_i) the number of trials (unknown), a parameter that determines success probability (p_i) when combined with other data, and the number of successes (k_i). It's actually slightly more complicated, as I have repeated/grouped measures, so I have several observations per free parameter. I've formulated an objective function in R that looks roughly like this:
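The code block that followed "looks roughly like this:" appears to have been lost from the post. As a stand-in, here is a minimal sketch of what a grouped binomial negative log-likelihood objective of the shape described might look like; every name here (`neg_ll`, the logit parameterization, the toy data) is my assumption, not the original code:

```r
# Hypothetical sketch (the original code block is missing from the post):
# a grouped binomial negative log-likelihood with one trial-count and one
# probability parameter per group, and several observations per group.
neg_ll <- function(par, k, group) {
  g <- length(unique(group))
  size <- round(par[1:g])                # per-group trial counts (rounded to integers)
  p    <- plogis(par[(g + 1):(2 * g)])   # per-group success probabilities via logit
  -sum(dbinom(k, size = size[group], prob = p[group], log = TRUE))
}

# Toy data: two groups, three observed success counts each
k     <- c(3, 4, 2, 10, 12, 9)
group <- c(1, 1, 1, 2, 2, 2)
neg_ll(c(5, 15, 0, 0.2), k, group)  # finite, positive value
```

Note how the `size` parameters (order of 5–15 here, potentially much larger) and the logit-scale probability parameters (order of 1) already differ in scale, which is the difficulty described below.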
The question is: what kind of optimizer can/should I use to solve this? I tried a trust-region, derivative-free nonlinear optimizer I found on the R optimization page (minqa), and it seems to scale well on my problem (when I was minimizing SSE instead of maximizing LL), but it doesn't deal well with parameters with radically different ranges, like I have here. I've tried rescaling, but that makes the objective function more complex, and the fact that the group parameters can be a couple of orders of magnitude apart causes problems. Are there other optimizers (available through R) that wouldn't choke on this many parameters, that don't require a derivative function (or should I think about deriving one?), and that don't need the parameters to be similarly scaled? (Or am I drastically misunderstanding something?) Thanks! asked 10 Jun '11, 16:29 Harlan
Separate from the question of concavity, there's apparently an R package neldermead that implements the Nelder-Mead simplex algorithm (which, by the way, has nothing to do with linear programming and Dantzig's simplex algorithm). Nelder-Mead is a hill-climbing technique that requires no derivatives. I don't know if your scaling would cause it problems, but likely not. Hill-climbing methods generally don't win foot races, but I think they're fairly robust. Again, just to be explicit: if you don't have concavity, there's no guarantee how good the final answer is. answered 10 Jun '11, 19:21 Paul Rubin ♦♦
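Nelder-Mead is also available in base R through `optim()` without installing anything; here is a toy sketch (the data and the logit parameterization are made up for illustration, not taken from the question):

```r
# Derivative-free Nelder-Mead via base R's optim(): fit two group-level
# success probabilities by minimizing the negative binomial log-likelihood.
k1 <- c(7, 8, 6)   # successes in group 1, out of n trials each
k2 <- c(2, 3, 1)   # successes in group 2
n  <- 10

nll <- function(theta) {
  p <- plogis(theta)  # map unconstrained parameters to (0, 1)
  -sum(dbinom(k1, size = n, prob = p[1], log = TRUE)) -
    sum(dbinom(k2, size = n, prob = p[2], log = TRUE))
}

fit <- optim(par = c(0, 0), fn = nll, method = "Nelder-Mead")
plogis(fit$par)  # close to the sample proportions c(0.7, 0.2)
```

For the scaling issue specifically, `optim()` also accepts `control = list(parscale = ...)`, which rescales parameters internally without complicating the objective function itself.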
The terminology confuses me a bit: I assume that what you're calling "parameters" are actually the variables in the optimization. Have you checked whether the LL function is actually concave in the "parameters" (so that locally optimal implies globally optimal)? If not, you'll need a global solver (unless you're content with a local optimum). answered 10 Jun '11, 17:03 Paul Rubin ♦♦ Yeah, I come from an ML background and tend to be loose with terminology. No, I haven't checked for concavity; not totally sure how to do that. Can you suggest a reference with a walkthrough of that process?
(10 Jun '11, 18:02)
Harlan
The bit about rounding the expression for size creates a discontinuity that rules out concavity (and also probably creates headaches for any derivative-based algorithm). If you drop the rounding for the moment and let "size" be non-integer, then you can try to compute the Hessian (second-derivative matrix) of the log-likelihood function. I think this would require using gamma functions in lieu of factorials. For concavity you need the Hessian to be negative semidefinite (equivalently, the Hessian of the negative log-likelihood to be positive semidefinite) for all values of the "parameters" within the feasible region. Good luck with the calculus.
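To make this concrete, here is a sketch of my own (not from the thread) of the continuous relaxation via `lgamma()`, plus a base-R finite-difference Hessian check at a single point. Checking one point can only refute concavity, never prove it, but that already turns out to be informative here:

```r
# Continuous relaxation of the binomial log-likelihood: lgamma() replaces
# the factorials so that "size" need not be an integer, giving a smooth
# function we can differentiate.
ll <- function(size, p, k) {
  lgamma(size + 1) - lgamma(k + 1) - lgamma(size - k + 1) +
    k * log(p) + (size - k) * log(1 - p)
}

# Numerical Hessian via central differences (base R only). For concavity,
# all eigenvalues must be <= 0 everywhere in the feasible region.
num_hessian <- function(f, x, h = 1e-4) {
  d <- length(x)
  H <- matrix(0, d, d)
  for (i in 1:d) for (j in 1:d) {
    e_i <- e_j <- rep(0, d)
    e_i[i] <- h; e_j[j] <- h
    H[i, j] <- (f(x + e_i + e_j) - f(x + e_i - e_j) -
                f(x - e_i + e_j) + f(x - e_i - e_j)) / (4 * h^2)
  }
  H
}

f  <- function(x) ll(size = x[1], p = x[2], k = 4)  # one observation, k = 4
ev <- eigen(num_hessian(f, c(10, 0.4)))$values
ev  # mixed signs at this point: the relaxed LL is not jointly concave in (size, p)
```

So even without the rounding, a derivative-free local method like Nelder-Mead or bobyqa carries no global-optimality guarantee for this likelihood.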
(10 Jun '11, 19:15)
Paul Rubin ♦♦