# Trouble in understanding a simple example of No Free Lunch theorem

 3 2 I have trouble in understanding a simple example following No Free Lunch theorem in James Spall's Introduction to stochastic search and optimization: My understanding is that a cost function is a mapping from a finite set ({ heta_i}) to a finite set ({L_j}). A search algorithm, when applied to a cost function, is to search for a ( heta_i) such that the value of the cost function at it is the least. If there are (N_ heta) possible values in ({ heta_i}) and (N_L) possible values in ({L_j}), then by direct enumeration there are ((N_{L})^{N_ heta}) possible mappings from ({ heta_i}) to ({L_j}). The NFL theorems indicate that: When averaging over all ((N_{L})^{N_ heta}) possible mappings from from ({ heta_i}) to ({L_j}), all algorithms work the same (i.e., none can work better than a blind random search). Example 1.7 — NFL implications in a small problem. Suppose that (Theta = { heta_i, heta_2, heta_3}) and that there are two possible outcomes for the noise-free loss measurements, ({L_1, L_2}). Hence, ((N_L)^{N_ heta} = 2^3 = 8). Table 1.1 summarizes the eight possible mappings. Note that all rows in Table 1.1 have the same number of (L_1) and (L_2) values. Hence, the mean loss value across all eight possible problems is the same regardless of which ( heta_i) is chosen. Because the algorithm chooses the ( heta_i), all algorithms produce the same mean loss value when averaging across all possible problems. As a specific instance of the implication of the NFL theorems, suppose that (L_1 < L_2). Then, an algorithm that puts priority on picking ( heta) will work well on problems 1, 2, 3, and 7, but poorly on problems 4, 5, 6, and 8. The average performance is the same as an algorithm that puts priority on picking ( heta_2) or ( heta_3). My questions are: Does the example assume that an algorithm will always output the same element in (Theta), regardless of which of the eight loss function it is applied to? (This is the only way that I can make sense out of the example. Otherwise, how shall I understand it?) If yes, isn't it contrary to what an algorithm is in reality? For example, for minimizing different cost functions, the gradient descent algorithm outputs different solutions instead of a fixed solution. If no, I can construct an algorithm that outputs ( heta_1) for cost functions 1, 2, 3, 7 and 8, ( heta_2) for cost functions 4 and 5, and ( heta_3) for cost function 6. This algorithm outputs the best solution for each cost function, so when considering its average performance over all cost functions, it is obviously better than many other algorithms. This violates NFL theorem. What is going wrong? What exactly is a search algorithm? Is it a mapping from some set to another set? Thanks and regards! asked 11 Feb '12, 17:44 Timo 59●2●5 accept rate: 0% fbahr ♦ 4.6k●7●16 Is there some LaTex support on this site? (11 Feb '12, 17:46) Timo 2 Yes, but (a) you need to use ( and ) rather than \$ to terminate inline math formulas and (b) you need to double all backslashes to escape them (including the ones I just wrote as singles). (11 Feb '12, 20:57) Paul Rubin ♦♦ @Paul: Thanks! I don't quite understand. Can someone do the edit on my post, so that I can see how? (11 Feb '12, 21:18) Timo Looks like Florian did it (danke sehr). (12 Feb '12, 10:31) Paul Rubin ♦♦

 5 I believe that this book page wants to teach a lesson you can fully appreciate only with a little more experience (that you soon will have!). A problem can be thought of as a collection of instances, these are represented by the above mappings. The instances are concrete instantiations of your problem, like a particular graph with particular edge weights is one particular instance of the traveling salesperson problem. The NFL theorems now say that no algorithm can have a competitive advantage on all instances (somehow assuming that the (search) problem is hard to solve). This may seem to be counterintuitive and not in line with our experience when solving problems. However, remember, that in practice we do not see all instances, but only practical instances and some algorithms perform remarkably well on them, but there is no guarantee that they do. After all, also branch-and-bound is not guaranteed to be principally better/faster than complexte enumeration on the search problem of integer programming. This is one implication of the problem being NP hard. A lesson to learn from this (and this is also indicated on the book page you refer to) is that in practice you better design your algorithm to concentrate on particular instances, to exploit problem stucture you learned from intensively studying the problem. This way, your algorithm may be very poor on average instances but you don't care about average instances, you care about those you encounter in practice. edit: answer to question 1 is "no". The algorithmic construction you propose is not allowed here. You propose: run "the best" algorithm depending on the problem instance but for this to work you need to know the instance and have a catalog of which is the "best" algorithm to apply. even though it is debatable whether this is an algorithm (it is), it is not an algorithm the author has in mind in this book, I guess. answered 12 Feb '12, 06:25 Marco Luebbecke ♦ 3.4k●1●6●15 accept rate: 16% Thanks, Marco! I haven't understood and accepted the example of the book yet. My questions are the last part of my post, numbered in 1 and 2. I appreciate it if you could take a look at them. (12 Feb '12, 08:12) Timo Thanks for the edit! Why are "algorithms" in the context of NFL theorem (not just this book and this author) so restricted? I guess it tries to model "something" in reality (what is that thing, I don't know)? But, for example, in that standard, a gradient descent algorithm is not an "algorithm", but maybe is it a collection of many "algorithms"? (12 Feb '12, 10:05) Timo A deterministic algorithm always gives the same output when provided with the same input, in a finite number of steps. Your "gradient descent algorithm" is an algorithm because it always computes what you want it to compute: a solution of a given quality for a given function. it does not matter whether there are many such solutions possible or not; you wanted your algorithm to compute just one. if you wanted an algorithm that computes all such solutions, it would be a different algorithm (but still an algorithm, BTW). (12 Feb '12, 10:22) Marco Luebbecke ♦ Thanks! (1) As in the questions in my original post, it seems to me that NFL assumes an algorithm will always output the same solutions for all cost functions, does it? But gradient descent algorithm will output differently for different cost functions. That is where my confusion comes from. (12 Feb '12, 10:45) Timo (2) "A deterministic algorithm always gives the same output when provided with the same input", what is the input of an algorithm? I thought that an input to an algorithm is a cost function. But if an algorithm will output the same for all cost functions, we can drop cost function off from its input, can we? Also, besides a cost function, what else are inputs to an algorithm? (12 Feb '12, 11:01) Timo Think of a GPS in a car, this is the algorithm. When you enter the input "from your home address to your work address" you will get the same route suggestion (the output) every time you try it. But this does not imply that I get your route when I enter my addresses (a different input). Of course, for my addresses I will always get the same route (my output for my input). And if you enter my addresses you will get my route, not yours. You get it? (12 Feb '12, 11:41) Marco Luebbecke ♦ showing 5 of 6 show 1 more comments
 1 I'm not overly familiar with the No Free Lunch theorem, but from reading on Wikipedia it seems ridiculous to me. Empirical evidence would seem to suggest that some algorithms are simply better than others - perhaps not on every problem instance, but definitely on most. This theorem seems to me like the Efficient Market Hypothesis from economics. Theorists come up with something that sounds good on paper, but is clearly at odds with what happens in the real world. answered 11 Feb '12, 19:03 DC Woods ♦ 4.1k●2●25●46 accept rate: 5% This theorem is not ridiculous. It is just that many of us haven't understood it correctly. I don't rule out the possiblity the example in the book I mentioned may be wrong. (11 Feb '12, 19:49) Timo Oh, it's very likely that I haven't understood it correctly. At least I hope so. (11 Feb '12, 21:34) DC Woods ♦ @DC actually, I believe that the problem instances we see in practice is only a tiny subset of all (conceivable, no matter how ridiculous) instances. I always think of "practice is much nicer to us than theory predicts", and my personal explanation for that is: practice is made by humans, and theory is much too complicated for humans, so there is no practical instance that matches theory's complexity ;-) (12 Feb '12, 06:40) Marco Luebbecke ♦
 toggle preview community wiki

By Email:

Markdown Basics

• *italic* or _italic_
• **bold** or __bold__
• image?![alt text](/path/img.jpg "Title")
• numbered list: 1. Foo 2. Bar
• to add a line break simply add two spaces to where you would like the new line to be.
• basic HTML tags are also supported

Tags:

×190
×29
×5