Hello everyone, I am evaluating two sampling procedures for the multi-armed bandit problem. I was able to plot the allocation patterns for each of them for an artificial case where I know in advance the distributions of the arms; hence, I know which arm has the best mean. The allocation pattern is the number of samples allocated to each arm as a function of time, averaged over many replications. For each allocation pattern, I obtained the corresponding probability of correct selection curve (PCS), again averaged over many replications. Studying the PCS convergence rates, method 1 has a fast rate at the beginning, but slows down afterwards. This is in comparison to method 2, which has a slower rate at the beginning, but speeds up afterwards eventually achieving a PCS = 1 much faster than method 1. I hypothesized that method 1 allocates more samples than "it should" to the top arms at the beginning, and neglects the others, leading to poor estimates of the inferior arms' parameters, and eventually forcing method 1 to go back and sample the inferior ones more. I did so of course after looking at the allocation patterns of methods 1 and 2, where it is clear that method 2 "invests" more samples in the inferior arms at the beginning, leading to dropping them a bit later, but when it does, it does not go back to them. All of this is based on observed performance, and is just a hypothesis as we do not know what is the "optimal" allocation pattern, or at which point should we stop exploring inferior arms, and exploit the good one(s). Base on this, I have two questions: - Can we create an artificial scenario where we know what the optimal allocation pattern is so we can compare to it?
- Is it possible to fit a regression model of the PCS vs. the number of samples each arm gets at each point in time? This might be way off as the independent variables here are constrained to have a constant sum at each point in time, and are highly correlated.
Please let me know if you have any thoughts, or if you like to see the results to better understand what is happening. Regards,
asked
anahana |