Answers to: Help with importance sampling over HUGE sample spacehttp://www.or-exchange.com/questions/10427/help-with-importance-sampling-over-huge-sample-space<p><del>Apologies for the awful typesetting, but MathJax doesn't appear to be working, as described on the FAQ.</del> [fixed; see comment below -- <a href="/users/25/fbahr"><a href="/users/25/fbahr">@fbahr</a></a>]</p>
<p>My underlying problem is fairly simple, but the sheer size is what is causing issues. I would like to use importance sampling, but am unsure about its implementation. </p>
<p>Problem statement:</p>
<p>We have \( \text{N} \) vectors \( \mathbf{x_i} \in \mathbb{(R^+)}^{k}\) an a \( \leq\) constraint for each dimension of \( \mathbb{(R^+)}^{k}\).</p>
<p>I would like to examine all possible on-trivial subsets \(S\) of the vectors \(\mathbf{x_i}\) (there are $2^N-1$ of these) and determine, for each vector \(\mathbf{x_i}:\;\; i \in \{1...N\}\), the fraction of all subsets that satisfy the following constraints:</p>
<p>Let \(T := \{1...N\}\), \(\Omega := 2^T\setminus \emptyset\), and \(L_i>0 \;\;i \in \{1...N\}\)</p>
<p>Let \( C_i:= S \subset \Omega \) such that:</p>
<p>\( \mathbf{x_i} \in S \)</p>
<p>\( \exists ~j \in \{1...N\}: {\sum\limits_{\mathbf{x_z} \in S} ( \mathbf{x_z} )_j > L_j} \quad \text{and}~ {\sum \limits_{\mathbf{x_z} \in S \setminus \mathbf{x_i}} (\mathbf{x_z})_{j} \leq {L_j}} \)</p>
<p>My goal is to estimate \(\frac{|C_i|}{2^N-1}\) for each \(i \in \{i...N\}\)</p>
<p>Basically, sets that meet these conditions are ones whose vector sum falls outside the "box" defined by the \(L_i\) constraints, yet when the \(i^{th}\) member is removed, it falls inside the box.</p>
<p>I figured that importance sampling will help get an estimate withing a reasonable amount of time, where I preferentially select subset sizes that are more likely to meet the above constraints.</p>
<p>I would like input on how to formulate the importance sampling distribution over the subsets. My initial thought is to fit a discrete beta distribution to the vector of "reciprocal maximal exceedance ratios", which is \( \min \{\frac{L_k}{(x_j)_k},1\} \) calculated for each vector. This will approximate the distribution of required subset sizes to exceed the box constraint.</p>enThu, 09 Jul 2020 03:50:07 -0000