Title: Algebraic-based sampling via permutations
Authors: Francesca Romana Crucinio - University of Warwick (United Kingdom)
Roberto Fontana - Politecnico di Torino (Italy) [presenting]
Abstract: Consider two samples of sizes $n_1$ and $n_2$ drawn from a non-negative discrete exponential family. There exists a uniformly most powerful unbiased procedure for testing whether the two samples come from the same distribution; it is performed conditionally on the sum $t$ of all their entries. The resulting sample space is the fiber $F$ of vectors of length $n_1 + n_2$ with non-negative entries adding up to $t$. Vectors can be sampled from $F$ by a Markov chain Monte Carlo (MCMC) procedure that uses a Markov basis to connect all the elements of $F$. The fiber can also be partitioned into orbits of permutations, and the conditional distribution of a vector $y$ given its orbit is uniform. As a consequence, it is possible to first sample orbits contained in $F$ via MCMC with an appropriate Markov basis, and then sample $y$ within its orbit via standard Monte Carlo. These two sampling procedures lead to two well-known estimators: the indicator function and the permutation cumulative distribution function. Both estimators are unbiased, but the one provided by the permutation approach has lower variance and lower mean absolute deviation. A simulation study confirms these theoretical results and shows that the permutation approach also reaches convergence in the least time.
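The Python sketch below illustrates the two estimators on a toy case. It is not the authors' implementation, and several choices are assumptions made only for illustration. The entries are taken to be i.i.d. Poisson under the null hypothesis, so that given $t$ the vector $y$ is multinomial with equal cell probabilities $1/(n_1+n_2)$. The test statistic is taken to be the total of the first sample, which the abstract does not fix. The chain uses the moves $e_i - e_j$, which form a Markov basis for this fiber. All function names are hypothetical. Two simplifications are deliberate. First, orbits are obtained by sorting the states of the fiber chain instead of running a separate chain on the orbit space. Second, the permutation CDF of each orbit is computed exactly by enumerating its elements rather than by Monte Carlo, which is only feasible for very small $n_1 + n_2$. `exact_moments` enumerates the whole fiber to compare the single-draw variance and mean absolute deviation of the two estimators.

```python
import random
from fractions import Fraction
from functools import lru_cache
from itertools import permutations
from math import factorial, prod

# Illustrative sketch: all names are hypothetical; entries are assumed i.i.d.
# Poisson under H0, and T(y) = sum of the first sample is an assumed statistic.


def fiber(n, t):
    """Yield every vector of n non-negative integers summing to t."""
    if n == 1:
        yield (t,)
        return
    for first in range(t + 1):
        for rest in fiber(n - 1, t - first):
            yield (first,) + rest


def prob_given_t(y):
    """P(y | sum = t) under the Poisson assumption: multinomial, equal cells."""
    n, t = len(y), sum(y)
    return Fraction(factorial(t), prod(factorial(v) for v in y) * n**t)


def stat(y, n1):
    """Assumed test statistic: total of the first sample."""
    return sum(y[:n1])


@lru_cache(maxsize=None)
def perm_cdf(orbit, n1, t_obs):
    """Share of the orbit's distinct vectors with T >= t_obs (y is uniform on it)."""
    members = set(permutations(orbit))
    hits = sum(1 for z in members if stat(z, n1) >= t_obs)
    return Fraction(hits, len(members))


def exact_moments(n1, n2, y_obs):
    """Exact p-value, then variance and mean absolute deviation of a single
    draw of the indicator and of the permutation-CDF estimator."""
    n, t, t_obs = n1 + n2, sum(y_obs), stat(y_obs, n1)
    orbit_prob = {}
    for y in fiber(n, t):
        key = tuple(sorted(y, reverse=True))  # canonical orbit representative
        orbit_prob[key] = orbit_prob.get(key, 0) + prob_given_t(y)
    g = {o: perm_cdf(o, n1, t_obs) for o in orbit_prob}
    p = sum(w * g[o] for o, w in orbit_prob.items())
    var_ind, mad_ind = p * (1 - p), 2 * p * (1 - p)  # Bernoulli(p) indicator
    var_perm = sum(w * (g[o] - p) ** 2 for o, w in orbit_prob.items())
    mad_perm = sum(w * abs(g[o] - p) for o, w in orbit_prob.items())
    return p, var_ind, var_perm, mad_ind, mad_perm


def mcmc_estimates(n1, n2, y_obs, steps, seed=0):
    """Metropolis chain on the fiber using the Markov basis {e_i - e_j};
    returns (indicator estimate, permutation-CDF estimate) of the p-value."""
    rng = random.Random(seed)
    n, t_obs = n1 + n2, stat(y_obs, n1)
    y = list(y_obs)  # start at the observed point, no burn-in
    ind_total, perm_total = 0, 0.0
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)  # uniform ordered pair, i != j
        # Proposal y + e_i - e_j; Poisson ratio pi(y')/pi(y) = y_j / (y_i + 1).
        if y[j] > 0 and rng.random() < y[j] / (y[i] + 1):
            y[i] += 1
            y[j] -= 1
        ind_total += stat(y, n1) >= t_obs
        orbit = tuple(sorted(y, reverse=True))  # orbit of the current state
        perm_total += float(perm_cdf(orbit, n1, t_obs))
    return ind_total / steps, perm_total / steps


if __name__ == "__main__":
    p, var_ind, var_perm, mad_ind, mad_perm = exact_moments(2, 2, (3, 1, 0, 0))
    print(p, var_ind, var_perm)  # 1/16 15/256 7/768
    print(mad_ind, mad_perm)     # 15/128 21/256
    print(mcmc_estimates(2, 2, (3, 1, 0, 0), steps=50_000))
```

Exact `Fraction` arithmetic is used in the enumeration so that rounding does not blur the comparison; the MCMC part uses floats. On this toy example ($n_1 = n_2 = 2$, $t = 4$), both estimators have expectation $1/16$. A single draw of the permutation-CDF estimator has variance $7/768$, compared with $15/256$ for the indicator, and mean absolute deviation $21/256$, compared with $15/128$.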