COMPSTAT 2018: Start Registration
View Submission - COMPSTAT2018
Title: A note on the computational complexity of the sample variance over an interval-valued dataset Authors:  Ondrej Sokol - University of Economics, Prague (Czech Republic) [presenting]
Miroslav Rada - University of Economics, Prague (Czech Republic)
Abstract: Assume an interval-valued dataset. The exact values of data are not known; however, the intervals which these values belong to are observable. This is a common situation when we work with censored, categorized or rounded data. A similar problem also occurs when we work with measurement error or predicted values. In these situations, the computation of the identification region of even basic statistics can be computationally expensive. We focus on the problem of computing the maximal sample variance over interval data, which can be expressed maximization of a convex function over a convex set. The problem is known to be NP-hard. However, various polynomial-time algorithms were constructed for special cases. One of the most interesting is an algorithm usable in most of common cases. It works in polynomial time in $k$, where $k$ is the maximal number of \textit{narrowed} intervals intersecting at one point; narrowed means that the intervals are shrinked proportionally to the size $n$ of the dataset, namely to $1/n$ of their original width. Considering randomly generated datasets, our experiments allowed for conjecturing that $k$ is at most of logarithmic size for a reasonable choice of the data-generating process. This implies that the algorithm works in polynomial-time in average. We discuss the computational complexity from the view of randomly generated data and its relation to other problems in statistics such as so called \emph{birthday problem}.