The Computational Complexity of Densest Region Detection.
Abstract.
We investigate the computational complexity of the task of detecting
dense regions of an unknown distribution from un-labeled samples of
this distribution. We introduce a formal learning model for this task
that uses a hypothesis class as its 'anti overfitting' mechanism.
The learning task in our model can be reduced to a combinatorial
optimization problem. We can show that for some constants, depending
on the hypothesis class, these problems are NP hard to approximate to
within these constant factors.
We go on and introduce a new criterion for the success of approximate
optimization geometric problems. The new criterion requires that the
algorithm competes with hypotheses only on the points that are
separated by some margin /mu from their boundaries.
Quite surprisingly, we discover that for each of the two hypothesis
classes that we investigate, there is a 'critical value' of the
margin parameter /mu. For any value below the critical value the
problems are NP hard to approximate, while, once this value is
exceeded, the problems become poly-time solvable.