|
Abstract.
The notion of embedding a class of dichotomies in a class of linear
half spaces is central to the support vector machines paradigm.
We examine the question of determining the minimal Euclidean dimension
and the maximal margin that can be obtained when the embedded class
has a finite VC dimension.
We show that an overwhelming majority of the family of finite concept
classes of any constant VC dimension cannot be embedded in
low-dimensional half spaces. (In fact, we show that the Euclidean
dimension must be almost as high as the size of the instance space.)
We strengthen this result even further by showing that an overwhelming
majority of the family of finite concept classes of any constant VC
dimension cannot be embedded in half spaces (of arbitrarily high
Euclidean dimension) with a large margin. (In fact, the margin cannot
be substantially larger than the margin achieved by the trivial
embedding.) Furthermore, these bounds are robust in the sense that
allowing each image half space to err on
a small fraction of the instances does not imply a significant
weakening of these dimension and margin bounds.
Our results indicate that any universal learning machine, which
transforms data into the Euclidean space and then applies linear (or
large margin) classification, cannot enjoy any meaningful
generalization guarantees that are based on either VC dimension or
margins considerations. This failure of generalization bounds applies
even to classes for which "straight forward" empirical risk minimization
does enjoy meaningful generalization guarantees.
|