Sample-efficient Strategies for Learning in the Presence of Noise
Abstract.
In this paper we prove various results about PAC learning in the presence of malicious noise.
Our main interest is the sample size behaviour of learning algorithms. We prove the first nontrivial
sample complexity lower bound in this model by showing that on the order of
\epsilon/\Delta^2 + d/\Delta (up to logarithmic factors) examples are necessary
for PAC learning any target class of {0,1}-valued functions of VC dimension d, where
\epsilon is the desired accuracy and \eta = \epsilon/(1+\epsilon) - \Delta is the malicious noise
rate (it is well known that no nontrivial target class can be PAC learned with accuracy \epsilon and
malicious noise rate \eta >= \epsilon/(1+\epsilon), irrespective of sample complexity).
We also show that this result cannot be significantly improved in general by presenting efficient
learning algorithms for the class of all subsets of d elements and the class of unions of at most d
intervals on the real line. This is especially interesting as we can also show that the popular
minimum disagreement strategy needs samples of size d\epsilon/\Delta^2, hence is not
optimal with respect to sample size. We then discuss the use of randomized hypotheses. For these the
bound \epsilon/(1+\epsilon) on the noise rate is no longer true and is replaced by
2\epsilon/(1+2\epsilon). In
fact, we present a generic algorithm using randomized hypotheses which can tolerate noise rates
slightly larger than \epsilon/(1+\epsilon) while using samples of size d/\epsilon as in the
noise-free case. Again one observes a quadratic power law (in this case d\epsilon/\Delta^2,
\Delta = 2\epsilon/(1+2\epsilon) - \eta) as \Delta goes to zero. We show upper and lower bounds of this
order.
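
The quantities named in the abstract can be explored numerically. The following is a minimal sketch, not taken from the paper: the function names and the example values of \epsilon, \eta and d are assumptions chosen for illustration, and all constants and logarithmic factors are dropped, so the returned numbers indicate orders of growth only, not actual sample sizes.

```python
def deterministic_threshold(eps):
    """Noise rate eps/(1+eps): at or above it, accuracy eps is unattainable
    with deterministic hypotheses, irrespective of sample size."""
    return eps / (1 + eps)


def randomized_threshold(eps):
    """Corresponding limit 2*eps/(1+2*eps) when randomized hypotheses are allowed."""
    return 2 * eps / (1 + 2 * eps)


def lower_bound_order(eps, delta, d):
    """Order of the lower bound, eps/delta^2 + d/delta (constants and logs dropped)."""
    return eps / delta**2 + d / delta


def min_disagreement_order(eps, delta, d):
    """Order of the sample size needed by minimum disagreement, d*eps/delta^2."""
    return d * eps / delta**2


if __name__ == "__main__":
    # Illustrative values only (assumptions, not from the paper).
    eps, eta, d = 0.1, 0.08, 10
    delta = deterministic_threshold(eps) - eta  # margin below the threshold
    print("deterministic threshold:", deterministic_threshold(eps))
    print("randomized threshold:   ", randomized_threshold(eps))
    print("Delta:                  ", delta)
    print("lower bound order:      ", lower_bound_order(eps, delta, d))
    print("min. disagreement order:", min_disagreement_order(eps, delta, d))
```

For these example values the minimum disagreement expression is larger than the lower bound expression, which is the gap the abstract refers to; with other values of d, \epsilon and \Delta the ratio between the two changes.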