Lehrstuhl Mathematik & Informatik

PAC-Learning in the Presence of One-sided Classification Noise

Abstract. We derive an upper and a lower bound on the sample size needed for PAC-learning a concept class in the presence of one-sided classification noise. The upper bound is achieved by the strategy ``Minimum One-sided Disagreement''. It matches the lower bound (which holds for any learning strategy) up to a logarithmic factor. Although ``Minimum One-sided Disagreement'' often leads to NP-hard combinatorial problems, we show that it can be implemented quite efficiently for some simple concept classes like, for example, unions of intervals, axis-parallel rectangles, and TREE(2,n,2,k) which is a broad subclass of 2-level decision trees. For the first class, there is an easy algorithm with time bound O(m log m). For the second-one (resp.~the third-one), we design an algorithm that applies the well-known UNION-FIND data structure and has an almost quadratic time bound (resp. time bound O(n^2 m \log m)).

Fakultät für Mathematik

PAC-Learning in the Presence of One-sided Classification Noise