The sample complexity of agnostic learning under deterministic labels

This video was recorded at the 27th Annual Conference on Learning Theory (COLT), Barcelona 2014. With the emergence of Machine Learning tools that allow handling data with a huge number of features, it becomes reasonable to assume that, over the full set of features, the true labeling is (almost) fully determined. That is, the labeling function is deterministic, but not necessarily a member of some known hypothesis class. However, agnostic learning of deterministic labels has so far received little research attention. We investigate this setting and show that its behavior is quite different from what the fundamental results of the common (PAC) learning setups would suggest. First, we show that the sample complexity of learning a binary hypothesis class (with respect to deterministic labeling functions) is not fully determined by the VC-dimension of the class. For any d, we present classes of VC-dimension d that are learnable from O(d/ϵ)-many samples and classes that require samples of size Ω(d/ϵ²). Furthermore, we show that in this setup, there are classes for which any proper learner has suboptimal sample complexity: while the class can be learned with sample complexity O(d/ϵ), any proper (and therefore, any ERM) algorithm requires Ω(d/ϵ²) samples. We provide combinatorial characterizations of both phenomena, and further analyze the utility of unlabeled samples in this setting. Lastly, we discuss the error rates of nearest neighbor algorithms under deterministic labels and additional niceness-of-data assumptions.
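To give a feel for the size of the gap between the two rates quoted in the abstract, the following is a minimal numeric sketch. It assumes all hidden constants are 1 and ignores any logarithmic factors, so the numbers are orders of magnitude only, not actual sample sizes from the paper. The function names and the parameter `inv_eps` (standing for 1/ϵ, kept as an integer to avoid rounding issues) are hypothetical and chosen for illustration.

```python
def fast_rate(d: int, inv_eps: int) -> int:
    """Order of d/eps samples (constant assumed to be 1), with inv_eps = 1/eps."""
    return d * inv_eps


def slow_rate(d: int, inv_eps: int) -> int:
    """Order of d/eps^2 samples (constant assumed to be 1), with inv_eps = 1/eps."""
    return d * inv_eps ** 2


def gap_factor(d: int, inv_eps: int) -> int:
    """How many times more samples the slow rate asks for than the fast rate."""
    return slow_rate(d, inv_eps) // fast_rate(d, inv_eps)


if __name__ == "__main__":
    d = 10  # illustrative VC-dimension (an assumption, not from the source)
    for inv_eps in (10, 100, 1000):  # eps = 0.1, 0.01, 0.001
        print(
            f"1/eps={inv_eps}: d/eps ~ {fast_rate(d, inv_eps)}, "
            f"d/eps^2 ~ {slow_rate(d, inv_eps)}, "
            f"gap factor ~ {gap_factor(d, inv_eps)}"
        )
```

Under these simplifications the ratio between the two rates is exactly 1/ϵ, independent of d, which is why the difference between a class learnable at the fast rate and one requiring the slow rate grows as the target accuracy tightens.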

