Online MKL for Structured Prediction

This video was recorded at the NIPS Workshops, Whistler 2010. Structured prediction (SP) problems are characterized by strong interdependence among the output variables, usually with sequential, graphical, or combinatorial structure [17, 7]. Obtaining good predictors often requires a large effort in feature/kernel design and tuning (usually done via cross-validation). Because discriminative training of structured predictors can be quite slow, especially in large-scale settings, it is appealing to learn the kernel function simultaneously. In multiple kernel learning (MKL) [8], the kernel is learned as a linear combination of prespecified base kernels. This framework has been made scalable with the advent of wrapper-based methods, in which a standard learning problem (e.g., an SVM) is repeatedly solved in an inner loop up to a prescribed accuracy [14, 11, 6]. Unfortunately, extending such methods to large-scale SP raises practical hurdles: since the output space is large, so are the kernel matrices and the number of support vectors; moreover, it becomes prohibitive to tackle the inner problem in its batch form, so one usually resorts to online algorithms [12, 3, 13]. The latter are fast learners but slow optimizers [2]; using them in the inner loop with early stopping may misguide the overall MKL optimization.

We propose instead a stand-alone online MKL algorithm that exploits these large-scale tradeoffs directly. The algorithm alternates between subgradient and proximal steps and has important advantages: (i) it is simple, flexible, and compatible with sparse and non-sparse variants of MKL; (ii) it is well suited to SP; (iii) it offers regret, convergence, and generalization guarantees. Experimentally, the proposed algorithm yields near-optimal solutions faster than wrapper-based approaches (adapted to SP). We use the following two SP testbeds: sequence labeling for handwritten text recognition, and natural-language dependency parsing. The potential of our approach is demonstrated by learning combinations of kernels from tens of thousands of training instances, with encouraging results in terms of speed, accuracy, and interpretability.
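The abstract does not spell out the update rule, so the following is only a minimal sketch of one proximal-subgradient online MKL iteration, assuming explicit finite-dimensional feature maps (one per base kernel) and a plain multiclass hinge loss standing in for the structured loss. The names `prox_group_lasso` and `mkl_step`, and all parameter values, are illustrative and not taken from the paper.

```
import numpy as np

def prox_group_lasso(W, threshold):
    """Block soft-thresholding: the proximal operator of
    threshold * sum_m ||W[m]||, which can zero out whole kernel blocks."""
    for m in range(len(W)):
        norm = np.linalg.norm(W[m])
        scale = max(0.0, 1.0 - threshold / norm) if norm > 0 else 0.0
        W[m] *= scale
    return W

def mkl_step(W, feats, y, eta, lam):
    """One online iteration: a subgradient step on the multiclass hinge
    loss, then a proximal step for the group-lasso regularizer.
    W[m] has shape (n_classes, d_m); feats[m] is the m-th feature map of x."""
    n_classes = W[0].shape[0]
    scores = np.array([sum(W[m][c] @ feats[m] for m in range(len(W)))
                       for c in range(n_classes)])
    margins = scores - scores[y] + 1.0     # unit cost for every wrong label
    margins[y] = 0.0
    y_hat = int(np.argmax(margins))        # cost-augmented prediction
    if y_hat != y and margins[y_hat] > 0:  # nonzero subgradient of the hinge
        for m in range(len(W)):
            W[m][y_hat] -= eta * feats[m]
            W[m][y]     += eta * feats[m]
    return prox_group_lasso(W, eta * lam)

# Toy usage: 3 base "kernels" with feature dimensions (4, 2, 5), 3 classes.
rng = np.random.default_rng(0)
dims, n_classes = [4, 2, 5], 3
W = [np.zeros((n_classes, d)) for d in dims]
for _ in range(200):
    feats = [rng.normal(size=d) for d in dims]
    W = mkl_step(W, feats, y=int(rng.integers(n_classes)), eta=0.1, lam=0.05)
# The implied weight of base kernel m is proportional to ||W[m]||;
# blocks driven exactly to zero correspond to discarded kernels.
```

In this sketch the block soft-thresholding step is what produces sparsity over kernels; swapping in a different proximal map (e.g., for a squared-norm regularizer) would give a non-sparse variant, in the spirit of the flexibility claimed in point (i) above.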
