Material Detail

Potential and limitations of minimally supervised bootstrapping

This video was recorded at the Solomon seminar.

The detection of relation instances is a central functionality for the extraction of structured information from unstructured textual data and for gradually turning texts into semi-structured information. With respect to the acquisition of the classifiers or detection grammars, the existing approaches fall into three large categories:

* detection by classifiers/grammars acquired through intellectual human labor
* detection by classifiers/grammars acquired through supervised learning
* detection by classifiers/grammars acquired through unsupervised or minimally supervised learning

In the talk we will provide examples for the classes of approaches and summarize their respective advantages and disadvantages. We will argue that different relation detection tasks require different methods or even different combinations of methods.

One empirically promising and theoretically attractive line of research is the learning of extraction rules from seeds. Several minimally supervised approaches have been investigated that achieved rather decent results with a minimum of effort. The learning algorithms are not domain dependent. The seed-based bootstrapping approaches are theoretically pleasing because the learned patterns and rules are modular and transparent. They can be reused in new applications, and they can be a valuable resource for (computational) linguistic investigation.

We will explain several bootstrapping methods, most of them starting with patterns as seeds and some with event seeds. We will also describe our own bootstrapping approach (Xu et al. 2007), a radical extension of Xu et al. (2006). In this approach, learning starts from a small set of n-ary relation instances as "seeds" in order to automatically learn pattern rules from parsed data, which can then extract new instances of the n-ary relation and its projections.

After a fruitful period of skillful trial and error, the time now seems right for a more systematic investigation of the alternative approaches to relation detection. In addition to tables of recall and precision values for competing methods, we urgently need explanations, i.e. causal theories explaining the virtues and shortcomings of alternative techniques with respect to properties of domains and text data. We describe one theory of this kind based on experimental evidence and explanatory insight. The advocated scientific methodology will enable optimal choices for specific tasks, effectively reduce the number of promising combinations of methods for future investigation, and guide the search for completely new approaches.
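
The seed-based bootstrapping loop mentioned in the abstract can be illustrated with a minimal sketch. It is not the method of Xu et al. (2007): that approach learns rules from parsed data for n-ary relations, whereas this sketch assumes a binary relation, plain surface strings, and a capitalized-word regular expression as a stand-in for argument recognition. The function names (`learn_patterns`, `apply_patterns`, `bootstrap`) and the toy sentences are hypothetical and exist only for illustration.

```python
import re

# Assumption: an argument is a run of capitalized words. A real system
# would use a parser or named-entity recognizer instead.
ARG = r"([A-Z]\w*(?: [A-Z]\w*)*)"

def learn_patterns(sentences, instances):
    """Collect the text between the two arguments of each known instance."""
    patterns = set()
    for sentence in sentences:
        for arg1, arg2 in instances:
            start1 = sentence.find(arg1)
            start2 = sentence.find(arg2)
            # Keep only sentences where arg1 occurs entirely before arg2.
            if start1 != -1 and start2 != -1 and start1 + len(arg1) <= start2:
                patterns.add(sentence[start1 + len(arg1):start2])
    return patterns

def apply_patterns(sentences, patterns):
    """Extract argument pairs from sentences that match a learned pattern."""
    found = set()
    for sentence in sentences:
        for middle in patterns:
            regex = ARG + re.escape(middle) + ARG
            for match in re.finditer(regex, sentence):
                found.add((match.group(1), match.group(2)))
    return found

def bootstrap(sentences, seeds, max_rounds=5):
    """Alternate pattern learning and instance extraction until nothing new appears."""
    instances, patterns = set(seeds), set()
    for _ in range(max_rounds):
        new_patterns = learn_patterns(sentences, instances)
        new_instances = apply_patterns(sentences, new_patterns)
        if new_patterns <= patterns and new_instances <= instances:
            break  # fixed point: no new patterns and no new instances
        patterns |= new_patterns
        instances |= new_instances
    return instances, patterns

# Invented toy sentences, for illustration only.
TOY_SENTENCES = [
    "Ada Stone won the Maple Prize in 1999.",
    "Ben Rivers won the Cedar Medal in 2001.",
    "Ben Rivers was awarded the Cedar Medal for poetry.",
    "Cara Lind was awarded the Birch Award for physics.",
]

instances, patterns = bootstrap(TOY_SENTENCES, {("Ada Stone", "Maple Prize")})
print(sorted(instances))
print(sorted(patterns))
```

Starting from a single seed pair, the first round learns one pattern and uses it to find a second instance; that instance then yields a second pattern, which reaches a third instance. The sketch accepts every learned pattern without scoring, so on real text it would need some form of pattern filtering to limit the spread of wrong extractions.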

Keywords: videolectures, ocwc, oec

Disciplines:

Science and Technology / Computer Science / Programming & Programming Languages

More about this material

Material Type: Presentation
Date Added to MERLOT: February 10, 2015
Date Modified in MERLOT: February 10, 2015
Author: Hans Uszkoreit, German Research Center for Artificial Intelligence (DFKI)
Submitter: The Open Education Consortium
Primary Audience: College General Ed, College Lower Division, College Upper Division
Technical Format: Video

Mobile Compatibility: Not specified at this time
Language: English
Cost Involved: No
Source Code Available: No
Creative Commons: This work is licensed under an Attribution-NonCommercial-NoDerivs 3.0 United States license