Material Detail

Searching the Web by Discovering and Clustering Related Terms

This video was recorded at the Solomon seminar. The amount of information on the web is growing so fast that classical search engines find it increasingly difficult to retrieve relevant information. Indeed, with the rapid increase in webpages written in different languages, and sometimes in misinterpreted languages, the degree of ambiguity of human language has risen to levels unseen so far. However, people still query these systems with no more than 2 words on average. As a consequence, new information retrieval systems need to be proposed to decrease the level of ambiguity of queries. Such systems usually make use of query expansion techniques to solve this problem.

In this talk, I will present a system based on the automatic discovery of terms related to the query as a means of helping the user search for relevant information. This technique can be classified as Interactive Query Expansion. However, unlike other systems, we use web mining techniques to discover related terms based on different features such as association measures, document similarity, and document relevance. In the second part of my talk, I will present future extensions of our retrieval system based on the automatic discovery of relations between related terms. By using agglomerative clustering techniques and an auto-fed WebWarehouse, we hope to propose less ambiguous query expansion terms than present systems, where the user needs to sort out the terms of interest.

Web Spider: Web Spider is a system that returns all related terms and links from a given URL and a given query. The Spider has been developed using the C5.0 machine learning algorithm.
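To make the idea of discovering related terms through an association measure more concrete, here is a minimal Python sketch. It is not the system presented in the talk: the abstract does not say which association measures are used, so pointwise mutual information (PMI) over document-level co-occurrence is an assumption, and the function name `related_terms`, its parameters, and the whitespace tokenization are all hypothetical.

```python
import math
from collections import Counter


def related_terms(documents, query, top_k=3):
    """Rank terms by PMI with a single-word query (illustrative assumption only)."""
    query = query.lower()
    n_docs = len(documents)
    doc_freq = Counter()  # number of documents containing each term
    co_freq = Counter()   # number of documents containing both the term and the query

    for text in documents:
        terms = set(text.lower().split())  # naive whitespace tokenization
        doc_freq.update(terms)
        if query in terms:
            co_freq.update(terms - {query})

    if doc_freq[query] == 0:
        return []  # the query never occurs, so nothing can be associated with it

    scores = {}
    for term, joint in co_freq.items():
        # PMI = log( P(query, term) / (P(query) * P(term)) ),
        # with probabilities estimated as document frequencies / n_docs.
        scores[term] = math.log((joint * n_docs) / (doc_freq[query] * doc_freq[term]))

    # Highest PMI first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

Calling `related_terms(pages, "jaguar")` on a small list of page texts returns the candidate expansion terms with the highest PMI scores. In an interactive query expansion setting, these candidates would be shown to the user, who then picks the ones to add to the query.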

Keywords:: videolectures, ocwc, oec

Disciplines:

Science and Technology / Computer Science / Programming & Programming Languages

More about this material

Material Type:: Presentation
Date Added to MERLOT:: February 10, 2015
Date Modified in MERLOT:: February 10, 2015
Author:: Gaël Dias, University of Beira Interior
Submitter:: The Open Education Consortium
Primary Audience:: College General Ed, College Lower Division, College Upper Division
Technical Format:: Video

Mobile Compatibility:: Not specified at this time
Language:: English
Cost Involved:: No
Source Code Available:: No
Creative Commons:: This work is licensed under an Attribution-NonCommercial-NoDerivs 3.0 United States license