Combining Information Retrieval and Information Extraction for Medical Intelligence

This video was recorded at the NATO Advanced Study Institute on Mining Massive Data Sets for Security. Global epidemic and medical surveillance is an essential function of Public Health agencies, whose primary aim is to protect the public from major health threats. Performing this function effectively requires timely and accurate medical information from a wide range of sources. In this work we present a system designed to monitor disease epidemics by analyzing textual reports, mostly in the form of news, available on the Web. The system rests on two major components: MedISys, based on Information Retrieval (IR) technology, and PULS, an Information Extraction (IE) system.

The Medical Information System, MedISys, is an automatic tool that gathers reports concerning Public Health from thousands of Internet sources world-wide in 32 languages, classifies them according to hundreds of categories, detects trends across categories and languages, and notifies users. MedISys compiles quantitative summaries of the latest reports on a variety of diseases, bioterrorism, toxins, bacteria, hemorrhagic fevers, viruses, medicines, water contaminations, animal diseases, Public Health organisations, etc. The system categorises all documents according to about 200 classes of health threats, using pre-defined weighted boolean queries, or alerts. It uses statistical procedures to detect a sudden increase in the volume of articles in any of the classes. MedISys is part of the EuropeMediaMonitor (EMM) product family [2], developed at the EC's Joint Research Centre (JRC), which also includes NewsBrief, a live news aggregation system, and NewsExplorer, a news summary and analysis system [1]. MedISys has already proved to be a useful and effective tool, which attracts thousands of users daily.

IE technology is a natural direction for further enhancing the functionality that MedISys offers. One reason for this is that IE is able to deliver information about specific incidents of the diseases, whereas IR returns entire matched documents (with an indication of which alerts fired). Another reason is that IE could boost precision, since keyword-based queries may trigger on documents which are off-topic but happen to mention the alert keywords in unrelated contexts, while pattern matching in IE ensures that the keywords appear in relevant contexts only.
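As a rough illustration of the two IR-side ideas mentioned above (weighted boolean alerts and detection of a sudden rise in article volume), here is a minimal Python sketch. It is not the MedISys implementation: the alert definition, the weights, the threshold, the `exclude` terms, and the z-score test are all assumptions chosen for illustration, and the function names `fired_alerts` and `is_spike` are hypothetical.

```python
import re
from statistics import mean, stdev

# Hypothetical alert: weighted keywords, excluded (NOT) terms, and a firing threshold.
ALERTS = {
    "cholera": {
        "terms": {"cholera": 3.0, "outbreak": 1.0, "diarrhoea": 1.0},
        "exclude": {"novel"},
        "threshold": 3.0,
    },
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def fired_alerts(text, alerts):
    """Return the names of alerts whose weighted keyword score reaches the threshold."""
    tokens = set(tokenize(text))
    fired = []
    for name, alert in alerts.items():
        if tokens & alert["exclude"]:
            continue  # an excluded term blocks the alert entirely
        score = sum(w for term, w in alert["terms"].items() if term in tokens)
        if score >= alert["threshold"]:
            fired.append(name)
    return fired


def is_spike(history, today, z_threshold=3.0):
    """Flag today's article count if it lies far above the recent daily mean (z-score)."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today > mu
    return (today - mu) / sigma >= z_threshold
```

Under these assumptions, a sentence such as "The senator compared corruption to cholera" still fires the alert, because the keyword alone reaches the threshold. This is the kind of off-topic match that context-aware IE patterns are meant to filter out.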

