Open Source Intelligence

This video was recorded at the NATO Advanced Study Institute on Mining Massive Data Sets for Security. Open Source Intelligence can be defined as the retrieval, extraction and analysis of information from publicly available sources. Each of these three processes is the subject of ongoing research resulting in specialised techniques. Today the largest source of open source information is the Internet. Most newspapers and news agencies have web sites with live updates on unfolding events, and publish opinions and perspectives on world events. Most governments monitor news reports to feel the pulse of public opinion, and for early warning and current awareness of emerging crises. The phenomenal growth in knowledge, data and opinions published on the Internet requires advanced software tools which allow analysts to cope with the overflow of information.

Malicious use of the Internet has also grown rapidly, particularly on-line fraud, illegal content, virtual stalking, and various scams. These are all creating major challenges for security and law enforcement agencies. There has also been an alarming increase in the use of the Internet by extremist and terrorist groups. The number of terrorist-linked websites has grown from about 15 in 1998 to some 4500 today. These sites use slick multimedia to distil propaganda whose main purposes are (1) to enthuse and stir up rebellion in embedded communities and (2) to instill fear in the "enemy" and wage psychological warfare. Anonymous communication between terrorist cells via bulletin boards, chat rooms and email is also prevalent.

The Joint Research Centre (JRC) has developed significant experience in Internet content monitoring through its work on media monitoring (EMM) for the European Commission. EMM forms the core of the Commission's daily press monitoring service, and has also been adopted by the European Council Situation Centre for its ODIN system. A new research topic at the JRC is Web mining and open source intelligence.
This research applies EMM technology to the wider Internet and not just to news sites. It uses advanced multilingual search techniques to identify potential web resources, and then extracts and downloads all of their textual content. This is followed by automatic change detection, the recognition of places, names and relationships, and further analysis of the resultant large bodies of text. These tools help analysts to process large numbers of documents and derive structured data that is easier to analyse.

This talk will review 4 main topics:

  • Internet trends and the rapid rise of Web 2.0 user-generated content.
  • Information retrieval: live content monitoring of multilingual news reports, web scraping and RSS feed generation, web mining and content monitoring.
  • Information extraction: topic filtering, topic clustering, multilingual named entity extraction, geocoding and geolocating text, event extraction, opinion mining.
  • Information analysis: social network derivation, geospatial indexing and analysis, incident tracking databases, statistical trend analysis, threat monitoring and assessment.
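
As a rough illustration of two of the steps named in the abstract (automatic change detection and the recognition of places), the following Python sketch may help. It is not the EMM or JRC implementation, which the abstract does not describe in detail. It assumes change detection by comparing hashes of whitespace-normalised text and place recognition by lookup in a tiny hypothetical gazetteer; the names `GAZETTEER`, `fingerprint`, `has_changed` and `find_places` are invented for this example.

```python
import hashlib
import re

# Hypothetical gazetteer for illustration only; a real system would need
# a far larger, multilingual list of place names.
GAZETTEER = {"brussels", "ispra", "london"}


def fingerprint(text):
    """Hash whitespace-normalised text so pure layout changes are ignored."""
    normalised = " ".join(text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def has_changed(previous_text, current_text):
    """Report whether two downloads of a page differ in their textual content."""
    return fingerprint(previous_text) != fingerprint(current_text)


def find_places(text):
    """Return gazetteer entries found in the text, in order of first appearance."""
    found = []
    for token in re.findall(r"[^\W\d_]+", text):  # runs of Unicode letters
        key = token.lower()
        if key in GAZETTEER and key not in found:
            found.append(key)
    return found
```

Under these assumptions, a monitoring loop would call `has_changed` on successive downloads of the same page and pass only changed pages to `find_places` and later analysis. Exact-match lookup of single tokens misses multi-word and inflected place names, which is one reason production systems rely on more elaborate multilingual entity recognition.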

