Semantic Data Mining

This video was recorded at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Athens 2011. The term semantic data mining denotes a data mining approach in which domain ontologies are used as background knowledge. The approach is motivated by the large amounts of data that are increasingly becoming openly available and described using real-life ontologies represented in Semantic Web languages, arguably most extensively in the domain of biology. This has recently opened up the possibility of interesting large-scale, real-world semantic applications. The availability of semantically annotated data calls for new kinds of data mining approaches that can deal with the complexity and expressivity of the semantic representation languages, leverage the availability of ontologies and the explicit semantics of the described resources, and account for novel assumptions (e.g., the open-world assumption) that underlie reasoning services exploiting ontologies. The tutorial addresses these issues, focusing on three questions: how machine learning techniques can work directly on richly structured Semantic Web data and exploit ontologies and Semantic Web technologies; what value is added by machine learning methods that exploit ontologies; and what challenges face developers of semantic data mining methods. It also contains demonstrations of tools supporting semantic data mining. The tutorial presents the topic of semantic data mining from three complementary perspectives. First, it presents a general framework for semantic data mining, following the work in [NVTL09]. The first part of the tutorial also discusses a new method for semantic subgroup discovery, g-SEGS, accompanied by a presentation of the developed tool, which is part of the Orange4WS environment.
The second part of the tutorial covers learning from description logics (DL-learning), motivated by the fact that the standard Web ontology language, OWL, is theoretically based on description logics. This includes a demo of a tool supporting DL-learning (a plugin for the RapidMiner system). Finally, the third part of the tutorial covers semantic meta-mining. This approach has three features that distinguish it from its predecessors. First, more than previous work, it adopts a process-oriented approach in which meta-learning is applied to support design choices at different stages of the complete data mining process or workflow. Second, it complements dataset descriptions with an in-depth analysis and characterization of algorithms: their underlying assumptions, optimization goals and strategies, and the models and patterns they generate. Third, it relies on a data mining ontology that distills extensive background knowledge concerning knowledge discovery itself.

