Text Mining and Link Analysis for Web and Semantic Web
Marko Grobelnik and Dunja Mladenić
The tutorial on Text Mining and Link Analysis for Web Data will focus on two main analytical approaches to web data, text mining and link analysis, applied to web documents and the links between them. The first part will cover basic steps and problems in dealing with textual and network (graph) data, showing what can be achieved without very sophisticated technology; its aim is to present the nature of unstructured and semi-structured data. The second part will show more sophisticated methods for solving more difficult and challenging problems. The last part will present some current open research issues and give practical pointers to the available tools for solving the problems discussed.
Marko Grobelnik is an expert in the analysis of large amounts of complex data for the purpose of extracting useful knowledge. His areas of expertise comprise Data Mining, Text Mining, Information Extraction, Link Analysis, and Data Visualization, as well as more integrative areas such as Semantic Web, Knowledge Management and Artificial Intelligence. Apart from research on theoretical aspects of unconventional data analysis techniques, he has valuable experience in practical applications and in the development of business solutions based on innovative technologies. Marko was employed as a researcher, first at the Computer Science Department of the University of Ljubljana and later at the Department of Knowledge Technologies of the Jozef Stefan Institute, Ljubljana, Slovenia, the main national research institute for natural sciences in the country. The primary focus of his research and applications is intelligent data analysis, which deals with unconventional scenarios going beyond classical statistical approaches and solves problems involving unstructured or semi-structured data. His main achievements are in the field of text mining (analysis of large amounts of textual data): he has had a leading role in scientific and applied projects funded by the European Commission, has run projects with industrial partners such as Microsoft Research, British Telecom, New York Times, and Siemens, and has organized several international events on related topics. He is also founder and CEO (from 2001 on) of the Quintelligence company, whose core business is transferring knowledge management and knowledge discovery/data mining expertise into marketable products and services. Marko is also co-founder (from 2006 on) of Cycorp Europe (the European branch of the US-based company Cycorp), which works in deep knowledge modeling and reasoning.
Currently Marko co-manages (together with two co-managers) a team of 35 people, where his role is management and active work with the scientific and technical part of the team, comprising 20 researchers, research programmers and software engineers. The key activities of Marko’s team include the development of new research ideas from innovation to research prototypes and, at later stages, industrial deployment through spin-off companies. The team collaboratively contributes to the Text-Garden software platform, which enables flexibility and efficiency in building solutions for analytic and semantic technologies. Marko Grobelnik has been involved in the acquisition, execution and management of several EU-funded projects from FP5, FP6, and FP7. In the period between 2000 and 2007 the group acquired approx. 8 million euro of European funding (EC contribution). The projects are from three main areas/strategic objectives: (1) semantic and knowledge, (2) cognitive systems, and (3) networked organizations.
Dunja Mladenić (http://kt.ijs.si/Dunja/) is an expert in the study and development of Machine Learning, Data Mining and Text Mining techniques and their application to real-world problems from different areas such as publishing, medicine, pharmacology, manufacturing, and economy. Her current research focuses on data analysis, with particular interest in learning from text and the Web, including personal intelligent agents. She has worked as a researcher at the Department of Knowledge Technologies of the J. Stefan Institute, Ljubljana, Slovenia, since 1992. She graduated in Computer Science at the University of Ljubljana and continued as a PhD student focused on Artificial Intelligence. She received her MSc and PhD in Computer Science from the University of Ljubljana in 1995 and 1998 respectively. She was a visiting researcher at the School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA, in 1996-1997 and in 2000-2001. Currently Dunja co-manages (together with two co-managers) a team of 35 people working on the development of new research ideas from innovation to research prototypes and, at later stages, industrial deployment through spin-off companies. In the period between 2000 and 2007 the group acquired approx. 8 million euro of European funding (EC contribution). The projects are from three main areas/strategic objectives: (1) semantic and knowledge, (2) cognitive systems, and (3) networked organizations.
Dunja Mladenić coordinated the EU RTD project Data Mining and Decision Support for Business Competitiveness: A European Virtual Enterprise (Sol-Eu-Net). She is or was on the Management Board of several European research and development projects, including the FP7 IP project ACTIVE (Enabling the Knowledge Powered Enterprise); the FP6 IP projects SEKT (Semantically-Enabled Knowledge Technologies) and NeOn (Lifecycle Support for Networked Ontologies); the FP6 STREP projects IMAGINATION (Image-based Navigation in Multimedia Archives), SMART (Statistical Multilingual Analysis for Retrieval and Translation), SWING (Semantic Web Services Interoperability for Geospatial Decision Making) and TAO (Transitioning Applications to Ontologies); the FP7 NoE PASCAL2 and the FP6 NoE project PASCAL (Pattern Analysis, Statistical Modelling and Computational Learning); the FP6 CA project KD-ubiq (A blueprint for ubiquitous knowledge discovery systems); and the FP6 SSA projects CEC-WYS (Central European Centre for Women and Youth in Science) and WS-DEBATE. She is the Slovenian representative in the EC Enwise STRATA ETAN Expert Group “Promoting women scientists from the Central and Eastern European countries and the Baltic States to produce gender equality in science in the wider Europe”. She serves as an evaluator of project proposals for the EC programme on Information Society Technologies (IST). In 2001, she was an evaluator of project proposals for the National Science Foundation (NSF) initiative on Information Technology Research (ITR), NSF 00-126, USA.
She has published several papers in refereed conferences and journals, served on the program committees of various international conferences, and organized several international events in the areas of Text Mining, Link Analysis and Data Mining. She is co-editor of the books Data Mining and Decision Support: Integration and Collaboration, Kluwer Academic Publishers, 2003; Semantic Knowledge Management: Integrating Ontology Management, Knowledge Discovery, and Human Language Technologies, Springer, 2008; Semantics, Web and Mining, Springer, 2006; and Web Mining: from Web to Semantic Web, Springer, 2004.