Objective
The information society can benefit greatly from advanced database techniques and from knowledge discovery in large datasets. We believe that studying the inductive database framework (Imielinski and Mannila, 1996) might substantially improve the state of the art in data mining. Inductive databases integrate facts with arbitrarily complex forms of knowledge (e.g. rules), so that mining processes become querying processes. Today, a few query languages have been proposed as candidates, but the relations among them remain unclear. By designing inductive extensions of several querying paradigms (e.g. relational, deductive), we aim to define the fundamental primitives for effective database mining. This might be a major step towards the definition of general-purpose query languages. Besides this long-term goal, the project will have an immediate practical impact by providing new data mining methods and tools.
OBJECTIVES
Within the framework of inductive databases, knowledge discovery is considered an extended querying process: from the user's point of view, there is no such thing as real discovery, only a matter of the expressive power of the query languages. Today, the proposed models for inductive queries are rather preliminary. Therefore, cInQ will study both the theoretical and the practical issues of this approach in depth. The ultimate goal is to define the fundamental (querying) primitives for database mining. In a sense, we are looking for the equivalent of Codd's introduction of the relational algebra, but with database mining in mind instead of transaction processing. To ensure genericity, we will study inductive extensions of different querying paradigms and evaluate their applicability to various data mining tasks.
DESCRIPTION OF WORK
We will study both theoretical and practical issues of inductive querying for the discovery of knowledge from data. An inductive database instance contains data (e.g. a relational database) and patterns (e.g. association rules, graphs).
A query language for an inductive database is an extension of a database query language that allows the user to:
1) select, manipulate and query data as in standard queries;
2) select, manipulate and query "interesting" patterns (e.g. those patterns that satisfy certain constraints); and
3) cross over patterns and data (e.g. selecting the data in which some patterns hold).
Types of patterns that will be studied include itemsets, episodes, Datalog queries, data dependencies and clusters. Constraints that will be studied concern frequency, confidence, accuracy and similarity. Querying patterns then amounts to effectively generating the patterns that satisfy the given constraints.
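To make the three kinds of queries more concrete, here is a minimal sketch in Python, assuming the data part is a list of transactions (sets of items) and the patterns are itemsets and association rules under frequency and confidence constraints. The function names, toy data and thresholds are hypothetical illustrations, not part of cInQ. The level-wise generation is the well-known Apriori strategy, used here only as one possible way to evaluate a frequency constraint, not as the project's method.

```python
from itertools import combinations

# Hypothetical toy instance of an inductive database: the data part is a
# list of transactions (sets of items); patterns are itemsets and rules.


def frequency(itemset, data):
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(1 for t in data if itemset <= t) / len(data)


def frequent_itemsets(data, min_freq):
    """Pattern query: all itemsets with frequency >= min_freq.

    Level-wise (Apriori-style) generation: a k-itemset is only evaluated if
    all of its (k-1)-subsets are frequent, since frequency is anti-monotone.
    """
    items = sorted({i for t in data for i in t})
    level = [frozenset([i]) for i in items
             if frequency(frozenset([i]), data) >= min_freq]
    result = {}
    k = 1
    while level:
        for s in level:
            result[s] = frequency(s, data)
        k += 1
        # Candidate k-itemsets: unions of two frequent (k-1)-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune candidates with an infrequent subset, then check frequency.
        level = [c for c in candidates
                 if all(frozenset(sub) in result
                        for sub in combinations(c, k - 1))
                 and frequency(c, data) >= min_freq]
    return result


def association_rules(freq_sets, min_conf):
    """Pattern query: rules lhs -> rhs built from frequent itemsets whose
    confidence freq(lhs | rhs) / freq(lhs) is at least min_conf."""
    rules = []
    for s, f in freq_sets.items():
        for r in range(1, len(s)):
            for lhs in map(frozenset, combinations(sorted(s), r)):
                conf = f / freq_sets[lhs]  # lhs is frequent (subset of s)
                if conf >= min_conf:
                    rules.append((lhs, s - lhs, conf))
    return rules


def supporting_data(itemset, data):
    """Cross-over query: select the transactions in which a pattern holds."""
    return [t for t in data if itemset <= t]


if __name__ == "__main__":
    data = [frozenset(t) for t in (
        {"bread", "milk"}, {"bread", "butter"},
        {"bread", "milk", "butter"}, {"milk"})]
    freq = frequent_itemsets(data, min_freq=0.5)
    for lhs, rhs, conf in association_rules(freq, min_conf=0.9):
        print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
    print(len(supporting_data(frozenset({"bread", "butter"}), data)))
```

On this toy data the script prints the single rule butter -> bread with confidence 1.0, then 2, the number of transactions containing {bread, butter}. The point of the sketch is that each constraint is a query parameter: changing `min_freq` or `min_conf` changes the answer set, just as a selection condition changes the result of a relational query.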
To identify the fundamental (querying) primitives for database mining, we will:
- develop a general theory of inductive databases;
- analyze inductive query evaluation for several well-studied pattern domains (e.g. association rules, datalog queries) and some new ones (e.g. graphs);
- study inductive extensions of existing query languages (SQL, deductive databases) and implement prototypes;
- study and evaluate the use of these prototypes for a wide variety of data mining tasks and applications, including web mining (e.g. web usage mining), bioinformatics (e.g. predictive toxicology) and telecommunication data analysis (e.g. fault prediction and detection).
Fields of science
Not validated
Call for proposal
Data not available
Funding Scheme
CSC - Cost-sharing contracts
Coordinator
69621 VILLEURBANNE CEDEX
France