Evaluating text mining practices
The Parmenides system was designed to support the entire text mining process, from gathering documents through information extraction and semantic annotation to the application of data mining techniques. Being ontology-based, it includes an ontology management system and tools for extracting new concepts and relations, in addition to providing document- and data-warehousing facilities. Although the Parmenides system can support the entire text mining process, it is also possible for users to employ only a subset of the available facilities, depending on the task they wish to carry out. During the lifetime of the PARMENIDES project, and in parallel with the development of the system itself, an evaluation framework was developed in conjunction with the users. The aim of this exercise, undertaken by the PARMENIDES project partners, was two-fold. Firstly, they sought to perform a complete user-centred evaluation of the system architecture and to assess how well it met the users' requirements. Secondly, the general framework built up for the Parmenides system was intended to be re-usable for evaluating similar systems. In particular, the Relative ordering tool (ROTE) was employed to build a parameterised quality model for evaluation. The tool was designed at the Université de Genève to help users specify the relative importance of different quality characteristics and their associated metrics. It allows users to order any number of quality characteristics by comparing them in pair-wise fashion. For example, a user may consider both an ontology management system and the facility to build and maintain ontologies to be mandatory, yet rate the performance of dedicated tools for acquiring new concepts as less important than the quality of the management system. Applied to a large and complex text mining system, this evaluation framework resulted in a quality model containing more than 180 metrics.
It was this complexity of the quality model which initially led to the development of the ROTE tool. However, before the tool's overall benefits can be assessed, it will need to be tested further on other systems of varying complexity.
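
The pair-wise ordering of quality characteristics described above can be illustrated with a minimal Python sketch. It is not the ROTE algorithm, whose internals are not described here; it simply assumes that each pair of characteristics is judged once and that a characteristic's rank follows from the number of comparisons it wins, with ties sharing the point. The function name rank_by_pairwise, the characteristic labels and the judgement between ontology building and concept acquisition are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def rank_by_pairwise(items, judgments):
    """Order items by the number of pair-wise comparisons each one wins.

    judgments maps a pair (a, b) to the preferred item; a pair that is
    missing or mapped to None is treated as equally important.
    This is an illustrative scoring rule, not the method used by ROTE.
    """
    scores = {item: 0.0 for item in items}
    for a, b in combinations(items, 2):
        # Look the pair up in either order; a missing pair counts as a tie.
        winner = judgments.get((a, b), judgments.get((b, a)))
        if winner is None:
            scores[a] += 0.5
            scores[b] += 0.5
        else:
            scores[winner] += 1.0
    # sorted() is stable, so tied items keep their original relative order.
    ranking = sorted(items, key=lambda item: -scores[item])
    return ranking, scores


# Hypothetical characteristics, loosely following the example in the text.
characteristics = [
    "ontology management",
    "ontology building and maintenance",
    "concept acquisition tools",
]

# Hypothetical user judgements: the first two are equally important
# (both mandatory, so no entry is given); each outranks concept acquisition.
judgments = {
    ("ontology management", "concept acquisition tools"): "ontology management",
    ("ontology building and maintenance", "concept acquisition tools"):
        "ontology building and maintenance",
}

ranking, scores = rank_by_pairwise(characteristics, judgments)
```

With n characteristics this scheme needs n(n-1)/2 judgements, which hints at why tool support becomes useful once a quality model grows to many characteristics and metrics.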