Text Mining in the SOMLib Digital Library System: The Representation of Topics and Genres

Authors: 
Andreas Rauber
Wolfdieter Merkl
Type: 
Journal article
Proceedings: 
Publisher: 
Applied Intelligence, 18-3
Pages: 
271 - 293
ISBN: 
Year: 
2003
Abstract: 
With the increasing amount of textual information available in electronic form, more powerful methods for exploring, searching, and organizing the available mass of information are needed to cope with this situation. This paper presents the SOMLib digital library system, built on neural networks to provide text mining capabilities. At its foundation we use the self-organizing map to provide content-based clustering of documents. By using an extended model, i.e. the growing hierarchical self-organizing map, we can further detect subject hierarchies in a document collection, with the neural network adapting its size and structure automatically during its unsupervised training process to reflect the topical hierarchy. By mining the weight vector structure of the trained maps our system is able to select keywords describing the various topical clusters. Text mining has to incorporate more than the mere analysis of content. Structural and genre information are key in organizing and locating information. Using color-coding techniques we can integrate a structural analysis of documents based on self-organizing maps into the subject-based clustering relying on metaphor graphics for intuitive visualization. We demonstrate the capabilities of the SOMLib system using collections of articles from various newspapers and magazines.

Keywords: Document Clustering, Self-Organizing Map (SOM), Genre Analysis, Metaphor Graphics, Digital Libraries.
TU Focus: 
Information and Communication Technology
Reference: 

A. Rauber, W. Merkl:
"Text Mining in the SOMLib Digital Library System: The Representation of Topics and Genres";
Applied Intelligence, 18 (2003), 3; pp. 271 - 293.

Additional Information

Last changed: 
17.12.2003 17:35:30
TU Id: 
137948
Accepted: 
Accepted
Invited: 
Department Focus: 
Business Informatics
Abstract German: 
Author List: 
A. Rauber, W. Merkl