lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From petite_abeille <petite_abei...@mac.com>
Subject Re: Document Clustering
Date Tue, 11 Nov 2003 15:42:00 GMT

On Nov 11, 2003, at 16:05, Marcel Stör wrote:

> As everybody seems to be so exited about it, would someone please be 
> so kind to explain
> what "document based clustering" is?

This mostly means finding document which are "similar" in some way(s). 
The "similitude" is mostly in the eyes of the beholder. In such a 
world, a "cluster" would be a pile of document sharing something. As 
far as Lucene goes, a straightforward way of approaching this could be 
to use an entire document content to query an index. Lucene's result 
set could be construed as a "document cluster". Admittedly, this is 
ground zero of "document clustering", but here you go anyway :)

Here is an illustration:

"Patterns in Unstructured Data"
Discovery, Aggregation, and Visualization
http://javelina.cet.middlebury.edu/lsa/out/cover_page.htm

Cheers,

PA.


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message