lucene-java-user mailing list archives

From petite_abeille <>
Subject Re: Document Clustering
Date Tue, 11 Nov 2003 15:42:00 GMT

On Nov 11, 2003, at 16:05, Marcel Stör wrote:

> As everybody seems to be so excited about it, would someone please be 
> so kind as to explain
> what "document based clustering" is?

This mostly means finding documents that are "similar" in some way. 
What counts as "similar" is mostly in the eye of the beholder. In such a 
world, a "cluster" is a pile of documents sharing something. As far as 
Lucene goes, a straightforward way of approaching this is to use a 
document's entire content as a query against an index. Lucene's result 
set can then be construed as a "document cluster". Admittedly, this is 
ground zero of "document clustering", but here you go anyway :)
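
To make the idea concrete, here is a rough sketch in plain Python. It does 
not use Lucene's API: TinyIndex, cluster_for and the scoring (summed term 
frequency of shared terms) are all made up for illustration, and a real 
index would weigh terms far more carefully. The point is only that the 
query is the whole text of one document, and the hits are its "cluster".

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TinyIndex:
    """Toy in-memory inverted index (hypothetical, not Lucene's API)."""

    def __init__(self):
        self.postings = defaultdict(dict)  # term -> {doc_id: term frequency}
        self.docs = {}                     # doc_id -> original text

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for term, freq in Counter(tokenize(text)).items():
            self.postings[term][doc_id] = freq

    def search(self, query_text, limit=10):
        """Score documents by the summed frequency of the query terms they contain."""
        scores = Counter()
        for term in set(tokenize(query_text)):
            for doc_id, freq in self.postings.get(term, {}).items():
                scores[doc_id] += freq
        return scores.most_common(limit)


def cluster_for(index, doc_id, limit=10):
    """Query the index with one document's entire content; the hits form the 'cluster'."""
    return [hit for hit, _score in index.search(index.docs[doc_id], limit)]


if __name__ == "__main__":
    index = TinyIndex()
    index.add("a", "lucene index search query")
    index.add("b", "lucene query parser and index writer")
    index.add("c", "cooking pasta with tomato sauce")
    print(cluster_for(index, "a"))
```

The seed document always lands in its own cluster, since it matches itself 
best. Documents sharing no terms with it never show up at all.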

Here is an illustration:

"Patterns in Unstructured Data"
Discovery, Aggregation, and Visualization


