lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Joshua O'Madadhain" <jmad...@ics.uci.edu>
Subject Re: Document Clustering
Date Tue, 11 Nov 2003 19:31:10 GMT
On Tuesday, Nov 11, 2003, at 11:05 US/Pacific, Marcel Stor wrote:

> Stefan Groschupf wrote:
>> Hi,
>>> How is document clustering different/related to text categorization?
>>
>> Clustering: try to find own categories and put documents that match
>> in it. You group all documents with minimal distance together.
>
> Would I be correct to say that you have to define a "distance 
> threshold"
> parameter in order to define when to build a new category for a certain
> group?

Depends on the type of clustering algorithm.  Some clustering 
algorithms take the number of clusters as a parameter (in this case the 
algorithm may be run several times with different values, to determine 
the best value).  Other types of algorithms, such as hierarchical 
agglomerative clustering algorithms, work more as you suggest.

Regards,

Joshua O'Madadhain

  jmadden@ics.uci.edu...Obscurium Per 
Obscurius...www.ics.uci.edu/~jmadden
   Joshua O'Madadhain: Information Scientist, Musician, 
Philosopher-At-Tall
  It's that moment of dawning comprehension that I live for--Bill 
Watterson
My opinions are too rational and insightful to be those of any 
organization.


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message