lucene-java-user mailing list archives

From Stefan Groschupf ...@media-style.com>
Subject Re: Document Clustering
Date Tue, 11 Nov 2003 19:31:53 GMT


Marcel Stor wrote:

>Stefan Groschupf wrote:
>
>>Hi,
>>
>>>How is document clustering different/related to text categorization?
>>
>>Clustering: try to find own categories and put documents that match
>>in it. You group all documents with minimal distance together.
>
>Would I be correct to say that you have to define a "distance threshold"
>parameter in order to define when to build a new category for a certain
>group?
>
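Whether a threshold is needed depends on the algorithm. Some algorithms, such as k-means, fix the number of clusters up front instead. As one illustration of the threshold idea, here is a minimal sketch of single-pass "leader" clustering. This is a swapped-in example technique, not an SVM and not anything Lucene provides. Each document vector joins the nearest existing cluster if that cluster's leader is within `threshold`; otherwise it starts a new cluster. The function names and the choice of Euclidean distance are assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def threshold_cluster(vectors, threshold):
    """Hypothetical single-pass 'leader' clustering.

    The first vector of each cluster acts as its representative (leader).
    A vector joins the cluster with the nearest leader if that distance is
    <= threshold; otherwise it becomes the leader of a new cluster.
    Returns one cluster index per input vector.
    """
    leaders = []       # representative vector of each cluster
    assignments = []   # cluster index for each input vector
    for v in vectors:
        best, best_dist = None, None
        for i, leader in enumerate(leaders):
            d = euclidean(v, leader)
            if best_dist is None or d < best_dist:
                best, best_dist = i, d
        if best is not None and best_dist <= threshold:
            assignments.append(best)
        else:
            leaders.append(v)          # the threshold decides: open a new cluster
            assignments.append(len(leaders) - 1)
    return assignments

docs = [(1, 0), (0.9, 0.1), (0, 1), (0.1, 0.9)]
print(threshold_cluster(docs, 0.5))   # [0, 0, 1, 1]
print(threshold_cluster(docs, 2.0))   # [0, 0, 0, 0]
```

A small threshold yields many tight clusters, and a large one merges everything. The result also depends on input order, which is one reason other algorithms avoid a fixed threshold.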
I'm not sure. There are different data mining algorithms that could be used, and it depends on
the algorithm. I prefer Support Vector Machines (SVMs). There you calculate distances between
vectors in a multidimensional space, where each vector represents one document.
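To make "one vector represents one document" concrete, here is a minimal sketch of the vector-space representation. It assumes plain whitespace tokenization and raw term-frequency weights, and it uses cosine distance between the resulting vectors. It does not implement an SVM, which is trained on labeled examples. It only shows how documents become points whose distances an algorithm can compare. The helper names `tf_vector` and `cosine_distance` are hypothetical.

```python
import math
from collections import Counter

def tf_vector(text, vocabulary):
    """Term-frequency vector: one dimension per vocabulary term (assumed whitespace tokenizer)."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

def cosine_distance(a, b):
    """1 - cosine similarity; 0.0 for same direction, 1.0 for no shared terms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0   # treat empty documents as maximally distant
    return 1.0 - dot / (norm_a * norm_b)

docs = ["lucene index search", "lucene search query", "svm vector kernel"]
# The vocabulary defines the dimensions of the space (7 terms here).
vocabulary = sorted({t for d in docs for t in d.lower().split()})
vectors = [tf_vector(d, vocabulary) for d in docs]

print(round(cosine_distance(vectors[0], vectors[1]), 3))  # 0.333 (two shared terms)
print(cosine_distance(vectors[0], vectors[2]))            # 1.0 (no shared terms)
```

In practice, weights such as TF-IDF usually replace raw counts, so that very common terms contribute less to the distance.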

Stefan


