lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "ClusteringComponent" by StanislawOsinski
Date Fri, 25 Nov 2011 08:42:29 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "ClusteringComponent" page has been changed by StanislawOsinski:
http://wiki.apache.org/solr/ClusteringComponent?action=diff&rev1=56&rev2=57

  
  Carrot^2^ assumes that each search result provided as input can consist of three types of
fields: [[#carrot.title|document title]], [[#carrot.snippet|document content/snippet]] and
[[#carrot.url|URL]]. The document title is required; content/snippet and URL are optional. The
reason to distinguish between the document's title and content is that Carrot^2^ can give
more weight to the titles, which increases the quality of clusters and labels. Carrot^2^ needs
roughly 20 or more search results to generate meaningful clusters. For more information, please
see [[http://download.carrot2.org/stable/manual/#section.advanced-topics.fine-tuning.input-documents-characteristics|the
desired qualities of the documents for clustering in the Carrot2 manual]].
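
As an illustration of the field mapping described above, here is a minimal sketch that only builds a request URL and does not contact a server. The helper name `build_clustering_query`, the handler path `/solr/clustering`, the Solr field names `name` and `features`, and the default of 100 rows are assumptions made for this example; only the `carrot.title`, `carrot.snippet` and `carrot.url` parameter names come from the page itself.

```python
from urllib.parse import urlencode


def build_clustering_query(base_url, query, title_field,
                           snippet_field=None, url_field=None, rows=100):
    """Build a Solr request URL that maps Solr fields to Carrot2 inputs.

    Hypothetical helper for illustration only; it is not part of Solr or Carrot2.
    """
    # The title field is the only required mapping.
    if not title_field:
        raise ValueError("a title field is required for clustering")

    params = [
        ("q", query),
        # Ask for enough results for clustering to be meaningful (assumed default).
        ("rows", str(rows)),
        ("clustering", "true"),
        ("clustering.results", "true"),
        ("carrot.title", title_field),
    ]
    # Snippet and URL mappings are optional, so add them only when given.
    if snippet_field:
        params.append(("carrot.snippet", snippet_field))
    if url_field:
        params.append(("carrot.url", url_field))

    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Assumed handler path and field names; adjust them to the actual setup.
    print(build_clustering_query(
        "http://localhost:8983/solr/clustering",
        "memory",
        title_field="name",
        snippet_field="features",
    ))
```

The fields named in these parameters need to be stored fields, since clustering works on the raw stored text.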
  
- '''Note''': Carrot^2^ can only perform clustering on stored fields. The reason for this
is that Carrot^2^ aims to create meaningful cluster labels by using phrases (sequences of
words) taken directly from the documents' text. The easiest way of providing input for such
a process is feeding Carrot^2^ with raw (stored) document content.
+ '''Note''': Carrot^2^ can only perform clustering on stored fields. The reason for this
is that Carrot^2^ aims to create meaningful cluster labels by using phrases (sequences of
words) taken directly from the documents' text. The easiest way of providing input for such
a process is feeding Carrot^2^ with raw (stored) document content. As a result, character
and token filters are currently ignored. There are plans to implement support for character
and selected token filters during clustering: https://issues.apache.org/jira/browse/SOLR-2917.
  
  
  == Parameters ==
