lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Trivial Update of "ClusteringComponent" by GunnlaugurBriem
Date Tue, 18 Oct 2011 16:29:45 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "ClusteringComponent" page has been changed by GunnlaugurBriem:
http://wiki.apache.org/solr/ClusteringComponent?action=diff&rev1=54&rev2=55

Comment:
fix a couple of typos

  
  Carrot^2^ is best suited for clustering small-to-medium collections of short documents.
While it may work for longer documents, processing times may be too long to meet on-line clustering
requirements.
  
- Carrot^2^ assumes that each search result provided on input can consist of three types of
fields: [[#carrot.title|document title]], [[#carrot.snippet|document content/snippet]] and
[[#carrot.url|URL]]. Document tile is required, content/snippet and URL are optional. The
reason to distinguish between the document's title and content is that Carrot^2^ can give
more weight to the titles, which increases the quality of clusters and labels. Carrot^2^ needs
at least about 20 search results to generate meaningful clusters. For more information, please
see [[http://download.carrot2.org/stable/manual/#section.advanced-topics.fine-tuning.input-documents-characteristics|the
desired qualities of the documents for clustering in Carrot2 manual]].
+ Carrot^2^ assumes that each search result provided on input can consist of three types of
fields: [[#carrot.title|document title]], [[#carrot.snippet|document content/snippet]] and
[[#carrot.url|URL]]. Document title is required, content/snippet and URL are optional. The
reason to distinguish between the document's title and content is that Carrot^2^ can give
more weight to the titles, which increases the quality of clusters and labels. Carrot^2^ needs
at least about 20 search results to generate meaningful clusters. For more information, please
see [[http://download.carrot2.org/stable/manual/#section.advanced-topics.fine-tuning.input-documents-characteristics|the
desired qualities of the documents for clustering in Carrot2 manual]].
  
  '''Note''': Carrot^2^ can only perform clustering on stored fields. The reason for this
is that Carrot^2^ aims to create meaningful cluster labels by using phrases (sequences of
words) taken directly from the documents' text. The easiest way of providing input for such
a process is feeding Carrot^2^ with raw (stored) document content.
  
@@ -249, +249 @@

  
  The frag size to use for highlighting. Meaningful only when [[#carrot.produceSummary|carrot.produceSummary]]
is `true`. If not specified, the default highlighting fragsize (`hl.fragsize`) will be used.
If that isn't specified, then 100.
  
- <!> In Solr versions 3.1.x, 3.2.x and 3.3.0 this parameter is [[https://issues.apache.org/jira/browse/SOLR-2692|incorrectly
named]] {{{carrot.fragzise}}}. Solr versions 3.4.x and further use the correct parameter name
{{{carrot.fragSize}}}.
+ <!> In Solr versions 3.1.x, 3.2.x and 3.3.0 this parameter is [[https://issues.apache.org/jira/browse/SOLR-2692|incorrectly
named]] {{{carrot.fragsize}}}. Solr versions 3.4.x and further use the correct parameter name
{{{carrot.fragSize}}}.
  
  === carrot.numDescriptions ===
  

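The parameters discussed in the diff above (`carrot.title`, `carrot.snippet`, `carrot.url`, `carrot.produceSummary`, `carrot.fragSize`) can be combined into a single clustering request. The following is a minimal sketch under several assumptions. It assumes Solr 3.4 or later, where the parameter is spelled `carrot.fragSize`. It assumes a request handler that already has the clustering component enabled. The field names `title`, `content` and `url` are hypothetical stored fields, since Carrot^2^ can only cluster stored fields. The helper `build_clustering_query` is likewise hypothetical. The sketch encodes three rules from the page: the title field is required, while snippet and URL are optional; about 20 or more results are needed, so `rows` defaults to 20; and `carrot.fragSize` is only sent when `carrot.produceSummary` is true.

```python
from urllib.parse import urlencode


def build_clustering_query(q, title_field, snippet_field=None, url_field=None,
                           rows=20, produce_summary=False, frag_size=None):
    """Hypothetical helper: build a query string for a Carrot2 clustering request.

    Assumes Solr 3.4+ parameter names (carrot.fragSize, not the 3.1-3.3 spelling).
    """
    if not title_field:
        # The document title field is required; snippet and URL are optional.
        raise ValueError("title_field is required")
    # Carrot2 needs roughly 20+ results for meaningful clusters, hence the default.
    params = [("q", q), ("rows", str(rows)), ("carrot.title", title_field)]
    if snippet_field:
        params.append(("carrot.snippet", snippet_field))
    if url_field:
        params.append(("carrot.url", url_field))
    if produce_summary:
        params.append(("carrot.produceSummary", "true"))
        # carrot.fragSize is meaningful only when summaries are produced;
        # if omitted, Solr falls back to hl.fragsize, then to 100.
        if frag_size is not None:
            params.append(("carrot.fragSize", str(frag_size)))
    return urlencode(params)


print(build_clustering_query("solr", "title", snippet_field="content",
                             rows=100, produce_summary=True, frag_size=150))
```

With these arguments the call yields `q=solr&rows=100&carrot.title=title&carrot.snippet=content&carrot.produceSummary=true&carrot.fragSize=150`. A `frag_size` passed without `produce_summary=True` is simply dropped, mirroring the note that the fragment size has no effect otherwise.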