lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "ClusteringComponent" by YonikSeeley
Date Wed, 21 Oct 2009 19:37:44 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "ClusteringComponent" page has been changed by YonikSeeley.
The comment on this change is: move up example into quickstart.
http://wiki.apache.org/solr/ClusteringComponent?action=diff&rev1=31&rev2=32

--------------------------------------------------

  This component can cluster both search results and documents.  In case you're wondering
what clustering is good for, think of it as a quick way to summarize a whole bunch of results/documents,
or as a way to group together like results/documents.
  
  See http://en.wikipedia.org/wiki/Data_clustering for more background, as well as links to
further reading.
- 
  
  = Clustering Component =
  
@@ -21, +20 @@

  
  == Installation ==
  
- The !ClusteringComponent is in the contrib area of Solr.  Due to some dependencies on LGPL
libraries for the Carrot2 implementation, we cannot package a complete binary solution (with
all the dependencies).  To get the Carrot2 solution, you will need to download these libraries.
 To do this, on the command line in the contrib/clustering directory, run {{{ant get-libraries}}}.
 This will create a downloads directory under the lib directory.  From there, you just need
to grab the Solr clustering JAR and all the libraries and it should work.  To see an example
of it working, try running {{{ant example}}} and then switching over to $SOLR_HOME/example/clustering
and follow the directions below.
+ The !ClusteringComponent is in the contrib area of Solr.  Due to some dependencies on LGPL
libraries for the Carrot2 implementation, we cannot package a complete binary solution (with
all the dependencies).  To get the Carrot2 solution, you will need to download these libraries.
 To do this, on the command line in the contrib/clustering directory, run {{{ant get-libraries}}}.
 This will create a downloads directory under the lib directory for the downloaded jars.
+ 
+ == Quick Start ==
+ 
+ To run the example, cd to the Solr install directory, then:
+ {{{
+ $ ant example #builds the local example for clustering, including downloading jars
+ $ cd example
+ $ java -Dsolr.solr.home=../contrib/clustering/example -jar start.jar
+ }}}
+ Then, in a different window, add some docs using the post tool in the exampledocs directory.
+ {{{
+ $ cd example/exampledocs
+ $ ./post.sh *.xml
+ }}}
+ Now try a query that turns on clustering (clustering=true):
+ {{{
+ http://localhost:8983/solr/select?indent=on&q=*:*&rows=10&clustering=true
+ }}}
+ This should yield results that include cluster information at the bottom of the response,
like:
+ {{{
+ <arr name="clusters">
+  <lst>
+   <arr name="labels">
+ 	<str>DDR</str>
+   </arr>
+   <arr name="docs">
+ 	<str>TWINX2048-3200PRO</str>
+ 	<str>VS1GB400C3</str>
+ 	<str>VDBDB1A16</str>
+   </arr>
+  </lst>
+  <lst>
+   <arr name="labels">
+ 	<str>Car Power Adapter</str>
+   </arr>
+   <arr name="docs">
+ 	<str>F8V7067-APL-KIT</str>
+ 	<str>IW-02</str>
+   </arr>
+  </lst>
+  <lst>
+   <arr name="labels">
+ 	<str>Hard Drive</str>
+   </arr>
+   <arr name="docs">
+ 	<str>SP2514N</str>
+ 	<str>6H500F0</str>
+   </arr>
+  </lst>
+  <lst>
+ [...]
+ }}}
+ 
+ Clusters produced by Carrot2 group the results into different product categories: DDR (memory),
Car Power Adapter, Display, Hard Drive. Notice that, depending on the quality of input documents,
some clusters may not make much sense.
+ 
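For readers who want to consume the cluster output programmatically, here is a minimal sketch of parsing it. It is not part of the wiki page. It assumes the `clusters` array sits somewhere under a `<response>` root element, and it uses a trimmed copy of the sample output above. The names `parse_clusters` and `SAMPLE` are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample output shown above, wrapped in an assumed
# <response> root element so that it is a well-formed XML document.
SAMPLE = """<response>
<arr name="clusters">
 <lst>
  <arr name="labels">
   <str>DDR</str>
  </arr>
  <arr name="docs">
   <str>TWINX2048-3200PRO</str>
   <str>VS1GB400C3</str>
   <str>VDBDB1A16</str>
  </arr>
 </lst>
 <lst>
  <arr name="labels">
   <str>Hard Drive</str>
  </arr>
  <arr name="docs">
   <str>SP2514N</str>
   <str>6H500F0</str>
  </arr>
 </lst>
</arr>
</response>"""


def parse_clusters(xml_text):
    """Return a list of (labels, doc_ids) tuples, one per cluster."""
    root = ET.fromstring(xml_text)
    clusters = []
    # Each <lst> directly under <arr name="clusters"> describes one cluster.
    for lst in root.findall(".//arr[@name='clusters']/lst"):
        labels = [s.text for s in lst.findall("arr[@name='labels']/str")]
        docs = [s.text for s in lst.findall("arr[@name='docs']/str")]
        clusters.append((labels, docs))
    return clusters


if __name__ == "__main__":
    for labels, docs in parse_clusters(SAMPLE):
        print(", ".join(labels), "->", ", ".join(docs))
```

In practice the XML text would come from the query URL shown above rather than from a string literal. The parsing step stays the same as long as the response keeps this shape.
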
  
  == Configuration ==
  
@@ -39, +93 @@

  
  == Carrot2 Clustering ==
  
- Carrot2 is a scalable, BSD licensed search results clustering engine.  It can cluster many
different types of search results, including Y!, Google, etc.  Our implementation, naturally,
clusters Solr/Lucene results.
+ Carrot2 is a scalable, BSD licensed search results clustering engine.  It can cluster many
different types of search results, including Y!, Google, etc.  Our implementation, naturally,
clusters Solr results.
  
  Carrot2 is best suited for clustering small-to-medium collections of short documents. While
Carrot2 may work for longer documents, processing times may be too long to meet on-line clustering
requirements.
  
  See http://project.carrot2.org
- 
- == Example ==
- 
- The contrib/clustering sub directory contains a simple example that works off of the existing
sample documents, but does clustering on them.
- 
- To run the example, cd to the Solr install directory, then:
- {{{
- $ ant example //builds the local example for clustering
- $ cd example
- $ java -Dsolr.solr.home=../contrib/clustering/example -jar start.jar
- }}}
- Then, add some docs using the post tool in the exampledocs directory.
- 
  
  The configuration (solrconfig.xml) looks like:
  {{{
@@ -121, +162 @@

  
  The thing to note here is the mapping of Solr Fields (name, id, etc.) to the Carrot2 needs
of title, snippet and url. Clustering will take into account the text of title and snippet.
  
- Next, inputting a query that turns on clustering (clustering=true:
- {{{
- http://localhost:8983/solr/select?indent=on&q=*:*&rows=10&clustering=true
- }}}
- 
- yields the results like:
- {{{
- <arr name="clusters">
-  <lst>
-   <arr name="labels">
- 	<str>DDR</str>
-   </arr>
-   <arr name="docs">
- 	<str>TWINX2048-3200PRO</str>
- 	<str>VS1GB400C3</str>
- 	<str>VDBDB1A16</str>
-   </arr>
-  </lst>
-  <lst>
-   <arr name="labels">
- 	<str>Car Power Adapter</str>
-   </arr>
-   <arr name="docs">
- 	<str>F8V7067-APL-KIT</str>
- 	<str>IW-02</str>
-   </arr>
-  </lst>
-  <lst>
-   <arr name="labels">
- 	<str>Hard Drive</str>
-   </arr>
-   <arr name="docs">
- 	<str>SP2514N</str>
- 	<str>6H500F0</str>
-   </arr>
-  </lst>
-  <lst>
- [...]
- }}}
- 
- Clusters produced by Carrot2 group the results into different product categories: DDR (memory),
Car Power Adapter, Display, Hard Drive. Notice that, depending on the quality of input documents,
some clusters may not make much sense.
  
  == Tuning Carrot2 clustering ==
  
