lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "FieldCollapsing" by YonikSeeley
Date Fri, 17 Sep 2010 01:47:47 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "FieldCollapsing" page has been changed by YonikSeeley.
The comment on this change is: doc search result grouping / field collapsing.
http://wiki.apache.org/solr/FieldCollapsing?action=diff&rev1=17&rev2=18

--------------------------------------------------

  <!> [[Solr4.0]]
  
+ = Result Grouping / Field Collapsing =
  <<TableOfContents>>
  
- /!\ This page refers to functionality from [[https://issues.apache.org/jira/browse/SOLR-236|SOLR-236]].
It is not yet available in trunk.
+ = Introduction =
+ Field Collapsing and Result Grouping are different ways to think about the same Solr feature.
  
- = Introduction =
+ Field Collapsing collapses a group of results with the same field value down to a single
entry (or a fixed number of entries).  For example, most search engines such as Google collapse
on site so only one or two entries are shown, along with a link to click to see more results
from that site.  Field collapsing can also be used to suppress duplicate documents.
+ 
+ Result Grouping groups documents that share a common field value, returning the top
documents per group, and the top groups based on what documents are in the groups.  One example
is a search at BestBuy for a common term such as DVD, which shows the top 3 results for each
category ("TVs & Video", "Movies", "Computers", etc.).
+ 
+ = Quick Start =
+ If you haven't already, get a recent nightly build of [[Solr4.0]], start the example server,
and index the example data as shown in the [[http://lucene.apache.org/solr/tutorial.html|Solr
tutorial]].
+ 
+ Now send a query request to Solr and turn on result grouping.  We'll first try grouping
on the manufacturer name (the manu_exact field). <!> You can currently only group on
single-valued fields!
+ 
+ [[http://localhost:8983/solr/select?wt=json&indent=true&q=solr%20memory&fl=id,name&group=true&group.field=manu_exact]]
+ 
+ And the grouped response is returned:
  
  {{{
+ [...]
+   "grouped":{
+     "manu_exact":{
+       "matches":9,
+       "groups":[{
+           "groupValue":"Apache Software Foundation",
+           "doclist":{"numFound":1,"start":0,"docs":[
+               {
+                 "id":"SOLR1000",
+                 "name":"Solr, the Enterprise Search Server"}]
+           }},
+         {
+           "groupValue":"Corsair Microsystems Inc.",
+           "doclist":{"numFound":4,"start":0,"docs":[
+               {
+                 "id":"VS1GB400C3",
+                 "name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC
3200) System Memory - Retail"}]
+           }},
+         {
+           "groupValue":"A-DATA Technology Inc.",
+           "doclist":{"numFound":2,"start":0,"docs":[
+               {
+                 "id":"VDBDB1A16",
+                 "name":"A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200)
System Memory - OEM"}]
+           }},
+ [...]
- "Used in order to collapse a group of results with similar value for a given field to a
single entry in the result set. Site collapsing is a special case of this, where all results
for a given web site is collapsed into one or two entries in the result set, typically with
an associated "more documents from this site" link. See also Duplicate detection."
- }}}
- From [[http://www.fastsearch.com/glossary.aspx?m=48&amid=299|fast search]] (TODO: this
link is broken, fix it)
- 
- This topic was discussed a while ago: http://www.nabble.com/result-grouping--tf2910425.html#a8131895
- 
- = Setup =
- The easiest way to configure field collapsing is by overriding the query component. This
can be achieved by adding the following xml in your solrconfig.xml:
- 
- {{{
- <searchComponent name="query" class="org.apache.solr.handler.component.CollapseComponent"
/>
- }}}
- That is all, now you can have field collapse enabled searches. The CollapseComponents extends
from the QueryComponent, so a normal search is still possible.
- 
- If you wish to use both the QueryComponent and the CollapseComponent along side each other
then you need to configure a little bit more in your solrconfig.xml. First, register the collapse
searchComponent like this:
- 
- {{{
-   <searchComponent name="collapse" class="org.apache.solr.handler.component.CollapseComponent"
/>
- }}}
- Then reference that search component in a custom search handler. For example, you could
modify the standard request handler to look like this:
- 
- {{{
-   <requestHandler name="standard" class="solr.SearchHandler" default="true">
-     <!-- default values for query parameters -->
-      <lst name="defaults">
-        <str name="echoParams">explicit</str>
- 
-      </lst>
-      <arr name="components">
-         <str>collapse</str>
-         <str>facet</str>
-         <str>highlight</str>
-         <str>debug</str>
-      </arr>
-   </requestHandler>
- }}}
- Note that we have not included "query" in the list of component; the collapse handler implements
query functionality itself.
- 
- In the latest patch it is possible to configure caching for the field collapsing execution.
There are memory issues with this cache.
- Its therefore recommend to keep this cache small (e.g. with size 20) or to disable this
cache. How big the cache should be depends on your environment.
- 
- This is an extra cache in addition
- to the already existing caches. It caches the result of the collapse logic and configured
collapse collectors.
- The following xml configuration can be placed inside the solrconfig.xml as child of the
config element.
- {{{
-   <fieldCollapsing>
- 
-   	<fieldCollapseCache
-       class="solr.FastLRUCache"
-       size="512"
-       initialSize="512"
-       autowarmCount="128"/>
- 
-   </fieldCollapsing>
  }}}
  
- If the field collapse cache is not configured then the field collapse logic will not be
cached. 
+ The response indicates that there are 9 total matches to our query.
+ For each unique value of group.field (manufacturer names in this example) a doclist with
the top scoring document is returned.  The doclist also gives the total number of matches
in that group as "numFound".  The groups themselves are sorted by the score of the top
document within each group.
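
As a rough illustration of how a client might read the grouped response described above, here is a minimal Python sketch. It assumes that "grouped" is a top-level key of the JSON response (the surrounding parts are elided in the output shown) and uses a trimmed copy of the sample data with only the "id" field kept per document. The function name summarize_groups is hypothetical and not part of Solr.

```python
import json

# Trimmed copy of the grouped response shown above; only "id" is kept per
# document. Assumption: "grouped" sits at the top level of the JSON response.
SAMPLE_RESPONSE = """
{
  "grouped": {
    "manu_exact": {
      "matches": 9,
      "groups": [
        {"groupValue": "Apache Software Foundation",
         "doclist": {"numFound": 1, "start": 0,
                     "docs": [{"id": "SOLR1000"}]}},
        {"groupValue": "Corsair Microsystems Inc.",
         "doclist": {"numFound": 4, "start": 0,
                     "docs": [{"id": "VS1GB400C3"}]}},
        {"groupValue": "A-DATA Technology Inc.",
         "doclist": {"numFound": 2, "start": 0,
                     "docs": [{"id": "VDBDB1A16"}]}}
      ]
    }
  }
}
"""


def summarize_groups(response, field):
    """Return (matches, [(group value, numFound, [doc ids]), ...]) for one field."""
    grouped = response["grouped"][field]
    summary = []
    for group in grouped["groups"]:
        doclist = group["doclist"]
        ids = [doc["id"] for doc in doclist["docs"]]
        summary.append((group["groupValue"], doclist["numFound"], ids))
    return grouped["matches"], summary


if __name__ == "__main__":
    matches, groups = summarize_groups(json.loads(SAMPLE_RESPONSE), "manu_exact")
    print("total matches:", matches)
    for value, num_found, ids in groups:
        print(f"{value}: {num_found} in group, showing {ids}")
```

Note that "numFound" counts every match in the group, while "docs" holds only the documents actually returned for it.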
  
  <<Anchor(parameters)>>
  = Request Parameters =
- ||'''param''' ||'''description''' ||
+ ||'''param name''' ||'''param value'''||'''description''' ||
+ ||group||true/false||if true, turn on result grouping||
+ ||group.field||[fieldname]||Group based on the unique values of a field.  The field must
currently be single-valued and must be either indexed, or be another field type that has a
value source and works in a function query - such as [[http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html|ExternalFileField]]||
+ ||group.func||[function query]||Group based on the unique values of a function query.||
+ ||rows||[number]||The number of groups to return. Defaults to 10.||
+ ||group.limit||[number]||The number of results (documents) to return for each group.  Defaults
to 1.||
+ ||sort||[sortspec]||How to sort the groups relative to each other.  For example, {{{sort=popularity
desc}}} will cause the groups to be sorted by the popularity of the highest ranking (first)
document in each group.  Defaults to "score desc".||
+ ||group.sort||[sortspec]||How to sort documents within a single group.  Defaults to the
same value as the {{{sort}}} parameter.||
- ||collapse.type ||normal/adjacent -- does this collapse all documents or just the ones that
are next to each other.  Defaults to normal ||
- ||collapse.field ||Which field to collapse. If this field is not specified then field collapsing
is not enabled and falls back to to the QueryComponent to do a search. ||
- ||collapse.facet ||before/after -- apply faceting before or after collapsing.  Defaults
to after ||
- ||collapse.max ||Deprecated use collapse.threshold instead. This parameter is removed in
the latest patch. ||
- ||collapse.threshold ||The number of documents with the same value for collapse.field after
which collapsing kicks in. The default value is one. ||
- ||collapse.maxdocs ||Maximum number of documents to process during field collapsin. This
parameter defaults to one greater then the largest document number. ||
- ||collapse.info.doc ||Return collapse count for each document? Defaults to true ||
- ||collapse.info.count ||Return collapse count for each field value? Defaults to true ||
- ||collapse.includeCollapsedDocs.fl ||Parameter indicating to return the collapsed documents
in the response and what fields to return in comma separated manner. A value * indicates that
all fields will be returned ||
- ||collapse.debug ||wheter to include collapse debug information ||
- ||collapse.aggregate ||Execute aggregate functions on the collapsed documents. The parameter
expect the functions in the following format: function_name(field_name) [, function_name(field_name].
So for example: sum(stock), avg(weight). Currently there are four functions available: min(...),
max(...), sum(...), avg(...). The functionality is available from the patch added at 2009-10-25
10:13 PM. ||
  
+ Notes:
+  * Distributed search support for result grouping has not yet been implemented.
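
The request parameters listed above can be combined into a single query URL. The following Python sketch shows one way to do that with the standard library; the helper name build_grouping_url and the chosen values (for example three documents per group) are illustrative assumptions, while the parameter names, their defaults, and the base URL come from this page.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_grouping_url(base_url, query, group_field, group_limit=1,
                       rows=10, sort="score desc", group_sort=None, fl=None):
    """Build a select URL with result grouping turned on (hypothetical helper)."""
    params = {
        "q": query,
        "wt": "json",
        "group": "true",              # turn on result grouping
        "group.field": group_field,   # must be a single-valued field
        "group.limit": group_limit,   # documents returned per group
        "rows": rows,                 # number of groups returned
        "sort": sort,                 # how groups are ordered relative to each other
    }
    if group_sort is not None:
        params["group.sort"] = group_sort  # ordering of documents inside a group
    if fl:
        params["fl"] = fl
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    url = build_grouping_url(
        "http://localhost:8983/solr/select",
        query="solr memory",
        group_field="manu_exact",
        group_limit=3,
        fl="id,name",
    )
    print(url)
    # Decode the query string again to check what will be sent.
    print(parse_qs(urlsplit(url).query))
```

Leaving group_sort unset omits group.sort from the request, so it falls back to the value of sort as described in the table.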
- <<Anchor(examples)>>
- = Examples =
- Using the example data:
  
- Collapse all documents using 'manu_exact' and 'normal' collapse type:
- http://localhost:8983/solr/select/?q=*:*&collapse.field=manu_exact&collapse.threshold=1&collapse.type=normal
- {{{
- <lst name="collapse_counts">
-     <str name="field">manu_exact</str>
-     <lst name="results">
-         <lst name="F8V7067-APL-KIT">
-             <int name="collapseCount">1</int>
-             <str name="fieldValue">Belkin</str>
-         </lst>
-         <lst name="TWINX2048-3200PRO">
-             <int name="collapseCount">3</int>
-             <str name="fieldValue">Corsair Microsystems Inc.</str>
-         </lst>
-         <lst name="VDBDB1A16">
-             <int name="collapseCount">1</int>
-             <str name="fieldValue">A-DATA Technology Inc.</str>
-         </lst>
-         <lst name="0579B002">
-             <int name="collapseCount">1</int>
-             <str name="fieldValue">Canon Inc.</str>
-         </lst>
-         <lst name="SOLR1000">
-             <int name="collapseCount">1</int>
-             <str name="fieldValue">Apache Software Foundation</str>
-         </lst>
-     </lst>
- </lst>
- }}}
- 
- Collapse all documents using 'manu_exact' and 'adjacent' collapse type:
- http://localhost:8983/solr/select/?q=*:*&collapse.field=manu_exact&collapse.threshold=1&collapse.type=adjacent
- 
- {{{
- <lst name="collapse_counts">
-     <str name="field">manu_exact</str>
-     <lst name="results">
-         <lst name="F8V7067-APL-KIT">
-             <int name="collapseCount">1</int>
-             <str name="fieldValue">Belkin</str>
-         </lst>
-         <lst name="TWINX2048-3200PRO">
-             <int name="collapseCount">1</int>
-             <str name="fieldValue">Corsair Microsystems Inc.</str>
-         </lst>
-         <lst name="TWINX2048-3200PRO-payload">
-             <int name="collapseCount">1</int>
-             <str name="fieldValue">Corsair Microsystems Inc.</str>
-         </lst>
-     </lst>
- </lst>
- }}}
- 
- The response is centred around collapse groups. A collapse group represents documents that
were collapsed during the search. A collapse group is identifier by the most relevant document
of that collapse group, which is document that did not get collapsed and remained present
in the search result. So the ids like 233238 are from documents that are also present in the
search result.
- 
- = Distributed field collapsing =
- In a distributed environment fieldcollapsing is supported in a limited manner. While indexing
you must make sure that the documents of a collapse group are not scattered across different
shards. Documents of a collapse group must reside on the same shard, failing to do so will
corrupt your search results. Doing a distributed search with collapsing requires not extra
parameters to be send with the request. For example the following request is sufficient: http://localhost:8080/solr/select/?q=solr&collapse.field=my_field&shards=localhost:55527/solr,localhost:55529/solr
- 
- = Other resources =
- Some other resources regarding to field collapsing:
- 
-  * [[http://blog.jteam.nl/2009/10/20/result-grouping-field-collapsing-with-solr/|Result
grouping / field collapsing with Solr]]
- 
- If anyone has links about this topic feel free to add it.
- 
