lucene-commits mailing list archives

From "David Smiley (Confluence)" <conflue...@apache.org>
Subject [CONF] Apache Solr Reference Guide > Result Grouping
Date Sat, 03 Aug 2013 21:39:00 GMT
Space: Apache Solr Reference Guide (https://cwiki.apache.org/confluence/display/solr)
Page: Result Grouping (https://cwiki.apache.org/confluence/display/solr/Result+Grouping)

Change Comment:
---------------------------------------------------------------------
group.truncate in fact works in distributed grouping -- SOLR-2776

Edited by David Smiley:
---------------------------------------------------------------------
Result Grouping groups documents with a common field value into groups and returns the top
documents for each group. For example, if you searched for "DVD" on an electronics retailer's
e-commerce site, you might get back three categories such as "TV and Video," "Movies,"
and "Computers," with three results per category. In this case, the query term "DVD" appeared
in all three categories, so Solr groups the results by category in order to increase relevancy
for the user.

Result Grouping is separate from [Faceting]. Though it is conceptually similar, faceting returns
all relevant results and allows the user to refine the results based on the facet category.
For example, if you searched for "shoes" on a footwear retailer's e-commerce site, you would
get all results for that query term, along with selectable facets such as "size,"
"color," "brand," and so on.

However, with Solr 4 you can also group facets. Grouped faceting works with the first
{{group.field}} parameter; any other {{group.field}} parameters are ignored. Grouped faceting
only supports {{facet.field}} for string-based fields that are not tokenized and are not multivalued.

Grouped faceting currently doesn't support date and pivot faceting, but it does support range
faceting.

Grouped faceting differs from non-grouped faceting, in which (sum of all facet counts) ==
(total number of products with that property), as shown in the following example:

Object 1
- name: Phaser 4620a
- ppm: 62
- product_range: 6

Object 2
- name: Phaser 4620i
- ppm: 65
- product_range: 6

Object 3
- name: ML6512
- ppm: 62
- product_range: 7

If you ask Solr to group these documents by "product_range", then the total number of groups
is 2, but the facet counts for ppm are 2 for 62 and 1 for 65.
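
The counting in this example can be reproduced with a short plain-Python sketch. It is only an illustration of the arithmetic, not Solr's implementation: it assumes that a grouped facet count is the number of distinct groups containing at least one document with that facet value, and the helper names {{count_groups}} and {{grouped_facet_counts}} are made up for this sketch.

```python
# The three sample documents from the example above.
docs = [
    {"name": "Phaser 4620a", "ppm": 62, "product_range": 6},
    {"name": "Phaser 4620i", "ppm": 65, "product_range": 6},
    {"name": "ML6512", "ppm": 62, "product_range": 7},
]


def count_groups(docs, group_field):
    """Number of distinct values of group_field, i.e. the number of groups."""
    return len({d[group_field] for d in docs})


def grouped_facet_counts(docs, group_field, facet_field):
    """For each facet value, count the distinct groups that contain it.

    Assumption for this sketch: a grouped facet count is the number of
    groups with at least one document carrying that facet value.
    """
    groups_per_value = {}
    for d in docs:
        groups_per_value.setdefault(d[facet_field], set()).add(d[group_field])
    return {value: len(groups) for value, groups in groups_per_value.items()}


n_groups = count_groups(docs, "product_range")
facets = grouped_facet_counts(docs, "product_range", "ppm")

print(n_groups)              # 2
print(facets)                # {62: 2, 65: 1}
print(sum(facets.values()))  # 3, which is more than the number of groups
```

The facet counts sum to 3 while there are only 2 groups, because the group for product_range 6 contains both ppm values.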

h2. Request Parameters

Result Grouping takes the following request parameters. Any number of these request parameters
can be included in a single request:

|| Parameter || Type || Description ||
| group | Boolean | If true, query results will be grouped. |
| group.field | string | The name of the field by which to group results. The field must be single-valued,
and either be indexed or a field type that has a value source and works in a function query,
such as {{ExternalFileField}}. It must also be a string-based field, such as {{StrField}}
or {{TextField}}. |
| group.func | query | Group based on the unique values of a function query. Supported only
in Solr 4.0. |
| group.query | query | Return a single group of documents that match the given query. |
| rows | integer | The number of groups to return. The default value is 10. |
| start | integer | Specifies an initial offset for the list of groups. |
| group.limit | integer | Specifies the number of results to return for each group. The default
value is 1. |
| group.offset | integer | Specifies an initial offset for the document list of each group. |
| sort | sortspec | Specifies how Solr sorts the groups relative to each other. For example,
{{sort=popularity desc}} will cause the groups to be sorted according to the highest popularity
document in each group. The default value is {{score desc}}. |
| group.sort | sortspec | Specifies how Solr sorts documents within a single group. The default
value is {{score desc}}. |
| group.format | grouped/simple | If this parameter is set to {{simple}}, the grouped documents
are presented in a single flat list, and the {{start}} and {{rows}} parameters affect the
number of documents instead of groups. |
| group.main | Boolean | If true, the result of the first field grouping command is used as
the main result list in the response, using {{group.format=simple}}. |
| group.ngroups | Boolean | If true, Solr includes the number of groups that have matched
the query in the results. The default value is false. |
| group.truncate | Boolean | If true, facet counts are based on the most relevant document
of each group matching the query. The default value is false. |
| group.facet | Boolean | Determines whether to compute grouped facets for the field facets
specified in {{facet.field}} parameters. Grouped facets are computed based on the first specified
group. As with normal field faceting, fields shouldn't be tokenized (otherwise counts are
computed for each token). Grouped faceting supports single and multivalued fields. Default
is false. New with Solr 4. |
| group.cache.percent | integer between 0 and 100 | Setting this parameter to a number greater
than 0 enables caching for result grouping. Result Grouping executes two searches; this option
caches the second search. The default value is 0. Testing has shown that group caching only
improves search time with Boolean, wildcard, and fuzzy queries. For simple queries like term
or "match all" queries, group caching degrades performance. |

Any number of group commands ({{group.field}}, {{group.func}}, {{group.query}}) may be specified
in a single request.

Grouping is also supported for distributed searches. Currently {{group.func}} is the only
parameter that doesn't support distributed searches.
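
As a rough sketch of how several group commands can be combined in one request, the following Python snippet assembles a query string with the standard library. The helper name {{build_grouping_url}} is an assumption made for this illustration, and the base URL is the local example server used elsewhere on this page. Group commands are passed as a list of pairs so that a parameter such as {{group.query}} can appear more than once.

```python
from urllib.parse import urlencode


def build_grouping_url(base_url, query, group_commands, extra=None):
    """Build a select URL with grouping enabled.

    group_commands is a list of (parameter, value) pairs, so the same
    parameter (for example group.query) can be repeated.
    extra is an optional dict of further parameters such as group.limit.
    """
    params = [("q", query), ("wt", "json"), ("group", "true")]
    params.extend(group_commands)
    params.extend((extra or {}).items())
    # urlencode percent-encodes reserved characters and turns spaces into "+".
    return base_url + "?" + urlencode(params)


url = build_grouping_url(
    "http://localhost:8983/solr/select",
    "memory",
    [
        ("group.field", "manu_exact"),
        ("group.query", "price:[0 TO 99.99]"),
        ("group.query", "price:[100 TO *]"),
    ],
    extra={"group.limit": 3},
)
print(url)
```

The resulting URL carries one {{group.field}} command and two {{group.query}} commands; Solr would return a separate entry in the {{grouped}} section for each.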

h2. Examples

All of the following examples work with the data provided in the Solr Example directory.

h3. Grouping Results by Field

In this example, we will group results based on the {{manu_exact}} field, which specifies
the manufacturer of the items in the sample dataset.

{{[http://localhost:8983/solr/select?wt=json&indent=true&fl=id,name&q=solr+memory&group=true&group.field=manu_exact]}}

{code:borderStyle=solid|borderColor=#666666}
{
...
"grouped":{
  "manu_exact":{
    "matches":6,
    "groups":[{
        "groupValue":"Apache Software Foundation",
        "doclist":{"numFound":1,"start":0,"docs":[
            {
              "id":"SOLR1000",
              "name":"Solr, the Enterprise Search Server"}]
        }},
      {
        "groupValue":"Corsair Microsystems Inc.",
        "doclist":{"numFound":2,"start":0,"docs":[
            {
              "id":"VS1GB400C3",
              "name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail"}]
        }},
      {
        "groupValue":"A-DATA Technology Inc.",
        "doclist":{"numFound":1,"start":0,"docs":[
            {
              "id":"VDBDB1A16",
              "name":"A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM"}]
        }},
      {
        "groupValue":"Canon Inc.",
        "doclist":{"numFound":1,"start":0,"docs":[
            {
              "id":"0579B002",
              "name":"Canon PIXMA MP500 All-In-One Photo Printer"}]
        }},
      {
        "groupValue":"ASUS Computer Inc.",
        "doclist":{"numFound":1,"start":0,"docs":[
            {
              "id":"EN7800GTX/2DHTV/256M",
              "name":"ASUS Extreme N7800GTX/2DHTV (256 MB)"}]
        }
      }
    ]
   }
  }
}
{code}

The response indicates that there are six total matches for our query. For each unique value
of {{group.field}}, Solr returns a {{doclist}} with the top-scoring document. The {{doclist}}
also includes the total number of matches in that group as the {{numFound}} value. The groups
are sorted by the score of the top document within each group.
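
A client might walk this structure as in the following Python sketch. The embedded JSON is an abbreviated copy of the response above (only the first two groups), and the function name {{summarize_groups}} is hypothetical.

```python
import json

# Abbreviated copy of the grouped response shown above (first two groups only).
response_text = """
{
  "grouped": {
    "manu_exact": {
      "matches": 6,
      "groups": [
        {"groupValue": "Apache Software Foundation",
         "doclist": {"numFound": 1, "start": 0, "docs": [
           {"id": "SOLR1000",
            "name": "Solr, the Enterprise Search Server"}]}},
        {"groupValue": "Corsair Microsystems Inc.",
         "doclist": {"numFound": 2, "start": 0, "docs": [
           {"id": "VS1GB400C3",
            "name": "CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail"}]}}
      ]
    }
  }
}
"""


def summarize_groups(response, field):
    """Return the total match count and one row per group.

    Each row is (groupValue, numFound, id of the top document or None).
    """
    grouped = response["grouped"][field]
    rows = []
    for group in grouped["groups"]:
        doclist = group["doclist"]
        top_id = doclist["docs"][0]["id"] if doclist["docs"] else None
        rows.append((group["groupValue"], doclist["numFound"], top_id))
    return grouped["matches"], rows


matches, rows = summarize_groups(json.loads(response_text), "manu_exact")
print(matches)
for group_value, num_found, top_id in rows:
    print(group_value, num_found, top_id)
```

Note that {{numFound}} can be larger than the number of documents listed for a group, because {{group.limit}} defaults to 1.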

We can run the same query with the request parameter {{group.main=true}}. This will format
the results as a single flat document list. This flat format does not include as much information
as the normal result grouping query results, but it may be easier for existing Solr clients
to parse.

{{[http://localhost:8983/solr/select?wt=json&indent=true&fl=id,name,manufacturer&q=solr+memory&group=true&group.field=manu_exact&group.main=true]}}

{code:borderStyle=solid|borderColor=#666666}
{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "fl":"id,name,manufacturer",
      "indent":"true",
      "q":"solr memory",
      "group.field":"manu_exact",
      "group.main":"true",
      "group":"true",
      "wt":"json"}},
  "grouped":{},
  "response":{"numFound":6,"start":0,"docs":[
      {
        "id":"SOLR1000",
        "name":"Solr, the Enterprise Search Server"},
      {
        "id":"VS1GB400C3",
        "name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail"},
      {
        "id":"VDBDB1A16",
        "name":"A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM"},
      {
        "id":"0579B002",
        "name":"Canon PIXMA MP500 All-In-One Photo Printer"},
      {
        "id":"EN7800GTX/2DHTV/256M",
        "name":"ASUS Extreme N7800GTX/2DHTV (256 MB)"}]
  }
}
{code}

h3. Grouping by Query

In this example, we will use the {{group.query}} parameter to find the top three results for
"memory" in two different price ranges: 0.00 to 99.99, and over 100.

{{[http://localhost:8983/solr/select?wt=json&indent=true&fl=name,price&q=memory&group=true&group.query=price:\[0+TO+99.99\]&group.query=price:\[100+TO+*\]&group.limit=3]}}

{code:borderStyle=solid|borderColor=#666666}
{
  "responseHeader":{
    "status":0,
    "QTime":42,
    "params":{
      "fl":"name,price",
      "indent":"true",
      "q":"memory",
      "group.limit":"3",
      "group.query":["price:[0 TO 99.99]",
        "price:[100 TO *]"],
      "group":"true",
      "wt":"json"}},
  "grouped":{
    "price:[0 TO 99.99]":{
      "matches":5,
      "doclist":{"numFound":1,"start":0,"docs":[
          {
            "name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail",
            "price":74.99}]
      }},
    "price:[100 TO *]":{
      "matches":5,
      "doclist":{"numFound":3,"start":0,"docs":[
          {
            "name":"CORSAIR  XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail",
            "price":185.0},
          {
            "name":"Canon PIXMA MP500 All-In-One Photo Printer",
            "price":179.99},
          {
            "name":"ASUS Extreme N7800GTX/2DHTV (256 MB)",
            "price":479.95}]
      }
     }
   }
 }
{code}

In this case, Solr found five matches for "memory," but only returns four results grouped
by price. This is because one result for "memory" did not have a price assigned to it.
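
The arithmetic behind that observation can be checked with a small Python sketch. The numbers are copied from the response above; the helper name {{unassigned_matches}} is made up for this illustration, and the calculation assumes the query groups do not overlap, which holds for these two price ranges.

```python
# Counts copied from the response above: every group.query entry reports the
# overall "matches" for the main query and its own "numFound".
grouped = {
    "price:[0 TO 99.99]": {"matches": 5, "doclist": {"numFound": 1}},
    "price:[100 TO *]": {"matches": 5, "doclist": {"numFound": 3}},
}


def unassigned_matches(grouped):
    """Matches of the main query that fall into none of the query groups.

    Assumes the group queries are disjoint, so no document is counted twice.
    """
    total = next(iter(grouped.values()))["matches"]
    in_groups = sum(g["doclist"]["numFound"] for g in grouped.values())
    return total - in_groups


print(unassigned_matches(grouped))  # 1
```

Five matches minus the four documents that fall into a price range leaves the one document without a price.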

h2. Distributed Result Grouping

Solr also supports result grouping on distributed indexes. If you are using result grouping
on the "/select" request handler, you must provide the {{shards}} parameter described below.
If you are using result grouping on a request handler other than "/select", you must also
provide the {{shards.qt}} parameter:

|| Parameter || Description ||
| shards | Specifies the shards in your distributed indexing configuration. For more information
about distributed indexing, see [Distributed Search with Index Sharding] |
| shards.qt | Specifies the request handler Solr uses for requests to shards. This parameter
is not required for the {{/select}} request handler. |

For example: {{[http://localhost:8983/solr/select?wt=json&indent=true&fl=id,name,manufacturer&q=solr+memory&group=true&group.field=manu_exact&group.main=true&shards=solr-shard1:8983/solr,solr-shard2:8983/solr]}}
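
A request like this one could be assembled programmatically, as in the following Python sketch. The function name {{build_distributed_url}} is hypothetical, and the shard addresses are the ones from the example URL above. The {{shards.qt}} parameter is only added when a handler name is passed in.

```python
from urllib.parse import urlencode


def build_distributed_url(base_url, params, shards, shards_qt=None):
    """Append shards (and optionally shards.qt) to a grouping request.

    shards is a list of host:port/path strings, joined with commas.
    shards_qt is only needed for request handlers other than /select.
    """
    query = dict(params)
    query["shards"] = ",".join(shards)
    if shards_qt is not None:
        query["shards.qt"] = shards_qt
    # Keep ":", "/" and "," unescaped so the shard list stays readable.
    return base_url + "?" + urlencode(query, safe=":/,")


url = build_distributed_url(
    "http://localhost:8983/solr/select",
    {
        "wt": "json",
        "indent": "true",
        "fl": "id,name,manufacturer",
        "q": "solr memory",
        "group": "true",
        "group.field": "manu_exact",
        "group.main": "true",
    },
    ["solr-shard1:8983/solr", "solr-shard2:8983/solr"],
)
print(url)
```

Because the example targets the {{/select}} handler, {{shards_qt}} is left unset here.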

{scrollbar}

