lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "HierarchicalFaceting" by ErikHatcher
Date Sun, 19 Jul 2009 21:25:46 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The following page has been changed by ErikHatcher:
http://wiki.apache.org/solr/HierarchicalFaceting

The comment on the change is:
New page to compare/contrast hierarchical faceting approaches

New page:
= Overview =

There are many cases where documents represent objects associated with hierarchical structures.
For example, if documents represent restaurants, one might want a geographical hierarchy
such as "US/California/Cupertino".  There are various indexing techniques for faceting on
hierarchical structures.  Initially, this wiki page describes the various approaches,
comparing and contrasting them across various real-world use cases.  As approaches become
codified and committed to Solr proper, this page will evolve into a HOW-TO.

There won't be a single best approach to faceting on hierarchical fields, because different field
semantics and usages lend themselves to different indexing strategies.

= Comparing some approaches =

There are currently two similar, non-competing approaches to generating tree/hierarchical
facets from Solr: SOLR-64 and SOLR-792.  These approaches can be tried out easily using a
single set of sample data and the Solr example application (this assumes the current trunk
codebase and the latest patches posted to the respective issues).

{{{
 svn co http://svn.apache.org/repos/asf/lucene/solr/trunk/ hiersolr
 cd hiersolr
 patch -p0 < SOLR-64.patch
 patch -p1 < SOLR-792.patch  # note: -p1, unlike the -p0 used for SOLR-64 above
 ant run-example

 # <new shell>
 ruby hiergen.rb > hierfacets.csv  # hiergen.rb pasted below
 curl "http://localhost:8983/solr/update/csv?commit=true&optimize=true" \
   --data-binary @hierfacets.csv -H 'Content-type:text/plain; charset=utf-8'
}}}

The hiergen.rb script outputs CSV with this format:

{{{
  id,levels_h,level1_s,level2_s
  0,A/1,A,1
  1,A/2,A,2
  ...
  259998,Z/9999,Z,9999
  259999,Z/10000,Z,10000
}}}
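The Ruby sketch below is a hypothetical stand-in for hiergen.rb, not the original script: it emits the same CSV layout as shown above (top-level values A-Z, second-level values 1-10000, ids assigned sequentially from 0). The `HierGen` module and its `rows` method are illustrative names only.

```ruby
# Hypothetical stand-in for hiergen.rb (not the original script).
# Writes CSV rows of the form: id,levels_h,level1_s,level2_s
module HierGen
  HEADER = 'id,levels_h,level1_s,level2_s'.freeze

  # Yields [id, levels_h, level1_s, level2_s] for every top-level value
  # and every second-level number 1..second_max; ids start at 0.
  # Returns an Enumerator when called without a block.
  def self.rows(top_levels = ('A'..'Z').to_a, second_max = 10_000)
    return enum_for(:rows, top_levels, second_max) unless block_given?

    id = 0
    top_levels.each do |top|
      (1..second_max).each do |n|
        # levels_h: full path, used for trying out SOLR-64.
        # level1_s / level2_s: one field per level, used for SOLR-792.
        yield [id, "#{top}/#{n}", top, n.to_s]
        id += 1
      end
    end
  end
end

if __FILE__ == $PROGRAM_NAME
  puts HierGen::HEADER
  HierGen.rows { |row| puts row.join(',') }
end
```

Running it as `ruby hiergen_sketch.rb > hierfacets.csv` should produce 260,000 data rows after the header line, matching the document count given below.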

An initial set of two-level hierarchical facets was generated: values A-Z for the top level
and values 1-10000 for the second level, for a total of 260,000 documents.  The levels_h field
is used for trying out SOLR-64; the level1_s and level2_s fields are used for trying out SOLR-792.

Details of each implementation when generating the entire facet hierarchy across all documents
(request stats shown are from the second or a later duplicate request, ensuring the filter
caches are warmed):

SOLR-64:
{{{
$ time curl "http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=levels_h" | wc
 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                Dload  Upload   Total   Spent    Left  Speed
100 14.5M    0 14.5M    0     0  2745k      0 --:--:--  0:00:05 --:--:-- 3359k
      4  520101 15256346

real	0m5.431s
user	0m0.143s
sys	0m0.073s
}}}
Solr logged this:
{{{
     [java] INFO: [] webapp=/solr path=/select params={facet.field=levels_h&rows=0&q=*:*&facet=on} hits=260000 status=0 QTime=907
}}}
Summary of key SOLR-64 stats:
  * filter cache entries created: 260027
  * solr response time: 907ms
  * time to receive response: 5.4s !!!
  * response size: 14.5M !!!

SOLR-792:
{{{
$ time curl "http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.tree=level1_s,level2_s&facet.field=level1_s" | wc
 % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                Dload  Upload   Total   Spent    Left  Speed
100 63816    0 63816    0     0  1213k      0 --:--:-- --:--:-- --:--:-- 5665k
      4    2677   63816

real	0m0.056s
user	0m0.002s
sys	0m0.006s
}}}
Solr logged this:
{{{
     [java] INFO: [] webapp=/solr path=/select params={facet.field=level1_s&facet.tree=level1_s,level2_s&rows=0&q=*:*&facet=on} hits=260000 status=0 QTime=29
}}}

Summary of key SOLR-792 stats:
  * filter cache entries created: 27
  * solr response time: 29ms
  * time to receive response: 56ms
  * response size: 63K
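To put the two summaries side by side, the small Ruby sketch below computes SOLR-64/SOLR-792 ratios from the figures reported above (bytes from the wc output, wall-clock seconds from time, QTime from the Solr log, filter cache entries from the summaries). It only restates the measured numbers and does not re-run the benchmark; `ratios` is a hypothetical helper.

```ruby
# Ratios between the SOLR-64 and SOLR-792 measurements reported above.
# Figures are copied from the wc and time output and from Solr's QTime logs;
# nothing here re-runs the benchmark.
SOLR64  = { bytes: 15_256_346, wall_s: 5.431, qtime_ms: 907, filter_entries: 260_027 }.freeze
SOLR792 = { bytes: 63_816, wall_s: 0.056, qtime_ms: 29, filter_entries: 27 }.freeze

# Returns { metric => a / b } rounded to one decimal place.
def ratios(a, b)
  a.map { |metric, value| [metric, (value.to_f / b.fetch(metric)).round(1)] }.to_h
end

if __FILE__ == $PROGRAM_NAME
  ratios(SOLR64, SOLR792).each do |metric, ratio|
    puts format('%-15s %8.1fx', metric, ratio)
  end
end
```

With these inputs the SOLR-64 response is about 239 times larger and the end-to-end request about 97 times slower, while Solr's own QTime differs by about 31 times. Most of the 5.4s SOLR-64 wall-clock time therefore falls outside the logged QTime, presumably in writing and transferring the 14.5M response.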

= Basic hierarchical facet use cases =
== Facets across all documents for only top-level of hierarchy ==
   In general, there is no need for tree/hierarchical faceting in this use case; index the
first level as a separate facet field and use Solr's existing faceting capabilities for this
common case.
   * SOLR-64:  http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=levels_h&facet.depth=1
   * SOLR-792: http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=level1_s
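This case needs nothing beyond standard Solr faceting, so a client only has to build the request and read the counts. The Ruby sketch below does both for the SOLR-792-style request. It assumes the example server at localhost:8983, adds wt=json (not part of the URLs above) so the response can be parsed with Ruby's json library, and uses hypothetical helper names `top_level_facet_url` and `facet_counts`. Solr's JSON response writer lists facet counts as alternating term/count entries by default (json.nl=flat), which is the shape `facet_counts` expects.

```ruby
require 'json'
require 'uri'

# Example-app endpoint, as in the URLs above (assumed default host/port).
SOLR_SELECT = 'http://localhost:8983/solr/select'.freeze

# Builds a plain top-level facet request on level1_s.
# wt=json is an added assumption so the response is easy to parse.
def top_level_facet_url(field = 'level1_s')
  query = URI.encode_www_form(
    'q' => '*:*', 'rows' => 0, 'facet' => 'on',
    'facet.field' => field, 'wt' => 'json'
  )
  "#{SOLR_SELECT}?#{query}"
end

# Converts Solr's flat [term, count, term, count, ...] facet list
# for the given field into a Hash of term => count.
def facet_counts(response_body, field)
  flat = JSON.parse(response_body).dig('facet_counts', 'facet_fields', field) || []
  flat.each_slice(2).to_h
end
```

Against the 260,000-document data set generated above, every top-level value A-Z should come back with a count of 10000.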

== Facet across second level of hierarchy given single top-level constraint ==
   * SOLR-64:  http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=levels_h&fq=levels_h:A*&facet.mincount=1
   * SOLR-792: http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=level2_s&fq=level1_s:A&facet.mincount=1
The SOLR-792 request here uses only Solr's existing built-in faceting and filtering; the SOLR-792
patch is not involved in this request.
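Both requests in this section are ordinary facet-plus-filter queries, so a drill-down UI can generate them from the chosen top-level value. The Ruby sketch below builds either variant with the same parameters as the two URLs above. `drill_down_url` is a hypothetical helper, the host and port are the example-app defaults, and the value is interpolated without escaping Solr query syntax, which is fine for the single letters A-Z used here but not for arbitrary values.

```ruby
require 'uri'

# Example-app endpoint, as in the URLs above (assumed default host/port).
SOLR_SELECT = 'http://localhost:8983/solr/select'.freeze

# Second-level facet request restricted to one top-level value.
#   :solr64  -> facet on the path field levels_h, filtered by a prefix query
#   :solr792 -> plain Solr: facet on level2_s, filtered on level1_s
# NOTE: `top` is not escaped for Solr query syntax.
def drill_down_url(top, approach)
  facet_field, fq =
    case approach
    when :solr64  then ['levels_h', "levels_h:#{top}*"]
    when :solr792 then ['level2_s', "level1_s:#{top}"]
    else raise ArgumentError, "unknown approach: #{approach.inspect}"
    end
  query = URI.encode_www_form(
    'q' => '*:*', 'rows' => 0, 'facet' => 'on', 'facet.field' => facet_field,
    'fq' => fq, 'facet.mincount' => 1
  )
  "#{SOLR_SELECT}?#{query}"
end
```

For example, `drill_down_url('A', :solr792)` yields the SOLR-792 URL above with its parameters URL-encoded (q=*:* becomes q=*%3A*).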
