lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "HierarchicalFaceting" by HossMan
Date Wed, 10 Oct 2012 23:38:19 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "HierarchicalFaceting" page has been changed by HossMan:
http://wiki.apache.org/solr/HierarchicalFaceting?action=diff&rev1=13&rev2=14

Comment:
TOC and clean up heading levels

  = Approaches to Hierarchical Facets in Solr =
  
- == facet.prefix ==
+ This document contains various suggestions and solutions for dealing with "Hierarchical Facets", a concept which can mean different things to different people depending on their data.
+ 
+ <<TableOfContents>>
+ 
+ = 'facet.prefix' Based Drill Down =
  
  <!> [[Solr1.2]]
  
@@ -12, +16 @@

  
  This is a basic approach that works well for most use cases and takes advantage of basic
Solr faceting parameters by encoding the facet terms at index time.
  
- 
- ==== Flattened Data “breadcrumbs” ====
+ == Flattened Data “breadcrumbs” ==
  
  {{{
  Doc#1: NonFic > Law
@@ -25, +28 @@

  
  You must perform some index-time processing on this flattened data to create the tokens needed for a facet.prefix approach. When we index the data, we create specially formatted terms that encode the depth of each node that appears as part of the path, followed by the hierarchy joined with a common separator (“depth/first level term/second level term/etc”). We also add an additional term for every ancestor in the original data.
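The encoding described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the wiki: the function name `encode_path` is hypothetical, and it assumes breadcrumbs use the " > " separator shown in the flattened data and "/" as the term separator.

```python
def encode_path(breadcrumb, sep=" > "):
    """Turn a breadcrumb such as "NonFic > Sci > Phys" into depth-prefixed terms.

    One term is produced for the node and for every ancestor, so a document
    tagged with a leaf is also counted under each of its ancestors.
    """
    levels = [part.strip() for part in breadcrumb.split(sep)]
    terms = []
    for depth in range(len(levels)):
        # depth marker, then the path from the root down to this node
        terms.append(f"{depth}/" + "/".join(levels[: depth + 1]))
    return terms


if __name__ == "__main__":
    print(encode_path("NonFic > Law"))
    print(encode_path("NonFic > Sci > Phys"))
```

The two calls reproduce the Doc#1 and NonFic/Sci/Phys rows of the indexed terms.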
  
- ==== Indexed Terms ====
+ == Indexed Terms ==
  
  {{{
  Doc#1: 0/NonFic, 1/NonFic/Law
@@ -34, +37 @@

               0/NonFic, 1/NonFic/Sci, 2/NonFic/Sci/Phys
  }}}
  
- ==== Initial Query ====
+ == Initial Query ==
  
  With this type of index data, we can then query it to get a drill-down.
  Initially, we facet on the category field with the ''facet.prefix'' “1/NonFic”: the children of NonFic, which are indexed at a depth of 1.
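To show what ''facet.prefix'' does with these terms, the following sketch counts, in plain Python, the terms of a set of documents that start with a given prefix. It only imitates Solr's behaviour on a single field; `facet_with_prefix` is a hypothetical helper, and the data is limited to the two documents whose indexed terms are listed above.

```python
from collections import Counter


def facet_with_prefix(docs, prefix):
    """Return {term: document count} for terms that start with `prefix`.

    Mimics facet.prefix on one field: only matching terms are reported.
    """
    counts = Counter()
    for terms in docs:
        # set() so each document is counted at most once per term
        counts.update(t for t in set(terms) if t.startswith(prefix))
    return dict(counts)


# Terms of the two documents listed under "Indexed Terms"
# (documents elided from this diff are left out).
docs = [
    ["0/NonFic", "1/NonFic/Law"],
    ["0/NonFic", "1/NonFic/Sci", "2/NonFic/Sci/Phys"],
]

print(facet_with_prefix(docs, "1/NonFic/"))
```

This sketch ends the prefix with a trailing "/" (“1/NonFic/”). Without it, “1/NonFic” would also match a sibling term such as a hypothetical “1/NonFiction/...”.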
@@ -54, +57 @@

       <int name="1/NonFic/Law">1</int>
  }}}
  
- ==== Drill Down ====
+ == Drill Down ==
  
  If we drill down into NonFic/Sci, we just add the ''fq'' (filter query) as normal and change the ''facet.prefix'' from “1/NonFic” (the children of NonFic) to “2/NonFic/Sci” (the children of NonFic/Sci).
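A drill-down request can be assembled from the selected path as sketched below. The exact ''fq'' used on the wiki page falls outside this diff, so the quoted term filter, the `q=*:*` main query and the helper name `drill_down_params` are assumptions for illustration; `category`, `fq`, `facet.field` and `facet.prefix` come from the text.

```python
from urllib.parse import urlencode


def drill_down_params(path, field="category"):
    """Build Solr request parameters for drilling into `path` (a list of levels).

    fq keeps only documents under `path`; facet.prefix asks for the terms
    exactly one level deeper.
    """
    depth = len(path) - 1
    node_term = f"{depth}/" + "/".join(path)               # e.g. 1/NonFic/Sci
    child_prefix = f"{depth + 1}/" + "/".join(path) + "/"  # e.g. 2/NonFic/Sci/
    return [
        ("q", "*:*"),
        ("facet", "true"),
        ("facet.field", field),
        # quote the term so characters such as '/' are not interpreted by the query parser
        ("fq", f'{field}:"{node_term}"'),
        ("facet.prefix", child_prefix),
    ]


print(urlencode(drill_down_params(["NonFic", "Sci"])))
```

Drilling deeper only appends a level to `path`; a real UI would typically also keep the ''fq'' of every earlier step, which is redundant here because the deeper term already implies its ancestors.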
  
@@ -75, +78 @@

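  We’ve used a depth prefix that lets us look one level deep, but alternative user experiences can be created by tweaking the encoding; for example, terms without the depth prefix (“NonFic/Sci/Phys”) would let a single ''facet.prefix'' of “NonFic/Sci/” return every descendant of NonFic/Sci at any depth.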
  
  
- == PathHierarchyTokenizerFactory ==
+ = PathHierarchyTokenizerFactory =
+ 
  <!> [[Solr 3.1]]
  
  The [[http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PathHierarchyTokenizerFactory|solr.PathHierarchyTokenizerFactory]] is designed to output file path hierarchies as synonyms, but it can also be used for other simple hierarchies.
  
- ==== Flattened Data ====
+ == Flattened Data ==
  
  {{{
  Doc #1: /usr/local/apache
@@ -88, +92 @@

  Doc #3: /etc/apache2/conf.d
  }}}
  
- ==== Output Tokens ====
+ == Output Tokens ==
  
  {{{
  Doc #1: /usr, /usr/local, /usr/local/apache
@@ -96, +100 @@

  Doc #3: /etc, /etc/apache2, /etc/apache2/conf.d
  }}}
  
- ==== Initial Query ====
+ == Initial Query ==
  
  {{{
  facet.field = category
@@ -117, +121 @@

  
  Unlike the ''facet.prefix'' approach, it isn’t as easy to constrain the depth of the taxonomies, because the terms for every level are indexed into the same field without a depth marker; but for small numbers of terms this may be a good approach.
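One possible workaround, assuming the number of distinct paths is small, is to request all terms and keep only one level on the client by counting delimiters. The sketch below does that; `filter_by_depth` is hypothetical, and the counts are derived only from the Doc #1 and Doc #3 tokens listed above. Solr still computes counts for every level, so this saves nothing on the server.

```python
def filter_by_depth(facet_counts, depth, delimiter="/"):
    """Keep only facet terms at `depth` (1 = top level such as /usr).

    All levels share one field here, so the depth is recovered from the
    term itself by counting delimiters.
    """
    return {
        term: count
        for term, count in facet_counts.items()
        if term.count(delimiter) == depth
    }


# Facet counts over Doc #1 and Doc #3 only (Doc #2 is elided from this diff).
counts = {
    "/usr": 1, "/usr/local": 1, "/usr/local/apache": 1,
    "/etc": 1, "/etc/apache2": 1, "/etc/apache2/conf.d": 1,
}
print(filter_by_depth(counts, 1))
```

Combining this with a ''facet.prefix'' such as “/usr/” restricts the returned terms to one subtree before the depth filter is applied.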
  
- == Pivot Facets ==
+ = Pivot Facets =
+ 
  <!> [[Solr 4.0]]
  [[https://issues.apache.org/jira/browse/SOLR-792|SOLR-792]]
  
@@ -127, +132 @@

  
  This feature can be easily applied to hierarchical facets in some cases, particularly those
where a particular document only appears at one point in the taxonomy.
  
- ==== Flattened Data “breadcrumbs” ====
+ == Flattened Data “breadcrumbs” ==
  
  {{{
  Doc#1: NonFic > Law
@@ -137, +142 @@

  
  At index time, we split the data into a separate field for each level of the hierarchy.
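Splitting a breadcrumb into one field per level can be sketched as follows. The `category_levelN` naming matches the indexed terms on this page; the function name and the " > " separator are assumptions taken from the flattened data.

```python
def level_fields(breadcrumb, base="category", sep=" > "):
    """Map "NonFic > Law" to {"category_level0": "NonFic", "category_level1": "Law"}."""
    levels = [part.strip() for part in breadcrumb.split(sep)]
    # one field per hierarchy level, numbered from the root (level 0)
    return {f"{base}_level{i}": value for i, value in enumerate(levels)}


print(level_fields("NonFic > Law"))
```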
  
- ==== Indexed Terms ====
+ == Indexed Terms ==
  
  {{{
  Doc#1: category_level0: NonFic; category_level1: Law
@@ -180, +185 @@

         </arr>
  }}}
  
- == Strict hierarchical facets ==
+ = Strict hierarchical facets =
+ 
  [[https://issues.apache.org/jira/browse/SOLR-64|SOLR-64]]
  
  Strict Facet Hierarchies:
@@ -193, +199 @@

   * expand node if count > 100.... or maybe expand node if count > 10% of hits
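The expansion rule in the last bullet can be expressed as a small predicate. This sketch combines the two suggested thresholds with a logical OR; either could equally be used alone. The function and parameter names are hypothetical and not part of SOLR-64.

```python
def should_expand(node_count, num_hits, min_count=100, min_fraction=0.10):
    """Decide whether a facet node's children should be shown expanded.

    Expands when the node covers more than `min_count` documents, or more
    than `min_fraction` of all hits.
    """
    if node_count > min_count:
        return True
    # guard against division-free but meaningless comparisons on empty results
    return num_hits > 0 and node_count > min_fraction * num_hits


print(should_expand(150, 10000), should_expand(50, 200), should_expand(5, 1000))
```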
  
  
- == Multipath hierarchical faceting ==
+ = Multipath hierarchical faceting =
  [[https://issues.apache.org/jira/browse/SOLR-2412|SOLR-2412]]
  
  Hierarchical faceting with slow startup, low memory overhead and fast response. Distinguishing features as compared to [[https://issues.apache.org/jira/browse/SOLR-64|SOLR-64]] and [[https://issues.apache.org/jira/browse/SOLR-792|SOLR-792]] are
@@ -205, +211 @@

  
  This is a shell around [[https://issues.apache.org/jira/browse/LUCENE-2369|LUCENE-2369]], making it work with the Solr API. The underlying principle is to reference terms by their ordinals and create an index-wide document-to-tags map, augmented with a compressed representation of hierarchical levels.
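The ordinal principle can be illustrated with plain Python lists. This is a simplified sketch, not the LUCENE-2369 implementation: it uses uncompressed lists, hypothetical function names and hypothetical multi-path documents. Building the map corresponds to the startup cost, and counting then only increments integers indexed by ordinal.

```python
from bisect import bisect_left


def with_ancestors(path, sep="/"):
    """"NonFic/Sci/Phys" -> ["NonFic", "NonFic/Sci", "NonFic/Sci/Phys"]."""
    parts = path.split(sep)
    return [sep.join(parts[: i + 1]) for i in range(len(parts))]


def build_ordinal_map(doc_paths, sep="/"):
    # every tag and every ancestor gets an ordinal: its position in sorted order
    docs = [{a for p in paths for a in with_ancestors(p, sep)} for paths in doc_paths]
    terms = sorted(set().union(*docs))
    # hierarchy level per ordinal: 0 for a root such as "NonFic", 1 for "NonFic/Sci", ...
    levels = [t.count(sep) for t in terms]
    # the document-to-tags map stores small integers instead of strings
    doc_to_ords = [sorted(bisect_left(terms, t) for t in d) for d in docs]
    return terms, levels, doc_to_ords


def count_tags(hits, doc_to_ords, num_terms):
    # one counter slot per ordinal; no string lookups while counting
    counts = [0] * num_terms
    for doc in hits:
        for o in doc_to_ords[doc]:
            counts[o] += 1
    return counts


# Hypothetical documents; the second one sits at two places in the taxonomy.
doc_paths = [["NonFic/Law"], ["NonFic/Sci/Phys", "Fic/SciFi"]]
terms, levels, doc_to_ords = build_ordinal_map(doc_paths)
counts = count_tags([0, 1], doc_to_ords, len(terms))
top = {terms[o]: c for o, c in enumerate(counts) if levels[o] == 0 and c}
print(top)
```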
  
- == Faceting Module ==
+ = Faceting Module =
  [[https://issues.apache.org/jira/browse/LUCENE-3079]]
  
  TBD (To Be Documented)
