lucene-java-user mailing list archives

From Shai Erera <>
Subject Re: FacetedSearch and MultiReader
Date Wed, 23 Jan 2013 11:13:15 GMT
> I think we should open an issue to provide support for distributed
> faceting?


BTW Nicola, I remember you said something about TBs of indexes. I just
wanted to point out that if you have really large indexes with many
documents, you may want to look at facet sampling. That is, instead of
working hard to get exact counts, you can sample the result set and get an
approximation of the top-K categories. You can then choose either to
compute exact counts for the approximated top-K, or to stick with their
partial counts and display percentages (%) to the user.
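
To make the idea concrete, here is a minimal sketch in plain Python. It does
not use the Lucene facet API; the function names, the in-memory list of
per-document categories, and the fixed sample ratio are all assumptions made
for illustration only.

```python
import random
from collections import Counter


def sampled_top_k(doc_categories, k, sample_ratio, seed=0):
    """Estimate the top-k categories from a random sample of the result set.

    doc_categories: one list of category paths per matching document
    (a hypothetical in-memory stand-in for the real result set).
    Returns [(category, estimated_count), ...], most frequent first.
    """
    n = len(doc_categories)
    if n == 0:
        return []
    sample_size = min(n, max(1, int(n * sample_ratio)))
    rng = random.Random(seed)
    sample = rng.sample(doc_categories, sample_size)

    # Count categories over the sampled documents only.
    counts = Counter()
    for cats in sample:
        counts.update(cats)

    # Scale the sampled counts back up to the size of the full result set.
    scale = n / sample_size
    return [(cat, round(count * scale)) for cat, count in counts.most_common(k)]


def exact_counts(doc_categories, categories):
    """Optional second pass: fully count only the approximated top-k."""
    wanted = set(categories)
    counts = Counter()
    for cats in doc_categories:
        counts.update(c for c in cats if c in wanted)
    return dict(counts)
```

The second pass touches every matching document, but it only has to track
the K categories that the sample selected, which is the 'fully count the
approximated top-K' option.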

In fact, when the number of results is that big, consider the following:

A (456,873,234)
  A/1 (143,548,034)
  A/2 (137,323,452)

These numbers are too big for a human to process the value behind them.
Following the big numbers rule, these just denote "lots of results" to
the user. Rather, it may be better to display A/1 (87%) and A/2 (85%).
This is something you may want to consider too.
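
As a rough sketch of the percentage display (plain Python again, not Lucene
API), assume a hypothetical sample of 1,000 hits in which 870 carry A/1 and
850 carry A/2; a document may carry both, so the percentages need not sum to
100. The function name and the input shape are made up for illustration.

```python
def format_facets(sample_counts, sample_size):
    """Render facet labels as percentages of the sampled hits.

    sample_counts: {category: number of sampled hits carrying it}
    sample_size:   number of hits in the sample (must be > 0)
    """
    # Most frequent first; ties are broken by category name.
    ordered = sorted(sample_counts.items(), key=lambda item: (-item[1], item[0]))
    return [f"{cat} ({round(100 * count / sample_size)}%)" for cat, count in ordered]


for label in format_facets({"A/1": 870, "A/2": 850}, 1000):
    print(label)
```

With those assumed counts the labels come out as A/1 (87%) and A/2 (85%).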

Sampling improves the performance of faceted search, especially on large
result sets.
Displaying percentages instead of raw counts, IMO, makes the returned top-K
categories clearer to the common user.


On Tue, Jan 22, 2013 at 4:57 PM, Michael McCandless <> wrote:

> On Mon, Jan 21, 2013 at 11:20 PM, Shai Erera <> wrote:
> > (unfortunately, there's still no tool in Lucene to do that for you).
> I think we should open an issue to provide support for distributed
> faceting?
> For example, we already provide support for distributed searching
> (TopDocs.merge), and distributed grouping (TopGroups.merge) ... seems
> like we should do the same for distributed faceting (even though it's
> somewhat tricky)?
> Mike McCandless