lucene-java-user mailing list archives

From Nicola Buso <nb...@ebi.ac.uk>
Subject Re: FacetedSearch and MultiReader
Date Wed, 23 Jan 2013 12:22:34 GMT
Thanks Shai,

I'm trying your solution and it's working; I still need to check some
numbers to verify it.
As I said, we are aware that we have big indexes, so I use facets only on
subsets, but if that results in performance issues too, then I'll
certainly take a look at facet sampling.


Nicola.

On Wed, 2013-01-23 at 13:13 +0200, Shai Erera wrote:
> >
> > I think we should open an issue to provide support for distributed
> > faceting?
> >
> 
> Opened https://issues.apache.org/jira/browse/LUCENE-4710.
> 
> BTW Nicola, I remember you said something about TBs of indexes. I just
> wanted to point out that if you have really large indexes, with many
> documents, then you may want to look at facets sampling. I.e., instead of
> working hard to get exact counts, you can sample the result set and get an
> approximation to the top-K categories. You can then choose to either 'fully
> count the approximated top-K', or stick w/ their partial counts and display
> percentages (%) to the user.
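
The sampling idea described above can be sketched in plain Java. This is only an illustration and not Lucene's facet sampling code: the class FacetSamplingSketch, its methods, and the in-memory list of per-document categories are all hypothetical. It counts categories on a random subset of the hits, scales the counts back up by the sampling ratio, and picks the approximate top-K.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration of sampled facet counting; not a Lucene API.
class FacetSamplingSketch {

    // Counts categories on a random sample of the hits and scales the counts up.
    // Each element of hitCategories holds the categories of one matching document.
    static Map<String, Long> estimateCounts(List<List<String>> hitCategories,
                                            double sampleRatio, long seed) {
        if (sampleRatio <= 0.0 || sampleRatio > 1.0) {
            throw new IllegalArgumentException("sampleRatio must be in (0, 1]");
        }
        Random random = new Random(seed);
        Map<String, Long> sampled = new HashMap<>();
        for (List<String> categories : hitCategories) {
            if (random.nextDouble() >= sampleRatio) {
                continue; // this document is not part of the sample
            }
            for (String category : categories) {
                sampled.merge(category, 1L, Long::sum);
            }
        }
        // Scale the sampled counts to approximate counts over the full result set.
        Map<String, Long> estimated = new HashMap<>();
        for (Map.Entry<String, Long> e : sampled.entrySet()) {
            estimated.put(e.getKey(), Math.round(e.getValue() / sampleRatio));
        }
        return estimated;
    }

    // Returns the k categories with the highest counts (ties broken by name).
    static List<String> topK(Map<String, Long> counts, int k) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }
}
```

With a ratio of 1.0 every hit is counted and the result is exact; smaller ratios trade accuracy for less counting work. The approximate top-K could then be counted exactly in a second pass, as suggested above.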
> 
> In fact, when the number of results is so big, think about the following
> result:
> 
> A (456,873,234)
>   A/1 (143,548,034)
>   A/2 (137,323,452)
> 
> These numbers are too big for a human to process the value behind them.
> Following the big numbers rule, these just denote "lots of results" to
> anyone.
> Rather, it may be better if it displayed A/1 (87%) and A/2 (85%).
> This is something you may want to consider too.
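
A minimal sketch of the percentage display, in plain Java. The class FacetPercentSketch and its method are hypothetical, and the choice of denominator (for example the parent category's count or the total number of hits) is an assumption left to the caller.

```java
// Hypothetical helper for showing facet counts as percentages; not a Lucene API.
class FacetPercentSketch {

    // Formats a category label with its count expressed as a whole percentage of total.
    static String format(String label, long count, long total) {
        if (total <= 0) {
            throw new IllegalArgumentException("total must be positive");
        }
        long percent = Math.round(100.0 * count / total);
        return label + " (" + percent + "%)";
    }

    public static void main(String[] args) {
        System.out.println(format("A/1", 87, 100)); // prints "A/1 (87%)"
    }
}
```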
> 
> Sampling improves the performance of faceted search, especially on large
> result sets.
> Displaying % counts clarifies the returned top-K categories better, IMO, to
> the common user.
> 
> Shai
> 
> 
> On Tue, Jan 22, 2013 at 4:57 PM, Michael McCandless <
> lucene@mikemccandless.com> wrote:
> 
> > On Mon, Jan 21, 2013 at 11:20 PM, Shai Erera <serera@gmail.com> wrote:
> >
> > > (unfortunately, there's still no tool in Lucene to do that for you).
> >
> > I think we should open an issue to provide support for distributed
> > faceting?
> >
> > For example, we already provide support for distributed searching
> > (TopDocs.merge), and distributed grouping (TopGroups.merge) ... seems
> > like we should do the same for distributed faceting (even though it's
> > somewhat tricky)?
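
To illustrate what merging facet results from several indexes might involve, here is a rough sketch in plain Java. It is not the design proposed for Lucene: the class FacetMergeSketch, its methods, and the assumption that each shard returns a map from category path to count are all hypothetical. One reason this is tricky is that a category can be in the global top-K without being in every shard's top-K, so each shard has to return more than its own top-K (or all of its counts) for the merged result to be exact.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of merging per-shard facet counts; not a Lucene API.
class FacetMergeSketch {

    // Sums the counts of each category across all shards.
    static Map<String, Long> mergeCounts(List<Map<String, Long>> perShardCounts) {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> shard : perShardCounts) {
            for (Map.Entry<String, Long> e : shard.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return merged;
    }

    // Returns the k categories with the highest merged counts (ties broken by name).
    static List<String> topK(Map<String, Long> counts, int k) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }
}
```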
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >


