lucene-dev mailing list archives

From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4600) Explore facets aggregation during documents collection
Date Mon, 21 Jan 2013 15:30:13 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558838#comment-13558838 ]

Shai Erera commented on LUCENE-4600:
------------------------------------

Thanks for running this. Given these results, I don't think making NO_PARENTS the default policy
is a good idea. I think it's a poor default anyway, because it forces the user to stop
and think about whether the documents he'll index share parents or not. This looks like an advanced
setting to me, i.e. if you want to get "expert" and really know your content, then you can
choose to index like so. Plus, given those statistics, I'd say that you have to test before
you go to production with it (i.e. it looks like it may get expensive as the number of ordinals
grows...).

Mike found a bug in how I count up the parents in the NO_PARENTS case, so I fixed it (and
added a test). I'll run tests a couple of times and commit this.
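
To make the "count up the parents" step concrete, here is a minimal sketch of one way such a rollup could work. It is not the code from the patch. The class name, the method name and the `parents` array layout are assumptions made for illustration: ordinal 0 is taken to be the root, `parents[ord]` is the parent of `ord`, and every parent is assumed to have a smaller ordinal than its children.

```java
public class NoParentsRollupSketch {

    /**
     * Adds each ordinal's count to its parent's count, in place.
     *
     * Assumptions (illustrative only, not taken from the patch):
     * parents[0] == -1 marks the root, and parents[ord] < ord for every
     * other ordinal, so walking from the highest ordinal down guarantees
     * a child is fully counted before it is added to its parent.
     */
    static void rollupToParents(int[] counts, int[] parents) {
        for (int ord = counts.length - 1; ord > 0; ord--) {
            int parent = parents[ord];
            if (parent >= 0) {
                counts[parent] += counts[ord];
            }
        }
    }

    public static void main(String[] args) {
        // Hypothetical taxonomy: 0 = root, 1 = A, 2 = A/B, 3 = A/C
        int[] parents = { -1, 0, 1, 1 };
        // Only leaf ordinals were counted during aggregation.
        int[] counts = { 0, 0, 2, 3 };
        rollupToParents(counts, parents);
        System.out.println(java.util.Arrays.toString(counts));
    }
}
```

Note that a rollup like this counts A twice for a single document tagged with both A/B and A/C, which may be the kind of case the "share parents" concern above is about. It also touches every ordinal in the taxonomy, so its cost would grow with the number of ordinals.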
                
> Explore facets aggregation during documents collection
> ------------------------------------------------------
>
>                 Key: LUCENE-4600
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4600
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>            Assignee: Shai Erera
>         Attachments: LUCENE-4600-cli.patch, LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch,
>                      LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch
>
>
> Today the facet module simply gathers all hits (as a bitset, optionally with a float[]
> to hold scores as well, if you will aggregate them) during collection, and then at the end
> when you call getFacetsResults(), it makes a 2nd pass over all those hits doing the actual
> aggregation.
> We should investigate just aggregating as we collect instead, so we don't have to tie
> up transient RAM (fairly small for the bit set but possibly big for the float[]).
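
The two approaches described in the issue can be contrasted with a small sketch. This is not the facet module's code and uses no Lucene classes. `docOrdinals` is a hypothetical stand-in for the per-document category ordinals, `hits` stands in for the documents a collector would see, and each document is assumed to be collected at most once.

```java
import java.util.BitSet;

public class CollectionAggregationSketch {

    /** Two passes: remember every hit in a bitset, then aggregate afterwards. */
    static int[] twoPass(int[] hits, int[][] docOrdinals, int numOrdinals) {
        // Pass 1: collection only records which documents matched.
        BitSet matched = new BitSet(docOrdinals.length);
        for (int doc : hits) {
            matched.set(doc);
        }
        // Pass 2: walk the recorded hits and count their ordinals.
        int[] counts = new int[numOrdinals];
        for (int doc = matched.nextSetBit(0); doc >= 0; doc = matched.nextSetBit(doc + 1)) {
            for (int ord : docOrdinals[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }

    /** One pass: count ordinals as each hit is collected, keeping no per-hit state. */
    static int[] duringCollection(int[] hits, int[][] docOrdinals, int numOrdinals) {
        int[] counts = new int[numOrdinals];
        for (int doc : hits) {
            for (int ord : docOrdinals[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[][] docOrdinals = { { 1, 2 }, { 1 }, { 2, 3 }, { 3 } };
        int[] hits = { 0, 2, 3 };
        System.out.println(java.util.Arrays.toString(twoPass(hits, docOrdinals, 4)));
        System.out.println(java.util.Arrays.toString(duringCollection(hits, docOrdinals, 4)));
    }
}
```

Both methods produce the same counts. The difference is that the second one never holds the bitset (or a parallel float[] of scores), which is the transient RAM the issue wants to avoid.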

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

