lucene-dev mailing list archives

From "Shai Erera (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4769) Add a CountingFacetsAggregator which reads ordinals from a cache
Date Mon, 11 Feb 2013 21:33:12 GMT


Shai Erera commented on LUCENE-4769:

FacetsAggregator is an abstraction in the facets package that lets you compute different aggregation
functions over the ordinals of the matching documents. E.g. counting is equivalent to #sum(1), while
SumScoreFacetsAggregator does #sum(score), etc.
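To illustrate the idea (this is a hedged, very simplified sketch, not the actual Lucene API -- the interface and class names here are made up), every aggregator can be seen as folding a per-document value into a per-ordinal accumulator, with counting being the special case where that value is always 1:

```java
// Simplified, hypothetical model of the aggregation-function idea:
// an aggregator sums some per-document value into a slot per facet ordinal.
interface ValueSource {
    double value(int docID);
}

class SumAggregator {
    final double[] totals; // one accumulator slot per facet ordinal
    final ValueSource source;

    SumAggregator(int numOrdinals, ValueSource source) {
        this.totals = new double[numOrdinals];
        this.source = source;
    }

    // Fold one matching document into the ordinals it references.
    void aggregate(int docID, int[] ordinals) {
        for (int ord : ordinals) {
            totals[ord] += source.value(docID);
        }
    }
}

public class AggregatorSketch {
    public static void main(String[] args) {
        // Counting is #sum(1): the value source always returns 1.
        SumAggregator counting = new SumAggregator(4, doc -> 1.0);
        counting.aggregate(0, new int[] {1, 2});
        counting.aggregate(1, new int[] {2});
        System.out.println(counting.totals[2]); // 2.0
        // A sum-of-scores aggregator would instead pass doc -> score(doc).
    }
}
```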

You're right that this could be implemented as a Codec, and then we wouldn't even need to alert
the user that if he uses that caching method, he should use DiskValuesFormat. But it seems like
an awkward decision to me. Usually, caching does not force you to index stuff in a specific
way. Rather, you decide at runtime whether you want to cache the data or not. You can even choose
to stop using the cache while the app is running. Also, it's odd that if the app already
indexed documents with the default Codec, it won't be able to use this caching method unless
it reindexes, or until those segments are merged (b/c their DVFormat will be different, and
so the aggregator would need to fall back to different counting code).

I dunno ... it's certainly doable, but it doesn't feel right to me.
> Add a CountingFacetsAggregator which reads ordinals from a cache
> ----------------------------------------------------------------
>                 Key: LUCENE-4769
>                 URL:
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/facet
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>         Attachments: LUCENE-4769.patch
> Mike wrote a prototype of a FacetsCollector which reads ordinals from a CachedInts structure
on LUCENE-4609. I ported it to the new facets API, as a FacetsAggregator. I think we should
offer users the means to use such a cache, even if it consumes more RAM. Mike's tests show that
this cache consumed 2x more RAM than if the DocValues were loaded into memory in their raw
form. Also, a PackedInts version of such a cache took almost the same amount of RAM as a straight
int[], but the gains were minor.
> I will post the patch shortly.
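As a rough sketch of the cached-ordinals counting described above (assuming a per-segment cache that decodes each document's facet ordinals once into plain int[] arrays; the class and method names here are illustrative, not the actual LUCENE-4769 patch):

```java
// Hypothetical sketch: facet ordinals decoded once into a ragged int[][]
// keyed by docID, so that counting never re-decodes DocValues per query.
class CachedOrdinals {
    final int[][] ordsPerDoc; // ordsPerDoc[docID] = facet ordinals of that doc

    CachedOrdinals(int[][] ordsPerDoc) {
        this.ordsPerDoc = ordsPerDoc;
    }

    // Counting is just #sum(1) over the cached ordinals of the matching docs.
    int[] count(int[] matchingDocs, int numOrdinals) {
        int[] counts = new int[numOrdinals];
        for (int doc : matchingDocs) {
            for (int ord : ordsPerDoc[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }
}

public class CachedCountingSketch {
    public static void main(String[] args) {
        CachedOrdinals cache = new CachedOrdinals(new int[][] {
            {1, 3}, // doc 0
            {3},    // doc 1
        });
        int[] counts = cache.count(new int[] {0, 1}, 4);
        System.out.println(counts[3]); // 2
    }
}
```

The RAM tradeoff mentioned above is between this raw int[] layout and a PackedInts encoding of the same data; the raw form trades memory for cheaper per-ordinal reads.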

