lucene-dev mailing list archives

From "Shai Erera (JIRA)" <>
Subject [jira] [Commented] (LUCENE-4769) Add a CountingFacetsAggregator which reads ordinals from a cache
Date Tue, 12 Feb 2013 06:35:12 GMT


Shai Erera commented on LUCENE-4769:

Ok .. I think I know where the confusion is, and it's mostly due to my lack of proper understanding
of Codecs ..

We basically mean the same thing, only what you propose is more realistic with today's IndexReader
API, which only exposes docValues. What I had in mind (looking again at notes I wrote a few
months ago) is that facets could have a CompositeReader impl which adds a facets-specific API.
Until then, we have no choice but to piggy-back on the DV API, and that means extending
DVFormat. Thanks for insisting, it made me understand how this should work ... (sorry, I
haven't written a Codec yet).

Perhaps separately we can think about an IndexReader impl for facets, which would open the
road to many different optimizations, e.g. maintaining a per-segment taxonomy and a top-level
reader global-ordinal map (all in-memory), encoding facet ordinals in their own structure
(not in DV), and maybe even managing the global taxonomy as part of the search index (through
sidecar files or something), without the separate sidecar index, which I think is a barrier
today for apps, as well as for integrating facets into Solr or ES. But that should be done
separately, as it's a major refactoring of how facets work.
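To illustrate the "per-segment taxonomy plus top-level global-ordinal map" idea, here is a minimal sketch, assuming each segment assigns its own local category ordinals and an in-memory `int[]` per segment maps local ordinals to global ones. `GlobalOrdinalMap` and `mergeCounts` are hypothetical names for illustration, not part of the Lucene facets API.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: each segment keeps its own (local) category ordinals,
 * and a top-level, in-memory array maps every segment-local ordinal to a
 * global ordinal so that per-segment counts can be merged.
 */
public class GlobalOrdinalMap {
  // segmentToGlobal[seg][localOrd] == globalOrd
  private final int[][] segmentToGlobal;
  private final int numGlobalOrds;

  public GlobalOrdinalMap(int[][] segmentToGlobal, int numGlobalOrds) {
    this.segmentToGlobal = segmentToGlobal;
    this.numGlobalOrds = numGlobalOrds;
  }

  /** Folds counts accumulated per segment (in local ordinal space) into global counts. */
  public int[] mergeCounts(int[][] perSegmentCounts) {
    int[] global = new int[numGlobalOrds];
    for (int seg = 0; seg < perSegmentCounts.length; seg++) {
      int[] local = perSegmentCounts[seg];
      int[] map = segmentToGlobal[seg];
      for (int localOrd = 0; localOrd < local.length; localOrd++) {
        global[map[localOrd]] += local[localOrd];
      }
    }
    return global;
  }

  public static void main(String[] args) {
    // Segment 0 knows categories {A, B}; segment 1 knows {B, C}.
    // Global ordinals: A=0, B=1, C=2.
    GlobalOrdinalMap m = new GlobalOrdinalMap(new int[][] {{0, 1}, {1, 2}}, 3);
    int[] counts = m.mergeCounts(new int[][] {{3, 1}, {2, 5}});
    System.out.println(Arrays.toString(counts)); // prints [3, 3, 5]
  }
}
```

Counting in segment-local space keeps the per-segment arrays small; the global map is only consulted once per ordinal when merging, not once per matching document.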

Even FacetsDV would be something of a refactoring (i.e. replacing CategoryListIterator with it,
if we want to do it right), so for now I'm still going to commit that cache as an aggregator,
and we can get rid of it once we have FacetsDV.

Oh .. and one thing in that statement bothered me:

bq. You seem hell-bent on the idea that lucene should have a getInts(docid, IntsRef) api for

First, I'm not hell-bent on anything (I don't even know what that means). Second, facets are
now a *lucene* module, not something private to me. From my perspective, *lucene* doesn't need
to provide anything for me, but *lucene* should have the best facets module. So far I've been
busy refactoring facets so they run faster and have a cleaner API ... not for me, but for *lucene*
users. I'm sure things can be simplified even further and improved even more. I think about
it constantly. If you have a better idea of how facets should work (while maintaining current
capabilities as much as possible), I'm open to suggestions, really.
> Add a CountingFacetsAggregator which reads ordinals from a cache
> ----------------------------------------------------------------
>                 Key: LUCENE-4769
>                 URL:
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/facet
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>         Attachments: LUCENE-4769.patch
> Mike wrote a prototype of a FacetsCollector which reads ordinals from a CachedInts structure
> on LUCENE-4609. I ported it to the new facets API, as a FacetsAggregator. I think we should
> offer users the means to use such a cache, even if it consumes more RAM. Mike's tests show that
> this cache consumed 2x more RAM than if the DocValues were loaded into memory in their raw
> form. Also, a PackedInts version of such a cache took almost the same amount of RAM as a straight
> int[], but the gains were minor.
> I will post the patch shortly.
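For readers unfamiliar with the CachedInts idea, here is a minimal sketch of what such a cache plus counting aggregation could look like, assuming all of a segment's per-document ordinals are decoded once into two flat `int[]` arrays. `CachedOrdinals`, `count` and `ramBytes` are hypothetical names, not the actual patch or Lucene API.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of an ordinals cache: every document's category
 * ordinals are decoded once into flat int[] arrays, so counting does not
 * have to decode the DocValues bytes for each matching document.
 */
public class CachedOrdinals {
  // ordinals of doc d live in ords[offsets[d] .. offsets[d + 1])
  private final int[] offsets;
  private final int[] ords;

  public CachedOrdinals(int[][] ordinalsPerDoc) {
    offsets = new int[ordinalsPerDoc.length + 1];
    int total = 0;
    for (int d = 0; d < ordinalsPerDoc.length; d++) {
      offsets[d] = total;
      total += ordinalsPerDoc[d].length;
    }
    offsets[ordinalsPerDoc.length] = total;
    ords = new int[total];
    for (int d = 0; d < ordinalsPerDoc.length; d++) {
      System.arraycopy(ordinalsPerDoc[d], 0, ords, offsets[d], ordinalsPerDoc[d].length);
    }
  }

  /** Counting aggregation: increments counts[ord] for every ordinal of every matching doc. */
  public void count(int[] matchingDocs, int[] counts) {
    for (int doc : matchingDocs) {
      for (int i = offsets[doc]; i < offsets[doc + 1]; i++) {
        counts[ords[i]]++;
      }
    }
  }

  /** Approximate heap cost of the two arrays, in bytes (4 bytes per int, headers ignored). */
  public long ramBytes() {
    return 4L * (offsets.length + ords.length);
  }

  public static void main(String[] args) {
    CachedOrdinals cache = new CachedOrdinals(new int[][] {{1, 2}, {2}, {0, 3}});
    int[] counts = new int[4];
    cache.count(new int[] {0, 1}, counts);
    System.out.println(Arrays.toString(counts)); // prints [0, 1, 2, 0]
    System.out.println(cache.ramBytes());        // prints 36
  }
}
```

The design trades RAM for speed: every ordinal costs a fixed 4 bytes here, whereas the raw DocValues bytes are presumably stored in a compact variable-length encoding, which would be consistent with the roughly 2x RAM difference reported above.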
