lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4769) Add a CountingFacetsAggregator which reads ordinals from a cache
Date Tue, 12 Feb 2013 12:49:13 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13576593#comment-13576593 ]

Robert Muir commented on LUCENE-4769:
-------------------------------------

{quote}
Until then, we have no other choice but to piggy-back on DV API, and that means extending
DVFormat.
{quote}

Well mainly I'm trying to make sure we only have the minimum DocValues types and APIs we actually
need. Additional types are very costly to us.

I'm still unsure myself that lucene should have a byte[] docvalues type that is unsorted:
I don't see any real use cases for it directly.

But for someone who wants to encode their own data structures, having a per-document byte[]
type where your codec can see all the values is pretty powerful. So if having this "catch-all"
type prevents additional types from being added to lucene, then maybe it's worth it.

{quote}
Perhaps separately we can think about an IndexReader impl for facets, which will open the
road to many different optimizations, e.g. maintaining a per-segment taxonomy and top-level
reader global-ordinal map (all in-memory), encoding facet ordinals in their own structure
(and not DV) and maybe even managing the global taxonomy as part of the search index (through
sidecar files or something), w/o the sidecar index, which I think today is a barrier for apps
as well as integrating that into Solr or ES. But that should be done separately as it's a
major refactoring to how facets work.
{quote}

I think a custom IndexReader impl would present barriers for integration with those systems
too, just in a different way. Personally I think the current design (sidecar) is the most
performant. But we should consider adding other possibilities to lucene that make different
tradeoffs, e.g. work without it. 

{quote}
First, I'm not hell-bent on anything (don't even know what that means). Second, facets are
now a *lucene* module, and not private to me. From my perspective, lucene doesn't need to
have anything for me, but lucene should have the best facets module. So far I've been busy
refactoring facets so they work faster and have cleaner API ... not to me, to lucene users.
I'm sure things can be simplified even further and improved even more. I think about it constantly.
If you have a better idea of how facets should work (while maintaining current capabilities,
as much as possible), I'm all open to suggestions, really.
{quote}

I know, you are doing a great job. I'm just explaining my opinion on this situation: having
facets "build on top of" BinaryDocValues doesn't hurt it in the slightest. Sometimes I wonder
if you are having this argument with me to avoid a single type cast in the facets codebase
or for some other cosmetic reason :)

                
> Add a CountingFacetsAggregator which reads ordinals from a cache
> ----------------------------------------------------------------
>
>                 Key: LUCENE-4769
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4769
>             Project: Lucene - Core
>          Issue Type: New Feature
>          Components: modules/facet
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>         Attachments: LUCENE-4769.patch, LUCENE-4769.patch
>
>
> Mike wrote a prototype of a FacetsCollector which reads ordinals from a CachedInts structure
> on LUCENE-4609. I ported it to the new facets API, as a FacetsAggregator. I think we should
> offer users the means to use such a cache, even if it consumes more RAM. Mike's tests show that
> this cache consumed 2x more RAM than if the DocValues were loaded into memory in their raw
> form. Also, a PackedInts version of such a cache took almost the same amount of RAM as a straight
> int[], but the gains were minor.
> I will post the patch shortly.
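
For readers following the thread, here is a minimal sketch of the trade-off the issue describes: counting facet ordinals by decoding a per-document byte[] on every search, versus decoding once into a flat int[] cache. This is not the code from the attached patch and it does not use any Lucene API. The class name `OrdinalCacheSketch`, the `Cache` layout (an offsets array plus one flat ordinals array) and the plain variable-length int encoding are all assumptions made for illustration; the facet module's actual encoding and the CachedInts structure from LUCENE-4609 may differ.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical sketch, not the LUCENE-4769 patch: all names here are illustrative.
final class OrdinalCacheSketch {

    // Encode one document's ordinals as variable-length ints: 7 payload bits
    // per byte, high bit set when another byte follows (assumed encoding).
    static byte[] encode(int[] ordinals) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int ord : ordinals) {
            while ((ord & ~0x7F) != 0) {
                out.write((ord & 0x7F) | 0x80);
                ord >>>= 7;
            }
            out.write(ord);
        }
        return out.toByteArray();
    }

    // Decode one document's byte[] back into its ordinals.
    static int[] decode(byte[] bytes) {
        int[] buf = new int[bytes.length]; // at most one ordinal per byte
        int n = 0, value = 0, shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) != 0) {
                shift += 7;          // more bytes belong to this ordinal
            } else {
                buf[n++] = value;    // ordinal complete
                value = 0;
                shift = 0;
            }
        }
        return Arrays.copyOf(buf, n);
    }

    // Raw path: decode the bytes of every matching document on each call.
    static int[] countFromBytes(byte[][] perDocBytes, int[] matchingDocs, int numOrdinals) {
        int[] counts = new int[numOrdinals];
        for (int doc : matchingDocs) {
            for (int ord : decode(perDocBytes[doc])) {
                counts[ord]++;
            }
        }
        return counts;
    }

    // Cached form: ordinals[offsets[doc] .. offsets[doc + 1]) belong to doc.
    static final class Cache {
        final int[] offsets;
        final int[] ordinals;

        Cache(int[] offsets, int[] ordinals) {
            this.offsets = offsets;
            this.ordinals = ordinals;
        }
    }

    // Decode every document once, up front. Costs 4 bytes per ordinal.
    static Cache buildCache(byte[][] perDocBytes) {
        int numDocs = perDocBytes.length;
        int[] offsets = new int[numDocs + 1];
        int[][] decoded = new int[numDocs][];
        for (int doc = 0; doc < numDocs; doc++) {
            decoded[doc] = decode(perDocBytes[doc]);
            offsets[doc + 1] = offsets[doc] + decoded[doc].length;
        }
        int[] ordinals = new int[offsets[numDocs]];
        for (int doc = 0; doc < numDocs; doc++) {
            System.arraycopy(decoded[doc], 0, ordinals, offsets[doc], decoded[doc].length);
        }
        return new Cache(offsets, ordinals);
    }

    // Cached path: no decoding at search time, only array reads.
    static int[] countFromCache(Cache cache, int[] matchingDocs, int numOrdinals) {
        int[] counts = new int[numOrdinals];
        for (int doc : matchingDocs) {
            for (int i = cache.offsets[doc]; i < cache.offsets[doc + 1]; i++) {
                counts[cache.ordinals[i]]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        byte[][] docs = {
            encode(new int[] {3, 200}), encode(new int[] {}),
            encode(new int[] {3}), encode(new int[] {1, 3})
        };
        int[] matching = {0, 2, 3};
        int[] raw = countFromBytes(docs, matching, 201);
        int[] cached = countFromCache(buildCache(docs), matching, 201);
        System.out.println("same counts: " + Arrays.equals(raw, cached));
    }
}
```

Both paths produce the same counts. The cached path trades memory (a fixed 4 bytes per ordinal plus the offsets array) for skipping the decode step on every search, which is the kind of RAM-versus-speed trade-off the description above refers to.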

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


