lucene-dev mailing list archives

From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4764) Faster but more RAM/Disk consuming DocValuesFormat for facets
Date Sat, 09 Feb 2013 16:37:12 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575199#comment-13575199 ]

Shai Erera commented on LUCENE-4764:
------------------------------------

I think it would actually be interesting to test *only* VInt, without dgap. Because the
ords seem to be arbitrary, I'm not even sure what dgaps buy us. Mike, can you try that? Index
with Sorting(Unique(VInt8)) and modify FastCountingFacetsAggregator to not do dgap? It would
be interesting to see the effect on compression as well as speed. Dgap is something you want
to do if you suspect that a document's ordinals will be, e.g., high but close to each other,
so that the gaps between them compress better than the absolute values ...
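
To make the trade-off concrete, here is a minimal, self-contained sketch that compares the encoded size of one document's ordinals with plain VInt versus dgap + VInt. It is not the Lucene encoder: the class VIntDgapSketch and all of its methods are hypothetical names made up for illustration, and the variable-length format is just the common 7-bits-per-byte scheme.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration class; not part of Lucene.
public class VIntDgapSketch {

    // Writes a non-negative int using 7 payload bits per byte;
    // the high bit is set on every byte except the last one.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Sorts the ordinals and drops duplicates.
    static int[] sortedUnique(int[] ords) {
        return Arrays.stream(ords).sorted().distinct().toArray();
    }

    // Plain VInt: every sorted ordinal is written as its absolute value.
    static byte[] encodeVInt(int[] ords) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int ord : sortedUnique(ords)) {
            writeVInt(out, ord);
        }
        return out.toByteArray();
    }

    // Dgap + VInt: every sorted ordinal is written as the gap to the previous one.
    static byte[] encodeDGapVInt(int[] ords) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int ord : sortedUnique(ords)) {
            writeVInt(out, ord - prev);
            prev = ord;
        }
        return out.toByteArray();
    }

    // Decodes either format; with dgap=true the gaps are summed back into ordinals.
    static int[] decode(byte[] bytes, boolean dgap) {
        int[] tmp = new int[bytes.length]; // at most one value per byte
        int count = 0, pos = 0, prev = 0;
        while (pos < bytes.length) {
            int value = 0, shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                value |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            if (dgap) {
                value += prev;
                prev = value;
            }
            tmp[count++] = value;
        }
        return Arrays.copyOf(tmp, count);
    }

    public static void main(String[] args) {
        int[] clustered = {100000, 100001, 100003}; // high ordinals, close together
        int[] scattered = {5, 20000, 3000000};      // arbitrary ordinals
        System.out.println("clustered: vint=" + encodeVInt(clustered).length
                + " dgap=" + encodeDGapVInt(clustered).length);
        System.out.println("scattered: vint=" + encodeVInt(scattered).length
                + " dgap=" + encodeDGapVInt(scattered).length);
    }
}
```

With these made-up inputs, the clustered document takes 9 bytes as plain VInt and 5 bytes with dgap, while the scattered document takes 8 bytes either way. The no-dgap decode path also skips the running sum per ordinal, which is the part that could matter for speed.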

Robert, if I understand your proposal correctly, what you suggest is to encode:

int[] -- pairs of highest/lowest ordinal in a document + length (#additional ords)
byte[] -- a packed-int of deltas for all documents (but deltas are computed off the absolute
ord in the int[])

Why would that be better than a single byte[] (packed-ints) + offsets?
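
For reference, the "single byte[] + offsets" layout can be sketched as follows. This is only an illustration under stated assumptions: OffsetsLayoutSketch and its members are hypothetical names, the offsets are kept in a plain int[] rather than a packed structure, and the ordinals are stored as VInt bytes as a stand-in for bit-packed ints.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration class; not part of Lucene.
public class OffsetsLayoutSketch {
    final byte[] bytes;  // all documents' encoded ordinals, back to back
    final int[] offsets; // bytes[offsets[doc] .. offsets[doc + 1]) belongs to doc

    OffsetsLayoutSketch(int[][] ordsPerDoc) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        offsets = new int[ordsPerDoc.length + 1];
        for (int doc = 0; doc < ordsPerDoc.length; doc++) {
            offsets[doc] = out.size();
            for (int ord : ordsPerDoc[doc]) {
                // VInt: 7 payload bits per byte, high bit marks "more bytes follow"
                int v = ord;
                while ((v & ~0x7F) != 0) {
                    out.write((v & 0x7F) | 0x80);
                    v >>>= 7;
                }
                out.write(v);
            }
        }
        offsets[ordsPerDoc.length] = out.size();
        bytes = out.toByteArray();
    }

    // Decodes the ordinals of one document by reading only its byte range.
    int[] ordinals(int doc) {
        int pos = offsets[doc];
        int end = offsets[doc + 1];
        int[] tmp = new int[end - pos]; // at most one ordinal per byte
        int count = 0;
        while (pos < end) {
            int value = 0, shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                value |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            tmp[count++] = value;
        }
        return Arrays.copyOf(tmp, count);
    }

    public static void main(String[] args) {
        OffsetsLayoutSketch dv =
                new OffsetsLayoutSketch(new int[][] {{3, 200}, {}, {70000}});
        System.out.println(Arrays.toString(dv.offsets));
        System.out.println(Arrays.toString(dv.ordinals(2)));
    }
}
```

In this layout each lookup costs two offset reads plus a sequential decode of the document's byte range, which is the baseline the question above compares the int[] + byte[] proposal against.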
                
> Faster but more RAM/Disk consuming DocValuesFormat for facets
> -------------------------------------------------------------
>
>                 Key: LUCENE-4764
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4764
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 4.2, 5.0
>
>         Attachments: LUCENE-4764.patch
>
>
> The new default DV format for binary fields has much more
> RAM-efficient encoding of the address for each document ... but it's
> also a bit slower at decode time, which affects facets because we
> decode for every collected docID.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

