lucene-dev mailing list archives

From "Shai Erera (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4620) Explore IntEncoder/Decoder bulk API
Date Mon, 14 Jan 2013 12:38:12 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552612#comment-13552612 ]

Shai Erera commented on LUCENE-4620:
------------------------------------

I made this change to VInt8IntDecoder instead of checking inside the loop:

{code}
int numValues = buf.length; // upper bound: each value occupies at least 1 byte
if (values.ints.length < numValues) {
  values.grow(numValues);
}
{code}
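
For readers without the patch at hand, here is a minimal, self-contained sketch of the idea: size the output once, using the byte count as an upper bound on the number of values, then decode without any bounds check in the loop. The `Ints` holder and the class name are hypothetical stand-ins (loosely modeled on IntsRef), and the byte layout (most significant 7-bit group first, high bit set on every byte except the last) is an assumption about VInt8, not a copy of the actual decoder.

```java
import java.util.Arrays;

final class VIntBulkSketch {

  /** Hypothetical reusable int buffer, similar in spirit to IntsRef. */
  static final class Ints {
    int[] ints = new int[0];
    int length;

    void grow(int minSize) {
      if (ints.length < minSize) {
        ints = Arrays.copyOf(ints, minSize);
      }
    }
  }

  /** Encodes values into out (needs up to 5 bytes per value); returns bytes written. */
  static int encode(int[] values, byte[] out) {
    int upto = 0;
    for (int v : values) {
      int groups = 1; // number of 7-bit groups needed for v
      for (int t = v >>> 7; t != 0; t >>>= 7) {
        groups++;
      }
      for (int g = groups - 1; g > 0; g--) {
        out[upto++] = (byte) (0x80 | ((v >>> (7 * g)) & 0x7F)); // more bytes follow
      }
      out[upto++] = (byte) (v & 0x7F); // last byte of the value, high bit clear
    }
    return upto;
  }

  /** Decodes length bytes starting at offset into values, growing it at most once. */
  static void decode(byte[] bytes, int offset, int length, Ints values) {
    values.length = 0;
    // A value occupies at least 1 byte, so length bounds the number of values.
    if (values.ints.length < length) {
      values.grow(length);
    }
    int upto = offset + length;
    int value = 0;
    while (offset < upto) {
      byte b = bytes[offset++];
      if (b >= 0) { // high bit clear: this byte completes the value
        values.ints[values.length++] = (value << 7) | b;
        value = 0;
      } else { // high bit set: accumulate 7 bits and continue
        value = (value << 7) | (b & 0x7F);
      }
    }
  }

  public static void main(String[] args) {
    int[] input = {1, 300, 1 << 20};
    byte[] buf = new byte[input.length * 5];
    int numBytes = encode(input, buf);
    Ints out = new Ints();
    decode(buf, 0, numBytes, out);
    System.out.println(Arrays.toString(Arrays.copyOf(out.ints, out.length)));
  }
}
```

The trade-off is that the output array may be over-allocated (up to one slot per input byte), in exchange for removing a branch from the hot loop.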

Ran EncodingSpeed again and compared the results. On average (4 datasets), VInt8 achieves
a 0.69% speedup, DGap(VInt) 7.85% and Sorting(Unique(DGap(VInt))) 10.16%. The last one is
the default Encoder, though its decoder is only DGap(VInt), so I'm not sure why its result
differs from the 7.85% measured for the plain DGap(VInt) run.
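
To illustrate why the decoder of Sorting(Unique(DGap(VInt))) reduces to DGap(VInt), here is a rough sketch, assuming non-negative ordinals and using hypothetical names (`DGapSketch`, `sortUniqueGaps`, `fromGaps`) that are not part of the facet module. Sorting and de-duplication only matter at encode time; decoding is just a running sum over the gaps.

```java
import java.util.Arrays;

final class DGapSketch {

  /** Encode side: sort, drop duplicates, then store each value as the gap to its predecessor. */
  static int[] sortUniqueGaps(int[] ordinals) {
    int[] sorted = ordinals.clone();
    Arrays.sort(sorted);
    int[] gaps = new int[sorted.length];
    int n = 0;
    int prev = 0;
    for (int i = 0; i < sorted.length; i++) {
      if (i > 0 && sorted[i] == sorted[i - 1]) {
        continue; // duplicate ordinal, skip it
      }
      gaps[n++] = sorted[i] - prev;
      prev = sorted[i];
    }
    return Arrays.copyOf(gaps, n);
  }

  /** Decode side: a running sum restores the sorted, unique ordinals. */
  static int[] fromGaps(int[] gaps) {
    int[] out = new int[gaps.length];
    int prev = 0;
    for (int i = 0; i < gaps.length; i++) {
      prev += gaps[i];
      out[i] = prev;
    }
    return out;
  }

  public static void main(String[] args) {
    int[] gaps = sortUniqueGaps(new int[] {7, 3, 7, 10, 3});
    System.out.println(Arrays.toString(gaps));
    System.out.println(Arrays.toString(fromGaps(gaps)));
  }
}
```

In this sketch the decode path is identical whether or not the input was sorted and de-duplicated first, so any difference between the two runs would have to come from the data being decoded rather than from the decoder code.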

However, it does look like it speeds things up...
                
> Explore IntEncoder/Decoder bulk API
> -----------------------------------
>
>                 Key: LUCENE-4620
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4620
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 4.1, 5.0
>
>         Attachments: LUCENE-4620.patch, LUCENE-4620.patch, LUCENE-4620.patch
>
>
> Today, IntEncoder/Decoder offer a streaming API, where you can encode(int) and decode(int).
> Originally, we believed that this layer could be useful for other scenarios, but in practice
> it's used only for writing/reading the category ordinals from payload/DV.
> Therefore, Mike and I would like to explore a bulk API, something like encode(IntsRef,
> BytesRef) and decode(BytesRef, IntsRef). Perhaps the Encoder can still be streaming (as we
> don't know in advance how many ints will be written), dunno. Will figure this out as we go.
> One thing to check is whether the bulk API can work w/ e.g. facet associations, which
> can write arbitrary byte[], and so maybe decoding to an IntsRef won't make sense. This too
> we'll figure out as we go. I don't rule out that associations will use a different bulk API.
> At the end of the day, the requirement is for someone to be able to configure how ordinals
> are written (i.e. different encoding schemes: VInt, PackedInts etc.) and later read, with
> as little overhead as possible.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


