lucene-dev mailing list archives

From "Uwe Schindler (Issue Comment Edited) (JIRA)" <j...@apache.org>
Subject [jira] [Issue Comment Edited] (LUCENE-3654) Optimize BytesRef comparator to use Unsafe long based comparison (when possible)
Date Wed, 21 Dec 2011 10:09:31 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-3654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173973#comment-13173973 ]

Uwe Schindler edited comment on LUCENE-3654 at 12/21/11 10:08 AM:
------------------------------------------------------------------

The SIGSEGV can be solved by doing some safety checks at the beginning of compare: check that
offset>=0 and offset+length<=bytes.length. If you use Unsafe, you have to make sure
that your parameters are 1000% correct, that's all. This is why java.nio does lots of checks
in their Buffer methods.

*EDIT*
You also have to copy offset, length and the actual byte[] reference to local variables at
the beginning, before the bounds checks (because otherwise another thread could change
the *public* non-final fields in BytesRef and cause OOM). BytesRef is a user-visible class
so it must be 100% safe against all usage-violations.
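
A minimal sketch of the two points above, not the code from the attached patch. The `Ref`
class is a hypothetical stand-in for BytesRef (public, non-final fields), and the names
`checkBounds` and `checkedCompare` are assumptions made for illustration. The fields are
read once into locals, the locals are validated, and only the locals are used afterwards.
A plain byte loop stands in for the Unsafe-based comparison.

```java
import java.util.Objects;

final class SafeCompareSketch {

    /** Hypothetical stand-in for BytesRef: public, non-final fields. */
    static final class Ref {
        public byte[] bytes;
        public int offset;
        public int length;

        Ref(byte[] bytes, int offset, int length) {
            this.bytes = bytes;
            this.offset = offset;
            this.length = length;
        }
    }

    /** Validates a snapshot of the fields, never the live fields. */
    static void checkBounds(byte[] bytes, int offset, int length) {
        Objects.requireNonNull(bytes, "bytes");
        // Written as offset > bytes.length - length so that offset + length cannot overflow.
        if (offset < 0 || length < 0 || offset > bytes.length - length) {
            throw new IndexOutOfBoundsException(
                "offset=" + offset + " length=" + length + " bytes.length=" + bytes.length);
        }
    }

    static int checkedCompare(Ref a, Ref b) {
        // Snapshot first: another thread may change the public fields at any time.
        final byte[] aBytes = a.bytes;
        final int aOff = a.offset;
        final int aLen = a.length;
        final byte[] bBytes = b.bytes;
        final int bOff = b.offset;
        final int bLen = b.length;

        // Check the snapshot, then use only the snapshot below.
        checkBounds(aBytes, aOff, aLen);
        checkBounds(bBytes, bOff, bLen);

        final int min = Math.min(aLen, bLen);
        for (int i = 0; i < min; i++) {
            // Compare as unsigned bytes.
            int cmp = (aBytes[aOff + i] & 0xFF) - (bBytes[bOff + i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return aLen - bLen;
    }
}
```

With Unsafe in place of the byte loop, the checks on the local copies are the only thing
standing between a bad offset/length and a raw out-of-bounds memory read.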

Based on this additional overhead, the whole comparator makes no sense except for terms with
a size of around 200 bytes or more. But Lucene terms are shorter in 99% of all cases.

If you want to use this comparator, just subclass Lucene40Codec and return it as the term comparator;
this can be done completely outside Lucene. You can even use Guava.
                
      was (Author: thetaphi):
    The SIGSEGV can be solved by doing some safety checks at the beginning of compare: check
that offset>=0 and offset+length<=bytes.length. If you use Unsafe, you have to make
sure that your parameters are 1000% correct, that's all. This is why java.nio does lots of
checks in their Buffer methods.
                  
> Optimize BytesRef comparator to use Unsafe long based comparison (when possible)
> --------------------------------------------------------------------------------
>
>                 Key: LUCENE-3654
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3654
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index, core/search
>            Reporter: Shay Banon
>         Attachments: LUCENE-3654.patch
>
>
> Inspired by the Google Guava UnsignedBytes lexicographical comparator, which uses Unsafe to do long-based
> comparisons over the bytes instead of one by one (which yields 2-4x better perf), use similar
> logic in the BytesRef comparator. The code was adapted to support offset/length.
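
The idea in the description can be sketched without Unsafe. The example below is neither
Guava's code nor the attached patch: it swaps in a bounds-checked java.nio.ByteBuffer to
read 8 bytes at a time, and the class and method names are hypothetical. Reading the longs
big-endian and comparing them as unsigned values gives the same ordering as an unsigned
byte-by-byte comparison.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class LongWiseCompareSketch {

    static int compare(byte[] a, int aOff, int aLen, byte[] b, int bOff, int bLen) {
        // wrap() throws IndexOutOfBoundsException for an invalid offset/length.
        ByteBuffer ba = ByteBuffer.wrap(a, aOff, aLen).order(ByteOrder.BIG_ENDIAN);
        ByteBuffer bb = ByteBuffer.wrap(b, bOff, bLen).order(ByteOrder.BIG_ENDIAN);

        final int min = Math.min(aLen, bLen);
        int i = 0;

        // Compare 8 bytes per step while a full long fits in both ranges.
        for (; i + Long.BYTES <= min; i += Long.BYTES) {
            long la = ba.getLong(aOff + i); // absolute index into the backing array
            long lb = bb.getLong(bOff + i);
            if (la != lb) {
                return Long.compareUnsigned(la, lb);
            }
        }

        // Compare the remaining tail byte by byte, as unsigned values.
        for (; i < min; i++) {
            int cmp = (a[aOff + i] & 0xFF) - (b[bOff + i] & 0xFF);
            if (cmp != 0) {
                return cmp;
            }
        }
        return aLen - bLen;
    }
}
```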
