lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2944) BytesRef reuse bugs in QueryParser and analysis.jsp
Date Tue, 01 Mar 2011 15:21:36 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13000915#comment-13000915 ]

Robert Muir commented on LUCENE-2944:
-------------------------------------

Well, it's not just the ICU implementation... Test2BTerms does this too.

In general, the attributes are owned by the producer. For example, the char[] in TermAttribute
is owned by the analysis chain; if you want to hold on to it, you should copy it.

So it would be very strange for the analysis API to treat byte[] in the completely opposite
fashion... but I'm fine with taking steps to prevent bugs.
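
A minimal sketch of why the copy matters. Note that `Ref` and `ReusingProducer` are
hypothetical stand-ins written for this example (they are not Lucene or ICU classes); the
producer mimics the "set the pointers" style by handing out its own buffer each time.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ProducerOwnedBytes {

    /** Hypothetical stand-in for a bytes reference: a view (bytes, offset, length), not a value. */
    static final class Ref {
        byte[] bytes = new byte[0];
        int offset;
        int length;

        /** Deep copy: the result no longer shares the producer's array. */
        Ref copy() {
            Ref r = new Ref();
            r.bytes = Arrays.copyOfRange(bytes, offset, offset + length);
            r.offset = 0;
            r.length = length;
            return r;
        }

        String utf8() {
            return new String(bytes, offset, length, StandardCharsets.UTF_8);
        }
    }

    /** Hypothetical producer that reuses one internal buffer and only sets pointers. */
    static final class ReusingProducer {
        private final byte[] key = new byte[16]; // assumption: terms fit in 16 bytes

        void fill(String term, Ref target) {
            byte[] encoded = term.getBytes(StandardCharsets.UTF_8);
            System.arraycopy(encoded, 0, key, 0, encoded.length);
            target.bytes = key;      // the consumer now sees the producer's buffer
            target.offset = 0;
            target.length = encoded.length;
        }
    }

    /** Collects terms into long-lived objects, with or without copying the bytes first. */
    static List<String> collect(boolean copyBeforeKeeping, String... terms) {
        ReusingProducer producer = new ReusingProducer();
        Ref spare = new Ref();       // scratch reference, reused for every term
        List<Ref> kept = new ArrayList<>();
        for (String term : terms) {
            producer.fill(term, spare);
            if (copyBeforeKeeping) {
                kept.add(spare.copy());
            } else {
                Ref alias = new Ref();   // new object, but still the same byte[]
                alias.bytes = spare.bytes;
                alias.offset = spare.offset;
                alias.length = spare.length;
                kept.add(alias);
            }
        }
        List<String> out = new ArrayList<>();
        for (Ref r : kept) {
            out.add(r.utf8());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect(false, "foo", "bar")); // [bar, bar]
        System.out.println(collect(true, "foo", "bar"));  // [foo, bar]
    }
}
```

Without the copy, every kept reference points at the producer's single buffer, so earlier
terms silently change when the next one is produced.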

> BytesRef reuse bugs in QueryParser and analysis.jsp
> ---------------------------------------------------
>
>                 Key: LUCENE-2944
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2944
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 4.0
>
>         Attachments: LUCENE-2944.patch, LUCENE-2944.patch
>
>
> Some code, in this case consumers of TermToBytesRefAttribute, uses BytesRef as if it were a "String".
> The thing is, while our general implementation works on char[] and then populates the consumer's BytesRef,
> not all TermToBytesRefAttribute implementations do this. Specifically, ICU collation reuses the bytes and simply sets the pointers:
> {noformat}
>   @Override
>   public int toBytesRef(BytesRef target) {
>     collator.getRawCollationKey(toString(), key);
>     target.bytes = key.bytes;
>     target.offset = 0;
>     target.length = key.size;
>     return target.hashCode();
>   }
> {noformat}
> Most of the blame falls on me as I added this to the queryparser in LUCENE-2514.
> Attached is a patch so that these consumers re-use a 'spare' and copy the bytes when they are going to make a long-lasting object such as a Term.


