lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Updated: (LUCENE-2944) BytesRef reuse bugs in QueryParser and analysis.jsp
Date Tue, 01 Mar 2011 15:07:37 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-2944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-2944:
--------------------------------

    Attachment: LUCENE-2944.patch

I reviewed all uses of this attribute, and fixed some more problems in contrib and solr.

So in my opinion there are two options:
1. Apply this patch and fix the javadoc for this expert attribute, which currently says that it
makes a copy of the bytes.
2. Don't apply this patch, but instead change Test2BTerms and ICUCollationAttribute to make
(useless) copies of the bytes for each term.

The indexer has no problems either way; the problem is only with other consumers. I'm just bringing
up the second option because any performance gained by not copying the bytes
might be negligible, and clearly it's easy to screw this up.
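
To make the second option concrete, here is a rough sketch of a producer that copies its
reused key bytes into the array owned by the target instead of re-pointing the target at
its own buffer. It does not use Lucene: Ref is a hypothetical stand-in for BytesRef, and
CopyingProducerSketch, computeKey and utf8 are made-up names, with UTF-8 bytes standing
in for a collation key.

```java
import java.nio.charset.StandardCharsets;

public class CopyingProducerSketch {

    /** Hypothetical stand-in for BytesRef: a slice over a byte array. */
    public static final class Ref {
        public byte[] bytes = new byte[0];
        public int offset;
        public int length;
    }

    // Internal buffer reused across calls (plays the role of the collation key).
    private byte[] key = new byte[0];
    private int keySize;

    // Assumption: UTF-8 encoding stands in for computing the real key.
    private void computeKey(String term) {
        byte[] src = term.getBytes(StandardCharsets.UTF_8);
        if (key.length < src.length) {
            key = new byte[src.length];
        }
        System.arraycopy(src, 0, key, 0, src.length);
        keySize = src.length;
    }

    /** Option 2: copy the key into the target's own array, growing it if needed. */
    public void fill(String term, Ref target) {
        computeKey(term);
        if (target.bytes.length < keySize) {
            target.bytes = new byte[keySize];
        }
        System.arraycopy(key, 0, target.bytes, 0, keySize);
        target.offset = 0;
        target.length = keySize;
    }

    /** Decodes the slice for display. */
    public static String utf8(Ref ref) {
        return new String(ref.bytes, ref.offset, ref.length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        CopyingProducerSketch producer = new CopyingProducerSketch();
        Ref a = new Ref();
        Ref b = new Ref();
        producer.fill("foo", a);
        producer.fill("bar", b);
        // a keeps its own bytes even though the producer's key buffer was overwritten.
        System.out.println(utf8(a) + " " + utf8(b));
    }
}
```

The cost is one extra array copy per term, which is the "(useless)" copy mentioned above
for a consumer such as the indexer that never holds on to the bytes.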


> BytesRef reuse bugs in QueryParser and analysis.jsp
> ---------------------------------------------------
>
>                 Key: LUCENE-2944
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2944
>             Project: Lucene - Java
>          Issue Type: Bug
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 4.0
>
>         Attachments: LUCENE-2944.patch, LUCENE-2944.patch
>
>
> Some code uses BytesRef as if it were a "String", in this case consumers of TermToBytesRefAttribute.
> The thing is, while our general implementation works on char[] and then populates the
> consumer's BytesRef, not all TermToBytesRefAttribute implementations do this. ICU collation,
> specifically, reuses the bytes and simply sets the pointers:
> {noformat}
>   @Override
>   public int toBytesRef(BytesRef target) {
>     collator.getRawCollationKey(toString(), key);
>     target.bytes = key.bytes;
>     target.offset = 0;
>     target.length = key.size;
>     return target.hashCode();
>   }
> {noformat}
> Most of the blame falls on me as I added this to the queryparser in LUCENE-2514.
> Attached is a patch so that these consumers re-use a 'spare' and copy the bytes when
> they are going to make a long-lasting object such as a Term.
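
For readers who want to see the failure mode in isolation, the following is a minimal
sketch and not the attached patch. It avoids Lucene entirely: Ref is a hypothetical
stand-in for BytesRef, ReusingProducer imitates an implementation that only sets the
pointers, and the two collect methods are made-up consumers. Terms are assumed to fit in
the fixed 16-byte buffer.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BytesRefReuseSketch {

    /** Hypothetical stand-in for BytesRef: a slice over a byte array. */
    static final class Ref {
        byte[] bytes = new byte[0];
        int offset;
        int length;

        /** Deep copy: the result owns a private array. */
        Ref copy() {
            Ref c = new Ref();
            c.bytes = Arrays.copyOfRange(bytes, offset, offset + length);
            c.length = length;
            return c;
        }

        String utf8() {
            return new String(bytes, offset, length, StandardCharsets.UTF_8);
        }
    }

    /** Points the target at its own reused buffer, as the ICU collation code above does. */
    static final class ReusingProducer {
        private final byte[] key = new byte[16]; // assumption: terms fit in 16 bytes

        void fill(String term, Ref target) {
            byte[] src = term.getBytes(StandardCharsets.UTF_8);
            System.arraycopy(src, 0, key, 0, src.length);
            target.bytes = key; // shared buffer, no copy
            target.offset = 0;
            target.length = src.length;
        }
    }

    /** Buggy consumer: keeps each Ref as if it were a String; all of them alias one buffer. */
    public static List<String> collectWithoutCopy(String... terms) {
        ReusingProducer producer = new ReusingProducer();
        List<Ref> kept = new ArrayList<>();
        for (String term : terms) {
            Ref ref = new Ref();
            producer.fill(term, ref);
            kept.add(ref);
        }
        return render(kept);
    }

    /** Fixed consumer: reuses one spare and copies only when keeping the value. */
    public static List<String> collectWithCopy(String... terms) {
        ReusingProducer producer = new ReusingProducer();
        Ref spare = new Ref();
        List<Ref> kept = new ArrayList<>();
        for (String term : terms) {
            producer.fill(term, spare);
            kept.add(spare.copy());
        }
        return render(kept);
    }

    private static List<String> render(List<Ref> refs) {
        List<String> out = new ArrayList<>();
        for (Ref ref : refs) {
            out.add(ref.utf8());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collectWithoutCopy("foo", "bar")); // prints [bar, bar]
        System.out.println(collectWithCopy("foo", "bar"));    // prints [foo, bar]
    }
}
```

In the buggy consumer, a fresh Ref per term does not help, because the producer replaces
the Ref's array with its own buffer on every call.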


        
