lucene-solr-dev mailing list archives

From "Shalin Shekhar Mangar (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-1662) BufferedTokenStream incorrect cloning
Date Thu, 17 Dec 2009 10:16:18 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12791866#action_12791866 ]

Shalin Shekhar Mangar commented on SOLR-1662:
---------------------------------------------

{quote}
So if we decide it's the responsibility of the subclass, these implementations need thorough
tests to see if they are ok or not.
If we add the cloning to BufferedTokenStream itself, then we know they are ok...
{quote}

I think subclasses should do the cloning, just before they write a token. If BufferedTokenStream
clones every token, then every subclass pays the price, even when its use case is just to throw
the token away.
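
A minimal sketch of the cost argument above, not Solr code: `Tok`, `CloneCostSketch`, and
`copiesMade` are hypothetical stand-ins, and the filter that drops the term "the" is only an
assumed example of a subclass that throws tokens away. It counts how many copies are made
when the base class clones every token versus when the subclass clones only what it writes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are not Solr or Lucene classes.
final class CloneCostSketch {
    static final class Tok {
        final String term;
        Tok(String term) { this.term = term; }
    }

    private int copies = 0;

    private Tok copy(Tok t) {
        copies++;
        return new Tok(t.term);
    }

    // Returns how many copies were made while filtering out the term "the".
    // eagerBaseClone = true models the base class cloning every token it reads;
    // false models a subclass that clones only the tokens it writes.
    static int copiesMade(String[] terms, boolean eagerBaseClone) {
        CloneCostSketch sketch = new CloneCostSketch();
        List<Tok> written = new ArrayList<>();
        for (String term : terms) {
            Tok t = new Tok(term);
            if (eagerBaseClone) {
                t = sketch.copy(t);      // paid for every token, kept or not
            }
            if (term.equals("the")) {
                continue;                // token is thrown away
            }
            // Lazy variant: clone only at the point of writing.
            written.add(eagerBaseClone ? t : sketch.copy(t));
        }
        return sketch.copies;
    }

    public static void main(String[] args) {
        String[] terms = {"the", "quick", "the", "fox"};
        System.out.println(copiesMade(terms, true));   // prints 4
        System.out.println(copiesMade(terms, false));  // prints 2
    }
}
```

In this toy input half of the tokens are discarded, so cloning on write halves the number of
copies. The flip side, raised in the quote, is that each subclass then has to get the cloning
right on its own.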

> BufferedTokenStream incorrect cloning
> -------------------------------------
>
>                 Key: SOLR-1662
>                 URL: https://issues.apache.org/jira/browse/SOLR-1662
>             Project: Solr
>          Issue Type: Bug
>          Components: Schema and Analysis
>    Affects Versions: 1.4
>            Reporter: Robert Muir
>
> As part of writing tests for SOLR-1657, I rewrote one of the base classes (BaseTokenTestCase)
> to use the new TokenStream API, but also with some additional safety.
> {code}
>   public static String tsToString(TokenStream in) throws IOException {
>     StringBuilder out = new StringBuilder();
>     TermAttribute termAtt = (TermAttribute) in.addAttribute(TermAttribute.class);
>     // extra safety to enforce, that the state is not preserved and also
>     // assign bogus values
>     in.clearAttributes();
>     termAtt.setTermBuffer("bogusTerm");
>     while (in.incrementToken()) {
>       if (out.length() > 0)
>         out.append(' ');
>       out.append(termAtt.term());
>       in.clearAttributes();
>       termAtt.setTermBuffer("bogusTerm");
>     }
>     in.close();
>     return out.toString();
>   }
> {code}
> Setting the term text to bogus values helps find bugs in tokenstreams that do not clear
> or clone properly. In this case there is a problem with the tokenstream AB_AAB_Stream in
> TestBufferedTokenStream: it converts A B -> A A B but does not clone, so the values get
> overwritten.
> This can be fixed in two ways: 
> * BufferedTokenStream does the cloning
> * subclasses are responsible for the cloning
> The question is which one should it be?
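
The overwrite described in the issue can be modeled roughly as follows. This is a simplified
sketch under assumptions, not the Solr test: `MutableTok` and `OverwriteSketch` are hypothetical
names, and the consumer loop only imitates the way tsToString scribbles "bogusTerm" over a token
after reading it. When the same token instance is queued twice for A B -> A A B, the second
emission sees the overwritten value unless the duplicate was cloned.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical mutable token; not the Lucene Token class.
final class MutableTok {
    String term;
    MutableTok(String term) { this.term = term; }
    MutableTok copy() { return new MutableTok(term); }
}

final class OverwriteSketch {
    // Models a stream that turns "A B" into "A A B" by queueing A twice.
    // cloneDuplicate decides whether the second A is a copy or the same instance.
    static String run(boolean cloneDuplicate) {
        MutableTok a = new MutableTok("A");
        MutableTok b = new MutableTok("B");

        List<MutableTok> pending = new ArrayList<>();
        pending.add(a);
        pending.add(cloneDuplicate ? a.copy() : a);
        pending.add(b);

        StringBuilder out = new StringBuilder();
        for (MutableTok t : pending) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(t.term);
            // The consumer overwrites the token it just read, as the test helper does.
            t.term = "bogusTerm";
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints "A bogusTerm B"
        System.out.println(run(true));  // prints "A A B"
    }
}
```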

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

