lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <>
Subject [jira] Commented: (LUCENE-1906) Problem with CharStream and Tokenizers with custom reset(Reader) method
Date Thu, 10 Sep 2009 16:35:58 GMT


Uwe Schindler commented on LUCENE-1906:

bq. I will now check, if the change of the "input" member variable leads to backwards breaks
(it was changed from Reader to CharStream)...

We also have a backwards break that is not listed in CHANGES.txt: the type of the "input" member
was changed from Reader to CharStream. When this change was committed, the backwards branch was
also modified to use CharStream. I reverted that modification (to restore the state of a legacy
Tokenizer) and wrote a test that accesses "input" from inside the Tokenizer. The test compiled
and ran correctly in the backwards branch.

In trunk, the test-tag failed:
    [junit] Testcase: testChangeToCharStream29(org.apache.lucene.analysis.TestTokenizer): Caused an ERROR
    [junit] input
    [junit] java.lang.NoSuchFieldError: input
    [junit]     at org.apache.lucene.analysis.TestTokenizer$
    [junit]     at org.apache.lucene.analysis.TestTokenizer.testChangeToCharStream29(
    [junit] Test org.apache.lucene.analysis.TestTokenizer FAILED

So users cannot use their old Tokenizers without recompiling: the JVM resolves a field reference
by name and declared type, and the already-compiled classes still reference "input" as a Reader.
After recompiling their classes, it works.
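
The linkage failure can be imitated in a single file without compiling against two Lucene versions. The following is only a sketch: DemoCharStream, DemoTokenizer and FieldLinkDemo are hypothetical stand-ins, not Lucene classes, and MethodHandles.Lookup.findGetter is used here because it looks a field up by name and exact declared type, much as a compiled field reference does.

```java
import java.io.Reader;
import java.lang.invoke.MethodHandles;

// Hypothetical stand-in for CharStream: a subtype of Reader.
abstract class DemoCharStream extends Reader {}

// Hypothetical stand-in for Tokenizer after the change.
class DemoTokenizer {
    // Declared as the subtype now; old class files still expect a Reader field.
    protected DemoCharStream input;
}

class FieldLinkDemo {
    /** Returns true if a field "input" with exactly this declared type can be resolved. */
    static boolean resolves(Class<?> declaredType) {
        try {
            MethodHandles.lookup().findGetter(DemoTokenizer.class, "input", declaredType);
            return true;
        } catch (NoSuchFieldException e) {
            // Same root cause as the NoSuchFieldError in the test run: name matches, type does not.
            return false;
        } catch (IllegalAccessException e) {
            throw new AssertionError(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("input as DemoCharStream: " + resolves(DemoCharStream.class));
        System.out.println("input as Reader:         " + resolves(Reader.class));
    }
}
```

The lookup with the new type should succeed and the lookup with Reader should fail, even though every DemoCharStream is a Reader. Recompiling corresponds to switching the requested type to the new one.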

If we want to keep this change, we should state it clearly in the backwards-breaks section of CHANGES.txt.

> Problem with CharStream and Tokenizers with custom reset(Reader) method
> -----------------------------------------------------------------------
>                 Key: LUCENE-1906
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 2.9
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>            Priority: Blocker
>             Fix For: 2.9
>         Attachments: LUCENE-1906.patch
> When reviewing the new CharStream code added to Tokenizers, I found a
> serious problem with backwards compatibility and with other Tokenizers that
> do not override reset(CharStream).
> The problem is that, e.g., CharTokenizer only overrides reset(Reader):
> {code}
>   public void reset(Reader input) throws IOException {
>     super.reset(input);
>     bufferIndex = 0;
>     offset = 0;
>     dataLen = 0;
>   }
> {code}
> If you reset such a Tokenizer with another CharStream (not a Reader), this
> method will never be called, which breaks the whole Tokenizer.
> As CharStream extends Reader, I propose to remove the reset(CharStream)
> method and simply do an instanceof check to detect whether the supplied
> Reader is not a CharStream and wrap it. We could also remove the extra ctor
> (because most Tokenizers have no support for passing CharStreams). If the
> ctor also checks with instanceof and wraps as needed, the code is backwards
> compatible and we do not need to add additional ctors in subclasses.
> As this instanceof check is always done in CharReader.get() why not remove
> ctor(CharStream) and reset(CharStream) completely?
> Any thoughts?
> I would like to fix this somehow before RC4, I'm sorry :(
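
The proposal in the issue description can be sketched as follows. This is a minimal illustration, not the actual patch: all Sketch* classes are simplified, hypothetical stand-ins for CharStream, CharReader, Tokenizer and CharTokenizer, and the initial field values of 5 only simulate a tokenizer that has already consumed input.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical stand-in for CharStream: a Reader that can correct offsets.
abstract class SketchCharStream extends Reader {
    public abstract int correctOffset(int currentOff);
}

// Hypothetical stand-in for CharReader: wraps a plain Reader as a CharStream.
final class SketchCharReader extends SketchCharStream {
    private final Reader in;
    private SketchCharReader(Reader in) { this.in = in; }

    /** The instanceof check: pass a CharStream through, wrap anything else. */
    static SketchCharStream get(Reader input) {
        return input instanceof SketchCharStream
            ? (SketchCharStream) input
            : new SketchCharReader(input);
    }

    @Override public int correctOffset(int currentOff) { return currentOff; }
    @Override public int read(char[] cbuf, int off, int len) throws IOException {
        return in.read(cbuf, off, len);
    }
    @Override public void close() throws IOException { in.close(); }
}

// Only ctor(Reader) and reset(Reader) exist; there is no CharStream overload to bypass.
class SketchTokenizer {
    protected SketchCharStream input;
    SketchTokenizer(Reader input) { this.input = SketchCharReader.get(input); }
    public void reset(Reader input) throws IOException {
        this.input = SketchCharReader.get(input);
    }
}

// Like CharTokenizer, overrides only reset(Reader) to clear its buffer state.
class SketchCharTokenizer extends SketchTokenizer {
    int bufferIndex = 5, offset = 5, dataLen = 5; // simulated mid-stream state
    SketchCharTokenizer(Reader input) { super(input); }
    @Override public void reset(Reader input) throws IOException {
        super.reset(input);
        bufferIndex = 0;
        offset = 0;
        dataLen = 0;
    }
}

class ResetSketchDemo {
    /** Resets with a CharStream and reports whether the subclass override ran. */
    static boolean resetViaCharStream() {
        try {
            SketchCharTokenizer t = new SketchCharTokenizer(new StringReader("a b"));
            SketchCharStream cs = SketchCharReader.get(new StringReader("c d"));
            t.reset(cs); // binds to reset(Reader), so the override is invoked
            return t.bufferIndex == 0 && t.offset == 0 && t.dataLen == 0 && t.input == cs;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("override ran and stream kept: " + resetViaCharStream());
    }
}
```

With a single reset(Reader) entry point, a subclass that overrides only that method still has its state cleared when a CharStream is supplied, and the supplied CharStream is stored without a second layer of wrapping.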

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.