lucene-solr-dev mailing list archives

From "Koji Sekiguchi (JIRA)" <>
Subject [jira] Updated: (SOLR-1423) Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream & others
Date Mon, 14 Sep 2009 23:32:57 GMT


Koji Sekiguchi updated SOLR-1423:

    Attachment: SOLR-1423.patch

This patch is Uwe's, with the split()/group() methods replaced.

bq. Why does the PatternTokenizer not have the methods newToken and so on in its own
Yeah, I'd realized it immediately after posting the patch, but I was going to be out.

And thank you for adapting it to the new TokenStream API.

bq. I searched for setOffset() in Solr source code and found one additional occurrence of it
without offset correcting in This patch fixes this.
Good catch, Uwe! I overlooked it.

I think the empty tokens are a bug and should be omitted in this patch.

bq. A second thing: Lucene has a new BaseTokenStreamTest class for checking tokens without
Token instances (which would no longer work when Lucene 3.0 switches to Attributes only).
Maybe you should update these tests and use assertAnalyzesTo from the new base class instead.
Very nice! Can you open a separate ticket?

> Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream & others
> --------------------------------------------------------------------------------
>                 Key: SOLR-1423
>                 URL:
>             Project: Solr
>          Issue Type: Task
>          Components: Analysis
>    Affects Versions: 1.4
>            Reporter: Uwe Schindler
>            Assignee: Koji Sekiguchi
>             Fix For: 1.4
>         Attachments: SOLR-1423-FieldType.patch, SOLR-1423.patch, SOLR-1423.patch, SOLR-1423.patch
> Because of some backwards compatibility problems (LUCENE-1906) we changed the CharStream/CharFilter
API a little bit. Tokenizer now only has an input field of type (as before the
CharStream code). To correct offsets, you now need to call the Tokenizer.correctOffset(int)
method, which delegates to the CharStream (if input is a subclass of CharStream) and otherwise returns
an uncorrected offset. Normally it is enough to change all occurrences of input.correctOffset()
to this.correctOffset() in Tokenizers. It should also be checked whether custom Tokenizers in
Solr correct their offsets.
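
The delegation described above can be sketched roughly as follows. This is only an illustration using plain java.io classes: SketchCharStream, PrefixStrippingStream, SketchTokenizer and correctedSpan() are hypothetical stand-ins invented for this example, not the Lucene or Solr classes, and the real signatures may differ.

```java
import java.io.FilterReader;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in for a CharStream: a Reader that can map an offset in
// the filtered text back to an offset in the original text.
abstract class SketchCharStream extends FilterReader {
    protected SketchCharStream(Reader in) {
        super(in);
    }

    abstract int correctOffset(int currentOff);
}

// Hypothetical filter that pretends a fixed-length prefix was stripped from
// the original text, so every offset shifts by that length.
class PrefixStrippingStream extends SketchCharStream {
    private final int strippedChars;

    PrefixStrippingStream(Reader in, int strippedChars) {
        super(in);
        this.strippedChars = strippedChars;
    }

    @Override
    int correctOffset(int currentOff) {
        return currentOff + strippedChars;
    }
}

// Hypothetical stand-in for a Tokenizer base class: it holds a plain Reader
// and exposes correctOffset(int) itself.
class SketchTokenizer {
    protected final Reader input;

    SketchTokenizer(Reader input) {
        this.input = input;
    }

    // Delegate when the input can correct offsets; otherwise return the
    // offset unchanged.
    protected final int correctOffset(int currentOff) {
        return (input instanceof SketchCharStream)
                ? ((SketchCharStream) input).correctOffset(currentOff)
                : currentOff;
    }

    // Before the change a subclass would have called input.correctOffset(...);
    // after it, the call goes through this.correctOffset(...).
    int[] correctedSpan(int start, int end) {
        return new int[] { this.correctOffset(start), this.correctOffset(end) };
    }
}
```

With a plain StringReader as input the span comes back unchanged; wrapping the same reader in the hypothetical PrefixStrippingStream shifts both ends by the stripped length. A tokenizer that sets raw offsets without going through correctOffset() would report positions in the filtered text rather than the original, which is the kind of omission the issue asks to check for.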

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
