lucene-solr-dev mailing list archives

From "Koji Sekiguchi (JIRA)" <>
Subject [jira] Updated: (SOLR-1423) Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream & others
Date Mon, 14 Sep 2009 02:26:57 GMT


Koji Sekiguchi updated SOLR-1423:

    Attachment: SOLR-1423.patch

I had intended to call tokenizer.correctOffset() from the newToken() method, but could not because
the method is protected. In this patch, I converted the anonymous Tokenizer class into a named
class, PatternTokenizer, which exposes the following public wrapper:

+    public int correct( int currentOffset ){                                   
+      return correctOffset( currentOffset );                                   
+    }                                                                          
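
The wrapper approach can be tried out without Lucene on the classpath. The following is only a
minimal sketch: BaseTokenizer is a hypothetical stand-in for Lucene's Tokenizer, the
PatternTokenizerSketch and WrapperDemo names are made up, and the fixed +2 correction is invented
purely for illustration. All classes here share one package, where Java would permit the protected
call directly; in the real case Tokenizer lives in a different package from the Solr code, which is
why the public wrapper is needed.

```java
// Hypothetical stand-in for Lucene's Tokenizer; not the real class.
abstract class BaseTokenizer {
    // Protected, like Tokenizer.correctOffset(int): visible to subclasses,
    // but not to unrelated classes in other packages.
    protected int correctOffset(int currentOffset) {
        return currentOffset + 2; // invented correction, for illustration only
    }
}

// Named subclass replacing the anonymous Tokenizer, as described in the patch.
class PatternTokenizerSketch extends BaseTokenizer {
    // Public wrapper that re-exposes the protected method.
    public int correct(int currentOffset) {
        return correctOffset(currentOffset);
    }
}

class WrapperDemo {
    // Plays the role of newToken(): a helper that only holds a tokenizer
    // reference and goes through the public wrapper to correct an offset.
    static int correctedStart(PatternTokenizerSketch tokenizer, int rawStart) {
        return tokenizer.correct(rawStart);
    }

    public static void main(String[] args) {
        System.out.println(correctedStart(new PatternTokenizerSketch(), 5)); // prints 7
    }
}
```

The wrapper only works on a named class: an anonymous subclass cannot add a method that callers
holding the base type can see, which is presumably why the conversion to PatternTokenizer was made.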

> Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream & others
> --------------------------------------------------------------------------------
>                 Key: SOLR-1423
>                 URL:
>             Project: Solr
>          Issue Type: Task
>          Components: Analysis
>    Affects Versions: 1.4
>            Reporter: Uwe Schindler
>            Assignee: Koji Sekiguchi
>             Fix For: 1.4
>         Attachments: SOLR-1423.patch
> Because of some backwards compatibility problems (LUCENE-1906) we changed the CharStream/CharFilter
> API a little bit. Tokenizer now only has an input field of type (as before the
> CharStream code). To correct offsets, you now need to call the Tokenizer.correctOffset(int)
> method, which delegates to the CharStream (if input is a subclass of CharStream) and otherwise
> returns the offset uncorrected. Normally it is enough to change all occurrences of
> input.correctOffset() to this.correctOffset() in Tokenizers. It should also be checked whether
> custom Tokenizers in Solr correct their offsets.
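
The delegation described above can be sketched in plain Java. This is an illustration under stated
assumptions, not the Lucene source: SketchCharStream, ShiftedStream, SketchTokenizer and OffsetProbe
are hypothetical stand-ins, the input field is declared as java.io.Reader (the message does not
name the type), and the fixed shift is an invented correction.

```java
import java.io.FilterReader;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical stand-in for CharStream: a Reader that can map an offset in
// the filtered text back to an offset in the original text.
abstract class SketchCharStream extends FilterReader {
    SketchCharStream(Reader in) { super(in); }
    abstract int correctOffset(int currentOff);
}

// Toy stream that reports every offset as shifted by a fixed amount
// (invented for illustration; it does not actually alter the text).
class ShiftedStream extends SketchCharStream {
    private final int shift;
    ShiftedStream(Reader in, int shift) { super(in); this.shift = shift; }
    @Override int correctOffset(int currentOff) { return currentOff + shift; }
}

// Hypothetical stand-in for Tokenizer, with the delegation in one place.
abstract class SketchTokenizer {
    protected Reader input; // assumed type for this sketch

    SketchTokenizer(Reader input) { this.input = input; }

    // Delegates to the stream if input is a SketchCharStream,
    // otherwise returns the offset uncorrected.
    protected int correctOffset(int currentOff) {
        return (input instanceof SketchCharStream)
                ? ((SketchCharStream) input).correctOffset(currentOff)
                : currentOff;
    }
}

// A tokenizer-like subclass after the suggested change.
class OffsetProbe extends SketchTokenizer {
    OffsetProbe(Reader input) { super(input); }

    int startOffset(int rawOffset) {
        // before the change: input.correctOffset(rawOffset)
        return this.correctOffset(rawOffset);
    }
}

class DelegationDemo {
    public static void main(String[] args) {
        Reader plain = new StringReader("hello");
        Reader filtered = new ShiftedStream(new StringReader("hello"), 3);
        System.out.println(new OffsetProbe(plain).startOffset(4));    // prints 4
        System.out.println(new OffsetProbe(filtered).startOffset(4)); // prints 7
    }
}
```

Keeping the instanceof check in the base class means a subclass never needs to know whether a
CharStream is in use, which matches the "change input.correctOffset() to this.correctOffset()"
advice.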

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
