lucene-solr-dev mailing list archives

From "Yonik Seeley (JIRA)" <>
Subject [jira] Commented: (SOLR-1423) Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream & others
Date Tue, 15 Sep 2009 14:25:57 GMT


Yonik Seeley commented on SOLR-1423:

bq. This Tokenizer is not backwards compatible, as it only returns non-zero length tokens.
Maybe we should have a switch somewhere to change this behaviour.

Passing through only non-zero length tokens was probably always the intent. The old behavior
is a bug and isn't useful, so I don't think we need a switch.
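
To illustrate the behavior under discussion, here is a minimal sketch of a delimiter-based split that passes through only non-zero length tokens. This message does not name the Tokenizer in question, so the class NonEmptyTokensSketch and the method tokenize are hypothetical and use only java.util.regex, not Solr or Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the Solr Tokenizer discussed in this issue.
public class NonEmptyTokensSketch {

    // Split text on the delimiter and keep only tokens with length > 0.
    static List<String> tokenize(String text, Pattern delimiter) {
        List<String> tokens = new ArrayList<>();
        // A limit of -1 keeps trailing empty strings, so every empty piece
        // is visible to the length check below.
        for (String piece : delimiter.split(text, -1)) {
            if (piece.length() > 0) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Adjacent and trailing delimiters would otherwise yield empty tokens.
        System.out.println(tokenize("a,,b,", Pattern.compile(","))); // prints [a, b]
    }
}
```

Without the length check, the same input would produce two zero-length tokens, which is the old behavior described above as a bug.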

> Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream & others
> --------------------------------------------------------------------------------
>                 Key: SOLR-1423
>                 URL:
>             Project: Solr
>          Issue Type: Task
>          Components: Analysis
>    Affects Versions: 1.4
>            Reporter: Uwe Schindler
>            Assignee: Koji Sekiguchi
>             Fix For: 1.4
>         Attachments: SOLR-1423-FieldType.patch, SOLR-1423-fix-empty-tokens.patch, SOLR-1423-with-empty-tokens.patch, SOLR-1423.patch, SOLR-1423.patch, SOLR-1423.patch
> Because of some backwards compatibility problems (LUCENE-1906) we changed the CharStream/CharFilter
> API a little bit. Tokenizer now only has an input field of type (as before the
> CharStream code). To correct offsets, it is now necessary to call the Tokenizer.correctOffset(int)
> method, which delegates to the CharStream (if input is a subclass of CharStream) and otherwise returns
> an uncorrected offset. Normally it is enough to change all occurrences of input.correctOffset()
> to this.correctOffset() in Tokenizers. It should also be checked whether custom Tokenizers in
> Solr correct their offsets.
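
The delegation described in the issue text can be sketched roughly as follows. This is a simplified model and not the Lucene source: SketchCharStream, PrefixStrippingStream and SketchTokenizer are hypothetical stand-ins, and holding the input as a java.io.Reader is an assumption made for this sketch. Only the correctOffset(int) name and the "delegate if the input is a CharStream, otherwise return the offset unchanged" rule come from the description above.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Simplified model of the offset-correction pattern; none of these classes are Lucene's.
public class CorrectOffsetSketch {

    // Hypothetical stand-in for CharStream: a Reader that can map an offset in the
    // filtered text back to an offset in the original text.
    static abstract class SketchCharStream extends Reader {
        abstract int correctOffset(int currentOff);
    }

    // Hypothetical char filter that drops a fixed-length prefix from the original text.
    static class PrefixStrippingStream extends SketchCharStream {
        private final Reader in;
        private final int stripped;

        PrefixStrippingStream(String original, int stripped) {
            this.in = new StringReader(original.substring(stripped));
            this.stripped = stripped;
        }

        @Override
        int correctOffset(int currentOff) {
            // Offsets in the filtered text are shifted by the stripped prefix length.
            return currentOff + stripped;
        }

        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            return in.read(cbuf, off, len);
        }

        @Override
        public void close() throws IOException {
            in.close();
        }
    }

    // Hypothetical stand-in for Tokenizer (input type is an assumption of this sketch).
    static class SketchTokenizer {
        protected final Reader input;

        SketchTokenizer(Reader input) {
            this.input = input;
        }

        // Delegate when the input can correct offsets; otherwise return the offset as-is.
        protected final int correctOffset(int currentOff) {
            return (input instanceof SketchCharStream)
                    ? ((SketchCharStream) input).correctOffset(currentOff)
                    : currentOff;
        }

        // A tokenizer calls this.correctOffset(...) rather than input.correctOffset(...).
        int startOffsetOf(int rawOffset) {
            return this.correctOffset(rawOffset);
        }
    }

    public static void main(String[] args) {
        SketchTokenizer plain = new SketchTokenizer(new StringReader("hello"));
        SketchTokenizer filtered = new SketchTokenizer(new PrefixStrippingStream("<b>hello", 3));
        System.out.println(plain.startOffsetOf(0));    // prints 0 (uncorrected)
        System.out.println(filtered.startOffsetOf(0)); // prints 3 (corrected)
    }
}
```

In this model a tokenizer that reports raw offsets without going through correctOffset would return 0 in both cases, which is the kind of mistake the last sentence of the description asks to check for in custom Tokenizers.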

