lucene-solr-dev mailing list archives

From "Uwe Schindler (JIRA)" <>
Subject [jira] Updated: (SOLR-1423) Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream & others
Date Tue, 15 Sep 2009 10:21:58 GMT


Uwe Schindler updated SOLR-1423:

    Attachment: SOLR-1423-fix-empty-tokens.patch

This patch fixes the empty tokens.
With it, this Tokenizer is not backwards compatible, as it only returns non-zero-length tokens. Maybe
we should have a switch somewhere to change this behaviour. The patch is currently up for discussion.

> Lucene 2.9 RC4 may need some changes in Solr Analyzers using CharStream & others
> --------------------------------------------------------------------------------
>                 Key: SOLR-1423
>                 URL:
>             Project: Solr
>          Issue Type: Task
>          Components: Analysis
>    Affects Versions: 1.4
>            Reporter: Uwe Schindler
>            Assignee: Koji Sekiguchi
>             Fix For: 1.4
>         Attachments: SOLR-1423-FieldType.patch, SOLR-1423-fix-empty-tokens.patch, SOLR-1423-with-empty-tokens.patch,
SOLR-1423.patch, SOLR-1423.patch, SOLR-1423.patch
> Because of some backwards compatibility problems (LUCENE-1906) we changed the CharStream/CharFilter
API a little bit. Tokenizer now only has an input field of type (as before the
CharStream code). To correct offsets, it is now necessary to call the Tokenizer.correctOffset(int)
method, which delegates to the CharStream (if input is a subclass of CharStream) and otherwise returns
an uncorrected offset. Normally it is enough to change all occurrences of input.correctOffset()
to this.correctOffset() in Tokenizers. It should also be checked whether custom Tokenizers in
Solr correct their offsets.
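
The delegation described above can be illustrated with a small, self-contained sketch. It does not use the Lucene classes: OffsetCorrectingReader, ShiftingReader, SketchTokenizer and tokenStartOffset are hypothetical stand-ins invented for illustration, and the sketch assumes that the tokenizer's input field is a plain java.io.Reader. Only the shape of correctOffset(int) follows the description: delegate when the input can correct offsets, otherwise return the offset unchanged.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class CorrectOffsetSketch {

    /** Hypothetical stand-in for a CharStream: a Reader that can map offsets back to the original text. */
    static abstract class OffsetCorrectingReader extends Reader {
        abstract int correctOffset(int currentOff);
    }

    /** Hypothetical filter that pretends `shift` characters were removed ahead of the text it passes on. */
    static class ShiftingReader extends OffsetCorrectingReader {
        private final Reader in;
        private final int shift;

        ShiftingReader(Reader in, int shift) {
            this.in = in;
            this.shift = shift;
        }

        @Override
        int correctOffset(int currentOff) {
            return currentOff + shift;
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            return in.read(buf, off, len);
        }

        @Override
        public void close() throws IOException {
            in.close();
        }
    }

    /** Hypothetical stand-in for a Tokenizer base class (assumption: input is a plain Reader). */
    static class SketchTokenizer {
        protected final Reader input;

        SketchTokenizer(Reader input) {
            this.input = input;
        }

        /** Delegates to the input if it can correct offsets, else returns the offset unchanged. */
        protected final int correctOffset(int currentOff) {
            return (input instanceof OffsetCorrectingReader)
                    ? ((OffsetCorrectingReader) input).correctOffset(currentOff)
                    : currentOff;
        }

        /** A tokenizer calls this.correctOffset(...) rather than input.correctOffset(...). */
        int tokenStartOffset(int rawStart) {
            return this.correctOffset(rawStart);
        }
    }

    public static void main(String[] args) {
        SketchTokenizer plain = new SketchTokenizer(new StringReader("some text"));
        SketchTokenizer filtered =
                new SketchTokenizer(new ShiftingReader(new StringReader("some text"), 3));

        System.out.println(plain.tokenStartOffset(5));    // plain Reader: offset is returned as-is
        System.out.println(filtered.tokenStartOffset(5)); // correcting Reader: offset is shifted by 3
    }
}
```

Because the check lives in the tokenizer base class, a tokenizer written this way works with either kind of input without casting, which is why replacing input.correctOffset() with this.correctOffset() is usually sufficient.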

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
