lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <>
Subject [jira] Commented: (LUCENE-2183) Supplementary Character Handling in CharTokenizer
Date Thu, 28 Jan 2010 17:25:34 GMT


Uwe Schindler commented on LUCENE-2183:

bq. we might want to insert a note/warning on the char-based methods, consistent with the
JDK javadocs, for example: "Note: This method cannot handle supplementary characters...". I think
it's important to also include the link to the JDK explanation of what a supplementary character
is.

For that, a javadoc link such as {@link Character#supplementary} would be good. I will fix this
here, as I already have the patch applied, and will commit it later.
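
To illustrate why the char-based methods need that warning, here is a minimal JDK-only sketch (not part of the patch). The class name SupplementaryDemo and the choice of U+10400 as the sample letter are assumptions made for illustration. A supplementary character occupies two chars (a surrogate pair) in a String, so a check on each char never sees the letter, while a check on the code point does.

```java
public class SupplementaryDemo {
    // U+10400 (DESERET CAPITAL LETTER LONG I) lies outside the BMP,
    // so it is stored as a surrogate pair of two chars.
    static final String TEXT = new String(Character.toChars(0x10400));

    // char-based scan: each surrogate is tested on its own.
    static boolean charBasedSeesLetter(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isLetter(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // int-based scan: the surrogate pair is decoded into one code point first.
    static boolean codePointBasedSeesLetter(String s) {
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (Character.isLetter(cp)) {
                return true;
            }
            i += Character.charCount(cp); // advance by 1 or 2 chars
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("chars in string:  " + TEXT.length());
        System.out.println("char-based:       " + charBasedSeesLetter(TEXT));
        System.out.println("code-point-based: " + codePointBasedSeesLetter(TEXT));
    }
}
```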

bq. if possible we might want to include some class-level wording on how the whole thing works.
If you implement the int-based API, you can use your class with all Lucene Versions, and the
backwards-compatibility (bw) layer will make it work correctly with old indexes. If you stay with
only the char-based API, you can use your CharTokenizer only for Version <= 3.0. We can also
mention that it is unnecessary to implement both; the int-based API alone is enough.

++++++1. The old TokenStream API had a check for these problems, right?
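
For the class-level explanation, a rough sketch of the int-based style might help readers. This is not the CharTokenizer code from the patch: CodePointSplitter, LowerCaseLetterSplitter and split(String) are hypothetical stand-ins that only use the JDK, and the hook names isTokenChar(int) and normalize(int) are assumed here for illustration. The point is that a subclass implements only the int-based hooks and still handles supplementary characters.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointSplitterSketch {

    /** Hypothetical stand-in for a tokenizer base class; not Lucene code. */
    public abstract static class CodePointSplitter {
        /** Subclasses decide per code point, not per char. */
        protected abstract boolean isTokenChar(int codePoint);

        /** Optional per-code-point normalization; identity by default. */
        protected int normalize(int codePoint) {
            return codePoint;
        }

        public final List<String> split(String input) {
            List<String> tokens = new ArrayList<>();
            StringBuilder current = new StringBuilder();
            for (int i = 0; i < input.length(); ) {
                int cp = input.codePointAt(i);      // decodes surrogate pairs
                i += Character.charCount(cp);       // 1 or 2 chars consumed
                if (isTokenChar(cp)) {
                    current.appendCodePoint(normalize(cp));
                } else if (current.length() > 0) {
                    tokens.add(current.toString()); // token boundary reached
                    current.setLength(0);
                }
            }
            if (current.length() > 0) {
                tokens.add(current.toString());
            }
            return tokens;
        }
    }

    /** Example subclass: letters form tokens and are lower-cased. */
    public static final class LowerCaseLetterSplitter extends CodePointSplitter {
        @Override
        protected boolean isTokenChar(int codePoint) {
            return Character.isLetter(codePoint);
        }

        @Override
        protected int normalize(int codePoint) {
            return Character.toLowerCase(codePoint);
        }
    }

    public static void main(String[] args) {
        String deseret = new String(Character.toChars(0x10400));
        List<String> tokens = new LowerCaseLetterSplitter().split("Ab " + deseret + "c");
        System.out.println(tokens.size() + " tokens");
    }
}
```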

> Supplementary Character Handling in CharTokenizer
> -------------------------------------------------
>                 Key: LUCENE-2183
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Simon Willnauer
>            Assignee: Uwe Schindler
>             Fix For: 3.1
>         Attachments: LUCENE-2183.patch, LUCENE-2183.patch, LUCENE-2183.patch, LUCENE-2183.patch,
> CharTokenizer is an abstract base class for all Tokenizers operating on a character level.
> Yet, those tokenizers still use char primitives instead of int codepoints. CharTokenizer should
> operate on codepoints and preserve bw compatibility.
