lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <>
Subject [jira] Commented: (LUCENE-2183) Supplementary Character Handling in CharTokenizer
Date Wed, 27 Jan 2010 13:54:37 GMT


Simon Willnauer commented on LUCENE-2183:

Short update: I found a bug in the latest version, which was untested. I will update soon with
a speed comparison between the current version and the version using the proxy class.

> Supplementary Character Handling in CharTokenizer
> -------------------------------------------------
>                 Key: LUCENE-2183
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Simon Willnauer
>            Assignee: Uwe Schindler
>             Fix For: 3.1
>         Attachments: LUCENE-2183.patch, LUCENE-2183.patch, LUCENE-2183.patch, LUCENE-2183.patch
> CharTokenizer is an abstract base class for all Tokenizers operating on a character level.
> Yet, those tokenizers still use char primitives instead of int codepoints. CharTokenizer should
> operate on codepoints and preserve backwards compatibility.
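
To illustrate the difference the issue description refers to, here is a minimal sketch in plain Java. It is not the Lucene CharTokenizer code or the attached patch; the class and method names (CodePointSketch, tokenizeByChar, tokenizeByCodePoint) are hypothetical and only the standard java.lang.Character and String methods are used. Both helpers split text on non-letters, one iterating over char values and one over int code points.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class, not part of Lucene.
class CodePointSketch {

    // Iterates over UTF-16 code units. A supplementary character is stored
    // as a surrogate pair, and neither surrogate on its own is a letter,
    // so the token is cut in two at that position.
    static List<String> tokenizeByChar(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Iterates over code points. codePointAt combines a valid surrogate
    // pair into one int, and charCount tells us how far to advance.
    static List<String> tokenizeByCodePoint(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

For the input "a\uD801\uDC00b" (the letter a, then U+10400 DESERET CAPITAL LETTER LONG I, then the letter b), tokenizeByChar returns the two tokens "a" and "b" and drops the supplementary letter, while tokenizeByCodePoint returns the whole input as a single token.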

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
