lucene-dev mailing list archives

From "Erick Erickson (JIRA)" <>
Subject [jira] [Resolved] (LUCENE-1216) CharDelimiterTokenizer
Date Sun, 10 Mar 2013 13:31:13 GMT


Erick Erickson resolved LUCENE-1216.

    Resolution: Won't Fix

SPRING_CLEANING_2013: We can reopen if necessary. A lot of work has been done on CJK analysis since this was opened.
> CharDelimiterTokenizer
> ----------------------
>                 Key: LUCENE-1216
>                 URL:
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/analysis
>            Reporter: Hiroaki Kawai
>            Assignee: Otis Gospodnetic
>            Priority: Minor
>         Attachments:
> WhitespaceTokenizer is very useful for space-separated languages, but my Japanese text
> is not always separated by spaces. So I created an alternative Tokenizer that lets the
> delimiter be specified. The submitted file is an improvement on the current WhitespaceTokenizer.
> I tried to extend CharTokenizer, but CharTokenizer has a limitation that a token
> can't be longer than 255 chars.
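
The idea behind the proposal can be sketched in plain Java: a tokenizer that splits on a caller-specified delimiter character rather than on whitespace, with token length bounded only by the input. This is an illustrative sketch, not the actual attachment from LUCENE-1216; the class and method names here are hypothetical, and a real Lucene Tokenizer would instead extend `org.apache.lucene.analysis.Tokenizer` and emit term attributes incrementally.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a character-delimiter tokenizer: split on a
// configurable delimiter (e.g. the Japanese comma '、') instead of whitespace.
public class CharDelimiterSketch {
    private final char delimiter;

    public CharDelimiterSketch(char delimiter) {
        this.delimiter = delimiter;
    }

    // Collect the characters between delimiter occurrences into tokens.
    // Empty tokens (adjacent delimiters) are skipped, and there is no
    // fixed 255-char cap on token length.
    public List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == delimiter) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```

For example, tokenizing `"東京、大阪、、名古屋"` with delimiter `'、'` yields the three tokens 東京, 大阪, 名古屋, with the doubled delimiter producing no empty token.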

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
