lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
Date Wed, 02 Dec 2009 15:12:20 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784867#action_12784867
] 

Robert Muir commented on LUCENE-2074:
-------------------------------------

Uwe, thats not the only interesting new feature. The other is it supports the properties necessary
to actually tokenize text according to unicode standards, instead of words just being runs
of IsLetter=True. 

You can see such a grammar here: http://jflex.svn.sourceforge.net/viewvc/jflex/trunk/testsuite/testcases/src/test/cases/unicode-word-break/UnicodeWordBreakRules_5_2.flex?revision=578&view=markup

you should try %unicode 5.2 for now, but we shouldn't wimp out with that for 3.1.


> Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer
> -------------------------------------------------------------------------------
>
>                 Key: LUCENE-2074
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2074
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 3.0
>            Reporter: Uwe Schindler
>            Assignee: Uwe Schindler
>             Fix For: 3.1
>
>         Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, LUCENE-2074-lucene30.patch,
LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch
>
>
> The current trunk version of StandardTokenizerImpl was generated by Java 1.4 (according
to the warning). In Java 3.0 we switch to Java 1.5, so we should regenerate the file.
> After regeneration the Tokenizer behaves different for some characters. Because of that
we should only use the new TokenizerImpl when Version.LUCENE_30 or LUCENE_31 is used as matchVersion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message