lucene-dev mailing list archives

From DM Smith <>
Subject Re: Why release 3.0?
Date Tue, 17 Nov 2009 00:42:03 GMT

On Nov 16, 2009, at 6:43 PM, Robert Muir wrote:

> DM, in this case I'm not referring to surrogates, etc., but to the idea that properties
> for an existing character can change (the soft hyphen and Arabic ayah were two examples),
> and that new characters are introduced.
> These will affect what analysis components (e.g. tokenizers) do, because they like to
> use categories such as .isWhitespace, .isLetter, things like that.
> This means these components have different behavior, because they are data-driven, even
> though we didn't change any code.

Then why not make ICU a dependency? At least then one has control over the delivered version.
Those of us working with text in non-Latin-1 languages are likely to be using ICU.
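Here is a minimal sketch of the data-driven behavior described in the quoted text. `java.lang.Character` answers property questions from the Unicode tables bundled with the running JDK, so the same unchanged code can classify a character differently on different JDKs. The class name and the `categoryName` helper are hypothetical and exist only for illustration. U+00AD is the soft hyphen. U+06DD (ARABIC END OF AYAH) is assumed to be the character meant by "Arabic ayah."

```java
// Minimal sketch (hypothetical class): print what the running JDK's Unicode tables say
// about a few code points. Results depend on the JDK's bundled Unicode version.
public class UnicodePropertyCheck {

    // Hypothetical helper: map a few Character.getType() constants to
    // two-letter Unicode General_Category names (partial mapping only).
    static String categoryName(int codePoint) {
        int type = Character.getType(codePoint);
        switch (type) {
            case Character.FORMAT:           return "Cf";
            case Character.DASH_PUNCTUATION: return "Pd";
            case Character.ENCLOSING_MARK:   return "Me";
            case Character.UNASSIGNED:       return "Cn";
            case Character.SPACE_SEPARATOR:  return "Zs";
            case Character.OTHER_LETTER:     return "Lo";
            case Character.LOWERCASE_LETTER: return "Ll";
            default:                         return "other(" + type + ")";
        }
    }

    // One line per code point: category plus the predicates tokenizers commonly use.
    static String describe(int codePoint) {
        return String.format("U+%04X category=%s isLetter=%b isWhitespace=%b",
                codePoint,
                categoryName(codePoint),
                Character.isLetter(codePoint),
                Character.isWhitespace(codePoint));
    }

    public static void main(String[] args) {
        // The JDK version determines which Unicode tables answer the questions below.
        System.out.println("java.version=" + System.getProperty("java.version"));
        int[] samples = {0x00AD /* SOFT HYPHEN */, 0x06DD /* ARABIC END OF AYAH */, 'a', ' '};
        for (int cp : samples) {
            System.out.println(describe(cp));
        }
    }
}
```

Running the class under two JDKs that ship different Unicode versions and diffing the output is one way to see which characters an upgrade affects. Depending on ICU instead moves that choice of Unicode version into the project's own dependency list, which is the control argued for above.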

-- DM
