lucene-dev mailing list archives

From "Hoss Man (JIRA)" <>
Subject [jira] Commented: (LUCENE-841) Replace UTF8 characters in stemmer code with integer values.
Date Wed, 21 Mar 2007 21:35:32 GMT


Hoss Man commented on LUCENE-841:

There are lots of OSes and editors where changing the file encoding is somewhat hard, particularly
if you have reasons why other files need to be in ASCII to deal with other systems.

It's a trade-off: people with UTF-8-capable environments would probably rather see the real
character, while people still using ASCII would probably rather see \uXXXX. I would think
the \uXXXX approach is the most universally functional, since anyone can look up a character
from its character code, but people looking at funky control characters can't always tell
what character code they are looking at.
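As a rough illustration of the trade-off above, the sketch below writes the same character three ways: as a \uXXXX escape, as a hex integer and as a decimal integer. All three compile to the identical char value, so the choice only affects how the source file reads. The class and method names (`CharSpellings`, `toEscape`) are hypothetical and not taken from the Lucene stemmers.

```java
// Hypothetical illustration, not Lucene code: three spellings of one character.
// Only ASCII appears in this file, so it survives any editor or file encoding.
class CharSpellings {
    // LATIN SMALL LETTER A WITH DIAERESIS written as a Unicode escape
    static final char A_UMLAUT_ESCAPE = '\u00e4';
    // The same character written as integer values (hex and decimal)
    static final char A_UMLAUT_HEX = (char) 0xE4;
    static final char A_UMLAUT_DEC = (char) 228;

    // Render a char as a Java-style Unicode escape so a reader can look it up
    // by its code even when the glyph is unreadable in their editor.
    static String toEscape(char c) {
        return String.format("\\u%04x", (int) c);
    }

    public static void main(String[] args) {
        System.out.println(A_UMLAUT_ESCAPE == A_UMLAUT_HEX); // true
        System.out.println(A_UMLAUT_HEX == A_UMLAUT_DEC);    // true
        System.out.println(toEscape(A_UMLAUT_DEC));
    }
}
```

Because escapes are translated by the compiler before anything else, there is no runtime cost to either form; the decision is purely about readability for whoever edits the file next.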

(I wonder if there is a fast/easy way to get a char from a Unicode character name?)
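On later JDKs there is a direct answer to the question above, though it did not exist when this comment was written: `Character.codePointOf(String)` (added in Java 9) maps a Unicode character name to a code point, and `Character.getName(int)` (Java 7) goes the other way. The minimal sketch below assumes a Java 9+ runtime; the wrapper class and the `fromName` helper are hypothetical names used only for illustration.

```java
// Requires Java 9+ for Character.codePointOf; Character.getName exists since Java 7.
// Class and method names here are illustrative only.
class UnicodeNames {
    // Name -> String containing that character. Per the JDK docs, matching is
    // case-insensitive and an unknown name throws IllegalArgumentException.
    static String fromName(String name) {
        int codePoint = Character.codePointOf(name);
        return new String(Character.toChars(codePoint)); // also handles supplementary chars
    }

    public static void main(String[] args) {
        int cp = Character.codePointOf("LATIN SMALL LETTER A WITH DIAERESIS");
        System.out.printf("U+%04X%n", cp);             // prints U+00E4

        // Reverse direction: identify a character you cannot read clearly.
        System.out.println(Character.getName(0x00DF)); // prints LATIN SMALL LETTER SHARP S

        System.out.println(fromName("LATIN SMALL LETTER SHARP S").length()); // prints 1
    }
}
```

The name lookup is handy when reading or reviewing code; for constants in hot paths such as a stemmer, a compile-time escape or integer literal is still the cheaper choice.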

> Replace UTF8 characters in stemmer code with integer values.
> ------------------------------------------------------------
>                 Key: LUCENE-841
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Karl Wettin
>            Priority: Critical
> BrazillianStemmer, GermanStemmer, FrenchStemmer and DutchStemmer all contain UTF characters
> in the Java code. Not all environments handle that. It really ought to be integer values.
> I'll come up with a patch sooner or later.
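One mechanical way to do the replacement the issue asks for is to rewrite every non-ASCII character in the source as a Unicode escape, similar in spirit to the JDK's old `native2ascii` tool. The helper below is a hypothetical sketch, not the patch referred to in the issue; it assumes the input is the source text already decoded into a Java `String`.

```java
// Hypothetical helper (not part of Lucene): rewrites every non-ASCII char in a
// piece of Java source text as a Unicode escape. ASCII text passes through unchanged.
class SourceEscaper {
    static String escapeNonAscii(String source) {
        StringBuilder out = new StringBuilder(source.length());
        for (int i = 0; i < source.length(); i++) {
            char c = source.charAt(i);
            if (c < 0x80) {
                out.append(c);            // plain ASCII: keep as-is
            } else {
                // Escape each UTF-16 code unit; a surrogate pair becomes two escapes,
                // which is also how Java source represents such characters.
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The input is built from escapes so this file itself stays ASCII.
        String line = "if (c == '\u00fc') { sb.setCharAt(i, 'u'); }";
        System.out.println(escapeNonAscii(line));
    }
}
```

Escapes keep the code readable as character literals, whereas the issue's integer values (for example `(char) 0xFC`) need a comment to say which letter is meant; either form leaves the compiled stemmer unchanged.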

