lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] Created: (LUCENE-2411) clean up uses of String.toLowerCase in code
Date Thu, 22 Apr 2010 17:29:50 GMT
clean up uses of String.toLowerCase in code

                 Key: LUCENE-2411
             Project: Lucene - Java
          Issue Type: Bug
    Affects Versions: 3.1
            Reporter: Robert Muir
             Fix For: 3.1

Uwe recently fixed this in the ThaiWordFilter, which reminded me to search our code for uses of String.toLowerCase().

The problem with this method is the following:
* It depends on the "default locale", which is flimsy and should be avoided I think; it typically just causes problems.
  There can be hard-to-debug issues if the machine is not configured with the same Locale at both index and query time.
* Lowercasing with locale-sensitive rules is really only suitable for display and presentation;
  if we want international lowercasing for search we should be using case folding.
  This is especially important since otherwise people unknowingly using this special casing at query time are
  not going to get results, e.g. if they use a TermRangeQuery from the queryparser and it lowercases terms differently from how they were indexed.
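
To make the index-time versus query-time mismatch concrete, here is a minimal sketch. The class and method names are hypothetical and not part of Lucene; the locale is passed explicitly so the demo does not depend on how the machine is configured. It assumes one machine lowercases with English rules and another with Turkish rules, where "I" lowercases to the dotless "ı" (U+0131).

```java
import java.util.Locale;

// Hypothetical demo class, not Lucene code.
public class DefaultLocaleDemo {

    // Stands in for String.toLowerCase() running on a machine whose
    // default locale happens to be the given one.
    static String lowerWith(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // Term as it might be lowercased at index time (English rules).
        String indexed = lowerWith("TITLE", Locale.ENGLISH);

        // Same term as it might be lowercased at query time (Turkish rules):
        // the capital I becomes the dotless i, U+0131.
        String queried = lowerWith("TITLE", turkish);

        // The two terms differ, so an exact term lookup would miss.
        System.out.println(indexed.equals(queried));
    }
}
```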

In my opinion we should fix all these methods to use Character.toLowerCase
(if possible, especially for speed with TokenStreams), and otherwise String.toLowerCase
with the ROOT Locale, new Locale(""). This is closer to case folding.
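
A rough sketch of the two alternatives proposed above; the class and method names are hypothetical and this is not the actual patch. The first method lowercases a char buffer in place with Character.toLowerCase, roughly as a TokenStream might do on its term buffer. The second uses the root locale through the Locale.ROOT constant (available since Java 6), which is equivalent to new Locale("").

```java
import java.util.Locale;

// Hypothetical helper class, not Lucene code.
public class LocaleIndependentLowercase {

    // Lowercase buffer[offset, offset + length) in place, one code point
    // at a time. Character.toLowerCase does not consult any Locale.
    // Assumption: the lowercase form occupies the same number of chars
    // as the original code point.
    static void lowerCaseInPlace(char[] buffer, int offset, int length) {
        int end = offset + length;
        for (int i = offset; i < end; ) {
            int codePoint = Character.codePointAt(buffer, i, end);
            // toChars writes the lowercased code point back at index i
            // and returns how many chars it wrote (1 or 2).
            i += Character.toChars(Character.toLowerCase(codePoint), buffer, i);
        }
    }

    // Fallback for plain Strings: root-locale lowercasing, independent
    // of the machine's default locale.
    static String lowerCaseRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }
}
```

Neither variant is full Unicode case folding; as the issue says, they are only closer to it than default-locale lowercasing.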

If some things really need locale-sensitivity for some extreme reason, I think we should just make the Locale
a mandatory parameter.
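
One possible shape for a mandatory Locale parameter, purely as an illustration: the class below is hypothetical and does not correspond to any existing Lucene API. The point is only that there is no no-argument constructor, so a caller can never fall back to the default locale by accident.

```java
import java.util.Locale;
import java.util.Objects;

// Hypothetical class, not Lucene code.
public final class LocaleSensitiveLowercaser {

    private final Locale locale;

    // The Locale must be supplied explicitly; null is rejected.
    public LocaleSensitiveLowercaser(Locale locale) {
        this.locale = Objects.requireNonNull(locale, "locale is mandatory");
    }

    // Lowercases using the rules of the locale chosen by the caller.
    public String lowerCase(String s) {
        return s.toLowerCase(locale);
    }
}
```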
