lucene-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 6091] - QueryParser not recognizing asterisk with UTF-8 index
Date Sat, 27 Mar 2004 14:20:30 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=6091>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=6091

QueryParser not recognizing asterisk with UTF-8 index

daniel.naber@t-online.de changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |INVALID



------- Additional Comments From daniel.naber@t-online.de  2004-03-27 14:20 -------
This is not a Lucene bug. Lucene takes a String, so the caller is responsible 
for making sure that the string has been decoded correctly. What happens here 
is this:

text.getBytes("UTF-8") returns the String as an array of bytes in UTF-8. Passing 
that array to new String(byte[]) without a charset decodes it using the 
platform's default charset (usually ISO-8859-1 on Linux). In ISO-8859-1 every 
byte of a multi-byte UTF-8 sequence becomes a separate character, so each 
non-ASCII character turns into two or more wrong characters and the string is 
misinterpreted. Lucene's analyzers work on the characters of the string they 
are given, so they rely on strings that have not been misinterpreted.
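A minimal sketch of the round trip described above, assuming Java 7 or later for 
java.nio.charset.StandardCharsets. The class and method names (CharsetRoundTrip, 
misDecode, roundTrip) are hypothetical and only for illustration. ISO-8859-1 is 
pinned explicitly to stand in for "the platform's default charset", because the 
real default varies between platforms and JDK versions. The query term is written 
with a \u escape so the example does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {

    // Reproduces the pattern from the bug: encode as UTF-8, then decode the
    // bytes with a different charset (ISO-8859-1 stands in for the default).
    static String misDecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Correct round trip: decode with the same charset used to encode.
    // (If the text is already a String, simply pass it on unchanged.)
    static String roundTrip(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String query = "M\u00fcnchen*"; // "München*", 8 characters

        // Print lengths rather than the strings themselves, so the output
        // does not depend on the console's encoding.
        System.out.println(query.length());            // 8
        System.out.println(misDecode(query).length()); // 9: u-umlaut became two chars
        System.out.println(roundTrip(query).length()); // 8
    }
}
```

The ASCII asterisk survives either way, but the mis-decoded term is 
"M\u00c3\u00bcnchen*" ("MÃ¼nchen*"), so the text handed to Lucene is no longer 
the text the user typed.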

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

