lucene-dev mailing list archives

From tremont romain <romain.trem...@xml-ais.com>
Subject Bug in lucene located and fixed
Date Tue, 19 Nov 2002 09:22:26 GMT
Hi folks,

You may remember that a little while ago Olivier Perrin was having
trouble indexing and searching Greek or Russian text.

I dug into the source code and learned a little about JavaCC. Here is
what I found:

When you don't set the UNICODE_INPUT option, the character table JavaCC
generates is an ASCII table. (A character table is not the same thing as
an encoding format! Unicode 3.0 is a character table; UTF-8 is a
character encoding.) So when the input contains characters outside the
ASCII table, JavaCC does not recognize them.
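To make the table-versus-encoding distinction concrete, here is a small
Java sketch (the class and method names are hypothetical, not from
Lucene or JavaCC). For each character it prints the code point, which is
the character's position in the Unicode character table, whether that
position falls inside the 7-bit ASCII range, and the bytes that UTF-8,
one particular encoding, uses to store it.

```java
import java.nio.charset.StandardCharsets;

public class CharTableVsEncoding {
    // Describes one character: its code point (position in the Unicode
    // character table), whether it falls in the 7-bit ASCII range, and the
    // bytes UTF-8 (one particular encoding) uses to store it.
    static String describe(String s) {
        int codePoint = s.codePointAt(0);
        boolean inAscii = codePoint < 128;
        StringBuilder hex = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        return String.format("U+%04X ascii=%b utf8=%s",
                codePoint, inAscii, hex.toString().trim());
    }

    public static void main(String[] args) {
        // Latin small a, Cyrillic capital Zhe, Greek small lambda, written
        // as Unicode escapes so this source file stays pure ASCII.
        String[] samples = {"a", "\u0416", "\u03BB"};
        for (String s : samples) {
            System.out.println(describe(s));
        }
    }
}
```

Running it prints `U+0061 ascii=true utf8=61`, `U+0416 ascii=false
utf8=D0 96` and `U+03BB ascii=false utf8=CE BB`. The Cyrillic and Greek
letters have table positions above 127, whatever encoding is used to
store them, so an ASCII-only table has no entry for them.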

For example, a Russian character is not in that table, so when the
QueryParser or the StandardAnalyzer receives one, it doesn't know what
to do with it and aborts.
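The abort can be illustrated with a toy tokenizer. This is a
hypothetical analogy, not Lucene's StandardTokenizer or the code JavaCC
generates. The only difference between the two runs is the letter test:
one accepts ASCII letters only, standing in for a scanner built without
UNICODE_INPUT, and the other uses the JDK's Unicode-aware
`Character.isLetter`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

public class ToyTokenizer {
    // Letter test limited to ASCII, standing in for a scanner built
    // without UNICODE_INPUT (hypothetical simplification).
    static final IntPredicate ASCII_LETTER =
            c -> (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');

    // Unicode-aware letter test from the JDK.
    static final IntPredicate UNICODE_LETTER = Character::isLetter;

    // Splits on spaces; any other non-letter character is a lexical
    // error, which aborts tokenization with an exception.
    static List<String> tokenize(String text, IntPredicate isLetter) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isLetter.test(c)) {
                current.append(c);
            } else if (c == ' ') {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                throw new IllegalArgumentException(String.format(
                        "Lexical error at index %d: unexpected char U+%04X", i, (int) c));
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // True if the whole text tokenizes without a lexical error.
    static boolean accepts(String text, IntPredicate isLetter) {
        try {
            tokenize(text, isLetter);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "lucene" followed by the Russian word for "search", written with
        // Unicode escapes so the source file stays ASCII.
        String text = "lucene \u043F\u043E\u0438\u0441\u043A";
        System.out.println("unicode-aware accepts: " + accepts(text, UNICODE_LETTER));
        System.out.println("ascii-only accepts:    " + accepts(text, ASCII_LETTER));
    }
}
```

The Unicode-aware run produces two tokens. The ASCII-only run hits the
first Cyrillic letter at index 7 and gives up, which mirrors the
behaviour described above.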

Simply adding UNICODE_INPUT = true; to the options of both .jj files
fixed the problem.

Sorry for my poor English; I hope you get the idea. I can submit a
small patch if you want, but I'm not used to diff :)


-- 
trémont romain <romain.tremont@xml-ais.com>
A.I.S. http://www.xml-ais.com



