lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From PERRIN GOURON Olivier <olivier.per...@xml-ais.com>
Subject problem with non latin characters
Date Wed, 30 Oct 2002 16:20:46 GMT

Hello,

I am using Lucene to index UTF-16 and UTF-8 files . Those files are
trans-encoded to the right format so that they can be indexable with Lucene.
The index is searched through with queries made from an UTF-16 file.
Everything works fine as long my query file contains latin characters (even
specific french chars such as éàèêoe...)
Problems occur when the UTF-16 query file  contains not latin characters. I
have tried russian characters, such as ?, which is \u0418, but Lucene sends
me this error:

	Exception in thread "main"
org.apache.lucene.queryParser.TokenMgrError: Lexical
	error at line 1, column 8.  Encountered: "\u0018" (24), after : ""
        at
org.apache.lucene.queryParser.QueryParserTokenManager.getNextToken(Unknown
Source)
        at org.apache.lucene.queryParser.QueryParser.jj_scan_token(Unknown
Source)
        at org.apache.lucene.queryParser.QueryParser.jj_3_1(Unknown Source)
        at org.apache.lucene.queryParser.QueryParser.jj_2_1(Unknown Source)
        at org.apache.lucene.queryParser.QueryParser.Clause(Unknown Source)
        at org.apache.lucene.queryParser.QueryParser.Query(Unknown Source)
        at org.apache.lucene.queryParser.QueryParser.parse(Unknown Source)
        at org.apache.lucene.queryParser.QueryParser.parse(Unknown Source)
        at org.apache.lucene.CherchLeTex.main(CherchLeTex.java:51)

Seems that the queryParser doesn't use the right code for ?... I have tried
greek, and I does the same. Is it due to the analyser? to the query
parser?...
Have you ever met this problem? I would appreciate your help and advices

Thanks for your consideration

Olivier Perrin-Gouron
AIS Berger-Levrault


--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message