lucene-java-user mailing list archives

From "Andrey Grishin" <>
Subject problems with search on Russian content
Date Thu, 21 Nov 2002 13:38:25 GMT
Hi All, 
I have a problem searching Russian content with Lucene 1.2.

I indexed the content using the Cp1251 charset:
text = new String(text.getBytes("Cp1251"));

and I am searching using the same charset:

String txt = "Анд";
txt = new String(txt.getBytes("Cp1251"));
PrefixQuery query = new PrefixQuery(new Term(PortalHTMLDocument.CONTENT_FIELD, txt));
hits =;


Analyzer analyzer = new StandardAnalyzer();
String txt = "Андрей";
txt = new String(txt.getBytes("Cp1251"));
Query query = QueryParser.parse(txt, PortalHTMLDocument.CONTENT_FIELD, analyzer);

hits =;

and Lucene finds nothing.
I also checked for a DecodeInterceptor in my server.xml; there isn't one.

I tried UTF-8 and UTF-16 and got the same result.

Also, if I list the index contents by iterating over an IndexReader, I can see that my Russian
content is stored in the index.
Can you please help me? Do you have any more ideas about what else can be done here to fix
this problem?

I would appreciate any help.
Thanks, Andrey.

I am using lucene 1.2, tomcat 4.1.12, jdk 1.4.1 on Win2000 AS
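
For anyone reproducing this, here is a minimal sketch of what the conversion used above, new String(text.getBytes("Cp1251")), does to a string. It encodes to Cp1251 bytes and then decodes them with the platform default charset, because no charset is passed to the String constructor. Whether this explains the empty search results is not established here. The class and method names are hypothetical, and the Charset overloads assume Java 6 or later (on JDK 1.4 the overloads taking a charset name behave the same way).

```java
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
public class Cp1251RoundTrip {

    // Same shape as the snippets in the message: encode with the named
    // charset, then decode with the platform default charset.
    static String roundTripWithDefault(String text, String charsetName) {
        byte[] bytes = text.getBytes(Charset.forName(charsetName));
        return new String(bytes);
    }

    // Encode and decode with the same explicit charset.
    static String roundTripExplicit(String text, String charsetName) {
        Charset cs = Charset.forName(charsetName);
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        // "Анд" written with Unicode escapes so the source file encoding
        // cannot alter the literal.
        String txt = "\u0410\u043d\u0434";

        System.out.println("file.encoding = " + System.getProperty("file.encoding"));
        System.out.println("unchanged after default-charset decode: "
                + txt.equals(roundTripWithDefault(txt, "Cp1251")));
        System.out.println("unchanged after explicit decode: "
                + txt.equals(roundTripExplicit(txt, "Cp1251")));
    }
}
```

The first comparison depends on the machine: it leaves the text unchanged when the default charset is itself Cp1251, and alters it otherwise. Running the sketch on both the indexing and the searching side shows whether the two sides end up with the same terms.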