lucene-java-user mailing list archives

From Satish Kagathare <>
Subject Does Lucene support UNICODE?
Date Tue, 08 Jun 2004 07:06:50 GMT


Does Lucene support searching and indexing Unicode data (especially
Devanagari Unicode data)?
Does it make any difference whether the documents are UTF-8 or UTF-16? I ask
because Java strings use UTF-16.
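
For reference, a minimal sketch of the encoding point, using only the JDK and no Lucene classes. The class name EncodingDemo and the sample word are made up for illustration. It assumes the on-disk encoding matters only at the moment bytes are decoded into a Java String: decoded with the right charset, UTF-8 and UTF-16 files should yield identical strings, while decoding with the wrong charset yields garbage that cannot match a query.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodingDemo {
    // Decode raw file bytes into a Java String using an explicitly named charset.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // The Devanagari word "Hindi", written with escapes so the source file
        // encoding does not matter.
        String word = "\u0939\u093F\u0928\u094D\u0926\u0940";

        byte[] utf8 = word.getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = word.getBytes(StandardCharsets.UTF_16); // includes a byte-order mark

        // Correct charset for each byte array: both decode to the same string.
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(word));   // true
        System.out.println(decode(utf16, StandardCharsets.UTF_16).equals(word)); // true

        // Wrong charset: the UTF-16 bytes read as UTF-8 do not give the word back.
        System.out.println(decode(utf16, StandardCharsets.UTF_8).equals(word));  // false
    }
}
```

If the indexing code reads files with the platform default encoding (an assumption about the demo, not verified here), only files that happen to match that default would be decoded correctly.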

I tried indexing Devanagari Unicode data (UTF-8 and UTF-16) using IndexFiles
and IndexHTML from the Lucene demo, and searching for queries through Tomcat,
but the results show only the UTF-8 files and also include files that do not
contain the query. The summaries of the fetched documents are also not
displayed correctly.

I have also changed the Unicode ranges in HTMLParser.jj, StandardTokenizer.jj and
QueryParser.jj, and in the analyzer used for indexing and query parsing, but the
output does not reflect any of these changes.
Do I have to write my own analyzer for Devanagari Unicode data, or will
StandardAnalyzer work for any language?
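
One thing that may be relevant to the character ranges: Devanagari vowel signs and the virama are combining marks, not letters. The JDK-only sketch below (the class DevanagariChars and its methods are hypothetical helpers, not part of Lucene) counts the characters of a word that fall in the Devanagari block but are not classified as letters. A token rule that admits only letters would break a word at each of them. Whether StandardTokenizer.jj behaves this way depends on the ranges in its grammar, which are not shown here.

```java
final class DevanagariChars {
    // True if the character lies in the Unicode Devanagari block (U+0900..U+097F).
    static boolean isDevanagari(char c) {
        return Character.UnicodeBlock.of(c) == Character.UnicodeBlock.DEVANAGARI;
    }

    // Count Devanagari characters that Character.isLetter rejects
    // (for example vowel signs and the virama).
    static int countNonLetters(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (isDevanagari(c) && !Character.isLetter(c)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // HA, vowel sign I, NA, virama, DA, vowel sign II
        String word = "\u0939\u093F\u0928\u094D\u0926\u0940";
        System.out.println(countNonLetters(word)); // 3
    }
}
```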

Or does it require more changes? Please describe the problems and their solutions.

Thanks in advance
Satish Kagathara,
IIT Bombay.      

