lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From karl øie <k...@gan.no>
Subject Re: How to include strange characters??
Date Mon, 14 Oct 2002 10:18:52 GMT
Also note that both apache and tomcat has a default setting that force 
re-encodes all pages. in tomcat it is the <DecodeInterceptor /> in 
server.xml, in apache it is a line that says "AddDefaultCharset on" in 
httpd.conf. These are applied _after_ any servlet output so it might 
lead to strange result, be sure to turn off both directives when you 
test different encoding problems.

Last but not least is the encoding the SQL database was created in. On 
DB2 i have to use the right database constructor to get norwegian 
character support (db2 CREATE DATABASE mydb USING CODESET ISO-8859-1 
TERRITORY NO COLLATE USING SYSTEM;). Without the correct encoding on 
the database constructor the database behave strange in sorting and 
insert/update scenarios.

To be sure to get everything make sure that all steps are using the 
same encoding, just like you use the same analyzer (perhaps encoding 
should be a part of a analyzer?!?)

1: create the database with ISO-8859-1 encoding (my favorite)...

	CREATE DATABASE mydb USING CODESET ISO-8859-1 TERRITORY NO COLLATE 
USING SYSTEM;

2: in the indexer force feed lucene with ISO-8859-1 strings:

	String value = resultset.getString("fieldname");
	document.add(Field.UnStored("fieldname", new 
String(value.getBytes("ISO-8859-1"))));
Mime
View raw message