lucene-java-user mailing list archives

From "redpineseed" <>
Subject Re: setting encoding
Date Mon, 20 May 2002 20:29:58 GMT
> The biggest problem is some cp1252 characters are "private" in the unicode
> byte set.

Those characters may not be in the Unicode character set at all, and that is the major trouble
with processing Chinese.

Convert your natively encoded text to Unicode (UTF-16, which Java strings use) with the following lines:

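// needs: import java.io.*;
File f = new File("cp1252_input");
FileInputStream tmp = new FileInputStream(f);
// the InputStreamReader decodes the CP1252 bytes into Java (UTF-16) chars
BufferedReader brin = new BufferedReader(new InputStreamReader(tmp, "CP1252"));
String inputString = brin.readLine();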

I'm not sure the encoding name for your data is CP1252; check the list of supported encodings in the Java docs.
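
For anyone who wants to try this without a file on disk, here is a minimal self-contained sketch of the same idea. The class name Cp1252Decode and the helper readFirstLine are made up for illustration, and it assumes a JDK recent enough to have java.nio.charset.Charset and try-with-resources. It asks the runtime whether the encoding name is supported before decoding, which is one way to settle the question of the right name.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical helper class, not part of Lucene or the original message.
public class Cp1252Decode {

    // Reads the first line of the stream, decoding its bytes with the named charset.
    static String readFirstLine(InputStream in, String charsetName) throws IOException {
        // Ask the runtime whether it knows this encoding name before using it.
        if (!Charset.isSupported(charsetName)) {
            throw new IllegalArgumentException("Unsupported encoding: " + charsetName);
        }
        try (BufferedReader reader =
                 new BufferedReader(new InputStreamReader(in, charsetName))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Sample bytes standing in for a CP1252 file: 0x80 is the euro sign
        // in windows-1252, 0x41 is 'A', 0xE9 is e-acute.
        byte[] raw = { (byte) 0x80, (byte) 0x41, (byte) 0xE9 };
        String line = readFirstLine(new ByteArrayInputStream(raw), "windows-1252");
        // Print each decoded char as a Unicode code point.
        for (char c : line.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

To read a real file, pass a FileInputStream instead of the ByteArrayInputStream. Printing the code points is a quick way to see whether a given byte decoded to the character you expected.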
