lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Peter Carlson <carl...@bookandhammer.com>
Subject Re: setting encoding
Date Mon, 20 May 2002 04:55:36 GMT
I don't know how have Lucene store in cp1252 (Windows latin-1), but I don't
think you have to.
I'm pretty sure it will take what ever information you have in a Java String
and save it as unicode. Then recreate it into a Java String.

So the issue I think you have is converting from cp1252 into a Java String
which is pretty straight forward.


Also, does the encoding matter?

Can you convert cp1252 to UTF-8 on the fly (and even backward if needed)?

The biggest problem is some cp1252 characters are "private" in the unicode
byte set.

You can get the conversion from the Glue lossless transcoder project.
http://www.ascc.net/xml/en/utf-8/transcode-index.html

I hope these random thoughts help.

--Peter

On 5/18/02 3:15 PM, "Dario Novakovic" <darionis@hotmail.com> wrote:

> i need to search non-english text and it is written using Cp1252 encoding.
> there are some fields i need to store using that encoding. i am able to
> store them but some chars specific to 1252 are lost. how can i tell lucene
> to store fields using specific encoding?
> 
> thanks everybody
> 
> _________________________________________________________________
> Send and receive Hotmail on your mobile device: http://mobile.msn.com
> 
> 
> --
> To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>
> 
> 


--
To unsubscribe, e-mail:   <mailto:lucene-user-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-user-help@jakarta.apache.org>


Mime
View raw message