Date: Sun, 19 May 2002 21:55:36 -0700
Subject: Re: setting encoding
From: Peter Carlson
To: Lucene Users List

I don't know how to have Lucene store fields in cp1252 (Windows Latin-1), but I don't think you have to. I'm pretty sure it will take whatever is in a Java String, save it as Unicode, and then recreate it as a Java String when you read it back.

So I think the real issue is converting from cp1252 into a Java String, which is pretty straightforward.

Also, does the encoding matter? Can you convert cp1252 to UTF-8 on the fly (and even back again if needed)?

The biggest problem is that some cp1252 characters are "private" in Unicode. You can get the conversion from the Glue lossless transcoder project:
http://www.ascc.net/xml/en/utf-8/transcode-index.html

I hope these random thoughts help.
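For illustration, here is a minimal sketch of the conversion described above: decode the cp1252 bytes into a Java String once, and encode back only where bytes are needed. The class and method names (Cp1252Demo, decodeCp1252, toUtf8, encodeCp1252) are made up for this example, and it assumes the JDK in use provides a charset named "windows-1252". It does not touch the Lucene API; the decoded String would be passed to Lucene like any other String.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, not part of Lucene or of the original message.
final class Cp1252Demo {
    // Assumption: this JDK ships the "windows-1252" charset.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Decode raw cp1252 bytes into a Java String.
    public static String decodeCp1252(byte[] raw) {
        return new String(raw, CP1252);
    }

    // Encode a Java String as UTF-8 bytes.
    public static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Encode a Java String back to cp1252 bytes, if the original encoding is needed.
    public static byte[] encodeCp1252(String text) {
        return text.getBytes(CP1252);
    }

    public static void main(String[] args) {
        // 0x80, 0x9A and 0xE9 are the euro sign, s-with-caron and e-acute in cp1252.
        byte[] raw = {(byte) 0x80, (byte) 0x9A, (byte) 0xE9};

        String text = decodeCp1252(raw);
        byte[] utf8 = toUtf8(text);
        byte[] back = encodeCp1252(text);

        System.out.println("UTF-8 length: " + utf8.length);
        System.out.println("round trip intact: " + Arrays.equals(raw, back));
    }
}
```

One possible cause of lost characters (a guess, not something confirmed in this thread) is decoding the bytes as ISO-8859-1 or with the platform default charset: bytes in the 0x80-0x9F range, such as the euro sign, then do not map to the intended characters.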
--Peter

On 5/18/02 3:15 PM, "Dario Novakovic" wrote:

> i need to search non-english text and it is written using Cp1252 encoding.
> there are some fields i need to store using that encoding. i am able to
> store them but some chars specific to 1252 are lost. how can i tell lucene
> to store fields using specific encoding?
>
> thanks everybody