From: "MERCIER ALEXANDRE"
List-Id: "Lucene Users List"
Subject: Indexing and searching non-latin languages using utf-8
Date: Tue, 18 Mar 2003 17:35:56 +0100

Hi
all,

I have a problem with indexing and then searching documents written in non-Latin languages and encoded in UTF-8 (Russian, for example).

I have a web application with a simple form for searching the contents of the documents. When I submit the form, I encode the query term in UTF-8 with encodeURI(String), but no documents match.

I think this is due to faulty indexing, but I'm not sure. As far as I know, Lucene indexes documents by writing Terms to the 'xxx.tis' file, encoded in UTF-8. So when it reads the file, it correctly gets the Russian characters (2 bytes each), but when it writes them to the index they seem different (I've listed the terms in my application console).

If anyone has a solution to my problem, any advice is welcome.

Thanks.
Alex
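
A symptom like the one described above (2-byte characters that look different once listed from the index) can appear when UTF-8 bytes are decoded with a single-byte charset somewhere between the file and the indexer, or between the form and the query. The sketch below is only an illustration of that possibility, not a diagnosis of this application: the class name `Utf8MismatchSketch`, its helper methods, and the sample word are all hypothetical, and it uses only standard JDK classes (modern Java), no Lucene calls.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class Utf8MismatchSketch {

    // Decode raw document bytes with an explicitly chosen charset.
    static String decodeBytes(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Decode a percent-encoded query term (as produced by encodeURI)
    // with an explicitly named charset.
    static String decodeQuery(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Five Cyrillic letters (a Russian word for "search"), written
        // with escapes so the source file encoding does not matter.
        String word = "\u043f\u043e\u0438\u0441\u043a";
        byte[] utf8 = word.getBytes(StandardCharsets.UTF_8);

        // Each Cyrillic letter takes 2 bytes in UTF-8.
        System.out.println(utf8.length); // 10

        // Decoding with the matching charset restores the original word.
        String right = decodeBytes(utf8, StandardCharsets.UTF_8);
        System.out.println(right.equals(word)); // true

        // Decoding the same bytes as ISO-8859-1 yields one character per
        // byte, so the resulting term no longer matches the original.
        String wrong = decodeBytes(utf8, StandardCharsets.ISO_8859_1);
        System.out.println(wrong.length()); // 10

        // The same mismatch can occur on the query side of the form.
        String encoded = "%D0%BF%D0%BE%D0%B8%D1%81%D0%BA";
        System.out.println(decodeQuery(encoded, "UTF-8").equals(word)); // true
        System.out.println(decodeQuery(encoded, "ISO-8859-1").equals(word)); // false
    }
}
```

If both the indexing side and the query side decode with the same, correct charset, the indexed term and the query term are the same sequence of characters; if either side falls back to a different default, they differ even though the underlying bytes are identical.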