From: "Itamar Syn-Hershko"
Reply-To: general@lucene.apache.org
Subject: RE: Problem indexin accented characters.
Date: Sun, 20 Jun 2010 18:40:30 +0300
Message-ID: <3F3DB3D86C7846D994C93437EDE6BB88@hp6690ej01>

Looks like an encoding issue. Is the file being read correctly (check with your debugger)?

Also, please post such questions to the CLucene user group.

Itamar.

> -----Original Message-----
> From: Itziar Cortes [mailto:itziar@eleka.net]
> Sent: Sunday, June 20, 2010 12:21 PM
> To: general@lucene.apache.org
> Subject: Problem indexin accented characters.
>
> Hi all!
>
> I have a little problem with CLucene when I try to index
> accented characters. I need index characters like ñ, è, ü, or
> ó. I use Luke to see the indexed data.
>
> I tried this, and I had no problem:
>
> pDoc->add(*new Field(_T("field"), _T("a b ñ c d"),
> Field::STORE_YES | Field::INDEX_TOKENIZED));
>
>
> The problem begins when I tried read from a file, and index
> each line. For example,
>
> wifstream file;
> wstring lineread;
> while(std::getline(file, lineread)){
> pDoc->add(*new Field(_T("testua"), lineread.c_str(),
> Field::STORE_YES
> | Field::INDEX_TOKENIZED));
>
> It only index "a" and "b".
>
>
> How can I solve this problem?
>
> Thanks in advance,
>
> Best regards,
>
> --
> Itziar
>
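
To illustrate the encoding point in the reply above: one way to check whether the file is being read correctly is to take the locale-dependent conversion out of the picture, read the raw bytes, and widen them yourself. The sketch below assumes the input file is ISO-8859-1 (Latin-1), where each byte value equals its Unicode code point. The thread does not say what the file's encoding is, so that is an assumption, and a UTF-8 file would need a real decoder instead. The helper name readLatin1Lines is hypothetical and not part of CLucene.

```cpp
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not a CLucene API): reads a file as raw bytes and
// widens every byte to the wchar_t with the same numeric value.
// ASSUMPTION: the file is ISO-8859-1, so byte 0xF1 is U+00F1 (n with tilde).
std::vector<std::wstring> readLatin1Lines(const std::string& path) {
    std::vector<std::wstring> lines;
    std::ifstream in(path.c_str(), std::ios::binary);
    std::string raw;
    while (std::getline(in, raw)) {
        // Tolerate Windows line endings left over from binary mode.
        if (!raw.empty() && raw.back() == '\r') {
            raw.pop_back();
        }
        std::wstring wide;
        wide.reserve(raw.size());
        for (char c : raw) {
            // Go through unsigned char first so bytes >= 0x80 do not
            // sign-extend into negative or very large wchar_t values.
            wide.push_back(static_cast<wchar_t>(static_cast<unsigned char>(c)));
        }
        lines.push_back(wide);
    }
    return lines;
}
```

If the lines returned here contain the accented characters in a debugger, the problem was in how the wide stream converted the bytes rather than in the indexing. In a Unicode build where TCHAR is wchar_t (again an assumption), each line's c_str() could then be passed to Field in place of lineread.c_str().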