Date: Mon, 21 Jun 2010 08:05:41 +0200
Subject: Re: Problem indexin accented characters.
From: Itziar Cortes
To: general@lucene.apache.org
Cc: clucene-developers@lists.sourceforge.net

Hi!

Thanks for the reply.

I suspected it could be an encoding problem, but I am sure that the file is being read correctly. In general, I have the problem whenever I try to index a variable.

Could you tell me where I can post this question to the CLucene user group? Is it a mailing list?

Thanks in advance,

--
Itziar

2010/6/20 Itamar Syn-Hershko

> Looks like an encoding issue. Is the file being read correctly (check with
> your debugger)?
>
> Also, please post such questions to the CLucene user group.
>
> Itamar.
>
> > -----Original Message-----
> > From: Itziar Cortes [mailto:itziar@eleka.net]
> > Sent: Sunday, June 20, 2010 12:21 PM
> > To: general@lucene.apache.org
> > Subject: Problem indexin accented characters.
> >
> > Hi all!
> >
> > I have a little problem with CLucene when I try to index
> > accented characters. I need to index characters like ñ, è, ü, or
> > ó. I use Luke to inspect the indexed data.
> >
> > I tried this, and I had no problem:
> >
> > pDoc->add(*new Field(_T("field"), _T("a b ñ c d"),
> >     Field::STORE_YES | Field::INDEX_TOKENIZED));
> >
> > The problem begins when I try to read from a file and index
> > each line. For example,
> >
> > wifstream file;
> > wstring lineread;
> > while(std::getline(file, lineread)){
> >     pDoc->add(*new Field(_T("testua"), lineread.c_str(),
> >         Field::STORE_YES | Field::INDEX_TOKENIZED));
> > }
> >
> > It only indexes "a" and "b".
> >
> > How can I solve this problem?
> >
> > Thanks in advance,
> >
> > Best regards,
> >
> > --
> > Itziar
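
A possible explanation for the symptom described above, offered as a sketch and not a confirmed diagnosis: on many implementations a std::wifstream left in the default "C" locale cannot convert bytes above 0x7F, so the stream fails at the first accented character and getline returns only the text before it ("a b "). One way around this is to read the file as raw bytes and widen them by hand. The code below ASSUMES the file is encoded in ISO-8859-1 (the thread does not say which encoding the file uses), and the helper names latin1_to_wide and read_latin1_lines are hypothetical, not part of CLucene.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Widen ISO-8859-1 bytes to wide characters. Each Latin-1 byte value equals
// its Unicode code point (0xF1 is U+00F1), so going through unsigned char
// is enough; a plain char cast could sign-extend and corrupt the value.
std::wstring latin1_to_wide(const std::string& bytes) {
    std::wstring out;
    out.reserve(bytes.size());
    for (char c : bytes) {
        out.push_back(static_cast<wchar_t>(static_cast<unsigned char>(c)));
    }
    return out;
}

// Read a file line by line as raw bytes and return each line widened.
// ASSUMPTION: the file is ISO-8859-1. A UTF-8 file needs a real decoder.
std::vector<std::wstring> read_latin1_lines(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    if (!file) {
        throw std::runtime_error("cannot open " + path);
    }
    std::vector<std::wstring> lines;
    std::string raw;
    while (std::getline(file, raw)) {
        // Drop a trailing carriage return left by Windows line endings.
        if (!raw.empty() && raw.back() == '\r') {
            raw.pop_back();
        }
        lines.push_back(latin1_to_wide(raw));
    }
    return lines;
}
```

Each returned line could then be passed as line.c_str() to the Field constructor in place of lineread.c_str(), assuming a CLucene build where TCHAR is wchar_t, as the original snippet implies. If the file turns out to be UTF-8, imbuing the stream with a suitable locale or decoding UTF-8 explicitly would be needed instead.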