lucene-general mailing list archives

From Itziar Cortes <itz...@eleka.net>
Subject Re: Problem indexing accented characters.
Date Mon, 21 Jun 2010 06:05:41 GMT
Hi!

Thanks for the reply.

I suspected it could be an encoding problem, but I am sure that the
file is being read correctly.

In general, I have the problem whenever I try to index a variable.

Could you tell me where I can post this question to the CLucene user group?
Is it a mailing list?

Thanks in advance,

--
Itziar

2010/6/20 Itamar Syn-Hershko <itamar@code972.com>

> Looks like an encoding issue. Is the file being read correctly (check with
> your debugger)?
>
> Also, please post such questions to the CLucene user group.
>
> Itamar.
>
> > -----Original Message-----
> > From: Itziar Cortes [mailto:itziar@eleka.net]
> > Sent: Sunday, June 20, 2010 12:21 PM
> > To: general@lucene.apache.org
> > Subject: Problem indexing accented characters.
> >
> > Hi all!
> >
> > I have a little problem with CLucene when I try to index
> > accented characters. I need to index characters like ñ, è, ü, or
> > ó. I use Luke to see the indexed data.
> >
> > I tried this, and I had no problem:
> >
> >  pDoc->add(*new Field(_T("field"), _T("a b ñ c d"),
> > Field::STORE_YES | Field::INDEX_TOKENIZED));
> >
> >
> > The problem begins when I try to read from a file and index
> > each line. For example,
> >
> >  wifstream file;
> >  wstring lineread;
> >  while(std::getline(file, lineread)){
> >       pDoc->add(*new Field(_T("testua"), lineread.c_str(),
> >                            Field::STORE_YES | Field::INDEX_TOKENIZED));
> >  }
> >
> > It only indexes "a" and "b".
> >
> >
> > How can I solve this problem?
> >
> > Thanks in advance,
> >
> > Best regards,
> >
> > --
> > Itziar
> >
>
>

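One plausible reading of the symptom in the quoted message: a wifstream left in the default "C" locale may, on some implementations, fail at the first non-ASCII byte, which would match seeing only "a" and "b" indexed. The thread does not say how the file is encoded, so the following is only a minimal sketch, assuming the file is UTF-8. It reads raw bytes with a narrow ifstream and decodes each line by hand. The names utf8_to_wide and read_utf8_lines are hypothetical helpers, not part of CLucene, and the decoder does not reject overlong encodings.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: decode one UTF-8 line into a wide string.
// Assumes the input is UTF-8. Malformed or truncated sequences become
// U+FFFD so that a wrong encoding shows up instead of silently vanishing.
std::wstring utf8_to_wide(const std::string& in) {
    std::wstring out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i++]);
        unsigned long cp;
        int extra;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else                            { cp = 0xFFFD;      extra = 0; }

        for (; extra > 0; --extra) {
            if (i >= in.size() ||
                (static_cast<unsigned char>(in[i]) & 0xC0) != 0x80) {
                cp = 0xFFFD;  // truncated or malformed sequence
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(in[i++]) & 0x3F);
        }

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            // 16-bit wchar_t (e.g. Windows): emit a UTF-16 surrogate pair.
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}

// Hypothetical helper: read a UTF-8 text file line by line as wide strings.
std::vector<std::wstring> read_utf8_lines(const char* path) {
    std::vector<std::wstring> lines;
    std::ifstream file(path, std::ios::binary);
    std::string raw;
    while (std::getline(file, raw)) {
        // Drop a trailing '\r' left over from Windows line endings.
        if (!raw.empty() && raw[raw.size() - 1] == '\r') {
            raw.erase(raw.size() - 1);
        }
        lines.push_back(utf8_to_wide(raw));
    }
    return lines;
}
```

On a build where TCHAR is wchar_t, each decoded line could then be passed to the Field constructor through c_str(), as in the quoted snippet. If the file is actually Latin-1 rather than UTF-8, the accented characters come out as U+FFFD, which makes the mismatch easy to spot in Luke.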