lucene-general mailing list archives

From "Itamar Syn-Hershko" <ita...@code972.com>
Subject RE: Problem indexing accented characters.
Date Sun, 20 Jun 2010 15:40:30 GMT
Looks like an encoding issue. Is the file being read correctly (check with
your debugger)?
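
One possible cause, assuming the file is saved as UTF-8 (the original post
does not say): std::wifstream converts bytes using the stream's locale, which
defaults to the classic "C" locale. On many implementations that conversion
fails or produces garbage at the first non-ASCII byte, which would explain
why only "a" and "b" get indexed. A minimal sketch of one way around it is to
read raw bytes with std::ifstream and decode them explicitly. The names
decode_utf8 and read_utf8_lines are hypothetical helpers, not part of CLucene.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: decode a UTF-8 byte string into a wide string.
// Invalid lead or continuation bytes become U+FFFD; overlong forms are
// not rejected, so this is a sketch rather than a strict validator.
std::wstring decode_utf8(const std::string& bytes) {
    const wchar_t kReplacement = static_cast<wchar_t>(0xFFFD);
    std::wstring out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        unsigned char lead = static_cast<unsigned char>(bytes[i]);
        unsigned long cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out += kReplacement; ++i; continue; }

        // All continuation bytes must be present and look like 10xxxxxx.
        bool ok = i + extra < bytes.size();
        for (std::size_t k = 1; ok && k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) ok = false;
            else cp = (cp << 6) | (c & 0x3F);
        }
        if (!ok) { out += kReplacement; ++i; continue; }
        i += extra + 1;

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            // 16-bit wchar_t (e.g. Windows): emit a UTF-16 surrogate pair.
            cp -= 0x10000;
            out += static_cast<wchar_t>(0xD800 + (cp >> 10));
            out += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        } else {
            out += static_cast<wchar_t>(cp);
        }
    }
    return out;
}

// Hypothetical helper: read a file assumed to be UTF-8, one wide string
// per line.
std::vector<std::wstring> read_utf8_lines(const std::string& path) {
    std::vector<std::wstring> lines;
    std::ifstream file(path.c_str(), std::ios::binary);
    std::string raw;
    bool first = true;
    while (std::getline(file, raw)) {
        // Drop a UTF-8 byte order mark at the start of the file, if any.
        if (first && raw.compare(0, 3, "\xEF\xBB\xBF") == 0) raw.erase(0, 3);
        first = false;
        // Drop the '\r' left behind by CRLF line endings.
        if (!raw.empty() && raw[raw.size() - 1] == '\r') raw.erase(raw.size() - 1);
        lines.push_back(decode_utf8(raw));
    }
    return lines;
}
```

Each returned line could then be passed to the Field constructor through
c_str(), as in the quoted code below, assuming a Unicode build of CLucene
where TCHAR is wchar_t. If the file is in another encoding (Latin-1, for
example), the decoding step would have to change accordingly.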

Also, please post such questions to the CLucene user group.

Itamar. 

> -----Original Message-----
> From: Itziar Cortes [mailto:itziar@eleka.net] 
> Sent: Sunday, June 20, 2010 12:21 PM
> To: general@lucene.apache.org
> Subject: Problem indexing accented characters.
> 
> Hi all!
> 
> I have a little problem with CLucene when I try to index 
> accented characters. I need to index characters like ñ, è, ü, or 
> ó. I use Luke to inspect the indexed data.
> 
> I tried this, and I had no problem:
> 
>  pDoc->add(*new Field(_T("field"), _T("a b ñ c d"),
>                       Field::STORE_YES | Field::INDEX_TOKENIZED));
> 
> 
> The problem begins when I try to read from a file and index 
> each line. For example,
> 
>  wifstream file;
>  wstring lineread;
>  while(std::getline(file, lineread)){
>       pDoc->add(*new Field(_T("testua"), lineread.c_str(),
>                            Field::STORE_YES | Field::INDEX_TOKENIZED));
>  }
> 
> It only indexes "a" and "b".
> 
> 
> How can I solve this problem?
> 
> Thanks in advance,
> 
> Best regards,
> 
> --
> Itziar
> 

