lucene-solr-user mailing list archives

From "Yonik Seeley" <yo...@apache.org>
Subject Re: Invalid character in search results
Date Tue, 04 Dec 2007 15:16:27 GMT
On Dec 4, 2007 5:02 AM, Maciej Szczytowski
<Maciej.Szczytowski@becomo.com> wrote:
> Hi, I use a Solr 1.1 application for indexing Russian documents. Sometimes
> the search results contain documents with an invalid character.
>
> For example, I indexed "иго" but the search returned "и��о". It's strange,
> because something changed 2 bytes into 6 bytes.
>
> иго - D0 B8 D0 B3 D0 BE
>
> и��о - D0 B8 EF BF BD EF BF BD D0 BE
>
> This field is indexed verbatim as a string:
>
> <fieldtype name="string" class="solr.StrField" sortMissingLast="true"
> omitNorms="true"/>
>
> After reindexing, the documents with the invalid character are fixed.
>
> Does anybody have an idea where the problem is?
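The "2 bytes into 6 bytes" observation can be checked directly. The sketch below (illustrative only; the helper name `utf8_hex` is hypothetical) prints the UTF-8 bytes of both strings from the question. The letter "г" (D0 B3) was replaced by two U+FFFD REPLACEMENT CHARACTERs, and each of those encodes to the 3 bytes EF BF BD, which accounts for the growth. It does not identify where the replacement happened.

```python
# Compare the UTF-8 bytes of the indexed text and the corrupted search result.
indexed = "иго"
returned = "и\ufffd\ufffdо"  # two U+FFFD replacement characters in place of "г"


def utf8_hex(text: str) -> str:
    """Return the UTF-8 encoding of text as space-separated uppercase hex."""
    return text.encode("utf-8").hex(" ").upper()


print(utf8_hex(indexed))   # D0 B8 D0 B3 D0 BE
print(utf8_hex(returned))  # D0 B8 EF BF BD EF BF BD D0 BE

# One 2-byte Cyrillic letter versus two 3-byte replacement characters.
print(len("г".encode("utf-8")), len("\ufffd\ufffd".encode("utf-8")))  # 2 6
```

The `bytes.hex` separator argument requires Python 3.8 or later.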

Probably an issue with the charset not being set correctly (or the
character encoding not matching the charset declaration) when it was
first indexed.
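A minimal sketch of the failure mode described above, under stated assumptions. First, it shows that bytes produced in one encoding (Windows-1251 is used here purely as an example) but decoded as UTF-8 turn Cyrillic letters into U+FFFD. This does not reproduce the exact byte pattern from the question. Second, it builds an XML update request whose body is UTF-8 and whose `Content-Type` header and XML declaration both say so. The URL `http://localhost:8983/solr/update`, the field name `title`, and the helper `build_add_request` are assumptions for illustration, not taken from the original thread. The sketch also assumes the update handler honors the `charset` parameter, as the answer implies.

```python
from typing import Mapping
from urllib.request import Request
from xml.sax.saxutils import escape, quoteattr

# 1) Encoding mismatch: client sends Windows-1251 bytes, server decodes as UTF-8.
text = "иго"
sent = text.encode("cp1251")
received = sent.decode("utf-8", errors="replace")  # invalid bytes become U+FFFD
print(repr(received))

# 2) Consistent request: UTF-8 body, UTF-8 XML declaration, UTF-8 charset header.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"  # assumption: example URL


def build_add_request(fields: Mapping[str, str], url: str = SOLR_UPDATE_URL) -> Request:
    """Build (but do not send) a hypothetical <add><doc> update request."""
    field_xml = "".join(
        f"<field name={quoteattr(name)}>{escape(value)}</field>"
        for name, value in fields.items()
    )
    body = '<?xml version="1.0" encoding="UTF-8"?>' + f"<add><doc>{field_xml}</doc></add>"
    return Request(
        url,
        data=body.encode("utf-8"),  # the bytes really are UTF-8 ...
        headers={"Content-Type": "text/xml; charset=utf-8"},  # ... and are declared so
        method="POST",
    )


req = build_add_request({"title": "иго"})
print(req.get_header("Content-type"))
```

To actually send the request to a running Solr instance, pass `req` to `urllib.request.urlopen`. The key point is that the declared charset and the real byte encoding agree. Whether a given Solr version falls back to a different default when no charset is declared is worth checking against that version's documentation.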

-Yonik