lucene-solr-user mailing list archives

From Otis Gospodnetic <otis.gospodne...@gmail.com>
Subject Re: Solr and Unicode characters in strings
Date Tue, 22 Jan 2013 16:59:01 GMT
Hi,

When you run your indexing app, make sure that what you send to Solr is
treated as UTF-8.
Add -Dfile.encoding=UTF8 -Dclient.encoding.override=UTF-8 to the Java
command line.
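
A quick way to check what the JVM is actually doing is to print the default charset and to name UTF-8 explicitly wherever bytes become text. The class below is only a sketch: EncodingCheck and its method names are made up for illustration and are not part of Solr or SolrJ.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Solr or SolrJ.
public final class EncodingCheck {

    /** True if the JVM default charset (influenced by -Dfile.encoding) is UTF-8. */
    public static boolean defaultIsUtf8() {
        return StandardCharsets.UTF_8.equals(Charset.defaultCharset());
    }

    /** Encodes text as UTF-8 bytes without relying on the JVM default. */
    public static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    /** Decodes bytes as UTF-8 without relying on the JVM default. */
    public static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + Charset.defaultCharset());
        System.out.println("Default is UTF-8: " + defaultIsUtf8());

        // \u2026 is the horizontal ellipsis from the snippet in the question.
        String sample = "economist and thus \u2026@en";
        byte[] raw = encode(sample);
        System.out.println("Round trip intact: " + decode(raw).equals(sample));
    }
}
```

If the default charset printed here is not UTF-8, any code path that calls getBytes() or new String(bytes) without a charset argument is a likely suspect.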

Otis
--
Solr & ElasticSearch Support
http://sematext.com/

On Mon, Jan 21, 2013 at 3:06 PM, Jack Park <jackpark@topicquests.org> wrote:

> Here is a situation I now experience:
>
>                         What Solr has:
>                                 economist and thus …@en
>                         What was sent:
>                                 economist and thus …@en
> Those are just snippets of what I sent up and of what comes back when
> I fetch the document containing that passage; the ellipsis was created
> by Carrot2.
>
> There is a hint in the Solr FAQ that the server must support UTF-8;
> it's not clear how to do that from HttpSolrServer.
> Other hints from around the web suggest I should be using a field
> type other than "string".
>
> I should point out that I am running these developmental tests on the
> Solr 4 example build with my schema.xml.
>
> My question is this: what simple, say, utility call would return the
> text to its original?
> (perhaps that's the wrong question...)
>
> Many thanks in advance
> Jack
>
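
On the question of a utility call that returns the text to its original: if the damage is the common case of UTF-8 bytes having been decoded once with a single-byte charset, it can sometimes be reversed by re-encoding with that charset and decoding as UTF-8. The sketch below assumes the wrong charset was ISO-8859-1, which the archived snippets do not confirm; MojibakeRepair and repair are hypothetical names, not a Solr API. Fixing the encoding at the sending side remains the better cure.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Solr or SolrJ.
public final class MojibakeRepair {

    /**
     * Reverses one round of "UTF-8 bytes decoded as ISO-8859-1".
     * Assumption: ISO-8859-1 was the wrong charset. If another charset
     * (for example windows-1252) was involved, substitute it here.
     */
    public static String repair(String damaged) {
        // ISO-8859-1 maps each char in U+0000..U+00FF back to one byte,
        // which recovers the original UTF-8 byte sequence.
        byte[] originalBytes = damaged.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "economist and thus \u2026@en";

        // Simulate the damage: UTF-8 bytes misread as ISO-8859-1.
        String damaged = new String(
                original.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);

        System.out.println("Repaired equals original: "
                + repair(damaged).equals(original));
    }
}
```

Applying repair to text that was never damaged this way will corrupt any non-ASCII characters, so it is only safe on values known to be affected.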
