lucene-solr-user mailing list archives

From Gulliver Smith <gulliver.m.sm...@gmail.com>
Subject Re: Character encoding problems
Date Thu, 31 Jul 2014 01:48:14 GMT
Thanks for all the replies - I should have made clear that the first
thing I did was confirm that everything on the PHP side is UTF-8. The
web pages, the input text, the input files etc. The browser confirms
that the encoding is UTF-8 for all of the web pages, as do the response
headers inspected in the development tools. The PHP curl POSTs are
definitely UTF-8 and the responses from Solr claim to be UTF-8.

The really strange thing is that iconv("utf-8", "iso-8859-1", $title)
turns the value into something that the browser, with the UTF-8
encoding, displays correctly.
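
One explanation that would fit this symptom (a guess, not something I have
confirmed in my setup) is that the text is being UTF-8 encoded twice somewhere
along the way. A minimal sketch in Python rather than PHP, with hypothetical
function names, showing how a double-encoded value behaves and why a
utf-8 to iso-8859-1 conversion would appear to repair it:

```python
# Sketch only: simulates a suspected double UTF-8 encoding.
# The function names are illustrative and not part of PHP, curl, or Solr.

def double_encode(text: str) -> bytes:
    """Encode to UTF-8, misread those bytes as ISO-8859-1, encode to UTF-8 again."""
    utf8_bytes = text.encode("utf-8")
    misread = utf8_bytes.decode("iso-8859-1")  # each byte becomes one character
    return misread.encode("utf-8")


def utf8_to_latin1(data: bytes) -> bytes:
    """Rough equivalent of iconv("utf-8", "iso-8859-1", $value) in PHP."""
    return data.decode("utf-8").encode("iso-8859-1")


title = "appelé au téléphone"
stored = double_encode(title)

# What a browser decoding the stored bytes as UTF-8 would display.
print(stored.decode("utf-8"))

# Undoing one layer of encoding yields valid UTF-8 for the original title.
repaired = utf8_to_latin1(stored)
print(repaired.decode("utf-8"))
```

If the stored bytes behave like `stored` here, the iconv call is simply
removing the extra layer, and the real fix would be to find the step that
re-encodes data that is already UTF-8.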

On Tue, Jul 29, 2014 at 5:55 PM, Paul Libbrecht <paul@hoplahup.net> wrote:
>> If you are seeing " appelé au téléphone" in the browser, I would guess that
>> the data is being rendered in UTF-8 by your server and the content type of the
>> HTML is set to iso-8859-1 or not being set and your browser is defaulting to
>> iso-8859-1.
>>
>> You can force the encoding to utf-8 in the browser; usually this is a menu item
>> (in Chrome/Safari/Firefox).
>>
>> FWIW, having messed around with this kind of stuff in the past, I always generate
>> utf-8 and always set the HTML content type to utf-8 with:
>>
>>       <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
>
> And make sure that the server does not send a conflicting charset in the header.
> This can happen and, as per HTTP (I think), it takes precedence over the encoding
> indicated in the content.
>
> paul
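
To make the precedence point concrete, here is a small Python sketch
(illustrative only; the function names are made up, and the regular expression
is a rough approximation, not a full HTML parser) that reports which charset
would apply when a Content-Type header and a meta tag disagree:

```python
# Sketch only: header charset wins over the meta tag when both are present.
import re
from email.message import Message
from typing import Optional


def header_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type header value, if any."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased, or None when absent


def meta_charset(html: str) -> Optional[str]:
    """Return the charset declared in the first meta tag that has one."""
    match = re.search(r'<meta[^>]+charset=["\']?([\w-]+)', html, re.IGNORECASE)
    return match.group(1).lower() if match else None


def effective_charset(content_type: str, html: str) -> Optional[str]:
    """Prefer the HTTP header; fall back to the meta declaration."""
    return header_charset(content_type) or meta_charset(html)


page = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />'
print(effective_charset("text/html; charset=ISO-8859-1", page))
print(effective_charset("text/html", page))
```

In the first call the header's charset is reported even though the page
declares utf-8; in the second, with no charset in the header, the meta
declaration is used.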
