lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-4265) Encoding problem from test console
Date Fri, 04 Jan 2013 20:18:12 GMT

    [ https://issues.apache.org/jira/browse/SOLR-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544217#comment-13544217
] 

Uwe Schindler commented on SOLR-4265:
-------------------------------------

Alex: Solr expects all URL parameters to be encoded as UTF-8 - PERIOD. The problem we are discussing
here is that some servlet containers use ISO-8859-1 to decode the parameters, so although
you pass UTF-8-URL-encoded values (e.g. your example would be "q=m%C3%AAme"), the servlet container
may not use UTF-8 to decode the %-encoded parts. This causes the issue you have seen. Currently
this is a configuration issue: in Tomcat you have to change the connector, in Jetty
you have to set the body encoding (sorry,

The HTTP protocol by itself has nothing to do with this. The whole issue is about the request
URI and the decoding of the URL parameters (the URLDecoder Java class).
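
To make the mismatch concrete, here is a minimal sketch (not taken from the patch) that decodes the example value once as UTF-8 and once as ISO-8859-1 using java.net.URLDecoder. The class name CharsetMismatchDemo and the helper decode() are made up for illustration only.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; only URLDecoder is a real JDK API here.
public class CharsetMismatchDemo {

    /** Decodes a %-encoded value with the given charset name. */
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String encoded = "m%C3%AAme"; // "même", UTF-8 encoded, then %-escaped

        String asUtf8 = decode(encoded, "UTF-8");
        String asLatin1 = decode(encoded, "ISO-8859-1");

        // Decoded as UTF-8, the bytes C3 AA become the single character U+00EA.
        System.out.println(asUtf8.equals("m\u00EAme"));         // true
        // Decoded as ISO-8859-1, each byte becomes its own character (U+00C3, U+00AA).
        System.out.println(asLatin1.equals("m\u00C3\u00AAme")); // true
        System.out.println(asUtf8.length() + " vs " + asLatin1.length()); // 4 vs 5
    }
}
```

The second result is the kind of garbled value a container would hand over if it decoded the query string as ISO-8859-1.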

My proposal to fix this in a portable way (like we did with the InputStreams/OutputStreams
instead of using Readers/Writers to prevent the buggy Jetty Readers/Writers): For POST requests,
let us set the body encoding (as demonstrated in the patch) to UTF-8. And for the GET parameters,
let's decode them manually. It's just a series of String.split() and URLDecoder.decode(...,
"UTF-8") calls.
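
A rough sketch of what the manual GET decoding could look like, assuming the raw, still %-encoded query string is available (in a servlet that would typically be HttpServletRequest.getQueryString()). This is not the code from the attached patch; the class QueryStringParser and its methods are hypothetical names used only for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Solr: splits a raw query string and
// decodes every key and value as UTF-8, independent of the container.
public class QueryStringParser {

    static Map<String, List<String>> parse(String rawQuery) {
        Map<String, List<String>> params = new LinkedHashMap<String, List<String>>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            // Limit 2 so that a literal '=' inside the value is kept.
            String[] kv = pair.split("=", 2);
            String key = decodeUtf8(kv[0]);
            String value = kv.length > 1 ? decodeUtf8(kv[1]) : "";
            List<String> values = params.get(key);
            if (values == null) {
                values = new ArrayList<String>();
                params.put(key, values);
            }
            values.add(value); // repeated parameters keep all their values
        }
        return params;
    }

    private static String decodeUtf8(String s) {
        try {
            // Also turns '+' into a space, as in form encoding.
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> params = parse("q=m%C3%AAme&fq=a&fq=b+c");
        System.out.println(params.get("q").get(0).equals("m\u00EAme")); // true
        System.out.println(params.get("fq").size());                    // 2
    }
}
```

Note that URLDecoder.decode() throws an IllegalArgumentException on malformed %-sequences, so a real implementation would have to decide how to report such requests.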
                
> Encoding problem from test console
> ----------------------------------
>
>                 Key: SOLR-4265
>                 URL: https://issues.apache.org/jira/browse/SOLR-4265
>             Project: Solr
>          Issue Type: Bug
>          Components: web gui
>    Affects Versions: 4.0
>         Environment: Windows, but environment independent
>            Reporter: Alex Rocher
>            Priority: Blocker
>         Attachments: SolrDispatchFilter.java.patch
>
>
> When you type an accented character (in French, for example) in the console query tester,
there's no charset conversion (servlet request charset conversion).
> E.g.: "même" is converted into its ISO-8859-1 representation ==> fail
> The reason: getCharacterEncoding from HTTPRequest is not tested. If it's null, it will
assume a UTF-8 encoding for the conversion.


