lucene-dev mailing list archives

From "Yonik Seeley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-1959) SolrJ GET operation does not send correct encoding
Date Tue, 22 Jun 2010 13:02:04 GMT

    [ https://issues.apache.org/jira/browse/SOLR-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881190#action_12881190 ]

Yonik Seeley commented on SOLR-1959:
------------------------------------

bq. The problem is that URLs are urlencoded with UTF-8 but the Content-type: header is not set.

Content-type does not apply to a GET.  The URL in a GET is strictly defined to be percent-encoded
UTF-8 bytes.  For historical reasons, Tomcat defaults to latin-1, and it needs to be
changed in the server config.

http://www.ietf.org/rfc/rfc3986.txt
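
To illustrate the mismatch described above, here is a minimal sketch using only the JDK. The class and method names (QueryEncodingDemo, encodeUtf8, decodeAs) are hypothetical and are not part of SolrJ or Tomcat; the sketch only mimics a client that percent-encodes a query term as UTF-8 and a server that decodes it with either UTF-8 or latin-1.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, not part of SolrJ or Tomcat.
public class QueryEncodingDemo {

    // Percent-encode a query value as UTF-8 bytes, as a client would for a GET URL.
    static String encodeUtf8(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available on the JVM
        }
    }

    // Decode a percent-encoded value using the charset the server is assumed to apply.
    static String decodeAs(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String term = "M\u00fcller"; // "Müller", contains a character above 127
        String encoded = encodeUtf8(term);

        System.out.println("sent on the wire:   " + encoded);
        System.out.println("decoded as UTF-8:   " + decodeAs(encoded, "UTF-8"));
        System.out.println("decoded as latin-1: " + decodeAs(encoded, "ISO-8859-1"));
    }
}
```

In this sketch the UTF-8 decode round-trips the term, while the latin-1 decode turns the two bytes of the umlaut into two separate characters, so the term no longer matches. On the Tomcat side, the setting that usually controls this is the URIEncoding attribute on the HTTP Connector in server.xml; check the documentation for the Tomcat version in use.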


> SolrJ GET operation does not send correct encoding
> --------------------------------------------------
>
>                 Key: SOLR-1959
>                 URL: https://issues.apache.org/jira/browse/SOLR-1959
>             Project: Solr
>          Issue Type: Bug
>          Components: clients - java
>    Affects Versions: 1.4.1, Next
>            Reporter: Lance Norskog
>         Attachments: SOLR-1959.patch
>
>
> The SolrJ query operation fails to set the character encoding when doing a GET. It works
> when doing a POST.
> The problem is that URLs are urlencoded with UTF-8 but the Content-type: header is not
> set. I tested it with "Content-Type:text/plain;charset=utf-8" and that worked. The Content-type
> header encoding defaults to ISO 8859-1.
> The result is that SolrJ queries fail for any search with a character above 127. The
> workaround is to use a POST query instead of a GET. I have not searched for other places
> where this happens. So, change:
> {code}
> QueryResponse qr = CommonsHttpSolrServer.query(query);
> {code}
> to:
> {code}
> QueryResponse qr = CommonsHttpSolrServer.query(query, SolrRequest.METHOD.POST);
> {code}
> One quirk of this behavior is that url-bashing a query string with an ISO 8859-1 character
> (like an umlaut) works in a browser, but fails in a SolrJ request. It also searches correctly
> from the admin/index.jsp and admin/form.jsp pages, because they set the content-type in the
> FORM declaration.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

