lucene-solr-dev mailing list archives

From "Yonik Seeley (JIRA)" <>
Subject [jira] Commented: (SOLR-443) POST queries don't declare its charset
Date Sat, 21 Jun 2008 13:44:45 GMT


Yonik Seeley commented on SOLR-443:

bq. I can confirm that setting the content type manually to "application/x-www-form-urlencoded;
charset=UTF-8" works, but that seems like a dirty hack to me. There's no standard/specification/..
covering that.

I agree it's a bit hackish... but that's the state of things.  I'm more concerned if it actually
works everywhere (and I was surprised that it seems to).  I imagine in the future, UTF-8 will
be the standard... there's no getting around it unless one wants to just ban x-www-form-urlencoded
POST for non-ASCII, and that doesn't seem reasonable.

I started using POST because the queries could go over the size limits of GET (so that's yet
another hack).  Using multi-part would really blow up the size of these requests, and could
actually become a bottleneck when the number of servers is high.
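
For illustration, here is a minimal sketch of the workaround discussed above: the POST body is percent-encoded as UTF-8, and the Content-Type carries an explicit charset parameter. It uses only the JDK (Java 10 or later for this URLEncoder overload), not the commons library mentioned in the issue. The class and method names are hypothetical and are not part of Solr or its Java client.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not Solr or SolrJ code.
public class FormPostSketch {
    // The content type quoted in the comment above, with the charset declared.
    static final String CONTENT_TYPE =
        "application/x-www-form-urlencoded; charset=UTF-8";

    // Percent-encode each name and value as UTF-8 and join the pairs with '&'.
    static String encodeForm(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "caf\u00e9"); // a query containing a non-ASCII character
        System.out.println("Content-Type: " + CONTENT_TYPE);
        System.out.println(encodeForm(params));
    }
}
```

A client would send CONTENT_TYPE as the request header and the UTF-8 bytes of the encoded string as the body. Whether a given servlet container honors the charset parameter is exactly the open question raised above.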

> POST queries don't declare its charset
> --------------------------------------
>                 Key: SOLR-443
>                 URL:
>             Project: Solr
>          Issue Type: Bug
>          Components: clients - java
>    Affects Versions: 1.2
>         Environment: Tomcat 6.0.14
>            Reporter: Andrew Schurman
>            Priority: Minor
>         Attachments: SOLR-443-multipart.patch, solr-443.patch, solr-443.patch, SolrDispatchFilter.patch
> When sending a query via POST, the content-type is not set. The content charset for the
POST parameters is set, but this only appears to be used for creating the Content-Length
header in the commons library. Since a query is encoded in UTF-8, the HTTP headers should
also specify content type charset.
> On Tomcat, this causes problems when the query string contains non-ASCII characters (characters
with accents and such) as it tries to parse the POST body in its default ISO-8859-1. There
appears to be no way to set/change the default encoding for a message body on Tomcat.
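
The mismatch described above can be reproduced without a server. The sketch below only simulates it: it takes the UTF-8 bytes a client would send and decodes them with the single-byte Latin-1 charset that the description says Tomcat uses by default. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of a charset mismatch; no Tomcat code is involved.
public class CharsetMismatchSketch {
    // Encode as UTF-8 (what the client sends), then decode as Latin-1
    // (what a server does when no charset is declared and Latin-1 is its default).
    static String decodeAsLatin1(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String sent = "caf\u00e9";
        String received = decodeAsLatin1(sent);
        // The two-byte UTF-8 sequence for the accented letter becomes two
        // separate Latin-1 characters, so the lengths differ.
        System.out.println(sent.length() + " -> " + received.length());
    }
}
```

Because the accented character occupies two bytes in UTF-8, the server sees two unrelated characters in its place, and the query no longer matches the indexed term.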

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
