lucene-solr-dev mailing list archives

From "Lars Kotthoff (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-443) POST queries don't declare its charset
Date Sun, 22 Jun 2008 07:29:45 GMT

    [ https://issues.apache.org/jira/browse/SOLR-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607069#action_12607069 ]

Lars Kotthoff commented on SOLR-443:
------------------------------------

I agree that using multi-part increases the size of the requests significantly, but I don't
think that it's going to be much of a problem.

For example, consider SOLR-303. The requests for facet refinements use a large number of facet
queries, so those would become significantly bigger. This is only really going to impact performance
on the network interface of the machine sending the requests. The responses still come back
in the old format, and creating a multi-part POST request isn't more expensive than creating
a normal one. So the request would take longer to transmit, and the shards would probably
need more processing time to assemble the parts. I'd be surprised if the increase in processing
time had any measurable impact on performance. As for network connectivity, even with multi-part
requests for many facets we're talking about sizes on the order of 100 kB. Unless the
increase in size actually saturates the network connection (which won't happen until there are
several hundred shards), the penalty will be a few milliseconds of extra delay.
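
To make the estimate above concrete, here is a rough back-of-the-envelope sketch. The 1 Gbit/s
link speed is an assumption for illustration only (it is not a measured value), the class and
method names are hypothetical, and latency and protocol overhead are ignored.

```java
// Hypothetical helper for a rough estimate; not part of Solr.
class TransferEstimate {

    /** Milliseconds needed to push `bytes` over a link, ignoring latency and protocol overhead. */
    static double transmitMillis(long bytes, double linkBitsPerSecond) {
        return bytes * 8 / linkBitsPerSecond * 1000.0;
    }

    /** Number of requests of `bytes` each that would fill the link for one second. */
    static double requestsPerSecondAtSaturation(long bytes, double linkBitsPerSecond) {
        return linkBitsPerSecond / (bytes * 8);
    }

    public static void main(String[] args) {
        long requestBytes = 100_000L; // "on the order of 100 kB" per multi-part request
        double link = 1e9;            // ASSUMPTION: 1 Gbit/s network interface

        System.out.println("ms per request: " + transmitMillis(requestBytes, link));
        System.out.println("requests/s to saturate: "
                + requestsPerSecondAtSaturation(requestBytes, link));
    }
}
```

Under these assumptions a single 100 kB request costs well under a millisecond of transmit time,
so the link only becomes a bottleneck when many such requests are sent per second.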

It certainly seems inefficient and wasteful to use multi-part requests, but I don't think
that the actual performance penalty is going to be significant. AFAIK the requests sent like
this by Solr are small anyway. I'll try to do some experiments to be able to give some hard
numbers.

> POST queries don't declare its charset
> --------------------------------------
>
>                 Key: SOLR-443
>                 URL: https://issues.apache.org/jira/browse/SOLR-443
>             Project: Solr
>          Issue Type: Bug
>          Components: clients - java
>    Affects Versions: 1.2
>         Environment: Tomcat 6.0.14
>            Reporter: Andrew Schurman
>            Priority: Minor
>         Attachments: SOLR-443-multipart.patch, solr-443.patch, solr-443.patch, SolrDispatchFilter.patch
>
>
> When sending a query via POST, the content-type is not set. The content charset for the
> POST parameters is set, but this only appears to be used for creating the Content-Length
> header in the commons library. Since a query is encoded in UTF-8, the HTTP headers should
> also specify the content type charset.
> On Tomcat, this causes problems when the query string contains non-ASCII characters (characters
> with accents and such) as it tries to parse the POST body in its default ISO-8859-1. There
> appears to be no way to set/change the default encoding for a message body on Tomcat.
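
The mismatch described in the issue can be reproduced in isolation. The following is a minimal
sketch, not the Solr client code: the class and method names are hypothetical, and it skips
percent-encoding and works on the raw bytes, which is a simplification of what happens to a
form-encoded POST body. It shows text sent as UTF-8 being read as ISO-8859-1, and the
Content-Type value a client could send to declare the charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Solr or Tomcat.
class CharsetMismatchDemo {

    /** Encodes `text` with the sender's charset, then decodes the bytes with the receiver's. */
    static String decodeBody(String text, Charset sent, Charset assumed) {
        byte[] wire = text.getBytes(sent);
        return new String(wire, assumed);
    }

    /** Content-Type value for a form POST that declares its charset explicitly. */
    static String formContentType(Charset charset) {
        return "application/x-www-form-urlencoded; charset=" + charset.name();
    }

    public static void main(String[] args) {
        String query = "caf\u00e9"; // "cafe" with an accented e, written as an escape

        // Receiver assumes ISO-8859-1: the two UTF-8 bytes of the accented e
        // are decoded as two separate characters.
        String garbled = decodeBody(query, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println("garbled length: " + garbled.length());

        // Receiver uses the declared charset: the text survives the round trip.
        String intact = decodeBody(query, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println("round trip ok: " + intact.equals(query));

        System.out.println(formContentType(StandardCharsets.UTF_8));
    }
}
```

Whether a given container honours the declared charset for the request body depends on that
container and its configuration, so this only illustrates the client side of the fix.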

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

