lucene-dev mailing list archives

From "Lance Norskog (JIRA)" <>
Subject [jira] Commented: (SOLR-1959) SolrJ GET operation does not send correct encoding
Date Tue, 22 Jun 2010 05:25:55 GMT


Lance Norskog commented on SOLR-1959:

Demonstrating this bug is rather difficult with encoding-challenged text editors.
This test uses the Greek capital letter sigma, Unicode character U+03A3, defined here:

With the solr/example/exampledocs/ application, index this file:
  <field name="id">SP2514N</field>
  <field name="name">A greek letter: &#x03A3; should be a sigma</field>
Do a search with this command:
curl "http://localhost:8983/solr/select?q=%ce%a3&indent=on"
(Yes, it's CE A3 and not 03 A3: the URL carries the UTF-8 bytes of the character, not its code point.)
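The mapping from the code point U+03A3 to the bytes in the curl URL can be checked with the JDK's own encoder. This is only a small sketch; the class and method names (`SigmaEncodingDemo`, `encodeUtf8`) are made up for illustration and are not part of Solr or SolrJ.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class, for illustration only.
class SigmaEncodingDemo {
    // Percent-encode text as UTF-8, the charset used for the URL in the curl command.
    static String encodeUtf8(String text) {
        try {
            return URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String sigma = "\u03A3"; // Greek capital letter sigma
        System.out.println(encodeUtf8(sigma)); // prints %CE%A3
    }
}
```

The encoder emits upper-case hex digits; the lower-case `%ce%a3` in the curl command denotes the same two bytes.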
Without the patch, search with this text string via solrj:
{code:title=search code snippet|borderStyle=solid}
// Decode the percent-encoded UTF-8 bytes CE A3 into the single character U+03A3.
String queryString = URLDecoder.decode("%ce%a3", "UTF-8");
CommonsHttpSolrServer server = 
  new CommonsHttpSolrServer("http://localhost:8983/solr");
// Search for the decoded character.
SolrQuery query = new SolrQuery(queryString);
QueryResponse qr = server.query(query, SolrRequest.METHOD.GET);
{code}
This search will fail, because the HTTP server decodes the %xx characters via ISO-8859-1.
Now, change GET to POST. The code will work, because POST explicitly sets UTF-8.
The patch applies the same UTF-8 default to GET queries.
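The effect of decoding the %xx bytes with the wrong charset can be reproduced with the JDK alone, without a running Solr instance. The sketch below is an illustration, not the server's actual decoding path; `DecodeMismatchDemo` and `decode` are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper class, for illustration only.
class DecodeMismatchDemo {
    // Decode a percent-encoded string, interpreting the bytes in the given charset.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Same bytes (CE A3), two interpretations.
        String asUtf8 = decode("%ce%a3", "UTF-8");        // one character: U+03A3
        String asLatin1 = decode("%ce%a3", "ISO-8859-1"); // two characters: U+00CE U+00A3

        System.out.println(asUtf8.length());                 // prints 1
        System.out.println(asLatin1.length());               // prints 2
        System.out.println(asLatin1.equals("\u00CE\u00A3")); // prints true
    }
}
```

Under ISO-8859-1 the query term becomes two unrelated characters, so it cannot match the indexed sigma.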

As I said, seeing the right characters in all of the moving parts is tricky. Tracking all
of this is easier with a TCP/IP monitor; I used Apache's tcpmon.

> SolrJ GET operation does not send correct encoding
> --------------------------------------------------
>                 Key: SOLR-1959
>                 URL:
>             Project: Solr
>          Issue Type: Bug
>          Components: clients - java
>    Affects Versions: 1.4.1, Next
>            Reporter: Lance Norskog
>         Attachments: SOLR-1959.patch
> The SolrJ query operation fails to set the character encoding when doing a GET. It works
> when doing a POST.
> The problem is that URLs are urlencoded with UTF-8 but the Content-Type: header is not
> set. I tested it with "Content-Type:text/plain;charset=utf-8" and that worked. The
> Content-Type header encoding defaults to ISO 8859-1.
> The result is that SolrJ queries fail for any search with a character above 127. The
> workaround is to use a POST query instead of a GET. I have not searched for other places.
> So, change:
> {code}
> QueryResponse qr = server.query(query);
> {code}
> to:
> {code}
> QueryResponse qr = server.query(query, SolrRequest.METHOD.POST);
> {code}
> One quirk of this behavior is that url-bashing a query string with an ISO 8859-1 character
> (like an umlaut) works in a browser, but fails in a SolrJ request. It also searches correctly
> from the admin/index.jsp and admin/form.jsp pages, because they set the content-type in the
> FORM declaration.
