lucene-solr-dev mailing list archives

From "Bertrand Delacretaz (JIRA)" <>
Subject [jira] Created: (SOLR-38) PATCH: demonstrate correct handling of UTF-8 encoded input documents
Date Sat, 22 Jul 2006 07:42:13 GMT
PATCH: demonstrate correct handling of UTF-8 encoded input documents

                 Key: SOLR-38
             Project: Solr
          Issue Type: Improvement
          Components: update
            Reporter: Bertrand Delacretaz
            Priority: Minor

Here's a UTF-8 example with accented characters that can go in example/exampledocs to demonstrate
correct handling of accented characters.

After posting this to Solr, searching for "êâîôû" from http://localhost:8983/solr/admin/
correctly finds this document.

This needs a small patch to example/exampledocs/ (enclosed below) to specify the encoding
for the POST.

The XML pull parser seems to be able to handle the encoding declaration correctly, but if
the encoding is not specified in the POST, the servlet container might get in the way (Jetty
does with the current configuration).
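
To illustrate the kind of damage a missing charset can cause, here is a minimal sketch. It assumes the receiving side falls back to ISO-8859-1 when no charset is given; that fallback is an assumption about the container, not something verified against the Jetty configuration mentioned above. The function names are hypothetical, and the sketch relies on the iconv utility being installed.

```shell
#!/bin/sh
# UTF-8 bytes for "ê" (U+00EA), written as octal escapes so the script
# does not depend on the encoding of this file.
utf8_sample() {
  printf '\303\252'
}

# Simulate a receiver that ASSUMES ISO-8859-1: each incoming byte is
# treated as one Latin-1 character and then re-encoded as UTF-8.
misread_as_latin1() {
  utf8_sample | iconv -f ISO-8859-1 -t UTF-8
}

printf 'sent:    %s\n' "$(utf8_sample)"
printf 'misread: %s\n' "$(misread_as_latin1)"
```

Under that assumption the single accented character arrives as two unrelated characters, so a later search for the original text would not match.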

Index: example/exampledocs/
--- example/exampledocs/ (revision 424529)
+++ example/exampledocs/ (working copy)
@@ -4,7 +4,7 @@
 for f in $FILES; do
   echo Posting file $f to $URL
-  curl $URL --data-binary @$f
+  curl $URL --data-binary @$f -H 'Content-type:text/xml; charset=utf-8'
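
For readers who want to try the change outside the patch, the posting loop could be sketched as follows. This is only an illustration: the default update URL, the CHARSET and DRY_RUN variables, and the two function names are assumptions that do not come from the original script.

```shell
#!/bin/sh
# Sketch only. The default URL below is an assumption; adjust it to your setup.
URL=${URL:-http://localhost:8983/solr/update}
CHARSET=${CHARSET:-utf-8}

# Build the header value so the charset is stated in exactly one place.
content_type() {
  printf 'Content-type:text/xml; charset=%s' "$1"
}

# Post one file. With DRY_RUN=1 the curl command is printed, not executed.
post_file() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf 'curl %s --data-binary @%s -H %s\n' \
      "$URL" "$1" "'$(content_type "$CHARSET")'"
  else
    curl "$URL" --data-binary "@$1" -H "$(content_type "$CHARSET")"
  fi
}

# Post every file named on the command line.
for f in "$@"; do
  echo "Posting file $f to $URL"
  post_file "$f"
done
```

Running it with DRY_RUN=1 shows the exact command that would be sent, which makes it easy to confirm the charset is present in the header before posting anything.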
