lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-2381) The included jetty server does not support UTF-8
Date Wed, 09 Mar 2011 13:58:59 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004546#comment-13004546 ]

Uwe Schindler commented on SOLR-2381:
-------------------------------------

Hi Bernd,
we know where the problem in Jetty is: it buffers 512 chars without respecting surrogate pairs.
When it then converts those buffered chars to UTF-8, any supplementary character whose surrogate
pair straddles a buffer boundary is corrupted. This bug in Jetty may also affect JSON output,
but JSON is much more compact and is less likely to hit this buffer issue (the JSON writer does
not feed Strings to the Writer, and the broken method in Jetty is the one handling
Writer.write(String, ...)).
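To illustrate the failure mode, here is a minimal Java sketch. It does not reproduce Jetty's actual code; it assumes a writer that encodes each 512-char buffer to UTF-8 independently, and the names ChunkedUtf8Demo, encodeNaive and encodeSurrogateAware are hypothetical. It shows that a surrogate pair split across a buffer boundary is lost, and that holding back a trailing high surrogate until the next chunk avoids the problem.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class ChunkedUtf8Demo {
    // Assumed buffer size, mirroring the 512-char buffer described above.
    static final int CHUNK = 512;

    // Builds 511 ASCII chars followed by U+1F600, so the surrogate pair
    // straddles the first 512-char boundary.
    static String sample() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < CHUNK - 1; i++) {
            sb.append('a');
        }
        sb.appendCodePoint(0x1F600);
        return sb.toString();
    }

    // Buggy pattern (hypothetical): encode every chunk on its own. The high
    // surrogate at the end of one chunk and the low surrogate at the start of
    // the next are each unpaired, so String.getBytes() replaces each with '?'.
    static byte[] encodeNaive(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i += CHUNK) {
            byte[] b = s.substring(i, Math.min(s.length(), i + CHUNK))
                        .getBytes(StandardCharsets.UTF_8);
            out.write(b, 0, b.length);
        }
        return out.toByteArray();
    }

    // Fixed pattern: never end a chunk on a high surrogate; carry it over so
    // the pair is encoded together in the next chunk.
    static byte[] encodeSurrogateAware(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            int end = Math.min(s.length(), i + CHUNK);
            if (end < s.length() && end - i > 1
                    && Character.isHighSurrogate(s.charAt(end - 1))) {
                end--;
            }
            byte[] b = s.substring(i, end).getBytes(StandardCharsets.UTF_8);
            out.write(b, 0, b.length);
            i = end;
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String s = sample();
        String naive = new String(encodeNaive(s), StandardCharsets.UTF_8);
        String aware = new String(encodeSurrogateAware(s), StandardCharsets.UTF_8);
        System.out.println(naive.equals(s)); // false: the emoji became "??"
        System.out.println(aware.equals(s)); // true
    }
}
```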

In general, we are discussing not using the Readers and Writers supplied by the servlet container.
As HTTP is a byte-based protocol, code should only use InputStreams and OutputStreams to communicate
with the client. Writers and Readers are only provided for convenience with JSP engines.

The input side of Solr no longer uses Readers; it always passes InputStreams around.
I uploaded a patch a week ago that does the same on the output side of Solr: SOLR-ServletOutputWriter.patch
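A minimal sketch of the output-side idea, assuming a response writer that is handed the raw byte stream (in a servlet, response.getOutputStream()) and does its own UTF-8 encoding. This is not the code in SOLR-ServletOutputWriter.patch; StreamOutputDemo and writeResponse are hypothetical names. In the JDK implementation, OutputStreamWriter keeps a trailing high surrogate pending between write() calls, so a pair split across two writes is still encoded correctly.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class StreamOutputDemo {
    // Hypothetical response-writing method: it receives the raw byte stream
    // and wraps it in its own UTF-8 Writer instead of using the container's
    // getWriter().
    static void writeResponse(OutputStream out, String... parts) throws IOException {
        Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        for (String p : parts) {
            w.write(p);
        }
        // Push the encoded bytes to the stream; closing it is left to the caller.
        w.flush();
    }

    public static void main(String[] args) throws IOException {
        String emoji = new String(Character.toChars(0x1F600));
        // Split the surrogate pair across two write() calls on purpose.
        String first = "abc" + emoji.charAt(0);
        String second = emoji.charAt(1) + "def";

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        writeResponse(bytes, first, second);
        String decoded = new String(bytes.toByteArray(), StandardCharsets.UTF_8);
        System.out.println(decoded.equals("abc" + emoji + "def")); // true
    }
}
```

Note that writeResponse only flushes its Writer; closing the underlying stream stays with the caller (the servlet container in this setting).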

Please note: As JSP pages use Jetty's writers, analysis.jsp may still produce corrupt output.

Can you patch your Solr with that one? Then your problems should disappear for all OutputHandler-generated
content except JSP pages in Solr. We are thinking about optimizing this internally, but the above
patch removes all use of the container's Writer from Solr. As far as I know, the patch is against trunk.

> The included jetty server does not support UTF-8
> ------------------------------------------------
>
>                 Key: SOLR-2381
>                 URL: https://issues.apache.org/jira/browse/SOLR-2381
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>            Priority: Blocker
>             Fix For: 3.1, 4.0
>
>         Attachments: SOLR-2381.patch, SOLR-2381_xmltest.patch, SOLR-ServletOutputWriter.patch,
>                      jetty-6.1.26-patched-JETTY-1340.jar, jetty-util-6.1.26-patched-JETTY-1340.jar,
>                      post_utf8enhanced.sh, utf8enhanced.xml
>
>
> Some background here: http://www.lucidimagination.com/search/document/6babe83bd4a98b64/which_unicode_version_is_supported_with_lucene
> Some possible solutions:
> * wait and see if we get resolution on http://jira.codehaus.org/browse/JETTY-1340. To be honest,
>   I am not even sure where jetty is being maintained (there is a separate jetty project at
>   eclipse.org with another bugtracker, but the older releases are at codehaus).
> * include a patched version of jetty with correct utf-8, using that patch.
> * remove jetty and include a different container instead.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

