lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Updated: (SOLR-2381) The included jetty server does not support UTF-8
Date Tue, 08 Mar 2011 18:41:59 GMT

     [ https://issues.apache.org/jira/browse/SOLR-2381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated SOLR-2381:
------------------------------

    Attachment: SOLR-2381_xmltest.patch

Attached is a unit test. If you disable 'case 4' so that it only uses code points that encode
to 1, 2, or 3 bytes in UTF-8, the test always passes.
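
For readers unfamiliar with the byte-length distinction, here is a minimal sketch (not taken from the attached patch) of how the UTF-8 length of a code point relates to its representation in a Java string. The class name `Utf8Lengths` and the sample code points are illustrative assumptions; only the 4-byte case needs two Java chars (a surrogate pair).

```java
import java.nio.charset.StandardCharsets;

public class Utf8Lengths {
    // Number of bytes the JDK's UTF-8 encoder produces for one code point.
    public static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Hypothetical samples, one per UTF-8 length class (1 to 4 bytes).
        int[] samples = {0x41, 0xE9, 0x20AC, 0x1F600};
        for (int cp : samples) {
            System.out.printf("U+%04X: %d UTF-8 byte(s), %d Java char(s)%n",
                    cp, utf8Length(cp), Character.charCount(cp));
        }
    }
}
```

Code points above U+FFFF are the ones that take 4 bytes in UTF-8, and they are the only ones stored as two chars in a Java string, which is presumably why only that case exposes the problem.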

Additionally, it only fails with the XML response format (the default binary format is fine). The
test varies the response format from iteration to iteration.

{noformat}
junit-sequential:
    [junit] Testsuite: org.apache.solr.client.solrj.embedded.SolrExampleJettyTest
    [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 3.829 sec
    [junit]
    [junit] ------------- Standard Error -----------------
    [junit] NOTE: reproduce with: ant test -Dtestcase=SolrExampleJettyTest -Dtestmethod=testUnicode -Dtests.seed=-8507816048970822444:1424998400651628841
    [junit] WARNING: test class left thread running: Thread[MultiThreadedHttpConnectionManager cleanup,5,main]
    [junit] RESOURCE LEAK: test class left 1 thread(s) running
    [junit] NOTE: test params are: codec=PreFlex, locale=es_GT, timezone=Asia/Hovd
    [junit] NOTE: all tests run in this JVM:
    [junit] [SolrExampleJettyTest]
    [junit] NOTE: Windows Vista 6.0 x86/Sun Microsystems Inc. 1.6.0_23 (32-bit)/cpus=4,threads=2,free=9760576,total=16252928
    [junit] ------------- ---------------- ---------------
    [junit] Testcase: testUnicode(org.apache.solr.client.solrj.embedded.SolrExampleJettyTest): Caused an ERROR
    [junit] Error executing query
    [junit] org.apache.solr.client.solrj.SolrServerException: Error executing query
    [junit]     at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
    [junit]     at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:119)
    [junit]     at org.apache.solr.client.solrj.SolrExampleTests.testUnicode(SolrExampleTests.java:290)
    [junit]     at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1213)
    [junit]     at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1145)
    [junit] Caused by: org.apache.solr.common.SolrException: parsing error
    [junit]     at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:145)
    [junit]     at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:106)
    [junit]     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:478)
    [junit]     at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:245)
    [junit]     at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
    [junit] Caused by: com.ctc.wstx.exc.WstxIOException: Invalid UTF-8 character 0xdf05(a surrogate character)  at char #2475, byte #127)
    [junit]     at com.ctc.wstx.sr.StreamScanner.throwFromIOE(StreamScanner.java:708)
    [junit]     at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1086)
    [junit]     at org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:218)
    [junit]     at org.apache.solr.client.solrj.impl.XMLResponseParser.readNamedList(XMLResponseParser.java:244)
    [junit]     at org.apache.solr.client.solrj.impl.XMLResponseParser.processResponse(XMLResponseParser.java:130)
    [junit] Caused by: java.io.CharConversionException: Invalid UTF-8 character 0xdf05(a surrogate character)  at char #2475, byte #127)
    [junit]     at com.ctc.wstx.io.UTF8Reader.reportInvalid(UTF8Reader.java:335)
    [junit]     at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:247)
    [junit]     at com.ctc.wstx.io.MergedReader.read(MergedReader.java:101)
    [junit]     at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:84)
    [junit]     at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:57)
    [junit]     at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:992)
    [junit]     at com.ctc.wstx.sr.StreamScanner.getNext(StreamScanner.java:763)
    [junit]     at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2721)
    [junit]     at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1019)
    [junit]
    [junit]
{noformat}
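
The trace does not say how the invalid bytes were produced. One plausible cause, assumed here purely for illustration, is an encoder that writes each UTF-16 char on its own, so a surrogate pair becomes two 3-byte sequences instead of one 4-byte sequence. The sketch below shows that a strict UTF-8 decoder rejects such output. The class and method names are hypothetical, and U+10305 is chosen only because its low surrogate happens to be 0xDF05; the code point the test actually generated is not known.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class SurrogateUtf8Demo {
    // Deliberately WRONG encoder (an assumption about the failure mode):
    // it encodes every UTF-16 char independently and ignores surrogate pairs.
    public static byte[] encodePerChar(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 0x80) {
                out.write(c);
            } else if (c < 0x800) {
                out.write(0xC0 | (c >> 6));
                out.write(0x80 | (c & 0x3F));
            } else {
                // Surrogates (0xD800-0xDFFF) also land here, which is the bug.
                out.write(0xE0 | (c >> 12));
                out.write(0x80 | ((c >> 6) & 0x3F));
                out.write(0x80 | (c & 0x3F));
            }
        }
        return out.toByteArray();
    }

    // True if the bytes decode as UTF-8 with malformed input reported, not replaced.
    public static boolean isStrictUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String s = new String(Character.toChars(0x10305)); // chars: 0xD800, 0xDF05
        byte[] correct = s.getBytes(StandardCharsets.UTF_8);
        byte[] broken = encodePerChar(s);
        System.out.println("correct: " + correct.length + " bytes, valid=" + isStrictUtf8(correct));
        System.out.println("broken:  " + broken.length + " bytes, valid=" + isStrictUtf8(broken));
    }
}
```

Under this assumption the binary response format would be unaffected, since it would not pass through the same character-to-byte conversion, which matches the observation that only the XML format fails.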

> The included jetty server does not support UTF-8
> ------------------------------------------------
>
>                 Key: SOLR-2381
>                 URL: https://issues.apache.org/jira/browse/SOLR-2381
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>            Priority: Blocker
>             Fix For: 3.1, 4.0
>
>         Attachments: SOLR-2381.patch, SOLR-2381_xmltest.patch, SOLR-ServletOutputWriter.patch, jetty-6.1.26-patched-JETTY-1340.jar, jetty-util-6.1.26-patched-JETTY-1340.jar
>
>
> Some background here: http://www.lucidimagination.com/search/document/6babe83bd4a98b64/which_unicode_version_is_supported_with_lucene
> Some possible solutions:
> * wait and see if we get resolution on http://jira.codehaus.org/browse/JETTY-1340. To be honest, I am not even sure where jetty is being maintained (there is a separate jetty project at eclipse.org with another bugtracker, but the older releases are at codehaus).
> * include a patched version of jetty with correct utf-8, using that patch.
> * remove jetty and include a different container instead.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

