lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] Created: (SOLR-2347) Use InputStream and not Reader for XML parsing
Date Thu, 03 Feb 2011 15:32:28 GMT
Use InputStream and not Reader for XML parsing
----------------------------------------------

                 Key: SOLR-2347
                 URL: https://issues.apache.org/jira/browse/SOLR-2347
             Project: Solr
          Issue Type: Bug
            Reporter: Uwe Schindler
            Assignee: Uwe Schindler


Followup to SOLR-96:

Solr mostly uses java.io.Reader and passes this Reader to the XML parser. According to the XML
spec, an XML file should initially be treated as a binary stream with a default charset of UTF-8
or another charset given by the network protocol (like the Content-Type header in HTTP). But, very
importantly, this default charset is only a "hint" to the parser - the mandatory charset is the one
given in the XML header (the <?xml ... encoding="..."?> declaration). Because of this, the parser
must be able to change the charset when reading the XML header (possibly also when seeing BOM
markers). This is not possible if the XML parser gets a java.io.Reader instead of a
java.io.InputStream. SOLR-96 already fixed this for the XmlUpdateRequestHandler and the
DocumentAnalysisRequestHandler. This issue should fix the rest to conform to the XML spec (open
schema.xml and config.xml as InputStream rather than Reader, and likewise elsewhere).
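
To illustrate the difference, here is a minimal sketch using only the JDK's built-in JAXP parser. It is not Solr code: the class name XmlEncodingSketch, its helper methods, and the sample document are hypothetical and exist only for this example. The same ISO-8859-1 bytes are parsed once through an InputStream, where the parser can read the encoding from the XML header, and once through a Reader that has already been forced to UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of Solr.
final class XmlEncodingSketch {

    // A small document encoded as ISO-8859-1 that says so in its XML header.
    static byte[] sampleBytes() {
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><doc>caf\u00e9</doc>";
        return xml.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Shared helper: parse the source and return the root element's text.
    private static String rootText(InputSource source) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(source)
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    // Byte stream: the parser sees the raw bytes and picks the charset itself.
    static String parseFromStream(byte[] bytes) {
        return rootText(new InputSource(new ByteArrayInputStream(bytes)));
    }

    // Character stream: the bytes were already decoded as UTF-8 by the caller,
    // so the encoding named in the XML header can no longer take effect.
    static String parseFromReader(byte[] bytes) {
        return rootText(new InputSource(
                new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) {
        byte[] bytes = sampleBytes();
        System.out.println("via InputStream: " + parseFromStream(bytes));
        System.out.println("via Reader:      " + parseFromReader(bytes));
    }
}
```

With the InputStream variant the accented character should come through intact, while the Reader variant has already mis-decoded it before the parser sees the header.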

This change would not break anything in Solr (perhaps only backwards compatibility in the
API), as the default used by XML parsers is UTF-8.
