hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7309) XMLUtils.mangleXmlString doesn't seem to handle less than sign
Date Thu, 30 Oct 2014 20:29:35 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190773#comment-14190773
] 

Colin Patrick McCabe commented on HDFS-7309:
--------------------------------------------

So, the intention when writing {{XMLUtils#mangleXmlString}} was that it would handle the things
that the normal XML parser doesn't.  Basically, XML says that there's just no way to have certain
code points in your document, so it fails to provide a standard way to escape them.  One example
is the first few code points: code point 0, 1, 2, etc.  There IS a standard way to escape
things like <, >, &, etc., so we didn't handle those.  {{org.xml.sax.XMLReader}} already
takes care of the escaping for those characters.
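
To make the "no way to have certain code points" part concrete, here is a minimal Java sketch of the
XML 1.0 {{Char}} production.  The class and method names are hypothetical and are not part of
{{XMLUtils}}; it only illustrates which code points have no standard representation.

```java
/** Hypothetical helper, not part of Hadoop: checks code points against the XML 1.0 Char production. */
final class XmlCharSketch {
    /** Returns true if the code point may appear in an XML 1.0 document. */
    static boolean isValidXml10CodePoint(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Returns true if every code point in s is legal in XML 1.0. */
    static boolean isRepresentable(String s) {
        return s.codePoints().allMatch(XmlCharSketch::isValidXml10CodePoint);
    }
}
```

Note that '<' passes this check: it is a legal code point that merely needs entity escaping, which is
a different problem from code points such as 0 or 1 that cannot appear at all.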

Since you're not using XMLParser, you don't get the benefit of this "built-in" escaping.
You could get it manually with something like this (the snippet is C#/.NET, so it would need
porting to Java):

{code}
public static string XmlUnescape(string escaped) {
    XmlDocument d = new XmlDocument();
    var node = d.CreateElement("root");
    node.InnerXml = escaped;
    return node.InnerText;
}

public static string XmlEscape(string unescaped){
    XmlDocument d = new XmlDocument();
    var node = d.CreateElement("root");
    node.InnerText = unescaped;
    return node.InnerXml;
}
{code}

Or we could add this functionality to {{XMLUtils#mangleXmlString}}.  But we'd have to handle all
the XML code points that need escaping (I think <, >, &, and maybe some of the quote
signs).  Also, it would need to be optional, to avoid double-escaping for callers who are using
{{org.xml.sax.XMLReader}}.
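
A rough Java sketch of what such optional escaping might look like follows.  The class name, method
name, and boolean flag are assumptions made for illustration; this is not the actual {{XMLUtils}}
code or the attached patch.  It covers only the five predefined XML entities.

```java
/** Hypothetical sketch, not the real XMLUtils: optional escaping of XML markup characters. */
final class XmlEscapeSketch {
    /**
     * Escapes <, >, &, and both quote characters when escapeMarkup is true.
     * Callers whose XML library already escapes these should pass false
     * to avoid double-escaping.
     */
    static String escapeXml(String s, boolean escapeMarkup) {
        if (!escapeMarkup) {
            return s;
        }
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

With the flag set, the string from the issue description would come out as
{{Containing&lt;ALessThanSign}}, which can be placed between element tags by hand.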

> XMLUtils.mangleXmlString doesn't seem to handle less than sign
> --------------------------------------------------------------
>
>                 Key: HDFS-7309
>                 URL: https://issues.apache.org/jira/browse/HDFS-7309
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.1.0-beta
>            Reporter: Ravi Prakash
>            Priority: Minor
>         Attachments: HDFS-7309.patch
>
>
> My expectation was that "<someElement>" + XMLUtils.mangleXmlString(
>       "Containing<ALessThanSign") + "</someElement>" would be a string acceptable
> to a SAX parser. However this was not true.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
