xml-general mailing list archives

From "Sean Kelly" <ke...@ad1440.net>
Subject Binary data in XML documents
Date Fri, 06 Oct 2000 06:14:34 GMT

I'm working on a project that returns an XML document as the result of a
query.  Sometimes, we want to return a small binary object, like an image or
a sound bite, in the document.  As an example:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE result ...>
<result>
  <name>John Q. Public</name>
  <phone>+1 234 567 8900</phone>
  <portrait>...</portrait>
</result>

where the <portrait> tag in this case would contain some binary data.
Normally, we'd just put the URL to a portrait in there, but often the binary
data lives somewhere that's not HTTP accessible.

I'm curious to know how other people have encoded binary data in XML
documents.  Anyone?

One thought is that we'd base-64 encode it to make sure it consists of just
printable ASCII characters.  But then I remembered that we're using UTF-8
...why not use that richer character set to get a more compact encoding?
And then I remembered that XML can use the entity mechanism to encode all
sorts of uncommon characters.
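
To make the base-64 idea concrete, here is a minimal sketch. It is written in Python with the standard library rather than Java with Xerces, and the helper names embed_binary and extract_binary are invented for illustration. It stores the bytes as base-64 text inside <portrait> and decodes them again after parsing.

```python
import base64
import xml.etree.ElementTree as ET


def embed_binary(name: str, phone: str, portrait: bytes) -> bytes:
    """Build a <result> document with the portrait stored as base-64 text."""
    result = ET.Element("result")
    ET.SubElement(result, "name").text = name
    ET.SubElement(result, "phone").text = phone
    # base-64 output uses only A-Z, a-z, 0-9, '+', '/' and '=',
    # all of which are legal XML characters that need no escaping.
    encoded = base64.b64encode(portrait).decode("ascii")
    ET.SubElement(result, "portrait").text = encoded
    return ET.tostring(result, encoding="utf-8")


def extract_binary(document: bytes) -> bytes:
    """Parse the document and decode the portrait back to raw bytes."""
    root = ET.fromstring(document)
    return base64.b64decode(root.findtext("portrait"))


# Stand-in payload covering every byte value, including 0x00.
payload = bytes(range(256))
doc = embed_binary("John Q. Public", "+1 234 567 8900", payload)
assert extract_binary(doc) == payload
```

The cost is size: base-64 emits four characters for every three input bytes, so the encoded portrait is about a third larger than the raw data.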

So, just for fun, I used the Xerces 1.2.0 serializer to output a document
containing a JPG image as the text child of its only element (yeah, call me
insane).  I got something like this out of the serializer:


But if I try to parse this document, I get a fatal SAX exception:

Character reference "&#0;" is an invalid XML character.

How can it be invalid if Xerces's own serializer generated it?
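
One way to see where that error comes from: XML 1.0 restricts document text to the characters matching its Char production, and a character reference such as &#0; has to name one of those characters too. Escaping therefore does not make a NUL byte legal. The sketch below is in Python and uses the standard-library (expat-based) parser, not Xerces, so it only illustrates the rule. The helper names is_xml10_char and parses are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the Char production of XML 1.0."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def parses(document: str) -> bool:
    """True if the standard-library parser accepts the document."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Byte values that cannot appear in XML 1.0 text, even as character references.
illegal = [b for b in range(256) if not is_xml10_char(b)]
print(len(illegal))  # 29: 0x00-0x1F except tab, line feed and carriage return

print(parses("<portrait>&#0;</portrait>"))  # False: NUL is not an XML character
print(parses("<portrait>&#9;</portrait>"))  # True: tab is allowed
```

Since 29 of the 256 possible byte values are ruled out this way, raw binary data such as a JPG cannot be written as element text one byte per character, with or without character references.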

