uima-user mailing list archives

From Marshall Schor <...@schor.com>
Subject Re: XmiCasDeserializer.deserialize with InputSource rather than InputStream
Date Sun, 22 Aug 2010 20:48:39 GMT
I'm not an expert here, but I found by googling that at least one person thinks
it's bad practice to read XML into char arrays and then pass those to an
XML parser.

The web page http://www.odi.ch/prog/design/newbies.php#7 says:

It is a very bad idea to read an XML file and store it in a String. An XML
specifies its encoding in the XML header. But when reading a file you have to
know the encoding beforehand! Also storing an XML file in a String wastes
memory. All XML parsers accept an InputStream as a parsing source and they
figure out the encoding themselves correctly. So you can feed them an
InputStream instead of storing the whole file in memory temporarily. The byte
order (big-endian, little-endian) is another trap when a multi-byte encoding
(such as UTF-8) is used. XML files may carry a byte order mark at the beginning
that specifies the byte order. XML parsers handle them correctly.
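
To illustrate the point in the quoted passage, here is a minimal sketch using only the JDK's built-in parser. The class name EncodingDemo and the helper rootText are made up for this example; nothing here is UIMA-specific. The document is encoded as ISO-8859-1 bytes and handed to the parser as an InputStream, so the parser picks the encoding from the XML declaration and the caller never has to name it.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class EncodingDemo {
    // Parses XML from a byte stream and returns the text of the root element.
    // No charset is passed in: the parser reads it from the XML declaration.
    static String rootText(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(in);
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Illustrative document; the declaration names a non-UTF-8 encoding.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>caf\u00e9</a>";
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(rootText(new ByteArrayInputStream(bytes)));
    }
}
```

Had the same bytes been decoded into a String with a guessed charset first, the accented character could have been corrupted before the parser ever saw it.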


On 8/22/2010 8:52 AM, John Wiesel wrote:
> Dear all,
> I am currently stalled in my project by XmiCasDeserializer.deserialize: I
> am wondering why there is no method that sets up the XML parser
> directly with an InputSource instead of an InputStream. I would like to load
> my CAS from an XMI file that I have cached in a CharArray. As I cannot
> generate an InputStream from a String (StringBufferInputStream is
> deprecated since JDK 1.1) but should be able to do so using an InputSource
> w/o much trouble, I hope there is a sensible solution for this that I just
> haven't thought of yet.
> Any suggestions?
> Thanks folks.
> John
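
For reference, a rough sketch of two JDK-only ways to hand XML that is already cached in a char array to a SAX parser. The class CharArrayXml and its methods are hypothetical names for this example, and the element-counting DefaultHandler is only a stand-in for whatever handler would actually build the CAS; how to obtain such a handler from UIMA is not covered in this thread. Option 1 yields an InputStream, which is what XmiCasDeserializer.deserialize is said above to accept; option 2 yields the InputSource that John asks about.

```java
import java.io.ByteArrayInputStream;
import java.io.CharArrayReader;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CharArrayXml {
    // Option 1: re-encode the cached chars as UTF-8 bytes and expose them as a stream.
    // Assumption: the XML declaration names UTF-8 or names no encoding at all;
    // otherwise the bytes would contradict the declaration.
    static InputStream asStream(char[] cached) {
        return new ByteArrayInputStream(new String(cached).getBytes(StandardCharsets.UTF_8));
    }

    // Option 2: wrap the chars in a Reader-backed InputSource. With a character
    // stream the parser reads the chars directly and disregards the declared encoding.
    static InputSource asSource(char[] cached) {
        return new InputSource(new CharArrayReader(cached));
    }

    // Stand-in for real processing: counts start tags seen by a SAX handler.
    static int countElements(InputSource source) throws Exception {
        final int[] count = {0};
        SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                count[0]++;
            }
        });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        char[] cached = "<a><b/><b/></a>".toCharArray();
        System.out.println(countElements(asSource(cached)));
        System.out.println(countElements(new InputSource(asStream(cached))));
    }
}
```

Both routes keep the whole document in memory, so the memory concern raised in the reply still applies; they only avoid the deprecated StringBufferInputStream.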
