uima-user mailing list archives

From "Greg Holmberg" <holmberg2...@comcast.net>
Subject Re: XmiCasDeserializer.deserialize with InputSource rather than InputStream
Date Sun, 22 Aug 2010 23:57:54 GMT
Or, if you feel you have to store the file in memory before parsing, then
at least store it in a ByteArray and not a CharArray.  Then you can feed
the parser with an InputStream on the ByteArray, and avoid the encoding
and byte-order problems that Marshall describes.
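Here is a minimal sketch of Greg's suggestion. It uses only the JDK's built-in SAX parser (JAXP), so it runs without UIMA on the classpath. The names ByteCacheDemo, TextCollector and parseCachedBytes are hypothetical. The XML is cached as raw bytes and handed to the parser as a ByteArrayInputStream, so the parser works out the encoding on its own. One sample is UTF-16 with a byte order mark. The other is ISO-8859-1, identified only by its declaration. With UIMA, the same stream would go to XmiCasDeserializer.deserialize(InputStream, CAS) instead of SAXParser.parse.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class ByteCacheDemo {

    /** Collects all character data the parser reports. */
    static class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Parses XML cached as raw bytes; the parser detects the BOM or reads the encoding declaration. */
    static TextCollector parseCachedBytes(byte[] cached) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        TextCollector handler = new TextCollector();
        try (InputStream in = new ByteArrayInputStream(cached)) {
            // With UIMA, this stream would be passed to XmiCasDeserializer.deserialize(in, cas).
            parser.parse(in, handler);
        }
        return handler;
    }

    public static void main(String[] args) throws Exception {
        // Java's "UTF-16" encoder writes a big-endian byte order mark before the data.
        byte[] utf16 = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><doc>caf\u00e9</doc>"
                .getBytes(StandardCharsets.UTF_16);
        // No BOM here: the parser has to rely on the encoding declaration.
        byte[] latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><doc>caf\u00e9</doc>"
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(parseCachedBytes(utf16).text);
        System.out.println(parseCachedBytes(latin1).text);
    }
}
```

The caller never names a charset. If the bytes were instead decoded into a String or char[] first, the caller would have to know the encoding beforehand.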

On Sun, 22 Aug 2010 13:48:39 -0700, Marshall Schor <msa@schor.com> wrote:

> I'm not an expert here, but I found by googling that at least one person thinks
> it's a bad practice to read things into char arrays, and then send those to an
> XML parser.
>
> The web page http://www.odi.ch/prog/design/newbies.php#7 says:
>
> It is a very bad idea to read an XML file and store it in a String. An XML file
> specifies its encoding in the XML header. But when reading a file you have to
> know the encoding beforehand! Also, storing an XML file in a String wastes
> memory. All XML parsers accept an InputStream as a parsing source, and they
> figure out the encoding themselves correctly. So you can feed them an
> InputStream instead of storing the whole file in memory temporarily. The byte
> order (big-endian, little-endian) is another trap when a multi-byte encoding
> (such as UTF-16) is used. XML files may carry a byte order mark at the
> beginning that specifies the byte order. XML parsers handle them correctly.
>
> -Marshall
>
> On 8/22/2010 8:52 AM, John Wiesel wrote:
>> Dear all,
>>
>> I am currently stalled in my project by XmiCasDeserializer.deserialize: I
>> am wondering why there is no method that allows one to set up the XML parser
>> directly with an InputSource instead of an InputStream. I would like to load
>> my CAS from an XMI file that I have cached in a CharArray. As I cannot
>> generate an InputStream from a String (StringBufferInputStream has been
>> deprecated since JDK 1.1) but should be able to do so using an InputSource
>> without much trouble, I hope there is a sensible solution for this that I
>> just haven't thought of yet.
>>
>> Any suggestions?
>> Thanks folks.
>>
>> John
>>
>>
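John's original question was about XML already held in a char[]. SAX itself accepts an org.xml.sax.InputSource that wraps a CharArrayReader, and the sketch below shows this with a stand-in DefaultHandler that records element names. CharArraySourceDemo, NameRecorder and parseCharArray are hypothetical names. We believe UIMA's XmiCasDeserializer also has an instance method, getXmiCasHandler(CAS), that returns the SAX handler it uses internally and could replace the stand-in. Check the Javadoc for your UIMA version before relying on it. A Reader delivers characters that are already decoded, so the parser ignores the encoding declaration. This approach is only safe if the char[] was decoded with the correct charset in the first place.

```java
import java.io.CharArrayReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class CharArraySourceDemo {

    /** Stand-in for a CAS deserialization handler: records local element names in order. */
    static class NameRecorder extends DefaultHandler {
        final StringBuilder names = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (names.length() > 0) {
                names.append(',');
            }
            names.append(localName);
        }
    }

    /** Parses XML that has already been decoded into a char[] by wrapping it in an InputSource. */
    static String parseCharArray(char[] cachedXml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // XMI depends on namespaces; JAXP factories are not namespace-aware by default.
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();

        NameRecorder handler = new NameRecorder();
        // Hypothetical UIMA variant (verify against your version's Javadoc):
        //   reader.setContentHandler(new XmiCasDeserializer(cas.getTypeSystem()).getXmiCasHandler(cas));
        reader.setContentHandler(handler);

        // A Reader supplies decoded characters, so the encoding declaration is ignored.
        reader.parse(new InputSource(new CharArrayReader(cachedXml)));
        return handler.names.toString();
    }

    public static void main(String[] args) throws Exception {
        char[] cached = ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<xmi:XMI xmlns:xmi=\"http://www.omg.org/XMI\"><a/><b/></xmi:XMI>").toCharArray();
        System.out.println(parseCharArray(cached)); // prints XMI,a,b
    }
}
```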


