From: Marshall Schor
Date: Sun, 22 Aug 2010 16:48:39 -0400
To: user@uima.apache.org
Subject: Re: XmiCasDeserializer.deserialize with InputSource rather than InputStream
Message-ID: <4C718D27.9050505@schor.com>

I'm not an expert here, but I found by googling that at least one person
thinks it's bad practice to read things into char arrays and then send
those to an XML parser.

The web page http://www.odi.ch/prog/design/newbies.php#7 says:

    It is a very bad idea to read an XML file and store it in a String. An
    XML specifies its encoding in the XML header. But when reading a file
    you have to know the encoding beforehand! Also storing an XML file in a
    String wastes memory. All XML parsers accept an InputStream as a parsing
    source and they figure out the encoding themselves correctly. So you can
    feed them an InputStream instead of storing the whole file in memory
    temporarily.

    The byte order (big-endian, little-endian) is another trap when a
    multi-byte encoding (such as UTF-8) is used. XML files may carry a byte
    order mark at the beginning that specifies the byte order. XML parsers
    handle them correctly.

-Marshall

On 8/22/2010 8:52 AM, John Wiesel wrote:
> Dear all,
>
> I am currently stalled in my project by XmiCasDeserializer.deserialize: I
> am wondering why there is no method that allows to directly set up the XML
> parser with an InputSource instead of an InputStream. I would like to load
> my CAS from an XMI file that I have cached in a CharArray. As I cannot
> generate an InputStream from a String (StringBufferInputStream is
> deprecated since JDK 1.1) but should be able to do so using an InputSource
> w/o much trouble, I hope there is a sensible solution for this that I just
> haven't thought of yet.
>
> Any suggestions?
> Thanks folks.
>
> John
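
For illustration, here is a minimal sketch of the two routes discussed in this thread. It uses only the JDK's built-in JAXP parser, not UIMA. The class and method names (CachedXmlParsing, rootTextFromStream, rootTextFromChars) are made up for this example. The first route hands the parser raw bytes so it can honor the encoding named in the XML declaration. The second hands it already-decoded characters through an InputSource wrapping a Reader, in which case a SAX parser disregards the encoding declaration and reads the characters as they are. If cached characters are turned back into bytes, the charset used for encoding has to match the one in the XML declaration, otherwise non-ASCII text will be misread.

```java
import java.io.ByteArrayInputStream;
import java.io.CharArrayReader;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, for illustration only.
public class CachedXmlParsing {

    // Route 1: give the parser bytes. It detects the encoding itself
    // from the byte order mark and/or the XML declaration.
    public static String rootTextFromStream(InputStream in) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(in);
        return doc.getDocumentElement().getTextContent();
    }

    // Route 2: give the parser characters that were already decoded.
    // With a character stream, the encoding declaration is not used.
    public static String rootTextFromChars(char[] cached) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new CharArrayReader(cached)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Sample document that declares ISO-8859-1 and contains a non-ASCII character.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><doc>caf\u00e9</doc>";

        // Bytes encoded with the same charset the declaration names.
        byte[] matching = xml.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(rootTextFromStream(new ByteArrayInputStream(matching)));

        // Characters straight from the cache, no re-encoding step.
        System.out.println(rootTextFromChars(xml.toCharArray()));

        // Pitfall: bytes encoded as UTF-8 while the declaration says ISO-8859-1.
        // The parser trusts the declaration, so the accented character is misread.
        byte[] mismatched = xml.getBytes(StandardCharsets.UTF_8);
        System.out.println(rootTextFromStream(new ByteArrayInputStream(mismatched)));
    }
}
```

With UIMA, a ByteArrayInputStream built as in the first route could be passed to XmiCasDeserializer.deserialize wherever it accepts an InputStream. That step is left out here because it needs a CAS and the UIMA libraries, and it has not been tested as part of this sketch.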