Date: Wed, 5 May 2010 18:52:12 +0100 (BST)
From: Nick Burch
To: POI Developers List
Subject: Re: DO NOT REPLY [Bug 49020] "org.xml.sax.SAXParseException: does not close tag ." when opening some Excel 2007 files

On Wed, 31 Mar 2010, Paul Spencer wrote:
>> For the long term, you should report a bug to Microsoft about this.
>> They either need to sanitise the user input and sort out the tags (eg
>> becomes ), or they need to give up and escape the whole tag
>> contents for the bits where iffy data could get added (eg put this
>> textbox within a CDATA section)
>
> I will report the bug to Microsoft, but that does not address existing
> files.

Any luck getting them to agree with the fault?

>> Medium term, we should get a list of the problem bits that Excel does wrong,
>> such as (but perhaps others). Then, we need to write an XML Input Wrapper
>> that cleans these up before they get passed to the XML Processor for loading.
>> Something like this is quite nasty, though it's possible some other project out
>> there has already done it, and we can just re-use what they do.
>
> I like this as a solution.

Having just written code for this workaround, I really don't... It's amazingly sick code! Seems to mostly work though, certainly for your test file.

Nick
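
For anyone following along, here is a rough sketch of the kind of input wrapper being discussed. It is not the code that went into POI. It assumes the problem markup is an unclosed <br> tag (an assumption on my part), and the class and method names (UnclosedBrFixer, fix, wrap) are made up for illustration. For simplicity it buffers the whole part in memory instead of fixing bytes as they stream past, which a real wrapper would probably want to do.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

// Hypothetical helper, not part of POI: rewrites unclosed <br> tags
// as self-closing ones before the data reaches the XML parser.
final class UnclosedBrFixer {
    // Matches <br> in any letter case. Already well-formed <br/> and
    // <br /> do not match, so they are left untouched.
    private static final Pattern UNCLOSED_BR =
            Pattern.compile("<(br)>", Pattern.CASE_INSENSITIVE);

    private UnclosedBrFixer() {}

    // Core text fix: <br> becomes <br/>, keeping the original case.
    static String fix(String xml) {
        return UNCLOSED_BR.matcher(xml).replaceAll("<$1/>");
    }

    // Reads the whole stream, fixes it, and returns a new stream
    // over the cleaned-up bytes.
    static InputStream wrap(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buf.write(chunk, 0, n);
        }
        // ISO-8859-1 maps every byte to exactly one char and back, so
        // bytes outside the ASCII tag we rewrite pass through unchanged
        // (this relies on the input being ASCII-compatible, e.g. UTF-8).
        String text = new String(buf.toByteArray(), StandardCharsets.ISO_8859_1);
        return new ByteArrayInputStream(
                fix(text).getBytes(StandardCharsets.ISO_8859_1));
    }
}
```

The idea would be to hand UnclosedBrFixer.wrap(partStream) to the XML parser in place of the raw part stream, where partStream stands for whatever stream the affected part is read from. A blind textual replace like this would also rewrite a literal <br> inside a CDATA section or comment, which is part of why this sort of fix is unpleasant.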