From: Stefan Stern
Organization: Mind8 GmbH & Co. KG
Date: Tue, 24 May 2011 08:11:48 +0200
To: POI Developers List
Subject: Re: Some fun and troubles with ArrayIndexOutOfBoundsException
Message-ID: <4DDB4C24.1010008@mind8.com>

Hi

> When you are not in perfect word et you need to
> works with 3rd users supplied data you have a risk to "meet" not
> correct data and produce a lot of errors. To avoid a such of situation
> you can use try / catch blocks.

Hm, there are always several approaches, at least two. One says that there is a specification for the file format, so you expect your data to conform to that specification. The other tries to cope with every possible misconfigured or non-conforming input.

Personally, I prefer the first approach as a start. Supporting all "looks to some extent like a Word document" data is a lot of effort. It can be done as POI evolves, but while there are Word features that still lack proper support in the POI API, I don't see the point in concentrating development effort on handling non-conforming data.

In my opinion, the POI API should be able to read all Word-created files and to write Word documents that MS Word opens without complaining that they are corrupt. I do not see POI as a repair tool meant to patch up corrupt files, nor as a rescue tool expected to extract as much information as possible from a corrupt DOC(X) file - and that is what I understood you to be talking about.

Of course, provoking RuntimeExceptions isn't very good style and should not happen.
On the other hand, when POI classes encounter non-standard data, the code has to decide what to do next. Sometimes there are ways to work around a data flaw. More often, though, a method will simply abort by throwing a POIXMLException and stop processing, since there is no way to make sure that further processing won't fail again and again. Conceivable data faults start with corrupted ZIPs, continue with malformed XML and end with wrong references inside the document itself.

Of course we are aware of try-catch blocks. But what would you propose to do inside the catch block? With your task in mind - get as much textual data from the file as possible - the implementation can just "grit its teeth" and pretend nothing happened. But if you then try to process or modify the corrupt data further, the result will get worse and worse.

Could you provide some sample data that causes the trouble you reported?

Kind regards,
Stefan Stern