Reply-To: "POI Developers List" <dev@poi.apache.org>
Message-ID: <4DDB4C24.1010008@mind8.com>
Date: Tue, 24 May 2011 08:11:48 +0200
From: Stefan Stern <stefan.stern@mind8.com>
Organization: Mind8 GmbH & Co. KG
To: POI Developers List <dev@poi.apache.org>
Subject: Re: Some fun and troubles with ArrayIndexOutOfBoundsException
References: <BANLkTi=L19PDYAANSYPLpRN-uz5FP0fs=w@mail.gmail.com>
In-Reply-To: <BANLkTi=L19PDYAANSYPLpRN-uz5FP0fs=w@mail.gmail.com>

Hi

> When you are not in a perfect world and you need to
> work with data supplied by third-party users, you risk "meeting"
> incorrect data and producing a lot of errors. To avoid such a situation
> you can use try / catch blocks.

Hm, there are always several, at least two, approaches. One side says
there is some sort of specification for the file format, so expect your
data to conform to that specification. The other dogma tries to cope
with all possible misconfigured or non-conforming data.

Personally, I prefer to support the first approach as a start.
Supporting all data that "looks to some extent like a Word document" is a
lot of effort. This can be done as POI evolves, but as long as there are
Word features that lack proper support in the POI API, I don't see the
point in concentrating development effort on handling non-conforming
data. In my opinion, the POI API should be able to read all files created
by Word and write Word documents that can be opened in MS Word without
Word complaining that they are corrupt. I do not see POI as a repair
tool, supposed to patch up corrupt files, nor as a rescue tool, expected
to extract as much information as possible from a corrupt DOC(X) file,
which is what I understood you to be talking about.

Of course, provoking RuntimeExceptions isn't very good style and should
not happen. On the other hand, if POI classes encounter non-standard
data, the code must decide what to do next. Sometimes there are ways to
handle some data flaws. But more often, a method will simply abort by
throwing a POIXMLException and stop processing, as there is no way to be
sure that further processing won't fail again and again. Conceivable data
faults range from corrupted ZIPs through malformed XML to wrong
references inside the document itself. Of course we are aware of
try/catch blocks. But what would you propose to do inside the catch
block? With your task in mind (get as much text as possible out of the
file), the implementation can just "grit its teeth" and pretend nothing
happened. But if you try to process or modify the corrupt data any
further, the result will only get worse.

Could you perhaps provide some sample data that causes the trouble you
reported?

Kind regards,
Stefan Stern

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org