poi-user mailing list archives

From Chris Nokleberg <ch...@sixlegs.com>
Subject Re: POIFS: reading BIG streams
Date Thu, 14 Aug 2003 03:02:43 GMT
Andrew C. Oliver wrote:
> On 8/13/03 10:04 PM, "Chris Nokleberg" <chris@sixlegs.com> wrote:
>> On Wed, Aug 13, 2003 at 09:18:54PM -0400, Andrew C. Oliver wrote:
>>> On 8/13/03 7:49 PM, "Chris Nokleberg" <chris@sixlegs.com> wrote:
>>>> Even when running off of a file, POIFS2 still has to read all of the
>>>> bookkeeping info, which for a 2GB file is going to be a lot of data. I'm
>>>> just guessing, but maybe 10-20MB? So if you are trying to process these
>>>> huge documents in a *very* memory constrained environment you may be out
>>>> of luck.
>>> The issue is more serious than that.  We use standard Java collections.
>>> Most of which use "int" as an index at some point in the process.  I'm still
>>> looking at POIFS2, but I suspect it suffers from this problem as well.
>> Arrays are also limited to a size of Integer.MAX_VALUE, and this does
>> put a limit on POIFS2 of (2^31)*512 bytes, which is just over 1TB. If
>> files larger than this break it is probably a bug.
> Ahh... Cool.

Of course, I meant to say that if files *smaller* than 1TB break, then there
is probably a bug. I wouldn't be surprised if there was one, either... it's
not the easiest thing to test. And it's a silly thing to do.
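
For anyone checking the arithmetic behind the 1TB figure, here is a minimal sketch. The class and method names are made up for illustration and are not part of POIFS2; it assumes 512-byte blocks and rounds the array limit up to 2^31 entries, as in the quoted text (the real limit is Integer.MAX_VALUE, one less). It also shows why the multiplication has to be done in long: the same product computed in int wraps around.

```java
// Hypothetical helper, not POIFS2 code: reproduces the limit quoted above.
final class PoifsSizeLimit {
    // Assumption: 512-byte blocks, per the figure in the quoted message.
    static final long BLOCK_SIZE = 512L;

    /** Bytes addressable by 2^31 blocks of 512 bytes, computed in long. */
    static long maxAddressableBytes() {
        return (1L << 31) * BLOCK_SIZE;
    }

    /** The same product in int arithmetic, which silently overflows. */
    static int overflowedIntProduct() {
        return (1 << 31) * 512;
    }

    public static void main(String[] args) {
        System.out.println(maxAddressableBytes());  // 1099511627776, i.e. 2^40
        System.out.println(overflowedIntProduct()); // 0
    }
}
```

2^40 bytes is exactly 1 TiB, or about 1.1 * 10^12 bytes, which is where "just over 1TB" comes from.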

>> A bigger issue is that the size of each stream is stored within the file
>> as an int, which means to reach the 1TB total limit you'd have to have
>> 500 2GB streams in the same file...not very likely I hope!
> Hell, I didn't think anyone would want to store 2GB in one file.  I suppose
> each stream is limited to 2GB though right?

Yes, 2^31 - 1 bytes (the largest signed int)... unless they are treating the
size as an unsigned int, which I doubt very much.
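
To make the signed/unsigned distinction concrete, here is a rough sketch of how the same 4-byte size field decodes both ways. The class and method names are hypothetical and this is not how POIFS or POIFS2 actually reads the field; it only assumes the size is stored as a little-endian 32-bit value.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration, not POIFS code.
final class StreamSizeField {
    /** Decode the 4-byte field as a signed int (caps sizes at 2^31 - 1). */
    static int readSigned(byte[] field) {
        return ByteBuffer.wrap(field).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    /** Decode the same bytes as unsigned, widening to long (up to 2^32 - 1). */
    static long readUnsigned(byte[] field) {
        return readSigned(field) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        // 0xC0000000 stored little-endian: a 3 GiB stream size.
        byte[] field = {0x00, 0x00, 0x00, (byte) 0xC0};
        System.out.println(readSigned(field));   // -1073741824
        System.out.println(readUnsigned(field)); // 3221225472
    }
}
```

A reader that treats the field as signed sees any stream over 2^31 - 1 bytes as having a negative size, so whether larger streams are usable depends entirely on how the reading code interprets those four bytes.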

