poi-dev mailing list archives

From Glen Stampoultzis <gst...@iinet.net.au>
Subject Re: Request for feedback : More Shared Formula thoughts
Date Tue, 09 Sep 2003 12:51:34 GMT
At 09:54 AM 9/09/2003, you wrote:
>On Mon, Sep 08, 2003 at 04:48:40PM +0530, Naveen N Rao wrote:
> > 2. The permanent fix (which addresses issues that may arise in the future
> > wherein there is a need to be able to access all records relating to a
> > particular cell) is to use indexing as Chris has pointed out. However, the
> > solution that Chris has pointed out - to use DBCell and INDEX records
> > might just be getting pointers to the stream, but not references to
> > in-memory records. So, I would suggest constructing a 2D array-like
> > container full of Lists. Each List contains all records relating to a
> > single cell. This way, we could support a new API in Model.Sheet that
> > returns a List of all records relating to a cell.
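A minimal sketch of the per-cell container described above. It assumes a sparse map keyed by packed (row, column) instead of a literal 2D array. The class name CellRecordIndex and the type parameter R, which stands in for a record type, are hypothetical and are not part of POI's Model.Sheet API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical per-cell record index: maps (row, col) to every record that
// belongs to that cell. R stands in for whatever record type the sheet model uses.
class CellRecordIndex<R> {
    // Sparse "2D array": the key packs row (16 bits) and column (8 bits) into one int.
    private final Map<Integer, List<R>> cells = new HashMap<>();

    private static int key(int row, int col) {
        return (row << 8) | col; // assumes BIFF8 limits: row < 65536, col < 256
    }

    void add(int row, int col, R record) {
        cells.computeIfAbsent(key(row, col), k -> new ArrayList<>()).add(record);
    }

    // All records for one cell, or an empty list if the cell has none.
    List<R> recordsFor(int row, int col) {
        List<R> list = cells.get(key(row, col));
        return list == null ? Collections.emptyList() : Collections.unmodifiableList(list);
    }
}
```

Even in this sparse form, every populated cell pays for a map entry plus a List object. That per-cell overhead is what the reply raises.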
>One problem is that such a 2D array will use even more memory. A sheet
>can contain (256 ^ 3) cells and even the bookkeeping info becomes
>unwieldy long before that point.
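For scale, a BIFF8 (Excel 97-2003) sheet has 65,536 rows and 256 columns, so it has 2^16 × 2^8 = 256^3 = 16,777,216 cells. The snippet below estimates what a dense array of references alone would cost. It assumes 4-byte references, as on a 32-bit JVM. The class DenseIndexCost is purely illustrative.

```java
// Back-of-envelope cost of a dense per-cell table for one BIFF8 sheet.
// Assumption: 4-byte object references (32-bit JVM); double it for 64-bit.
class DenseIndexCost {
    static final long ROWS = 65_536;   // 2^16 rows per sheet
    static final long COLS = 256;      // 2^8 columns per sheet
    static final long REF_BYTES = 4;   // assumed size of one reference

    static long cells() {
        return ROWS * COLS;            // 2^24 = 256^3 = 16,777,216
    }

    static long referenceBytes() {
        return cells() * REF_BYTES;    // pointers alone, before any List objects
    }

    public static void main(String[] args) {
        System.out.println(cells() + " cells");
        System.out.println(referenceBytes() / (1024 * 1024) + " MiB of references");
    }
}
```

Run as a program, it prints 16777216 cells and 64 MiB of references. Any per-cell List objects would add to that total.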
>In my spare time I have been working on a new approach to the whole
>issue that builds upon the RandomAccessFile capabilities of POIFS2. The
>INDEX and DBCELL records are used to locate arbitrary cells in the sheet
>and cells are read lazily by seeking to the proper location and copying
>only the necessary records. There is an adjustable cache of recently
>read records which helps to keep the seeking to a minimum. The end
>result is that you can process arbitrarily large sheets using a small
>and essentially constant amount of memory. Naturally, by controlling
>memory use, the speed is greatly improved as well.
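A minimal sketch of the caching side of this approach. It assumes records are addressed by their byte offset in the sheet stream. The class CachedRecordReader and the loader callback are hypothetical. In practice the loader would seek to an offset found via INDEX/DBCELL (for example with java.io.RandomAccessFile) and then read one record. The LRU eviction policy is also an assumption, because the message only says the cache holds recently read records.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Hypothetical lazy record reader with a bounded LRU cache.
// Records are keyed by byte offset in the sheet stream; `loader` stands in
// for "seek to offset and read one record".
class CachedRecordReader<R> {
    private final LongFunction<R> loader;
    private final Map<Long, R> cache;
    private int misses = 0;

    CachedRecordReader(LongFunction<R> loader, int capacity) {
        this.loader = loader;
        // accessOrder = true makes iteration order least-recently-used first;
        // removeEldestEntry evicts that entry once the cache exceeds capacity.
        this.cache = new LinkedHashMap<Long, R>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, R> eldest) {
                return size() > capacity;
            }
        };
    }

    R read(long offset) {
        R record = cache.get(offset);
        if (record == null) {          // cache miss: go back to the stream
            misses++;
            record = loader.apply(offset);
            cache.put(offset, record);
        }
        return record;
    }

    int misses() {
        return misses;
    }
}
```

A LinkedHashMap in access order with an overridden removeEldestEntry is the standard JDK idiom for a bounded LRU map. With it, memory stays proportional to the cache capacity rather than to the size of the sheet.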
>But this is such a change from the current codebase that I don't really
>know if it is worth pursuing...

Do you have some performance numbers?

