poi-dev mailing list archives

From Chris Nokleberg <ch...@sixlegs.com>
Subject Re: Reducing HSSF memory use
Date Tue, 29 Jul 2003 19:09:42 GMT
On Tue, Jul 29, 2003 at 02:48:29PM -0400, Andrew C. Oliver wrote:
> Do you think the tradeoff would be worth it in a read-write scenario?  I'm
> leaning towards getting this level of refactoring done before going
> another round.  Though if you want to take a go at it PLEASE do so.

I may do a little technology demo to test the feasibility...I don't have
enough knowledge of POI internals yet to make it worthwhile to try
patching the real thing.

Our messages just passed each other, but in summary I do think that in
a read-write scenario it is still useful. If you end up modifying every
cell, of course, you will not save any memory, but I doubt that is very

> There are still several places left where we can remove object creation.  I
> was going for the least radical and invasive and figured we'd iterate from
> there.

Sure. I am just concerned that the wrong high-level API may affect how much
performance you can end up squeezing out in the long run.

> We actually have an API for read-only which will become more efficient soon.
> It's a reactor pattern which allows you to specify *what* types of data
> you're interested in.  I plan to add more granularity ("only interested in
> rows x-y or columns i-n", etc).  I think this is actually more efficient
> than a cursor approach, though please attempt to persuade me otherwise.

I do see some value in the reactor pattern. Actually, for our PowerPoint
thing we set up a JXPath-like navigator so you could write an XPath
expression to pull out the exact set of records you wanted, e.g. all
text record strings containing "Chris" that are within a yellow

But taking a flexible query and generating an optimized access path is
not trivial. XSLT is very fast, but only in theory :-) 
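
For concreteness, the simple form of the reactor approach might be
sketched as below. This is only an illustration under assumptions:
SheetRecord, RecordListener and Reactor are made-up names, not POI
classes, and a record is reduced to a type tag plus a string value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these types are part of POI.
public class ReactorSketch {
    /** Minimal stand-in for a spreadsheet record. */
    static final class SheetRecord {
        final String type;
        final String value;
        SheetRecord(String type, String value) {
            this.type = type;
            this.value = value;
        }
    }

    /** Callback invoked for each record of a type the caller registered for. */
    interface RecordListener {
        void onRecord(SheetRecord record);
    }

    /** Maps record types to the listeners interested in them. */
    static final class Reactor {
        private final Map<String, List<RecordListener>> listeners = new HashMap<>();

        void register(String type, RecordListener listener) {
            listeners.computeIfAbsent(type, k -> new ArrayList<>()).add(listener);
        }

        /** Walks every record, dispatching only registered types; returns records visited. */
        int process(List<SheetRecord> records) {
            int visited = 0;
            for (SheetRecord r : records) {
                visited++;
                List<RecordListener> interested = listeners.get(r.type);
                if (interested != null) {
                    for (RecordListener l : interested) {
                        l.onRecord(r);
                    }
                }
            }
            return visited;
        }
    }

    public static void main(String[] args) {
        List<SheetRecord> records = Arrays.asList(
            new SheetRecord("LABEL", "Chris"),
            new SheetRecord("NUMBER", "42"),
            new SheetRecord("LABEL", "Total"));
        Reactor reactor = new Reactor();
        // Only LABEL records reach the listener; NUMBER records are skipped.
        reactor.register("LABEL", r -> System.out.println(r.value));
        System.out.println("visited " + reactor.process(records));
    }
}
```

Note that process() still touches every record; the registration only
decides which ones reach a listener.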

With an iterator-based API it is relatively simple to get good
performance. The simple reactor implementation of just blowing through
all records and filtering out the ones you don't want is not going to be
faster. And you can also use the iterator for writing...maybe iterator
isn't the best name, it is more of a "window":

  Cell cell = new Cell();
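
The snippet above is cut off in the archive. As a rough illustration of
the "window" idea only, here is a self-contained sketch in which one
reusable object is moved across the cells instead of allocating an
object per cell. Everything here is an assumption: CellWindow and its
methods are hypothetical names, and a double[][] stands in for the
underlying record data. It is not the POI API and not necessarily what
the original example went on to show.

```java
// Hypothetical sketch; CellWindow is not a POI class.
public class CellWindowSketch {
    /** A single reusable view onto one cell of the backing data at a time. */
    static final class CellWindow {
        private final double[][] data;
        private int row = 0;
        private int col = -1; // positioned before the first cell

        CellWindow(double[][] data) {
            this.data = data;
        }

        /** Moves to the next cell in row-major order; returns false when exhausted. */
        boolean next() {
            col++;
            // Skip past the end of the current row and over any empty rows.
            while (row < data.length && col >= data[row].length) {
                row++;
                col = 0;
            }
            return row < data.length;
        }

        int row() { return row; }
        int col() { return col; }

        /** Reads the cell the window is currently positioned on. */
        double getValue() { return data[row][col]; }

        /** Writes through the window into the backing data. */
        void setValue(double value) { data[row][col] = value; }
    }

    /** Doubles every cell in place using one window object for the whole sheet. */
    static void doubleAll(double[][] sheet) {
        CellWindow cell = new CellWindow(sheet);
        while (cell.next()) {
            cell.setValue(cell.getValue() * 2);
        }
    }

    public static void main(String[] args) {
        double[][] sheet = { {1.0, 2.0}, {}, {3.0} };
        doubleAll(sheet);
        CellWindow cell = new CellWindow(sheet);
        while (cell.next()) {
            System.out.println(cell.row() + "," + cell.col() + " = " + cell.getValue());
        }
    }
}
```

The same object serves for both reading and writing, so a pass over the
sheet allocates one window regardless of how many cells it visits.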
