poi-dev mailing list archives

From Toshiaki Kamoshida <kamoshida.toshi...@future.co.jp>
Subject [PROPOSAL]Decrease memory footprint at hssf.usermodel
Date Fri, 25 Apr 2003 09:08:22 GMT
Hello developers,

To let people use POI with limited memory, for read-only use, etc.,
you made the eventmodel API, which works like a SAX parser for XML.

For DEEP CORE POI users, it is useful.
But we need special knowledge to use the eventmodel API.
Most users don't know the low-level structure of the XLS format
(and perhaps don't want to study it).

There is a great difference between XML and XLS.

The usermodel API is still an important one.
So I feel it is important to decrease the memory footprint
of the usermodel package as much as possible.

I feel there are common patterns when users use POI to make XLS files.
1. Users generally access only a few records.
   So you don't have to deserialize ALL records completely.
2. Once a user writes data to a record, he generally won't change
   it. He writes the data to serialize it the next moment, not to change it.

Now in usermodel, every concrete Record instance is fully
deserialized while RecordFactory parses the source.
But I feel there is no need to deserialize ALL records from the
source. We can use the Proxy Pattern with Lazy Construction and Eager
Deconstruction in the record management model.

Like this (perhaps this is not the best way, maybe it makes no sense XP
please discuss...):

1. The source is managed as a simple byte array (in memory, in a
Random Access File, etc. We can use them polymorphically.)

2. A Proxy-Record class contains only a pointer to a position
in the source array.

3. RecordFactory creates only a list of Proxy-Records.
(Maybe it is good to deserialize some useful Records here.)

4. When the user wants to pull a record by calling any API,
the Proxy receives the message, creates a Concrete Record
instance from the bytes the Proxy points to, and
is replaced by it.

5. Generated Concrete Records are managed in a cache area.
   They are managed by LRU, SoftReference, etc.
   If the cache area overflows, the evicted Concrete Record
   is serialized individually into a "Serialized Concrete Record"
   (a Serialized Concrete Record can be deserialized back into a
   Concrete Record at the user's request), which replaces it.

6. When the user wants to serialize the workbook, POI does:
  (1) Concrete Record → serialize it as is done NOW.
  (2) Proxy Record → read the source array and copy it to the output.
  (3) Serialized Concrete Record → copy its byte array field to the output.
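
To make steps 2, 4, 5 and 6 a bit more concrete, here is a rough
sketch. It is only an illustration, not existing POI code:
ConcreteRecord, ProxyRecord, get(), evict() and writeTo() are all
names I made up, and the only format detail it relies on is the
4-byte record header (2-byte sid, 2-byte data length, little-endian).
How the cache decides which record to evict is left out.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical stand-in for a fully deserialized record; not a POI class. */
final class ConcreteRecord {
    final int sid;
    byte[] data;

    ConcreteRecord(int sid, byte[] data) {
        this.sid = sid;
        this.data = data;
    }

    /** Writes the 4-byte header (sid, length; little-endian) followed by the data. */
    byte[] serialize() {
        byte[] out = new byte[4 + data.length];
        out[0] = (byte) sid;
        out[1] = (byte) (sid >>> 8);
        out[2] = (byte) data.length;
        out[3] = (byte) (data.length >>> 8);
        System.arraycopy(data, 0, out, 4, data.length);
        return out;
    }
}

/** Hypothetical proxy that starts out as just a position in the shared source. */
final class ProxyRecord {
    private final byte[] source;      // shared source array, never modified
    private final int offset;         // start of this record's header in source
    private ConcreteRecord concrete;  // built on first access (step 4)
    private byte[] serialized;        // "Serialized Concrete Record" (step 5)

    ProxyRecord(byte[] source, int offset) {
        this.source = source;
        this.offset = offset;
    }

    private static int u16(byte[] b, int pos) {
        return (b[pos] & 0xFF) | ((b[pos + 1] & 0xFF) << 8);
    }

    /** Step 4: deserialize lazily, from the evicted bytes if any, else from the source. */
    ConcreteRecord get() {
        if (concrete == null) {
            byte[] from = (serialized != null) ? serialized : source;
            int start = (serialized != null) ? 0 : offset;
            int length = u16(from, start + 2);
            byte[] data = new byte[length];
            System.arraycopy(from, start + 4, data, 0, length);
            concrete = new ConcreteRecord(u16(from, start), data);
            serialized = null;
        }
        return concrete;
    }

    boolean isDeserialized() {
        return concrete != null;
    }

    /** Step 5: the cache calls this when it drops the concrete instance. */
    void evict() {
        if (concrete != null) {
            serialized = concrete.serialize();
            concrete = null;
        }
    }

    /** Step 6: pick one of the three output paths depending on the current state. */
    void writeTo(ByteArrayOutputStream out) {
        if (concrete != null) {
            byte[] bytes = concrete.serialize();
            out.write(bytes, 0, bytes.length);
        } else if (serialized != null) {
            out.write(serialized, 0, serialized.length);
        } else {
            out.write(source, offset, 4 + u16(source, offset + 2));
        }
    }
}
```

A record that nobody touches stays in the last branch of writeTo()
and is copied straight from the source without ever being parsed.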

The benefits are:
1. The source array is immutable, so we can share it among many Workbooks.
   (Perhaps this is good for server-side use.)
2. If a RAF is used to manage the source byte array, you can decrease
   main memory consumption a lot.
3. Because some records will never be deserialized and serialized (only
   copied from the source), maybe performance increases??? But I can't
   assert it XP
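
Benefits 1 and 2 depend on the source being readable through one
interface whether it lives in memory or on disk. A minimal sketch of
that idea follows; RecordSource, ArraySource and FileSource are
hypothetical names, not POI classes, and I assume the file is opened
read-only by the caller.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical read-only view of the source bytes. */
interface RecordSource {
    /** Copies length bytes starting at position into dst[0..length). */
    void read(long position, byte[] dst, int length) throws IOException;
}

/** Source kept entirely in main memory. */
final class ArraySource implements RecordSource {
    private final byte[] bytes;

    ArraySource(byte[] bytes) {
        this.bytes = bytes.clone(); // private copy, so the source stays immutable
    }

    @Override
    public void read(long position, byte[] dst, int length) {
        System.arraycopy(bytes, (int) position, dst, 0, length);
    }
}

/** Source kept on disk; only the requested bytes are brought into memory. */
final class FileSource implements RecordSource {
    private final RandomAccessFile file;

    FileSource(RandomAccessFile file) {
        this.file = file;
    }

    @Override
    public synchronized void read(long position, byte[] dst, int length) throws IOException {
        // A RandomAccessFile has a single file pointer, so seek and read must
        // not interleave when several workbooks share this source.
        file.seek(position);
        file.readFully(dst, 0, length);
    }
}
```

With this, a Proxy-Record would hold a RecordSource and a position
instead of a raw array, and the rest of the model would not need to
know which kind of source it has.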

Thanks for reading.


Toshiaki Kamoshida

