poi-dev mailing list archives

From Nick Burch <apa...@gagravarr.org>
Subject Re: Using MapDB to reduce memory footprint of shared strings table in SXSSF
Date Tue, 16 Dec 2014 09:03:12 GMT
On Mon, 15 Dec 2014, Alex Geller wrote:
> I think that the job of interning strings can be abstracted in a simple 
> way (A function or two) so that we can worry about the implementation 
> later (My implementation is perhaps attractive because it is simple, it 
> has a reasonable performance and can be activated on a threshold basis 
> so that no option to switch it on or off is needed but I am OK with any 
> other implementation too).

If we could do it in such a way that people using XSSF, SXSSF and 
SAX+Helpers are all able to use it, that'd be great! If it's SXSSF only, 
that's not the end of the world. Having pluggable implementations behind 
an interface is probably the best way, if possible.
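
As a rough illustration of the "function or two" idea, here is a minimal sketch of what such an interface might look like. The names SharedStringStore and HeapSharedStringStore are hypothetical and are not part of POI; the heap-backed class is only a baseline, and a MapDB-backed or threshold-switching class would be another implementation of the same interface.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface: neither the name nor the methods exist in POI.
interface SharedStringStore {
    /** Returns the index of s, adding it first if it has not been seen before. */
    int intern(String s);

    /** Returns the string stored at the given index. */
    String get(int index);

    /** Returns the number of unique strings stored. */
    int size();
}

// Baseline implementation that keeps everything on the heap.
final class HeapSharedStringStore implements SharedStringStore {
    private final Map<String, Integer> indexByString = new HashMap<>();
    private final List<String> strings = new ArrayList<>();

    @Override
    public int intern(String s) {
        Integer existing = indexByString.get(s);
        if (existing != null) {
            return existing; // already known, reuse its index
        }
        int index = strings.size(); // indexes are assigned in insertion order
        strings.add(s);
        indexByString.put(s, index);
        return index;
    }

    @Override
    public String get(int index) {
        return strings.get(index);
    }

    @Override
    public int size() {
        return strings.size();
    }
}

final class InternerDemo {
    public static void main(String[] args) {
        SharedStringStore store = new HeapSharedStringStore();
        System.out.println(store.intern("alpha"));
        System.out.println(store.intern("beta"));
        System.out.println(store.intern("alpha")); // same index as the first call
    }
}
```

Callers in XSSF, SXSSF or the SAX helpers would then depend only on the interface, so the choice of backing store could be made later without touching them.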

> My problem (And I think that Sumedh is perhaps struggling with the same 
> issue) is some guidance on how to go about this. We can just look at the 
> patch that was made for the support of shared strings which apparently 
> shares a lot of code with XSSF and ask ourselves how to make that code 
> use less memory but I suspect that we could perhaps do a much better job 
> if we really understood why the code is the way it is.

"svn log" and "svn blame" are your best bet for the why. On the whole, 
it's whatever worked best at the time support was added!

> I would like to have answers to questions like "What are all the 
> other fields in the CTRst class for? Is there documentation from 
> Microsoft or ECMA that covers this point?" Can you point me to the 
> documentation that handles the string externalizing?

I seem to have a slightly older copy of the file format docs on my laptop, 
but in that one (iso29500_Part1.pdf - ISO/IEC 29500-1:2008(E)) there's a 
section 18.4 entitled "Shared String Table" within the "SpreadsheetML 
Reference Material" area. That looks to be a pretty good guide to the 
different kinds of things you can find in the shared strings table.
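
To make those "different kinds of things" a bit more concrete, here is a small sketch that classifies the string items in a hand-written sample of a shared strings part. It assumes the usual SpreadsheetML layout, where each si element holds either a plain t element or a series of r rich text runs, optionally followed by rPh and phoneticPr phonetic data. The class name, the labels and the sample XML are made up for illustration, and only the JDK's DOM parser is used, not POI.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; not part of POI.
final class SharedStringKinds {
    static final String NS =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Hand-written sample: one plain item, one rich item, one with phonetic data.
    static final String SAMPLE =
        "<sst xmlns=\"" + NS + "\" count=\"3\" uniqueCount=\"3\">"
        + "<si><t>Plain text</t></si>"
        + "<si><r><rPr><b/></rPr><t>Bold</t></r><r><t> and normal</t></r></si>"
        + "<si><t>Tokyo</t><rPh sb=\"0\" eb=\"5\"><t>toukyou</t></rPh>"
        + "<phoneticPr fontId=\"1\"/></si>"
        + "</sst>";

    /** Labels one si element by looking at its direct child elements. */
    static String describe(Element si) {
        boolean plain = false, rich = false, phonetic = false;
        for (Node n = si.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() != Node.ELEMENT_NODE) continue;
            String name = n.getLocalName();
            if ("t".equals(name)) plain = true;
            else if ("r".equals(name)) rich = true;
            else if ("rPh".equals(name) || "phoneticPr".equals(name)) phonetic = true;
        }
        String kind = rich ? "rich" : plain ? "plain" : "empty";
        return phonetic ? kind + "+phonetic" : kind;
    }

    /** Parses the XML and returns one label per si element, in document order. */
    static List<String> classify(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getLocalName()
            Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList items = doc.getElementsByTagNameNS(NS, "si");
            List<String> kinds = new ArrayList<>();
            for (int i = 0; i < items.getLength(); i++) {
                kinds.add(describe((Element) items.item(i)));
            }
            return kinds;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse shared strings XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(classify(SAMPLE));
    }
}
```

If memory serves, the extra fields in CTRst map onto these same child elements, which would explain why a plain string carries more than just its text, but do check that against the spec section above.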

