incubator-odf-users mailing list archives

From Rob Weir <>
Subject Re: ODF Performance
Date Mon, 10 Feb 2014 16:42:21 GMT
On Fri, Jan 31, 2014 at 6:25 AM, Nicholas Evans <> wrote:
> Dear ODF users,
> For a project I am working on, I am using the ODF toolkit to create
> spreadsheets that can become rather large (>10 000 rows).  I have noticed
> that as the spreadsheet gets larger, writing the rows becomes very slow.  I
> have put together a class containing 4 different ways of writing 10 000 rows
> of 10 columns to a spreadsheet.  The fastest method (using getRowByIndex and
> then getCellByIndex) takes 70 seconds.  The methods that use getRowList and
> getNextRow are much slower, taking about 170 seconds each.  The method using
> the Iterator<Row> seems to freeze for large inputs, and doesn't behave as
> expected for small inputs.
> I would really like to improve this performance.  I think this could be done
> by manipulating the DOM directly.  However, it would be great if there was a
> way of using the Simple API that I have overlooked that could help me.
> Does anyone have experience with improving the performance of the ODF
> toolkit in the context of writing rows to an ods spreadsheet?

We've had discussions on this topic before.  It comes down to use
cases.  The DOM model, with everything in memory at once, facilitates
random access to the content of the document and a style of
programming that is similar to what one might do in a spreadsheet
macro.  It is a very natural way to think about a document, but it
does require a lot of RAM.

There are specialized use-cases where it should be possible to write
code that will perform much faster, e.g.:

1) Use cases that can be met with a read-only, single-pass streaming
process.  In such cases you don't need a DOM at all.  It could be done
via SAX.

2) A write-only scenario where you specify the contents of a document,
but don't need to query things like "the contents of cell B27".   It
is also possible to have a read/write scenario, but at increased
complexity.  Finding B27 is easy in a 2D array, but harder in a sparse
matrix representation.
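
To make case 1 concrete, here is a minimal sketch of a single-pass SAX
reader that counts the rows in a spreadsheet's content.xml without ever
building a DOM.  The class name OdsRowCounter and the method countRows
are hypothetical and are not part of the ODF Toolkit; only the JDK's
SAX and zip classes are used.  The element and attribute names
(table:table-row, table:number-rows-repeated) come from the ODF table
namespace.  It assumes an unencrypted .ods package.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration; not an ODF Toolkit class.
class OdsRowCounter {
    // Namespace URI bound to the "table" prefix in ODF documents.
    static final String TABLE_NS =
            "urn:oasis:names:tc:opendocument:xmlns:table:1.0";

    // Streams through content.xml once and returns the number of logical
    // rows, expanding table:number-rows-repeated where it is present.
    static long countRows(InputStream contentXml)
            throws IOException, SAXException, ParserConfigurationException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        final long[] rows = {0};
        parser.parse(contentXml, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if (TABLE_NS.equals(uri) && "table-row".equals(localName)) {
                    String repeated =
                            atts.getValue(TABLE_NS, "number-rows-repeated");
                    rows[0] += (repeated == null)
                            ? 1 : Long.parseLong(repeated);
                }
            }
        });
        return rows[0];
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java OdsRowCounter file.ods");
            return;
        }
        // An .ods file is a zip package; the cell data lives in content.xml.
        try (ZipFile ods = new ZipFile(args[0])) {
            ZipEntry entry = ods.getEntry("content.xml");
            if (entry == null) {
                System.err.println("no content.xml in " + args[0]);
                return;
            }
            try (InputStream in = ods.getInputStream(entry)) {
                System.out.println(countRows(in));
            }
        }
    }
}
```

Memory use here stays roughly constant regardless of row count, since
nothing is retained between SAX events.  The price is the one named in
case 2: there is no way to go back and ask for B27 after the parser has
passed it.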

Note:  if we want to, we can always start up a branch to experiment
with a different approach.  If it pans out, we integrate it with the
trunk.  If it doesn't, then we learn from the experience.   I wouldn't
mind starting a new package to do the read-only streaming approach.


> Regards,
> Nick
