incubator-odf-users mailing list archives

From Rob Weir <robw...@apache.org>
Subject Re: ODF Performance
Date Mon, 10 Feb 2014 16:42:21 GMT
On Fri, Jan 31, 2014 at 6:25 AM, Nicholas Evans <nick.evans@inology.nl> wrote:
> Dear ODF users,
>
> For a project I am working on, I am using the ODF toolkit to create
> spreadsheets that can become rather large (>10 000 rows).  I have noticed
> that as the spreadsheet gets larger, writing the rows becomes very slow.  I
> have put together a class containing 4 different ways of writing 10 000 rows
> of 10 columns to a spreadsheet.  The fastest method (using getRowByIndex and
> then getCellByIndex) takes 70 seconds.  The methods that use getRowList and
> getNextRow are much slower, taking about 170 seconds each.  The method using
> the Iterator<Row> seems to freeze for large inputs, and doesn't behave as
> expected for small inputs.
>
> I would really like to improve this performance.  I think this could be done
> by manipulating the DOM directly.  However, it would be great if there was a
> way of using the Simple API that I have overlooked that could help me.
>
> Does anyone have experience with improving the performance of the ODF
> toolkit in the context of writing rows to an ods spreadsheet?
>

We've had discussions on this topic before.  It comes down to use
cases.  The DOM model, with everything in memory at once, facilitates
random-access to the content of the document and a style of
programming that is similar to what one might do in a spreadsheet macro.
It is a very natural way to think about a document, but it does
require a lot of RAM.

There are specialized use-cases where it should be possible to write
code that will perform much faster, e.g.:

1) Use cases that can be met with a read-only single-pass streaming
process.  In such cases you don't need a DOM at all.  It could be done
via SAX.

2) A write-only scenario where you specify the contents of a document,
but don't need to query things like "the contents of cell B27".  It
is also possible to have a read/write scenario, but at increased
complexity.  Finding B27 is easy in a 2D array, but harder in a sparse
matrix representation.
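
To make the two cases above concrete, here is a rough sketch using only
the XML APIs that ship with the JDK (StAX for writing, SAX for reading).
It is not ODF Toolkit code and not a proposed design.  The class name
StreamingSheetSketch and the methods writeTable and countRowsAndCells
are made up for illustration.  It also assumes we only care about the
table:table fragment; a real .ods file is a zip package whose
content.xml wraps this fragment in further elements.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; nothing here is part of the ODF Toolkit.
class StreamingSheetSketch {
    static final String TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";
    static final String OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";

    /** Case 2 (write-only): emit rows one at a time, never building a DOM. */
    public static String writeTable(int rows, int cols) throws Exception {
        StringWriter out = new StringWriter();
        XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        w.writeStartElement("table", "table", TABLE_NS);
        // Declare every prefix once on the root element.
        w.writeNamespace("table", TABLE_NS);
        w.writeNamespace("text", TEXT_NS);
        w.writeNamespace("office", OFFICE_NS);
        for (int r = 0; r < rows; r++) {
            w.writeStartElement("table", "table-row", TABLE_NS);
            for (int c = 0; c < cols; c++) {
                w.writeStartElement("table", "table-cell", TABLE_NS);
                w.writeAttribute("office", OFFICE_NS, "value-type", "string");
                w.writeStartElement("text", "p", TEXT_NS);
                w.writeCharacters("r" + r + "c" + c); // placeholder cell text
                w.writeEndElement(); // text:p
                w.writeEndElement(); // table:table-cell
            }
            w.writeEndElement(); // table:table-row
        }
        w.writeEndElement(); // table:table
        w.close();
        return out.toString();
    }

    /** Case 1 (read-only, single pass): count rows and cells via SAX callbacks. */
    public static int[] countRowsAndCells(String xml) throws Exception {
        final int[] counts = new int[2]; // [0] = rows, [1] = cells
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName,
                                         Attributes atts) {
                    if (!TABLE_NS.equals(uri)) {
                        return;
                    }
                    if ("table-row".equals(local)) {
                        counts[0]++;
                    } else if ("table-cell".equals(local)) {
                        counts[1]++;
                    }
                }
            });
        return counts;
    }

    public static void main(String[] args) throws Exception {
        int[] counts = countRowsAndCells(writeTable(10000, 10));
        System.out.println(counts[0] + " rows, " + counts[1] + " cells");
    }
}
```

In real use the writer would target a file or zip entry stream rather
than a StringWriter, so memory use stays flat as the row count grows.
A real reader would also have to honor table:number-rows-repeated and
table:number-columns-repeated, which this sketch ignores.  Neither half
can answer "what is in B27" without extra bookkeeping, which is the
trade-off described in 2) above.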

Note:  if we want to, we can always start up a branch to experiment
with a different approach.  If it pans out we integrate it with the
trunk.  If it doesn't, then we learn from the experience.  I wouldn't
mind starting a new package to do the read-only streaming approach.

-Rob

> Regards,
> Nick
