incubator-odf-users mailing list archives

From Svante Schubert <svante.schub...@gmail.com>
Subject Re: ODF Performance
Date Mon, 10 Feb 2014 17:49:41 GMT
On 10.02.2014 17:42, Rob Weir wrote:
> On Fri, Jan 31, 2014 at 6:25 AM, Nicholas Evans <nick.evans@inology.nl> wrote:
>> Dear ODF users,
>>
>> For a project I am working on, I am using the ODF toolkit to create
>> spreadsheets that can become rather large (>10 000 rows).  I have noticed
>> that as the spreadsheet gets larger, writing the rows becomes very slow.  I
>> have put together a class containing 4 different ways of writing 10 000 rows
>> of 10 columns to a spreadsheet.  The fastest method (using getRowByIndex and
>> then getCellByIndex) takes 70 seconds.  The methods that use getRowList and
>> getNextRow are much slower, taking about 170 seconds each.  The method using
>> the Iterator<Row> seems to freeze for large inputs, and doesn't behave as
>> expected for small inputs.
>>
>> I would really like to improve this performance.  I think this could be done
>> by manipulating the DOM directly.  However, it would be great if there was a
>> way of using the Simple API that I have overlooked that could help me.
>>
>> Does anyone have experience with improving the performance of the ODF
>> toolkit in the context of writing rows to an ods spreadsheet?
>>
> We've had discussions on this topic before.  It comes down to use
> cases.  The DOM model, with everything in memory at once, facilitates
> random access to the content of the document and a style of
> programming that is similar to what one might do in a spreadsheet macro.
>  It is a very natural way to think about a document, but it does
> require a lot of RAM.
>
> There are specialized use-cases where it should be possible to write
> code that will perform much faster, e.g.:
>
> 1) Use cases that can be met with a read-only single-pass streaming
> process.  In such cases you don't need a DOM at all.  It could be done
> via SAX.
>
> 2) A write-only scenario where you specify the contents of a document,
> but don't need to query things like "the contents of cell B27".   It
> is also possible to have a read/write scenario, but at increased
> complexity.  Finding B27 is easy in a 2D array, but harder in a sparse
> matrix representation.
>
> Note:  if we want to, we can always start up a branch to experiment
> with a different approach.  If it pans out we integrate it with the
> trunk.  If it doesn't, then we learn from the experience.   I wouldn't
> mind starting a new package to do the read-only streaming approach.
>
From my understanding, this was not about exotic edge cases but about
ordinary usage of the given Simple API, which led to the performance loss.
I would still love to see which changes made the difference. Although I
am working on the underlying layer, without the Simple API, I could
learn from it.

Thanks,
Svante
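
For illustration, the two streaming scenarios from the quoted message
(write-only output and a read-only single pass via SAX) might look roughly
like the sketch below. It uses only the JDK's StAX and SAX APIs, not the
ODF Toolkit. The class name StreamingRowsSketch and both method names are
hypothetical, and the XML is a heavily simplified stand-in for the table
part of content.xml: a real .ods file is a zip package with more
namespaces, styles and attributes that are omitted here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch, not part of the ODF Toolkit.
public class StreamingRowsSketch {
    static final String TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    // Write-only: each row is emitted and then forgotten, so no DOM is built.
    public static String writeRows(int rows, int cols) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            w.writeStartElement("table", "table", TABLE_NS);
            w.writeNamespace("table", TABLE_NS);
            w.writeNamespace("text", TEXT_NS);
            for (int r = 0; r < rows; r++) {
                w.writeStartElement("table", "table-row", TABLE_NS);
                for (int c = 0; c < cols; c++) {
                    w.writeStartElement("table", "table-cell", TABLE_NS);
                    w.writeStartElement("text", "p", TEXT_NS);
                    w.writeCharacters("r" + r + "c" + c);
                    w.writeEndElement(); // text:p
                    w.writeEndElement(); // table:table-cell
                }
                w.writeEndElement(); // table:table-row
            }
            w.writeEndElement(); // table:table
            w.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }

    // Read-only: one SAX pass that counts rows and cells without keeping them.
    // Returns {rowCount, cellCount}.
    public static int[] countRowsAndCells(String xml) {
        final int[] counts = new int[2];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (!TABLE_NS.equals(uri)) {
                    return;
                }
                if ("table-row".equals(localName)) {
                    counts[0]++;
                } else if ("table-cell".equals(localName)) {
                    counts[1]++;
                }
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Same size as the test case in the original question.
        int[] counts = countRowsAndCells(writeRows(10000, 10));
        System.out.println(counts[0] + " rows, " + counts[1] + " cells");
    }
}
```

Neither method can answer a question like "what is in cell B27" without
re-reading the stream, which is the trade-off described in the quoted
message.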
