incubator-odf-users mailing list archives

From Nicholas Evans <nick.ev...@inology.nl>
Subject Re: ODF Performance
Date Mon, 10 Feb 2014 20:29:17 GMT
I have attached a class to the JIRA issue (ODFTOOLKIT-385) to illustrate
what I have in mind.  The methods I have defined there can write 10000 rows
to the spreadsheet in 1 second, which is vastly faster than anything
possible in the Simple API.  Usage is limited to the case where you have the
data ready up front and want to write it all to the spreadsheet in one go.
However, I think this is a pretty common use case.
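
As a rough illustration of the "write everything in one go" idea (this is
not the class attached to ODFTOOLKIT-385), the sketch below appends rows to
a table:table element using only the JDK's org.w3c.dom classes.  The class
name BulkRowWriter and its two methods are hypothetical; the namespace URIs
are the ones from the ODF specification.  A real content.xml table would
also need table:table-column elements and styles, which are left out here.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, not part of the ODF Toolkit.
public class BulkRowWriter {
    static final String TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";
    static final String OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";

    /** Appends every row of data to the table in a single pass, with no lookups. */
    static void appendRows(Element table, String[][] data) {
        Document doc = table.getOwnerDocument();
        for (String[] rowData : data) {
            Element row = doc.createElementNS(TABLE_NS, "table:table-row");
            for (String value : rowData) {
                Element cell = doc.createElementNS(TABLE_NS, "table:table-cell");
                cell.setAttributeNS(OFFICE_NS, "office:value-type", "string");
                Element paragraph = doc.createElementNS(TEXT_NS, "text:p");
                paragraph.setTextContent(value);
                cell.appendChild(paragraph);
                row.appendChild(cell);
            }
            // Appending at the end never requires searching the existing rows.
            table.appendChild(row);
        }
    }

    /** Creates an empty, stand-alone table:table element for demonstration. */
    static Element newTable(String name) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();
            Element table = doc.createElementNS(TABLE_NS, "table:table");
            table.setAttributeNS(TABLE_NS, "table:name", name);
            doc.appendChild(table);
            return table;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In a real document the table element would come from the loaded
content.xml DOM rather than from newTable.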

The methods used in the Simple API first get all the cells in the
spreadsheet, and then iterate through them until they find the one you want
to write to.  Each write therefore costs O(N), so writing N rows leads to an
O(N^2) algorithm, which is slow.
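
To make the quadratic cost concrete, here is a simplified model (not the
Simple API's actual code) that counts how many entries are visited when
every write re-scans from the start, compared with appending in one pass.
The class LookupCost and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical cost model, not taken from the ODF Toolkit.
public class LookupCost {
    /** Counts visits when each write scans from the first row to its target. */
    static long visitsWithRescan(int rows) {
        List<Integer> existing = new ArrayList<>();
        long visits = 0;
        for (int target = 0; target < rows; target++) {
            existing.add(target);
            for (int row : existing) {
                visits++;
                if (row == target) {
                    break; // found the row to write to
                }
            }
        }
        return visits;
    }

    /** Counts visits when each row is appended without any lookup. */
    static long visitsWithAppend(int rows) {
        long visits = 0;
        for (int target = 0; target < rows; target++) {
            visits++; // touch only the row being written
        }
        return visits;
    }
}
```

In this model, writing N rows with re-scanning visits N(N+1)/2 entries, so
10 000 rows cost 50 005 000 visits against 10 000 for the append-only pass.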


2014-02-10 18:49 GMT+01:00 Svante Schubert <svante.schubert@gmail.com>:

> Am 10.02.2014 17:42, schrieb Rob Weir:
> > On Fri, Jan 31, 2014 at 6:25 AM, Nicholas Evans <nick.evans@inology.nl> wrote:
> >> Dear ODF users,
> >>
> >> For a project I am working on, I am using the ODF toolkit to create
> >> spreadsheets that can become rather large (>10 000 rows).  I have
> >> noticed that as the spreadsheet gets larger, writing the rows becomes
> >> very slow.  I have put together a class containing 4 different ways of
> >> writing 10 000 rows of 10 columns to a spreadsheet.  The fastest method
> >> (using getRowByIndex and then getCellByIndex) takes 70 seconds.  The
> >> methods that use getRowList and getNextRow are much slower, taking
> >> about 170 seconds each.  The method using the Iterator<Row> seems to
> >> freeze for large inputs, and doesn't behave as expected for small
> >> inputs.
> >>
> >> I would really like to improve this performance.  I think this could be
> >> done by manipulating the DOM directly.  However, it would be great if
> >> there was a way of using the Simple API that I have overlooked that
> >> could help me.
> >>
> >> Does anyone have experience with improving the performance of the ODF
> >> toolkit in the context of writing rows to an ods spreadsheet?
> >>
> > We've had discussions on this topic before.  It comes down to use
> > cases.  The DOM model, with everything in memory at once, facilitates
> > random access to the content of the document and a style of
> > programming that is similar to what one might do in a spreadsheet
> > macro.  It is a very natural way to think about a document, but it
> > does require a lot of RAM.
> >
> > There are specialized use-cases where it should be possible to write
> > code that will perform much faster, e.g.:
> >
> > 1) Use cases that can be met with a read-only single-pass streaming
> > process.  In such cases you don't need a DOM at all.  It could be done
> > via SAX.
> >
> > 2) A write-only scenario where you specify the contents of a document,
> > but don't need to query things like "the contents of cell B27".   It
> > is also possible to have a read/write scenario, but at increased
> > complexity.  Finding B27 is easy in a 2D array, but harder in a sparse
> > matrix representation.
> >
> > Note:  if we want to, we can always start up a branch to experiment
> > with a different approach.  If it pans out we integrate it with the
> > trunk.  If it doesn't, then we learn from the experience.   I wouldn't
> > mind starting a new package to do the read-only streaming approach.
> >
> From my understanding it was not about exotic edge cases, but about the
> usage of the given Simple API, which led to the performance loss.
> I would still love to see which changes made the difference.  Although I
> am working on the underlying layer without the Simple API, I could
> learn from it.
>
> Thanks,
> Svante
>
