incubator-odf-dev mailing list archives

From Devin Han <devin...@apache.org>
Subject Re: Table API Performance Issue
Date Mon, 28 Nov 2011 16:41:44 GMT
2011/11/28 Svante Schubert <svante.schubert@gmail.com>

> It is unclear to me if he works on the deprecated DOC API or the Simple
> API.
> Certainly he should switch to the latter.
>

He was working on the deprecated DOC API. I have suggested that he migrate
to the Simple API.


> In both cases the table code currently keeps the two-dimensional
> model of the table in sync with the underlying XML model at all times
> (as opposed to syncing the XML only when writing the table back).
>

He also raised an old issue: how to compute the table size in a
spreadsheet. It can be either the count of table-row/table-column elements
or the boundary of the table content. Obviously, most users find the second
one more reasonable.
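
To make the two definitions concrete, here is a rough sketch. It is not
ODFDOM code: it parses a hand-written table fragment with the JDK's DOM
parser, and the class name, method names, and sample data are all made up
for illustration. It assumes rows are direct children of the table element
and treats a cell as "having content" only if it has child nodes, which is
a simplification.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical illustration only; none of these names exist in ODFDOM.
class TableSizeSketch {
    static final String TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";
    static final String TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0";

    // Two content rows with two empty rows between them, followed by a
    // style-only row repeated up to the sheet limit (invented sample data).
    static final String SAMPLE =
        "<table:table xmlns:table=\"" + TABLE_NS + "\" xmlns:text=\"" + TEXT_NS + "\">"
        + "<table:table-row><table:table-cell><text:p>a</text:p></table:table-cell></table:table-row>"
        + "<table:table-row table:number-rows-repeated=\"2\"><table:table-cell/></table:table-row>"
        + "<table:table-row><table:table-cell><text:p>b</text:p></table:table-cell></table:table-row>"
        + "<table:table-row table:number-rows-repeated=\"1048572\">"
        + "<table:table-cell table:number-columns-repeated=\"4\"/></table:table-row>"
        + "</table:table>";

    static Element parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    static boolean isRow(Node n) {
        return n instanceof Element
                && TABLE_NS.equals(n.getNamespaceURI())
                && "table-row".equals(n.getLocalName());
    }

    // A missing number-rows-repeated attribute means the row occurs once.
    static int repeat(Element row) {
        String value = row.getAttributeNS(TABLE_NS, "number-rows-repeated");
        return value.isEmpty() ? 1 : Integer.parseInt(value);
    }

    // Simplification: a row has content if any child element (cell) has children.
    static boolean hasContent(Element row) {
        for (Node c = row.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element && c.hasChildNodes()) {
                return true;
            }
        }
        return false;
    }

    // Definition 1: sum of all declared rows, including style-only repeats.
    static int declaredRowCount(Element table) {
        int total = 0;
        for (Node n = table.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (isRow(n)) {
                total += repeat((Element) n);
            }
        }
        return total;
    }

    // Definition 2: position of the last row that holds content.
    static int usedRowCount(Element table) {
        int seen = 0;
        int used = 0;
        for (Node n = table.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (isRow(n)) {
                seen += repeat((Element) n);
                if (hasContent((Element) n)) {
                    used = seen;
                }
            }
        }
        return used;
    }

    public static void main(String[] args) {
        Element table = parse(SAMPLE);
        System.out.println("declared rows: " + declaredRowCount(table));
        System.out.println("used rows:     " + usedRowCount(table));
    }
}
```

For the sample, the first definition gives 1048576 and the second gives 4.
Note that the second one keeps empty rows that sit between content rows, so
row indices stay stable; simply omitting every empty row from the count
would not have that property.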


> In addition, the mentioned getCellByPosition(x, y) already expands the
> Table when reading it. I would require this behavior only for write
> access, returning a default cell otherwise.
>

Yes, but we can't know whether it's a read-only access or a write access
when the user calls getCellByPosition(x, y).
Should we supply another method?
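
Just to show what I mean by "another method", here is a toy model. It has
nothing to do with the real Table class: GridSketch, Cell, and peekText are
invented names, and the grid is a plain list of lists. The existing-style
accessor grows the grid, while the extra read-only accessor returns a
default value for positions outside the stored area and never grows it.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model, not the ODFDOM Table class; all names here are hypothetical.
class GridSketch {
    static final class Cell {
        private String text = "";

        String getText() {
            return text;
        }

        void setText(String newText) {
            text = newText;
        }
    }

    private final List<List<Cell>> rows = new ArrayList<>();

    int getRowCount() {
        return rows.size();
    }

    // Write-capable access: expands the grid so the requested cell exists.
    Cell getCellByPosition(int col, int row) {
        while (rows.size() <= row) {
            rows.add(new ArrayList<>());
        }
        List<Cell> cells = rows.get(row);
        while (cells.size() <= col) {
            cells.add(new Cell());
        }
        return cells.get(col);
    }

    // Read-only access: returns the default (empty) text for positions
    // outside the stored area and never expands the grid.
    String peekText(int col, int row) {
        if (row >= rows.size()) {
            return "";
        }
        List<Cell> cells = rows.get(row);
        return col < cells.size() ? cells.get(col).getText() : "";
    }

    public static void main(String[] args) {
        GridSketch grid = new GridSketch();
        System.out.println(grid.peekText(3, 1000).isEmpty()); // read far outside
        System.out.println(grid.getRowCount());               // still not expanded
        grid.getCellByPosition(1, 2).setText("x");            // write expands
        System.out.println(grid.getRowCount());
    }
}
```

The read-only method returns a value rather than a shared default cell, so
a caller cannot modify the default by accident. Whether something like this
fits the Simple API is a separate question.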


> You see, there is room for performance improvements.
>

Performance will always be an issue. Actually, I always take the benchmark
results of jOpenDocument to heart. I hope that one day we can answer them
with strong numbers of our own instead of staying silent as we do now.


> - Svante
>
> On 28.11.2011 10:26, Devin Han wrote:
> > Though we have done a lot of work on Table API performance [1][2][3], we
> > still need to do more. See the following mails sent by Sebastian, a
> > passionate user ;)
> >
> > ------------------------------ mail 1 ------------------------------
> >
> > Hi Devin,
> >
> > thanks for pointing me to the new apache-system. I'm currently trying to
> > improve a (very basic) MATLAB interface to the ODFdom-package, mainly
> > being interested in the table-import functionality.
> > [...]
> >
> > OK, concerning the interface ... generally the import now works for all
> > sorts of fields (using ODFdom 0.8.7), but I have already noticed some
> > limitations and problems, which are kind of annoying. I'm not quite sure
> > if they are due to the ODFdom implementation or due to the fact that I'm
> > using LibreOffice (3.4.4). Is there any compatibility issue with
> > LibreOffice?!
> > The code often freezes when calling methods like ODFtable.getRowCount,
> > ODFtable.getColumnCount, or ODFtable.getCellByPosition(column, row).
> >
> > Anyway, I'll keep trying to check out the developers-branch, set up an
> > Eclipse project and get the debugging environment working from MATLAB. I
> > don't know if my Java skills are sufficient to track down the problem
> > and really contribute anything. If I find anything that I can trace back
> > to ODFdom, I will report it.
> >
> > Thanks & cheers
> >
> > Sebastian
> >
> >
> > ------------------------------ mail 2 ------------------------------
> >
> > Hi,
> >
> > I have investigated the issue further and I think I got to the core of
> > the problem:
> >
> > Any time you set an attribute (like justification, data-type, validator
> > ...) for a whole column or row of a table, the methods getColumnCount or
> > getRowCount, respectively, will go to maximum values. This is caused by
> > the fact that the content.xml file contains corresponding entries, just
> > like the one you had shown in the last mail (... just learned that
> > ods-files are zip-files :-))
> >
> > I see three possible solutions to the problem:
> >
> > 1) Edit all content.xml files by hand ... not a good solution after all.
> >
> > 2) The following method was tested directly on the ods-file in
> > combination with the API:
> >    - remove direct cell formatting from the table (performed in
> >      LibreOffice, don't know if this is possible with the API) and save
> >    - count cells or rows (using the API)
> >    - UNDO the removal of cell formatting (performed in LibreOffice, don't
> >      know if this is possible with the API) and save
> >
> > Problem: by removing the direct formatting, we also lose valuable
> > information, e.g. the number-format of the cell (date, float, time, ...),
> > so we need to UNDO these changes.
> >
> > 3) Change the corresponding methods (e.g. for getRowCount):
> >
> >             if (n instanceof TableTableRowElement) {
> >                 result += ((TableTableRowElement) n)
> >                         .getTableNumberRowsRepeatedAttribute();
> >             }
> >
> > As far as I understand this, the method ...
> > -> looks for all row nodes (e.g. in content.xml: <table:table-row
> > table:number-rows-repeated="1048569"> ... </table:table-row>),
> > -> looks up how many times each row is repeated,
> > -> builds the sum of all occurrences.
> >
> > Now, this is problematic if the xml file contains rows which are
> > obviously just there to establish formatting:
> >
> >  <table:table-row table:style-name="ro1"
> >      table:number-rows-repeated="1048569">
> >    <table:table-cell table:number-columns-repeated="4"/>
> >  </table:table-row>
> >
> > Now my proposed solution: the method needs to check whether the
> > corresponding RowElement contains any child node (more specifically, any
> > cell) *with* content. If not, the row should be omitted from the count.
> >
> > I have no clue whether this is a suitable solution within the framework
> > of the Simple API, since the methods are used at several places within
> > the API. Because the problem can be traced back to a really simple cause
> > (i.e. the formatting of a whole row or column, which is a quite common
> > thing to do ...), it is quite severe: it prevents the API from working
> > properly for quite common tables and should be addressed with high
> > priority.
> >
> > OK, this is all I can do for the moment. Since my java skills and my
> > insight into the framework of the API are very limited, I think I'll
> > leave the work to the pros ... :-) If you have any other solutions or
> > need me to test something, I'd be willing to help.
> >
> > Cheers
> >
> > Sebastian
> >
> > --------------------------------------------------------------------
> >
> > What are your opinions on Sebastian's suggestion in mail 2?
> > I think we need to continue Table API performance tuning in the coming
> > version.
> >
> > [1] https://issues.apache.org/jira/browse/ODFTOOLKIT-98
> > [2] https://issues.apache.org/jira/browse/ODFTOOLKIT-215
> > [3] https://issues.apache.org/jira/browse/ODFTOOLKIT-284
>
>


-- 
-Devin
