incubator-odf-dev mailing list archives

From Svante Schubert <svante.schub...@gmail.com>
Subject Re: Table API Performance Issue
Date Mon, 28 Nov 2011 11:59:16 GMT
It is unclear to me whether he is working with the deprecated DOC API or
the Simple API. He should certainly switch to the latter.

In both cases the table code currently keeps the two-dimensional
model of the table in sync with the underlying XML model at all times
(as opposed to syncing the XML only when the table is written back).
In addition, the mentioned getCellByPosition(x, y) already expands the
table on read access. I would require this behavior only for write
access and return a default cell otherwise.
As you can see, there is room for performance improvements.
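
To illustrate the read/write distinction, here is a minimal Java sketch. It
is not ODFDOM or Simple API code: the class SparseTableSketch and its methods
readCell and writeCell are hypothetical names made up for this example. A
read outside the stored area returns a default (empty) value without growing
the table; only a write expands it.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a table model; not an ODFDOM or Simple API class.
class SparseTableSketch {
    private final Map<Long, String> cells = new HashMap<>();
    private int rowCount = 0;
    private int columnCount = 0;

    // Pack (row, column) into one map key.
    private static long key(int column, int row) {
        return ((long) row << 32) | (column & 0xFFFFFFFFL);
    }

    // Read access: never grows the table; unknown cells read as a default value.
    String readCell(int column, int row) {
        return cells.getOrDefault(key(column, row), "");
    }

    // Write access: the only place where the table is allowed to expand.
    void writeCell(int column, int row, String value) {
        cells.put(key(column, row), value);
        rowCount = Math.max(rowCount, row + 1);
        columnCount = Math.max(columnCount, column + 1);
    }

    int getRowCount() {
        return rowCount;
    }

    int getColumnCount() {
        return columnCount;
    }

    public static void main(String[] args) {
        SparseTableSketch table = new SparseTableSketch();
        table.writeCell(1, 2, "x");
        String far = table.readCell(500, 1000); // far outside the used area
        System.out.println(far.isEmpty() + " " + table.getRowCount()
                + " " + table.getColumnCount());
    }
}
```

In this sketch the row and column counts change only in writeCell, so a
reader that probes large coordinates does not pay for table expansion.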

- Svante

On 28.11.2011 10:26, Devin Han wrote:
> Although we have done a lot of work on Table API performance[1][2][3], we
> still need to do more. See the following mails sent by Sebastian, a
> passionate user ;)
>
> ------------------------------ mail 1 ------------------------------
>
> Hi Devin,
>
> thanks for pointing me to the new apache-system. I'm currently trying to
> improve a (very basic) MATLAB interface to the ODFdom-package, mainly being
> interested in the table-import functionality.
> [...]
>
> OK, concerning the interface ... generally the import now works for all
> sorts of fields (using the ODFdom 0.8.7), but I have already noticed some
> limitations and problems, which are kind of annoying. I'm not quite sure if
> they are due to the ODFdom implementation or due to the fact that I'm using
> LibreOffice (3.4.4). Are there any compatibility issues with LibreOffice?
> The code often freezes when calling methods like ODFtable.getRowCount,
> ODFtable.getColumnCount, or ODFtable.getCellByPosition(column, row).
>
> Anyway, I'll keep trying to check out the developers' branch, set up an
> Eclipse project and get the debugging environment working from MATLAB. I
> don't know whether my Java skills are sufficient to track down the problem
> and really contribute anything. If I find anything that I can trace back to
> ODFdom, I will report it.
>
> Thanks & cheers
>
> Sebastian
>
>
> ------------------------------ mail 2 ------------------------------
>
> Hi,
>
> I have investigated the issue further and I think I got to the core of the
> problem:
>
> Any time you set an attribute (like justification, data type, validator ...)
> for a whole column or row of a table, the methods getColumnCount or
> getRowCount, respectively, will go to maximum values. This is caused by
> the fact that the content.xml file contains corresponding entries, just
> like the one you showed in your last mail (... just learned that
> ods-files are zip-files :-))
>
> I see three possible solutions to the problem:
>
> 1) Edit all content.xml files by hand ... not a good solution after all.
>
> 2) The following method was tested directly on the ods-file in combination
> with the API:
>    - remove direct cell formatting from the table (performed in LibreOffice,
> don't know if this is possible with the API) and save
>    - count cells or rows (using the API)
>    - UNDO the removal of cell formatting (performed in LibreOffice, don't
> know if this is possible with the API) and save
>
> Problem: by removing the direct formatting, we also lose valuable
> information, e.g. the number format of the cell (date, float, time, ...),
> so we need to UNDO these changes.
>
> 3) Change the corresponding methods (e.g. for getRowCount):
>
>             if (n instanceof TableTableRowElement) {
>                 result += ((TableTableRowElement) n).getTableNumberRowsRepeatedAttribute();
>             }
>
> As far as I understand this, the method ...
> -> looks for all row nodes (e.g. in content.xml: <table:table-row
> table:number-rows-repeated="1048569"> ... </table:table-row>),
> -> looks at how many times each row is repeated,
> -> builds the sum of all occurrences.
>
> Now, this is problematic if the XML file contains rows that are
> obviously there just to carry formatting:
>
>  <table:table-row table:style-name="ro1" table:number-rows-repeated="1048569">
>    <table:table-cell table:number-columns-repeated="4"/>
>  </table:table-row>
>
> Now my proposed solution: the method needs to check whether the
> corresponding RowElement contains any child node (more specifically, any
> cell) *with* content. If not, the row should be omitted from the count.
>
> I have no clue whether this is a suitable solution within the framework of
> the Simple API, since the methods are used in several places within the API.
> Although the problem can be traced to a really simple cause (i.e. the
> formatting of a whole row or column, which is a quite common thing to do
> ...), it is quite severe: it prevents the API from working properly for
> quite common tables and should be addressed with high priority.
>
> OK, this is all I can do for the moment. Since my java skills and my
> insight into the framework of the API are very limited, I think I'll leave
> the work to the pros ... :-) If you have any other solutions or need me to
> test something, I'd be willing to help.
>
> Cheers
>
> Sebastian
> ----------------------------------------------------------------------------------------------------------------------------
>
> What are your opinions on Sebastian's suggestion in mail 2?
> I think we need to continue Table API performance tuning in the coming
> version.
>
> [1] https://issues.apache.org/jira/browse/ODFTOOLKIT-98
> [2] https://issues.apache.org/jira/browse/ODFTOOLKIT-215
> [3] https://issues.apache.org/jira/browse/ODFTOOLKIT-284
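
Sebastian's option 3 can be sketched with the plain JDK DOM API instead of
the ODFDOM classes. The class RowCountSketch, its methods, and the rule for
what counts as "content" (a cell with a child element or an office:value-type
attribute) are assumptions for illustration, not the toolkit's
implementation.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch on the JDK DOM; not the ODFDOM getRowCount code.
class RowCountSketch {
    static final String TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";
    static final String OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";

    // Assumption: a row has content if any of its cells has a child element
    // (e.g. text:p) or carries an office:value-type attribute.
    static boolean hasContent(Element row) {
        NodeList cells = row.getElementsByTagNameNS(TABLE_NS, "table-cell");
        for (int i = 0; i < cells.getLength(); i++) {
            Element cell = (Element) cells.item(i);
            boolean hasChildElement = cell.getElementsByTagNameNS("*", "*").getLength() > 0;
            if (hasChildElement || cell.hasAttributeNS(OFFICE_NS, "value-type")) {
                return true;
            }
        }
        return false;
    }

    // Sums number-rows-repeated over the table's direct table-row children,
    // optionally skipping rows that only carry formatting.
    static int countRows(Element table, boolean skipEmptyRows) {
        int result = 0;
        for (Node n = table.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (!(n instanceof Element)) continue;
            Element row = (Element) n;
            boolean isRow = TABLE_NS.equals(row.getNamespaceURI())
                    && "table-row".equals(row.getLocalName());
            if (!isRow) continue;
            if (skipEmptyRows && !hasContent(row)) continue;
            String repeated = row.getAttributeNS(TABLE_NS, "number-rows-repeated");
            result += repeated.isEmpty() ? 1 : Integer.parseInt(repeated);
        }
        return result;
    }

    // Parses an XML string with namespace support and returns its root element.
    static Element parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<table:table xmlns:table='" + TABLE_NS + "'"
                + " xmlns:text='urn:oasis:names:tc:opendocument:xmlns:text:1.0'>"
                + "<table:table-row><table:table-cell><text:p>a</text:p>"
                + "</table:table-cell></table:table-row>"
                + "<table:table-row table:style-name='ro1'"
                + " table:number-rows-repeated='1048569'>"
                + "<table:table-cell table:number-columns-repeated='4'/>"
                + "</table:table-row></table:table>";
        Element table = parse(xml);
        System.out.println("naive count:    " + countRows(table, false));
        System.out.println("filtered count: " + countRows(table, true));
    }
}
```

Taken literally, this rule also drops empty rows that sit between data rows,
which would shift row indices; trimming only trailing content-free rows may
be the safer variant.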

