incubator-odf-dev mailing list archives

From Devin Han <devin...@apache.org>
Subject Table API Performance Issue
Date Mon, 28 Nov 2011 09:26:35 GMT
Though we have done lots of work on Table API performance [1][2][3], we
still need to do more. See the following mails sent by Sebastian, a
passionate user ;)

------------------------------- mail 1 -------------------------------

Hi Devin,

thanks for pointing me to the new apache-system. I'm currently trying to
improve a (very basic) MATLAB interface to the ODFdom-package, mainly being
interested in the table-import functionality.
[...]

OK, concerning the interface ... generally the import now works for all
sorts of fields (using the ODFdom 0.8.7), but I have already noticed some
limitations and problems, which are kind of annoying. I'm not quite sure if
they are due to the ODFdom implementation or due to the fact that I'm using
Libre-Office (3.4.4). Is there any compatibility issue with LibreOffice ?!
The code often freezes when calling methods like ODFtable.getRowCount,
ODFtable.getColumnCount, or ODFtable.getCellByPosition(column, row).

Anyways, I'll keep trying to check out the developers-branch, set up an
Eclipse project and get the debugging-environment working from Matlab. I
don't know, if my Java skills are sufficient to track down the problem and
really contribute anything. If I find anything, which I can track down to
ODFdom, I will report.

Thanks & cheers

Sebastian


------------------------------- mail 2 -------------------------------

Hi,

I have investigated the issue further and I think I got to the core of the
problem:

Any time you set an attribute (like justification, data type, validator ...)
for a whole row or column of a table, the methods getRowCount or
getColumnCount, respectively, will go to their maximum values. This is
caused by the fact that the content.xml file then contains corresponding
entries, just like the one you had shown in the last mail (... just learned
that ods files are zip files :-))

I see three possible solutions to the problem:

1) Edit all content.xml files by hand ... not a good solution after all.

2) The following method was tested directly on the ods file in combination
with the API:
   - remove direct cell formatting from the table (performed in LibreOffice,
don't know if this is possible with the API) and save
   - count cells or rows (using the API)
   - UNDO the removal of cell formatting (performed in LibreOffice, don't
know if this is possible with the API) and save

Problem: by removing the direct formatting, we also lose valuable
information, e.g. the number format of the cell (date, float, time, ...),
so we need to UNDO these changes.

3) Change the corresponding methods (e.g. for getRowCount):

            if (n instanceof TableTableRowElement) {
                result += ((TableTableRowElement) n).getTableNumberRowsRepeatedAttribute();
            }

As far as I understand it, the method ...
-> looks for all row nodes (e.g. in content.xml: <table:table-row
table:number-rows-repeated="1048569"> ... </table:table-row>),
-> looks up how many times each row is repeated,
-> builds the sum of all occurrences.

Now, this is problematic if the XML file contains rows which are
obviously there only to establish formatting:

 <table:table-row table:style-name="ro1" table:number-rows-repeated="1048569">
   <table:table-cell table:number-columns-repeated="4"/>
 </table:table-row>
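
The effect of such a formatting-only row on a plain sum can be reproduced with a minimal sketch. It uses only the JDK's DOM parser, not the ODFDOM classes; the class name NaiveRowCount, the method countRows and the two-row sample table are hypothetical and only serve as an illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class NaiveRowCount {
    static final String TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";

    // Hypothetical sample: two data rows followed by one formatting-only row.
    static final String SAMPLE =
          "<table:table xmlns:table='" + TABLE_NS + "'"
        + " xmlns:text='urn:oasis:names:tc:opendocument:xmlns:text:1.0'>"
        + "<table:table-row><table:table-cell><text:p>a</text:p></table:table-cell></table:table-row>"
        + "<table:table-row><table:table-cell><text:p>b</text:p></table:table-cell></table:table-row>"
        + "<table:table-row table:style-name='ro1' table:number-rows-repeated='1048569'>"
        + "<table:table-cell table:number-columns-repeated='4'/></table:table-row>"
        + "</table:table>";

    // Sums table:number-rows-repeated over every table:table-row element;
    // a row without the attribute counts once.
    static long countRows(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList rows = doc.getElementsByTagNameNS(TABLE_NS, "table-row");
            long result = 0;
            for (int i = 0; i < rows.getLength(); i++) {
                Element row = (Element) rows.item(i);
                String repeated = row.getAttributeNS(TABLE_NS, "number-rows-repeated");
                result += repeated.isEmpty() ? 1 : Long.parseLong(repeated);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse table XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countRows(SAMPLE)); // prints 1048571
    }
}
```

Only two rows carry data, yet the sum is dominated by the single repeated row, so any loop that runs up to this count visits more than a million rows.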

Now my proposed solution: the method needs to check whether the
corresponding RowElement contains any child node (more specifically, any
cell) *with* contents. If not, the row should be omitted from the count.
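
A rough sketch of this check, again on plain JDK DOM rather than on the ODFDOM classes. The names ContentAwareRowCount, cellHasContent, rowHasContent and countRowsWithContent are hypothetical, and the definition of "has contents" (a child element such as text:p, or an office:value-type attribute) is an assumption made for this example, not something taken from the Simple API:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class ContentAwareRowCount {
    static final String TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";
    static final String OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";

    // Hypothetical sample: two data rows followed by one formatting-only row.
    static final String SAMPLE =
          "<table:table xmlns:table='" + TABLE_NS + "'"
        + " xmlns:office='" + OFFICE_NS + "'"
        + " xmlns:text='urn:oasis:names:tc:opendocument:xmlns:text:1.0'>"
        + "<table:table-row><table:table-cell><text:p>a</text:p></table:table-cell></table:table-row>"
        + "<table:table-row><table:table-cell office:value-type='float' office:value='3'>"
        + "<text:p>3</text:p></table:table-cell></table:table-row>"
        + "<table:table-row table:style-name='ro1' table:number-rows-repeated='1048569'>"
        + "<table:table-cell table:number-columns-repeated='4'/></table:table-row>"
        + "</table:table>";

    // Assumption: a cell has contents if it carries office:value-type
    // or has at least one child element (e.g. text:p).
    static boolean cellHasContent(Element cell) {
        if (cell.hasAttributeNS(OFFICE_NS, "value-type")) {
            return true;
        }
        for (Node c = cell.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                return true;
            }
        }
        return false;
    }

    // A row has contents if at least one of its table:table-cell children does.
    static boolean rowHasContent(Element row) {
        for (Node c = row.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE
                    && TABLE_NS.equals(c.getNamespaceURI())
                    && "table-cell".equals(c.getLocalName())
                    && cellHasContent((Element) c)) {
                return true;
            }
        }
        return false;
    }

    // Like the plain sum, but rows without any contents are left out.
    static long countRowsWithContent(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList rows = doc.getElementsByTagNameNS(TABLE_NS, "table-row");
            long result = 0;
            for (int i = 0; i < rows.getLength(); i++) {
                Element row = (Element) rows.item(i);
                if (!rowHasContent(row)) {
                    continue; // formatting-only row, omitted from the count
                }
                String repeated = row.getAttributeNS(TABLE_NS, "number-rows-repeated");
                result += repeated.isEmpty() ? 1 : Long.parseLong(repeated);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse table XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countRowsWithContent(SAMPLE)); // prints 2
    }
}
```

One caveat of this sketch: an empty row between two data rows would be dropped as well, so the count would no longer match the row indices. Skipping only trailing rows without contents might be the safer variant.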

I have no clue whether this is a suitable solution within the framework of
the Simple API, since the methods are used in several places within the API.
Because the problem can be tracked down to a really simple cause (i.e. the
formatting of a whole row or column, which is quite a common thing to do
...), it is quite severe: it prevents the API from working properly for
quite common tables and should be addressed with high priority.

OK, this is all I can do for the moment. Since my java skills and my
insight into the framework of the API are very limited, I think I'll leave
the work to the pros ... :-) If you have any other solutions or need me to
test something, I'd be willing to help.

Cheers

Sebastian
----------------------------------------------------------------------------------------------------------------------------

What are your opinions on Sebastian's suggestion in mail 2?
I think we need to continue Table API performance tuning in the coming
version.

[1] https://issues.apache.org/jira/browse/ODFTOOLKIT-98
[2] https://issues.apache.org/jira/browse/ODFTOOLKIT-215
[3] https://issues.apache.org/jira/browse/ODFTOOLKIT-284
-- 
-Devin
