incubator-odf-dev mailing list archives

From Svante Schubert <>
Subject Re: Table API Performance Issue
Date Mon, 28 Nov 2011 22:25:45 GMT
On 28.11.2011 17:41, Devin Han wrote:
> 2011/11/28 Svante Schubert <>
> ..
>> In both cases the table code currently keeps the two-dimensional
>> model of the table in sync with the underlying XML model at all times
>> (as opposed to syncing the XML only when writing the table back).
> He also pointed out an old issue: how to compute the table size in a
> spreadsheet. Is it the count of table-row/table-column elements or the
> boundary of the table content? Obviously, most users think the second one
> is more reasonable.
Some ODF documents from earlier versions pad the table with repeated
empty rows/columns up to the maximum.
There is no rule in the spec against it, therefore the content-based way
seems to be the safer of the two.
Still, we might ask ourselves whether content is sufficient to find the
size of a table. What if some cells are colored, have borders or
metadata, but no text content?
Non-default cells might be our best choice, with content perhaps as a
fallback (e.g. as a heuristic for printing).
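
A rough sketch of how such a count could work, under stated assumptions: RowRun and UsedRows are hypothetical names (not ODFDOM or Simple API classes), and the `significant` flag stands for whichever criterion is chosen (non-default cell or text content). Each run mirrors one table-row element with its number-rows-repeated value, and trailing filler runs are simply not counted.

```java
import java.util.List;

/** Hypothetical stand-in for one table-row element; not an ODFDOM class. */
final class RowRun {
    final int repeated;        // value of table:number-rows-repeated
    final boolean significant; // true if the run matches the chosen criterion

    RowRun(int repeated, boolean significant) {
        this.repeated = repeated;
        this.significant = significant;
    }
}

final class UsedRows {
    /** Counts rows up to and including the last significant run. */
    static int usedRowCount(List<RowRun> runs) {
        int total = 0; // rows seen so far, filler included
        int used = 0;  // rows up to the last significant run
        for (RowRun run : runs) {
            total += run.repeated;
            if (run.significant) {
                used = total; // insignificant rows in between still count
            }
        }
        return used;
    }
}
```

Empty rows between two significant runs are still counted, only the trailing filler is dropped. A significant run that is itself repeated up to the maximum would still be counted in full.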

The maximum size could be gathered during the initial table parsing and
then be updated constantly at run-time.
Such run-time updates include an expansion when content is added to a
cell outside the current bounds, and a shrink when the last cell content
(and layout) at the border of the table is deleted.
This would save time and would be more accurate, as shrinking a table is
currently not implemented by the Simple API, right?
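
One possible way to keep such bounds up to date, sketched under assumptions: TableBounds is a hypothetical helper (nothing like it is claimed to exist in the Simple API), and the caller is assumed to report only real transitions between default and non-default cells. Per-index counters in a sorted map make both expansion and shrinking cheap.

```java
import java.util.TreeMap;

/** Hypothetical bounds tracker; not part of the Simple API. */
final class TableBounds {
    // Number of non-default cells per row index and per column index.
    private final TreeMap<Integer, Integer> cellsPerRow = new TreeMap<>();
    private final TreeMap<Integer, Integer> cellsPerColumn = new TreeMap<>();

    /** Call when a default cell at (column, row) becomes non-default. */
    void cellFilled(int column, int row) {
        cellsPerRow.merge(row, 1, Integer::sum);
        cellsPerColumn.merge(column, 1, Integer::sum);
    }

    /** Call when a non-default cell at (column, row) is reset to default. */
    void cellCleared(int column, int row) {
        decrement(cellsPerRow, row);
        decrement(cellsPerColumn, column);
    }

    int rowCount() {
        return cellsPerRow.isEmpty() ? 0 : cellsPerRow.lastKey() + 1;
    }

    int columnCount() {
        return cellsPerColumn.isEmpty() ? 0 : cellsPerColumn.lastKey() + 1;
    }

    private static void decrement(TreeMap<Integer, Integer> counts, int index) {
        // Returning null from the remapping function removes the entry,
        // which is what lets the bounds shrink.
        counts.computeIfPresent(index, (k, v) -> v > 1 ? v - 1 : null);
    }
}
```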
>> In addition, the mentioned getCellByPosition(x, y) already expands the
>> table when reading it. I would require this behavior only for write
>> access, returning a default cell otherwise.
> Yes, but we can't know whether it's a read-only access or a write access
> when the user calls getCellByPosition(x, y).
> Supply another method?
As said above, the table size correction should be triggered when
content is written (or deleted), or when the cell is set back to a
default table cell.
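
To illustrate the idea, a minimal sketch in which reading never grows the table and only writing does. LazyTable, readCell and writeCell are made-up names for this example, and cell values are plain strings for brevity. This is not how getCellByPosition behaves today.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sparse table; names are illustrative only. */
final class LazyTable {
    static final String DEFAULT_CELL = ""; // shared value for untouched cells

    private final Map<Long, String> cells = new HashMap<>();
    private int rowCount;
    private int columnCount;

    // Packs non-negative (column, row) coordinates into one map key.
    private static long key(int column, int row) {
        return ((long) row << 32) | (column & 0xFFFFFFFFL);
    }

    /** Read access: returns the default cell and leaves the size untouched. */
    String readCell(int column, int row) {
        return cells.getOrDefault(key(column, row), DEFAULT_CELL);
    }

    /** Write access: stores the value and expands the size if needed. */
    void writeCell(int column, int row, String value) {
        cells.put(key(column, row), value);
        rowCount = Math.max(rowCount, row + 1);
        columnCount = Math.max(columnCount, column + 1);
    }

    int getRowCount() { return rowCount; }

    int getColumnCount() { return columnCount; }
}
```

Shrinking on delete is left out here on purpose, the sketch only shows the read/write split.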
>> You see, there is room for performance improvements.
> Performance is an issue forever. Actually, I always take the benchmark
> results of jOpenDocument to heart. I wish that one day we can give them a
> powerful answer instead of keeping silent as we do now.
If we want to tweak the performance, we need performance regression
tests first. Otherwise, how can we prove our achievement and keep it for
the future?
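
A minimal sketch of such a regression check in plain Java, without any test framework. PerfCheck and the time budgets are hypothetical, and wall-clock budgets depend on the machine, so real thresholds would have to be calibrated on the build server.

```java
/** Hypothetical timing helper for performance regression checks. */
final class PerfCheck {
    /** Returns the best wall time in milliseconds over `runs` runs (runs >= 1). */
    static long bestMillis(Runnable task, int runs) {
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            long elapsed = (System.nanoTime() - start) / 1_000_000;
            best = Math.min(best, elapsed); // best-of-N reduces noise
        }
        return best;
    }

    /** Fails if even the best run exceeds the given budget. */
    static void assertWithin(String name, Runnable task, int runs, long budgetMillis) {
        long best = bestMillis(task, runs);
        if (best > budgetMillis) {
            throw new AssertionError(
                name + " took " + best + " ms, budget is " + budgetMillis + " ms");
        }
    }
}
```

The task passed in would be something like loading a fixed test spreadsheet and calling getRowCount on it, so that a later slowdown shows up as a failing check.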
Second, we must tune the scenarios our users run most often. We
might consider object reuse with pools, e.g. for cells, and using
something similar to hash maps for quick access.
In any case, this extra logic should be invisible to the end user, hidden
behind the Simple API.
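
As an illustration of the pooling idea, a small sketch with hypothetical names (CellView, CellViewPool), not existing Simple API classes. It assumes single-threaded use and that released objects are no longer referenced by the caller.

```java
import java.util.ArrayDeque;

/** Hypothetical lightweight cell wrapper that can be re-targeted. */
final class CellView {
    int column;
    int row;

    void reset(int column, int row) {
        this.column = column;
        this.row = row;
    }
}

/** Hypothetical pool that reuses CellView objects instead of allocating. */
final class CellViewPool {
    private final ArrayDeque<CellView> free = new ArrayDeque<>();
    private final int maxIdle; // upper bound on idle objects kept around

    CellViewPool(int maxIdle) {
        this.maxIdle = maxIdle;
    }

    CellView acquire(int column, int row) {
        CellView view = free.poll(); // null if the pool is empty
        if (view == null) {
            view = new CellView();
        }
        view.reset(column, row);
        return view;
    }

    void release(CellView view) {
        if (free.size() < maxIdle) {
            free.push(view); // otherwise let the garbage collector take it
        }
    }
}
```

Whether pooling pays off on a current JVM is something the regression tests would have to show.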
>> - Svante
>> On 28.11.2011 10:26, Devin Han wrote:
>>> Though we have done lots of work on Table API performance [1][2][3], we
>>> still need to do more. See the following mails sent by Sebastian, a
>>> passionate user ;)
>>> ------------------------------ mail 1 ------------------------------
>>> Hi Devin,
>>> thanks for pointing me to the new apache-system. I'm currently trying to
>>> improve a (very basic) MATLAB interface to the ODFdom-package, mainly being
>>> interested in the table-import functionality.
>>> [...]
>>> OK, concerning the interface ... generally the import now works for all
>>> sorts of fields (using ODFdom 0.8.7), but I have already noticed some
>>> limitations and problems, which are kind of annoying. I'm not quite sure if
>>> they are due to the ODFdom implementation or due to the fact that I'm using
>>> LibreOffice (3.4.4). Is there any compatibility issue with LibreOffice?!
>>> The code often freezes when calling methods like ODFtable.getRowCount,
>>> ODFtable.getColumnCount, or ODFtable.getCellByPosition(column, row).
>>> Anyway, I'll keep trying to check out the developers-branch, set up an
>>> Eclipse project and get the debugging-environment working from Matlab. I
>>> don't know if my Java skills are sufficient to track down the problem and
>>> really contribute anything. If I find anything which I can track down to
>>> ODFdom, I will report it.
>>> Thanks & cheers
>>> Sebastian
>>> ------------------------------ mail 2 ------------------------------
>>> Hi,
>>> I have investigated the issue further and I think I got to the core of the
>>> problem:
>>> Any time you set an attribute (like justification, data-type, validator ...)
>>> for a whole column or row of a table, the methods getRowCount or
>>> getColumnCount, respectively, will go to maximum values. This is caused by
>>> the fact that the content.xml file contains corresponding entries, just
>>> like the one you had shown in the last mail (... just learned that
>>> ods-files are zip-files :-))
>>> I see three possible solutions to the problem:
>>> 1) Edit all content.xml files by hand ... not a good solution after all.
>>> 2) The following method was tested directly on the ods-file in combination
>>> with the API:
>>>    - remove direct cell formatting from the table (performed in LibreOffice,
>>> don't know if this is possible with the API) and save
>>>    - count cells or rows (using the API)
>>>    - UNDO the removal of cell formatting (performed in LibreOffice, don't know
>>> if this is possible with the API) and save
>>> Problem: by removing the direct formatting, we also lose valuable
>>> information, e.g. the number-format of the cell (date, float, time, ...),
>>> so we need to UNDO these changes.
>>> 3) Change the according methods (e.g. for getRowCount):
>>>             if (n instanceof TableTableRowElement) {
>>>                 result += ((TableTableRowElement) n).getTableNumberRowsRepeatedAttribute();
>>>             }
>>> As far as I understand this, the method ...
>>> -> looks for all row-nodes (e.g. in content.xml: <table:table-row
>>> table:number-rows-repeated="1048569"> ... </table:table-row>),
>>> -> looks up how many times each row is repeated,
>>> -> builds the sum of all occurrences.
>>> Now, this is problematic if the XML file contains rows which are
>>> obviously just there to establish formatting:
>>>  <table:table-row table:style-name="ro1"
>>> table:number-rows-repeated="1048569"><table:table-cell
>>> table:number-columns-repeated="4"/></table:table-row>
>>> Now my proposed solution: the method needs to check whether the
>>> RowElement in question contains any child-node (more specifically, any cell)
>>> *with* contents. If not, that row should be omitted from the count.
>>> I have no clue whether this is a suitable solution within the framework of
>>> the Simple API, since the methods are used at several places within the API.
>>> Because the problem can be tracked down to a really simple cause (i.e. the
>>> formatting of a whole row or column, which is quite a common thing to do
>>> ...), it is quite severe: it prevents the API from working properly for
>>> quite common tables and should be addressed with high priority.
>>> OK, this is all I can do for the moment. Since my Java skills and my
>>> insight into the framework of the API are very limited, I think I'll leave
>>> the work to the pros ... :-) If you have any other solutions or need me to
>>> test something, I'd be willing to help.
>>> Cheers
>>> Sebastian
>> ----------------------------------------------------------------------------------------------------------------------------
>>> What are your opinions on Sebastian's suggestion in mail 2?
>>> I think we need to continue Table API performance tuning in the coming
>>> version.
>>> [1]
>>> [2]
>>> [3]
