incubator-odf-dev mailing list archives

From "Bruno Girin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ODFTOOLKIT-388) Test hangs when iterating over a spreadsheet created with LibreOffice 4.0.0
Date Wed, 16 Apr 2014 13:06:15 GMT

    [ https://issues.apache.org/jira/browse/ODFTOOLKIT-388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971379#comment-13971379 ]

Bruno Girin commented on ODFTOOLKIT-388:
----------------------------------------

After some analysis, here is a summary of what happens. It all comes down to a couple of things
that LibreOffice does:
- it uses repeated empty cells and rows to specify the maximum size of the sheet area irrespective
of whether the whole sheet is filled in or not;
- the number of columns declared is a maximum and in most cases the sheet does not contain
a single row with that many columns.

h2. How LibreOffice specifies the document

First, it declares a number of columns that may or may not be filled in but that, more importantly
for LibreOffice, define which styles to use:
{code}
<table:table-column table:style-name="co1" table:number-columns-repeated="257" table:default-cell-style-name="ce1"/>
{code}

Then at the end of each row, it includes a repeated empty cell (note the self-closing tag
with no content and no {{office:value-type}} attribute):
{code}
<table:table-cell table:number-columns-repeated="254"/>
{code}

At the end of the sheet, it defines a number of repeated empty rows, i.e. rows with one or
several empty cells:
{code}
<table:table-row table:style-name="ro1" table:number-rows-repeated="1048574">
  <table:table-cell table:number-columns-repeated="257"/>
</table:table-row>
<table:table-row table:style-name="ro1">
  <table:table-cell table:number-columns-repeated="257"/>
</table:table-row>
{code}

Taken at face value, this declares a spreadsheet that has 257 columns and 1048576
rows even though only 3 cells, all in a single row, are not empty.
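
To make the arithmetic concrete, here is a minimal sketch that expands the repeat attributes quoted above. It uses no ODF Toolkit classes: {{RowSpec}} and {{DeclaredSizeSketch}} are hypothetical names, and the layout of the first row (three filled cells followed by one cell repeated 254 times) is an assumption based on the fragments shown.

```java
import java.util.Arrays;
import java.util.List;

/** Arithmetic only: no ODF Toolkit classes are used here. */
final class DeclaredSizeSketch {

    /** Hypothetical stand-in for one table:table-row element. */
    static final class RowSpec {
        final long rowsRepeated;   // table:number-rows-repeated (1 if absent)
        final int[] cellRepeats;   // table:number-columns-repeated of each cell element

        RowSpec(long rowsRepeated, int... cellRepeats) {
            this.rowsRepeated = rowsRepeated;
            this.cellRepeats = cellRepeats;
        }

        /** Columns this row declares once the repeats are expanded. */
        int declaredColumns() {
            int total = 0;
            for (int repeat : cellRepeats) {
                total += repeat;
            }
            return total;
        }
    }

    /** Rows the sheet declares once the repeats are expanded. */
    static long declaredRows(List<RowSpec> rows) {
        long total = 0;
        for (RowSpec row : rows) {
            total += row.rowsRepeated;
        }
        return total;
    }

    /** Assumed layout: one data row, then the two runs of empty rows. */
    static List<RowSpec> sampleSheet() {
        return Arrays.asList(
            new RowSpec(1, 1, 1, 1, 254),   // 3 filled cells + 254 repeated empty cells
            new RowSpec(1048574, 257),
            new RowSpec(1, 257));
    }

    public static void main(String[] args) {
        List<RowSpec> rows = sampleSheet();
        long rowCount = declaredRows(rows);
        int columnCount = rows.get(0).declaredColumns();
        System.out.println(rowCount + " rows x " + columnCount + " columns = "
            + (rowCount * columnCount) + " cells");
    }
}
```

Under these assumptions the declared area is 1048576 x 257 = 269484032 cell positions, of which 3 hold data.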

h2. What ODF Toolkit does

The {{Table.getRowCount()}} method is fairly straightforward and counts each row by adding
all {{number-rows-repeated}} values together. However, as it doesn't recognise empty rows,
it returns a value of 1048576 even though there is only one non-empty row.
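
The difference between the declared row count and a count that recognises empty rows can be sketched as follows. This is not the {{Table.getRowCount()}} implementation: {{RowRun}} and both methods are hypothetical, and treating only trailing empty rows as padding is one possible rule, chosen here so that the indices of rows holding data do not shift.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of row counting; not ODF Toolkit code. */
final class UsedRowCountSketch {

    /** Hypothetical stand-in for a row element: its repeat count and whether it is empty. */
    static final class RowRun {
        final long repeated;
        final boolean empty;

        RowRun(long repeated, boolean empty) {
            this.repeated = repeated;
            this.empty = empty;
        }
    }

    /** Naive count: every repeat counts, empty or not. */
    static long declaredRowCount(List<RowRun> runs) {
        long total = 0;
        for (RowRun run : runs) {
            total += run.repeated;
        }
        return total;
    }

    /**
     * Counts rows up to and including the last non-empty run.
     * Empty runs between data rows still count; only trailing ones are dropped.
     */
    static long usedRowCount(List<RowRun> runs) {
        long total = 0;
        long used = 0;
        for (RowRun run : runs) {
            total += run.repeated;
            if (!run.empty) {
                used = total;
            }
        }
        return used;
    }

    public static void main(String[] args) {
        List<RowRun> sheet = Arrays.asList(
            new RowRun(1, false),        // the single row with data
            new RowRun(1048574, true),   // padding emitted at the end of the sheet
            new RowRun(1, true));
        System.out.println("declared=" + declaredRowCount(sheet)
            + " used=" + usedRowCount(sheet));
    }
}
```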

The {{Row.getCellCount()}} method is more complex as it tries to take into account the cover
list by calling {{Table.getCellCoverInfos}} and that's where it hangs for the following reasons:
- {{Row.getCellCount()}} calls {{Table.getCellCoverInfos}}, passing it the number of columns
(taken from the columns declared in the document, irrespective of whether the particular row
it's on really has that many cells) and the total number of rows, so in this case {{Table.getCellCoverInfos}}
iterates over what it believes is a 257 by 1048576 cell sheet;
- {{Table.getCellCoverInfos}} calls {{Table.getCellByPosition}}, which itself calls {{Table.getRowByIndex}}
and {{Row.getCellByIndex}}, all of which have the side effect of creating missing instances
of Cell or Row on the fly;
- the result of {{Table.getCellCoverInfos}} is re-calculated every time {{Row.getCellCount()}}
is called.
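
The effect of lookups that create missing instances can be illustrated with a toy sparse grid. This is only a sketch of the pattern described above, not ODF Toolkit code: {{LookupSideEffectSketch}}, {{getOrCreate}} and {{peek}} are hypothetical names, and the scan is scaled down to 1000 rows so that it finishes quickly.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy sparse grid; not the ODF Toolkit Table class. */
final class LookupSideEffectSketch {
    private final Map<Long, String> cells = new HashMap<>();
    private final int columns;

    LookupSideEffectSketch(int columns) {
        this.columns = columns;
    }

    private long key(int row, int col) {
        return (long) row * columns + col;
    }

    void setValue(int row, int col, String value) {
        cells.put(key(row, col), value);
    }

    /** Lookup that materialises a missing cell, mimicking the side effect. */
    String getOrCreate(int row, int col) {
        return cells.computeIfAbsent(key(row, col), k -> "");
    }

    /** Lookup without side effects: a missing cell stays missing (null). */
    String peek(int row, int col) {
        return cells.get(key(row, col));
    }

    int storedCells() {
        return cells.size();
    }

    /** Visits every position of the declared area with the creating lookup. */
    static int scanWithSideEffects(LookupSideEffectSketch grid, int rows) {
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < grid.columns; c++) {
                grid.getOrCreate(r, c);
            }
        }
        return grid.storedCells();
    }

    public static void main(String[] args) {
        LookupSideEffectSketch grid = new LookupSideEffectSketch(257);
        grid.setValue(0, 0, "a");
        grid.setValue(0, 1, "b");
        grid.setValue(0, 2, "c");
        System.out.println("before scan: " + grid.storedCells());
        System.out.println("after scan:  " + scanWithSideEffects(grid, 1000));
    }
}
```

A grid holding 3 values ends up storing one object per visited position. Repeating such a scan over the full declared area on every {{Row.getCellCount()}} call is consistent with the hang described above.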

So this means that it is possible to craft a very small ODS document that is ODF1.2 compliant
and can trigger ODF Toolkit to hang, as the one produced by LibreOffice does.

h2. How to fix this

In order to fix this, I would suggest the following (none of which sounds easy):
# Recognise empty rows and cells and change {{Table.getRowCount()}} and {{Row.getCellCount()}}
so that they don't include those in results,
# Take the number of columns as an indicative maximum rather than the actual number of columns,
# Split {{Table.getCellByPosition}}, {{Table.getRowByIndex}} and {{Row.getCellByIndex}} into
versions with and versions without side effects,
# Abstract cover list handling into the Table class so that it caches it and doesn't re-create
it each time {{Row.getCellCount()}} is called (there are probably additional optimisations
that can be done on the cover list: for example, you can assume that any cell that covers
another one has row and column indices that are lower than the covered cell).

This can probably be done without breaking the existing behaviour of the API but is not a
small endeavour. So I'd appreciate feedback and suggestions before I start fiddling with code.
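
As a starting point for discussion, the caching idea from the last item in the list could look roughly like the sketch below. Everything in it is hypothetical ({{CoverCacheSketch}}, {{Span}}, {{visibleCellCount}}) and it does not reflect how the cover list is actually represented in the Table class. It only shows a cover list that is rebuilt when the table changes rather than on every call.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy table that caches its cover list; all names are hypothetical. */
final class CoverCacheSketch {

    /** A merged region anchored at (row, col), spanning rowSpan x colSpan positions. */
    static final class Span {
        final int row, col, rowSpan, colSpan;

        Span(int row, int col, int rowSpan, int colSpan) {
            this.row = row;
            this.col = col;
            this.rowSpan = rowSpan;
            this.colSpan = colSpan;
        }

        /** True for positions inside the region other than the anchor cell itself. */
        boolean covers(int r, int c) {
            boolean inside = r >= row && r < row + rowSpan
                && c >= col && c < col + colSpan;
            return inside && !(r == row && c == col);
        }
    }

    private final List<Span> spans = new ArrayList<>();
    private List<Span> cachedCoverList;   // null means "needs rebuilding"
    private int rebuilds;

    void addSpan(Span span) {
        spans.add(span);
        cachedCoverList = null;   // any structural change invalidates the cache
    }

    /** Rebuilds the list only after an invalidation, not on every call. */
    private List<Span> coverList() {
        if (cachedCoverList == null) {
            cachedCoverList = new ArrayList<>(spans);   // stand-in for the expensive scan
            rebuilds++;
        }
        return cachedCoverList;
    }

    /** Cells in a row that are not hidden under a span anchored elsewhere. */
    int visibleCellCount(int row, int columns) {
        List<Span> cover = coverList();
        int visible = 0;
        for (int c = 0; c < columns; c++) {
            boolean covered = false;
            for (Span span : cover) {
                if (span.covers(row, c)) {
                    covered = true;
                    break;
                }
            }
            if (!covered) {
                visible++;
            }
        }
        return visible;
    }

    int rebuilds() {
        return rebuilds;
    }
}
```

With a single span anchored at (0, 0) covering 2 rows and 3 columns, calling {{visibleCellCount}} for several rows triggers one rebuild in total. A new rebuild happens only after {{addSpan}} is called again.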

> Test hangs when iterating over a spreadsheet created with LibreOffice 4.0.0
> ---------------------------------------------------------------------------
>
>                 Key: ODFTOOLKIT-388
>                 URL: https://issues.apache.org/jira/browse/ODFTOOLKIT-388
>             Project: ODF Toolkit
>          Issue Type: Bug
>          Components: simple api
>    Affects Versions: 0.6-incubating, 0.6.1-incubating
>            Reporter: Bruno Girin
>         Attachments: SpreadsheetDocumentTest.java, saxProblem.ods, simple.ods, toolkit.patch
>
>
> When iterating over a simple spreadsheet created with LibreOffice 4, the code hangs on Row.getCellCount().
> Running the same document through the validator at http://odf-validator.rhcloud.com/ confirms that it is conformant to ODF1.2:
> {quote}
> The document is conformant ODF1.2!
> Details:
> simple.ods: Info: ODF version of root document: 1.2
> internal:/schema/odf1.2/OpenDocument-v1.2-cos01-manifest-schema.rng: Info: parsed.
> simple.ods/META-INF/manifest.xml: Info: no errors, no warnings
> simple.ods/mimetype: Info: no errors, no warnings
> simple.ods: Info: Media Type: application/vnd.oasis.opendocument.spreadsheet
> internal:/schema/odf1.2/OpenDocument-v1.2-cos01-schema.rng: Info: parsed.
> simple.ods/meta.xml: Info: Generator: LibreOffice/4.0.2.2$Linux_X86_64 LibreOffice_project/400m0$Build-2
> simple.ods/meta.xml: Info: no errors, no warnings
> simple.ods/settings.xml: Info: no errors, no warnings
> simple.ods/styles.xml: Info: no errors, no warnings
> simple.ods/content.xml: Info: no errors, no warnings
> internal:/schema/odf1.2/OpenDocument-v1.2-cos01-dsig-schema.rng: Info: parsed.
> simple.ods: Info: no errors, no warnings
> {quote}



--
This message was sent by Atlassian JIRA
(v6.2#6252)
