incubator-odf-users mailing list archives

From Minto van der sluis <>
Subject Re: How to get the last row in a table
Date Wed, 06 Jun 2012 08:35:08 GMT
On 6-6-2012 1:04, Rob Weir wrote:
> On Tue, Jun 5, 2012 at 10:55 AM, Minto van der sluis <> wrote:
>> Rob,
>> Thanks for your answer, see my remarks below.
>> On 5-6-2012 14:56, Rob Weir wrote:
>>> On Tue, Jun 5, 2012 at 7:07 AM, Minto van der sluis <> wrote:
>>>> Hmm, might be related to my spreadsheet. I have a spreadsheet that was
>>>> downloaded from google docs (converted from google's native format).
>>>> Initially it had:
>>>> rows: 100000+
>>>> columns: 1024
>>>> After opening and saving the document, it contains:
>>>> rows: 100
>>>> cols: 1024
>>>> This is still far too much for my table of 5 columns by 12 rows.
>>>> For comparison a new empty table gives the following:
>>>> rows: 2
>>>> cols: 5
>>> I see this happening in spreadsheet editors quite often.  They save
>>> the ODF file with a much larger span of cells than actually have
>>> content.
>>> One thing you can do to verify this is rename the ODF document from
>>> *.ods to *.zip.  Then you can use any ZIP utility to look inside.  The
>>> main file will be content.xml.  If you look at that file you will see
>>> how your editor actually defined the table.
>> In this file I encounter tags like:
>>        <table:table-column
>>                table:style-name="co2"
>>                table:number-columns-repeated="1020"
>>                table:default-cell-style-name="Default"/>
>>        <table:table-row
>>                table:style-name="ro4"
>>                table:number-rows-repeated="1048475">
>> Can these be identified using the Simple API? If so, I could check whether
>> the repeated number is larger than some threshold (e.g. > 100). If so, I can
>> stop processing any further rows or columns.
> No.  The abstraction of the Simple API is that a table is a 2D array.
> We abstract away details like repeated rows.  You can get to that
> level of abstraction using ODFDOM.
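Until something like that is reachable from the Simple API, the threshold check can be made on content.xml directly. Below is a minimal sketch that uses only the JDK XML parser, not ODFDOM. The class name RepeatScan and the method maxRowRepeat are made up for illustration. It assumes content.xml has already been extracted from the .ods archive and returns the largest table:number-rows-repeated value it finds.

```java
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of the Simple API or ODFDOM.
class RepeatScan {
    // Namespace URI bound to the "table:" prefix in ODF documents.
    static final String TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";

    /** Returns the largest table:number-rows-repeated value, or 1 if none is present. */
    static int maxRowRepeat(InputStream contentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the *NS lookups below
            Document doc = factory.newDocumentBuilder().parse(contentXml);
            NodeList rows = doc.getElementsByTagNameNS(TABLE_NS, "table-row");
            int max = 1;
            for (int i = 0; i < rows.getLength(); i++) {
                Element row = (Element) rows.item(i);
                // getAttributeNS returns "" when the attribute is absent.
                String value = row.getAttributeNS(TABLE_NS, "number-rows-repeated");
                if (!value.isEmpty()) {
                    max = Math.max(max, Integer.parseInt(value));
                }
            }
            return max;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("content.xml could not be parsed", e);
        }
    }
}
```

The stream can come from java.util.zip.ZipFile, by asking for the entry named content.xml. A result above the chosen threshold (say 100) would be the signal to stop trusting getRowCount() for that document.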
>> If this happens in other editors as well then iterators become almost
>> useless and perform poorly. Getting the iterator when having 1048475
>> empty rows is killing performance.
> Right.  Unfortunately the table in this case is in bad shape when we
> load it. I wonder if it would help to have a convenience method like
> calculateTrueBounds() or resizeToContent() or something like that,
> which would walk in from the lower right of the allocated table and
> determine the lowest and right-most cell with actual content?

This would be nice but feels a bit odd: Simple API users would need to
call such a method practically every time they are not sure of the
document's origin.
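To make the idea concrete, here is a rough sketch of what such a bounds calculation could do. It is not Simple API code: it works on a plain String[][] standing in for the table's cell text, and the names TrueBounds, hasContent and trueBounds are invented for the example. It scans up from the last row until it finds content, then finds the right-most used column within the rows that remain.

```java
// Hypothetical stand-in for a calculateTrueBounds()-style helper.
class TrueBounds {
    /** A cell counts as used when it holds non-blank text (styles are ignored). */
    static boolean hasContent(String cell) {
        return cell != null && !cell.trim().isEmpty();
    }

    /**
     * Returns {rowCount, columnCount} of the smallest top-left rectangle
     * that contains every cell with content, or {0, 0} for an empty grid.
     */
    static int[] trueBounds(String[][] grid) {
        int rows = 0;
        int cols = 0;
        // Walk up from the bottom; the first row with content is the last used row.
        for (int r = grid.length - 1; r >= 0 && rows == 0; r--) {
            for (String cell : grid[r]) {
                if (hasContent(cell)) {
                    rows = r + 1;
                    break;
                }
            }
        }
        // Within the used rows, walk in from the right to find the widest row.
        for (int r = 0; r < rows; r++) {
            for (int c = grid[r].length - 1; c >= cols; c--) {
                if (hasContent(grid[r][c])) {
                    cols = c + 1;
                    break;
                }
            }
        }
        return new int[] {rows, cols};
    }
}
```

This sketch only looks at text, so rows that are empty but intentionally styled would be cut off. A real implementation would have to decide whether style information counts as content.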

Another solution could be to add support for RepeatedRow and
RepeatedColumn (derived from Row and Column). This would improve
iterator performance and still allow access to style info of cells past
the calculated bounds. Also, clients can handle repeated rows/columns more

Anyway, just a thought ;-)
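A small sketch of how the repeated-row idea might look, purely as an illustration. None of these types exist in the Simple API. RowRun is a hypothetical value class for one table:table-row element together with its repeat count, and the two helper methods show that both the logical row count and the row count with trailing empty runs dropped can be computed by visiting runs instead of individual rows.

```java
import java.util.List;

// Hypothetical model of "repeated rows"; not an existing Simple API type.
class RepeatedRows {
    /** One physical table:table-row element plus how often it repeats. */
    static final class RowRun {
        final String styleName;
        final int repeat;     // table:number-rows-repeated, 1 when absent
        final boolean empty;  // true when no cell in the row has content

        RowRun(String styleName, int repeat, boolean empty) {
            this.styleName = styleName;
            this.repeat = repeat;
            this.empty = empty;
        }
    }

    /** Logical row count, as a plain 2D-array view of the table would report it. */
    static long logicalRowCount(List<RowRun> runs) {
        long count = 0;
        for (RowRun run : runs) {
            count += run.repeat;
        }
        return count;
    }

    /** Logical row count after dropping trailing runs that have no content. */
    static long contentRowCount(List<RowRun> runs) {
        int last = runs.size() - 1;
        while (last >= 0 && runs.get(last).empty) {
            last--;
        }
        return logicalRowCount(runs.subList(0, last + 1));
    }
}
```

With this shape, a sheet that ends in one run of 1048475 empty rows costs a single iteration step for that run, and the style name of the trailing run stays available to clients that care about it.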

>>>> Does anyone have an idea why I get these high numbers? Or is google docs the cause?
>>> I think the idea was for the editor to leave room for the user to easily
>>> add new content into the sheet.  If it saved it as exactly how much
>>> was used, then the next time the document was retrieved, the table
>>> would be shrunk to exactly that size.  So if the user wanted to add
>>> new content they would need to first append or insert new rows and
>>> columns, not a very good user experience.   Of course, we have the
>>> opposite expectation with tables in a text document.
>> In my opinion it's okay for an editor to show it like that to the user.
>> But when storing the spreadsheet it should skip the empty trailing rows
>> and columns. As it stands, automated processing of ODS spreadsheet files
>> will be very time-consuming due to these wasteful empty rows and columns.
> It depends on the application.  I've seen spreadsheets that were
> designed for printed output, to make forms for manual data collection,
> where there were repeated blank rows there intentionally.  Or blank of
> content, though they did have styles associated with them.  So they
> could have shaded background, etc.  So blank is not always the same as
> unused.  It depends on the use case.
> -Rob
>>> -Rob
>>>> Regards,
>>>> Minto
>>>> On 5-6-2012 12:19, Minto van der sluis wrote:
>>>>> Hi,
>>>>> Having looked at several places I can't find an answer to this simple
>>>>> question.
>>>>> Table.getRowCount() gives me a number far too high for my little table
>>>>> with < 50 rows.
