incubator-odf-users mailing list archives

From Rob Weir <robw...@apache.org>
Subject Re: How to get the last row in a table
Date Tue, 05 Jun 2012 23:04:37 GMT
On Tue, Jun 5, 2012 at 10:55 AM, Minto van der sluis <minto@xup.nl> wrote:
> Rob,
>
> Thanks for your answer, see my remarks below.
>
> Op 5-6-2012 14:56, Rob Weir schreef:
>> On Tue, Jun 5, 2012 at 7:07 AM, Minto van der sluis <minto@xup.nl> wrote:
>>> Hmm, might be related to my spreadsheet. I have a spreadsheet that was
>>> downloaded from google docs (converted from google's native format).
>>>
>>> Initially it had:
>>> rows: 100000+
>>> columns: 1024
>>>
>>> After opening and saving the document. It contains:
>>> rows: 100
>>> cols: 1024
>>>
>>> This is still far too much for my table of 5 columns by 12 rows.
>>>
>>> For comparison a new empty table gives the following:
>>> rows: 2
>>> cols: 5
>>>
>>
>> I see this happening in spreadsheet editors quite often.  They save
>> the ODF file with a much larger span of cells than actually have
>> content.
>>
>> One thing you can do to verify this is rename the ODF document from
>> *.ods to *.zip.  Then you can use any ZIP utility to look inside.  The
>> main file will be content.xml.  If you look at that file you will see
>> how your editor actually defined the table.
>
> In this file I encounter tags like:
>
>        <table:table-column
>                table:style-name="co2"
>                table:number-columns-repeated="1020"
>                table:default-cell-style-name="Default"/>
>
>        <table:table-row
>                table:style-name="ro4"
>                table:number-rows-repeated="1048475">
>
> Can these be identified using the Simple API? If so, I could check whether
> the repeat count is larger than some threshold (e.g. > 100) and, if it is,
> stop processing any further rows or columns.
>

No.  The abstraction of the Simple API is that a table is a 2D array.
We abstract away details like repeated rows.  You can get to that
level of abstraction using ODFDOM.
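
A minimal sketch of checking those repeat counts directly, for anyone who only needs the threshold test described above. It does not use ODFDOM or the Simple API; it reads content.xml from the ODS package with the plain JDK ZIP and DOM classes and reports the largest table:number-rows-repeated value. The class and method names (RepeatedRowScan, maxRowRepeat, maxRowRepeatInOds) are made up for this illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of the ODF Toolkit.
class RepeatedRowScan {
    // Namespace bound to the "table:" prefix in ODF content.xml.
    static final String TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";

    /** Largest table:number-rows-repeated value in the stream, or 1 if none is set. */
    static int maxRowRepeat(InputStream contentXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            NodeList rows = factory.newDocumentBuilder().parse(contentXml)
                    .getElementsByTagNameNS(TABLE_NS, "table-row");
            int max = 1;
            for (int i = 0; i < rows.getLength(); i++) {
                // getAttributeNS returns "" when the attribute is absent.
                String repeat = ((Element) rows.item(i))
                        .getAttributeNS(TABLE_NS, "number-rows-repeated");
                if (!repeat.isEmpty()) {
                    max = Math.max(max, Integer.parseInt(repeat));
                }
            }
            return max;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse content.xml", e);
        }
    }

    /** Opens an .ods file as a ZIP archive and scans its content.xml entry. */
    static int maxRowRepeatInOds(String odsPath) {
        try (ZipFile zip = new ZipFile(odsPath)) {
            ZipEntry entry = zip.getEntry("content.xml");
            try (InputStream in = zip.getInputStream(entry)) {
                return maxRowRepeat(in);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller could compare the result against a threshold such as 100 before deciding whether to iterate over the whole table.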

> If this happens in other editors as well, then iterators become almost
> useless and perform poorly. Getting the iterator with 1048475
> empty rows kills performance.
>

Right.  Unfortunately the table in this case is in bad shape when we
load it. I wonder if it would help to have a convenience method like
calculateTrueBounds() or resizeToContent() or something like that,
which would walk in from the lower right of the allocated table and
determine the lowest and right-most cell with actual content?
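
A rough sketch of what such a bounds calculation could look like. No such method exists in the API; the name trueBounds and the use of a plain String[][] as a stand-in for the table's 2D cell array are assumptions made for illustration, and how the cell text is read from the document is left out. Note that it treats "blank" as "no text", which ignores cells that are empty but styled.

```java
// Hypothetical illustration, not part of the Simple API.
class TrueBounds {
    static boolean isBlank(String cell) {
        return cell == null || cell.trim().isEmpty();
    }

    /**
     * Returns {rows, cols}: the size of the smallest rectangle anchored at the
     * top-left corner that still contains every non-blank cell.
     */
    static int[] trueBounds(String[][] cells) {
        int lastRow = -1;
        int lastCol = -1;
        // Walk up from the bottom row; in each row walk in from the right,
        // but only over columns beyond the right-most content found so far.
        for (int r = cells.length - 1; r >= 0; r--) {
            for (int c = cells[r].length - 1; c > lastCol; c--) {
                if (!isBlank(cells[r][c])) {
                    if (lastRow < 0) {
                        lastRow = r; // first hit from the bottom is the lowest row
                    }
                    lastCol = c;
                    break;
                }
            }
        }
        return new int[] { lastRow + 1, lastCol + 1 };
    }
}
```

In the worst case this still visits every allocated cell, so on a sheet with around a million repeated rows it would only help if the row and column repeat counts were consulted first.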

>>
>>> Does anyone have an idea why I get these high numbers? Or is google docs the cause?
>>>
>>
>> I think the idea was for the editor to leave room for the user to easily
>> add new content into the sheet.  If it saved it as exactly how much
>> was used, then the next time the document was retrieved, the table
>> would be shrunk to exactly that size.  So if the user wanted to add
>> new content they would need to first append or insert new rows and
>> columns, not a very good user experience.   Of course, we have the
>> opposite expectation with tables in a text document.
>>
>
> In my opinion it's okay for an editor to show it like that to the user.
> But when storing the spreadsheet it should skip the empty trailing rows
> and columns. As it is, automated processing of ODS spreadsheet files will
> be very time consuming due to these wasteful empty rows and columns.
>

It depends on the application.  I've seen spreadsheets that were
designed for printed output, to make forms for manual data collection,
where there were repeated blank rows there intentionally.  Or blank of
content, though they did have styles associated with them.  So they
could have shaded background, etc.  So blank is not always the same as
unused.  It depends on the use case.

-Rob

>> -Rob
>>
>>
>>> Regards,
>>>
>>> Minto
>>>
>>> Op 5-6-2012 12:19, Minto van der sluis schreef:
>>>> Hi,
>>>>
>>>> Having looked at several places I can't find an answer to this simple
>>>> question.
>>>>
>>>> Table.getRowCount() gives me a number far too high for my little table
>>>> with < 50 rows.
>>>>
>>>> Regards,
>>>>
>>>> Minto
>>>
>>>
>>> --
>>> ir. ing. Minto van der Sluis
>>> Software innovator / renovator
>>> Xup BV
>>>
>>> Mobiel: +31 (0) 626 014541
>
> Regards,
>
> Minto
>
> --
> ir. ing. Minto van der Sluis
> Software innovator / renovator
> Xup BV
>
> Mobiel: +31 (0) 626 014541
