poi-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Law <David....@apconsult.de>
Subject Re: POI 3.7 beta 2 "new XSSFWorkbook" very slow (MOD function)
Date Tue, 05 Oct 2010 07:51:52 GMT

sorry its taken so long to post this, I've been "mulling over"...
...and I see the patch has been applied.

As Yegor mentions in the Bug report...
mod(-3.4,2.0) being 0.60...01
...is due to binary arithmetic.

Now I just wondered how OpenOffice got it right and stumbled over
some code to round to 15 digits in what seems to be the mod function.

Also, surely the aim here is to produce the same result as Excel,
which I tried playing with a bit & seems to round an awful lot of
things to just 14 digits, although it seems to be able to deal with
values MUCH smaller than that.

Anyway, this was just intended to get a bit of discussion going on
the matter, if there's anything to discuss that is.  :-)

Best regards,

Ryan Skow wrote:
> Converting to XLS results in the workbook being opened in less than a 
> second using POI 3.7b3 so that is probably what I'll do as a 
> workaround.  Thank you very much for that idea.
> To hopefully help get this problem solved for others and for future 
> reference, I have included answers to the rest of the questions that 
> were asked.
> General information related to the conversion and file size:
> POI Version  File Format  File Size(bytes)  Open Time(seconds)
> 3.6             xlsx        1,501,533               28
> 3.6             xls         4,951,040                1
> 3.7b3           xlsx        1,501,533              600
> 3.7b3           xls         4,951,040                1
> The workbook contains 8 sheets and the (columns,rows) are as follows:
> Sheet 0: (19,123) (mostly raw numeric/text values - inputs)
> Sheet 1: (13,124) (almost all formulas)
> Sheet 2: (33,1459) (almost all formulas)
> Sheet 3: (20,1459) (almost all formulas)
> Sheet 4: (19,131) (almost all formulas)
> Sheet 5: (28, 775) (almost all raw numeric values)
> Sheet 6: (11, 109) (almost all raw numeric values)
> Sheet 7: (33, 89) (mostly tables used in VLOOKUP)
> Attributes of the Workbook:
>   * There are LOTS of VLOOKUP function calls.
>   * There are LOTS regular multiply/divide/add/subtract which use back 
> referenced cells in their calculations.
>   * There are LOTS of nested cell references.
>   * There are quite a few 'named' cells which are referred to 
> throughout the various calculations.
>   * There are NO references to the 'SUM' function or other array 
> related calculations from what I can tell.
> You are correct that the performance issue has nothing to do with 
> INDIRECT - I removed those references and the timings were still the 
> same.
> As a side note, I did figure out what was wrong with the formula 
> values.  It turned out to be a problem in MathX.java and its 
> implementation of the MOD function.  It incorrectly reported 
> mod(13,12) to be 0.9999999999999991 instead of simply 1.  This threw 
> off some of the VLOOKUP calls since the MODs were being used as the 
> index of the VLOOKUP table.  I have submitted a patch including test 
> case through Bugzilla:
> https://issues.apache.org/bugzilla/show_bug.cgi?id=50033
> If additional information would be helpful, please let me know and 
> I'll track it down.
> Thanks!
> On 9/30/2010 6:33 AM, Yegor Kozlov wrote:
>> How big is the problem workbook? Can you tell us its size in MB, the
>> number of sheets and the size of data matrix on each sheet. Every little
>> bit of information helps.
>> Can you obfuscate sensitive data? Without a sample file we can't do much
>> to help you.
>> I suspect the slowness may be related to the support for array formulas
>> which was introduced right after 3.6-FINAL. Does your workbook contain
>> array formulas? They look something like SUM(C11:C12*D11:D12).
>> Evaluation of array formulas is not yet fully implemented and it
>> explains why you got incorrect formula results.
>> Indirect evaluation of defined names is unlikely to affect performance.
>> It was an isolated fix in the implementation of INDIRECT.
>> Try to save the file as .xls and run the same code against HSSF. How
>> much is it faster?
>> Regards,
>> Yegor
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
> For additional commands, e-mail: user-help@poi.apache.org

To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org

View raw message