poi-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Yegor Kozlov <ye...@dinom.ru>
Subject Re: POI 3.7 beta 2 "new XSSFWorkbook" very slow (MOD function)
Date Tue, 05 Oct 2010 15:29:26 GMT
  Different products address rounding errors differently. The question 
is what to consider the "correct" result.

Consider the following formula:
     =(0.5-0.4-0.1)

You may think it evaluates to 0.00 but it does not!

MS Office and POI evaluate it as -2.78E-17 and this is a correct result 
for 'strict' floating-point math.

However, OpenOffice and Google Spreadsheet evaluate it to 0.00.

The equivalent formula SUM(0.5,-0.4,-0.1) results in the same values: 
-2.78E-17 in MS Office and 0.00 in OpenOffice.

Some information about Excel's floating-point arithmetic can found at 
http://support.microsoft.com/kb/78113 and 
http://support.microsoft.com/kb/214118

Yegor

> Hi,
>
> sorry its taken so long to post this, I've been "mulling over"...
> ...and I see the patch has been applied.
>
> As Yegor mentions in the Bug report...
> mod(-3.4,2.0) being 0.60...01
> ...is due to binary arithmetic.
>
> Now I just wondered how OpenOffice got it right and stumbled over
> some code to round to 15 digits in what seems to be the mod function.
>
> Also, surely the aim here is to produce the same result as Excel,
> which I tried playing with a bit & seems to round an awful lot of
> things to just 14 digits, although it seems to be able to deal with
> values MUCH smaller than that.
>
> Anyway, this was just intended to get a bit of discussion going on
> the matter, if there's anything to discuss that is.  :-)
>
> Best regards,
> DaveLaw
>
> Ryan Skow wrote:
>>
>> Converting to XLS results in the workbook being opened in less than a 
>> second using POI 3.7b3 so that is probably what I'll do as a 
>> workaround.  Thank you very much for that idea.
>>
>> To hopefully help get this problem solved for others and for future 
>> reference, I have included answers to the rest of the questions that 
>> were asked.
>>
>> General information related to the conversion and file size:
>> POI Version  File Format  File Size(bytes)  Open Time(seconds)
>> 3.6             xlsx        1,501,533               28
>> 3.6             xls         4,951,040                1
>> 3.7b3           xlsx        1,501,533              600
>> 3.7b3           xls         4,951,040                1
>>
>> The workbook contains 8 sheets and the (columns,rows) are as follows:
>> Sheet 0: (19,123) (mostly raw numeric/text values - inputs)
>> Sheet 1: (13,124) (almost all formulas)
>> Sheet 2: (33,1459) (almost all formulas)
>> Sheet 3: (20,1459) (almost all formulas)
>> Sheet 4: (19,131) (almost all formulas)
>> Sheet 5: (28, 775) (almost all raw numeric values)
>> Sheet 6: (11, 109) (almost all raw numeric values)
>> Sheet 7: (33, 89) (mostly tables used in VLOOKUP)
>>
>> Attributes of the Workbook:
>>   * There are LOTS of VLOOKUP function calls.
>>   * There are LOTS regular multiply/divide/add/subtract which use 
>> back referenced cells in their calculations.
>>   * There are LOTS of nested cell references.
>>   * There are quite a few 'named' cells which are referred to 
>> throughout the various calculations.
>>   * There are NO references to the 'SUM' function or other array 
>> related calculations from what I can tell.
>>
>> You are correct that the performance issue has nothing to do with 
>> INDIRECT - I removed those references and the timings were still the 
>> same.
>>
>> As a side note, I did figure out what was wrong with the formula 
>> values.  It turned out to be a problem in MathX.java and its 
>> implementation of the MOD function.  It incorrectly reported 
>> mod(13,12) to be 0.9999999999999991 instead of simply 1.  This threw 
>> off some of the VLOOKUP calls since the MODs were being used as the 
>> index of the VLOOKUP table.  I have submitted a patch including test 
>> case through Bugzilla:
>> https://issues.apache.org/bugzilla/show_bug.cgi?id=50033
>>
>> If additional information would be helpful, please let me know and 
>> I'll track it down.
>>
>> Thanks!
>>
>> On 9/30/2010 6:33 AM, Yegor Kozlov wrote:
>>> How big is the problem workbook? Can you tell us its size in MB, the
>>> number of sheets and the size of data matrix on each sheet. Every 
>>> little
>>> bit of information helps.
>>>
>>> Can you obfuscate sensitive data? Without a sample file we can't do 
>>> much
>>> to help you.
>>>
>>> I suspect the slowness may be related to the support for array formulas
>>> which was introduced right after 3.6-FINAL. Does your workbook contain
>>> array formulas? They look something like SUM(C11:C12*D11:D12).
>>> Evaluation of array formulas is not yet fully implemented and it
>>> explains why you got incorrect formula results.
>>>
>>> Indirect evaluation of defined names is unlikely to affect performance.
>>> It was an isolated fix in the implementation of INDIRECT.
>>>
>>> Try to save the file as .xls and run the same code against HSSF. How
>>> much is it faster?
>>>
>>> Regards,
>>> Yegor
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
>> For additional commands, e-mail: user-help@poi.apache.org
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
> For additional commands, e-mail: user-help@poi.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
For additional commands, e-mail: user-help@poi.apache.org


Mime
View raw message