poi-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Law <david....@apconsult.de>
Subject Re: POI 3.7 beta 2 "new XSSFWorkbook" very slow (MOD function)
Date Tue, 05 Oct 2010 16:25:22 GMT
  Hi Yegor,

I think I probably didn't express myself quite clearly there...
...I should have put that "right" in quotes, meaning
that OpenOffice gets the /same/ result as Excel.

Its only "right" in the sense that its the same as Excel,
but the Excel result may well be "wrong".

But, right or wrong; I was really just trying to ask the question:
"shouldn't poi be trying to get the same result as Excel too?"

Aside:
/"Confused? You won't be, after this week's episode of...Soap"
/There, showing my age!!  Sorry, couldn't resist that. :-)

Best regards,
DaveLaw

On 05.10.2010 17:29, Yegor Kozlov wrote:
>  Different products address rounding errors differently. The question 
> is what to consider the "correct" result.
>
> Consider the following formula:
>     =(0.5-0.4-0.1)
>
> You may think it evaluates to 0.00 but it does not!
>
> MS Office and POI evaluate it as -2.78E-17 and this is a correct 
> result for 'strict' floating-point math.
>
> However, OpenOffice and Google Spreadsheet evaluate it to 0.00.
>
> The equivalent formula SUM(0.5,-0.4,-0.1) results in the same values: 
> -2.78E-17 in MS Office and 0.00 in OpenOffice.
>
> Some information about Excel's floating-point arithmetic can found at 
> http://support.microsoft.com/kb/78113 and 
> http://support.microsoft.com/kb/214118
>
> Yegor
>
>> Hi,
>>
>> sorry its taken so long to post this, I've been "mulling over"...
>> ...and I see the patch has been applied.
>>
>> As Yegor mentions in the Bug report...
>> mod(-3.4,2.0) being 0.60...01
>> ...is due to binary arithmetic.
>>
>> Now I just wondered how OpenOffice got it right and stumbled over
>> some code to round to 15 digits in what seems to be the mod function.
>>
>> Also, surely the aim here is to produce the same result as Excel,
>> which I tried playing with a bit & seems to round an awful lot of
>> things to just 14 digits, although it seems to be able to deal with
>> values MUCH smaller than that.
>>
>> Anyway, this was just intended to get a bit of discussion going on
>> the matter, if there's anything to discuss that is.  :-)
>>
>> Best regards,
>> DaveLaw
>>
>> Ryan Skow wrote:
>>>
>>> Converting to XLS results in the workbook being opened in less than 
>>> a second using POI 3.7b3 so that is probably what I'll do as a 
>>> workaround.  Thank you very much for that idea.
>>>
>>> To hopefully help get this problem solved for others and for future 
>>> reference, I have included answers to the rest of the questions that 
>>> were asked.
>>>
>>> General information related to the conversion and file size:
>>> POI Version  File Format  File Size(bytes)  Open Time(seconds)
>>> 3.6             xlsx        1,501,533               28
>>> 3.6             xls         4,951,040                1
>>> 3.7b3           xlsx        1,501,533              600
>>> 3.7b3           xls         4,951,040                1
>>>
>>> The workbook contains 8 sheets and the (columns,rows) are as follows:
>>> Sheet 0: (19,123) (mostly raw numeric/text values - inputs)
>>> Sheet 1: (13,124) (almost all formulas)
>>> Sheet 2: (33,1459) (almost all formulas)
>>> Sheet 3: (20,1459) (almost all formulas)
>>> Sheet 4: (19,131) (almost all formulas)
>>> Sheet 5: (28, 775) (almost all raw numeric values)
>>> Sheet 6: (11, 109) (almost all raw numeric values)
>>> Sheet 7: (33, 89) (mostly tables used in VLOOKUP)
>>>
>>> Attributes of the Workbook:
>>>   * There are LOTS of VLOOKUP function calls.
>>>   * There are LOTS regular multiply/divide/add/subtract which use 
>>> back referenced cells in their calculations.
>>>   * There are LOTS of nested cell references.
>>>   * There are quite a few 'named' cells which are referred to 
>>> throughout the various calculations.
>>>   * There are NO references to the 'SUM' function or other array 
>>> related calculations from what I can tell.
>>>
>>> You are correct that the performance issue has nothing to do with 
>>> INDIRECT - I removed those references and the timings were still the 
>>> same.
>>>
>>> As a side note, I did figure out what was wrong with the formula 
>>> values.  It turned out to be a problem in MathX.java and its 
>>> implementation of the MOD function.  It incorrectly reported 
>>> mod(13,12) to be 0.9999999999999991 instead of simply 1.  This threw 
>>> off some of the VLOOKUP calls since the MODs were being used as the 
>>> index of the VLOOKUP table.  I have submitted a patch including test 
>>> case through Bugzilla:
>>> https://issues.apache.org/bugzilla/show_bug.cgi?id=50033
>>>
>>> If additional information would be helpful, please let me know and 
>>> I'll track it down.
>>>
>>> Thanks!
>>>
>>> On 9/30/2010 6:33 AM, Yegor Kozlov wrote:
>>>> How big is the problem workbook? Can you tell us its size in MB, the
>>>> number of sheets and the size of data matrix on each sheet. Every 
>>>> little
>>>> bit of information helps.
>>>>
>>>> Can you obfuscate sensitive data? Without a sample file we can't do 
>>>> much
>>>> to help you.
>>>>
>>>> I suspect the slowness may be related to the support for array 
>>>> formulas
>>>> which was introduced right after 3.6-FINAL. Does your workbook contain
>>>> array formulas? They look something like SUM(C11:C12*D11:D12).
>>>> Evaluation of array formulas is not yet fully implemented and it
>>>> explains why you got incorrect formula results.
>>>>
>>>> Indirect evaluation of defined names is unlikely to affect 
>>>> performance.
>>>> It was an isolated fix in the implementation of INDIRECT.
>>>>
>>>> Try to save the file as .xls and run the same code against HSSF. How
>>>> much is it faster?
>>>>
>>>> Regards,
>>>> Yegor
>>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
>>> For additional commands, e-mail: user-help@poi.apache.org
>>>
>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
>> For additional commands, e-mail: user-help@poi.apache.org
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@poi.apache.org
> For additional commands, e-mail: user-help@poi.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message