openoffice-dev mailing list archives

From: Regina Henschel <rb.hensc...@t-online.de>
Subject: Re: Project idea: Calc for Statistics
Date: Thu, 06 Dec 2012 19:09:31 GMT

Hi Pedro,

Pedro Giffuni schrieb:
> Hi guys;
>
> FWIW, while I was playing with the new random number generator I went
> around looking for some references and I found this paper from the Journal
> of Statistical Software (2010) titled "On the Numerical Accuracy of
> Spreadsheets":
>
> http://www.jstatsoft.org/v34/i04/paper
>
>
> It basically shows that Calc, among other Spreadsheet programs, is not
> really well suited for statistical analysis.

They used an old version of Calc. Since then Calc has received a lot of
accuracy improvements, and the new implementations in Excel 2010 are far
more accurate than the old ones. The specific results of the paper are
outdated. Of course, the general problem of using spreadsheets for data
exploration remains.

>
> Something rather amazing is that the major statistics suites have been moving
> towards a more "spreadsheet-like" environment. I am personally a fan of
> Minitab as it brings many functions that I needed for quality control in a
> previous job. The price of the software package sky-rocketed in a few years
> though :(.

I'm not familiar with specialized statistical software. One problem with
Calc is that users do not know how to use its functions for their
purpose, for example to carry out an ANOVA. So providing wizards would be
helpful.

>
> One approach could be improving our local functions to match more
> demanding specifications: some of that will necessarily have to be done.
> Another approach could be facilitating interactions with software like R,

https://issues.apache.org/ooo/show_bug.cgi?id=66589

>
> and I am aware that approach has many followers. A third approach, which
> I would like to suggest as a future project, would be developing a scaddin
> focused on statistics and making full use of the functions from boost that
> we already have available as a module but we are not using to their full
> extent.

I know that Calc is really inaccurate in some corner cases, and a
comparison with the solutions from boost would be good. One problem is
that Calc is limited to double precision because of the MSVC compiler.
As far as I know, boost uses its own types to get better precision.

>
> I know we are all busy with other stuff to improve for 4.0 Release, just
> thought I'd leave the idea for the future.

I did a lot of work on statistical functions under the mentorship of Eike
in the past, but now I'm more interested in Draw.

Some problems that still need to be solved are:
- Adapt FDIST, FINV, and TDIST to ODF
- New algorithm needed in ScInterpreter::GetBetaDist, see "FIXME" there
- Better detection of singular matrices
- Change the LINEST function to check for collinearity (Excel compatibility)
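
To make the last two items more concrete, here is a rough sketch in Python.
It is only an illustration of one possible approach, not the code in Calc
and not what Excel does. The function names and the tolerance 1e-12 are
made up for this example. The idea is to run Gaussian elimination with
partial pivoting and to treat a matrix as singular when a pivot is tiny
compared to the largest entry. Collinearity of regression columns is then
checked on the Gram matrix X^T X, which is simple but squares the condition
number, so a real implementation would probably prefer a QR decomposition.

```python
def is_numerically_singular(matrix, rel_tol=1e-12):
    """Return True if a square matrix looks singular in double precision.

    Hypothetical helper: Gaussian elimination with partial pivoting; a pivot
    at or below rel_tol times the largest absolute entry counts as zero.
    """
    n = len(matrix)
    a = [[float(x) for x in row] for row in matrix]  # work on a copy
    scale = max(abs(x) for row in a for x in row)
    if scale == 0.0:
        return True  # the zero matrix is singular
    for col in range(n):
        # choose the row with the largest entry in this column as pivot
        pivot_row = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot_row][col]) <= rel_tol * scale:
            return True
        a[col], a[pivot_row] = a[pivot_row], a[col]
        # eliminate the entries below the pivot
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return False


def has_collinear_columns(rows, rel_tol=1e-12):
    """Return True if the columns of the data matrix are linearly dependent.

    Hypothetical helper: builds the Gram matrix X^T X from the data rows and
    tests it with is_numerically_singular.
    """
    k = len(rows[0])
    gram = [[sum(row[i] * row[j] for row in rows) for j in range(k)]
            for i in range(k)]
    return is_numerically_singular(gram, rel_tol)


if __name__ == "__main__":
    # second column is exactly twice the first one
    print(has_collinear_columns([[1, 2], [2, 4], [3, 6]]))
    # intercept column plus an independent x column
    print(has_collinear_columns([[1, 1], [1, 2], [1, 3]]))
```

A relative tolerance is used because a fixed absolute threshold would
misjudge matrices whose entries are all very large or all very small.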

Kind regards
Regina





