commons-issues mailing list archives

From "Phil Steitz (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MATH-449) Storeless covariance
Date Mon, 22 Aug 2011 20:50:29 GMT

    [ https://issues.apache.org/jira/browse/MATH-449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088996#comment-13088996 ]

Phil Steitz commented on MATH-449:
----------------------------------

Good point on the stored data version.  This is really our first foray into meaningful management
of missing data, and now is a great time to start dealing with it.  In the correlation package,
at this point, we can fairly easily support casewise deletion, pairwise "deletion", or both,
so it is probably best to make it configurable. Also, we need to agree on and advertise the
fact that NaNs should be used to signal missing data.  Let's start by implementing things this
way in the new storeless covariance classes and then open new tickets to add support for missing
data, first in the rest of the correlation package and then in regression.
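
To make the two options concrete, here is a minimal sketch of a storeless covariance matrix that treats NaN as missing and can be switched between casewise and pairwise deletion. The class, enum, and method names are hypothetical, chosen only for illustration, and are not part of the Commons Math API. The per-pair update is the same one used in the class attached to this issue.

```java
// Hypothetical sketch: names are illustrative, not Commons Math API.
final class MissingDataCovarianceSketch {

    /** How rows containing NaN (treated here as "missing") are handled. */
    enum Deletion { CASEWISE, PAIRWISE }

    private final int dim;
    private final Deletion mode;
    // Per-pair running state, stored for i <= j only.
    private final long[][] n;           // number of pairs used for (i, j)
    private final double[][] meanI;     // running mean of variable i over those pairs
    private final double[][] meanJ;     // running mean of variable j over those pairs
    private final double[][] numerator; // running sum of co-deviations

    MissingDataCovarianceSketch(int dim, Deletion mode) {
        this.dim = dim;
        this.mode = mode;
        this.n = new long[dim][dim];
        this.meanI = new double[dim][dim];
        this.meanJ = new double[dim][dim];
        this.numerator = new double[dim][dim];
    }

    /** Adds one observation; NaN entries are treated as missing values. */
    void increment(double[] row) {
        if (mode == Deletion.CASEWISE) {
            // Casewise: one missing value discards the whole row.
            for (double v : row) {
                if (Double.isNaN(v)) {
                    return;
                }
            }
        }
        for (int i = 0; i < dim; i++) {
            for (int j = i; j < dim; j++) {
                double x = row[i];
                double y = row[j];
                // Pairwise: skip only the pairs that touch a missing value.
                if (Double.isNaN(x) || Double.isNaN(y)) {
                    continue;
                }
                long k = ++n[i][j];
                double dx = x - meanI[i][j];
                double dy = y - meanJ[i][j];
                meanI[i][j] += dx / k;
                meanJ[i][j] += dy / k;
                numerator[i][j] += ((k - 1.0) / k) * dx * dy;
            }
        }
    }

    /** Unbiased covariance of variables i and j; NaN if fewer than two pairs were seen. */
    double getCovariance(int i, int j) {
        int a = Math.min(i, j);
        int b = Math.max(i, j);
        long k = n[a][b];
        return k < 2 ? Double.NaN : numerator[a][b] / (k - 1.0);
    }
}
```

For the rows {1, 2}, {2, 4}, {3, NaN}, {4, 8}, the pairwise variance of the first variable uses all four rows (5/3), while the casewise variance drops the third row (7/3). The covariance between the two variables is 14/3 in both modes, because the same three complete rows are used.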

One thing that is bugging me a little is convincing myself that if we allow pairwise deletion,
the covariance matrix will be legitimate (i.e., have all of the analytical properties associated
with a covariance matrix).  Also, are there negative implications that I have not thought about to
using NaNs to signal missing data?
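
On the first question, a small constructed example suggests the concern is real: with pairwise deletion each entry can come from a different subset of rows, so the assembled matrix need not be positive semidefinite. The sketch below uses a hypothetical class name and made-up data, and a plain two-pass computation for clarity rather than the storeless update. The (x, y) and (x, z) pairs are perfectly positively related on the rows where they are jointly observed, while (y, z) is perfectly negatively related on its own rows.

```java
// Hypothetical sketch with made-up data; not Commons Math API.
final class PairwiseDeletionPsdCheck {

    /** Unbiased covariance of columns i and j over rows where both are non-NaN. */
    static double pairwiseCov(double[][] data, int i, int j) {
        int n = 0;
        double sumX = 0.0;
        double sumY = 0.0;
        for (double[] r : data) {
            if (!Double.isNaN(r[i]) && !Double.isNaN(r[j])) {
                n++;
                sumX += r[i];
                sumY += r[j];
            }
        }
        double meanX = sumX / n;
        double meanY = sumY / n;
        double s = 0.0;
        for (double[] r : data) {
            if (!Double.isNaN(r[i]) && !Double.isNaN(r[j])) {
                s += (r[i] - meanX) * (r[j] - meanY);
            }
        }
        return s / (n - 1);
    }

    /** Assembles the matrix entry by entry, each from its own subset of rows. */
    static double[][] pairwiseCovMatrix(double[][] data) {
        int dim = data[0].length;
        double[][] c = new double[dim][dim];
        for (int i = 0; i < dim; i++) {
            for (int j = 0; j < dim; j++) {
                c[i][j] = pairwiseCov(data, i, j);
            }
        }
        return c;
    }

    /** Computes v' C v; a negative value means C is not positive semidefinite. */
    static double quadraticForm(double[][] c, double[] v) {
        double q = 0.0;
        for (int i = 0; i < v.length; i++) {
            for (int j = 0; j < v.length; j++) {
                q += v[i] * c[i][j] * v[j];
            }
        }
        return q;
    }

    /** Columns are x, y, z; NaN marks a missing value. */
    static double[][] exampleData() {
        double m = Double.NaN;
        return new double[][] {
            {1, 1, m}, {2, 2, m}, {3, 3, m},   // only x and y observed
            {1, m, 1}, {2, m, 2}, {3, m, 3},   // only x and z observed
            {m, 1, 3}, {m, 2, 2}, {m, 3, 1},   // only y and z observed
        };
    }
}
```

Here each variance is 0.8 (six observations each), but cov(x, y) is 1.0 (three observations), so the quadratic form for v = (1, -1, 0), which would be the "variance" of x - y, is 0.8 + 0.8 - 2(1.0) = -0.4. Casewise deletion cannot produce this, since every entry is computed from the same rows.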

> Storeless covariance
> --------------------
>
>                 Key: MATH-449
>                 URL: https://issues.apache.org/jira/browse/MATH-449
>             Project: Commons Math
>          Issue Type: Improvement
>            Reporter: Patrick Meyer
>            Assignee: Phil Steitz
>             Fix For: 3.1
>
>         Attachments: MATH-449.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> Currently there is no storeless version for computing the covariance. However, Pebay
> (2008) describes algorithms for on-line covariance computations, [http://infoserve.sandia.gov/sand_doc/2008/086212.pdf].
> I have provided a simple class for implementing this algorithm. It would be nice to have this
> integrated into org.apache.commons.math.stat.correlation.Covariance.
> {code}
> //This code is granted for inclusion in the Apache Commons under the terms of the ASL.
> public class StorelessCovariance{
>     private double deltaX = 0.0;
>     private double deltaY = 0.0;
>     private double meanX = 0.0;
>     private double meanY = 0.0;
>     private double N=0;
>     private double covarianceNumerator=0.0;
>     private boolean unbiased=true;
>     // The constructor name must match the class name.
>     public StorelessCovariance(boolean unbiased){
>         this.unbiased = unbiased;
>     }
>     // Pairs with a null member are skipped.
>     public void increment(Double x, Double y){
>         if(x!=null && y!=null){
>             N++;
>             deltaX = x - meanX;
>             deltaY = y - meanY;
>             meanX += deltaX/N;
>             meanY += deltaY/N;
>             covarianceNumerator += ((N-1.0)/N)*deltaX*deltaY;
>         }
>         
>     }
>     public Double getResult(){
>         if(unbiased){
>             return covarianceNumerator/(N-1.0);
>         }else{
>             return covarianceNumerator/N;
>         }
>     }   
> }
> {code}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
