commons-dev mailing list archives

From Gilles Sadowski <gil...@harfang.homelinux.org>
Subject Re: [math] Logistic, Probit regression and Tolerance checks
Date Fri, 07 Sep 2012 15:48:12 GMT
Hi.

> 
> My name is Marios. I have a strong academic background and have worked as a
> modeling analyst on large projects, so I have experience with prediction and
> optimization algorithms.
> 

Welcome to Commons Math's forum.
 
> 
> Recently (five months ago), I started learning Java, and I have made my life
> much simpler by using Java and Commons Math rather than depending on the
> usual packages (SAS, SPSS, etc.). Obviously, I owe Commons Math a lot.

That's good to read.
 
> 
> I have noticed that the site does not have logistic regression and probit
> regression, which are very commonly used in classification problems.
> Additionally, the math package does not provide a way to assess tolerance
> (or VIF), which is very commonly used to detect multicollinearity and
> singular matrices before running optimization algorithms.
> 
>  
> 
> I am willing to provide complete logistic and probit regression algorithms,
> fitted by maximum likelihood with Newton-Raphson optimization, in a form that
> is easy to call programmatically, e.g.
>   regression(double matrix[][], double Target[], String Constant,
>              double precision, double tolerance)
> with academic references. They are very quick (3 secs for a 60k set) and have
> getter methods for all the common statistics, such as null deviance, deviance,
> AIC, BIC, chi-square of the model, betas, Wald statistics and p-values,
> Cox-Snell R-square, Nagelkerke’s R-square, pseudo-R2, residuals,
> probabilities, and classification matrix.

Such contributions would certainly be most welcome.
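
For reference, a minimal sketch of what maximum-likelihood fitting of a
logistic model by Newton-Raphson looks like, in plain Java with no Commons Math
dependency. It is not the contributor's code and not a Commons Math API: the
class name LogisticNewtonSketch, the methods fit, sigmoid and solve, and the
tiny data set are all hypothetical, and the sketch uses the exact Hessian
X'WX with no step-size safeguards.

```java
/** Hypothetical sketch: logistic regression fitted by Newton-Raphson. Not a Commons Math API. */
final class LogisticNewtonSketch {

    /** Logistic (sigmoid) function. */
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /**
     * Maximizes the log-likelihood by Newton-Raphson.
     * x has n rows and p columns (include a column of ones for the intercept);
     * y holds the 0/1 targets. Iteration stops when the largest step is below tol.
     */
    static double[] fit(double[][] x, double[] y, int maxIter, double tol) {
        int n = x.length;
        int p = x[0].length;
        double[] beta = new double[p];
        for (int iter = 0; iter < maxIter; iter++) {
            double[] grad = new double[p];       // score vector X'(y - mu)
            double[][] hess = new double[p][p];  // negative Hessian X'WX
            for (int i = 0; i < n; i++) {
                double z = 0;
                for (int j = 0; j < p; j++) z += x[i][j] * beta[j];
                double mu = sigmoid(z);
                double w = mu * (1 - mu);
                for (int j = 0; j < p; j++) {
                    grad[j] += x[i][j] * (y[i] - mu);
                    for (int k = 0; k < p; k++) hess[j][k] += w * x[i][j] * x[i][k];
                }
            }
            double[] step = solve(hess, grad);   // Newton step: (X'WX) step = X'(y - mu)
            double maxStep = 0;
            for (int j = 0; j < p; j++) {
                beta[j] += step[j];
                maxStep = Math.max(maxStep, Math.abs(step[j]));
            }
            if (maxStep < tol) break;
        }
        return beta;
    }

    /** Solves a*x = b by Gaussian elimination with partial pivoting (a and b are overwritten). */
    static double[] solve(double[][] a, double[] b) {
        int n = b.length;
        for (int c = 0; c < n; c++) {
            int piv = c;
            for (int r = c + 1; r < n; r++) {
                if (Math.abs(a[r][c]) > Math.abs(a[piv][c])) piv = r;
            }
            double[] tmpRow = a[c]; a[c] = a[piv]; a[piv] = tmpRow;
            double tmpB = b[c]; b[c] = b[piv]; b[piv] = tmpB;
            if (Math.abs(a[c][c]) < 1e-12) {
                throw new ArithmeticException("singular matrix (collinear predictors?)");
            }
            for (int r = c + 1; r < n; r++) {
                double f = a[r][c] / a[c][c];
                b[r] -= f * b[c];
                for (int k = c; k < n; k++) a[r][k] -= f * a[c][k];
            }
        }
        double[] sol = new double[n];
        for (int r = n - 1; r >= 0; r--) {
            double s = b[r];
            for (int k = r + 1; k < n; k++) s -= a[r][k] * sol[k];
            sol[r] = s / a[r][r];
        }
        return sol;
    }

    public static void main(String[] args) {
        // Made-up, non-separable data: intercept column plus one predictor.
        double[][] x = {{1, 0}, {1, 1}, {1, 2}, {1, 3}, {1, 4}, {1, 5}};
        double[] y = {0, 0, 1, 0, 1, 1};
        double[] beta = fit(x, y, 50, 1e-10);
        System.out.println("intercept = " + beta[0] + ", slope = " + beta[1]);
    }
}
```

In an actual contribution the hand-written solver would presumably be replaced
by the library's existing linear algebra classes, and the derived statistics
(deviance, AIC, Wald tests, ...) would be exposed through accessors on a result
object rather than computed inside the fitting method.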

But care must be taken in how to fit those features into Commons Math. I mean
that the new implementations should be integrated in the API of similar
functionalities, if they currently exist.

IIUC, the proposal could be related to code currently in package
  org.apache.commons.math3.stat.clustering
and/or to the pending improvements suggested in this report:
  https://issues.apache.org/jira/browse/MATH-748

[By the way, I wonder whether "clustering" should really be under "stat",
rather than, say, "optimization" or a package of its own, one level up.]

In any case, it might be worth discussing here some design issues, before you
start adapting your code. At the same time, you should open tickets on the
bug tracking system:
  https://issues.apache.org/jira/browse/MATH
Preferably, there should be a general request for "New feature"; then
several "sub-issues" could be linked to that one, each referring to a
specific task (typically a class, with its unit tests).

> I have also included steps for checking tolerance, so that we avoid cases that
> fail to converge. Generally the algorithm is not memory-hungry (because I have
> approximated the Hessian matrix), and the only external jar that I use is
> Commons Math, for matrix multiplication.

Although the performance issue is certainly important, it is an
"implementation detail" that should not preempt a clear API (i.e. one that
reflects the mathematical concepts) and the reuse of existing classes (those
can be improved at the same time, if your proposal reveals that something is
lacking).
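
On the tolerance check itself: tolerance is 1/VIF, and the VIF of predictor j
equals the j-th diagonal entry of the inverse of the predictors' correlation
matrix. A minimal sketch in plain Java follows. VifSketch, vif and invert are
hypothetical names used only for illustration, not existing Commons Math
classes, and the hand-written Gauss-Jordan inversion stands in for whatever
decomposition the library already offers.

```java
/** Hypothetical sketch: variance inflation factors from the correlation matrix. */
final class VifSketch {

    /** Returns VIF_j for each column j of x (n rows, p predictor columns, no intercept column). */
    static double[] vif(double[][] x) {
        int n = x.length;
        int p = x[0].length;
        double[] mean = new double[p];
        for (double[] row : x) {
            for (int j = 0; j < p; j++) mean[j] += row[j] / n;
        }
        // Root of the sum of squared deviations for each column.
        double[] norm = new double[p];
        for (double[] row : x) {
            for (int j = 0; j < p; j++) norm[j] += (row[j] - mean[j]) * (row[j] - mean[j]);
        }
        for (int j = 0; j < p; j++) norm[j] = Math.sqrt(norm[j]);
        // Pearson correlation matrix of the predictors.
        double[][] corr = new double[p][p];
        for (double[] row : x) {
            for (int j = 0; j < p; j++) {
                for (int k = 0; k < p; k++) {
                    corr[j][k] += (row[j] - mean[j]) * (row[k] - mean[k]) / (norm[j] * norm[k]);
                }
            }
        }
        double[][] inv = invert(corr);
        double[] result = new double[p];
        for (int j = 0; j < p; j++) result[j] = inv[j][j];  // tolerance_j = 1 / result[j]
        return result;
    }

    /** Inverts a square matrix by Gauss-Jordan elimination with partial pivoting. */
    static double[][] invert(double[][] a) {
        int n = a.length;
        double[][] m = new double[n][2 * n];
        for (int i = 0; i < n; i++) {
            System.arraycopy(a[i], 0, m[i], 0, n);
            m[i][n + i] = 1.0;
        }
        for (int c = 0; c < n; c++) {
            int piv = c;
            for (int r = c + 1; r < n; r++) {
                if (Math.abs(m[r][c]) > Math.abs(m[piv][c])) piv = r;
            }
            double[] tmp = m[c]; m[c] = m[piv]; m[piv] = tmp;
            if (Math.abs(m[c][c]) < 1e-12) {
                throw new ArithmeticException("singular correlation matrix: perfect collinearity");
            }
            double d = m[c][c];
            for (int k = 0; k < 2 * n; k++) m[c][k] /= d;
            for (int r = 0; r < n; r++) {
                if (r == c) continue;
                double f = m[r][c];
                for (int k = 0; k < 2 * n; k++) m[r][k] -= f * m[c][k];
            }
        }
        double[][] inv = new double[n][n];
        for (int i = 0; i < n; i++) System.arraycopy(m[i], n, inv[i], 0, n);
        return inv;
    }
}
```

A caller could reject or warn about any predictor whose tolerance falls below a
chosen threshold before starting the Newton-Raphson iterations; the threshold
is a modeling decision and is not fixed by the sketch.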


Thanks for your interest,
Gilles


