commons-dev mailing list archives

From Phil Steitz <phil.ste...@gmail.com>
Subject Re: [math] Logistic, Probit regression and Tolerance checks
Date Fri, 07 Sep 2012 18:18:40 GMT
On 9/7/12 9:22 AM, marios michaelidis wrote:
> Hi Gilles,
> I will start exploring the links you gave me.
> I would suggest that logistic/probit regression go under the regression package.

Yes, these should go in regression.  Thanks in advance for your
contribution!

Phil
> Not that clustering is really any different, but it makes sense to find logistic "regression"
> in a package named as such.
> Regards
> Marios
>
>> Date: Fri, 7 Sep 2012 17:48:12 +0200
>> From: gilles@harfang.homelinux.org
>> To: dev@commons.apache.org
>> Subject: Re: [math] Logistic, Probit regression and Tolerance checks
>>
>> Hi.
>>
>>> My name is Marios. I have a strong academic background and have worked
>>> as a modeling analyst on large projects, so I have experience with
>>> prediction and optimization algorithms.
>>>
>> Welcome to Commons Math's forum.
>>  
>>> Recently (five months ago), I started learning Java, and I have made my
>>> life much simpler by using Java and Commons Math rather than depending on
>>> the usual packages (SAS, SPSS, etc.). Obviously, I owe Commons Math a lot.
>> That's good to read.
>>  
>>> I have noticed that Commons Math does not have logistic regression or
>>> probit regression, both very commonly used in classification problems.
>>> Additionally, the math package does not provide a way to assess tolerance
>>> (or VIF), which is very commonly used to detect multicollinearity and
>>> singular matrices before running optimization algorithms.
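
For readers unfamiliar with the tolerance check mentioned above, here is a minimal sketch of one common way to compute it. It is not code from this thread and not a Commons Math API: the class name `VifSketch` and its methods are hypothetical, and it assumes no predictor column is constant. It uses the standard identity that the VIF of predictor j is the j-th diagonal entry of the inverse of the predictors' correlation matrix, with tolerance = 1 / VIF.

```java
final class VifSketch {
    /** Pearson correlation matrix of the columns of x (rows are observations). */
    static double[][] correlation(double[][] x) {
        int n = x.length, p = x[0].length;
        double[] mean = new double[p], norm = new double[p];
        for (double[] row : x) for (int j = 0; j < p; j++) mean[j] += row[j] / n;
        for (double[] row : x) for (int j = 0; j < p; j++) norm[j] += Math.pow(row[j] - mean[j], 2);
        for (int j = 0; j < p; j++) norm[j] = Math.sqrt(norm[j]); // zero for a constant column (assumed not to occur)
        double[][] r = new double[p][p];
        for (double[] row : x)
            for (int j = 0; j < p; j++)
                for (int k = 0; k < p; k++)
                    r[j][k] += (row[j] - mean[j]) * (row[k] - mean[k]) / (norm[j] * norm[k]);
        return r;
    }

    /** Gauss-Jordan inverse with partial pivoting; throws if the matrix is numerically singular. */
    static double[][] invert(double[][] a) {
        int p = a.length;
        double[][] m = new double[p][2 * p];
        for (int i = 0; i < p; i++) {
            System.arraycopy(a[i], 0, m[i], 0, p);
            m[i][p + i] = 1.0; // augment with the identity matrix
        }
        for (int c = 0; c < p; c++) {
            int piv = c;
            for (int r = c + 1; r < p; r++) if (Math.abs(m[r][c]) > Math.abs(m[piv][c])) piv = r;
            if (Math.abs(m[piv][c]) < 1e-12) throw new ArithmeticException("singular correlation matrix");
            double[] t = m[c]; m[c] = m[piv]; m[piv] = t;
            double d = m[c][c];
            for (int k = 0; k < 2 * p; k++) m[c][k] /= d;
            for (int r = 0; r < p; r++) {
                if (r == c) continue;
                double f = m[r][c];
                for (int k = 0; k < 2 * p; k++) m[r][k] -= f * m[c][k];
            }
        }
        double[][] inv = new double[p][p];
        for (int i = 0; i < p; i++) System.arraycopy(m[i], p, inv[i], 0, p);
        return inv;
    }

    /** VIF per predictor: the diagonal of the inverse correlation matrix. */
    static double[] vif(double[][] x) {
        double[][] inv = invert(correlation(x));
        double[] v = new double[inv.length];
        for (int j = 0; j < v.length; j++) v[j] = inv[j][j];
        return v;
    }

    /** Tolerance per predictor: 1 / VIF, i.e. 1 - R^2 of that predictor regressed on the others. */
    static double[] tolerance(double[][] x) {
        double[] v = vif(x);
        for (int j = 0; j < v.length; j++) v[j] = 1.0 / v[j];
        return v;
    }
}
```

A tolerance close to zero (equivalently, a very large VIF) signals that a predictor is nearly a linear combination of the others; an exactly collinear set makes the correlation matrix singular, which this sketch reports by throwing.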
>>>
>>>  
>>>
>>> I am willing to provide complete logistic and probit regression
>>> algorithms, fitted by maximum likelihood using Newton-Raphson
>>> optimization, with a simple programmatic interface
>>> (e.g. regression(double matrix[][], double Target[], String
>>> Constant, double precision, double tolerance)) and academic references.
>>> They are very quick (3 secs for a 60k set) and have getter methods for all the common
>>> statistics such as null deviance, deviance, AIC, BIC, chi-square of the model,
>>> betas, Wald statistics and p-values, Cox-Snell R-square, Nagelkerke's R-square,
>>> pseudo-R2, residuals, probabilities, and classification matrix.
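
To make the technique named above concrete, here is a minimal sketch of logistic regression fitted by Newton-Raphson maximum likelihood. It is not the contributed code and not a Commons Math API: the class `LogisticNewtonSketch` and its method names are hypothetical. It uses the exact Hessian (the information matrix X'WX), not the approximation the proposal mentions, and it expects the caller to include a column of ones in `x` if an intercept is wanted.

```java
final class LogisticNewtonSketch {
    /** Solves a * z = b by Gaussian elimination with partial pivoting (a and b are overwritten). */
    static double[] solve(double[][] a, double[] b) {
        int p = b.length;
        for (int c = 0; c < p; c++) {
            int piv = c;
            for (int r = c + 1; r < p; r++) if (Math.abs(a[r][c]) > Math.abs(a[piv][c])) piv = r;
            if (Math.abs(a[piv][c]) < 1e-12) throw new ArithmeticException("singular information matrix");
            double[] tr = a[c]; a[c] = a[piv]; a[piv] = tr;
            double tb = b[c]; b[c] = b[piv]; b[piv] = tb;
            for (int r = c + 1; r < p; r++) {
                double f = a[r][c] / a[c][c];
                b[r] -= f * b[c];
                for (int k = c; k < p; k++) a[r][k] -= f * a[c][k];
            }
        }
        double[] z = new double[p];
        for (int r = p - 1; r >= 0; r--) { // back substitution
            double s = b[r];
            for (int k = r + 1; k < p; k++) s -= a[r][k] * z[k];
            z[r] = s / a[r][r];
        }
        return z;
    }

    /** Returns the maximum-likelihood coefficients; y must contain only 0 and 1. */
    static double[] fit(double[][] x, double[] y, double tol, int maxIter) {
        int n = x.length, p = x[0].length;
        double[] beta = new double[p]; // start from all zeros
        for (int it = 0; it < maxIter; it++) {
            double[] grad = new double[p];      // score vector X'(y - mu)
            double[][] info = new double[p][p]; // information matrix X'WX
            for (int i = 0; i < n; i++) {
                double eta = 0;
                for (int j = 0; j < p; j++) eta += x[i][j] * beta[j];
                double mu = 1.0 / (1.0 + Math.exp(-eta));
                double w = mu * (1.0 - mu);
                for (int j = 0; j < p; j++) {
                    grad[j] += x[i][j] * (y[i] - mu);
                    for (int k = 0; k < p; k++) info[j][k] += w * x[i][j] * x[i][k];
                }
            }
            double[] step = solve(info, grad); // Newton step: info^-1 * grad
            double maxStep = 0;
            for (int j = 0; j < p; j++) {
                beta[j] += step[j];
                maxStep = Math.max(maxStep, Math.abs(step[j]));
            }
            if (maxStep < tol) return beta;
        }
        throw new IllegalStateException("no convergence within " + maxIter + " iterations");
    }

    /** Deviance of the fitted model: -2 times the Bernoulli log-likelihood. */
    static double deviance(double[][] x, double[] y, double[] beta) {
        double ll = 0;
        for (int i = 0; i < x.length; i++) {
            double eta = 0;
            for (int j = 0; j < beta.length; j++) eta += x[i][j] * beta[j];
            double mu = 1.0 / (1.0 + Math.exp(-eta));
            ll += y[i] * Math.log(mu) + (1.0 - y[i]) * Math.log(1.0 - mu);
        }
        return -2.0 * ll;
    }
}
```

The singular-matrix exception in `solve` is where a tolerance check would pay off: perfectly collinear predictors make the information matrix non-invertible, so screening them out beforehand avoids that failure. Statistics such as AIC and BIC follow from the deviance and the number of parameters.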
>> Such contributions would certainly be most welcome.
>>
>> But care must be taken in how to fit those features into Commons Math. I mean
>> that the new implementations should be integrated in the API of similar
>> functionalities, if they currently exist.
>>
>> IIUC, the proposal could be related to code currently in package
>>   org.apache.commons.math3.stat.clustering
>> and/or to the pending improvements suggested in this report:
>>   https://issues.apache.org/jira/browse/MATH-748
>>
>> [By the way, I wonder whether "clustering" should really be under "stat",
>> rather than, say, "optimization" or a package of its own, one level up.]
>>
>> In any case, it might be worth discussing here some design issues, before you
>> start adapting your code. At the same time, you should open tickets on the
>> bug tracking system:
>>   https://issues.apache.org/jira/browse/MATH
>> Preferably, there should be a general request for "New feature"; then
>> several "sub-issues" could be linked to that one, each referring to a
>> specific task (typically a class, with its unit tests).
>>
>>> I have also included steps for checking tolerance so that we avoid cases
>>> that fail to converge. Generally the algorithm is not very expensive in
>>> RAM (because I have approximated the Hessian matrix), and the only
>>> external jar that I use is Commons Math, for matrix multiplication.
>> Although the performance issue is certainly important, it is an
>> "implementation detail" that should not preempt a clear API (i.e. one that
>> reflects the mathematical concepts) and the reuse of existing classes (those
>> can be improved at the same time, if your proposal reveals that something is
>> lacking).
>>
>>
>> Thanks for your interest,
>> Gilles
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
>> For additional commands, e-mail: dev-help@commons.apache.org
>>



