spark-issues mailing list archives

From "mike bowles (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-1673) GLMNET implementation in Spark
Date Thu, 26 Feb 2015 22:34:05 GMT

    [ https://issues.apache.org/jira/browse/SPARK-1673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14339320#comment-14339320 ]

mike bowles commented on SPARK-1673:
------------------------------------

Good discussion.  I can see how it might be faster to propagate an approximate path as a way
to provide good starting conditions for an accurate iteration.  To some extent, the accuracy
of the glmnet path can be modulated by loosening the convergence criteria for the inner iteration
(the iteration done to find the new minimum after the penalty parameter is decremented).


The big time sink is making passes through the data.  With glmnet regression, the inner iterations
don't require making passes through the data, so they are much less expensive than the steps
in the penalty parameter, which may trigger a pass through the data to deal with a new element
being added to the active list.
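
To make the cost argument concrete, here is a minimal sketch of an inner iteration that
works only from cached inner products.  It assumes the objective
(1/2n)||y - X*beta||^2 + lam*(alpha*|beta|_1 + (1 - alpha)/2*||beta||^2); the names
inner_iterations, gram and xty are made up for this illustration and are not taken from
our code or from Spark.

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator used by the L1 coordinate update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def inner_iterations(gram, xty, beta, active, lam, alpha=1.0,
                     tol=1e-6, max_sweeps=1000):
    """Coordinate descent over the active set, updating beta in place.

    Assumed inputs (hypothetical names):
      gram[j][k] = (1/n) * sum_i x_ij * x_ik   (cached inner products)
      xty[j]     = (1/n) * sum_i x_ij * y_i
    No row of the data is read here, which is why these sweeps are cheap.
    Returns the number of sweeps performed.
    """
    for sweep in range(1, max_sweeps + 1):
        max_change = 0.0
        for j in active:
            # Correlation of feature j with the partial residual,
            # written entirely in terms of cached inner products.
            rho = xty[j] - sum(gram[j][k] * beta[k] for k in active if k != j)
            new = soft_threshold(rho, lam * alpha) / (gram[j][j] + lam * (1.0 - alpha))
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        # A looser tol stops earlier and gives a less accurate point on the path.
        if max_change < tol:
            return sweep
    return max_sweeps
```

The tol argument is the knob mentioned above: raising it trades accuracy of the path for
fewer inner sweeps, without changing the number of passes through the data.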

It would be interesting to see what happens if the active set of coefficients were constrained
to change less frequently than the penalty parameter.  I have a hunch that it might take more
(inexpensive) inner iterations to converge when the coefficients were allowed to change, but
it would save passes through the data.

It would be relatively easy for us to implement this in our code.  We can try only letting
the active set change every other or every third step in the penalty parameter and see how
much it changes the coefficient curves.
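
A rough sketch of that experiment for the plain lasso case follows.  It is a toy, not our
implementation: lasso_path and refresh_every are hypothetical names, a "data pass" is
counted whenever inner products for newly entering features must be computed, and only a
single entry check is made per refresh (a complete version would repeat the check until
no inactive feature violates the optimality condition).

```python
def soft_threshold(z, gamma):
    """Soft-thresholding operator used by the L1 coordinate update."""
    return z - gamma if z > gamma else z + gamma if z < -gamma else 0.0


def lasso_path(X, y, lambdas, refresh_every=1, tol=1e-8, max_sweeps=10000):
    """Lasso path where the active set may only grow every `refresh_every` steps.

    X is a list of rows, y a list of targets, lambdas a decreasing sequence.
    Returns (path, data_passes), where path[s] is the coefficient list at step s.
    """
    n, p = len(X), len(X[0])
    xty = [sum(X[i][j] * y[i] for i in range(n)) / n for j in range(p)]
    data_passes = 1  # the pass that computed xty
    gram = {}        # gram[k][j] = <x_k, x_j> / n, cached when k becomes active
    beta = [0.0] * p
    active = []
    path = []
    for step, lam in enumerate(lambdas):
        if step % refresh_every == 0:
            # The gradient at an inactive feature needs only cached columns.
            entering = [j for j in range(p) if j not in active and
                        abs(xty[j] - sum(gram[k][j] * beta[k] for k in active)) > lam]
            if entering:
                # Counted as one pass: a distributed version could compute all
                # entering columns in one sweep (this toy loops per column).
                for k in entering:
                    gram[k] = [sum(X[i][k] * X[i][j] for i in range(n)) / n
                               for j in range(p)]
                data_passes += 1
                active.extend(entering)
        # Inner iterations: cheap, no access to X or y.
        for _ in range(max_sweeps):
            max_change = 0.0
            for j in active:
                rho = xty[j] - sum(gram[k][j] * beta[k] for k in active if k != j)
                new = soft_threshold(rho, lam) / gram[j][j]
                max_change = max(max_change, abs(new - beta[j]))
                beta[j] = new
            if max_change < tol:
                break
        path.append(list(beta))
    return path, data_passes
```

Running lasso_path on the same data with refresh_every set to 1, 2 and 3 and comparing
both the returned paths and the pass counts would show how much the coefficient curves
move in exchange for the passes saved.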

Thanks for the idea.  

> GLMNET implementation in Spark
> ------------------------------
>
>                 Key: SPARK-1673
>                 URL: https://issues.apache.org/jira/browse/SPARK-1673
>             Project: Spark
>          Issue Type: New Feature
>          Components: MLlib
>            Reporter: Sung Chung
>
> This is a Spark implementation of GLMNET by Jerome Friedman, Trevor Hastie, Rob Tibshirani.
> http://www.jstatsoft.org/v33/i01/paper
> It's a straightforward implementation of the Coordinate-Descent based L1/L2 regularized
> linear models, including Linear/Logistic/Multinomial regressions.




