spark-reviews mailing list archives

From dbtsai <>
Subject [GitHub] spark pull request: [SPARK-5253] [ML] LinearRegression with L1/L2 ...
Date Thu, 26 Mar 2015 21:51:23 GMT
Github user dbtsai commented on the pull request:
    @jkbradley I think we should support only the basic L1/L2 regularization at first,
which is what Python's scikit-learn does. Users who need a different type of regularization
can implement it on top of the code we have.
    It would be hard to implement GeneralizedLinearAlgorithm with regularization without
a lot of if-else statements to handle the special cases. At Alpine I implemented logistic
regression, linear regression, and Cox proportional-hazards regression with elastic-net
regularization, and our customers are asking for accuracy that matches R's glmnet package.
So I spent some time studying the original glmnet code, and I found that there is no
generic way to handle the different linear models; there are special cases here and there.
    For example, in logistic regression the intercept is computed by adding an extra
dimension with the constant value one to the data, but in linear regression it is computed
as `val intercept = yMean - dot(weights, scaler.mean)`.
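A minimal Scala sketch of the two intercept strategies described above, using plain arrays instead of Spark's vector types. The names `InterceptSketch`, `linearIntercept`, `appendBias`, and `fitOneFeature` are hypothetical, not Spark APIs. The sketch assumes the weights are already expressed in the original (unscaled) feature space, so `xMean` plays the role of `scaler.mean`, and the one-feature fit is plain least squares without regularization.

```scala
// Hypothetical sketch, not Spark's implementation.
object InterceptSketch {
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.indices.map(i => a(i) * b(i)).sum

  // Linear-regression style: fit on centered data, then recover the intercept
  // as yMean - dot(weights, xMean). Assumes weights are in the unscaled space.
  def linearIntercept(weights: Array[Double], xMean: Array[Double], yMean: Double): Double =
    yMean - dot(weights, xMean)

  // Logistic-regression style: append a constant-one feature so that the
  // weight learned for it acts as the intercept.
  def appendBias(x: Array[Double]): Array[Double] = x :+ 1.0

  // One-feature ordinary least squares on centered data (no regularization),
  // to illustrate the centering approach end to end. Returns (weight, intercept).
  def fitOneFeature(xs: Seq[Double], ys: Seq[Double]): (Double, Double) = {
    val xMean = xs.sum / xs.length
    val yMean = ys.sum / ys.length
    val sxy = xs.zip(ys).map { case (x, y) => (x - xMean) * (y - yMean) }.sum
    val sxx = xs.map(x => (x - xMean) * (x - xMean)).sum
    val w = sxy / sxx
    (w, linearIntercept(Array(w), Array(xMean), yMean))
  }

  def main(args: Array[String]): Unit = {
    val (w, b) = fitOneFeature(Seq(1.0, 2.0, 3.0), Seq(3.0, 5.0, 7.0))
    println(s"weight = $w, intercept = $b") // weight = 2.0, intercept = 1.0
    println(appendBias(Array(0.5, -1.0)).mkString(", ")) // 0.5, -1.0, 1.0
  }
}
```

With data generated from y = 2x + 1, fitting on centered values recovers the slope directly, and the intercept falls out of the means afterwards, so it never enters the optimization.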
    So I would like to implement them separately first, make sure they match R's accuracy
with proper tests, and then abstract out the common parts. I have another PR, #1518, that
tries to do this, and I will continue working on it after this PR is merged.
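For reference, a hedged Scala sketch of the elastic-net least-squares objective in glmnet's parameterization, (1/(2N)) * sum of squared residuals + lambda * ((1 - alpha)/2 * ||w||_2^2 + alpha * ||w||_1), where alpha = 1 gives the lasso and alpha = 0 gives ridge. The object and method names are hypothetical, not Spark or glmnet APIs, and the intercept is deliberately left out of the penalty, as glmnet does.

```scala
// Hypothetical sketch of a glmnet-style elastic-net objective for least squares.
object ElasticNetSketch {
  // lambda * ((1 - alpha) / 2 * ||w||_2^2 + alpha * ||w||_1)
  // The intercept is not passed in because it is not penalized.
  def penalty(w: Array[Double], lambda: Double, alpha: Double): Double = {
    require(alpha >= 0.0 && alpha <= 1.0, "alpha must be in [0, 1]")
    val l2 = w.map(x => x * x).sum
    val l1 = w.map(x => math.abs(x)).sum
    lambda * ((1.0 - alpha) / 2.0 * l2 + alpha * l1)
  }

  // Squared-error loss scaled by 1/(2N), plus the elastic-net penalty.
  def objective(xs: Seq[Array[Double]], ys: Seq[Double], w: Array[Double],
                intercept: Double, lambda: Double, alpha: Double): Double = {
    val n = xs.length
    val sse = xs.zip(ys).map { case (x, y) =>
      val r = y - intercept - x.indices.map(i => x(i) * w(i)).sum
      r * r
    }.sum
    sse / (2.0 * n) + penalty(w, lambda, alpha)
  }
}
```

For w = (1, -2) and lambda = 0.5, the penalty is 1.5 at alpha = 1 (pure L1), 1.25 at alpha = 0 (pure L2), and 1.375 at alpha = 0.5.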
