spark-reviews mailing list archives

From dbtsai <...@git.apache.org>
Subject [GitHub] spark pull request: [SPARK-8700][ML] Disable feature scaling in Lo...
Date Mon, 29 Jun 2015 08:08:14 GMT
GitHub user dbtsai opened a pull request:

    https://github.com/apache/spark/pull/7080

    [SPARK-8700][ML] Disable feature scaling in Logistic Regression

    All compressed sensing applications, and some regression use cases, get better results
with feature scaling turned off. However, if we implement this naively by training on the
dataset without any standardization, the rate of convergence will be poor. Instead, we can
still standardize the training dataset but penalize each component differently, which gives
effectively the same objective function with a better-conditioned numerical problem. As a
result, columns with high variance are penalized less, and vice versa. Without this, all
the features are standardized and are therefore penalized the same.
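
    The equivalence described above can be illustrated with a small sketch. This is not the
code from the pull request; it assumes an L2 penalty, scaling by the column standard deviation
without centering, and hypothetical helper names and toy data chosen only for illustration.
Dividing the penalty on component j by its variance in the scaled space reproduces the
unscaled objective exactly.

```python
import math

def logistic_loss(weights, rows, labels):
    """Mean logistic loss for labels in {0, 1}, no intercept."""
    total = 0.0
    for x, y in zip(rows, labels):
        margin = sum(w * xi for w, xi in zip(weights, x))
        z = margin if y == 1 else -margin
        total += math.log1p(math.exp(-z))
    return total / len(rows)

def column_std(rows):
    """Sample standard deviation of each column."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
            for j in range(d)]

def objective_original(w, rows, labels, reg):
    """Loss on raw features with a uniform L2 penalty."""
    return logistic_loss(w, rows, labels) + 0.5 * reg * sum(wj * wj for wj in w)

def objective_scaled(w_scaled, rows_scaled, labels, reg, std):
    """Loss on scaled features; component j is penalized by reg / std[j]**2."""
    penalty = 0.5 * reg * sum((wj / s) ** 2 for wj, s in zip(w_scaled, std))
    return logistic_loss(w_scaled, rows_scaled, labels) + penalty

# Toy data (hypothetical): the second column has much higher variance.
rows = [[1.0, 100.0], [2.0, 300.0], [3.0, 200.0], [4.0, 500.0]]
labels = [0, 0, 1, 1]
reg = 0.1
w = [0.5, -0.01]

std = column_std(rows)
rows_scaled = [[xi / s for xi, s in zip(r, std)] for r in rows]
w_scaled = [wj * s for wj, s in zip(w, std)]  # same model expressed in scaled space

obj_orig = objective_original(w, rows, labels, reg)
obj_scaled = objective_scaled(w_scaled, rows_scaled, labels, reg, std)
per_component_reg = [reg / (s * s) for s in std]

print(obj_orig, obj_scaled)
print(per_component_reg)
```

    The two printed objective values agree up to floating-point rounding, and the
per-component penalty is smaller for the high-variance column, which matches the behavior
described above. An optimizer would work on the scaled problem and map the solution back
with w = w_scaled / std.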
    
    In R's glmnet package, there is an option for this:
    `standardize`	
    Logical flag for x variable standardization, prior to fitting the model sequence. The
coefficients are always returned on the original scale. Default is standardize=TRUE. If variables
are in the same units already, you might not wish to standardize. See details below for y
standardization with family="gaussian".
    
    +cc @holdenk @mengxr @jkbradley 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/dbtsai/spark lors

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/7080.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #7080
    
----
commit 588c75f714372b6da4dd20fa7d006afe399fa8e2
Author: DB Tsai <dbt@netflix.com>
Date:   2015-06-24T01:06:03Z

    first commit

----


