madlib-dev mailing list archives

From njayaram2 <...@git.apache.org>
Subject [GitHub] incubator-madlib pull request #126: Bugfix: Elastic net gives inconsistent r...
Date Wed, 26 Apr 2017 17:18:47 GMT
GitHub user njayaram2 opened a pull request:

    https://github.com/apache/incubator-madlib/pull/126

    Bugfix: Elastic net gives inconsistent result

    JIRA: MADLIB-1092
    
    - Elastic net used to take the number of rows to be the total number
    of rows in the table, even when grouping was used. This fix uses the
    number of rows in each group instead while computing IGD.
    - Elastic net used to compute the mean and standard deviation of both
    the independent and dependent variables over the entire table, even
    when grouping was used. These are now computed per group and used
    to compute the scaled data when standardize=TRUE for Gaussian IGD.
    - One approximation still remains. During gradient computation (C++),
    the mean subtracted from every value of the independent variable
    (for each dimension) is computed over the entire table and not per
    group. This approximation was adopted since it is messy to pass
    group-specific mean values for every row in the table to the C++
    layer.
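
    To illustrate the difference between table-wide and per-group
    statistics described above, here is a minimal Python sketch. It is
    not the MADlib code: the toy rows, the helper names summarize and
    scale, and the use of the population standard deviation are all
    assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical toy data: (group key, independent value x, dependent value y).
rows = [
    ("a", 1.0, 2.0), ("a", 3.0, 6.0),
    ("b", 10.0, 20.0), ("b", 14.0, 28.0), ("b", 18.0, 36.0),
]


def summarize(values):
    """Return (row count, mean, population standard deviation)."""
    # Assumption: population std; the PR does not say which variant is used.
    return len(values), mean(values), pstdev(values)


# Old behaviour: one row count, mean and std for the whole table.
table_n, table_mean, table_std = summarize([x for _, x, _ in rows])

# New behaviour: row count, mean and std computed separately per group.
by_group = defaultdict(list)
for key, x, _ in rows:
    by_group[key].append(x)
group_stats = {key: summarize(xs) for key, xs in by_group.items()}


def scale(x, mu, sigma):
    """Standardize one value, in the spirit of standardize=TRUE."""
    return (x - mu) / sigma


# Scale x = 3.0 from group "a" with table-wide versus group statistics.
_, mean_a, std_a = group_stats["a"]
scaled_with_table = scale(3.0, table_mean, table_std)
scaled_with_group = scale(3.0, mean_a, std_a)
```

    In this toy table, 3.0 lies above the mean of group "a" but below
    the mean of the whole table, so the two scaled values have opposite
    signs. That is the kind of inconsistency across groups that this
    fix targets.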
    
    @iyerr3 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/njayaram2/incubator-madlib bugfix/elastic_net_grouping

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-madlib/pull/126.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #126
    
----
commit 92bbd3d08d457c5c7096aadf1403fc5e9df6ed7a
Author: Nandish Jayaram <njayaram@apache.org>
Date:   2017-04-24T16:46:03Z

    Bugfix: Elastic net gives inconsistent result
    

----


