GitHub user njayaram2 opened a pull request:
https://github.com/apache/incubator-madlib/pull/126
Bugfix: Elastic net gives inconsistent result
JIRA: MADLIB-1092
- Elastic net used to consider the number of rows as the total number
of rows in the table even when grouping was used. This fix changes
that to consider the number of rows in a group while computing IGD.
- Elastic net used to compute the mean and standard deviation of both
the independent and dependent variables over the entire table even
when grouping was used. These are now computed per group, and the
per-group values are used to compute the scaled data when
standardize=TRUE for Gaussian IGD.
- One approximation still remains. During gradient computation (C++),
the mean subtracted from every value of the independent variable
(for each dimension) is the one computed over the entire table,
not per group. This approximation was adopted because it is messy to
pass group-specific mean values for every row in the table to the
C++ layer.
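
As a rough illustration of the first two points, here is a minimal sketch in plain Python (not MADlib code) of computing the row count, mean, and standard deviation per group and scaling each value with its own group's statistics. The function names, the sample data, and the use of the population standard deviation are assumptions made for this example only; the patch itself may compute these differently.

```python
from collections import defaultdict
from statistics import mean, pstdev


def group_stats(rows):
    """Return {group_key: (row_count, mean, std)} for a list of (group_key, x) pairs.

    Hypothetical helper: uses the population standard deviation as an assumption.
    """
    by_group = defaultdict(list)
    for key, x in rows:
        by_group[key].append(x)
    return {k: (len(v), mean(v), pstdev(v)) for k, v in by_group.items()}


def standardize_per_group(rows):
    """Scale each value with the mean and std of its own group, not the whole table."""
    stats = group_stats(rows)
    scaled = []
    for key, x in rows:
        _, mu, sigma = stats[key]
        # Guard against a constant column within a group (std == 0).
        scaled.append((key, (x - mu) / sigma if sigma else 0.0))
    return scaled


# Two groups on very different scales (made-up data).
rows = [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 30.0)]
per_group = standardize_per_group(rows)

# Whole-table statistics, for comparison with the old behaviour.
table_mean = mean(x for _, x in rows)
```

With whole-table statistics, every value in group "a" would be centered on the table mean of 11.0 rather than on the group mean of 2.0, which is the kind of inconsistency between grouped and ungrouped runs that this fix addresses.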
@iyerr3
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/njayaram2/incubator-madlib bugfix/elastic_net_grouping
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/incubator-madlib/pull/126.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #126
----
commit 92bbd3d08d457c5c7096aadf1403fc5e9df6ed7a
Author: Nandish Jayaram <njayaram@apache.org>
Date: 2017-04-24T16:46:03Z
Bugfix: Elastic net gives inconsistent result
JIRA: MADLIB-1092
- Elastic net used to consider the number of rows as the total number
of rows in the table even when grouping was used. This fix changes
that to consider the number of rows in a group while computing IGD.
- Elastic net used to compute the mean and standard deviation of both
the independent and dependent variables over the entire table even
when grouping was used. These are now computed per group, and the
per-group values are used to compute the scaled data when
standardize=TRUE for Gaussian IGD.
- One approximation still remains. During gradient computation (C++),
the mean subtracted from every value of the independent variable
(for each dimension) is the one computed over the entire table,
not per group. This approximation was adopted because it is messy to
pass group-specific mean values for every row in the table to the
C++ layer.
----