[ https://issues.apache.org/jira/browse/SPARK-20047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
DB Tsai updated SPARK-20047:

Description:
For certain applications, such as stacked regressions, it is important to put nonnegative
constraints on the regression coefficients. Also, if the ranges of coefficients are known,
it makes sense to constrain the coefficient search space.
Fitting generalized constrained regression models subject to Cβ ≤ b, where C ∈ R^{m×p}
and b ∈ R^{m} are a predefined matrix and vector that place a set of m linear constraints
on the coefficients, is very challenging, as discussed widely in the literature.
However, for box constraints on the coefficients, the optimization problem is well solved. For gradient
descent, one can use projected gradient descent in the primal, projecting onto the box at each
step (for nonnegativity constraints, this means zeroing the negative weights). For L-BFGS, an
extended version, L-BFGS-B, handles large-scale box-constrained optimization efficiently.
Unfortunately, for OWL-QN, there is no efficient way to optimize with box constraints.
As a result, in this work we implement constrained LR with box constraints only, without L1
regularization.
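
To illustrate the projected gradient descent idea mentioned above, here is a minimal
pure-Python sketch. It is not the proposed Spark implementation (which targets L-BFGS-B);
the function names, step size, iteration count, and toy data are all assumptions made for
illustration, and the model has no intercept and no regularization.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def project_onto_box(beta, lower, upper):
    # Clip each coefficient into [lower_j, upper_j].
    # With lower = 0 and upper = +inf this zeroes the negative weights.
    return [min(max(b, lo), hi) for b, lo, hi in zip(beta, lower, upper)]

def fit_box_constrained_lr(X, y, lower, upper, step=0.1, iters=500):
    # Hypothetical helper: gradient step on the mean logistic loss,
    # followed by a projection back onto the box.
    n, p = len(X), len(X[0])
    beta = project_onto_box([0.0] * p, lower, upper)
    for _ in range(iters):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            err = sigmoid(sum(b * v for b, v in zip(beta, xi))) - yi
            for j in range(p):
                grad[j] += err * xi[j] / n
        beta = project_onto_box(
            [b - step * g for b, g in zip(beta, grad)], lower, upper)
    return beta

# Toy data: the second feature would get a negative weight if unconstrained.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 0.5], [0.5, 3.0]]
y = [0, 1, 1, 0]
beta = fit_box_constrained_lr(X, y, lower=[0.0, 0.0], upper=[1.0, 1.0])
```

On this toy data the second coefficient is held at its lower bound of 0 by the projection,
while the first stays inside [0, 1].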
Note that because we standardize the data in the training phase, the coefficients seen in the
optimization subroutine are in the scaled space; as a result, we need to convert the box constraints
into the scaled space.
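
A rough sketch of that conversion, under an assumption the description does not spell out:
that each feature is scaled as x_j / σ_j with no centering, so a scaled coefficient equals
β_j · σ_j and the bounds on the coefficients scale the same way (intercept bounds are left
unchanged). The helper names below are hypothetical.

```python
def scale_bounds(lower, upper, feature_std):
    # Assumes x_scaled_j = x_j / std_j, hence beta_scaled_j = beta_j * std_j.
    # std_j must be > 0 so the inequality direction is preserved.
    if any(s <= 0 for s in feature_std):
        raise ValueError("feature_std entries must be positive")
    scaled_lower = [lo * s for lo, s in zip(lower, feature_std)]
    scaled_upper = [hi * s for hi, s in zip(upper, feature_std)]
    return scaled_lower, scaled_upper

def unscale_coefficients(beta_scaled, feature_std):
    # Map the optimizer's solution back to the original feature space.
    return [b / s for b, s in zip(beta_scaled, feature_std)]

lo, hi = scale_bounds([0.0, -1.0], [2.0, float("inf")], [0.5, 4.0])
```

Infinite bounds stay infinite under this scaling, so an unbounded side of the box remains
unbounded in the scaled space.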
Users will be able to set the lower / upper bounds of each coefficient and intercept.
One solution could be to modify these implementations and do projected gradient descent
in the primal by zeroing the negative weights at each step. But this approach is unattractive
because the nice convergence properties are then lost.
> Constrained Logistic Regression
> 
>
> Key: SPARK-20047
> URL: https://issues.apache.org/jira/browse/SPARK-20047
> Project: Spark
> Issue Type: New Feature
> Components: MLlib
> Affects Versions: 2.1.0
> Reporter: DB Tsai
> Assignee: Yanbo Liang

