spark-issues mailing list archives

From "Seth Hendrickson (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (SPARK-17201) Investigate numerical instability for MLOR without regularization
Date Tue, 23 Aug 2016 22:25:20 GMT

    [ https://issues.apache.org/jira/browse/SPARK-17201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15433754#comment-15433754 ]

Seth Hendrickson edited comment on SPARK-17201 at 8/23/16 10:24 PM:
--------------------------------------------------------------------

Restating some of what was said on GitHub:

_The concern is that for softmax regression without regularization, the Hessian becomes singular
and Newton methods can run into problems. Excerpt from this [link|http://ufldl.stanford.edu/wiki/index.php/Softmax_Regression]:
"Thus, the minimizer of J(θ) is not unique. (Interestingly, J(θ) is still convex, and thus
gradient descent will not run into local optima problems. But the Hessian is singular/non-invertible,
which causes a straightforward implementation of Newton's method to run into numerical problems.)"_
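
As a quick illustration of the non-uniqueness (my own sketch, not from the linked page): adding the same constant vector c to every class's coefficient vector multiplies every un-normalized class score by exp(c^T x), which cancels in the softmax normalization, so J(θ) is flat along that direction. A small Breeze check (values chosen arbitrarily):

{code:scala}
import breeze.linalg.{DenseMatrix, DenseVector, sum}
import breeze.numerics.exp

// Softmax class probabilities for a single example x, with one row of
// coefficients per class. Illustrative only; not Spark's implementation.
def softmax(theta: DenseMatrix[Double], x: DenseVector[Double]): DenseVector[Double] = {
  val scores = exp(theta * x)
  scores / sum(scores)
}

val theta = DenseMatrix((0.5, -1.0), (1.5, 2.0), (-0.3, 0.7)) // 3 classes, 2 features
val x = DenseVector(0.2, -0.4)
val c = DenseVector(3.0, -2.0)
// Add the same vector c to every class's coefficients.
val shifted = theta + DenseVector.ones[Double](3) * c.t

println(softmax(theta, x))   // identical to the line below, so the minimizer
println(softmax(shifted, x)) // of J is not unique and the Hessian is singular
{code}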

I looked into this. It is true that for softmax regression the Hessian is symmetric positive
_semidefinite_, not symmetric positive definite. There is a reasonably thorough proof of this [here|http://qwone.com/~jason/writing/convexLR.pdf].
Still, consider the quote from the resource above: "... which causes a *straightforward*
implementation of Newton's method to run into numerical problems." It's true that the lack of positive
definiteness can be a problem for *naive* Newton methods, but LBFGS is not a straightforward
implementation: it does not use the Hessian directly, but rather maintains an approximation to
the inverse Hessian. In fact, there is an abundance of resources showing that as long as the initial
inverse Hessian approximation is symmetric positive definite, the subsequent recursive updates
are also symmetric positive definite. From one resource:

"H(-1)_(n + 1) is positive definite (psd) when H^(-1)_n is. Assuming our initial guess of
H0 is psd, it follows by induction each inverse Hessian estimate is as well. Since we can
choose any H^(-1)_0 we want, including the identity matrix, this is easy to ensure."
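
For reference, the textbook BFGS inverse-Hessian update behind that claim is H^(-1)_(n+1) = (I - ρ s y^T) H^(-1)_n (I - ρ y s^T) + ρ s s^T with ρ = 1/(y^T s); as long as the line search keeps the curvature condition y^T s > 0, positive definiteness is preserved. A quick numeric sanity check with Breeze (this is the generic formula, not the Breeze LBFGS internals; values are arbitrary):

{code:scala}
import breeze.linalg.{DenseMatrix, DenseVector, eigSym}

// One textbook BFGS update of the inverse-Hessian approximation.
def bfgsInverseUpdate(hInv: DenseMatrix[Double],
                      s: DenseVector[Double],
                      y: DenseVector[Double]): DenseMatrix[Double] = {
  val rho = 1.0 / (y dot s)           // requires curvature condition y.s > 0
  val eye = DenseMatrix.eye[Double](s.length)
  val v = eye - (s * y.t) * rho
  v * hInv * v.t + (s * s.t) * rho
}

val h0 = DenseMatrix.eye[Double](2)   // symmetric positive definite initial guess
val s = DenseVector(0.3, -0.1)        // step: x_{n+1} - x_n
val y = DenseVector(0.5, 0.2)         // gradient difference, here y.s > 0
val h1 = bfgsInverseUpdate(h0, s, y)
println(eigSym(h1).eigenvalues)       // eigenvalues stay strictly positive
{code}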

I would appreciate other opinions on this to make sure I am understanding things correctly. It seems
like LBFGS will work fine even without regularization. Have we seen this problem in practice?
cc [~dbtsai] [~WeichenXu123]
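
As a sketch of one way to check in practice, using the existing spark.mllib multinomial solver (which already goes through LBFGS); the data path and parameter values below are placeholders:

{code:scala}
import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.util.MLUtils

// Fit multinomial logistic regression with no regularization and watch the
// LBFGS iteration logs for signs of instability. "data/mnist.scale" is a
// placeholder path; sc is the usual SparkContext from the shell.
val training = MLUtils.loadLibSVMFile(sc, "data/mnist.scale").cache()

val lr = new LogisticRegressionWithLBFGS().setNumClasses(10)
lr.optimizer.setRegParam(0.0).setNumIterations(200).setConvergenceTol(1e-6)

val model = lr.run(training)
{code}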


> Investigate numerical instability for MLOR without regularization
> -----------------------------------------------------------------
>
>                 Key: SPARK-17201
>                 URL: https://issues.apache.org/jira/browse/SPARK-17201
>             Project: Spark
>          Issue Type: Sub-task
>          Components: ML, MLlib
>            Reporter: Seth Hendrickson
>
> As mentioned [here|http://ufldl.stanford.edu/wiki/index.php/Softmax_Regression], when
> no regularization is applied in softmax regression, second-order Newton solvers may run into
> numerical instability problems. We should investigate this in practice and find a solution,
> possibly by implementing pivoting when no regularization is applied.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

