spark-issues mailing list archives

From "Seth Hendrickson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-10078) Vector-free L-BFGS
Date Sun, 08 Jan 2017 20:52:58 GMT

    [ https://issues.apache.org/jira/browse/SPARK-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15810000#comment-15810000 ]

Seth Hendrickson commented on SPARK-10078:
------------------------------------------

As part of [SPARK-17136|https://issues.apache.org/jira/browse/SPARK-17136] I am working
on a generic optimization interface for Spark, which would let users easily plug in their
own optimizers in place of the built-in ones. Because of this, I have also been looking into how
a single interface could support optimization over both local and distributed vector types.
I have been prototyping this on a branch [here|https://github.com/sethah/spark/tree/spark-vlbfgs].
I was able to get Yanbo's VLogisticRegression class working (on a very small dataset)
using the VLBFGS optimizer in my branch, which also works with local vector types. Could you
let me know whether this lines up with what you had in mind?
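To make "one interface for local and distributed vector types" concrete, here is a minimal Python sketch (Spark itself would use Scala) in which the L-BFGS two-loop recursion is written only against a small set of vector operations. `VectorSpace`, `LocalSpace`, and `lbfgs_direction` are hypothetical names chosen for illustration, not the API in the branch linked above; only a local list-backed implementation is shown.

```python
from abc import ABC, abstractmethod
from typing import Generic, List, Sequence, Tuple, TypeVar

V = TypeVar("V")


class VectorSpace(ABC, Generic[V]):
    """Hypothetical minimal set of vector operations an optimizer needs.

    A local implementation backs V with in-memory lists; a distributed one
    could back V with partitioned data. The optimizer only calls these."""

    @abstractmethod
    def dot(self, a: V, b: V) -> float: ...

    @abstractmethod
    def axpy(self, alpha: float, x: V, y: V) -> V:
        """Return alpha * x + y as a new vector."""

    @abstractmethod
    def scale(self, alpha: float, x: V) -> V: ...


class LocalSpace(VectorSpace[List[float]]):
    """Local vectors stored as plain Python lists."""

    def dot(self, a, b):
        return sum(x * y for x, y in zip(a, b))

    def axpy(self, alpha, x, y):
        return [alpha * xi + yi for xi, yi in zip(x, y)]

    def scale(self, alpha, x):
        return [alpha * xi for xi in x]


def lbfgs_direction(space: VectorSpace[V], grad: V,
                    history: Sequence[Tuple[V, V]]) -> V:
    """Standard L-BFGS two-loop recursion.

    history holds (s_k, y_k) pairs, oldest first. Returns -H * grad."""
    q = grad
    alphas = []  # (alpha_i, rho_i), collected newest first
    for s, y in reversed(history):
        rho = 1.0 / space.dot(y, s)
        alpha = rho * space.dot(s, q)
        alphas.append((alpha, rho))
        q = space.axpy(-alpha, y, q)
    # Initial Hessian approximation gamma * I from the most recent pair
    if history:
        s_last, y_last = history[-1]
        gamma = space.dot(s_last, y_last) / space.dot(y_last, y_last)
    else:
        gamma = 1.0
    r = space.scale(gamma, q)
    # Second loop runs oldest to newest, so pair history with reversed alphas
    for (s, y), (alpha, rho) in zip(history, reversed(alphas)):
        beta = rho * space.dot(y, r)
        r = space.axpy(alpha - beta, s, r)
    return space.scale(-1.0, r)
```

In principle, a distributed `VectorSpace` would implement the same three methods over partitioned data without any change to `lbfgs_direction`; for example, `lbfgs_direction(LocalSpace(), [4.0], [([1.0], [2.0])])` recovers the Newton step `[-2.0]` for f(x) = x².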

Setting VL-BFGS aside, this interface lets us avoid duplicating Breeze code at first, because
we can simply plug Breeze's optimizers into our abstraction (in my branch, that is what is
done for LBFGS and OWLQN). Adding VL-BFGS is a bit trickier.

The problem I see is that we need an abstraction that lets us persist and unpersist the
parameter vectors as needed during optimization. Adding "persist" and "unpersist" methods
to a vector space, for example, seems like a leaky abstraction. It might make sense to add this
to Breeze itself if we can keep RDD details out of the interface. However, one benefit
of SPARK-17136 is that it could eventually let us drop our dependence on Breeze.
I think it might make sense to implement our own VL-BFGS interface, even if that means some
duplication. In fact, I think this belongs to an important discussion that will happen as
part of the optimization interface design. I hope to post a detailed design document for that
JIRA in the next few days.
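For context on why persistence matters here: as I understand the vector-free formulation in the paper linked in the issue description, the 2m stored s/y vectors plus the current gradient form a basis, the two-loop recursion runs on the (2m+1) x (2m+1) matrix of their pairwise dot products, and full-length vectors are touched only to compute those dot products and the final linear combination. Those basis vectors are therefore what would need to be cached and released as the history window slides. The sketch below is a local, list-based illustration of that arithmetic; `vector_free_direction` and its arguments are hypothetical names, and nothing here is distributed.

```python
from typing import List, Sequence

Vec = List[float]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def vector_free_direction(s_hist: Sequence[Vec], y_hist: Sequence[Vec],
                          grad: Vec) -> Vec:
    """L-BFGS direction -H * grad computed via basis coefficients.

    s_hist and y_hist are ordered oldest first and have equal length m."""
    m = len(s_hist)
    # Basis layout: s_i at index i, y_i at index m + i, grad at index 2m
    basis = list(s_hist) + list(y_hist) + [grad]
    n = len(basis)
    # The only step (besides the final combination) that reads full vectors;
    # in a distributed setting these dot products would be computed remotely
    B = [[dot(basis[i], basis[j]) for j in range(n)] for i in range(n)]

    def coef_dot(row: int, delta: List[float]) -> float:
        # basis[row] . (sum_j delta_j * basis[j]), using only the matrix B
        return sum(delta[j] * B[row][j] for j in range(n))

    delta = [0.0] * n
    delta[2 * m] = 1.0  # q starts as grad
    rhos = [1.0 / B[m + i][i] for i in range(m)]  # 1 / (y_i . s_i)
    alphas = [0.0] * m
    for i in reversed(range(m)):  # newest to oldest
        alphas[i] = rhos[i] * coef_dot(i, delta)
        delta[m + i] -= alphas[i]  # q -= alpha_i * y_i
    # gamma = (s_last . y_last) / (y_last . y_last)
    gamma = B[m - 1][2 * m - 1] / B[2 * m - 1][2 * m - 1] if m else 1.0
    delta = [gamma * d for d in delta]
    for i in range(m):  # oldest to newest
        beta = rhos[i] * coef_dot(m + i, delta)
        delta[i] += alphas[i] - beta  # r += (alpha_i - beta) * s_i
    # Reassemble the direction as -sum_j delta_j * basis_j
    return [-sum(delta[j] * basis[j][k] for j in range(n))
            for k in range(len(grad))]
```

The scalar loops operate on n = 2m+1 coefficients regardless of the model dimension, which is the property that makes the approach attractive when the vectors themselves are too large for the driver.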

Finally, could you provide more detail on your proposed changes to DiffFunction? DiffFunction
in Breeze is already abstract in its parameter type...
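For reference, Breeze's `DiffFunction[T]` is generic in `T`, and its `calculate(x: T)` returns a `(Double, T)` pair of value and gradient, so the parameter type is not tied to a dense local vector. A rough Python analogue of that shape is below; the class names and the `value_at`/`gradient_at` helpers are hypothetical illustrations, not Breeze's API.

```python
from abc import ABC, abstractmethod
from typing import Generic, List, Tuple, TypeVar

T = TypeVar("T")


class DiffFunction(ABC, Generic[T]):
    """Differentiable function generic in its parameter type T.

    calculate returns (value, gradient), with the gradient also of type T."""

    @abstractmethod
    def calculate(self, x: T) -> Tuple[float, T]: ...

    def value_at(self, x: T) -> float:
        return self.calculate(x)[0]

    def gradient_at(self, x: T) -> T:
        return self.calculate(x)[1]


class ScalarQuadratic(DiffFunction[float]):
    """f(x) = (x - 3)^2 over plain floats."""

    def calculate(self, x: float) -> Tuple[float, float]:
        return (x - 3.0) ** 2, 2.0 * (x - 3.0)


class ListQuadratic(DiffFunction[List[float]]):
    """f(x) = 0.5 * ||x||^2 over Python lists, whose gradient is x itself."""

    def calculate(self, x: List[float]) -> Tuple[float, List[float]]:
        return 0.5 * sum(v * v for v in x), list(x)
```

The same abstract class serves both a scalar and a list parameter type, which is the sense in which the function abstraction already accommodates other vector representations.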

> Vector-free L-BFGS
> ------------------
>
>                 Key: SPARK-10078
>                 URL: https://issues.apache.org/jira/browse/SPARK-10078
>             Project: Spark
>          Issue Type: New Feature
>          Components: ML
>            Reporter: Xiangrui Meng
>            Assignee: Yanbo Liang
>
> This is to implement a scalable version of vector-free L-BFGS (http://papers.nips.cc/paper/5333-large-scale-l-bfgs-using-mapreduce.pdf).
> Design document:
> https://docs.google.com/document/d/1VGKxhg-D-6-vZGUAZ93l3ze2f3LBvTjfHRFVpX68kaw/edit?usp=sharing



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

