flink-dev mailing list archives

From Trevor Grant <trevor.d.gr...@gmail.com>
Subject Re: A whole bag of ML issues
Date Tue, 29 Mar 2016 20:46:14 GMT
I was thinking that all IterativeSolvers would benefit from a setOptimizer
method. I didn't realize you had been working on GLM. If that is the case
(which I think is wise) then feel free to put a setOptimizer in GLM, I'll
leave it in my NeuralNetworks, and let's just try to have some consistency
in the APIs... specifically: setOptimizer is a method that takes... an
optimizer. We can default to whatever is most appropriate for each
learning algorithm.
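
To make the API shape concrete, here is a rough sketch of what I have in
mind. This is illustrative only: Solver, SimpleSGD, and HasOptimizer are
placeholder names, not the final FlinkML types.

    // Sketch of a shared mixin so GLM and NeuralNetwork can expose the
    // same setOptimizer API. Solver and SimpleSGD stand in for the real
    // optimization types.
    trait Solver
    case object SimpleSGD extends Solver

    trait HasOptimizer {
      // each learning algorithm picks its own sensible default
      protected var optimizer: Solver = SimpleSGD

      def setOptimizer(opt: Solver): this.type = {
        this.optimizer = opt
        this // fluent, like the other FlinkML parameter setters
      }
    }

Then `new NeuralNetwork().setOptimizer(myOptimizer)` and a hypothetical
`new GeneralizedLinearModel().setOptimizer(myOptimizer)` would read the
same way.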



Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org

*"Fortunate is he, who is able to know the causes of things."  -Virgil*


On Tue, Mar 29, 2016 at 3:26 PM, Theodore Vasiloudis <
theodoros.vasiloudis@gmail.com> wrote:

> > Adding a setOptimizer to IterativeSolver.
>
> Do you mean MLR here? IterativeSolver is implemented by different solvers,
> so I don't think adding a method like this makes sense there.
>
> In the case of MLR, a better alternative that includes a bit more work is
> to create a Generalized Linear Model framework that provides
> implementations for the most common linear models (ridge, lasso, etc.). I
> had already started work on this here
> <https://github.com/thvasilo/flink/commits/glm>, but never got around
> to opening a PR. The relevant JIRA is here
> <https://issues.apache.org/jira/browse/FLINK-2013>. Having a setOptimizer
> method in GeneralizedLinearModel (with some restrictions/warnings
> regarding choice of optimizer and regularization) would be the preferred
> option for me, at least.
>
> Other than that, the list looks fine :)
>
> On Tue, Mar 29, 2016 at 9:32 PM, Trevor Grant <trevor.d.grant@gmail.com>
> wrote:
>
> > OK, I'm trying to respond to you and Till in one thread so someone call
> > me out if I missed a point but here goes:
> >
> > SGD Predicting Vectors: There was discussion in the past regarding this;
> > at the time it was decided to go with only Doubles for simplicity. I feel
> > strongly that there is cause now for predicting vectors. This should be a
> > separate PR. I'll open an issue; we can refer to the earlier mailing list
> > thread and reopen the discussion on the best way to proceed.
> >
> > Warm Starts: Basically all that needs to be done here is for the
> > iterative solver to keep track of what iteration it is on, start from
> > that iteration if WarmStart == True, then go another N iterations. I
> > don't think savepoints solve this because of the way step sizes are
> > calculated in SGD, though I don't know enough about savepoints to say
> > for sure. As Till said, and I agree, very simple fix. Use cases: testing
> > how new features (e.g. step sizes) increase / decrease convergence,
> > e.g. fit a model in 1000-data-point bursts and measure the error, and
> > see how it decreases as time goes on. Also, model updates, e.g. I have a
> > huge model that gets trained on a year of data and takes a day or two to
> > do so, but after that I just want to update it nightly with the data
> > from the last 24 hours, or at the extreme, online learning, where every
> > new data point updates the model.
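> >
> > To make the counter idea concrete, here is a rough, self-contained
> > sketch (not the actual IterativeSolver code; the names and the 1/sqrt(t)
> > decay are just assumptions for illustration):
> >
> >     // Keep a running iteration count so the step size keeps decaying
> >     // across successive fit() calls instead of resetting to iteration 1.
> >     class WarmStartableSolver(stepSize: Double, iterationsPerFit: Int) {
> >       private var iterationsSoFar = 0
> >
> >       def fit(warmStart: Boolean): Unit = {
> >         if (!warmStart) iterationsSoFar = 0
> >         val start = iterationsSoFar + 1
> >         val end = iterationsSoFar + iterationsPerFit
> >         for (t <- start to end) {
> >           val effectiveStep = stepSize / math.sqrt(t)
> >           // ... run one SGD pass over the data with effectiveStep ...
> >         }
> >         iterationsSoFar = end
> >       }
> >     }
> >
> > With this, two warm fit() calls of 500 iterations each follow the same
> > step-size schedule as one cold fit of 1000 iterations.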
> >
> > Model Grading Metrics:  I'll chime in on the PR you mentioned.
> >
> > Weight Arrays vs. Weight Vectors: The consensus seems to be that
> > winding/unwinding arrays of matrices into vectors is best done inside
> > the methods that need such functionality. I'm OK with that, as I have
> > such things working rather elegantly, but wanted to throw it out there
> > anyway.
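> >
> > For reference, a minimal sketch of the wind/unwind step (assuming
> > flink.ml.math's DenseMatrix(numRows, numCols, data) layout; the helper
> > names are mine):
> >
> >     import org.apache.flink.ml.math.{DenseMatrix, DenseVector}
> >
> >     // Flatten an array of weight matrices into one vector for the
> >     // optimizer.
> >     def flatten(weights: Array[DenseMatrix]): DenseVector =
> >       DenseVector(weights.flatMap(_.data))
> >
> >     // Rebuild the matrices from the vector, given the layer shapes.
> >     def unflatten(v: DenseVector,
> >                   shapes: Array[(Int, Int)]): Array[DenseMatrix] = {
> >       var offset = 0
> >       shapes.map { case (rows, cols) =>
> >         val slice = v.data.slice(offset, offset + rows * cols)
> >         offset += rows * cols
> >         DenseMatrix(rows, cols, slice)
> >       }
> >     }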
> >
> > BLAS ops for matrices:  I'll take care of this in my code.
> >
> > Adding a 'setOptimizer' parameter to IterativeSolver: Theodore deferred
> > to Till, Till said open a PR. I'll make the default SimpleSGD to
> > maintain backwards compatibility.
> >
> > New issues to create:
> > [  ] Optimizer to predict vectors or Doubles and maintain backwards
> > compatibility.
> > [  ] Warm start functionality.
> > [  ] setOptimizer on IterativeSolver, defaulting to SimpleSGD.
> > [  ] Add neuralnets package to FlinkML (multilayer perceptron is the
> > first iteration, other flavors to follow).
> >
> > Let me know if I missed anything. I'm guessing you guys are done for the
> > day, so I'll wait until tomorrow night my time (Chicago) before I move
> > ahead on anything, to give you a chance to respond.
> >
> > Thanks!
> > tg
> >
> >
> > Trevor Grant
> > Data Scientist
> > https://github.com/rawkintrevo
> > http://stackexchange.com/users/3002022/rawkintrevo
> > http://trevorgrant.org
> >
> > *"Fortunate is he, who is able to know the causes of things."  -Virgil*
> >
> >
> > On Tue, Mar 29, 2016 at 4:11 AM, Theodore Vasiloudis <
> > theodoros.vasiloudis@gmail.com> wrote:
> >
> > > Hello Trevor,
> > >
> > > These are indeed a lot of issues; let's see if we can fit the
> > > discussion for all of them in one thread.
> > >
> > > I'll add some comments inline.
> > >
> > > > - Expand SGD to allow for predicting vectors instead of just Doubles.
> > >
> > > We have discussed this in the past and at that point decided that it
> > > didn't make sense to change the base SGD implementation to accommodate
> > > vectors. The alternatives that were presented at the time were to
> > > abstract away the type of the input/output in the Optimizer (allowing
> > > for both Vectors and Doubles), or to create specialized classes for
> > > each case. That also gives us greater flexibility in terms of
> > > optimizing performance.
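> > >
> > > To illustrate the first alternative, something along these lines (a
> > > plain-Scala sketch just to show the shape; the real interface would
> > > work on DataSet rather than Seq, and GenericOptimizer is a placeholder
> > > name):
> > >
> > >     import org.apache.flink.ml.math.Vector
> > >     import org.apache.flink.ml.common.WeightVector
> > >
> > >     // The label/prediction type becomes a parameter of the optimizer,
> > >     // so Double-valued and Vector-valued SGD share one interface.
> > >     trait GenericOptimizer[T] {
> > >       def optimize(data: Seq[(Vector, T)],
> > >                    initial: WeightVector): WeightVector
> > >     }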
> > >
> > > In terms of the ANN, I think you can hide away the Vectors in the
> > > implementation of the ANN model, and use the Optimizer interface as
> > > is, like A. Ulanov did with the Spark ANN implementation
> > > <https://github.com/apache/spark/pull/7621/files>.
> > >
> > > > - Allow for 'warm starts'
> > >
> > > I like the idea of having a partial_fit-like function; could you
> > > present a couple of use cases where we might use it? I'm wondering if
> > > savepoints already cover this functionality.
> > >
> > > > - A library of model grading metrics.
> > >
> > > We have a (perpetually) open PR
> > > <https://github.com/apache/flink/pull/871> for an evaluation
> > > framework. Could you expand on "Having 'calculate RSquare' as a built
> > > in method for every regressor doesn't seem like an efficient way to do
> > > this long term."
> > >
> > > > - BLAS for matrix ops (this was talked about earlier)
> > >
> > > This will be a good addition. If they are specific to the ANN
> > > implementation, however, I would hide them away from the rest of the
> > > code (and include them in that PR only) until another use case comes
> > > up.
> > >
> > > > - A neural net has Arrays of matrices of weights (instead of just a
> > > > vector).
> > >
> > > Yes, this is probably not the most efficient way to do this, but it's
> > > the "least API breaking", I'm afraid.
> > >
> > > > - The linear regression implementation currently presumes it will be
> > > > using SGD but I think that should be 'settable' as a parameter
> > >
> > > The original Optimizer was written the way you described, but we
> > > changed it later IIRC to make it more accessible (e.g. for users that
> > > don't know that you can't combine L1 regularization with L-BFGS).
> > > Maybe Till can say more about the other reasons this was changed.
> > >
> > >
> > > On Mon, Mar 28, 2016 at 8:01 PM, Trevor Grant <trevor.d.grant@gmail.com>
> > > wrote:
> > >
> > > > Hey,
> > > >
> > > > I have a working prototype of a multilayer perceptron implementation
> > > > in Flink.
> > > >
> > > > I made every effort to utilize existing code where possible.
> > > >
> > > > In the process of doing this there were some hacks I want/need, and I
> > > > think this should be broken up into multiple PRs, and possibly the
> > > > whole thing should be abstracted out, because the MLP implementation
> > > > I came up with is itself designed to be extendable to Long Short-Term
> > > > Memory networks.
> > > >
> > > > At a top level, here are some of the sub-PRs:
> > > >
> > > > - Expand SGD to allow for predicting vectors instead of just Doubles.
> > > > This allows the same NN code (and other algos) to be used for
> > > > classification, transformations, and regressions.
> > > >
> > > > - Allow for 'warm starts' -> this requires adding a parameter to
> > > > IterativeSolver that basically starts on iteration N. This is
> > > > somewhat akin to the idea of partial fits in sklearn, OR making the
> > > > iterative solver have some sort of internal counter so that when you
> > > > call 'fit' it just runs another N iterations (which is set by
> > > > setIterations) instead of assuming it is back to zero. This might
> > > > seem trivial but has significant impact on step size calculations.
> > > >
> > > > - A library of model grading metrics. Having 'calculate RSquare' as a
> > > > built in method for every regressor doesn't seem like an efficient
> > > > way to do this long term (see the sketch after this list).
> > > >
> > > > - BLAS for matrix ops (this was talked about earlier)
> > > >
> > > > - A neural net has Arrays of matrices of weights (instead of just a
> > > > vector). Currently I flatten the array of matrices out into a weight
> > > > vector and reassemble it into an array of matrices, though this is
> > > > probably not super efficient.
> > > >
> > > > - The linear regression implementation currently presumes it will be
> > > > using SGD, but I think that should be 'settable' as a parameter,
> > > > because if not, why do we have all of those other nice SGD methods
> > > > just hanging out? Similarly, the loss function / partial loss is hard
> > > > coded. I recommend making the current setup the 'defaults' of a
> > > > 'setOptimizer' method. I.e. if you want to just run an MLR you can do
> > > > it based on the examples, but if you want to use a fancy optimizer
> > > > you can create it from existing methods, or make your own, then call
> > > > something like `mlr.setOptimizer(myOptimizer)`
> > > >
> > > > - and more
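> > > >
> > > > As promised above, a sketch of what I mean by standalone grading
> > > > metrics (the helper is hypothetical, just to illustrate decoupling
> > > > the metric from the regressor):
> > > >
> > > >     // R^2 = 1 - SS_res / SS_tot over (truth, prediction) pairs
> > > >     def rSquared(pairs: Seq[(Double, Double)]): Double = {
> > > >       val meanY = pairs.map(_._1).sum / pairs.size
> > > >       val ssRes = pairs.map { case (y, p) => (y - p) * (y - p) }.sum
> > > >       val ssTot = pairs.map { case (y, _) => (y - meanY) * (y - meanY) }.sum
> > > >       1.0 - ssRes / ssTot
> > > >     }
> > > >
> > > > Any scorer of this shape could then be applied to any predictor's
> > > > output, instead of each regressor carrying its own built-in method.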
> > > >
> > > > At any rate, if some people could weigh in / direct me on how to
> > > > proceed, that would be swell.
> > > >
> > > > Thanks!
> > > > tg
> > > >
> > > >
> > > >
> > > >
> > > > Trevor Grant
> > > > Data Scientist
> > > > https://github.com/rawkintrevo
> > > > http://stackexchange.com/users/3002022/rawkintrevo
> > > > http://trevorgrant.org
> > > >
> > > > *"Fortunate is he, who is able to know the causes of things."
> -Virgil*
> > > >
> > >
> >
>
