OK, I'm trying to respond to you and Till in one thread, so someone call me
out if I miss a point, but here goes:
SGD Predicting Vectors: There was discussion in the past regarding this; at
the time it was decided to go with only Doubles for simplicity. I feel
strongly that there is now cause for predicting vectors. This should be a
separate PR. I'll open an issue; we can refer back to the earlier mailing
list thread and reopen the discussion on the best way to proceed.
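One way to abstract the prediction type, as discussed earlier on the list, is to make the output a type parameter so the same optimizer code can target a scalar or a vector. A minimal sketch in plain Java (the `Predictor` interface and names below are illustrative, not FlinkML API):

```java
import java.util.Arrays;

// Hypothetical abstraction: the prediction type T can be a Double (regression)
// or a double[] (vector output, e.g. an MLP output layer).
interface Predictor<T> {
    T predict(double[] features);
}

public class PredictorSketch {
    public static void main(String[] args) {
        // Scalar predictor: dot product of a weight vector with the features.
        double[] w = {0.5, -1.0};
        Predictor<Double> scalar = x -> w[0] * x[0] + w[1] * x[1];

        // Vector predictor: one weight row per output dimension.
        double[][] W = {{1.0, 0.0}, {0.0, 1.0}};
        Predictor<double[]> vector = x -> {
            double[] out = new double[W.length];
            for (int i = 0; i < W.length; i++)
                for (int j = 0; j < x.length; j++)
                    out[i] += W[i][j] * x[j];
            return out;
        };

        System.out.println(scalar.predict(new double[]{2.0, 1.0}));            // 0.0
        System.out.println(Arrays.toString(vector.predict(new double[]{3.0, 4.0}))); // [3.0, 4.0]
    }
}
```

The alternative Theodore mentions, specialized classes per case, trades this genericity for room to optimize each path separately.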
Warm Starts: Basically all that needs to be done here is for the iterative
solver to keep track of what iteration it is on, and if WarmStart == true,
start from that iteration and go another N iterations. I don't think
savepoints solve this, because of the way step sizes are calculated in SGD,
though I don't know enough about savepoints to say for sure. As Till said,
and I agree, it's a very simple fix. Use cases: testing how new features
(e.g. step-size schedules) speed up or slow down convergence, e.g. fit a
model in 1000-data-point bursts, measure the error, and see how it
decreases over time. Also, model updates: e.g. I have a huge model that
gets trained on a year of data and takes a day or two to do so, but after
that I just want to update it nightly with the data from the last 24 hours,
or, at the extreme, online learning, where every new data point updates the
model.
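To make the step-size point concrete, here is a toy sketch (illustrative names, not FlinkML code) of a solver that keeps a global iteration counter, so a warm start continues the decay schedule stepsize_t = eta0 / sqrt(t) instead of resetting it to t = 1, which is exactly what a naive restart would do:

```java
public class WarmStartSketch {
    static final double ETA0 = 1.0;
    int globalIteration = 0;   // survives across fit() calls when warm-starting
    double weight = 0.0;

    double stepSize() {
        // Decaying schedule; depends on the *global* iteration count.
        return ETA0 / Math.sqrt(globalIteration);
    }

    void fit(double[] data, int iterations, boolean warmStart) {
        if (!warmStart) globalIteration = 0;   // cold start resets the schedule
        for (int i = 0; i < iterations; i++) {
            globalIteration++;
            // Toy squared-error gradient against one data point.
            double gradient = weight - data[globalIteration % data.length];
            weight -= stepSize() * gradient;
        }
    }

    public static void main(String[] args) {
        WarmStartSketch solver = new WarmStartSketch();
        double[] batch = {1.0, 1.0, 1.0};
        solver.fit(batch, 100, false);
        // Nightly update: the first warm-started step uses eta0 / sqrt(101),
        // not eta0 / sqrt(1), so the model isn't kicked around by huge steps.
        solver.fit(batch, 10, true);
        System.out.println(solver.globalIteration); // 110
    }
}
```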
Model Grading Metrics: I'll chime in on the PR you mentioned.
Weight Arrays vs. Weight Vectors: The consensus seems to be that
winding/unwinding arrays of matrices into vectors is best done inside the
methods that need such functionality. I'm OK with that, as I have such
things working rather elegantly, but I wanted to throw it out there anyway.
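For anyone following along, the wind/unwind trick looks roughly like this (a self-contained sketch with made-up layer shapes, not the actual implementation): flatten each weight matrix row-major, layer by layer, and rebuild from recorded shapes.

```java
import java.util.Arrays;

public class FlattenSketch {
    // Wind an array of weight matrices into one flat vector, row-major.
    static double[] wind(double[][][] layers) {
        int n = 0;
        for (double[][] m : layers) n += m.length * m[0].length;
        double[] flat = new double[n];
        int k = 0;
        for (double[][] m : layers)
            for (double[] row : m)
                for (double v : row) flat[k++] = v;
        return flat;
    }

    // Unwind: rebuild matrices of the given (rows, cols) shapes.
    static double[][][] unwind(double[] flat, int[][] shapes) {
        double[][][] layers = new double[shapes.length][][];
        int k = 0;
        for (int l = 0; l < shapes.length; l++) {
            layers[l] = new double[shapes[l][0]][shapes[l][1]];
            for (int i = 0; i < shapes[l][0]; i++)
                for (int j = 0; j < shapes[l][1]; j++)
                    layers[l][i][j] = flat[k++];
        }
        return layers;
    }

    public static void main(String[] args) {
        double[][][] weights = {{{1, 2}, {3, 4}}, {{5, 6}}};   // a 2x2 and a 1x2 layer
        double[] flat = wind(weights);
        double[][][] back = unwind(flat, new int[][]{{2, 2}, {1, 2}});
        System.out.println(Arrays.toString(flat));            // [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
        System.out.println(Arrays.deepEquals(weights, back)); // true
    }
}
```

As noted, the copying isn't free, but keeping it private to the methods that need it means the WeightVector API doesn't change.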
BLAS ops for matrices: I'll take care of this in my code.
Adding a 'setOptimizer' parameter to IterativeSolver: Theodore deferred to
Till; Till said to open a PR. I'll make the default SimpleSGD to maintain
backwards compatibility.
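The shape I have in mind is roughly the following (names are illustrative, not the actual FlinkML API): the regressor holds a SimpleSGD by default so existing code is untouched, and callers who want something fancier swap it out via a builder-style setter.

```java
// Hypothetical sketch of a pluggable-optimizer parameter with a
// backwards-compatible default.
interface Optimizer {
    String name();
}

class SimpleSGD implements Optimizer {
    public String name() { return "SimpleSGD"; }
}

class LBFGS implements Optimizer {
    public String name() { return "L-BFGS"; }
}

public class SetOptimizerSketch {
    // Default keeps current behavior for users who never touch the parameter.
    private Optimizer optimizer = new SimpleSGD();

    public Optimizer getOptimizer() { return optimizer; }

    public SetOptimizerSketch setOptimizer(Optimizer o) {
        this.optimizer = o;
        return this;   // builder style, matching FlinkML's other setters
    }

    public static void main(String[] args) {
        SetOptimizerSketch mlr = new SetOptimizerSketch();
        System.out.println(mlr.getOptimizer().name());   // SimpleSGD
        mlr.setOptimizer(new LBFGS());
        System.out.println(mlr.getOptimizer().name());   // L-BFGS
    }
}
```

The same default-plus-setter pattern would cover the hard-coded loss function as well.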
New issues to create:
[ ] Optimizer to predict vectors or Doubles and maintain backwards
compatibility.
[ ] Warm Start Functionality
[ ] Add setOptimizer to IterativeSolver, with SimpleSGD as the default.
[ ] Add neuralnets package to FlinkML (Multilayer perceptron is first
iteration, other flavors to follow).
Let me know if I missed anything. I'm guessing you guys are done for the
day, so I'll wait until tomorrow night my time (Chicago) before I move
ahead on anything, to give you a chance to respond.
Thanks!
tg
Trevor Grant
Data Scientist
https://github.com/rawkintrevo
http://stackexchange.com/users/3002022/rawkintrevo
http://trevorgrant.org
*"Fortunate is he, who is able to know the causes of things." Virgil*
On Tue, Mar 29, 2016 at 4:11 AM, Theodore Vasiloudis <
theodoros.vasiloudis@gmail.com> wrote:
> Hello Trevor,
>
> These are indeed a lot of issues, let's see if we can fit the discussion
> for all of them
> in one thread.
>
> I'll add some comments inline.
>
>  Expand SGD to allow for predicting vectors instead of just Doubles.
>
>
> We have discussed this in the past and at that point decided that it didn't
> make
> sense to change the base SGD implementation to accommodate vectors.
> The alternatives that were presented at the time were to abstract away
> the type of the input/output in the Optimizer (allowing for both Vectors
> and Doubles),
> or to create specialized classes for each case. That also gives us greater
> flexibility
> in terms of optimizing performance.
>
> In terms of the ANN, I think you can hide away the Vectors in the
> implementation of the ANN
> model, and use the Optimizer interface as is, like A. Ulanov did with the
> Spark ANN implementation <https://github.com/apache/spark/pull/7621/files>.
>
>  Allow for 'warm starts'
>
>
> I like the idea of having a partial_fit-like function, could you present a
> couple
> of use cases where we might use it? I'm wondering if savepoints already
> cover
> this functionality.
>
>  A library of model grading metrics.
> >
>
> We have a (perpetually) open PR <https://github.com/apache/flink/pull/871>
> for an evaluation framework. Could you
> expand on "Having 'calculate RSquare' as a built in method for every
> regressor
> doesn't seem like an efficient way to do this long term."
>
> BLAS for matrix ops (this was talked about earlier)
>
>
> This will be a good addition. If they are specific to the ANN
> implementation
> however I would hide them away from the rest of the code (and include in
> that PR
> only) until another usecase comes up.
>
>  A neural net has Arrays of matrices of weights (instead of just a
> vector).
> >
>
> Yes this is probably not the most efficient way to do this, but it's the
> "least
> API breaking" I'm afraid.
>
>  The linear regression implementation currently presumes it will be using
> > SGD but I think that should be 'settable' as a parameter
> >
>
> The original Optimizer was written the way you described, but we changed it
> later IIRC to make it more accessible (e.g. for users that don't know that
> you can't match L1 regularization with LBFGS). Maybe Till can say more
> about the other reasons this was changed.
>
>
> On Mon, Mar 28, 2016 at 8:01 PM, Trevor Grant <trevor.d.grant@gmail.com>
> wrote:
>
> > Hey,
> >
> > I have a working prototype of a multilayer perceptron implementation
> > working in Flink.
> >
> > I made every possible effort to utilize existing code when possible.
> >
> > In the process of doing this there were some hacks I want/need, and I
> > think this should be broken up into multiple PRs, and possibly abstract
> > out the whole thing, because the MLP implementation I came up with is
> > itself designed to be extendable to Long Short-Term Memory networks.
> >
> > Top level here are some of the sub PRs
> >
> >  Expand SGD to allow for predicting vectors instead of just Doubles.
> This
> > allows the same NN code (and other algos) to be used for classification,
> > transformations, and regressions.
> >
>  Allow for 'warm starts': this requires adding a parameter to
> > IterativeSolver that basically starts on iteration N. This is somewhat
> > akin to the idea of partial fits in sklearn OR making the iterative
> solver
> > have some sort of internal counter and then when you call 'fit' it just
> > runs another N iterations (which is set by SetIterations) instead of
> > assuming it is back to zero. This might seem trivial but has significant
> > impact on step size calculations.
> >
> >  A library of model grading metrics. Having 'calculate RSquare' as a
> built
> > in method for every regressor doesn't seem like an efficient way to do
> this
> > long term.
> >
> > BLAS for matrix ops (this was talked about earlier)
> >
> >  A neural net has Arrays of matrices of weights (instead of just a
> > vector). Currently I flatten the array of matrices out into a weight
> > vector and reassemble it into an array of matrices, though this is
> probably
> > not super efficient.
> >
> >  The linear regression implementation currently presumes it will be
> using
> > SGD but I think that should be 'settable' as a parameter, because if not
> > why do we have all of those other nice SGD methods just hanging out?
> > Similarly the loss function / partial loss is hard coded. I recommend
> > making the current setup the 'defaults' of a 'setOptimizer' method. I.e.
> > if you want to just run a MLR you can do it based on the examples, but if
> > you want to use a fancy optimizer you can create it from existing
> methods,
> > or make your own, then call something like `mlr.setOptimizer( myOptimizer
> > )`
> >
> >  and more
> >
> > At any rate if some people could weigh in / direct me how to proceed
> that
> > would be swell.
> >
> > Thanks!
> > tg
> >
> >
> >
> >
> > Trevor Grant
> > Data Scientist
> > https://github.com/rawkintrevo
> > http://stackexchange.com/users/3002022/rawkintrevo
> > http://trevorgrant.org
> >
> > *"Fortunate is he, who is able to know the causes of things." Virgil*
> >
>
