Hello Trevor,
These are indeed a lot of issues; let's see if we can fit the discussion
for all of them in one thread. I'll add some comments inline.

> Expand SGD to allow for predicting vectors instead of just Doubles.

We have discussed this in the past and at that point decided that it didn't
make sense to change the base SGD implementation to accommodate vectors.
The alternatives that were presented at the time were to abstract away
the type of the input/output in the Optimizer (allowing for both Vectors
and Doubles), or to create specialized classes for each case. That also
gives us greater flexibility in terms of optimizing performance.
In terms of the ANN, I think you can hide away the Vectors in the
implementation of the ANN model, and use the Optimizer interface as is,
like A. Ulanov did with the Spark ANN implementation
<https://github.com/apache/spark/pull/7621/files>.

> Allow for 'warm starts'

I like the idea of having a partialFit-like function. Could you present a
couple of use cases where we might use it? I'm wondering if savepoints
already cover this functionality.
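
To check that I understand the "internal counter" variant you describe, here
is a rough sketch. It is plain Python for brevity, not FlinkML code, and every
name in it (IterativeSolverSketch, warm_start, the 1/sqrt(t) schedule) is my
own assumption rather than anything in our current IterativeSolver:

```python
import math


class IterativeSolverSketch:
    """Toy gradient-descent solver for y = w * x that remembers its iteration count."""

    def __init__(self, initial_step=0.1, iterations=10):
        self.initial_step = initial_step
        self.iterations = iterations
        self.weight = 0.0
        self.iterations_done = 0  # kept across fit() calls for warm starts

    def step_size(self, t):
        # Assumed decaying schedule: eta_t = eta_0 / sqrt(t), with t starting at 1.
        return self.initial_step / math.sqrt(t)

    def fit(self, data, warm_start=False):
        if not warm_start:
            # Cold start: forget the weight and restart the schedule.
            self.weight = 0.0
            self.iterations_done = 0
        for _ in range(self.iterations):
            self.iterations_done += 1
            eta = self.step_size(self.iterations_done)
            # Gradient of the mean squared error for the model y = w * x.
            grad = sum(2 * (self.weight * x - y) * x for x, y in data) / len(data)
            self.weight -= eta * grad
        return self


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
solver = IterativeSolverSketch(initial_step=0.1, iterations=10)
solver.fit(data)                   # iterations 1..10
solver.fit(data, warm_start=True)  # iterations 11..20, step size keeps decaying
print(solver.iterations_done, solver.weight)
```

With warm_start=True the second call continues the step size schedule from
iteration 11 instead of jumping back to the large initial step, which I take
to be the impact on step size calculations you mention.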

> A library of model grading metrics.

We have a (perpetually) open PR <https://github.com/apache/flink/pull/871>
for an evaluation framework. Could you expand on "Having 'calculate RSquare'
as a built in method for every regressor doesn't seem like an efficient way
to do this long term."?
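
If the point is that metrics should live outside the individual regressors, I
would picture something like the sketch below. Again this is plain Python, not
the API of the open PR; r_squared, mean_squared_error and evaluate are
hypothetical names I made up for illustration:

```python
def r_squared(pairs):
    """Coefficient of determination from (truth, prediction) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (truth, prediction) pair")
    mean_truth = sum(t for t, _ in pairs) / len(pairs)
    ss_res = sum((t - p) ** 2 for t, p in pairs)
    ss_tot = sum((t - mean_truth) ** 2 for t, _ in pairs)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


def mean_squared_error(pairs):
    """Average squared difference between truth and prediction."""
    pairs = list(pairs)
    return sum((t - p) ** 2 for t, p in pairs) / len(pairs)


def evaluate(predict, examples, metrics):
    """Score any predict function on (x, y) examples with a dict of metrics."""
    pairs = [(y, predict(x)) for x, y in examples]
    return {name: metric(pairs) for name, metric in metrics.items()}


examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
metrics = {"r2": r_squared, "mse": mean_squared_error}
print(evaluate(lambda x: 2.0 * x, examples, metrics))
```

The metrics only see (truth, prediction) pairs, so the same functions work for
any regressor and nothing has to be built into each predictor.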

> BLAS for matrix ops (this was talked about earlier)

This will be a good addition. If they are specific to the ANN implementation,
however, I would hide them away from the rest of the code (and include them
in that PR only) until another use case comes up.

> A neural net has Arrays of matrices of weights (instead of just a vector).

Yes, flattening the matrices into a single weight vector is probably not the
most efficient way to do this, but it's the "least API breaking" one, I'm
afraid.
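
For reference, the flatten-and-reassemble round trip can stay fairly small if
the layer shapes are recorded once. A minimal sketch in plain Python (not your
implementation, which I haven't seen; flatten and unflatten are hypothetical
helper names, and matrices are assumed to be non-empty lists of rows):

```python
def flatten(matrices):
    """Concatenate matrices (lists of rows) into one flat weight list, row by row."""
    shapes = [(len(m), len(m[0])) for m in matrices]
    flat = [value for m in matrices for row in m for value in row]
    return flat, shapes


def unflatten(flat, shapes):
    """Rebuild the list of matrices from the flat weights and recorded shapes."""
    matrices, offset = [], 0
    for rows, cols in shapes:
        matrix = [
            flat[offset + r * cols: offset + (r + 1) * cols] for r in range(rows)
        ]
        matrices.append(matrix)
        offset += rows * cols
    if offset != len(flat):
        raise ValueError("flat vector length does not match the layer shapes")
    return matrices


layer_weights = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],  # 3 x 2 layer
    [[7.0, 8.0, 9.0]],                     # 1 x 3 layer
]
flat, shapes = flatten(layer_weights)
assert unflatten(flat, shapes) == layer_weights
```

The optimizer would only ever see the flat list, so the Optimizer interface
stays untouched; the cost is one copy in each direction per conversion.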

> The linear regression implementation currently presumes it will be using
> SGD but I think that should be 'settable' as a parameter

The original Optimizer was written the way you described, but we changed it
later IIRC to make it more accessible (e.g. for users that don't know that
you can't combine L1 regularization with LBFGS). Maybe Till can say more
about the other reasons this was changed.
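
One way to get both (a settable optimizer with SGD as the default, plus a
guard for less experienced users) might look like the sketch below. It is
plain Python with invented names (supports_l1, set_optimizer,
set_regularization), not a proposal for the actual FlinkML signatures:

```python
class Optimizer:
    """Hypothetical base class; a real optimizer would also implement optimize()."""
    supports_l1 = True


class SGD(Optimizer):
    pass


class LBFGS(Optimizer):
    supports_l1 = False  # L1 regularization cannot be combined with LBFGS


class MultipleLinearRegression:
    def __init__(self):
        self.optimizer = SGD()      # current behaviour stays the default
        self.regularization = None

    def set_regularization(self, kind):
        self._check(self.optimizer, kind)
        self.regularization = kind
        return self

    def set_optimizer(self, optimizer):
        self._check(optimizer, self.regularization)
        self.optimizer = optimizer
        return self

    @staticmethod
    def _check(optimizer, regularization):
        # Fail early instead of letting an invalid combination run.
        if regularization == "l1" and not optimizer.supports_l1:
            name = type(optimizer).__name__
            raise ValueError(name + " does not support L1 regularization")


mlr = MultipleLinearRegression()
mlr.set_optimizer(LBFGS())
try:
    mlr.set_regularization("l1")
except ValueError as error:
    print("rejected:", error)
```

Users who follow the examples never touch the optimizer, and anyone who swaps
it gets an error at configuration time rather than a silently wrong model.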

On Mon, Mar 28, 2016 at 8:01 PM, Trevor Grant <trevor.d.grant@gmail.com> wrote:
> Hey,
>
> I have a working prototype of a multi-layer perceptron implementation
> in Flink.
>
> I made every possible effort to utilize existing code when possible.
>
> In the process of doing this there were some hacks I want/need, and think
> this should be broken up into multiple PRs and possibly abstract out the
> whole thing because the MLP implementation I came up with is itself
> designed to be extendable to Long Short Term Memory Networks.
>
> Top level here are some of the sub PRs
>
>  Expand SGD to allow for predicting vectors instead of just Doubles. This
> allows the same NN code (and other algos) to be used for classification,
> transformations, and regressions.
>
>  Allow for 'warm starts': this requires adding a parameter to
> IterativeSolver that basically starts on iteration N. This is somewhat
> akin to the idea of partial fits in sklearn OR making the iterative solver
> have some sort of internal counter and then when you call 'fit' it just
> runs another N iterations (which is set by SetIterations) instead of
> assuming it is back to zero. This might seem trivial but has significant
> impact on step size calculations.
>
>  A library of model grading metrics. Having 'calculate RSquare' as a built
> in method for every regressor doesn't seem like an efficient way to do this
> long term.
>
> BLAS for matrix ops (this was talked about earlier)
>
>  A neural net has Arrays of matrices of weights (instead of just a
> vector). Currently I flatten the array of matrices out into a weight
> vector and reassemble it into an array of matrices, though this is probably
> not super efficient.
>
>  The linear regression implementation currently presumes it will be using
> SGD but I think that should be 'settable' as a parameter, because if not
> why do we have all of those other nice SGD methods just hanging out?
> Similarly the loss function / partial loss is hard coded. I recommend
> making the current setup the 'defaults' of a 'setOptimizer' method. I.e.
> if you want to just run an MLR you can do it based on the examples, but if
> you want to use a fancy optimizer you can create it from existing methods,
> or make your own, then call something like `mlr.setOptimizer(myOptimizer)`
>
>  and more
>
> At any rate if some people could weigh in / direct me how to proceed that
> would be swell.
>
> Thanks!
> tg
>
>
>
>
> Trevor Grant
> Data Scientist
> https://github.com/rawkintrevo
> http://stackexchange.com/users/3002022/rawkintrevo
> http://trevorgrant.org
>
> *"Fortunate is he, who is able to know the causes of things." Virgil*
>
