hama-dev mailing list archives

From Thomas Jungblut <thomas.jungb...@gmail.com>
Subject Re: [ML] - data storage and basic design approach
Date Tue, 10 Jul 2012 09:43:40 GMT
I don't know if we need sparse/named vectors for the first cut.
You can just use the interface and the dense implementations and remove all
the uncompilable code in the writables.
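
To illustrate the "interface plus dense implementation plus writable" split discussed here, the following is a minimal sketch only. The names `SketchVector`, `DenseSketchVector`, `DenseSketchVectorWritable` and `SketchVectorDemo` are hypothetical and are not the tjungblut-math or Hama API; the `write`/`readFields` pair merely mirrors the shape of a Hadoop-style Writable using plain `java.io` so the example has no dependencies. Like the library described in the thread, it does no boundary checks.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical vector interface (assumption, not the real DoubleVector).
interface SketchVector {
  int getLength();
  double get(int index);
  SketchVector add(SketchVector other);
  SketchVector multiply(double scalar);
  double dot(SketchVector other);
}

// Dense implementation backed by a double array; no boundary checks.
final class DenseSketchVector implements SketchVector {
  private final double[] values;

  DenseSketchVector(double... values) {
    this.values = values.clone();
  }

  public int getLength() { return values.length; }

  public double get(int index) { return values[index]; }

  public SketchVector add(SketchVector other) {
    double[] result = new double[values.length];
    for (int i = 0; i < values.length; i++) {
      result[i] = values[i] + other.get(i);
    }
    return new DenseSketchVector(result);
  }

  public SketchVector multiply(double scalar) {
    double[] result = new double[values.length];
    for (int i = 0; i < values.length; i++) {
      result[i] = values[i] * scalar;
    }
    return new DenseSketchVector(result);
  }

  public double dot(SketchVector other) {
    double sum = 0.0;
    for (int i = 0; i < values.length; i++) {
      sum += values[i] * other.get(i);
    }
    return sum;
  }
}

// Writable-style wrapper: length first, then each component as a double.
final class DenseSketchVectorWritable {
  private SketchVector vector;

  DenseSketchVectorWritable() { }

  DenseSketchVectorWritable(SketchVector vector) { this.vector = vector; }

  SketchVector get() { return vector; }

  void write(DataOutput out) throws IOException {
    out.writeInt(vector.getLength());
    for (int i = 0; i < vector.getLength(); i++) {
      out.writeDouble(vector.get(i));
    }
  }

  void readFields(DataInput in) throws IOException {
    int length = in.readInt();
    double[] values = new double[length];
    for (int i = 0; i < length; i++) {
      values[i] = in.readDouble();
    }
    vector = new DenseSketchVector(values);
  }
}

final class SketchVectorDemo {
  // Serializes a vector to bytes and reads it back (in-memory round trip).
  static SketchVector roundTrip(SketchVector v) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      new DenseSketchVectorWritable(v).write(new DataOutputStream(bytes));
      DenseSketchVectorWritable copy = new DenseSketchVectorWritable();
      copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
      return copy.get();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Because every operation returns a new vector, calls can be chained, for example `a.add(b).multiply(2.0)`, which is the style the thread refers to when it mentions translating equations from MATLAB/Octave.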

2012/7/10 Tommaso Teofili <tommaso.teofili@gmail.com>

> Thomas, while inspecting the code I realized I may need to import most/all
> of the classes inside your math library for the writables to compile. Is
> that OK with you, or would you rather I didn't?
> Regards,
> Tommaso
>
> 2012/7/10 Thomas Jungblut <thomas.jungblut@gmail.com>
>
> > great, thank you for taking care of it ;)
> >
> > 2012/7/10 Tommaso Teofili <tommaso.teofili@gmail.com>
> >
> > > Ok, sure, I'll just add the writables along with DoubleMatrix/Vector
> > > with the AL2 headers on top.
> > > Thanks Thomas for the contribution and feedback.
> > > Tommaso
> > >
> > > 2012/7/10 Thomas Jungblut <thomas.jungblut@gmail.com>
> > >
> > > > Feel free to commit this, but take care to add the Apache license
> > > > headers.
> > > > Also I wanted to add a few test cases over the next few weekends.
> > > >
> > > > 2012/7/10 Tommaso Teofili <tommaso.teofili@gmail.com>
> > > >
> > > > > nice idea, thinking about it quickly, it looks to me that (C)GD is a
> > > > > good fit for BSP.
> > > > > Also I was trying to implement some easy meta learning algorithm like
> > > > > the weighted majority algorithm, where each peer has its own learning
> > > > > algorithm and gets penalized for each mistaken prediction.
> > > > > Regarding your math library, do you plan to commit it yourself?
> > > > > Otherwise I can do it.
> > > > > Regards,
> > > > > Tommaso
> > > > >
> > > > >
> > > > > 2012/7/10 Thomas Jungblut <thomas.jungblut@gmail.com>
> > > > >
> > > > > > Maybe a good first step towards algorithms would be to try to
> > > > > > evaluate how we can implement some non-linear optimizers in BSP
> > > > > > (BFGS or the conjugate gradient method).
> > > > > >
> > > > > > 2012/7/9 Tommaso Teofili <tommaso.teofili@gmail.com>
> > > > > >
> > > > > > > 2012/7/9 Thomas Jungblut <thomas.jungblut@gmail.com>
> > > > > > >
> > > > > > > > For the matrix/vector I would propose my library interface (quite
> > > > > > > > like Mahout's math, but without boundary checks):
> > > > > > > >
> > > > > > > > https://github.com/thomasjungblut/tjungblut-math/blob/master/src/de/jungblut/math/DoubleVector.java
> > > > > > > > https://github.com/thomasjungblut/tjungblut-math/blob/master/src/de/jungblut/math/DoubleMatrix.java
> > > > > > > >
> > > > > > > > Full Writable for Vector and basic Writable for Matrix:
> > > > > > > >
> > > > > > > > https://github.com/thomasjungblut/thomasjungblut-common/tree/master/src/de/jungblut/writable
> > > > > > > >
> > > > > > > > It is enough to implement all machine learning algorithms I've
> > > > > > > > seen until now, and the builder pattern allows really nice
> > > > > > > > chaining of commands to easily code equations or translate code
> > > > > > > > from MATLAB/Octave.
> > > > > > > > See for example the logistic regression cost function:
> > > > > > > >
> > > > > > > > https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/regression/LogisticRegressionCostFunction.java
> > > > > > >
> > > > > > >
> > > > > > > very nice, +1!
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > For the interfaces of the algorithms:
> > > > > > > > I guess we need to get some more experience. I cannot tell what
> > > > > > > > the interfaces for them should look like, mainly because I don't
> > > > > > > > know how the BSP version of them will call the algorithm logic.
> > > > > > > >
> > > > > > >
> > > > > > > you're right, it's more reasonable to just proceed bottom-up with
> > > > > > > this, as we're going to have a clearer idea while developing the
> > > > > > > different algorithms.
> > > > > > > So for now I'd introduce your library Writables and then proceed
> > > > > > > one step at a time with the more common API.
> > > > > > > Thanks,
> > > > > > > Tommaso
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > > But having stable math interfaces is the key point.
> > > > > > > >
> > > > > > > > 2012/7/9 Tommaso Teofili <tommaso.teofili@gmail.com>
> > > > > > > >
> > > > > > > > > Ok, so let's sketch out here what these interfaces should look
> > > > > > > > > like.
> > > > > > > > > Any proposal is more than welcome.
> > > > > > > > > Regards,
> > > > > > > > > Tommaso
> > > > > > > > >
> > > > > > > > > 2012/7/7 Thomas Jungblut <thomas.jungblut@gmail.com>
> > > > > > > > >
> > > > > > > > > > Looks fine to me.
> > > > > > > > > > The key is the interfaces for learning and predicting, so we
> > > > > > > > > > should define some vectors and matrices.
> > > > > > > > > > It would be enough to define the algorithms via the
> > > > > > > > > > interfaces, and a generic BSP should just run them based on
> > > > > > > > > > the given input.
> > > > > > > > > >
> > > > > > > > > > 2012/7/7 Tommaso Teofili <tommaso.teofili@gmail.com>
> > > > > > > > > >
> > > > > > > > > > > Hi all,
> > > > > > > > > > >
> > > > > > > > > > > in my spare time I started writing some basic BSP-based
> > > > > > > > > > > machine learning algorithms for our ml module. Now I'm
> > > > > > > > > > > wondering, from a design point of view, where it'd make
> > > > > > > > > > > sense to put the training data / model. I'd assume the
> > > > > > > > > > > obvious answer would be HDFS, so this makes me think we
> > > > > > > > > > > should come up with (at least) two BSP jobs for each
> > > > > > > > > > > algorithm: one for learning and one for "predicting", each
> > > > > > > > > > > to be run separately.
> > > > > > > > > > > This would allow reading the training data from HDFS and
> > > > > > > > > > > consequently creating a model (also on HDFS); the created
> > > > > > > > > > > model could then be read (again from HDFS) in order to
> > > > > > > > > > > predict an output for a new input.
> > > > > > > > > > > Does that make sense?
> > > > > > > > > > > I'm just wondering what a general purpose design for
> > > > > > > > > > > Hama-based ML stuff would look like, so this is just to
> > > > > > > > > > > start the discussion; any opinion is welcome.
> > > > > > > > > > >
> > > > > > > > > > > Cheers,
> > > > > > > > > > > Tommaso
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
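
The weighted majority idea mentioned in the thread (each peer runs its own learner and is penalized for each mistaken prediction) can be sketched as below. This is a single-process illustration only: the class name `WeightedMajoritySketch`, the penalty factor `beta`, and the binary `boolean` labels are assumptions, and the BSP messaging between Hama peers is left out entirely.

```java
import java.util.Arrays;

// Hypothetical sketch of the weighted majority algorithm; not Hama code.
final class WeightedMajoritySketch {
  private final double[] weights; // one weight per expert (peer)
  private final double beta;      // penalty factor in (0, 1), an assumption

  WeightedMajoritySketch(int numExperts, double beta) {
    this.weights = new double[numExperts];
    Arrays.fill(this.weights, 1.0); // every expert starts with equal trust
    this.beta = beta;
  }

  // Combined prediction: the label backed by the larger total weight wins
  // (ties go to true).
  boolean predict(boolean[] expertPredictions) {
    double yes = 0.0;
    double no = 0.0;
    for (int i = 0; i < weights.length; i++) {
      if (expertPredictions[i]) {
        yes += weights[i];
      } else {
        no += weights[i];
      }
    }
    return yes >= no;
  }

  // Penalize every expert whose prediction differed from the actual label.
  void update(boolean[] expertPredictions, boolean actual) {
    for (int i = 0; i < weights.length; i++) {
      if (expertPredictions[i] != actual) {
        weights[i] *= beta;
      }
    }
  }

  double weight(int expert) {
    return weights[expert];
  }
}
```

In a BSP setting one might have each peer send its prediction to a master peer in one superstep and receive the weight update in the next, but that mapping is a guess and is not something the thread specifies.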
