hama-dev mailing list archives

From Thomas Jungblut <thomas.jungb...@gmail.com>
Subject Re: [ML] - data storage and basic design approach
Date Tue, 10 Jul 2012 13:16:04 GMT
I have told him that he could use it; he uses a different approach.
You said that we could merge later, when he is ready.
First come, first served.

2012/7/10 Edward J. Yoon <edwardyoon@apache.org>

> My concern is that this looks like duplicated effort with Miklai's work.
>
> I think it needs to be organized.
>
> On Tue, Jul 10, 2012 at 8:26 PM, Thomas Jungblut
> <thomas.jungblut@gmail.com> wrote:
> > Splitting out a math module would be smarter, but let's just keep that in
> > the ML package.
> >
> > Anyone volunteer to code a simple (mini-) batch gradient descent in BSP?
> > http://holehouse.org/mlclass/17_Large_Scale_Machine_Learning.html
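As a starting point, here is a minimal stand-alone sketch of one (mini-)batch gradient descent step for linear regression, kept independent of the Hama API; in a BSP job each peer would compute such a partial gradient over its own data split and the partial gradients would be combined at the sync barrier. Class and method names here are illustrative only, not an agreed design.

    // One update: theta := theta - (alpha / b) * sum_i (h(x_i) - y_i) * x_i,
    // where h(x) = theta . x is the linear hypothesis and b is the batch size.
    public final class MiniBatchGradientDescentSketch {

      static double[] step(double[] theta, double[][] batchX, double[] batchY, double alpha) {
        int b = batchX.length;
        double[] gradient = new double[theta.length];
        for (int i = 0; i < b; i++) {
          // prediction of the current model on example i
          double prediction = 0.0;
          for (int j = 0; j < theta.length; j++) {
            prediction += theta[j] * batchX[i][j];
          }
          // accumulate (prediction - label) * x_i into the batch gradient
          double error = prediction - batchY[i];
          for (int j = 0; j < theta.length; j++) {
            gradient[j] += error * batchX[i][j];
          }
        }
        // gradient step, averaged over the mini-batch
        double[] next = new double[theta.length];
        for (int j = 0; j < theta.length; j++) {
          next[j] = theta[j] - (alpha / b) * gradient[j];
        }
        return next;
      }
    }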
> >
> >
> > 2012/7/10 Edward J. Yoon <edwardyoon@apache.org>
> >
> >> I would like to move it to the core module so that others can reuse it.
> >>
> >> On Tue, Jul 10, 2012 at 7:13 PM, Tommaso Teofili
> >> <tommaso.teofili@gmail.com> wrote:
> >> > I've done the first import, we can start from that now, thanks Thomas.
> >> > Tommaso
> >> >
> >> > 2012/7/10 Tommaso Teofili <tommaso.teofili@gmail.com>
> >> >
> >> >> ok, I'll try that, thanks :)
> >> >> Tommaso
> >> >>
> >> >> 2012/7/10 Thomas Jungblut <thomas.jungblut@gmail.com>
> >> >>
> >> >>> I don't know if we need sparse/named vectors for the first scratch.
> >> >>> You can just use the interface and the dense implementations and remove all
> >> >>> the uncompilable code in the writables.
> >> >>>
> >> >>> 2012/7/10 Tommaso Teofili <tommaso.teofili@gmail.com>
> >> >>>
> >> >>> > Thomas, while inspecting the code I realize I may need to import most/all of
> >> >>> > the classes inside your math library for the writables to compile, is it ok
> >> >>> > for you or you don't want that?
> >> >>> > Regards,
> >> >>> > Tommaso
> >> >>> >
> >> >>> > 2012/7/10 Thomas Jungblut <thomas.jungblut@gmail.com>
> >> >>> >
> >> >>> > > great, thank you for taking care of it ;)
> >> >>> > >
> >> >>> > > 2012/7/10 Tommaso Teofili <tommaso.teofili@gmail.com>
> >> >>> > >
> >> >>> > > > Ok, sure, I'll just add the writables along with DoubleMatrix/Vector with
> >> >>> > > > the AL2 headers on top.
> >> >>> > > > Thanks Thomas for the contribution and feedback.
> >> >>> > > > Tommaso
> >> >>> > > >
> >> >>> > > > 2012/7/10 Thomas Jungblut <thomas.jungblut@gmail.com>
> >> >>> > > >
> >> >>> > > > > Feel free to commit this, but take care to add the Apache license headers.
> >> >>> > > > > Also, I wanted to add a few test cases over the next few weekends.
> >> >>> > > > >
> >> >>> > > > > 2012/7/10 Tommaso Teofili <tommaso.teofili@gmail.com>
> >> >>> > > > >
> >> >>> > > > > > Nice idea; thinking about it quickly, it looks to me that (C)GD is a good
> >> >>> > > > > > fit for BSP.
> >> >>> > > > > > Also I was trying to implement some easy meta-learning algorithm like the
> >> >>> > > > > > weighted majority algorithm, where each peer has a proper learning
> >> >>> > > > > > algorithm and gets penalized for each mistaken prediction.
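For reference, a stand-alone sketch of the weighted majority update described above, independent of BSP and of any Hama API (names and the penalty factor are illustrative only): each expert keeps a weight, the combined prediction is a weighted vote, and an expert that predicts wrongly has its weight multiplied by a factor beta < 1.

    final class WeightedMajoritySketch {

      private final double[] weights; // one weight per expert (e.g. per peer's learner)
      private final double beta;      // penalty factor in (0, 1)

      WeightedMajoritySketch(int numExperts, double beta) {
        this.weights = new double[numExperts];
        java.util.Arrays.fill(this.weights, 1.0);
        this.beta = beta;
      }

      /** Combined prediction: weighted vote over the experts' boolean predictions. */
      boolean predict(boolean[] expertPredictions) {
        double yes = 0.0;
        double no = 0.0;
        for (int i = 0; i < weights.length; i++) {
          if (expertPredictions[i]) { yes += weights[i]; } else { no += weights[i]; }
        }
        return yes >= no;
      }

      /** Penalize every expert whose prediction did not match the true label. */
      void update(boolean[] expertPredictions, boolean label) {
        for (int i = 0; i < weights.length; i++) {
          if (expertPredictions[i] != label) {
            weights[i] *= beta;
          }
        }
      }
    }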
> >> >>> > > > > > Regarding your math library, do you plan to commit it yourself? Otherwise
> >> >>> > > > > > I can do it.
> >> >>> > > > > > Regards,
> >> >>> > > > > > Tommaso
> >> >>> > > > > >
> >> >>> > > > > >
> >> >>> > > > > > 2012/7/10 Thomas Jungblut <thomas.jungblut@gmail.com>
> >> >>> > > > > >
> >> >>> > > > > > > Maybe a first good step towards algorithms would be to try to evaluate
> >> >>> > > > > > > how we can implement some non-linear optimizers in BSP (BFGS or the
> >> >>> > > > > > > conjugate gradient method).
> >> >>> > > > > > >
> >> >>> > > > > > > 2012/7/9 Tommaso Teofili <tommaso.teofili@gmail.com>
> >> >>> > > > > > >
> >> >>> > > > > > > > 2012/7/9 Thomas Jungblut <thomas.jungblut@gmail.com>
> >> >>> > > > > > > >
> >> >>> > > > > > > > > For the matrix/vector I would propose my library interface (quite like
> >> >>> > > > > > > > > Mahout's math, but without boundary checks):
> >> >>> > > > > > > > >
> >> >>> > > > > > > > > https://github.com/thomasjungblut/tjungblut-math/blob/master/src/de/jungblut/math/DoubleVector.java
> >> >>> > > > > > > > > https://github.com/thomasjungblut/tjungblut-math/blob/master/src/de/jungblut/math/DoubleMatrix.java
> >> >>> > > > > > > > >
> >> >>> > > > > > > > > Full Writable for Vector and basic Writable for Matrix:
> >> >>> > > > > > > > >
> >> >>> > > > > > > > > https://github.com/thomasjungblut/thomasjungblut-common/tree/master/src/de/jungblut/writable
> >> >>> > > > > > > > >
> >> >>> > > > > > > > > It is enough to implement all machine learning algorithms I've seen until
> >> >>> > > > > > > > > now, and the builder pattern allows really nice chaining of commands to
> >> >>> > > > > > > > > easily code equations or translate code from Matlab/Octave.
> >> >>> > > > > > > > > See for example the logistic regression cost function:
> >> >>> > > > > > > > >
> >> >>> > > > > > > > > https://github.com/thomasjungblut/thomasjungblut-common/blob/master/src/de/jungblut/regression/LogisticRegressionCostFunction.java
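To illustrate the kind of chaining meant here, a self-contained sketch that codes the logistic regression cost J(theta) = -1/m * sum(y * log(h) + (1 - y) * log(1 - h)) against a tiny fluent vector type; the type and its method names are made up for this example and are not the actual tjungblut-math API.

    final class FluentVectorSketch {

      // minimal chainable vector, purely illustrative
      static final class Vec {
        final double[] values;
        Vec(double[] values) { this.values = values; }

        Vec multiply(Vec other) {            // element-wise product
          double[] r = new double[values.length];
          for (int i = 0; i < r.length; i++) r[i] = values[i] * other.values[i];
          return new Vec(r);
        }
        Vec add(Vec other) {                 // element-wise sum
          double[] r = new double[values.length];
          for (int i = 0; i < r.length; i++) r[i] = values[i] + other.values[i];
          return new Vec(r);
        }
        Vec log() {                          // element-wise natural log
          double[] r = new double[values.length];
          for (int i = 0; i < r.length; i++) r[i] = Math.log(values[i]);
          return new Vec(r);
        }
        Vec oneMinus() {                     // element-wise 1 - x
          double[] r = new double[values.length];
          for (int i = 0; i < r.length; i++) r[i] = 1.0 - values[i];
          return new Vec(r);
        }
        double sum() {
          double s = 0.0;
          for (double v : values) s += v;
          return s;
        }
      }

      /** J = -1/m * sum( y .* log(h) + (1 - y) .* log(1 - h) ), written as a chain. */
      static double logisticCost(Vec y, Vec hypothesis) {
        int m = y.values.length;
        return -y.multiply(hypothesis.log())
                 .add(y.oneMinus().multiply(hypothesis.oneMinus().log()))
                 .sum() / m;
      }
    }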
> >> >>> > > > > > > >
> >> >>> > > > > > > >
> >> >>> > > > > > > > very nice, +1!
> >> >>> > > > > > > >
> >> >>> > > > > > > >
> >> >>> > > > > > > > >
> >> >>> > > > > > > > >
> >> >>> > > > > > > > > For the interfaces of the algorithms:
> >> >>> > > > > > > > > I guess we need to get some more experience; I cannot tell how the
> >> >>> > > > > > > > > interfaces for them should look, mainly because I don't know how the
> >> >>> > > > > > > > > BSP version of them will call the algorithm logic.
> >> >>> > > > > > > > >
> >> >>> > > > > > > >
> >> >>> > > > > > > > You're right, it's more reasonable to just proceed bottom-up with this,
> >> >>> > > > > > > > as we're going to have a clearer idea while developing the different
> >> >>> > > > > > > > algorithms.
> >> >>> > > > > > > > So for now I'd introduce your library Writables and then proceed one step
> >> >>> > > > > > > > at a time with the more common API.
> >> >>> > > > > > > > Thanks,
> >> >>> > > > > > > > Tommaso
> >> >>> > > > > > > >
> >> >>> > > > > > > >
> >> >>> > > > > > > >
> >> >>> > > > > > > >
> >> >>> > > > > > > > >
> >> >>> > > > > > > > > But having stable math interfaces is the key point.
> >> >>> > > > > > > > >
> >> >>> > > > > > > > > 2012/7/9 Tommaso Teofili <tommaso.teofili@gmail.com>
> >> >>> > > > > > > > >
> >> >>> > > > > > > > > > Ok, so let's sketch up here what these interfaces should look like.
> >> >>> > > > > > > > > > Any proposal is more than welcome.
> >> >>> > > > > > > > > > Regards,
> >> >>> > > > > > > > > > Tommaso
> >> >>> > > > > > > > > >
> >> >>> > > > > > > > > > 2012/7/7 Thomas Jungblut <thomas.jungblut@gmail.com>
> >> >>> > > > > > > > > >
> >> >>> > > > > > > > > > > Looks fine to me.
> >> >>> > > > > > > > > > > The key are the interfaces for learning and predicting, so we should
> >> >>> > > > > > > > > > > define some vectors and matrices.
> >> >>> > > > > > > > > > > It would be enough to define the algorithms via the interfaces, and a
> >> >>> > > > > > > > > > > generic BSP should just run them based on the given input.
> >> >>> > > > > > > > > > >
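One possible shape such interfaces could take (a sketch only; the names Learner, Predictor, and the signatures are assumptions, not an agreed API): the algorithm is written against plain feature vectors, and a generic BSP driver would feed it the input splits and persist or load the resulting model.

    // sketch of minimal learn/predict contracts a generic BSP driver could run
    interface Learner<M> {
      /** Builds a model from feature vectors and their labels. */
      M train(Iterable<double[]> features, Iterable<Double> labels);
    }

    interface Predictor<M> {
      /** Predicts an output for a new feature vector using a previously trained model. */
      double predict(M model, double[] features);
    }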
> >> >>> > > > > > > > > > > 2012/7/7 Tommaso Teofili <tommaso.teofili@gmail.com>
> >> >>> > > > > > > > > > >
> >> >>> > > > > > > > > > > > Hi all,
> >> >>> > > > > > > > > > > >
> >> >>> > > > > > > > > > > > in my spare time I started writing some basic BSP-based machine
> >> >>> > > > > > > > > > > > learning algorithms for our ml module. Now I'm wondering, from a
> >> >>> > > > > > > > > > > > design point of view, where it'd make sense to put the training data /
> >> >>> > > > > > > > > > > > model. I'd assume the obvious answer would be HDFS, so this makes me
> >> >>> > > > > > > > > > > > think we should come up with (at least) two BSP jobs for each
> >> >>> > > > > > > > > > > > algorithm: one for learning and one for "predicting", each to be run
> >> >>> > > > > > > > > > > > separately.
> >> >>> > > > > > > > > > > > This would allow us to read the training data from HDFS and
> >> >>> > > > > > > > > > > > consequently create a model (also on HDFS), and then the created model
> >> >>> > > > > > > > > > > > could be read (again from HDFS) in order to predict an output for a
> >> >>> > > > > > > > > > > > new input.
> >> >>> > > > > > > > > > > > Does that make sense?
> >> >>> > > > > > > > > > > > I'm just wondering what a general-purpose design for Hama-based ML
> >> >>> > > > > > > > > > > > stuff would look like, so this is just to start the discussion; any
> >> >>> > > > > > > > > > > > opinion is welcome.
> >> >>> > > > > > > > > > > >
> >> >>> > > > > > > > > > > > Cheers,
> >> >>> > > > > > > > > > > > Tommaso
> >> >>> > > > > > > > > > > >
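As a rough illustration of the hand-off between the two jobs, assuming the model is represented as a Hadoop Writable (the class and path names here are placeholders), the learning job could persist the trained model to HDFS and the prediction job could load it back using only standard FileSystem/Writable calls.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.Writable;

    final class ModelStoreSketch {

      /** Called at the end of the learning job to persist the trained model. */
      static void writeModel(Configuration conf, Path modelPath, Writable model)
          throws IOException {
        FileSystem fs = FileSystem.get(conf);
        FSDataOutputStream out = fs.create(modelPath, true);
        try {
          model.write(out);
        } finally {
          out.close();
        }
      }

      /** Called in the prediction job's setup to load the previously trained model. */
      static <M extends Writable> M readModel(Configuration conf, Path modelPath, M model)
          throws IOException {
        FileSystem fs = FileSystem.get(conf);
        FSDataInputStream in = fs.open(modelPath);
        try {
          model.readFields(in);
        } finally {
          in.close();
        }
        return model;
      }
    }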
> >> >>> > > > > > > > > > >
> >> >>> > > > > > > > > >
> >> >>> > > > > > > > >
> >> >>> > > > > > > >
> >> >>> > > > > > >
> >> >>> > > > > >
> >> >>> > > > >
> >> >>> > > >
> >> >>> > >
> >> >>> >
> >> >>>
> >> >>
> >> >>
> >>
> >>
> >>
> >> --
> >> Best Regards, Edward J. Yoon
> >> @eddieyoon
> >>
>
>
>
> --
> Best Regards, Edward J. Yoon
> @eddieyoon
>
