mahout-dev mailing list archives

From Yexi Jiang <yexiji...@gmail.com>
Subject Re: [jira] [Comment Edited] (MAHOUT-1426) GSOC 2013 Neural network algorithms
Date Fri, 28 Feb 2014 01:06:27 GMT
Hi, Peng,

Do you mean the MultilayerPerceptron? There are three 'train' methods, and
only one (the one without the trackingKey and groupKey parameters) is
implemented. In the current implementation, the other two are not used.
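
For reference, a minimal sketch of the implemented overload. The three
signatures below are the ones declared by the OnlineLearner interface; the
layer setup is illustrative only, and the exact addLayer signature and
squashing-function names should be checked against the mlp classes:

    // The three overloads come from org.apache.mahout.classifier.OnlineLearner:
    //   void train(int actual, Vector instance);                                    // implemented
    //   void train(long trackingKey, String groupKey, int actual, Vector instance); // unused
    //   void train(long trackingKey, int actual, Vector instance);                  // unused
    import org.apache.mahout.classifier.mlp.MultilayerPerceptron;
    import org.apache.mahout.math.DenseVector;
    import org.apache.mahout.math.Vector;

    public class MlpTrainExample {
      public static void main(String[] args) {
        MultilayerPerceptron mlp = new MultilayerPerceptron();
        // Assumed builder style: addLayer(size, isFinalLayer, squashingFunction).
        mlp.addLayer(4, false, "Sigmoid");   // input layer
        mlp.addLayer(8, false, "Sigmoid");   // hidden layer
        mlp.addLayer(3, true, "Sigmoid");    // output layer
        Vector features = new DenseVector(new double[] {5.1, 3.5, 1.4, 0.2});
        mlp.train(0, features);              // the overload without trackingKey/groupKey
      }
    }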

Regards,
Yexi


2014-02-27 19:31 GMT-05:00 Ted Dunning <ted.dunning@gmail.com>:

> Generally for training models like this, there is an assumption that fault
> tolerance is not particularly necessary because the low risk of failure
> trades against algorithmic speed.  For reasonably small chance of failure,
> simply re-running the training is just fine.  If there is high risk of
> failure, simply checkpointing the parameter server is sufficient to allow
> restarts without redundancy.
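
A minimal sketch of that kind of checkpointing (all names here are
illustrative; a real implementation would write to a temp file and rename):

    // Periodically snapshot the parameter vector so a restart can resume from
    // the last checkpoint instead of re-running the whole training job.
    import java.io.IOException;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;

    class ParameterCheckpoint {
      static void save(double[] params, Path file) throws IOException {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(Files.newOutputStream(file))) {
          out.writeObject(params);
        }
      }

      static double[] load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(Files.newInputStream(file))) {
          return (double[]) in.readObject();
        }
      }
    }
    // In the training loop: call save() every N mini-batches; on restart, load()
    // the latest checkpoint if it exists, otherwise initialize randomly.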
>
> Sharding the parameters is quite possible and is reasonable when the
> parameter vector exceeds tens or hundreds of millions of parameters, but it
> isn't likely to be necessary below that.
>
> The asymmetry is similarly not a big deal.  The traffic to and from the
> parameter server isn't enormous.
>
>
> Building something simple and working first is a good thing.
>
>
> On Thu, Feb 27, 2014 at 3:56 PM, peng <pc175@uowmail.edu.au> wrote:
>
> > With pleasure! The original downpour paper proposes a parameter server from
> > which subnodes download shards of the old model and upload gradients. So if
> > the parameter server is down, the process has to be delayed. It also
> > requires that all model parameters be stored and atomically updated on (and
> > fetched from) a single machine, imposing asymmetric HDD and bandwidth
> > requirements. This design is necessary only because each -=delta operation
> > has to be atomic, which cannot be ensured across the network (e.g. on HDFS).
> >
> > But that doesn't mean the operation cannot be decentralized: parameters can
> > be sharded across multiple nodes, and multiple accumulator instances can
> > handle parts of the vector subtraction. This should be easy if you create a
> > buffer for the stream of gradients and allocate a proper number of producers
> > and consumers on each machine to make sure it doesn't overflow. Obviously
> > this is far from the MR framework, but at least it can be made homogeneous
> > and slightly faster (because sparse data can be distributed in a way that
> > minimizes overlap, so gradients don't have to cross the network as
> > frequently).
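
A rough sketch of that accumulator idea: each shard of the parameter vector
owns a bounded gradient queue and a single consumer thread, so the -=delta
update stays local to one machine. Shard sizes, queue capacity, and class
names are illustrative only:

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    class GradientShard implements Runnable {
      private final double[] params;                       // this shard's slice of the model
      private final BlockingQueue<double[]> gradients =
          new ArrayBlockingQueue<>(1024);                  // bounded buffer -> back-pressure

      GradientShard(int size) { this.params = new double[size]; }

      // Producers (workers) call this; blocks when the buffer is full.
      void submit(double[] gradient) throws InterruptedException {
        gradients.put(gradient);
      }

      // One consumer per shard, so the update needs no cross-node atomicity.
      @Override
      public void run() {
        try {
          while (!Thread.currentThread().isInterrupted()) {
            double[] g = gradients.take();
            for (int i = 0; i < params.length; i++) {
              params[i] -= g[i];                           // the -=delta step, local to this shard
            }
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    }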
> >
> > If we instead use a centralized architecture, then there must be >=1 backup
> > parameter server for mission-critical training.
> >
> > Yours Peng
> >
> > e.g. we can simply use a producer/consumer pattern
> >
> > If we use a producer/consumer pattern for all gradients,
> >
> > On Thu 27 Feb 2014 05:09:52 PM EST, Yexi Jiang wrote:
> >
> >> Peng,
> >>
> >> Can you provide more details about your thought?
> >>
> >> Regards,
> >>
> >>
> >> 2014-02-27 16:00 GMT-05:00 peng <pc175@uowmail.edu.au>:
> >>
> >>> That should be easy. But that defeats the purpose of using Mahout, as
> >>> there are already enough implementations of single-node backpropagation
> >>> (in which case a GPU is much faster).
> >>>
> >>> Yexi:
> >>>
> >>> Regarding downpour SGD and sandblaster, may I suggest that the
> >>> implementation would be better off without a parameter server? It is
> >>> obviously a single point of failure and, in terms of bandwidth, a
> >>> bottleneck. I heard that MLlib on top of Spark has a working
> >>> implementation (I have never read or tested it), and it's possible to
> >>> build the workflow on top of YARN. None of those frameworks has a
> >>> heterogeneous topology.
> >>>
> >>> Yours Peng
> >>>
> >>>
> >>> On Thu 27 Feb 2014 09:43:19 AM EST, Maciej Mazur (JIRA) wrote:
> >>>
> >>>
> >>>>       [ https://issues.apache.org/jira/browse/MAHOUT-1426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13913488#comment-13913488 ]
> >>>>
> >>>> Maciej Mazur edited comment on MAHOUT-1426 at 2/27/14 2:41 PM:
> >>>> ---------------------------------------------------------------
> >>>>
> >>>> I've read the papers. I didn't think about a distributed network. I had
> >>>> in mind a network that will fit into memory but will require a
> >>>> significant amount of computation.
> >>>>
> >>>> I understand that there are better options for neural networks than
> >>>> map reduce.
> >>>> How about a non-map-reduce version?
> >>>> I see that you think it is something that would make sense. (Doing a
> >>>> non-map-reduce neural network in Mahout would be of substantial
> >>>> interest.)
> >>>> Do you think it would be a valuable contribution?
> >>>> Is there a need for this type of algorithm?
> >>>> I am thinking about multi-threaded batch gradient descent with
> >>>> pretraining (RBM and/or Autoencoders).
> >>>>
> >>>> I have looked into these old JIRAs. The RBM patch was withdrawn:
> >>>> "I would rather like to withdraw that patch, because by the time i
> >>>> implemented it i didn't know that the learning algorithm is not suited
> >>>> for MR, so I think there is no point including the patch."
> >>>>
> >>>>   GSOC 2013 Neural network algorithms
> >>>>
> >>>>> -----------------------------------
> >>>>>
> >>>>>                   Key: MAHOUT-1426
> >>>>>                   URL: https://issues.apache.org/jira/browse/MAHOUT-1426
> >>>>>               Project: Mahout
> >>>>>            Issue Type: Improvement
> >>>>>            Components: Classification
> >>>>>              Reporter: Maciej Mazur
> >>>>>
> >>>>> I would like to ask about the possibilities of implementing neural
> >>>>> network algorithms in Mahout during GSOC.
> >>>>> There is a classifier.mlp package with a neural network.
> >>>>> I can see neither an RBM nor an Autoencoder in these classes.
> >>>>> There is only one mention of Autoencoders in the NeuralNetwork class.
> >>>>> As far as I know, Mahout doesn't support convolutional networks.
> >>>>> Is it a good idea to implement one of these algorithms?
> >>>>> Is it a reasonable amount of work?
> >>>>> How hard is it to get GSOC in Mahout?
> >>>>> Did anyone succeed last year?
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>> This message was sent by Atlassian JIRA
> >>>> (v6.1.5#6160)
> >>>>
> >>>>
> >>>
> >>
> >>
>



-- 
------
Yexi Jiang,
ECS 251,  yjian004@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/
