mxnet-dev mailing list archives

From Rahul Huilgol <rahulhuil...@gmail.com>
Subject Re: [Launch Announcement] Dynamic training with Apache MXNet
Date Fri, 30 Nov 2018 00:45:38 GMT
This is great stuff. Well done! A few questions:

   - Do you plan to maintain this as a separate fork, or merge it back to
   the main repository?
   - Is the number of parameter servers fixed at the start? Or can we add
   more parameter servers?
   - I see that you cannot remove any of the nodes the cluster was
   initialized with. Why are these initial nodes treated differently? Is it
   because they host the parameter servers that update the weights (and hold
   the optimizer state)?
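
For readers following the thread: here is a toy, hypothetical sketch of
synchronous parameter-server averaging (this is not the actual DT or MXNet
API; the `ParamServer` class and its methods are invented for illustration).
It shows why worker count can vary between steps while the node holding the
weights and optimizer state cannot simply be dropped:

```python
# Hypothetical sketch, not the DT/MXNet API: a parameter server averages
# gradients from however many workers are currently active, so workers can
# join or leave between steps -- but the server itself must persist, since
# it holds the authoritative weights (and, in practice, optimizer state).
class ParamServer:
    def __init__(self, weights):
        self.weights = list(weights)

    def update(self, grads_from_workers, lr=0.1):
        n = len(grads_from_workers)  # worker count may differ each step
        for i in range(len(self.weights)):
            avg = sum(g[i] for g in grads_from_workers) / n
            self.weights[i] -= lr * avg  # plain SGD step on averaged grads

ps = ParamServer([1.0, 2.0])
ps.update([[0.5, 0.5], [1.5, 1.5]])  # two workers this step
ps.update([[1.0, 1.0]])              # one worker has left the cluster
```

The averaging makes each step independent of cluster size, which is what
elasticity needs; the server holding `self.weights` is the piece that is
presumably pinned to the initial nodes.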


On Thu, Nov 29, 2018 at 4:04 PM Marco de Abreu
<marco.g.abreu@googlemail.com.invalid> wrote:

> Awesome project! Great job everyone.
>
> Am Do., 29. Nov. 2018, 19:55 hat Kumar, Vikas <vikumar@amazon.com.invalid>
> geschrieben:
>
> > A big thanks to Qi Qiao < https://github.com/mirocody > for making it
> > easy for users to set up a cluster for dynamic training using
> > CloudFormation.
> >
> > From: "Kumar, Vikas" <vikumar@amazon.com>
> > Date: Thursday, November 29, 2018 at 10:26 AM
> > To: "dev@mxnet.incubator.apache.org" <dev@mxnet.incubator.apache.org>
> > Subject: [Launch Announcement] Dynamic training with Apache MXNet
> >
> > Hello MXNet community,
> >
> > MXNet users can now use Dynamic Training (DT) for deep learning models
> > with Apache MXNet. DT reduces training cost and training time by adding
> > elasticity to the distributed training cluster. DT also improves
> > instance-pool utilization: with DT, unused instances can be added to
> > speed up training and then removed from the training cluster later so
> > that other applications can use them.
> > For details, refer to the DT blog <
> > https://aws.amazon.com/blogs/machine-learning/introducing-dynamic-training-for-deep-learning-with-amazon-ec2/
> > >.
> > Developers should be able to integrate Dynamic Training into their
> > existing distributed training code with just a few extra lines of code <
> > https://github.com/awslabs/dynamic-training-with-apache-mxnet-on-aws#writing-a-distributed-training-script
> > >.
> >
> > Thank you to all the contributors: Vikas Kumar <
> > https://github.com/Vikas89 >, Haibin Lin <
> > https://github.com/eric-haibin-lin >, Andrea Olgiati <
> > https://github.com/andreaolgiati >, Mu Li < https://github.com/mli >,
> > Hagay Lupesko < https://github.com/lupesko >, Aaron Markham <
> > https://github.com/aaronmarkham >, Sergey Sokolov <
> > https://github.com/Ishitori >, Qi Qiao < https://github.com/mirocody >
> >
> > This is an effort towards making neural network training cheaper and
> > faster. We welcome your contributions to the repo:
> > https://github.com/awslabs/dynamic-training-with-apache-mxnet-on-aws .
> > We would love to hear your feedback and ideas in this direction.
> >
> > Thanks
> > Vikas
> >
>


-- 
Rahul Huilgol
