mxnet-dev mailing list archives

From Nan Zhu <zhunanmcg...@gmail.com>
Subject Re: request for reviewing PR in ps-lite
Date Tue, 05 Dec 2017 23:04:53 GMT
Thanks Yizhi and Mu!

This is not for fault tolerance. It is about allowing multiple customers
to start in the same JVM process, dispatching messages to the correct
customer, and making synchronization mechanisms such as the Barrier
message work correctly in the new scenario.
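
To illustrate the idea, here is a minimal sketch in Python. It is not the ps-lite code and not what the PR does. The names `Customer`, `LocalProcess`, `dispatch`, and `arrive_at_barrier`, the message fields `app_id` and `customer_id`, and the rule that a barrier is only forwarded once every local customer has reached it are all assumptions made for illustration.

```python
from queue import Queue


class Customer:
    """Hypothetical endpoint; several of these live in one process."""

    def __init__(self, app_id, customer_id):
        self.app_id = app_id
        self.customer_id = customer_id
        self.inbox = Queue()  # messages routed to this customer


class LocalProcess:
    """Hypothetical per-process router shared by all local customers."""

    def __init__(self):
        self.customers = {}      # (app_id, customer_id) -> Customer
        self.at_barrier = set()  # keys of customers waiting at the barrier

    def add_customer(self, customer):
        self.customers[(customer.app_id, customer.customer_id)] = customer

    def dispatch(self, msg):
        # Route on both ids so two customers in one process never
        # receive each other's messages.
        key = (msg["app_id"], msg["customer_id"])
        customer = self.customers.get(key)
        if customer is None:
            raise KeyError(f"no customer registered for {key}")
        customer.inbox.put(msg)

    def arrive_at_barrier(self, app_id, customer_id):
        # Assumed rule: the process reports "ready" only after every
        # local customer has arrived, then resets for the next round.
        self.at_barrier.add((app_id, customer_id))
        if self.at_barrier == set(self.customers):
            self.at_barrier.clear()
            return True
        return False
```

With two customers registered, a message addressed to customer 1 lands only in that customer's inbox, and `arrive_at_barrier` returns `True` only on the second arrival.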

Best,

Nan

On Tue, Dec 5, 2017 at 1:59 PM, Mu Li <muli.cmu@gmail.com> wrote:

> @yizhi, please go ahead. I'm at NIPS this week and probably won't have
> enough time to dive deep into the code
>
> On Tue, Dec 5, 2017 at 1:51 PM, YiZhi Liu <javelinjs@gmail.com> wrote:
>
> > To my understanding, this is not about fault tolerance, i.e., restarting
> > when a worker/server fails, right?
> >
> > I can help to review. ping @Mu for advice.
> >
> > 2017-12-05 13:13 GMT-08:00 CodingCat <codingcat@apache.org>:
> >
> > > ping
> > >
> > > On Sat, Dec 2, 2017 at 10:04 AM, CodingCat <codingcat@apache.org> wrote:
> > >
> > > > ping
> > > >
> > > > On Fri, Dec 1, 2017 at 12:18 AM, Nan Zhu <zhunanmcgill@gmail.com> wrote:
> > > >
> > > >> Hi, all
> > > >>
> > > >> I have been working on integrating MXNet with Spark in a more
> > > >> full-fledged manner.
> > > >>
> > > >> One of the most critical preconditions is to make the parameter
> > > >> server in MXNet support multiple workers per process. I created
> > > >> the PR at https://github.com/dmlc/ps-lite/pull/121 (OK, sorry for
> > > >> being late... I should have finished it earlier)
> > > >>
> > > >> This PR also refactors some overly long methods. To highlight the
> > > >> changes:
> > > >>
> > > >> 1. https://github.com/dmlc/ps-lite/pull/112 includes the changes
> > > >> related to refactoring
> > > >>
> > > >> 2. https://github.com/CodingCat/ps-lite/pull/3/files includes the
> > > >> changes related to the key functionality
> > > >>
> > > >> 3. https://github.com/dmlc/ps-lite/pull/121 contains everything
> > > >> (Please review this one)
> > > >>
> > > >>
> > > >> I am not sure who the current owner of ps-lite is; please share
> > > >> your thoughts on the implementation. Only after this PR is merged
> > > >> and the ps-lite version is synced in the mxnet repo can I file the
> > > >> subsequent PRs in mxnet
> > > >>
> > > >> Thank you very much!
> > > >>
> > > >> Nan
> > > >>
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > Yizhi Liu
> > DMLC member
> > Amazon Web Services
> > Vancouver, Canada
> >
>
