incubator-hama-dev mailing list archives

From Thomas Jungblut <thomas.jungb...@googlemail.com>
Subject Re: Hang problem
Date Fri, 23 Sep 2011 15:50:47 GMT
If this is still about SSSP:
Well, I took that into account. That is the reason why there is a master
task.
There is a while(updated) loop.
The updated flag only becomes false once no updates were made globally.
Same logic in PageRank.
This is totally failsafe :p
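Below is a minimal sketch of the termination logic described above. It uses plain Java threads rather than Hama's BSP API. A java.util.concurrent.CyclicBarrier stands in for the barrier sync, and its barrier action plays the role of the master task by ORing every peer's local update flag. The workload and the names GlobalTermination and run are hypothetical. A peer that runs out of local work keeps entering the barrier and reports "no update", so every peer leaves the while(updated) loop in the same superstep.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

public class GlobalTermination {

    /** Runs the toy job and returns how many supersteps it took. */
    public static int run(int numPeers) throws InterruptedException {
        // Hypothetical workload: peer i starts with (i + 1) units of local work.
        int[] remaining = new int[numPeers];
        for (int i = 0; i < numPeers; i++) {
            remaining[i] = i + 1;
        }
        boolean[] localUpdated = new boolean[numPeers];
        boolean[] globallyUpdated = {true};
        int[] supersteps = {0};

        // The barrier action stands in for the "master task": it runs exactly
        // once per superstep, after every peer has arrived, and ORs the flags.
        CyclicBarrier barrier = new CyclicBarrier(numPeers, () -> {
            boolean any = false;
            for (boolean u : localUpdated) {
                any |= u;
            }
            globallyUpdated[0] = any;
            supersteps[0]++;
        });

        List<Thread> peers = new ArrayList<>();
        for (int p = 0; p < numPeers; p++) {
            final int id = p;
            Thread t = new Thread(() -> {
                boolean updated = true;
                while (updated) {
                    // Local compute step. A peer with no work left still
                    // reports "no update" and enters the barrier.
                    localUpdated[id] = remaining[id] > 0;
                    if (localUpdated[id]) {
                        remaining[id]--;
                    }
                    try {
                        barrier.await();
                    } catch (InterruptedException | BrokenBarrierException e) {
                        throw new IllegalStateException(e);
                    }
                    // Every peer reads the same global decision, so all of
                    // them leave the loop in the same superstep.
                    updated = globallyUpdated[0];
                }
            });
            peers.add(t);
            t.start();
        }
        for (Thread t : peers) {
            t.join();
        }
        return supersteps[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(4)); // prints 5
    }
}
```

With four peers, peer i needs i + 1 supersteps of updates. The last superstep with an update is the fourth, and a fifth superstep with no updates anywhere ends the loop, so run(4) returns 5.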

2011/9/23 Edward J. Yoon <edwardyoon@apache.org>

> In other words, all tasks should enter the next step until the whole
> job has completed successfully.
>
> On Sat, Sep 24, 2011 at 12:37 AM, Edward J. Yoon <edwardyoon@apache.org>
> wrote:
> > According to the BSPMaster log messages, a few of the tasks finished
> > with SUCCEEDED status during the iterations. If I remember correctly,
> > child processes call bspPeer.close() at the end.
> >
> > Then yes, the others will hang at the step that compares the size of
> > the znode with the initial task size.
> >
> > I wonder what happens if some task no longer needs to communicate
> > with others?
> >
> > On Fri, Sep 23, 2011 at 11:59 PM, Thomas Jungblut
> > <thomas.jungblut@googlemail.com> wrote:
> >> Well, for the SSSP example it might be correct.
> >> But you ran into the hanging problems in randbench, too.
> >>
> >>> Moreover, we have to implement our own mechanisms for high availability
> >>> if we have our own sync master server.
> >>>
> >>
> >> +1
> >>
> >> 2011/9/23 Edward J. Yoon <edwardyoon@apache.org>
> >>
> >>> As I mentioned before, it's not a ZK problem.
> >>>
> >>> Moreover, we have to implement our own mechanisms for high availability
> >>> if we have our own sync master server.
> >>>
> >>> Sent from my iPad
> >>>
> >>> On Sep 23, 2011, at 11:01 PM, Thomas Jungblut
> >>> <thomas.jungblut@googlemail.com> wrote:
> >>>
> >>> > I have made a GitHub repository for that:
> >>> > https://github.com/thomasjungblut/barriersync
> >>> >
> >>> > Check it out into your Eclipse (the root directory failed for whatever
> >>> > reason).
> >>> > Start the server and then the clientemulator.
> >>> > Works like a real charm.
> >>> >
> >>> > Please consider this as an alternative. We should not roll out a 4.0
> >>> > release with a non-working barrier sync.
> >>> >
> >>> > 2011/9/23 Thomas Jungblut <thomas.jungblut@googlemail.com>
> >>> >
> >>> >>> Won't be much different.
> >>> >>>
> >>> >>
> >>> >> Let's see.
> >>> >>
> >>> >> 2011/9/23 Edward J. Yoon <edwardyoon@apache.org>
> >>> >>
> >>> >>> What happens if some task no longer needs to communicate with
> >>> >>> others?
> >>> >>>
> >>> >>> I didn't look at the code recently, but I guess that the problem is
> >>> >>> related to the comparison of znode size and task size.
> >>> >>>
> >>> >>>> I am going to write an RPC barrier sync. Zookeeper sucks in this
> >>> >>>> case.
> >>> >>>
> >>> >>> Won't be much different. Let's focus on NG integration and the
> >>> >>> In/Output system.
> >>> >>>
> >>> >>> On Fri, Sep 23, 2011 at 8:21 PM, Thomas Jungblut
> >>> >>> <thomas.jungblut@googlemail.com> wrote:
> >>> >>>> I am going to write an RPC barrier sync. Zookeeper sucks in this
> >>> >>>> case.
> >>> >>>>
> >>> >>>> 2011/9/23 Edward J. Yoon <edwardyoon@apache.org>
> >>> >>>>
> >>> >>>>> P.S. Tested on 16 nodes with 10 tasks per node.
> >>> >>>>>
> >>> >>>>> On Fri, Sep 23, 2011 at 7:19 PM, Edward J. Yoon
> >>> >>>>> <edwardyoon@apache.org> wrote:
> >>> >>>>>> Hi,
> >>> >>>>>>
> >>> >>>>>> Today I ran the sssp example with a 4GB sample file.
> >>> >>>>>>
> >>> >>>>>> At the 32nd step, some tasks finished and the others hang forever.
> >>> >>>>>>
> >>> >>>>>> Could anyone figure out this problem?
> >>> >>>>>>
> >>> >>>>>> Plus, there are too many INFO-level logs. Let's reduce them.
> >>> >>>>>>
> >>> >>>>>> Thanks.
> >>> >>>>>>
> >>> >>>>>> --
> >>> >>>>>> Best Regards, Edward J. Yoon
> >>> >>>>>> @eddieyoon
> >>> >>>>>>
> >>> >>>>>
> >>> >>>>>
> >>> >>>>>
> >>> >>>>>
> >>> >>>>
> >>> >>>>
> >>> >>>>
> >>> >>>> --
> >>> >>>> Thomas Jungblut
> >>> >>>> Berlin
> >>> >>>>
> >>> >>>> mobile: 0170-3081070
> >>> >>>>
> >>> >>>> business: thomas.jungblut@testberichte.de
> >>> >>>> private: thomas.jungblut@gmail.com
> >>> >>>>
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>>
> >>> >>
> >>> >>
> >>> >>
> >>> >>
> >>> >
> >>> >
> >>> >
> >>>
> >>
> >>
> >>
> >>
> >
> >
> >
> >
>
>
>
>
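The hang Edward describes in the thread can be reproduced without ZooKeeper. The sketch below is a hypothetical in-memory stand-in for a count-based barrier. Like the comparison of the znode's size with the initial task size discussed above, it releases a superstep only once the number of arrivals equals the task count fixed at job start. A task that finishes early and skips the next superstep leaves the others waiting. The timeouts exist only so the demo terminates. The names BarrierHangDemo, CountingBarrier, enter and countTimeouts are illustrative and do not come from Hama.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class BarrierHangDemo {

    /**
     * Hypothetical count-based barrier (not Hama's code): a superstep is
     * released only when the number of arrivals equals the task count that
     * was fixed when the job started.
     */
    static final class CountingBarrier {
        private final int expectedTasks;
        private final Map<Integer, Integer> arrivals = new HashMap<>();

        CountingBarrier(int expectedTasks) {
            this.expectedTasks = expectedTasks;
        }

        /** Returns true once all tasks have arrived, false on timeout. */
        synchronized boolean enter(int superstep, long timeoutMillis)
                throws InterruptedException {
            arrivals.merge(superstep, 1, Integer::sum);
            notifyAll(); // wake peers already waiting on this superstep
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
            while (arrivals.get(superstep) < expectedTasks) {
                long left = deadline - System.nanoTime();
                if (left <= 0) {
                    return false; // without a timeout this would wait forever
                }
                TimeUnit.NANOSECONDS.timedWait(this, left);
            }
            return true;
        }
    }

    /** Returns how many of the remaining tasks timed out in superstep 1. */
    static int countTimeouts(int tasks, int tasksThatQuitEarly) throws InterruptedException {
        CountingBarrier barrier = new CountingBarrier(tasks);
        AtomicInteger timedOut = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            final boolean quitsEarly = i < tasksThatQuitEarly;
            Thread t = new Thread(() -> {
                try {
                    barrier.enter(0, 1000); // superstep 0: every task takes part
                    if (quitsEarly) {
                        return; // task finishes and closes, skipping superstep 1
                    }
                    if (!barrier.enter(1, 200)) {
                        timedOut.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return timedOut.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countTimeouts(3, 1)); // prints 2
        System.out.println(countTimeouts(3, 0)); // prints 0
    }
}
```

countTimeouts(3, 1) returns 2. Both remaining tasks give up in superstep 1 because the arrival count can never reach the initial task count of 3. When no task leaves early, every superstep is released normally.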



