incubator-hama-dev mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: Hang problem
Date Sat, 24 Sep 2011 00:59:53 GMT
> There is a while(updated) loop.
> The updated flag only becomes false when no updates were made globally.
> Same logic in PageRank.
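
A minimal sketch of the loop quoted above. It assumes plain Java threads stand in for BSP tasks and `java.util.concurrent.CyclicBarrier` stands in for the barrier sync; this is not Hama's API. The class name `GlobalUpdateLoop` and the `localWork` model are hypothetical. The barrier action plays the role of the master task: it merges every task's local update flag into one global flag. Because every task, including tasks with nothing left to do, reads that same global flag, all of them keep entering the next superstep and all of them leave the loop in the same superstep.

```java
import java.util.Arrays;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicBoolean;

public class GlobalUpdateLoop {

    // Hypothetical model: task i makes local updates during its first localWork[i]
    // supersteps and none afterwards. Returns how many supersteps each task ran.
    public static int[] run(int[] localWork) {
        int numTasks = localWork.length;
        AtomicBoolean anyUpdateThisStep = new AtomicBoolean(false);
        AtomicBoolean globalUpdated = new AtomicBoolean(true);
        // The barrier action stands in for the master task: it runs exactly once per
        // superstep, after every task has arrived, turns the local flags into one
        // global flag, and resets the accumulator for the next superstep.
        CyclicBarrier sync = new CyclicBarrier(numTasks,
                () -> globalUpdated.set(anyUpdateThisStep.getAndSet(false)));
        int[] steps = new int[numTasks];
        Thread[] tasks = new Thread[numTasks];
        for (int t = 0; t < numTasks; t++) {
            final int id = t;
            tasks[t] = new Thread(() -> {
                boolean updated = true;
                while (updated) {
                    steps[id]++;
                    if (steps[id] <= localWork[id]) {
                        anyUpdateThisStep.set(true); // this task changed something locally
                    }
                    try {
                        sync.await(); // every task waits here, including idle ones
                    } catch (InterruptedException | BrokenBarrierException e) {
                        throw new IllegalStateException(e);
                    }
                    // All tasks read the same global value, so they exit together.
                    updated = globalUpdated.get();
                }
            });
            tasks[t].start();
        }
        for (Thread task : tasks) {
            try {
                task.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        // Tasks stop updating after 1, 3 and 5 supersteps respectively.
        System.out.println(Arrays.toString(run(new int[] {1, 3, 5}))); // [6, 6, 6]
    }
}
```

With tasks that stop updating after 1, 3 and 5 supersteps, every task runs 6 supersteps: the fifth superstep still has an update somewhere, and the sixth is the first in which none is made globally.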

Does this mean that some processes can finish earlier than others?
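
A minimal sketch of what the question above implies for a counting barrier, again using `java.util.concurrent.CyclicBarrier` as a stand-in (an assumption; the thread describes Hama's barrier as comparing the znode size with the task size). The barrier is sized for every task of the job, so if some tasks have already exited, the remaining ones can never fill it. The hypothetical `countStuckTasks` method uses a timeout only so the demonstration terminates; a barrier without a timeout would wait forever, which resembles the hang reported in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class EarlyExitBarrier {

    // Simulates one sync in which `finishedEarly` of `numTasks` tasks have already
    // exited (like a task that called close()) and never arrive at the barrier.
    // Returns how many of the remaining tasks could not pass the barrier.
    public static int countStuckTasks(int numTasks, int finishedEarly, long timeoutMs) {
        // Sized for the initial task count, regardless of how many tasks are still alive.
        CyclicBarrier barrier = new CyclicBarrier(numTasks);
        ExecutorService pool = Executors.newFixedThreadPool(numTasks);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int t = finishedEarly; t < numTasks; t++) {
                results.add(pool.submit(() -> {
                    try {
                        barrier.await(timeoutMs, TimeUnit.MILLISECONDS);
                        return false; // passed the barrier
                    } catch (TimeoutException | BrokenBarrierException e) {
                        // One waiter times out and breaks the barrier; the others then
                        // see it broken. Without a timeout, all of them would block forever.
                        return true;
                    }
                }));
            }
            int stuck = 0;
            for (Future<Boolean> f : results) {
                if (f.get()) {
                    stuck++;
                }
            }
            return stuck;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(countStuckTasks(3, 0, 5000)); // 0: every task reaches the sync
        System.out.println(countStuckTasks(3, 1, 200));  // 2: one task left early
    }
}
```

Keeping every task in the loop until the whole job is done, so that the party count never changes, avoids this; alternatively the barrier would need to learn that a task has left.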

On Sat, Sep 24, 2011 at 12:50 AM, Thomas Jungblut
<thomas.jungblut@googlemail.com> wrote:
> If it is still about SSSP:
> Well, I took that into account. That is the reason why there is a master
> task.
> There is a while(updated) loop.
> The updated flag only becomes false when no updates were made globally.
> Same logic in PageRank.
> This is totally failsafe :p
>
> 2011/9/23 Edward J. Yoon <edwardyoon@apache.org>
>
>> In other words, all tasks should enter the next step until the whole
>> job is completed successfully.
>>
>> On Sat, Sep 24, 2011 at 12:37 AM, Edward J. Yoon <edwardyoon@apache.org>
>> wrote:
>> > According to the BSPMaster log messages, a few of the tasks finish
>> > with SUCCEEDED status during the iterations. If I remember correctly,
>> > the child processes call bspPeer.close() at the end.
>> >
>> > Then yes, the others will hang at the step that compares the znode
>> > size with the initial task size.
>> >
>> > I wonder what happens if some task no longer needs to communicate with
>> > others?
>> >
>> > On Fri, Sep 23, 2011 at 11:59 PM, Thomas Jungblut
>> > <thomas.jungblut@googlemail.com> wrote:
>> >> Well, for the SSSP example it might be correct.
>> >> But you ran into the hanging problem in randbench, too.
>> >>
>> >>> Moreover, we have to implement our own mechanisms for high availability if
>> >>> we have our own sync master server.
>> >>>
>> >>
>> >> +1
>> >>
>> >> 2011/9/23 Edward J. Yoon <edwardyoon@apache.org>
>> >>
>> >>> As I mentioned before, it's not a ZK problem.
>> >>>
>> >>> Moreover, we have to implement our own mechanisms for high availability if
>> >>> we have our own sync master server.
>> >>>
>> >>> Sent from my iPad
>> >>>
>> >>> On Sep 23, 2011, at 11:01 PM, Thomas Jungblut <
>> >>> thomas.jungblut@googlemail.com> wrote:
>> >>>
>> >>> > I have made a GitHub repository for that:
>> >>> > https://github.com/thomasjungblut/barriersync
>> >>> >
>> >>> > Check it out into your Eclipse (the root directory failed for whatever
>> >>> > reason).
>> >>> > Start the server and then the clientemulator.
>> >>> > Works like a real charm.
>> >>> >
>> >>> > Please consider this as an alternative. We should not roll out a 4.0
>> >>> > release with a non-working barrier sync.
>> >>> >
>> >>> > 2011/9/23 Thomas Jungblut <thomas.jungblut@googlemail.com>
>> >>> >
>> >>> >> It won't be much different.
>> >>> >>>
>> >>> >>
>> >>> >> Let's see.
>> >>> >>
>> >>> >> 2011/9/23 Edward J. Yoon <edwardyoon@apache.org>
>> >>> >>
>> >>> >>> What happens if some task no longer needs to communicate with
>> >>> >>> others?
>> >>> >>>
>> >>> >>> I haven't looked at the code recently, but I guess the problem is
>> >>> >>> related to the comparison of znode size and task size.
>> >>> >>>
>> >>> >>>> I am going to write an RPC barrier sync. ZooKeeper sucks in this
>> >>> >>>> case.
>> >>> >>>
>> >>> >>> It won't be much different. Let's focus on NG integration and the
>> >>> >>> In/Output system.
>> >>> >>>
>> >>> >>> On Fri, Sep 23, 2011 at 8:21 PM, Thomas Jungblut
>> >>> >>> <thomas.jungblut@googlemail.com> wrote:
>> >>> >>>> I am going to write an RPC barrier sync. ZooKeeper sucks in this
>> >>> >>>> case.
>> >>> >>>>
>> >>> >>>> 2011/9/23 Edward J. Yoon <edwardyoon@apache.org>
>> >>> >>>>
>> >>> >>>>> P.S., Tested on 16 nodes using 10 tasks per node.
>> >>> >>>>>
>> >>> >>>>> On Fri, Sep 23, 2011 at 7:19 PM, Edward J. Yoon <edwardyoon@apache.org>
>> >>> >>>>> wrote:
>> >>> >>>>>> Hi,
>> >>> >>>>>>
>> >>> >>>>>> Today I ran the SSSP example with a 4 GB sample file.
>> >>> >>>>>>
>> >>> >>>>>> At the 32nd step, some tasks finished and the others hang forever.
>> >>> >>>>>>
>> >>> >>>>>> Could anyone figure out this problem?
>> >>> >>>>>>
>> >>> >>>>>> Plus, there are too many INFO-level logs. Let's reduce them.
>> >>> >>>>>>
>> >>> >>>>>> Thanks.
>> >>> >>>>>>
>> >>> >>>>>> --
>> >>> >>>>>> Best Regards, Edward J. Yoon
>> >>> >>>>>> @eddieyoon
>> >>> >>>>>>
>> >>> >>>>>
>> >>> >>>>
>> >>> >>>>
>> >>> >>>>
>> >>> >>>> --
>> >>> >>>> Thomas Jungblut
>> >>> >>>> Berlin
>> >>> >>>>
>> >>> >>>> mobile: 0170-3081070
>> >>> >>>>
>> >>> >>>> business: thomas.jungblut@testberichte.de
>> >>> >>>> private: thomas.jungblut@gmail.com
>> >>> >>>>
>> >>> >>>
>> >>> >>>
>> >>> >>
>> >>> >>
>> >>> >
>> >>>
>> >>
>> >>
>> >
>> >
>>
>>
>
>



-- 
Best Regards, Edward J. Yoon
@eddieyoon
