hama-dev mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: [DISCUSS]Helix for global synchronization in Hama
Date Fri, 10 May 2013 01:26:24 GMT
Hi,

In classic Hama mode, the client works directly with an already existing
ZooKeeper. "ZooKeeperSyncServerImpl.java" is used for launching a
syncServer (= ZooKeeper server) on a YARN cluster.
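
To illustrate what the sync service provides between supersteps, here is a minimal in-process sketch. It is not Hama code: threads stand in for BSP peers, and java.util.concurrent.CyclicBarrier stands in for the ZooKeeper-backed syncServer. The class and method names are hypothetical.

```java
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: threads stand in for BSP peers and CyclicBarrier stands in
// for the ZooKeeper-backed sync service. None of these names are Hama APIs.
class SuperstepBarrierSketch {

    /** Runs `peers` threads through `supersteps` barriers; returns barriers crossed. */
    static int run(int peers, int supersteps) {
        AtomicInteger completedSupersteps = new AtomicInteger();
        // The barrier action runs once per superstep, after the last peer arrives.
        CyclicBarrier barrier =
            new CyclicBarrier(peers, completedSupersteps::incrementAndGet);

        Thread[] threads = new Thread[peers];
        for (int p = 0; p < peers; p++) {
            threads[p] = new Thread(() -> {
                try {
                    for (int s = 0; s < supersteps; s++) {
                        // Local computation and message sending would happen here.
                        barrier.await(); // wait until every peer has finished superstep s
                    }
                } catch (Exception e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads[p].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                throw new IllegalStateException("interrupted while waiting for peers", e);
            }
        }
        return completedSupersteps.get();
    }

    public static void main(String[] args) {
        System.out.println("supersteps completed: " + run(4, 3));
    }
}
```

In a real cluster the peers are separate processes, so the barrier has to live in an external service such as ZooKeeper rather than in shared memory.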

If a node fails, the job is restarted from the last checkpoint.
P.S. Fault tolerance (FT) does not work perfectly yet.
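
The restart behavior described above can be sketched as a small simulation. This is not Hama's implementation: the checkpoint interval, the single injected failure, and all names are assumptions made for illustration.

```java
// Illustrative simulation of job-level restart from the last checkpoint.
// The class, the method, and the fixed checkpoint interval are hypothetical.
class CheckpointRestartSketch {

    /**
     * Simulates a job of `totalSupersteps` that checkpoints every `interval`
     * supersteps and suffers one node failure when it reaches superstep `failAt`.
     * Returns the number of superstep executions, including re-executed ones.
     */
    static int run(int totalSupersteps, int interval, int failAt) {
        int lastCheckpoint = 0; // superstep to resume from after a failure
        int executions = 0;
        boolean failed = false;
        int step = 0;
        while (step < totalSupersteps) {
            if (!failed && step == failAt) {
                failed = true;
                step = lastCheckpoint; // the whole job rolls back, not a single task
                continue;
            }
            executions++;
            step++;
            if (step % interval == 0) {
                lastCheckpoint = step; // state after `step` supersteps is now durable
            }
        }
        return executions;
    }

    public static void main(String[] args) {
        System.out.println("superstep executions: " + run(10, 4, 6));
    }
}
```

The gap between the returned count and `totalSupersteps` is the work repeated by every peer after one failure, which is the cost a task-level restart would try to reduce.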


On Fri, May 10, 2013 at 1:51 AM, kishore g <g.kishore@gmail.com> wrote:
> Thanks Edward,
>
> I looked at the code and it looks like it's nicely abstracted. I see some
> comments in the code that say this happens only in YARN. Can you give me
> some additional info on what the difference is when running with YARN?
>
> Another thing I wanted to check is what happens when a node fails: is the
> entire job restarted, or just the super step, or just the sub task of the
> super step? I am interested in the current behavior and what would be nice
> to have.
>
> Is there a document that describes the internal architecture?
>
> Thanks,
> Kishore G
>
>
>
>
> On Wed, May 8, 2013 at 6:21 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>
>> Hi,
>>
>> This would be a great collaboration. Since we pursue pluggable
>> interfaces for managing the synchronization[1], messenger, and job
>> scheduling systems (we want to preserve the classic (standalone)
>> cluster mode, while integrating with resource manager systems), the
>> integration with Helix won't be difficult.
>>
>> 1. http://wiki.apache.org/hama/SyncService
>>
>> On Thu, May 9, 2013 at 7:01 AM, kishore g <g.kishore@gmail.com> wrote:
>> > Hello,
>> >
>> > I am starting a discussion thread on potential pros/cons of using Helix in
>> > Hama. I don't know the internal details of Hama, so please correct me if
>> > something does not make sense.
>> >
>> > My source of information is http://wiki.apache.org/hama/Architecture and a
>> > brief chat with Suraj at ApacheCon where he described the need for barriers
>> > between super steps.
>> >
>> > Please read about Apache Helix here http://helix.incubator.apache.org/.
>> >
>> > Architecture-wise, Helix maps pretty well to the components in Hama.
>> > HelixController can be wrapped inside BSPMaster, and GroomServer is the
>> > PARTICIPANT in Helix terminology that wraps Helix Agent.
>> >
>> > The partitioning and assigning of tasks to GroomServers can be done via
>> > Helix APIs; it basically boils down to setting the IdealState for a
>> > particular stage. Starting the next step, which basically depends on all
>> > tasks in the previous step being completed, can be done by watching the
>> > ExternalView.
>> >
>> > In the architecture wiki, I see that there is a plan to integrate with
>> > Zookeeper for fault tolerance. Helix internally uses Zookeeper to store the
>> > cluster state. So it might make it easier to make the tasks fault tolerant
>> > and probably restartable as well at a task level instead of job/stage level.
>> >
>> > We recently added a recipe in Helix to demonstrate the concept of
>> > dependency between resources.
>> >
>> > http://helix.incubator.apache.org/recipes/task_dag_execution.html
>> > Code:
>> > https://github.com/apache/incubator-helix/tree/master/recipes/task-execution/src/main/java/org/apache/helix/taskexecution
>> >
>> > Let me know your thoughts.
>> >
>> > thanks,
>> > Kishore G
>>
>>
>>
>> --
>> Best Regards, Edward J. Yoon
>> @eddieyoon
>>



-- 
Best Regards, Edward J. Yoon
@eddieyoon
