incubator-hama-dev mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: [Hama Wiki] Update of "GroomServerFaultTolerance" by ChiaHungLin
Date Tue, 05 Apr 2011 13:24:56 GMT
+1

Once the architecture is done, you can break the work down into small
tasks, and you can also lead a whole part of the FT system.

On Tue, Apr 5, 2011 at 8:08 PM, Apache Wiki <wikidiffs@apache.org> wrote:
> Dear Wiki user,
>
> You have subscribed to a wiki page or wiki category on "Hama Wiki" for change notification.
>
> The "GroomServerFaultTolerance" page has been changed by ChiaHungLin.
> http://wiki.apache.org/hama/GroomServerFaultTolerance?action=diff&rev1=2&rev2=3
>
> --------------------------------------------------
>
>
>  === Architecture ===
>
> + * The NodeManager embedded in each GroomServer periodically sends heartbeats to the NodeMonitor in the BSPMaster. // Can't attach diagram
> +
> + * One of the GroomServers fails, meaning the BSPMaster stops receiving heartbeats from that particular GroomServer. // Can't attach diagram
> +
> + * The NodeMonitor collects metrics (CPU, memory, tasks, etc.) from the healthy NodeManagers. // Can't attach diagram
> +
> + * Dispatch task(s) to GroomServer(s). // Can't attach diagram
> +
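The heartbeat and failure-detection steps above can be sketched as follows. This is a minimal illustration assuming a fixed heartbeat timeout and caller-supplied timestamps; the class `NodeMonitorSketch` and its methods are hypothetical and are not Hama's actual NodeMonitor API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of heartbeat-based failure detection in the NodeMonitor.
 * Timestamps (milliseconds) are passed in explicitly so the logic is deterministic.
 */
public class NodeMonitorSketch {
    private final long timeoutMillis;                          // assumed heartbeat timeout
    private final Map<String, Long> lastHeartbeat = new HashMap<>();

    public NodeMonitorSketch(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Records a heartbeat sent by the NodeManager of the given GroomServer. */
    public void onHeartbeat(String groomServer, long nowMillis) {
        lastHeartbeat.put(groomServer, nowMillis);
    }

    /** Returns every GroomServer whose last heartbeat is older than the timeout. */
    public List<String> detectFailures(long nowMillis) {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            if (nowMillis - e.getValue() > timeoutMillis) {
                failed.add(e.getKey());
            }
        }
        return failed;
    }

    public static void main(String[] args) {
        NodeMonitorSketch monitor = new NodeMonitorSketch(3000);
        monitor.onHeartbeat("groom-1", 0);
        monitor.onHeartbeat("groom-2", 0);
        monitor.onHeartbeat("groom-1", 4000);              // groom-2 stays silent
        System.out.println(monitor.detectFailures(5000));  // prints [groom-2]
    }
}
```

A timeout-based detector like this cannot distinguish a crashed GroomServer from a slow network, so the timeout value is a trade-off between detection speed and false positives.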
> + 1. The NodeMonitor notifies the TaskScheduler of the GroomServer failure and moves the failed GroomServer to a blacklist (it is moved back when the failed GroomServer restarts).
> +
> + 2. The TaskScheduler searches the node list for GroomServer(s) whose workload is light (which GroomServer is chosen depends on the scheduling policy).
> +
> + 3. Update the task(s) in JobInProgress by assigning the failed tasks to the GroomServer(s) found in the previous step.
> +
> + 4. Dispatch the task(s) to the designated GroomServer(s).
> +
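Recovery steps 1 to 4 can be sketched as below. This assumes one possible policy for step 2 (pick the healthy GroomServer with the fewest running tasks) and models JobInProgress as a plain task-to-GroomServer map; `RecoverySketch` and all of its members are hypothetical, not Hama's TaskScheduler or JobInProgress API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the failure-recovery steps; names are illustrative only. */
public class RecoverySketch {
    final Set<String> blacklist = new HashSet<>();
    final Map<String, Integer> load = new HashMap<>();       // GroomServer -> running task count
    final Map<String, String> assignment = new HashMap<>();  // taskId -> GroomServer

    /** Step 1: on a reported failure, blacklist the GroomServer and recover its tasks. */
    void onGroomFailure(String groom) {
        blacklist.add(groom);
        reassignTasksOf(groom);
    }

    /** Step 2 (assumed policy): the non-blacklisted GroomServer with the lowest load. */
    String pickLightestGroom() {
        String best = null;
        for (Map.Entry<String, Integer> e : load.entrySet()) {
            if (blacklist.contains(e.getKey())) continue;
            if (best == null || e.getValue() < load.get(best)) best = e.getKey();
        }
        return best;
    }

    /** Steps 3-4: reassign the failed tasks and record the dispatch by bumping the load. */
    void reassignTasksOf(String failed) {
        List<String> orphaned = new ArrayList<>();
        for (Map.Entry<String, String> e : assignment.entrySet()) {
            if (e.getValue().equals(failed)) orphaned.add(e.getKey());
        }
        for (String task : orphaned) {
            String target = pickLightestGroom();
            if (target == null) throw new IllegalStateException("no healthy GroomServer");
            assignment.put(task, target);
            load.merge(target, 1, Integer::sum);
        }
        load.put(failed, 0);
    }

    /** A restarted GroomServer is moved back off the blacklist. */
    void onGroomRestart(String groom) {
        blacklist.remove(groom);
    }

    public static void main(String[] args) {
        RecoverySketch s = new RecoverySketch();
        s.load.put("groom-1", 2);
        s.load.put("groom-2", 0);
        s.load.put("groom-3", 5);
        s.assignment.put("task-1", "groom-1");
        s.assignment.put("task-2", "groom-1");
        s.onGroomFailure("groom-1");
        System.out.println(s.assignment.get("task-1"));  // prints groom-2
    }
}
```

Re-evaluating the lightest GroomServer for each orphaned task spreads the load as it changes; a real scheduler would use the NodeMonitor's metrics rather than a bare task count.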
>
>
>  === Glossary ===
>
> - NodeManager
> + NodeMonitor: a component that monitors the health of GroomServers.
>
> - Failure Detector
> + NodeManager: a component that collects metrics and, when the NodeMonitor requests it, reports the status of the GroomServer it runs on.
>
> - Supervisor behaviour
>
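The NodeManager's status report can be sketched with the standard `java.lang.management` and `Runtime` APIs. This is an assumption-laden illustration: `NodeManagerSketch`, `Status`, and `setRunningTasks` are hypothetical names, and the memory figures are JVM heap numbers, not host-wide memory.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

/** Hypothetical status report a NodeManager could return when the NodeMonitor asks. */
public class NodeManagerSketch {
    /** Snapshot of the metrics named on the wiki page: CPU, memory, tasks. */
    public static final class Status {
        public final int availableProcessors;
        public final double systemLoadAverage;  // negative if unavailable on this platform
        public final long freeMemoryBytes;      // JVM heap, not host memory
        public final long totalMemoryBytes;     // JVM heap, not host memory
        public final int runningTasks;

        Status(int cpus, double load, long free, long total, int tasks) {
            this.availableProcessors = cpus;
            this.systemLoadAverage = load;
            this.freeMemoryBytes = free;
            this.totalMemoryBytes = total;
            this.runningTasks = tasks;
        }
    }

    private int runningTasks;  // in a real system the GroomServer would maintain this

    public void setRunningTasks(int n) {
        runningTasks = n;
    }

    /** Collects a metrics snapshot for the local JVM and host on request. */
    public Status reportStatus() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        Runtime rt = Runtime.getRuntime();
        return new Status(rt.availableProcessors(), os.getSystemLoadAverage(),
                rt.freeMemory(), rt.totalMemory(), runningTasks);
    }

    public static void main(String[] args) {
        NodeManagerSketch nm = new NodeManagerSketch();
        nm.setRunningTasks(3);
        Status st = nm.reportStatus();
        System.out.println("cpus=" + st.availableProcessors
                + " load=" + st.systemLoadAverage
                + " tasks=" + st.runningTasks);
    }
}
```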
>  === References ===
>  [1]. Hadoop. http://hadoop.apache.org/
>



-- 
Best Regards, Edward J. Yoon
http://blog.udanax.org
http://twitter.com/eddieyoon
