giraph-user mailing list archives

From Vincentius Martin
Subject Failover Mechanism in Giraph?
Date Thu, 16 Oct 2014 17:37:10 GMT

Recently, I tried to learn Giraph by running RandomMessageBenchmark.

Under normal conditions it works fine. However, when I ran it with a slow
node in the cluster, the job never finished. Progress dropped after the map
phase reached 100%, and then the logs showed errors like this:

INFO mapred.JobClient: Task Id : attempt_201410101016_0003_m_000004_0, Status : FAILED
Task attempt_201410101016_0003_m_000004_0 failed to report status for 600 seconds. Killing!
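The "600 seconds" in that log matches the default of Hadoop 1.x's mapred.task.timeout (600000 ms): a task that neither reads input, writes output, nor updates its status within that window is killed. The sketch below is a simplified, hypothetical model of that rule, not Hadoop code; the function name and the strict ">" comparison are assumptions.

```python
# Hypothetical, simplified model of the check behind
# "failed to report status for 600 seconds. Killing!".
# mapred.task.timeout is expressed in milliseconds; 600000 ms = 600 s.

TASK_TIMEOUT_MS = 600_000  # assumed value of mapred.task.timeout (Hadoop 1.x default)


def should_kill(last_report_ms: int, now_ms: int, timeout_ms: int = TASK_TIMEOUT_MS) -> bool:
    """Return True if a task has been silent for longer than the timeout.

    The strict '>' comparison is an assumption made for illustration.
    """
    return now_ms - last_report_ms > timeout_ms


# A task on a slow node whose last status report was 601 s ago is killed;
# one that reported 300 s ago is left alone.
print(should_kill(last_report_ms=0, now_ms=601_000))  # True
print(should_kill(last_report_ms=0, now_ms=300_000))  # False
```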

So I'm curious: how does the failover mechanism work in Giraph? I believe
it uses checkpointing, but I don't know the details.
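As a rough illustration of the general checkpoint/restart idea (a toy model under stated assumptions, not Giraph's actual implementation), the sketch below runs a BSP-style loop whose state is a single counter, saves a checkpoint every `checkpoint_every` supersteps, and on a single simulated failure rolls back to the latest checkpoint and re-executes from there. All names (`run_with_checkpoints`, `fail_at`, etc.) are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def run_with_checkpoints(
    num_supersteps: int, checkpoint_every: int, fail_at: Optional[int]
) -> Tuple[int, List[int]]:
    """Toy BSP loop with periodic checkpoints and one simulated failure.

    Returns (final_state, executed), where executed lists every superstep
    actually run, including supersteps re-run after a rollback.
    """
    state = 0
    checkpoints: Dict[int, int] = {0: 0}  # superstep index -> saved state
    executed: List[int] = []
    step = 0
    failed = False
    while step < num_supersteps:
        if step == fail_at and not failed:
            failed = True  # simulate losing a worker once
            # Roll back to the latest checkpoint taken at or before this superstep.
            last = max(s for s in checkpoints if s <= step)
            step, state = last, checkpoints[last]
            continue
        executed.append(step)
        state += 1  # the "computation" of one superstep
        step += 1
        if step % checkpoint_every == 0:
            checkpoints[step] = state  # persist state after this superstep
    return state, executed


# A failure at superstep 3 rolls back to the checkpoint after superstep 2,
# so superstep 2 is re-executed, but the final state matches a failure-free run.
print(run_with_checkpoints(6, 2, fail_at=3))
print(run_with_checkpoints(6, 2, fail_at=None))
```

The trade-off this models: checkpointing more often costs more I/O during normal runs but shortens the amount of work that must be redone after a failure.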

Also, I read in the source that Giraph doesn't use speculative execution.
So what happens when a node in the cluster is problematic? Does Hadoop
redistribute the task to other workers?


Vincentius Martin
