giraph-user mailing list archives

From Yuanyuan Tian
Subject Re: about fault tolerance in Giraph
Date Mon, 18 Mar 2013 17:22:28 GMT
Can anyone help me answer the question?


From:   Yuanyuan Tian/Almaden/IBM@IBMUS
Date:   03/15/2013 02:05 PM
Subject:        about fault tolerance in Giraph


I was testing the fault tolerance of Giraph on a long-running job. I
noticed that when one of the workers threw an exception, the whole job
failed without retrying the task, even though I had turned on
checkpointing and there were map slots available in my cluster. Why wasn't
the fault tolerance mechanism working?

I was running a version of Giraph downloaded sometime in June 2012 and I 
used Netty for the communication layer. 


