giraph-user mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: about fault tolerance in Giraph
Date Mon, 18 Mar 2013 21:09:24 GMT
Hi Yuanyuan,

We haven't tested this feature in a while.  But it should work. What did 
the job report about why it failed?

Avery

On 3/18/13 10:22 AM, Yuanyuan Tian wrote:
> Can anyone help me answer the question?
>
> Yuanyuan
>
>
>
> From: Yuanyuan Tian/Almaden/IBM@IBMUS
> To: user@giraph.apache.org
> Date: 03/15/2013 02:05 PM
> Subject: about fault tolerance in Giraph
> ------------------------------------------------------------------------
>
>
>
> Hi
>
> I was testing the fault tolerance of Giraph on a long-running job. I 
> noticed that when one of the workers threw an exception, the whole job 
> failed without retrying the task, even though I had turned on 
> checkpointing and there were available map slots in my cluster. Why 
> wasn't the fault tolerance mechanism working?
>
> I was running a version of Giraph downloaded sometime in June 2012 and 
> I used Netty for the communication layer.
>
> Thanks,
>
> Yuanyuan
