giraph-dev mailing list archives

From Vishal Patel <write2vis...@gmail.com>
Subject Re: Giraph Job "Task attempt_* failed to report status" Problem
Date Wed, 22 Aug 2012 16:33:15 GMT
After several supersteps, a worker thread sometimes dies (say, it ran out
of memory). Zookeeper waits for ~10 minutes (600 seconds), then decides that
the worker is not responsive and fails the entire job. At this point, if you
have a checkpoint saved, the job will resume from there; otherwise you have
to start from scratch.
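
To make the timeout arithmetic concrete, here is a minimal sketch in Python
(not Giraph or Hadoop code). It models a liveness check that flags any task
whose last status report is older than 600 seconds, which is consistent with
the "failed to report status for 601 seconds" message quoted below. In Hadoop
1.x this limit usually comes from the `mapred.task.timeout` setting (600000 ms
by default); whether that is the setting in effect on this cluster is an
assumption. The names `Task`, `find_unresponsive`, and `resume_superstep` are
hypothetical, and the resume rule (restart from the last superstep that is a
multiple of the checkpoint frequency) is a simplified model, not Giraph's
actual recovery logic.

```python
from dataclasses import dataclass

# 600 seconds = 10 minutes; assumed to match the timeout seen in the log.
TIMEOUT_SECONDS = 600


@dataclass
class Task:
    task_id: str
    last_report: float  # time of the last status report, in seconds


def find_unresponsive(tasks, now, timeout=TIMEOUT_SECONDS):
    """Return ids of tasks that have been silent for longer than `timeout`."""
    return [t.task_id for t in tasks if now - t.last_report > timeout]


def resume_superstep(failed_superstep, checkpoint_frequency):
    """Simplified model: superstep to restart from after a failure.

    A frequency of 0 means checkpointing is disabled, so the job
    restarts from superstep 0 (from scratch).
    """
    if checkpoint_frequency <= 0:
        return 0
    return (failed_superstep // checkpoint_frequency) * checkpoint_frequency


if __name__ == "__main__":
    tasks = [Task("attempt_a", last_report=399.0),
             Task("attempt_b", last_report=400.0)]
    # At t=1000, attempt_a has been silent for 601 s, attempt_b for 600 s.
    print(find_unresponsive(tasks, now=1000.0))
    # Checkpointing disabled vs. a checkpoint every 10 supersteps.
    print(resume_superstep(47, 0), resume_superstep(47, 10))
```

With checkpointing disabled (frequency 0), as in the original question, the
model always restarts from superstep 0, which is why a failure at superstep 47
costs the whole run.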

If you run the job again it should successfully finish (or it might error
at some other superstep / worker combination).

Vishal



On Tue, Aug 21, 2012 at 10:12 PM, Amani Alonazi
<amani.alonazi@kaust.edu.sa> wrote:

> Hi all,
>
> I'm running a minimum spanning tree compute function on a Hadoop cluster
> (20 machines). After certain supersteps (e.g. superstep 47 for a graph of
> 4,194,304 vertices and 181,566,970 edges), the execution time increases
> dramatically. This is not the only problem: the job was also killed with
> "Task attempt_* failed to report status for 601 seconds. Killing!"
>
> I disabled the checkpoint feature by setting
> "CHECKPOINT_FREQUENCY_DEFAULT = 0" in GiraphJob.java. I don't need to write
> any data to disk, neither snapshots nor output. I tested the algorithm on
> a sample graph of 7 vertices and it works well.
>
> Is there any way to profile or debug a Giraph job?
> In the Giraph Stats, is the "Aggregate finished vertices" counter for
> the vertices which voted to halt? Also, is the "sent messages" counter
> per superstep or the total number of messages?
> If a vertex votes to halt, will it be activated upon receiving messages?
>
> Thanks a lot!
>
> Best,
> Amani AlOnazi
> MSc Computer Science
> King Abdullah University of Science and Technology
> Kingdom of Saudi Arabia
