giraph-user mailing list archives

From Vishal Patel <write2vis...@gmail.com>
Subject Re: Giraph Job "Task attempt_* failed to report status" Problem
Date Thu, 23 Aug 2012 15:25:26 GMT
As I said, failures on specific supersteps *might* happen, but that's not
guaranteed.

Did you run the minimum spanning tree job again? Did it finish
successfully?

On a different note, what do you mean by "submitted a job of 90
supersteps"? I don't think you can specify the number of supersteps. That
number is determined by how many iterations it takes before all vertices
vote to halt.
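
To illustrate how the superstep count falls out of the vote-to-halt rule, here is a minimal sketch in plain Python. It is not Giraph code: the function name run_max_value, the dictionary-based graph, and the max-value propagation algorithm are all assumptions chosen for illustration. It also shows a halted vertex being reactivated when a message arrives.

```python
def run_max_value(edges, values):
    """Simulate Pregel-style supersteps for max-value propagation.

    edges:  dict mapping vertex id -> list of neighbour ids (hypothetical format)
    values: dict mapping vertex id -> initial integer value
    Returns (final_values, number_of_supersteps_executed).
    """
    values = dict(values)
    halted = {v: False for v in edges}
    inbox = {v: [] for v in edges}
    superstep = 0
    while True:
        outbox = {v: [] for v in edges}
        for v in edges:
            msgs = inbox[v]
            if msgs:
                # An incoming message reactivates a vertex that voted to halt.
                halted[v] = False
            if halted[v]:
                continue
            best = max([values[v]] + msgs)
            if superstep == 0 or best > values[v]:
                # Value changed (or first superstep): tell the neighbours.
                values[v] = best
                for n in edges[v]:
                    outbox[n].append(best)
            else:
                # Nothing new to report: vote to halt.
                halted[v] = True
        superstep += 1
        # The job ends only when every vertex has halted and no messages
        # are waiting for the next superstep.
        if all(halted.values()) and not any(outbox.values()):
            return values, superstep
        inbox = outbox


chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
final_values, supersteps = run_max_value(chain, {"a": 1, "b": 2, "c": 3})
print(final_values, supersteps)
```

In this sketch the number of supersteps depends on the graph and the data, not on a job parameter. A compute function could still cap the run by having every vertex vote to halt once the superstep counter reaches a fixed limit, which may be how a fixed-iteration PageRank job ends up with a known superstep count.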



On Thu, Aug 23, 2012 at 7:58 AM, Amani Alonazi
<amani.alonazi@kaust.edu.sa> wrote:

> Thank you Vishal.
>
> But I submitted a PageRank job of 90 supersteps, 20 workers, 4,000,000
> vertices and 30 edges per vertex. The job completed successfully. I'm
> really confused.
>
> On Wed, Aug 22, 2012 at 7:33 PM, Vishal Patel <write2vishal@gmail.com> wrote:
>
>> After several supersteps, sometimes a worker thread dies (say it ran out
>> of memory). Zookeeper waits for ~10 mins (600 seconds) and then decides that
>> the worker is not responsive and fails the entire job. At this point, if you
>> have a checkpoint saved, it will resume from there; otherwise you have to
>> start from scratch.
>>
>> If you run the job again, it should finish successfully (or it might fail
>> at some other superstep / worker combination).
>>
>> Vishal
>>
>>
>>
>> On Tue, Aug 21, 2012 at 10:12 PM, Amani Alonazi <
>> amani.alonazi@kaust.edu.sa> wrote:
>>
>>> Hi all,
>>>
>>> I'm running a minimum spanning tree compute function on a Hadoop cluster
>>> (20 machines). After certain supersteps (e.g. superstep 47 for a graph of
>>> 4,194,304 vertices and 181,566,970 edges), the execution time increased
>>> dramatically. This is not the only problem: the job was also killed with
>>> the error "Task attempt_* failed to report status for 601 seconds. Killing!"
>>>
>>> I disabled the checkpoint feature by setting
>>> "CHECKPOINT_FREQUENCY_DEFAULT = 0" in GiraphJob.java. I don't need to write
>>> any data to disk, neither snapshots nor output. I tested the algorithm on
>>> a sample graph of 7 vertices and it works well.
>>>
>>> Is there any way to profile or debug a Giraph job?
>>> In the Giraph Stats, is the "Aggregate finished vertices" counter for
>>> the vertices which voted to halt? Also, is the "sent messages" counter
>>> per superstep or the total number of messages?
>>> If a vertex votes to halt, will it be reactivated upon receiving messages?
>>>
>>> Thanks a lot!
>>>
>>> Best,
>>> Amani AlOnazi
>>> MSc Computer Science
>>> King Abdullah University of Science and Technology
>>> Kingdom of Saudi Arabia
>>>
>>>
>>> ------------------------------
>>> This message and its contents, including attachments are intended solely
>>> for the original recipient. If you are not the intended recipient or have
>>> received this message in error, please notify me immediately and delete
>>> this message from your computer system. Any unauthorized use or
>>> distribution is prohibited. Please consider the environment before printing
>>> this email.
>>
>>
>>
>
>
> --
> Amani AlOnazi
> MSc Computer Science
>  King Abdullah University of Science and Technology
> Kingdom of Saudi Arabia
> amani.alonazi@kaust.edu.sa | +966 (0) 555 191 795
