giraph-dev mailing list archives

From "Eli Reisman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-274) Jobs still failing due to tasks timeout during INPUT_SUPERSTEP
Date Wed, 01 Aug 2012 17:59:03 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13426782#comment-13426782
] 

Eli Reisman commented on GIRAPH-274:
------------------------------------

Thanks for noticing! The patch I had up called progress() in more places than just around the
locks. I have been running large amounts of data all summer, and it takes forever to load. I
know it polluted the landscape with progress() calls, but the alternative was another thread,
as Avery said here, and that seemed like a worse idea: it also allowed zombies to keep running
when they had failed for all intents and purposes. When users played with this idea, our
clusters were occasionally littered with zombies that users had forgotten about once the job
seemed to fail. So...

The patch I arrived at in GIRAPH-246 worked fine and only hit a 600-second timeout when the job
had actually failed catastrophically at a particular worker. If you look through it and add the
progress() calls that your lock patch did not, it will work. I was able to spend 60+ minutes
loading huge social graph data with no trouble, and the jobs finished. Obviously the next step
is to lower that time, but progress() calls are a must. If you grab those calls, I guarantee it
will work for now, for as long as you need it to. It's been a while, but I'm fairly sure I didn't
give anyone access to the context who didn't already have it.
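
To illustrate what "progress() calls during loading" means in practice, here is a minimal
sketch, not the GIRAPH-246 patch itself. The Progressable interface below is a hypothetical
local stand-in for the task context, and loadSplit and reportEvery are names made up for this
example. The point is only that the loading thread itself reports progress as it works, so a
worker that is truly stuck stops reporting and still hits the timeout.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ProgressSketch {
    /** Hypothetical stand-in for the task context; only progress() matters here. */
    public interface Progressable {
        void progress();
    }

    /**
     * Reads every record of one (fake) input split and reports progress from
     * the loading thread itself, every reportEvery records and once at the end.
     * Returns the number of records read.
     */
    public static int loadSplit(Iterator<String> records,
                                Progressable context,
                                int reportEvery) {
        int loaded = 0;
        while (records.hasNext()) {
            records.next(); // a real loader would parse and store the vertex here
            loaded++;
            if (loaded % reportEvery == 0) {
                context.progress(); // tell the framework this task is still alive
            }
        }
        context.progress(); // final report once the split is done
        return loaded;
    }

    public static void main(String[] args) {
        List<String> fakeSplit = Arrays.asList("v1", "v2", "v3", "v4", "v5");
        int[] calls = {0}; // counts how often progress() was invoked
        int loaded = loadSplit(fakeSplit.iterator(), () -> calls[0]++, 2);
        System.out.println(loaded + " records, " + calls[0] + " progress calls");
    }
}
```

Because the reports come from the same thread that does the loading, a hung loader goes quiet
and is killed by the timeout, which is the behavior a separate keep-alive thread would lose.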

Good luck, and thanks for addressing this. GIRAPH-246 no longer applies, and I have not been
able to run any large data for a week now, so this fix will be welcome!

                
> Jobs still failing due to tasks timeout during INPUT_SUPERSTEP
> --------------------------------------------------------------
>
>                 Key: GIRAPH-274
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-274
>             Project: Giraph
>          Issue Type: Bug
>    Affects Versions: 0.2.0
>            Reporter: Jaeho Shin
>            Assignee: Jaeho Shin
>             Fix For: 0.2.0
>
>         Attachments: GIRAPH-274.patch
>
>
> Even after GIRAPH-267, jobs were failing during INPUT_SUPERSTEP when some workers don't
> get to reserve an input split, while others were loading vertices for a long time.  (related
> to GIRAPH-246 and GIRAPH-267)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
