giraph-dev mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-972) Race condition in checkpointing
Date Fri, 19 Dec 2014 00:17:14 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252615#comment-14252615 ]

Hudson commented on GIRAPH-972:
-------------------------------

ABORTED: Integrated in Giraph-trunk-Commit #1507 (See [https://builds.apache.org/job/Giraph-trunk-Commit/1507/])
GIRAPH-972 Race condition in checkpointing (edunov: http://git-wip-us.apache.org/repos/asf?p=giraph.git&a=commit&h=7f2d58445e2353a1a42fbb4282ed5cad724186b5)
* giraph-core/src/main/java/org/apache/giraph/master/BspServiceMaster.java
* CHANGELOG
* giraph-core/src/main/java/org/apache/giraph/worker/BspServiceWorker.java
* giraph-core/src/main/java/org/apache/giraph/bsp/BspService.java


> Race condition in checkpointing
> -------------------------------
>
>                 Key: GIRAPH-972
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-972
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: Sergey Edunov
>
> A couple of issues noticed with checkpointing of large jobs:
> 1) The task ID of the master appears to matter. In most cases it is 0, but sometimes
> it is not, and since we cannot control it, checkpointing should not depend on it.
> 2) A race condition occurs on the master when a worker dies:
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /_hadoopBsp/job_201411061513.38895_0001/_applicationAttemptsDir/0/_superstepDir/9/_workerHealthyDir/hadoop4921.prn2.facebook.com_3
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> 	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
> 	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1180)
> 	at org.apache.giraph.zk.ZooKeeperExt.getData(ZooKeeperExt.java:470)
> 	at org.apache.giraph.utils.WritableUtils.readFieldsFromZnode(WritableUtils.java:126)
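
For readers unfamiliar with this class of bug: the trace above is consistent with a list-then-read race, where a worker's znode is listed, the worker dies (removing its ephemeral node), and the later getData call fails with NoNode. The following is only a minimal sketch of one common way to tolerate that, not the change made in the commit. It uses an in-memory stand-in instead of ZooKeeper, and every name in it (WorkerHealthReader, FakeZk, readHealthyWorkers, the local NoNodeException) is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical illustration only; this is not Giraph's or ZooKeeper's API. */
class WorkerHealthReader {

  /** Local stand-in for KeeperException.NoNodeException. */
  static class NoNodeException extends Exception {
    NoNodeException(String path) {
      super("NoNode for " + path);
    }
  }

  /** In-memory model of a flat set of znodes: path -> data. */
  static class FakeZk {
    private final Map<String, byte[]> nodes = new ConcurrentHashMap<>();

    void create(String path, byte[] data) {
      nodes.put(path, data);
    }

    void delete(String path) {
      nodes.remove(path);
    }

    /** Snapshot of the paths that exist right now. */
    List<String> getChildren() {
      return new ArrayList<>(nodes.keySet());
    }

    byte[] getData(String path) throws NoNodeException {
      byte[] data = nodes.get(path);
      if (data == null) {
        throw new NoNodeException(path);
      }
      return data;
    }
  }

  /**
   * Reads each previously listed worker node and returns the paths that
   * still exist. A node that vanished after the listing is treated as a
   * dead worker and skipped instead of failing the whole operation.
   */
  static List<String> readHealthyWorkers(FakeZk zk, List<String> listedPaths) {
    List<String> alive = new ArrayList<>();
    for (String path : listedPaths) {
      try {
        zk.getData(path);
        alive.add(path);
      } catch (NoNodeException e) {
        // The worker died between the listing and this read; skip it.
      }
    }
    return alive;
  }
}
```

Whether skipping a vanished worker is the right reaction depends on the caller; during checkpointing the master may instead need to restart the superstep, which this sketch does not model.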



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
