incubator-hama-dev mailing list archives

From "ChiaHung Lin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HAMA-387) Advanced Barrier Synchronization
Date Sat, 24 Sep 2011 15:55:27 GMT

     [ https://issues.apache.org/jira/browse/HAMA-387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ChiaHung Lin updated HAMA-387:
------------------------------

    Attachment: conditional_wait.patch

The new patch may solve the following issues. It also addresses what is perhaps the root
cause: a task that attaches a watcher to monitor /ready may never be notified, because it
waits unconditionally. Can anyone help test whether sync() still hangs with this patch?
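
To illustrate the difference between an unconditional and a conditional wait, here is a minimal sketch. It is not the code in conditional_wait.patch; the class `ReadyGate` and its methods are hypothetical names, and the ZooKeeper watcher is replaced by a plain method call. The point is only that the waiter checks a flag inside a loop before blocking, so a notification that arrives before the wait starts is not lost.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration of a conditional (guarded) wait. */
public class ReadyGate {
    private final Object lock = new Object();
    // Records that the "ready" event has been seen, even if nobody was waiting yet.
    private boolean ready = false;

    /** Would be called from the watcher callback when the event arrives. */
    public void signalReady() {
        synchronized (lock) {
            ready = true;
            lock.notifyAll();
        }
    }

    /**
     * Blocks only while the condition is still false.
     * Returns true once ready, or false if the timeout expires first.
     */
    public boolean awaitReady(long timeoutMillis) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        synchronized (lock) {
            while (!ready) { // the loop also guards against spurious wakeups
                long remainingMillis =
                    TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMillis <= 0) {
                    return false; // never call wait(0), which would block forever
                }
                lock.wait(remainingMillis);
            }
            return true;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ReadyGate gate = new ReadyGate();
        gate.signalReady();                        // notification arrives first
        System.out.println(gate.awaitReady(1000)); // prints true, without blocking
    }
}
```

With an unconditional `lock.wait()` and no flag, the same sequence (signal first, wait second) would block until some later notification, which matches the hang described above.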

{code}
2011-09-24 15:44:33,644 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241540_0001_000005_0
11/09/24 15:44:33 ERROR bsp.BSPTask: Exception during BSP execution!
2011-09-24 15:44:33,644 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241540_0001_000005_0
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /bsp/job_201109241540_0001/4/attempt_201109241540_0001_000005_0
2011-09-24 15:44:33,644 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241540_0001_000005_0
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
2011-09-24 15:44:33,644 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241540_0001_000005_0
        at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
2011-09-24 15:44:33,644 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241540_0001_000005_0
        at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
2011-09-24 15:44:33,645 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241540_0001_000005_0
        at org.apache.hama.bsp.BSPPeer.leaveBarrier(BSPPeer.java:437)
2011-09-24 15:44:33,645 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241540_0001_000005_0
        at org.apache.hama.bsp.BSPPeer.sync(BSPPeer.java:335)
2011-09-24 15:44:33,645 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241540_0001_000005_0
        at org.apache.hama.examples.PiEstimator$MyEstimator.bsp(PiEstimator.java:80)
2011-09-24 15:44:33,645 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241540_0001_000005_0
        at org.apache.hama.bsp.BSPTask.run(BSPTask.java:60)
2011-09-24 15:44:33,645 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241540_0001_000005_0
        at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:940)
2011-09-24 15:44:33,657 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241540_0001_000005_0
11/09/24 15:44:33 INFO zookeeper.ZooKeeper: Session: 0x3329a6008840001 closed
2011-09-24 15:44:33,657 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241540_0001_000005_0
11/09/24 15:44:33 INFO ipc.Server: Stopping server on 61002
2011-09-24 15:44:33,657 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241540_0001_000005_0
log4j:WARN No appenders could be found for logger (org.apache.hadoop.ipc.Server).
2011-09-24 15:44:33,657 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241540_0001_000005_0
log4j:WARN Please initialize the log4j system properly.
2011-09-24 15:44:33,657 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241540_0001_000005_0
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
{code}

{code}
2011-09-24 16:29:09,521 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241626_0001_000011_0
11/09/24 16:29:09 WARN bsp.BSPPeer: Ignore because znode may be already created at /bsp/job_201109241626_0001/0
2011-09-24 16:29:09,522 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241626_0001_000011_0
org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /bsp/job_201109241626_0001/0
2011-09-24 16:29:09,522 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241626_0001_000011_0
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:110)
2011-09-24 16:29:09,522 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241626_0001_000011_0
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
2011-09-24 16:29:09,522 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241626_0001_000011_0
	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
2011-09-24 16:29:09,522 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241626_0001_000011_0
	at org.apache.hama.bsp.BSPPeer.createZnode(BSPPeer.java:367)
2011-09-24 16:29:09,524 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241626_0001_000011_0
	at org.apache.hama.bsp.BSPPeer.createZnode(BSPPeer.java:354)
2011-09-24 16:29:09,524 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241626_0001_000011_0
	at org.apache.hama.bsp.BSPPeer.enterBarrier(BSPPeer.java:388)
2011-09-24 16:29:09,524 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241626_0001_000011_0
	at org.apache.hama.bsp.BSPPeer.sync(BSPPeer.java:308)
2011-09-24 16:29:09,524 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241626_0001_000011_0
	at org.apache.hama.examples.PiEstimator$MyEstimator.bsp(PiEstimator.java:66)
2011-09-24 16:29:09,524 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241626_0001_000011_0
	at org.apache.hama.bsp.BSPTask.run(BSPTask.java:60)
2011-09-24 16:29:09,524 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241626_0001_000011_0
	at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:940)
2011-09-24 16:29:09,673 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241626_0001_000011_0
11/09/24 16:29:09 ERROR bsp.BSPTask: Exception during BSP execution!
2011-09-24 16:29:09,673 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241626_0001_000011_0
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /bsp/job_201109241626_0001/0
2011-09-24 16:29:09,673 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241626_0001_000011_0
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:102)
2011-09-24 16:29:09,674 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241626_0001_000011_0
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
2011-09-24 16:29:09,674 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241626_0001_000011_0
	at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1243)
2011-09-24 16:29:09,674 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241626_0001_000011_0
	at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1271)
2011-09-24 16:29:09,674 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241626_0001_000011_0
	at org.apache.hama.bsp.BSPPeer.enterBarrier(BSPPeer.java:411)
2011-09-24 16:29:09,674 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241626_0001_000011_0
	at org.apache.hama.bsp.BSPPeer.sync(BSPPeer.java:308)
2011-09-24 16:29:09,674 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241626_0001_000011_0
	at org.apache.hama.examples.PiEstimator$MyEstimator.bsp(PiEstimator.java:66)
2011-09-24 16:29:09,674 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241626_0001_000011_0
	at org.apache.hama.bsp.BSPTask.run(BSPTask.java:60)
2011-09-24 16:29:09,674 INFO org.apache.hama.bsp.TaskRunner: attempt_201109241626_0001_000011_0
	at org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:940)
{code}

Test result output: 
{code}
$ hama jar hama-examples-0.4.0-incubating-SNAPSHOT.jar pi
11/09/24 22:49:24 INFO bsp.BSPJobClient: Running job: job_201109242248_0001
11/09/24 22:49:27 INFO bsp.BSPJobClient: Current supersteps number: 0
...
11/09/24 22:57:02 INFO bsp.BSPJobClient: Current supersteps number: 101
11/09/24 22:57:10 INFO bsp.BSPJobClient: The total number of supersteps: 101
Estimated value of PI is 3.1428666666666665
Job Finished in 472.434 seconds
$ hama jar hama-examples-0.4.0-incubating-SNAPSHOT.jar pi
11/09/24 22:57:20 INFO bsp.BSPJobClient: Running job: job_201109242248_0002
11/09/24 22:57:23 INFO bsp.BSPJobClient: Current supersteps number: 0
...
11/09/24 23:03:12 INFO bsp.BSPJobClient: Current supersteps number: 101
11/09/24 23:03:25 INFO bsp.BSPJobClient: The total number of supersteps: 101
Estimated value of PI is 3.1447999999999996
Job Finished in 368.786 seconds
$ hama jar hama-examples-0.4.0-incubating-SNAPSHOT.jar pi
11/09/24 23:04:27 INFO bsp.BSPJobClient: Running job: job_201109242248_0003
11/09/24 23:04:30 INFO bsp.BSPJobClient: Current supersteps number: 0
...
11/09/24 23:10:50 INFO bsp.BSPJobClient: Current supersteps number: 101
11/09/24 23:11:02 INFO bsp.BSPJobClient: The total number of supersteps: 101
Estimated value of PI is 3.144633333333333
Job Finished in 398.859 seconds
$ hama jar hama-examples-0.4.0-incubating-SNAPSHOT.jar pi
11/09/24 23:15:26 INFO bsp.BSPJobClient: Running job: job_201109242248_0004
11/09/24 23:15:29 INFO bsp.BSPJobClient: Current supersteps number: 0
...
11/09/24 23:20:40 INFO bsp.BSPJobClient: Current supersteps number: 101
11/09/24 23:20:50 INFO bsp.BSPJobClient: The total number of supersteps: 101
Estimated value of PI is 3.1455999999999995
Job Finished in 331.478 seconds
$ hama jar hama-examples-0.4.0-incubating-SNAPSHOT.jar pi
11/09/24 23:21:03 INFO bsp.BSPJobClient: Running job: job_201109242248_0005
11/09/24 23:21:06 INFO bsp.BSPJobClient: Current supersteps number: 0
...
11/09/24 23:26:41 INFO bsp.BSPJobClient: Current supersteps number: 101
11/09/24 23:26:48 INFO bsp.BSPJobClient: The total number of supersteps: 101
Estimated value of PI is 3.1420333333333335
Job Finished in 350.252 seconds
$ hama jar hama-examples-0.4.0-incubating-SNAPSHOT.jar pi
11/09/24 23:27:02 INFO bsp.BSPJobClient: Running job: job_201109242248_0006
11/09/24 23:27:05 INFO bsp.BSPJobClient: Current supersteps number: 0
...
11/09/24 23:32:36 INFO bsp.BSPJobClient: Current supersteps number: 101
11/09/24 23:32:48 INFO bsp.BSPJobClient: The total number of supersteps: 101
Estimated value of PI is 3.1422000000000003
Job Finished in 352.089 seconds
$ hama jar hama-examples-0.4.0-incubating-SNAPSHOT.jar pi
11/09/24 23:35:35 INFO bsp.BSPJobClient: Running job: job_201109242248_0007
11/09/24 23:35:38 INFO bsp.BSPJobClient: Current supersteps number: 0
...
11/09/24 23:41:00 INFO bsp.BSPJobClient: Current supersteps number: 101
11/09/24 23:41:07 INFO bsp.BSPJobClient: The total number of supersteps: 101
Estimated value of PI is 3.139433333333333
Job Finished in 336.643 seconds
{code}


> Advanced Barrier Synchronization
> --------------------------------
>
>                 Key: HAMA-387
>                 URL: https://issues.apache.org/jira/browse/HAMA-387
>             Project: Hama
>          Issue Type: Improvement
>          Components: bsp
>    Affects Versions: 0.3.0
>            Reporter: Edward J. Yoon
>            Assignee: ChiaHung Lin
>             Fix For: 0.4.0
>
>         Attachments: HAMA-387.patch, HAMA-387_v02.patch, HAMA-387_v03.patch, HAMA-387_v04.patch, conditional_wait.patch, doublebarrier.patch, new.patch, ownSyncService.patch, ownSyncService_v2.patch, ownSyncService_v3.patch, sleepless.patch, x.PNG, x.patch
>
>
> I think the lock file must include:
>  * the job ID
>  * the task ID of the lock file owner
>  * the current superstep count
> so that ownership and validity can be checked.
> Lock files are currently named by hostname, but in the future multiple tasks may run on a single groomserver.
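
As a rough sketch of the naming idea in the issue description, the helper below builds and parses a barrier path from the three fields. The layout `/bsp/<job ID>/<superstep>/<task attempt ID>` mirrors the znode paths that appear in the log output above; the class `BarrierPath` and its methods are hypothetical and are not taken from any of the attached patches.

```java
/** Hypothetical helper: encodes job ID, superstep and task attempt ID in a path. */
public final class BarrierPath {
    public final String jobId;
    public final long superstep;
    public final String taskAttemptId;

    public BarrierPath(String jobId, long superstep, String taskAttemptId) {
        this.jobId = jobId;
        this.superstep = superstep;
        this.taskAttemptId = taskAttemptId;
    }

    /** Renders the path, e.g. /bsp/<jobId>/<superstep>/<taskAttemptId>. */
    public String toPath() {
        return "/bsp/" + jobId + "/" + superstep + "/" + taskAttemptId;
    }

    /** Parses a path produced by toPath(); rejects anything else. */
    public static BarrierPath parse(String path) {
        String[] parts = path.split("/");
        // A leading "/" yields an empty first element, so 5 parts are expected.
        if (parts.length != 5 || !parts[0].isEmpty() || !"bsp".equals(parts[1])) {
            throw new IllegalArgumentException("Not a barrier path: " + path);
        }
        return new BarrierPath(parts[2], Long.parseLong(parts[3]), parts[4]);
    }

    /** Ownership check: does this node belong to the given task in this superstep? */
    public boolean isOwnedBy(String taskAttemptId, long superstep) {
        return this.taskAttemptId.equals(taskAttemptId) && this.superstep == superstep;
    }

    public static void main(String[] args) {
        BarrierPath p = BarrierPath.parse(
            "/bsp/job_201109241540_0001/4/attempt_201109241540_0001_000005_0");
        System.out.println(p.jobId + " superstep=" + p.superstep);
    }
}
```

Because the task attempt ID rather than the hostname identifies the owner, two tasks on the same groomserver would get distinct nodes under this scheme.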

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
