hama-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Thomas Jungblut <thomas.jungb...@gmail.com>
Subject Re: Build failed in Jenkins: Hama-Nightly #645
Date Sat, 18 Aug 2012 15:05:58 GMT
Okay still no idea.

I know that this happens when the master tasks sends a message with the
updated aggregator values to every slave. (line 239). Then this only
message should be consumed arround line 246 cc.

But it still remains in the buffer and will be consumed after all
computation in line 209 in the parseMessages.
Even clearing the buffer is not fixing it.

I should rewrite this whole crapyard to the new superstep API. -.-

2012/8/18 Thomas Jungblut <thomas.jungblut@gmail.com>

> Okay two possible problems:
> - It changes the host because of fault tolerance (contra arguments: its
> turned off and the port is smaller than the other one)
> - Messaging is broken (would also explain why pagerank does not converge
> anymore)
>
> Can anyone else reproduce this scenario?
>
> 2012/8/18 Apache Jenkins Server <jenkins@builds.apache.org>
>
>> See <https://builds.apache.org/job/Hama-Nightly/645/changes>
>>
>> Changes:
>>
>> [tjungblut] add missing class
>>
>> ------------------------------------------
>> [...truncated 4232 lines...]
>>
>> 12/08/17 23:05:52 INFO graph.GraphJobRunner: _________ 0 / hama.2;0 / 4
>> 12/08/17 23:05:52 INFO graph.GraphJobRunner: _________+  true
>> janus.apache.org:61002
>>
>> 12/08/17 23:05:52 INFO graph.GraphJobRunner: _________++ VAL=0.0
>>
>> 12/08/17 23:05:52 INFO graph.GraphJobRunner: _________ 1 / hama.2;1 / 4
>>
>> 12/08/17 23:05:52 INFO graph.GraphJobRunner: _________+  true
>> janus.apache.org:61002
>> 12/08/17 23:05:52 INFO graph.GraphJobRunner: _________++ VAL=0.0
>> 12/08/17 23:05:52 INFO graph.GraphJobRunner: _________ 0 / hama.2;0 / 7
>> 12/08/17 23:05:52 INFO graph.GraphJobRunner: _________+  true
>> janus.apache.org:61002
>> 12/08/17 23:05:52 INFO graph.GraphJobRunner: _________++
>> VAL=0.4572019638123739
>> 12/08/17 23:05:52 INFO graph.GraphJobRunner: _________ 1 / hama.2;1 / 7
>> 12/08/17 23:05:52 INFO graph.GraphJobRunner: _________+  true
>> janus.apache.org:61002
>> 12/08/17 23:05:52 INFO graph.GraphJobRunner: _________++
>> VAL=0.44247448197562855
>>
>> 12/08/17 23:05:52 INFO server.PrepRequestProcessor: Got user-level
>> KeeperException when processing sessionid:0x13936d5d8cc0002 type:create
>> cxid:0x3de zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error
>> Path:/bsp/job_201208172305_0001/sync/51 Error:KeeperErrorCode = NodeExists
>> for /bsp/job_201208172305_0001/sync/51
>>
>> 12/08/17 23:05:52 INFO server.PrepRequestProcessor: Got user-level
>> KeeperException when processing sessionid:0x13936d5d8cc0002 type:create
>> cxid:0x3e8 zxid:0xfffffffffffffffe txntype:unknown reqpath:n/a Error
>> Path:/bsp/job_201208172305_0001/sync/51/ready Error:KeeperErrorCode =
>> NodeExists for /bsp/job_201208172305_0001/sync/51/ready
>>
>> 12/08/17 23:05:52 INFO graph.GraphJobRunner: _________ 0 / hama.2;0 / 4
>> 12/08/17 23:05:52 INFO graph.GraphJobRunner: _________+  true
>> janus.apache.org:61002
>> 12/08/17 23:05:52 INFO graph.GraphJobRunner: _________++ VAL=0.0
>>
>> 12/08/17 23:05:52 INFO graph.GraphJobRunner: _________ 1 / hama.2;1 / 4
>> 12/08/17 23:05:52 INFO graph.GraphJobRunner: _________+  true
>> janus.apache.org:61002
>> 12/08/17 23:05:52 INFO graph.GraphJobRunner: _________++ VAL=0.0
>> 12/08/17 23:05:52 INFO graph.GraphJobRunner: _________ 0 / hama.2;0 / 7
>> 12/08/17 23:05:52 INFO graph.GraphJobRunner: _________+  true
>> janus.apache.org:61002
>> 12/08/17 23:05:52 INFO graph.GraphJobRunner: _________++
>> VAL=0.457610574551534
>> 12/08/17 23:05:52 INFO graph.GraphJobRunner: _________ 1 / hama.2;1 / 7
>> 12/08/17 23:05:52 INFO graph.GraphJobRunner: _________+  true
>> janus.apache.org:61002
>> 12/08/17 23:05:52 INFO graph.GraphJobRunner: _________++
>> VAL=0.2675231554874198
>>
>> 12/08/17 23:05:52 INFO graph.GraphJobRunner: _________ 0 / hama.2;0 / 11
>> 12/08/17 23:05:52 INFO graph.GraphJobRunner: _________+ NULL! false
>> janus.apache.org:61001
>>
>> 12/08/17 23:05:52 ERROR bsp.BSPTask: Error running bsp setup and bsp
>> function.
>> java.lang.NullPointerException
>>         at
>> org.apache.hama.graph.GraphJobRunner.parseMessages(GraphJobRunner.java:373)
>>         at
>> org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:209)
>>         at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:166)
>>         at org.apache.hama.bsp.BSPTask.run(BSPTask.java:143)
>>         at
>> org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1271)
>>
>> 12/08/17 23:05:52 INFO server.PrepRequestProcessor: Processed session
>> termination for sessionid: 0x13936d5d8cc0002
>>
>> 12/08/17 23:05:52 INFO zookeeper.ZooKeeper: Session: 0x13936d5d8cc0002
>> closed
>>
>> 12/08/17 23:05:52 INFO zookeeper.ClientCnxn: EventThread shut down
>>
>> 12/08/17 23:05:52 INFO server.NIOServerCnxn: Closed socket connection for
>> client /127.0.0.1:54114 which had sessionid 0x13936d5d8cc0002
>>
>> 12/08/17 23:05:52 INFO ipc.NettyServer: [id: 0x001cb7a1, /
>> 67.195.138.60:58913 :> /67.195.138.60:61001] DISCONNECTED
>>
>> 12/08/17 23:05:52 INFO ipc.NettyTransceiver: [id: 0x0136d9d8, /
>> 67.195.138.60:58913 :> janus.apache.org/67.195.138.60:61001] DISCONNECTED
>>
>> 12/08/17 23:05:52 INFO ipc.NettyTransceiver: [id: 0x0136d9d8, /
>> 67.195.138.60:58913 :> janus.apache.org/67.195.138.60:61001] UNBOUND
>>
>> 12/08/17 23:05:52 INFO ipc.NettyServer: [id: 0x001cb7a1, /
>> 67.195.138.60:58913 :> /67.195.138.60:61001] UNBOUND
>>
>> 12/08/17 23:05:52 INFO ipc.NettyTransceiver: [id: 0x0136d9d8, /
>> 67.195.138.60:58913 :> janus.apache.org/67.195.138.60:61001] CLOSED
>>
>> 12/08/17 23:05:52 INFO ipc.NettyServer: [id: 0x001cb7a1, /
>> 67.195.138.60:58913 :> /67.195.138.60:61001] CLOSED
>>
>> 12/08/17 23:05:52 INFO ipc.NettyTransceiver: Remote peer
>> janus.apache.org/67.195.138.60:61001 closed connection.
>>
>> 12/08/17 23:05:52 INFO ipc.NettyServer: [id: 0x01bf7b23, /
>> 67.195.138.60:58915 :> /67.195.138.60:61001] DISCONNECTED
>>
>> 12/08/17 23:05:52 INFO ipc.NettyTransceiver: Disconnecting from
>> janus.apache.org/67.195.138.60:61001
>>
>> 12/08/17 23:05:52 INFO ipc.NettyServer: [id: 0x01bf7b23, /
>> 67.195.138.60:58915 :> /67.195.138.60:61001] UNBOUND
>>
>> 12/08/17 23:05:52 INFO ipc.NettyTransceiver: [id: 0x002c17f7, /
>> 67.195.138.60:58915 :> janus.apache.org/67.195.138.60:61001] DISCONNECTED
>>
>> 12/08/17 23:05:52 INFO ipc.NettyServer: [id: 0x01bf7b23, /
>> 67.195.138.60:58915 :> /67.195.138.60:61001] CLOSED
>>
>> 12/08/17 23:05:52 INFO ipc.NettyTransceiver: [id: 0x002c17f7, /
>> 67.195.138.60:58915 :> janus.apache.org/67.195.138.60:61001] UNBOUND
>> 12/08/17 23:05:52 INFO ipc.NettyTransceiver: [id: 0x002c17f7, /
>> 67.195.138.60:58915 :> janus.apache.org/67.195.138.60:61001] CLOSED
>> 12/08/17 23:05:52 INFO ipc.NettyTransceiver: Remote peer
>> janus.apache.org/67.195.138.60:61001 closed connection.
>> 12/08/17 23:05:52 INFO ipc.NettyTransceiver: Disconnecting from
>> janus.apache.org/67.195.138.60:61001
>>
>> 12/08/17 23:05:52 ERROR bsp.BSPTask: Shutting down ping service.
>>
>> 12/08/17 23:05:52 FATAL bsp.GroomServer: Error running child
>> java.lang.NullPointerException
>>         at
>> org.apache.hama.graph.GraphJobRunner.parseMessages(GraphJobRunner.java:373)
>>         at
>> org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:209)
>>         at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:166)
>>         at org.apache.hama.bsp.BSPTask.run(BSPTask.java:143)
>>         at
>> org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1271)
>>
>> java.lang.NullPointerException
>>         at
>> org.apache.hama.graph.GraphJobRunner.parseMessages(GraphJobRunner.java:373)
>>         at
>> org.apache.hama.graph.GraphJobRunner.bsp(GraphJobRunner.java:209)
>>         at org.apache.hama.bsp.BSPTask.runBSP(BSPTask.java:166)
>>         at org.apache.hama.bsp.BSPTask.run(BSPTask.java:143)
>>         at
>> org.apache.hama.bsp.GroomServer$BSPPeerChild.main(GroomServer.java:1271)
>> 12/08/17 23:05:53 INFO bsp.BSPJobClient: Current supersteps number: 0
>>
>> 12/08/17 23:05:56 INFO bsp.BSPJobClient: Current supersteps number: 52
>>
>> 12/08/17 23:06:00 INFO bsp.GroomServer: adding purge task:
>> attempt_201208172305_0001_000001_0
>>
>> 12/08/17 23:06:00 INFO bsp.GroomServer: About to purge task:
>> attempt_201208172305_0001_000001_0
>>
>> 12/08/17 23:06:00 INFO ipc.NettyServer: [id: 0x00b6d6ab, /
>> 67.195.138.60:46475 :> /67.195.138.60:61002] DISCONNECTED
>>
>> 12/08/17 23:06:00 INFO ipc.NettyServer: [id: 0x00b6d6ab, /
>> 67.195.138.60:46475 :> /67.195.138.60:61002] UNBOUND
>>
>> 12/08/17 23:06:00 INFO ipc.NettyServer: [id: 0x00b6d6ab, /
>> 67.195.138.60:46475 :> /67.195.138.60:61002] CLOSED
>>
>> 12/08/17 23:06:01 INFO bsp.JobInProgress: Taskid
>> 'attempt_201208172305_0001_000001_0' has failed.
>>
>> 12/08/17 23:06:01 INFO bsp.TaskInProgress: Task
>> 'task_201208172305_0001_000001' has failed.
>>
>> 12/08/17 23:06:01 INFO bsp.JobInProgress: Job failed.
>>
>> 12/08/17 23:06:01 INFO bsp.GroomServer: Kill 1 tasks.
>>
>> 12/08/17 23:06:01 INFO bsp.GroomServer: Kill 1 tasks.
>>
>> 12/08/17 23:06:01 WARN server.NIOServerCnxn: EndOfStreamException: Unable
>> to read additional data from client sessionid 0x13936d5d8cc0003, likely
>> client has closed socket
>>
>> 12/08/17 23:06:01 INFO server.NIOServerCnxn: Closed socket connection for
>> client /0:0:0:0:0:0:0:1:37318 which had sessionid 0x13936d5d8cc0003
>>
>> 12/08/17 23:06:02 WARN mortbay.log: /tasklog: java.io.IOException: Closed
>>
>> 12/08/17 23:06:02 WARN bsp.BSPJobClient: Error reading task outputhttp://
>> janus.apache.org:40015/tasklog?plaintext=true&taskid=attempt_201208172305_0001_000001_0&filter=stdout
>>
>> 12/08/17 23:06:02 INFO bsp.BSPJobClient: Job failed.
>>
>> 12/08/17 23:06:02 INFO server.PrepRequestProcessor: Processed session
>> termination for sessionid: 0x13936d5d8cc0000
>>
>> 12/08/17 23:06:02 INFO zookeeper.ZooKeeper: Session: 0x13936d5d8cc0000
>> closed
>>
>> 12/08/17 23:06:02 INFO zookeeper.ClientCnxn: EventThread shut down
>>
>> 12/08/17 23:06:02 INFO ipc.Server: Stopping server on 40000
>>
>> 12/08/17 23:06:02 INFO server.NIOServerCnxn: Closed socket connection for
>> client /127.0.0.1:54084 which had sessionid 0x13936d5d8cc0000
>>
>> 12/08/17 23:06:02 INFO ipc.Server: IPC Server handler 0 on 40000: exiting
>>
>> 12/08/17 23:06:02 INFO ipc.Server: Stopping IPC Server listener on 40000
>>
>> 12/08/17 23:06:02 INFO metrics.RpcInstrumentation: shut down
>>
>> 12/08/17 23:06:02 INFO bsp.BSPMaster: Stopped RPC Master server.
>>
>> 12/08/17 23:06:02 INFO ipc.Server: Stopping IPC Server Responder
>>
>> 12/08/17 23:06:02 INFO server.PrepRequestProcessor: Processed session
>> termination for sessionid: 0x13936d5d8cc0001
>>
>> 12/08/17 23:06:02 INFO server.NIOServerCnxn: Closed socket connection for
>> client /127.0.0.1:54088 which had sessionid 0x13936d5d8cc0001
>>
>> 12/08/17 23:06:02 INFO zookeeper.ClientCnxn: EventThread shut down
>>
>> 12/08/17 23:06:02 INFO zookeeper.ZooKeeper: Session: 0x13936d5d8cc0001
>> closed
>>
>> 12/08/17 23:06:02 INFO ipc.Server: Stopping server on 40440
>>
>> 12/08/17 23:06:02 INFO ipc.Server: Stopping IPC Server listener on 40440
>>
>> 12/08/17 23:06:02 INFO metrics.RpcInstrumentation: shut down
>>
>> 12/08/17 23:06:02 INFO ipc.Server: Stopping IPC Server Responder
>>
>> 12/08/17 23:06:02 INFO ipc.Server: Stopping server on 48980
>>
>> 12/08/17 23:06:02 INFO ipc.Server: IPC Server handler 0 on 40440: exiting
>>
>> 12/08/17 23:06:02 INFO ipc.Server: IPC Server handler 2 on 48980: exiting
>>
>> 12/08/17 23:06:02 INFO ipc.Server: IPC Server handler 1 on 48980: exiting
>>
>> 12/08/17 23:06:02 INFO ipc.Server: IPC Server handler 0 on 48980: exiting
>>
>> 12/08/17 23:06:02 INFO ipc.Server: Stopping IPC Server Responder
>>
>> 12/08/17 23:06:02 INFO ipc.Server: Stopping IPC Server listener on 48980
>>
>> 12/08/17 23:06:02 INFO ipc.Server: IPC Server handler 5 on 48980: exiting
>>
>> 12/08/17 23:06:02 INFO metrics.RpcInstrumentation: shut down
>>
>> 12/08/17 23:06:02 INFO ipc.Server: IPC Server handler 7 on 48980: exiting
>>
>> 12/08/17 23:06:02 INFO ipc.Server: IPC Server handler 3 on 48980: exiting
>>
>> 12/08/17 23:06:02 INFO ipc.Server: IPC Server handler 6 on 48980: exiting
>>
>> 12/08/17 23:06:02 INFO ipc.Server: IPC Server handler 9 on 48980: exiting
>>
>> 12/08/17 23:06:02 INFO ipc.Server: IPC Server handler 4 on 48980: exiting
>>
>> 12/08/17 23:06:02 INFO server.NIOServerCnxn: NIOServerCnxn factory exited
>> run method
>>
>> 12/08/17 23:06:02 INFO ipc.Server: IPC Server handler 8 on 48980: exiting
>>
>> 12/08/17 23:06:02 INFO server.PrepRequestProcessor: PrepRequestProcessor
>> exited loop!
>>
>> 12/08/17 23:06:02 INFO server.SyncRequestProcessor: SyncRequestProcessor
>> exited!
>>
>> 12/08/17 23:06:02 INFO server.FinalRequestProcessor: shutdown of request
>> processor complete
>>
>> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.028 sec
>> <<< FAILURE!
>>
>> Results :
>>
>> Failed tests:
>>   testSubmitJob(org.apache.hama.graph.TestSubmitGraphJob)
>>
>> Tests run: 1, Failures: 1, Errors: 0, Skipped: 0
>>
>> [INFO]
>> ------------------------------------------------------------------------
>> [INFO] Reactor Summary:
>> [INFO]
>> [INFO] Apache Hama parent POM ............................ SUCCESS
>> [5.294s]
>> [INFO] core .............................................. SUCCESS
>> [1:49.431s]
>> [INFO] graph ............................................. FAILURE
>> [17.399s]
>> [INFO] examples .......................................... SKIPPED
>> [INFO] yarn .............................................. SKIPPED
>> [INFO] machine learning .................................. SKIPPED
>> [INFO] hama-dist ......................................... SKIPPED
>> [INFO]
>> ------------------------------------------------------------------------
>> [INFO] BUILD FAILURE
>> [INFO]
>> ------------------------------------------------------------------------
>> [INFO] Total time: 2:13.208s
>> [INFO] Finished at: Fri Aug 17 23:06:02 UTC 2012
>> [INFO] Final Memory: 40M/191M
>> [INFO]
>> ------------------------------------------------------------------------
>> [ERROR] Failed to execute goal
>> org.apache.maven.plugins:maven-surefire-plugin:2.6:test (default-test) on
>> project hama-graph: There are test failures.
>> [ERROR]
>> [ERROR] Please refer to <
>> https://builds.apache.org/job/Hama-Nightly/ws/trunk/graph/target/surefire-reports>
>> for the individual test results.
>> [ERROR] -> [Help 1]
>> [ERROR]
>> [ERROR] To see the full stack trace of the errors, re-run Maven with the
>> -e switch.
>> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
>> [ERROR]
>> [ERROR] For more information about the errors and possible solutions,
>> please read the following articles:
>> [ERROR] [Help 1]
>> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
>> [ERROR]
>> [ERROR] After correcting the problems, you can resume the build with the
>> command
>> [ERROR]   mvn <goals> -rf :hama-graph
>> Build step 'Invoke top-level Maven targets' marked build as failure
>> Archiving artifacts
>> Recording test results
>>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message