giraph-dev mailing list archives

From "Avery Ching (JIRA)" <j...@apache.org>
Subject [jira] [Created] (GIRAPH-381) Ensure we get the original exception from GraphMapper#run()
Date Thu, 18 Oct 2012 22:12:04 GMT
Avery Ching created GIRAPH-381:
----------------------------------

             Summary: Ensure we get the original exception from GraphMapper#run()
                 Key: GIRAPH-381
                 URL: https://issues.apache.org/jira/browse/GIRAPH-381
             Project: Giraph
          Issue Type: Improvement
            Reporter: Avery Ching
            Assignee: Avery Ching


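We can lose the original exception if failureCleanup() fails. When GraphMapper#run() calls failureCleanup() after an error and the cleanup itself throws, only the cleanup exception is reported.

For example, in the log below the task reports only the IllegalStateException from unregisterHealth():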

INFO    2012-10-18 14:23:25,417 [main] org.apache.giraph.graph.WorkerAggregatorHandler  - marshalAggregatorValues: Finished assembling aggregator values
INFO    2012-10-18 14:23:25,451 [main-SendThread(xxx.machine.xxx:22181)] org.apache.zookeeper.ClientCnxn  - Unable to read additional data from server sessionid 0x13a75baca440014, likely server has closed socket, closing socket connection and attempting reconnect
ERROR   2012-10-18 14:23:25,552 [main] org.apache.giraph.graph.BspServiceWorker  - unregisterHealth: Got failure, unregistering health on /_hadoopBsp/job_201209271814.8652_0001/_applicationAttemptsDir/0/_superstepDir/1/_workerHealthyDir/xxx.machine.xxx_9 on superstep 1
WARN    2012-10-18 14:23:25,554 [main-EventThread] org.apache.giraph.graph.BspService  - process: Disconnected from ZooKeeper (will automatically try to recover) WatchedEvent state:Disconnected type:None path:null
INFO    2012-10-18 14:23:26,916 [main-SendThread(xxx.machine.xxx:22181)] org.apache.zookeeper.ClientCnxn  - Opening socket connection to server xxx.machine.xxx/10.174.108.77:22181
INFO    2012-10-18 14:23:26,917 [main-SendThread(xxx.machine.xxx:22181)] org.apache.zookeeper.ClientCnxn  - Socket connection established to xxx.machine.xxx/10.174.108.77:22181, initiating session
WARN    2012-10-18 14:23:26,977 [main-SendThread(xxx.machine.xxx:22181)] org.apache.zookeeper.ClientCnxn  - Session 0x13a75baca440014 for server xxx.machine.xxx/10.174.108.77:22181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:218)
at sun.nio.ch.IOUtil.read(IOUtil.java:186)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:359)
at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:858)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1130)
WARN    2012-10-18 14:23:27,082 [main] org.apache.hadoop.mapred.Child  - Error running child
java.lang.IllegalStateException: unregisterHealth: KeeperException - Couldn't delete /_hadoopBsp/job_201209271814.8652_0001/_applicationAttemptsDir/0/_superstepDir/1/_workerHealthyDir/xxx.machine.xxx_9
at org.apache.giraph.graph.BspServiceWorker.unregisterHealth(BspServiceWorker.java:582)
at org.apache.giraph.graph.BspServiceWorker.failureCleanup(BspServiceWorker.java:590)
at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:608)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:632)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:171)
Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201209271814.8652_0001/_applicationAttemptsDir/0/_superstepDir/1/_workerHealthyDir/xxx.machine.xxx_9
at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:728)
at org.apache.giraph.graph.BspServiceWorker.unregisterHealth(BspServiceWorker.java:576)
... 5 more
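
One possible way to avoid the masking seen in the trace above is to guard the cleanup call so that a failure inside it cannot replace the exception that triggered it. The following is only a minimal sketch of that idea, not the change made for this issue: the class OriginalExceptionSketch, the Step interface and runPreservingOriginal() are hypothetical names, the cleanup argument merely stands in for failureCleanup(), and the code assumes Java 8 or later (lambdas, Throwable#addSuppressed).

```java
import java.io.IOException;

/** Hypothetical sketch; none of these names come from Giraph. */
final class OriginalExceptionSketch {
  /** A unit of work that may fail (stand-in for the run body or the cleanup). */
  interface Step {
    void run() throws Exception;
  }

  /**
   * Runs the work. If it fails, runs the cleanup, but always rethrows the
   * original failure as the cause. A failure inside the cleanup is attached
   * to the original exception as a suppressed exception instead of
   * replacing it.
   */
  static void runPreservingOriginal(Step work, Step cleanup) {
    try {
      work.run();
    } catch (Exception original) {
      try {
        cleanup.run();
      } catch (Exception cleanupFailure) {
        // Keep the secondary failure visible without masking the first one.
        original.addSuppressed(cleanupFailure);
      }
      throw new IllegalStateException("work failed", original);
    }
  }

  public static void main(String[] args) {
    try {
      runPreservingOriginal(
          () -> { throw new IOException("original failure"); },
          () -> { throw new IllegalStateException("cleanup failed"); });
    } catch (IllegalStateException e) {
      // prints "cause: original failure"
      System.out.println("cause: " + e.getCause().getMessage());
      // prints "suppressed: cleanup failed"
      System.out.println(
          "suppressed: " + e.getCause().getSuppressed()[0].getMessage());
    }
  }
}
```

On a runtime without addSuppressed(), logging the cleanup failure inside the inner catch block and then rethrowing the original would serve the same purpose.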


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
