incubator-giraph-dev mailing list archives

From "Avery Ching (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (GIRAPH-53) Unable to read additional data from server session, likely server has closed socket
Date Fri, 14 Oct 2011 15:30:12 GMT

    [ https://issues.apache.org/jira/browse/GIRAPH-53?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127626#comment-13127626 ]

Avery Ching commented on GIRAPH-53:
-----------------------------------

Thanks for reporting the issue.  A few questions:

1)  Is it always the 103rd superstep?

2)  It looks like the task lost its connection to the ZooKeeper service.  It would probably be good to
see what happened to that task as well.  Most likely it crashed for some reason.
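
One way to check question 1 across several runs is to scan each failed task's log for the last superstep registered before the ZooKeeper disconnect. The following is only a minimal sketch: the class and method names are hypothetical, and it assumes each log entry sits on a single line and that the registerHealth and ClientCnxn messages look like the ones quoted in the report below.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Giraph: scans task log lines for the
// last superstep registered before the ZooKeeper client reports a disconnect.
class SuperstepScan {
    // Assumed message format, taken from the registerHealth line in the report.
    private static final Pattern SUPERSTEP = Pattern.compile(
        "registerHealth: Created my health node for attempt=\\d+, superstep=(\\d+)");

    // Assumed marker, taken from the ClientCnxn line in the report.
    private static final String DISCONNECT =
        "Unable to read additional data from server";

    // Returns the last superstep seen before the first disconnect line,
    // or an empty value if no disconnect (or no superstep before it) is found.
    static OptionalInt lastSuperstepBeforeDisconnect(List<String> lines) {
        OptionalInt last = OptionalInt.empty();
        for (String line : lines) {
            if (line.contains(DISCONNECT)) {
                return last;
            }
            Matcher m = SUPERSTEP.matcher(line);
            if (m.find()) {
                last = OptionalInt.of(Integer.parseInt(m.group(1)));
            }
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "2011-10-14 16:23:39,057 INFO org.apache.giraph.graph.BspServiceWorker: "
                + "registerHealth: Created my health node for attempt=0, superstep=103 with ...",
            "2011-10-14 16:23:39,529 INFO org.apache.zookeeper.ClientCnxn: "
                + "Unable to read additional data from server sessionid 0x1330186cff30001, "
                + "likely server has closed socket, closing socket connection and attempting reconnect");
        OptionalInt result = lastSuperstepBeforeDisconnect(sample);
        System.out.println(result.isPresent() ? result.getAsInt() : -1);
    }
}
```

If the number printed differs between runs, the failure is probably not tied to a particular superstep.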
                
> Unable to read additional data from server session, likely server has closed socket
> -----------------------------------------------------------------------------------
>
>                 Key: GIRAPH-53
>                 URL: https://issues.apache.org/jira/browse/GIRAPH-53
>             Project: Giraph
>          Issue Type: Bug
>            Reporter: locker
>
> I've been getting an error recently. Everything goes well until the 103rd superstep.

> 2011-10-14 16:23:38,904 INFO org.apache.giraph.comm.BasicRPCCommunications: prepareSuperstep
> 2011-10-14 16:23:39,018 WARN org.apache.giraph.graph.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201110141621_0001/_applicationAttemptsDir/0/_superstepDir/101/_vertexRangeAssignments, type=NodeDeleted, state=SyncConnected)
> 2011-10-14 16:23:39,057 INFO org.apache.giraph.graph.BspServiceWorker: registerHealth: Created my health node for attempt=0, superstep=103 with /_hadoopBsp/job_201110141621_0001/_applicationAttemptsDir/0/_superstepDir/103/_workerHealthyDir/locker-desktop_1 and hostnamePort = ["locker-desktop",30001]
> 2011-10-14 16:23:39,057 WARN org.apache.giraph.graph.BspService: process: Unknown and unprocessed event (path=/_hadoopBsp/job_201110141621_0001/_applicationAttemptsDir/0/_superstepDir/101/_superstepFinished, type=NodeDeleted, state=SyncConnected)
> 2011-10-14 16:23:39,529 INFO org.apache.zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x1330186cff30001, likely server has closed socket, closing socket connection and attempting reconnect
> 2011-10-14 16:23:39,630 ERROR org.apache.zookeeper.ClientCnxn: Error while calling watcher

> java.lang.RuntimeException: process: Disconnected from ZooKeeper, cannot recover.
> 	at org.apache.giraph.graph.BspService.process(BspService.java:995)
> 	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:488)
> 2011-10-14 16:23:41,098 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server locker-desktop/10.13.30.90:22181
> 2011-10-14 16:23:41,099 WARN org.apache.zookeeper.ClientCnxn: Session 0x1330186cff30001 for server null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
> 	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
> 	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
> 	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1078)
> 2011-10-14 16:23:41,212 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
> 2011-10-14 16:23:41,306 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
> 2011-10-14 16:23:41,307 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName dic for UID 1001 from the native implementation
> 2011-10-14 16:23:41,318 WARN org.apache.hadoop.mapred.Child: Error running child
> java.lang.RuntimeException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201110141621_0001/_applicationAttemptsDir/0/_superstepDir/103/_vertexRangeAssignments
> 	at org.apache.giraph.graph.BspServiceWorker.startSuperstep(BspServiceWorker.java:836)
> 	at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:551)
> 	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:396)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201110141621_0001/_applicationAttemptsDir/0/_superstepDir/103/_vertexRangeAssignments
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> 	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:809)
> 	at org.apache.zookeeper.ZooKeeper.exists(ZooKeeper.java:837)
> 	at org.apache.giraph.graph.BspServiceWorker.startSuperstep(BspServiceWorker.java:830)
> 	... 9 more
> I don't know whether this should be called a bug or not. Waiting for some help, thanks...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
