incubator-giraph-user mailing list archives

From Avery Ching <ach...@apache.org>
Subject Re: java.io.EOFException
Date Fri, 06 Jan 2012 01:02:35 GMT
Yeah, it's annoying that it's hard to figure out where this happens.
Unfortunately, the error surfaces in Hadoop RPC, and we don't control
that code (it's from Apache Hadoop).  I suppose we could add some
generic vertex-checking utilities to the unit tests that could be
easily extended.  Maybe we should also add this to the FAQ, since it
seems like a common error?

Avery
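
A minimal sketch of the kind of local round-trip check discussed in this thread, assuming nothing beyond java.io. SimpleWritable is a hypothetical stand-in that mirrors the write(DataOutput)/readFields(DataInput) pair of Hadoop's Writable so the example runs without Hadoop on the classpath; GoodValue, BadValue and check() are illustrative names, not Giraph utilities.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.Arrays;

public class WritableRoundTripCheck {

    /** Hypothetical stand-in with the same two methods as Hadoop's Writable. */
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    /** Value type whose readFields() reads exactly what write() wrote. */
    static class GoodValue implements SimpleWritable {
        int id;
        double weight;

        GoodValue() { }

        GoodValue(int id, double weight) {
            this.id = id;
            this.weight = weight;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(id);
            out.writeDouble(weight);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            id = in.readInt();
            weight = in.readDouble();
        }
    }

    /** Buggy variant: readFields() expects a field that write() never emits. */
    static class BadValue extends GoodValue {
        long extra;

        @Override
        public void readFields(DataInput in) throws IOException {
            super.readFields(in);
            extra = in.readLong(); // nothing left to read, so this throws EOFException
        }
    }

    /**
     * Serializes source, deserializes the bytes into target, then serializes
     * target again and compares the two byte arrays.
     */
    static String check(SimpleWritable source, SimpleWritable target) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            source.write(new DataOutputStream(bytes));
            byte[] first = bytes.toByteArray();

            DataInputStream in = new DataInputStream(new ByteArrayInputStream(first));
            target.readFields(in);
            if (in.available() != 0) {
                return "LEFTOVER_BYTES"; // readFields() consumed too little
            }

            ByteArrayOutputStream again = new ByteArrayOutputStream();
            target.write(new DataOutputStream(again));
            return Arrays.equals(first, again.toByteArray()) ? "OK" : "MISMATCH";
        } catch (EOFException e) {
            return "EOF"; // readFields() consumed more than write() produced
        } catch (IOException e) {
            return "IO_ERROR";
        }
    }

    public static void main(String[] args) {
        System.out.println(check(new GoodValue(7, 0.5), new GoodValue())); // prints OK
        System.out.println(check(new GoodValue(7, 0.5), new BadValue()));  // prints EOF
    }
}
```

Run locally, a check like this reports the mismatch inside the user's own readFields() rather than as a remote EOFException in the RPC client. The same idea should carry over to real vertex, edge and message types by swapping the stand-in interface for the actual Writable.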

On 1/5/12 2:14 PM, "Christoph Böhm" wrote:
> Correct. I had an issue in readFields(DataInput in) for my vertex value type.
> Unfortunately, I never got to see the real exception until I wrote local tests (which
> one of course should do beforehand).
> The error returned was java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375),
> which was irritating since I don't call readInt() anywhere ...
>
> Where is a good spot to fix this? -- i.e., add vertex value errors to user logs.
>
> Chr
>
> -------- Original Message --------
>> Date: Tue, 03 Jan 2012 11:07:56 -0800
>> From: Avery Ching <aching@apache.org>
>> To: giraph-user@incubator.apache.org
>> Subject: Re: java.io.EOFException
>> It appears that you had a problem with the serialization/deserialization
>> of your vertex and/or its types (I, E, V, M).  You might want to try to
>> test that out separately.
>>
>> Avery
>>
>> On 1/3/12 3:54 AM, "Christoph Böhm" wrote:
>>> Thanks!
>>> The next exception, which I cannot explain, is the following.
>>> I have one input file of the form:
>>>
>>> [2095029,[[1100046950,-1],[952771928,-1]],[[1276522248,0.9829082],[322609086,0.013525307]]]
>>> [5146036,[[947366954,-1],[34019593,-1]],[[1199061143,0.573876],[1024309140,0.98412496]]]
>>> [5270429,[[800028028,-1],[1362541830,-1]],[[164325925,0.92203426],[148512084,0.65505975]]]
>>> ... and want to use, say, 5 workers.
>>> Then worker tenem05 reports what is below.
>>>
>>> Cheers.
>>> Christoph
>>>
>>> --------------
>>> java.lang.RuntimeException: java.io.IOException: Call to tenem02//172.16.23.151:30003 failed on local exception: java.io.EOFException
>>> 	at org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:780)
>>> 	at org.apache.giraph.graph.BspServiceWorker.loadVertices(BspServiceWorker.java:304)
>>> 	at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:569)
>>> 	at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
>>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
>>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>> 	at javax.security.auth.Subject.doAs(Subject.java:396)
>>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>> Caused by: java.io.IOException: Call to tenem02/172.16.23.151:30003 failed on local exception: java.io.EOFException
>>> 	at org.apache.hadoop.ipc.Client.wrapException(Client.java:1065)
>>> 	at org.apache.hadoop.ipc.Client.call(Client.java:1033)
>>> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
>>> 	at $Proxy3.putVertexList(Unknown Source)
>>> 	at org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:777)
>>> 	... 11 more
>>> Caused by: java.io.EOFException
>>> 	at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>> 	at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:767)
>>> 	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:712)
>>> 2012-01-03 12:35:46,259 ERROR org.apache.giraph.graph.GraphMapper: setup: Caught exception just before end of setup
>>> java.lang.IllegalStateException: setup: loadVertices failed
>>> 	at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:576)
>>> 	at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
>>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
>>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>> 	at javax.security.auth.Subject.doAs(Subject.java:396)
>>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>> Caused by: java.lang.RuntimeException: java.io.IOException: Call to tenem02/172.16.23.151:30003 failed on local exception: java.io.EOFException
>>> 	at org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:780)
>>> 	at org.apache.giraph.graph.BspServiceWorker.loadVertices(BspServiceWorker.java:304)
>>> 	at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:569)
>>> 	... 9 more
>>> Caused by: java.io.IOException: Call to tenem02/172.16.23.151:30003 failed on local exception: java.io.EOFException
>>> 	at org.apache.hadoop.ipc.Client.wrapException(Client.java:1065)
>>> 	at org.apache.hadoop.ipc.Client.call(Client.java:1033)
>>> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
>>> 	at $Proxy3.putVertexList(Unknown Source)
>>> 	at org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:777)
>>> 	... 11 more
>>> Caused by: java.io.EOFException
>>> 	at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>> 	at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:767)
>>> 	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:712)
>>> 2012-01-03 12:35:46,260 ERROR org.apache.giraph.graph.BspServiceWorker: unregisterHealth: Got failure, unregistering health on /_hadoopBsp/job_201112231316_4347/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/tenem05_1 on superstep -1
>>> 2012-01-03 12:35:46,270 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
>>> 2012-01-03 12:35:46,320 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
>>> 2012-01-03 12:35:46,320 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName hadoop00 for UID 503 from the native implementation
>>> 2012-01-03 12:35:46,322 WARN org.apache.hadoop.mapred.Child: Error running child
>>> java.lang.IllegalStateException: run: Caught an unrecoverable exception setup: Offlining servers due to exception...
>>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:641)
>>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>> 	at javax.security.auth.Subject.doAs(Subject.java:396)
>>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>> Caused by: java.lang.RuntimeException: setup: Offlining servers due to exception...
>>> 	at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:466)
>>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
>>> 	... 7 more
>>> Caused by: java.lang.IllegalStateException: setup: loadVertices failed
>>> 	at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:576)
>>> 	at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
>>> 	... 8 more
>>> Caused by: java.lang.RuntimeException: java.io.IOException: Call to tenem02/172.16.23.151:30003 failed on local exception: java.io.EOFException
>>> 	at org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:780)
>>> 	at org.apache.giraph.graph.BspServiceWorker.loadVertices(BspServiceWorker.java:304)
>>> 	at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:569)
>>> 	... 9 more
>>> Caused by: java.io.IOException: Call to tenem02/172.16.23.151:30003 failed on local exception: java.io.EOFException
>>> 	at org.apache.hadoop.ipc.Client.wrapException(Client.java:1065)
>>> 	at org.apache.hadoop.ipc.Client.call(Client.java:1033)
>>> 	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
>>> 	at $Proxy3.putVertexList(Unknown Source)
>>> 	at org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:777)
>>> 	... 11 more
>>> Caused by: java.io.EOFException
>>> 	at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>> 	at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:767)
>>> 	at org.apache.hadoop.ipc.Client$Connection.run(Client.java:712)
>>> 2012-01-03 12:35:46,337 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
>>>
>>>
>>>
>>>
>>> -------- Original Message --------
>>>> Date: Fri, 23 Dec 2011 09:25:24 -0800
>>>> From: Avery Ching <aching@apache.org>
>>>> To: giraph-user@incubator.apache.org
>>>> Subject: Re: zookeeper connection issue
>>>> Yeah, some of those errors can seem a little scary.  But I think they are
>>>> mostly harmless.  Let's go over each one inline.
>>>>
>>>> On 12/23/11 7:10 AM, "Christoph Böhm" wrote:
>>>>> Hi List,
>>>>>
>>>>> I'm about to get started with Giraph and have a few questions.
>>>>> When I run the PageRank example with
>>>>>       hadoop jar giraph-0.70-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 500000 -w 10
>>>>> it finishes, but I find the following in one worker's logs:
>>>>>
>>>>> *** Worker:
>>>>> 2011-12-23 15:36:09,468 ERROR org.apache.zookeeper.ClientCnxn: Error while calling watcher
>>>>> java.lang.RuntimeException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201112231316_0010/_masterJobState
>>>>> 	at org.apache.giraph.graph.BspService.getJobState(BspService.java:564)
>>>>> 	at org.apache.giraph.graph.BspServiceWorker.processEvent(BspServiceWorker.java:1414)
>>>>> 	at org.apache.giraph.graph.BspService.process(BspService.java:1017)
>>>>> 	at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
>>>>> 	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
>>>>> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201112231316_0010/_masterJobState
>>>>> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>>>>> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>>> 	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
>>>>> 	at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:99)
>>>>> 	at org.apache.giraph.graph.BspService.getJobState(BspService.java:555)
>>>>> 	... 4 more
>>>> Depends when this happens.  If it's after the worker has let the master
>>>> know that it was finished with everything, this is fine.
>>>>
>>>>> *** The Master says:
>>>>> 2011-12-23 15:45:40,564 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got ConnectException
>>>>> java.net.ConnectException: Connection refused
>>>>> 	at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>>> 	at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>>>>> 	at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
>>>>> 	at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>>>>> 	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
>>>>> 	at java.net.Socket.connect(Socket.java:525)
>>>>> 	at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:624)
>>>>> 	at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:408)
>>>>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
>>>>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>>>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>>>> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>>>> 	at javax.security.auth.Subject.doAs(Subject.java:396)
>>>>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>>> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>>>>
>>>>>
>>>>>
>>>>> Also, when I try to run my own job I see the following. All
>>>>> firewalls etc. should be shut down.
>>>>> *** Master (node09.de):
>>>>> 2011-12-23 15:57:47,140 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect to node09.de:22181 with poll msecs = 3000
>>>>> 2011-12-23 15:57:47,143 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got ConnectException
>>>>> java.net.ConnectException: Connection refused
>>>>> 	at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>>> 	at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>>>>> 	at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
>>>>> 	at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>>>>> 	at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
>>>>> 	at java.net.Socket.connect(Socket.java:525)
>>>>> 	at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:624)
>>>>> 	at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:409)
>>>>> 	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
>>>>> 	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>>>> 	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>>>> 	at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>>>> 	at java.security.AccessController.doPrivileged(Native Method)
>>>>> 	at javax.security.auth.Subject.doAs(Subject.java:396)
>>>>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>>> 	at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>>>>
>>>>>
>>>>>
>>>>> Thanks again.
>>>>> Christoph
>>>> These two exceptions on the master are also fine.  It takes some time
>>>> for the master to start the zk service (hence the multiple connection
>>>> attempts).
>
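
The ConnectException warnings in the quoted master logs come from a connect-and-poll loop ("Connect attempt 0 of 10 max ... with poll msecs = 3000"). Below is a rough sketch of that pattern using only the JDK. It is not Giraph's ZooKeeperManager code; canConnect() and pollUntilUp() are hypothetical names, and the simulated server in main() is an assumption made so the example runs without a network.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.function.BooleanSupplier;

public class ConnectRetrySketch {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMsecs. */
    static boolean canConnect(String host, int port, int timeoutMsecs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMsecs);
            return true;
        } catch (IOException e) {
            // "Connection refused" ends up here while the server is still starting.
            return false;
        }
    }

    /**
     * Calls attempt up to maxAttempts times, sleeping pollMsecs after each
     * failure. Returns the zero-based index of the first successful attempt,
     * or -1 if every attempt failed or the thread was interrupted.
     */
    static int pollUntilUp(BooleanSupplier attempt, int maxAttempts, long pollMsecs) {
        for (int i = 0; i < maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return i;
            }
            System.out.println("Connect attempt " + i + " of " + maxAttempts + " max failed");
            try {
                Thread.sleep(pollMsecs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return -1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Simulated server that only becomes reachable on the third attempt.
        int[] calls = {0};
        int firstSuccess = pollUntilUp(() -> ++calls[0] >= 3, 10, 10);
        System.out.println("succeeded on attempt " + firstSuccess); // attempt 2

        // Against a real server the probe would be something like:
        // pollUntilUp(() -> canConnect("node09.de", 22181, 1000), 10, 3000);
    }
}
```

With a loop like this, the early failed attempts are expected noise: each one logs a refused connection, and only exhausting all attempts would indicate a real problem.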

