Message-ID: <4F06482B.5090604@apache.org>
Date: Thu, 05 Jan 2012 17:02:35 -0800
From: Avery Ching
To: giraph-user@incubator.apache.org
Subject: Re: java.io.EOFException
References: <20120105221446.232380@gmx.net>
In-Reply-To: <20120105221446.232380@gmx.net>

Yeah, it's annoying that it's hard to figure out where this happens. Unfortunately, the error surfaces in Hadoop RPC, and we don't have control of that code (it's from Apache Hadoop). I suppose we could add some generic vertex-checking utilities to the unit tests that could be easily extended.
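A check of that kind could be sketched roughly as below. This is only an illustration, not Giraph code: `WritableRoundTripCheck`, `ScoredValue` and `roundTrip` are made-up names, and the class uses plain `java.io` streams so that it runs without Hadoop on the classpath. A real vertex value would implement `org.apache.hadoop.io.Writable`, whose `write(DataOutput)` and `readFields(DataInput)` methods have the same shape. The point is simply to serialize a value, read it back into a fresh instance, and fail locally if `readFields` reads more or fewer bytes than `write` produced.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class WritableRoundTripCheck {

    /** Hypothetical vertex value; mirrors the write/readFields contract of Writable. */
    static class ScoredValue {
        long id;
        double score;

        ScoredValue() { }

        ScoredValue(long id, double score) {
            this.id = id;
            this.score = score;
        }

        void write(DataOutput out) throws IOException {
            out.writeLong(id);
            out.writeDouble(score);
        }

        // Must read exactly the fields that write() emits, in the same order.
        void readFields(DataInput in) throws IOException {
            id = in.readLong();
            score = in.readDouble();
        }
    }

    /** Serializes the value, reads it into a fresh instance, and checks no bytes are left over. */
    static ScoredValue roundTrip(ScoredValue original) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            original.write(out);
        }
        ScoredValue copy = new ScoredValue();
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            // Throws java.io.EOFException if readFields reads more than write wrote.
            copy.readFields(in);
            if (in.available() != 0) {
                throw new IllegalStateException(
                    "readFields left " + in.available() + " unread bytes");
            }
        }
        return copy;
    }

    public static void main(String[] args) throws IOException {
        ScoredValue before = new ScoredValue(1276522248L, 0.9829082);
        ScoredValue after = roundTrip(before);
        if (before.id != after.id || before.score != after.score) {
            throw new AssertionError("round trip changed the value");
        }
        System.out.println("round trip ok");
    }
}
```

Running the same round trip for each of the I, V, E and M types before submitting a job should surface a broken `readFields` as a direct exception in the test, rather than as an EOFException reported by the RPC client.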
Maybe we should add this error to the FAQ, since it seems to be a common one?

Avery

On 1/5/12 2:14 PM, "Christoph Böhm" wrote:
> Correct. I had an issue in readFields(DataInput in) for my vertex value type.
> Unfortunately, I never got to see the real exception until I wrote local tests (which one should of course do first).
> The error returned was java.io.EOFException at java.io.DataInputStream.readInt(DataInputStream.java:375), which was confusing since I don't call readInt() anywhere ...
>
> Where is a good spot to fix this, i.e., to add vertex value errors to the user logs?
>
> Chr
>
> -------- Original Message --------
>> Date: Tue, 03 Jan 2012 11:07:56 -0800
>> From: Avery Ching
>> To: giraph-user@incubator.apache.org
>> Subject: Re: java.io.EOFException
>>
>> It appears that you had a problem with the serialization/deserialization
>> of your vertex and/or its types (I, E, V, M). You might want to try to
>> test that out separately.
>>
>> Avery
>>
>> On 1/3/12 3:54 AM, "Christoph Böhm" wrote:
>>> Thanks!
>>> The next exception I cannot explain is the following.
>>> I have one input file of the form:
>>>
>>> [2095029,[[1100046950,-1],[952771928,-1]],[[1276522248,0.9829082],[322609086,0.013525307]]]
>>> [5146036,[[947366954,-1],[34019593,-1]],[[1199061143,0.573876],[1024309140,0.98412496]]]
>>> [5270429,[[800028028,-1],[1362541830,-1]],[[164325925,0.92203426],[148512084,0.65505975]]]
>>> ... and want to use, say, 5 workers.
>>> Then worker tenem05 reports what is below.
>>>
>>> Cheers.
>>> Christoph
>>>
>>> --------------
>>> java.lang.RuntimeException: java.io.IOException: Call to tenem02/172.16.23.151:30003 failed on local exception: java.io.EOFException
>>>     at org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:780)
>>>     at org.apache.giraph.graph.BspServiceWorker.loadVertices(BspServiceWorker.java:304)
>>>     at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:569)
>>>     at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
>>>     at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>     at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>> Caused by: java.io.IOException: Call to tenem02/172.16.23.151:30003 failed on local exception: java.io.EOFException
>>>     at org.apache.hadoop.ipc.Client.wrapException(Client.java:1065)
>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1033)
>>>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
>>>     at $Proxy3.putVertexList(Unknown Source)
>>>     at org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:777)
>>>     ... 11 more
>>> Caused by: java.io.EOFException
>>>     at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>>     at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:767)
>>>     at org.apache.hadoop.ipc.Client$Connection.run(Client.java:712)
>>> 2012-01-03 12:35:46,259 ERROR org.apache.giraph.graph.GraphMapper: setup: Caught exception just before end of setup
>>> java.lang.IllegalStateException: setup: loadVertices failed
>>>     at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:576)
>>>     at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
>>>     at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>     at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>> Caused by: java.lang.RuntimeException: java.io.IOException: Call to tenem02/172.16.23.151:30003 failed on local exception: java.io.EOFException
>>>     at org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:780)
>>>     at org.apache.giraph.graph.BspServiceWorker.loadVertices(BspServiceWorker.java:304)
>>>     at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:569)
>>>     ... 9 more
>>> Caused by: java.io.IOException: Call to tenem02/172.16.23.151:30003 failed on local exception: java.io.EOFException
>>>     at org.apache.hadoop.ipc.Client.wrapException(Client.java:1065)
>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1033)
>>>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
>>>     at $Proxy3.putVertexList(Unknown Source)
>>>     at org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:777)
>>>     ... 11 more
>>> Caused by: java.io.EOFException
>>>     at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>>     at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:767)
>>>     at org.apache.hadoop.ipc.Client$Connection.run(Client.java:712)
>>> 2012-01-03 12:35:46,260 ERROR org.apache.giraph.graph.BspServiceWorker: unregisterHealth: Got failure, unregistering health on /_hadoopBsp/job_201112231316_4347/_applicationAttemptsDir/0/_superstepDir/-1/_workerHealthyDir/tenem05_1 on superstep -1
>>> 2012-01-03 12:35:46,270 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
>>> 2012-01-03 12:35:46,320 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.
>>> 2012-01-03 12:35:46,320 INFO org.apache.hadoop.io.nativeio.NativeIO: Got UserName hadoop00 for UID 503 from the native implementation
>>> 2012-01-03 12:35:46,322 WARN org.apache.hadoop.mapred.Child: Error running child
>>> java.lang.IllegalStateException: run: Caught an unrecoverable exception setup: Offlining servers due to exception...
>>>     at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:641)
>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>     at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>> Caused by: java.lang.RuntimeException: setup: Offlining servers due to exception...
>>>     at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:466)
>>>     at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
>>>     ... 7 more
>>> Caused by: java.lang.IllegalStateException: setup: loadVertices failed
>>>     at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:576)
>>>     at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:458)
>>>     ... 8 more
>>> Caused by: java.lang.RuntimeException: java.io.IOException: Call to tenem02/172.16.23.151:30003 failed on local exception: java.io.EOFException
>>>     at org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:780)
>>>     at org.apache.giraph.graph.BspServiceWorker.loadVertices(BspServiceWorker.java:304)
>>>     at org.apache.giraph.graph.BspServiceWorker.setup(BspServiceWorker.java:569)
>>>     ... 9 more
>>> Caused by: java.io.IOException: Call to tenem02/172.16.23.151:30003 failed on local exception: java.io.EOFException
>>>     at org.apache.hadoop.ipc.Client.wrapException(Client.java:1065)
>>>     at org.apache.hadoop.ipc.Client.call(Client.java:1033)
>>>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
>>>     at $Proxy3.putVertexList(Unknown Source)
>>>     at org.apache.giraph.comm.BasicRPCCommunications.sendPartitionReq(BasicRPCCommunications.java:777)
>>>     ... 11 more
>>> Caused by: java.io.EOFException
>>>     at java.io.DataInputStream.readInt(DataInputStream.java:375)
>>>     at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:767)
>>>     at org.apache.hadoop.ipc.Client$Connection.run(Client.java:712)
>>> 2012-01-03 12:35:46,337 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
>>>
>>>
>>> -------- Original Message --------
>>>> Date: Fri, 23 Dec 2011 09:25:24 -0800
>>>> From: Avery Ching
>>>> To: giraph-user@incubator.apache.org
>>>> Subject: Re: zookeeper connection issue
>>>>
>>>> Yeah, some of those errors can seem a little scary. But I think they are
>>>> mostly harmless. Let's go over each one inline.
>>>>
>>>> On 12/23/11 7:10 AM, "Christoph Böhm" wrote:
>>>>> Hi List,
>>>>>
>>>>> I'm about to get started with Giraph and have a few questions:
>>>>> when running the PageRank example with
>>>>> hadoop jar giraph-0.70-jar-with-dependencies.jar org.apache.giraph.benchmark.PageRankBenchmark -e 1 -s 3 -v -V 500000 -w 10
>>>>> this finishes but I find the following in one worker's logs:
>>>>>
>>>>> *** Worker:
>>>>> 2011-12-23 15:36:09,468 ERROR org.apache.zookeeper.ClientCnxn: Error while calling watcher
>>>>> java.lang.RuntimeException: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /_hadoopBsp/job_201112231316_0010/_masterJobState
>>>>>     at org.apache.giraph.graph.BspService.getJobState(BspService.java:564)
>>>>>     at org.apache.giraph.graph.BspServiceWorker.processEvent(BspServiceWorker.java:1414)
>>>>>     at org.apache.giraph.graph.BspService.process(BspService.java:1017)
>>>>>     at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
>>>>>     at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
>>>>> Caused by: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for
>>>>> /_hadoopBsp/job_201112231316_0010/_masterJobState
>>>>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:90)
>>>>>     at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
>>>>>     at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)
>>>>>     at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:99)
>>>>>     at org.apache.giraph.graph.BspService.getJobState(BspService.java:555)
>>>>>     ... 4 more
>>>> It depends on when this happens. If it's after the worker has let the master
>>>> know that it was finished with everything, this is fine.
>>>>
>>>>> *** The Master says:
>>>>> 2011-12-23 15:45:40,564 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got ConnectException
>>>>> java.net.ConnectException: Connection refused
>>>>>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>>>     at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>>>>>     at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
>>>>>     at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>>>>>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
>>>>>     at java.net.Socket.connect(Socket.java:525)
>>>>>     at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:624)
>>>>>     at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:408)
>>>>>     at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
>>>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>>>>
>>>>>
>>>>> Also, when I'm trying to run my own job I see the following. All
>>>>> firewalls etc. should be shut down.
>>>>> *** Master (node09.de):
>>>>> 2011-12-23 15:57:47,140 INFO org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Connect attempt 0 of 10 max trying to connect to node09.de:22181 with poll msecs = 3000
>>>>> 2011-12-23 15:57:47,143 WARN org.apache.giraph.zk.ZooKeeperManager: onlineZooKeeperServers: Got ConnectException
>>>>> java.net.ConnectException: Connection refused
>>>>>     at java.net.PlainSocketImpl.socketConnect(Native Method)
>>>>>     at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:333)
>>>>>     at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:195)
>>>>>     at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:182)
>>>>>     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
>>>>>     at java.net.Socket.connect(Socket.java:525)
>>>>>     at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServers(ZooKeeperManager.java:624)
>>>>>     at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:409)
>>>>>     at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:630)
>>>>>     at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
>>>>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
>>>>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>     at javax.security.auth.Subject.doAs(Subject.java:396)
>>>>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
>>>>>     at org.apache.hadoop.mapred.Child.main(Child.java:253)
>>>>>
>>>>>
>>>>> Thanks again.
>>>>> Christoph
>>>> These two exceptions on the master are also fine. It takes some time
>>>> for the master to start the zk service (hence the multiple connection
>>>> attempts).
>