giraph-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Avery Ching <ach...@apache.org>
Subject Re: wierd communication errors
Date Thu, 28 Jun 2012 02:56:21 GMT
Same issue using netty as well?



On 6/27/12 6:14 PM, Yuanyuan Tian wrote:
> Hi,
>
> I was running a giraph job where I constantly got the following 
> communication related errors. The symptom is that in super step 0, 
> most of the workers succeeded but a few of the workers produced the 
> errors below, the machines that caused the connection reset are 
> different in each failed worker. To rule out the probability of the 
> cluster setup error, I also ran a different job and it worked fine. 
> So, the error must be caused by this particular giraph job. My giraph 
> job is just normal message propagation type of job, except that the 
> message is not a of a unique type. Therefore, I defined a special 
> message type (also copied in this email) that incorporates two 
> different types of messages: integer message and double array message. 
>  I have tried all day but still couldn't ping point the source of the 
> bug. Can anyone give me some hints on what may have caused this error?
>
> Thanks a lot,
>
> java.lang.IllegalStateException: run: Caught an unrecoverable 
> exception flush: Got ExecutionException
> at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:859)
> at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:763)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:369)
> at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:396)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
> at org.apache.hadoop.mapred.Child.main(Child.java:253)
> Caused by: java.lang.IllegalStateException: flush: Got ExecutionException
> at 
> org.apache.giraph.comm.BasicRPCCommunications.flush(BasicRPCCommunications.java:1082)
> at 
> org.apache.giraph.graph.BspServiceWorker.finishSuperstep(BspServiceWorker.java:1080)
> at org.apache.giraph.graph.GraphMapper.map(GraphMapper.java:806)
> at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:850)
> ... 7 more
> Caused by: java.util.concurrent.ExecutionException: 
> java.lang.reflect.UndeclaredThrowableException
> at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
> at java.util.concurrent.FutureTask.get(FutureTask.java:83)
> at 
> org.apache.giraph.comm.BasicRPCCommunications.flush(BasicRPCCommunications.java:1077)
> ... 10 more
> Caused by: java.lang.reflect.UndeclaredThrowableException
> at $Proxy3.getName(Unknown Source)
> at 
> org.apache.giraph.comm.BasicRPCCommunications$PeerFlushExecutor.run(BasicRPCCommunications.java:335)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> at java.lang.Thread.run(Thread.java:662)
> Caused by: java.io.IOException: Call to 
> idp35.almaden.ibm.com/172.16.0.35:30083 failed on local exception: 
> java.io.IOException: Connection reset by peer
> at org.apache.hadoop.ipc.Client.wrapException(Client.java:1065)
> at org.apache.hadoop.ipc.Client.call(Client.java:1033)
> at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:224)
> ... 8 more
> Caused by: java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcher.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
> at sun.nio.ch.IOUtil.read(IOUtil.java:175)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
> at 
> org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:55)
> at 
> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
> at 
> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
> at java.io.FilterInputStream.read(FilterInputStream.java:116)
> at 
> org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:343)
> at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
> at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
> at java.io.DataInputStream.readInt(DataInputStream.java:370)
> at 
> org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:767)
> at org.apache.hadoop.ipc.Client$Connection.run(Client.java:712)
>
> My special messge type:
>
> *public**class*MyMessageWritable *implements*Writable{
>
> *public**byte*msgType=0;
> *public**long*vertexID=-1;
> *public**double*[] arrayMsg=*null*;
> *public**int*intMsg=-1;
>
> *public*MyMessageWritable ()
>         {
>         }
>
> *public*MyMessageWritable (*long*id, *byte*tp, *int*msg)
>         {
> vertexID=id;
> msgType=tp;
> intMsg=msg;
>         }
>
> *public*MyMessageWritable (*long*id, *byte*tp, *double*[] arr)
>         {
> vertexID=id;
> msgType=tp;
> arrayMsg=arr;
>         }
>
> @Override
> *public**void*readFields(DataInput in) *throws*IOException {
> vertexID=in.readLong();
> msgType=in.readByte();
> *switch*(msgType)
>       {
> *case*1:
> *case*4:
> intMsg=in.readInt();
> *break*;
> *case*2:
> *case*3:
> *if*(arrayMsg==*null*)
> arrayMsg=*new**double*[MyVertex./K/];
> *for*(*int*i=0; i<MyVertex./K/; i++)
> arrayMsg[i]=in.readDouble();
> *break*;
> *default*:
> *throw**new*IOException("message type invalid: "+msgType);
>       }
>         }
>
> @Override
> *public**void*write(DataOutput out) *throws*IOException {
>       out.writeLong(vertexID);
>       out.writeByte(msgType);
> *switch*(msgType)
>       {
> *case*1:
> *case*4:
>               out.writeInt(intMsg);
> *break*;
> *case*2:
> *case*3:
> *if*(arrayMsg==*null*)
> *throw**new*IOException("array message is null");
> *for*(*int*i=0; i<MyVertex./K/; i++)
>       out.writeDouble(arrayMsg[i]);
> *break*;
> *default*:
> *throw**new*IOException("message type invalid: "+msgType);
>       }
>
>         } 



Mime
View raw message