giraph-user mailing list archives

From Zachary Hanif <zh4...@gmail.com>
Subject Giraph/Netty issues on a cluster
Date Wed, 13 Feb 2013 19:29:00 GMT
(How embarrassing! I forgot a subject header in a previous attempt to post
this. Please reply to this thread, not the other.)

Hi everyone,

I am having some odd issues when trying to run a Giraph 0.2 job across my
CDH 3u3 cluster. After building the jar, and deploying it across the
cluster, I start to notice a handful of my nodes reporting the following
error:

> 2013-02-13 17:47:43,341 WARN org.apache.giraph.comm.netty.handler.ResponseClientHandler: exceptionCaught: Channel failed with remote address <EDITED_INTERNAL_DNS>/10.2.0.16:30001
> java.lang.NullPointerException
>     at org.apache.giraph.vertex.EdgeListVertexBase.write(EdgeListVertexBase.java:106)
>     at org.apache.giraph.partition.SimplePartition.write(SimplePartition.java:169)
>     at org.apache.giraph.comm.requests.SendVertexRequest.writeRequest(SendVertexRequest.java:71)
>     at org.apache.giraph.comm.requests.WritableRequest.write(WritableRequest.java:127)
>     at org.apache.giraph.comm.netty.handler.RequestEncoder.encode(RequestEncoder.java:96)
>     at org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:61)
>     at org.jboss.netty.handler.execution.ExecutionHandler.handleDownstream(ExecutionHandler.java:185)
>     at org.jboss.netty.channel.Channels.write(Channels.java:712)
>     at org.jboss.netty.channel.Channels.write(Channels.java:679)
>     at org.jboss.netty.channel.AbstractChannel.write(AbstractChannel.java:246)
>     at org.apache.giraph.comm.netty.NettyClient.sendWritableRequest(NettyClient.java:655)
>     at org.apache.giraph.comm.netty.NettyWorkerClient.sendWritableRequest(NettyWorkerClient.java:144)
>     at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:425)
>     at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.sendPartitionRequest(NettyWorkerClientRequestProcessor.java:195)
>     at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.flush(NettyWorkerClientRequestProcessor.java:365)
>     at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:190)
>     at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:166)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>     at java.lang.Thread.run(Thread.java:722)

What would be causing this? All other Hadoop jobs run well on the cluster,
and when the Giraph job is run with only one worker, it completes without
any issues. When run with any number of workers >1, the above error occurs.
I have referenced this post
<http://mail-archives.apache.org/mod_mbox/giraph-user/201209.mbox/%3CCAEQ6y7ShC4in-L73nR7aBizsPMRRfw9sfa8TMi3MyqML8VK0LQ@mail.gmail.com%3E>,
where superficially similar issues were discussed, but the root cause
appears to be different, and the suggested methods of resolution are not
panning out.
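
For illustration only: one way an error inside write() could appear with
several workers but not with one is that, with a single worker, vertices
presumably never need to be serialized for transfer, so a null field would go
unnoticed. The sketch below shows that general mechanism. SketchVertex and
NullFieldWriteSketch are hypothetical names, not Giraph classes, and nothing
in this thread confirms that this is what happens at
EdgeListVertexBase.java:106.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in for a vertex; NOT Giraph's EdgeListVertexBase.
class SketchVertex {
    final long id;
    final Double value; // assumption: may be null if the input never set it

    SketchVertex(long id, Double value) {
        this.id = id;
        this.value = value;
    }

    // Writable-style serialization: the fields are dereferenced only here,
    // so a null value stays harmless until the vertex is actually written.
    void write(DataOutput out) throws IOException {
        out.writeLong(id);
        out.writeDouble(value); // unboxing a null Double throws NullPointerException
    }
}

class NullFieldWriteSketch {
    // Returns true if serializing the vertex fails with a NullPointerException.
    static boolean failsOnWrite(SketchVertex v) {
        try (DataOutputStream out = new DataOutputStream(new ByteArrayOutputStream())) {
            v.write(out);
            return false;
        } catch (NullPointerException e) {
            return true;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SketchVertex complete = new SketchVertex(1L, 0.5);
        SketchVertex missingValue = new SketchVertex(2L, null);
        System.out.println("complete vertex fails on write: " + failsOnWrite(complete));
        System.out.println("null-valued vertex fails on write: " + failsOnWrite(missingValue));
    }
}
```

If something like this were the cause, checking whether the input format can
produce vertices or edges with null values would be a reasonable first step.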

As extra background, the 'remote address' changes, as the error cycles
through my available cluster nodes, and the failing workers do not seem to
favor one physical machine over another. Not all nodes present this issue,
only a handful per job. Is there something simple that I am missing?
