giraph-user mailing list archives

From Suijian Zhou <suijian.z...@gmail.com>
Subject Re: zookeeper problem in giraph..
Date Mon, 07 Apr 2014 14:34:37 GMT
Hi, Lukas,
  Thank you, but when I tried to apply the patch, I got:
2014.04.07|09:25:47~/giraph/giraph-core/src> git apply --check NettyClient_Timeout.patch
error: patch failed: giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java:153
error: giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java: patch does not apply

  Could you send me the patched NettyClient.java file directly? Thanks!

  Best Regards,
  Suijian



2014-04-04 17:12 GMT-05:00 Lukas Nalezenec <lukas.nalezenec@firma.seznam.cz>:

>  Hi,
>
> I had a similar issue; it was caused by long GC pauses. I patched
> NettyClient so that when a reconnect fails, it sleeps for some time before
> the next try. The patch is enclosed. Let me know if it works for you.
> I would also try tuning the GC. In addition, you can try the
> giraph.waitForRequestsConfirmation and giraph.maxNumberOfOpenRequests
> options. I hope I am right.
>
> Regards
> Lukas
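Lukas's enclosed patch is not reproduced in this thread. As a rough illustration of the idea he describes (sleeping after a failed reconnect before trying again), here is a minimal Java sketch. The class and method names (ReconnectWithBackoff, connectWithRetry) and the doubling delay are assumptions made for illustration; they are not the contents of the patch or of Giraph's NettyClient, and connectOnce merely stands in for whatever real connection call is being retried.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch, not Giraph's NettyClient code: retry a connection
 * attempt and sleep after each failure before trying again.
 */
public class ReconnectWithBackoff {

  /**
   * Calls connectOnce up to maxAttempts times. After each failed attempt it
   * sleeps, doubling the delay each time up to maxDelayMs, so that a peer
   * stalled in a long GC pause gets time to recover before the next try.
   *
   * @return the 1-based attempt number that succeeded, or -1 if every
   *         attempt failed or the thread was interrupted while sleeping
   */
  public static int connectWithRetry(BooleanSupplier connectOnce,
      int maxAttempts, long initialDelayMs, long maxDelayMs) {
    long delayMs = initialDelayMs;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      if (connectOnce.getAsBoolean()) {
        return attempt;
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(delayMs); // back off before the next attempt
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt(); // keep the interrupt visible
          return -1;
        }
        delayMs = Math.min(delayMs * 2, maxDelayMs);
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    // Simulated peer (hypothetical) that refuses the first two attempts.
    int[] calls = {0};
    int attempt = connectWithRetry(() -> ++calls[0] >= 3, 5, 10, 100);
    System.out.println("connected on attempt " + attempt); // connected on attempt 3
  }
}
```

A fixed sleep between attempts is the simplest variant; doubling the delay up to a cap backs off further while a peer may still be paused in GC, at the cost of a slower reconnect once it recovers.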
>
>
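Lukas also mentions giraph.waitForRequestsConfirmation and giraph.maxNumberOfOpenRequests. My understanding, which this thread does not confirm, is that together they make a worker wait for confirmations once a maximum number of sent requests is still unconfirmed. The Java sketch that follows shows that general idea with java.util.concurrent.Semaphore; it is a conceptual illustration only, not Giraph's implementation, and the class BoundedOpenRequests is hypothetical.

```java
import java.util.concurrent.Semaphore;

/**
 * Conceptual sketch (hypothetical, not Giraph code): cap the number of
 * requests that have been sent but not yet confirmed, so unconfirmed
 * requests cannot pile up in memory without bound.
 */
public class BoundedOpenRequests {
  private final Semaphore permits;

  public BoundedOpenRequests(int maxOpenRequests) {
    this.permits = new Semaphore(maxOpenRequests);
  }

  /** Call before sending; blocks while maxOpenRequests are unconfirmed. */
  public void beforeSend() throws InterruptedException {
    permits.acquire();
  }

  /** Non-blocking variant: returns false instead of waiting at the cap. */
  public boolean tryBeforeSend() {
    return permits.tryAcquire();
  }

  /**
   * Call once per confirmation from the receiver; frees one slot.
   * Must be paired with a successful acquire, or the cap grows.
   */
  public void onConfirmed() {
    permits.release();
  }

  /** Number of requests that can still be sent without waiting. */
  public int openSlots() {
    return permits.availablePermits();
  }

  public static void main(String[] args) {
    BoundedOpenRequests limiter = new BoundedOpenRequests(2);
    System.out.println(limiter.tryBeforeSend()); // true  (1 unconfirmed)
    System.out.println(limiter.tryBeforeSend()); // true  (2 unconfirmed)
    System.out.println(limiter.tryBeforeSend()); // false (cap reached)
    limiter.onConfirmed();                       // one confirmation arrives
    System.out.println(limiter.tryBeforeSend()); // true
  }
}
```

The trade-off is throughput for memory: a sender that blocks at the cap holds fewer outgoing messages in its buffers, which may ease the memory and GC pressure Lukas describes.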
> On 4.4.2014 22:49, Suijian Zhou wrote:
>
>   Hi,
>   I have a ZooKeeper problem when running a Giraph program: the program
> is aborted in superstep 2 with:
> 14/04/04 15:44:48 INFO zookeeper.ClientCnxn: Opening socket connection to server compute-0-18.local/10.1.255.236:22181. Will not attempt to authenticate using SASL (unknown error)
> 14/04/04 15:44:48 INFO zookeeper.ClientCnxn: Socket connection established to compute-0-18.local/10.1.255.236:22181, initiating session
> 14/04/04 15:44:48 INFO zookeeper.ClientCnxn: Session establishment complete on server compute-0-18.local/10.1.255.236:22181, sessionid = 0x1452e7c79910009, negotiated timeout = 600000
> ......
> 14/04/04 15:46:08 INFO job.JobProgressTracker: Data from 8 workers - Compute superstep 2: 0 out of 4847571 vertices computed; 0 out of 64 partitions computed; min free memory on worker 3 - 270.37MB, average 451.21MB
> 14/04/04 15:46:13 INFO job.JobProgressTracker: Data from 8 workers - Compute superstep 2: 0 out of 4847571 vertices computed; 0 out of 64 partitions computed; min free memory on worker 6 - 249.25MB, average 404.02MB
> 14/04/04 15:46:16 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x1452e7c79910009, likely server has closed socket, closing socket connection and attempting reconnect
> 14/04/04 15:46:17 INFO zookeeper.ClientCnxn: Opening socket connection to server compute-0-18.local/10.1.255.236:22181. Will not attempt to authenticate using SASL (unknown error)
> 14/04/04 15:46:17 WARN zookeeper.ClientCnxn: Session 0x1452e7c79910009 for server null, unexpected error, closing socket connection and attempting reconnect
> java.net.ConnectException: Connection refused
>
>
>  Each rerun of the program leads to a different compute node reporting
> the same error ("Unable to read additional data from server sessionid...").
>
>  The superstep 2 code is:
>   if (getSuperstep() == 2) {
>     for (IntWritable message : messages) {
>       for (Edge<IntWritable, IntWritable> edge : vertex.getEdges()) {
>         sendMessage(edge.getTargetVertexId(), message);
>         //int abc=0;
>       }
>     }
>   }
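In this loop every received message is forwarded along every out-edge, so a vertex that receives M messages and has out-degree D sends M * D messages in superstep 2. The minimal Java sketch that follows counts that total for a small made-up graph; FanOutCount and messagesSent are hypothetical names, and the graph and message counts are illustration data, not taken from the job above. Summed over millions of vertices, this product may help explain the falling free memory in the log, though the thread does not confirm that.

```java
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: count the messages the superstep-2 loop would send.
 * Each vertex sends (messages received) * (out-degree) messages.
 */
public class FanOutCount {

  /**
   * @param outEdges target vertex ids of each vertex's out-edges
   * @param received number of messages each vertex received (absent = 0)
   * @return total number of messages sent by all vertices
   */
  public static long messagesSent(Map<Integer, List<Integer>> outEdges,
      Map<Integer, Integer> received) {
    long total = 0;
    for (Map.Entry<Integer, List<Integer>> e : outEdges.entrySet()) {
      int in = received.getOrDefault(e.getKey(), 0);
      // Use long arithmetic: the product can overflow int on large graphs.
      total += (long) in * e.getValue().size();
    }
    return total;
  }

  public static void main(String[] args) {
    // Made-up graph: 1 -> {2, 3}, 2 -> {3}, 3 -> {1, 2}
    Map<Integer, List<Integer>> outEdges = Map.of(
        1, List.of(2, 3),
        2, List.of(3),
        3, List.of(1, 2));
    // Made-up counts of messages received at the start of superstep 2.
    Map<Integer, Integer> received = Map.of(1, 2, 2, 4, 3, 1);
    System.out.println(messagesSent(outEdges, received)); // 2*2 + 4*1 + 1*2 = 10
  }
}
```

Replacing sendMessage with a no-op, as described in the message, drops this total to zero, which is consistent with the job then finishing.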
>
>  I checked that replacing the line "sendMessage(edge.getTargetVertexId(),
> message);" with a meaningless line such as "int abc=0;" lets the program
> finish successfully. It looks like a ZooKeeper problem, but the ZooKeeper
> in use is the one that comes with Giraph, since I did not install ZooKeeper
> separately. I tried modifying parameters in GiraphConstants.java and
> recompiling Giraph, but the changes do not seem to take effect: the screen
> output shows the parameters unchanged. Any hints?
>
>    Best Regards,
>    Suijian
>
>
>
