giraph-user mailing list archives

From Suijian Zhou <suijian.z...@gmail.com>
Subject Re: zookeeper problem in giraph..
Date Mon, 07 Apr 2014 21:59:54 GMT
Hi, Lukas,
  I applied the patch to
giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java and
recompiled Giraph with "mvn compile", but I still get the same error:

14/04/07 16:51:26 INFO job.JobProgressTracker: Data from 8 workers -
Compute superstep 2: 0 out of 4847571 vertices computed; 0 out of 64
partitions computed; min free memory on worker 5 - 270.76MB, average
394.74MB
14/04/07 16:51:27 INFO zookeeper.ClientCnxn: Unable to read additional data
from server sessionid 0x1453e2b3cca0009, likely server has closed socket,
closing socket connection and attempting reconnect
14/04/07 16:51:29 INFO zookeeper.ClientCnxn: Opening socket connection to
server compute-0-19.local/10.1.255.235:22181. Will not attempt to
authenticate using SASL (unknown error)
14/04/07 16:51:29 WARN zookeeper.ClientCnxn: Session 0x1453e2b3cca0009 for
server null, unexpected error, closing socket connection and attempting
reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/04/07 16:51:31 INFO zookeeper.ClientCnxn: Opening socket connection to
server compute-0-19.local/10.1.255.235:22181. Will not attempt to
authenticate using SASL (unknown error)
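
Lukas's reply (quoted below) describes his NettyClient patch as making the client sleep for some time after a failed reconnect before trying again; the patch itself is not reproduced in this thread. What follows is a minimal, generic sketch of that retry-with-sleep idea in plain Java, not the actual patch. The `Connector` interface and `BackoffReconnect` class are hypothetical, and doubling the delay after each failure (capped at a maximum) is an assumption added for illustration.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for one reconnect attempt (not a Giraph/Netty API). */
@FunctionalInterface
interface Connector {
    /** Returns true if this connection attempt succeeded. */
    boolean tryConnect();
}

/** Generic retry-with-sleep sketch; not the actual NettyClient patch. */
final class BackoffReconnect {
    private BackoffReconnect() {}

    /**
     * Tries to connect up to maxAttempts times. After each failed attempt
     * (except the last) it sleeps, doubling the delay up to maxDelayMs, so a
     * peer stalled in a long GC pause has time to recover.
     *
     * @return the attempt number that succeeded, or -1 if all attempts failed
     *         or the thread was interrupted while sleeping
     */
    static int connectWithBackoff(Connector connector, int maxAttempts,
                                  long initialDelayMs, long maxDelayMs) {
        long delayMs = initialDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (connector.tryConnect()) {
                return attempt;
            }
            if (attempt == maxAttempts) {
                break; // no point sleeping after the final failure
            }
            try {
                TimeUnit.MILLISECONDS.sleep(delayMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                return -1;
            }
            delayMs = Math.min(delayMs * 2, maxDelayMs);
        }
        return -1;
    }
}
```

For example, with a connector that fails twice and then succeeds, `connectWithBackoff(c, 5, 100, 2000)` sleeps 100 ms, then 200 ms, and returns 3. A fixed sleep would also match Lukas's description; the growing delay simply gives a paused peer progressively more time.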

  I tried modifying some parameters in
./giraph-core/src/main/java/org/apache/giraph/conf/GiraphConstants.java,
such as DEFAULT_ZOOKEEPER_MAX_CLIENT_CNXNS, but they seem to have no effect.
Any hints?
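
Many options declared in GiraphConstants (for example `giraph.maxNumberOfOpenRequests`, mentioned in Lukas's reply) are looked up in the job configuration at run time, with the value in the source acting only as a default when the key is not set. The class below is a simplified, self-contained model of that lookup, assuming a plain `Map` in place of the Hadoop configuration; `IntOption` and the default of 1000 are hypothetical and are not Giraph's actual classes or defaults. It illustrates why setting a key with the job is an alternative to editing the defaults and recompiling.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Simplified model of a typed configuration option: a key plus a
 * compiled-in default. Hypothetical; not Giraph's own option classes.
 */
final class IntOption {
    private final String key;
    private final int defaultValue;

    IntOption(String key, int defaultValue) {
        this.key = key;
        this.defaultValue = defaultValue;
    }

    /** Returns the value set in conf for this key, or the default if unset. */
    int get(Map<String, String> conf) {
        String raw = conf.get(key);
        return raw == null ? defaultValue : Integer.parseInt(raw.trim());
    }
}

final class IntOptionDemo {
    public static void main(String[] args) {
        // Default of 1000 is illustrative only, not Giraph's real default.
        IntOption maxOpen = new IntOption("giraph.maxNumberOfOpenRequests", 1000);
        Map<String, String> conf = new HashMap<>();
        System.out.println(maxOpen.get(conf)); // prints 1000 (key unset)
        conf.put("giraph.maxNumberOfOpenRequests", "500");
        System.out.println(maxOpen.get(conf)); // prints 500 (runtime value wins)
    }
}
```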

  Best Regards,
  Suijian



2014-04-07 9:34 GMT-05:00 Suijian Zhou <suijian.zhou@gmail.com>:

> Hi, Lukas,
>   Thank you, but when I tried to apply the patch, I got:
> 2014.04.07|09:25:47~/giraph/giraph-core/src> git apply --check NettyClient_Timeout.patch
> error: patch failed: giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java:153
> error: giraph-core/src/main/java/org/apache/giraph/comm/netty/NettyClient.java: patch does not apply
>
>   Could you send me the patched NettyClient.java file directly? Thanks!
>
>   Best Regards,
>   Suijian
>
>
>
> 2014-04-04 17:12 GMT-05:00 Lukas Nalezenec <
> lukas.nalezenec@firma.seznam.cz>:
>
>  Hi,
>>
>> I had a similar issue; it was caused by long GC pauses. I patched
>> NettyClient so that when a reconnect fails, it sleeps for some time before
>> the next try. The patch is enclosed. Let me know if it works for you.
>> I would also try tuning the GC. You can also try the
>> giraph.waitForRequestsConfirmation and giraph.maxNumberOfOpenRequests
>> options. I hope I am right.
>>
>> Regards
>> Lukas
>>
>>
>> On 4.4.2014 22:49, Suijian Zhou wrote:
>>
>>   Hi,
>>   I have a ZooKeeper problem when running a Giraph program: the program
>> is aborted in superstep 2 with:
>> 14/04/04 15:44:48 INFO zookeeper.ClientCnxn: Opening socket connection to
>> server compute-0-18.local/10.1.255.236:22181. Will not attempt to
>> authenticate using SASL (unknown error)
>> 14/04/04 15:44:48 INFO zookeeper.ClientCnxn: Socket connection
>> established to compute-0-18.local/10.1.255.236:22181, initiating session
>> 14/04/04 15:44:48 INFO zookeeper.ClientCnxn: Session establishment
>> complete on server compute-0-18.local/10.1.255.236:22181, sessionid =
>> 0x1452e7c79910009, negotiated timeout = 600000
>> ......
>> 14/04/04 15:46:08 INFO job.JobProgressTracker: Data from 8 workers -
>> Compute superstep 2: 0 out of 4847571 vertices computed; 0 out of 64
>> partitions computed; min free memory on worker 3 - 270.37MB, average
>> 451.21MB
>> 14/04/04 15:46:13 INFO job.JobProgressTracker: Data from 8 workers -
>> Compute superstep 2: 0 out of 4847571 vertices computed; 0 out of 64
>> partitions computed; min free memory on worker 6 - 249.25MB, average
>> 404.02MB
>> 14/04/04 15:46:16 INFO zookeeper.ClientCnxn: Unable to read additional
>> data from server sessionid 0x1452e7c79910009, likely server has closed
>> socket, closing socket connection and attempting reconnect
>> 14/04/04 15:46:17 INFO zookeeper.ClientCnxn: Opening socket connection to
>> server compute-0-18.local/10.1.255.236:22181. Will not attempt to
>> authenticate using SASL (unknown error)
>> 14/04/04 15:46:17 WARN zookeeper.ClientCnxn: Session 0x1452e7c79910009
>> for server null, unexpected error, closing socket connection and attempting
>> reconnect
>> java.net.ConnectException: Connection refused
>>
>>
>>  Each rerun of the program leads to a different compute node reporting
>> the same error ("Unable to read additional data from server sessionid...").
>>
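>>  The code in superstep 2 is:
>>
>>   if (getSuperstep() == 2) {
>>     // Forward every received message to every out-neighbor.
>>     for (IntWritable message : messages) {
>>       for (Edge<IntWritable, IntWritable> edge : vertex.getEdges()) {
>>         sendMessage(edge.getTargetVertexId(), message);
>>         // int abc = 0;  // with this instead of sendMessage, the job finishes
>>       }
>>     }
>>   }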
>>
>>  I checked that if I replace the line
>> "sendMessage(edge.getTargetVertexId(), message);" with a meaningless
>> line such as "int abc=0;", the program finishes successfully. It looks like
>> a ZooKeeper problem, but the ZooKeeper here is the one that comes with
>> Giraph, as I did not install ZooKeeper separately. I tried modifying
>> parameters in GiraphConstants.java and recompiling Giraph, but the changes
>> do not seem to take effect: the screen output shows the parameters
>> unchanged. Any hints?
>>
>>    Best Regards,
>>    Suijian
>>
>>
>>
>
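
In the superstep-2 code quoted above, each vertex forwards every message it received to every out-neighbor, so it sends (messages received) x (out-degree) messages, and the superstep as a whole sends the sum of that product over all vertices. Replacing sendMessage with "int abc=0;" removes all of that traffic. The sketch below only computes this total; `FanOut` is a hypothetical helper and the toy arrays are illustrative, not data from the thread.

```java
/** Estimates how many messages one superstep of the quoted loop sends. */
final class FanOut {
    private FanOut() {}

    /**
     * messagesIn[v] = messages vertex v received; outDegree[v] = its out-edges.
     * Each vertex sends messagesIn[v] * outDegree[v] messages. Accumulates in
     * a long to avoid int overflow on large graphs.
     */
    static long totalMessagesSent(int[] messagesIn, int[] outDegree) {
        if (messagesIn.length != outDegree.length) {
            throw new IllegalArgumentException("arrays must have one entry per vertex");
        }
        long total = 0;
        for (int v = 0; v < messagesIn.length; v++) {
            total += (long) messagesIn[v] * outDegree[v];
        }
        return total;
    }

    public static void main(String[] args) {
        // Toy 3-vertex example (illustrative numbers, not from the thread).
        int[] messagesIn = {2, 0, 5};
        int[] outDegree = {3, 4, 10};
        System.out.println(totalMessagesSent(messagesIn, outDegree)); // prints 56
    }
}
```

Because the count multiplies rather than adds, a few high-degree vertices that also receive many messages can dominate the total for the 4847571-vertex graph in the logs.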
