giraph-user mailing list archives

From Lukas Nalezenec <lukas.naleze...@firma.seznam.cz>
Subject Re: zookeeper problem in giraph..
Date Fri, 04 Apr 2014 23:25:28 GMT
BTW: This patch fixes connection problems between workers, not with
ZooKeeper, but since your problem disappears when you don't send messages,
the ZooKeeper problems may be secondary.
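
A minimal sketch of the "sleep before the next reconnect attempt" idea, for readers without the attachment. This is not the enclosed patch and not Giraph's NettyClient code: the class ReconnectWithSleep, the method connectWithRetry, and its parameters are hypothetical names chosen for illustration, and the connection attempt is passed in as a plain callback.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper illustrating retry-with-sleep; not the actual Giraph patch. */
public final class ReconnectWithSleep {
    private ReconnectWithSleep() { }

    /**
     * Calls tryConnect until it returns true or maxAttempts is reached,
     * sleeping sleepMillis after each failed attempt except the last.
     * Returns true if a connection attempt succeeded.
     */
    public static boolean connectWithRetry(BooleanSupplier tryConnect,
                                           int maxAttempts,
                                           long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryConnect.getAsBoolean()) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    // Give the remote side (e.g. a worker stuck in a GC pause)
                    // time to recover instead of retrying immediately.
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and give up.
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulated connection that fails twice and then succeeds.
        int[] calls = {0};
        boolean connected = connectWithRetry(() -> ++calls[0] >= 3, 5, 100L);
        System.out.println("connected=" + connected + " after " + calls[0] + " attempts");
    }
}
```

The sleep interval is a trade-off: it should be long enough to outlast a typical GC pause on the peer, but short enough that the job does not stall unnecessarily.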

On 5.4.2014 00:12, Lukas Nalezenec wrote:
> Hi,
>
> I had a similar issue; it was caused by long GC pauses. I patched
> NettyClient so that when a reconnect fails, it sleeps for some time
> before the next try. The patch is enclosed. Let me know if it works for you.
> I would also try tuning GC. You can also try setting
> giraph.waitForRequestsConfirmation and giraph.maxNumberOfOpenRequests.
> I hope I am right.
>
> Regards
> Lukas
>
>
> On 4.4.2014 22:49, Suijian Zhou wrote:
>> Hi,
>>   I have a ZooKeeper problem when running a Giraph program. The
>> program is aborted in superstep 2 with:
>> 14/04/04 15:44:48 INFO zookeeper.ClientCnxn: Opening socket
>> connection to server compute-0-18.local/10.1.255.236:22181. Will not
>> attempt to authenticate using SASL (unknown error)
>> 14/04/04 15:44:48 INFO zookeeper.ClientCnxn: Socket connection
>> established to compute-0-18.local/10.1.255.236:22181, initiating
>> session
>> 14/04/04 15:44:48 INFO zookeeper.ClientCnxn: Session establishment
>> complete on server compute-0-18.local/10.1.255.236:22181,
>> sessionid = 0x1452e7c79910009, negotiated timeout = 600000
>> ......
>> 14/04/04 15:46:08 INFO job.JobProgressTracker: Data from 8 workers - 
>> Compute superstep 2: 0 out of 4847571 vertices computed; 0 out of 64 
>> partitions computed; min free memory on worker 3 - 270.37MB, average 
>> 451.21MB
>> 14/04/04 15:46:13 INFO job.JobProgressTracker: Data from 8 workers - 
>> Compute superstep 2: 0 out of 4847571 vertices computed; 0 out of 64 
>> partitions computed; min free memory on worker 6 - 249.25MB, average 
>> 404.02MB
>> 14/04/04 15:46:16 INFO zookeeper.ClientCnxn: Unable to read 
>> additional data from server sessionid 0x1452e7c79910009, likely 
>> server has closed socket, closing socket connection and attempting 
>> reconnect
>> 14/04/04 15:46:17 INFO zookeeper.ClientCnxn: Opening socket
>> connection to server compute-0-18.local/10.1.255.236:22181. Will not
>> attempt to authenticate using SASL (unknown error)
>> 14/04/04 15:46:17 WARN zookeeper.ClientCnxn: Session 
>> 0x1452e7c79910009 for server null, unexpected error, closing socket 
>> connection and attempting reconnect
>> java.net.ConnectException: Connection refused
>>
>>
>> Each rerun of the program leads to a different compute node
>> reporting the same error ("Unable to read additional data from server
>> sessionid...").
>>
>> What superstep 2 does is:
>>   if (getSuperstep() == 2) {
>>     for (IntWritable message : messages) {
>>       for (Edge<IntWritable, IntWritable> edge : vertex.getEdges()) {
>>         sendMessage(edge.getTargetVertexId(), message);
>>         //int abc=0;
>>       }
>>     }
>>   }
>>
>> I checked that if I replace the line
>> "sendMessage(edge.getTargetVertexId(), message);" with a
>> meaningless line like "int abc=0;", the program finishes
>> successfully. It looks like a ZooKeeper problem, but ZooKeeper seems
>> to come with Giraph, as I did not install it separately. I tried
>> modifying parameters in GiraphConstants.java and recompiling Giraph,
>> but this seems to have no effect: the screen output shows that the
>> parameters were not changed at all. Any hints?
>>
>>   Best Regards,
>>   Suijian
>>
>

