giraph-user mailing list archives

From Jyotirmoy Sundi <sundi...@gmail.com>
Subject Re: zookeeper connection issue while running for second time
Date Wed, 02 Oct 2013 05:39:57 GMT
Thanks a lot, Avery, for your response. I increased the timeout to 10 minutes.
*Changed:*
-Dgiraph.zkSessionMsecTimeout=600000 and
-Dgiraph.useInputSplitLocality=false
It is now working for consecutive runs without any errors.

Thanks
Sundi
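
For reference, here is how the 10-minute value in the flags above works out. This is only a minimal sketch: the class `ZkTimeoutFlags` and its helper methods are hypothetical names for illustration, not part of Giraph. Only the two option keys come from this thread.

```java
import java.util.concurrent.TimeUnit;

public class ZkTimeoutFlags {
    // Hypothetical helper: formats a key/value pair as a -D option.
    static String option(String key, String value) {
        return "-D" + key + "=" + value;
    }

    // Builds the session timeout flag, converting minutes to milliseconds.
    static String sessionTimeoutFlag(long minutes) {
        long millis = TimeUnit.MINUTES.toMillis(minutes);
        return option("giraph.zkSessionMsecTimeout", Long.toString(millis));
    }

    public static void main(String[] args) {
        // 10 minutes, the value used in this message.
        System.out.println(sessionTimeoutFlag(10));
        // The second setting that was changed.
        System.out.println(option("giraph.useInputSplitLocality", "false"));
    }
}
```

Since the option is declared as an `IntConfOption`, the value has to fit in a Java `int`, which caps it at roughly 24.8 days in milliseconds; a timeout of a day or so is still well within that range.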


On Tue, Oct 1, 2013 at 10:18 PM, Avery Ching <aching@apache.org> wrote:

>  We did have this error a few times.  This can happen due to GC pauses,
> so I would check the worker for long GC issues.  Also, you can increase the
> ZooKeeper timeouts, see
>
>   /** ZooKeeper session millisecond timeout */
>   IntConfOption ZOOKEEPER_SESSION_TIMEOUT =
>       new IntConfOption("giraph.zkSessionMsecTimeout", MINUTES.toMillis(1),
>           "ZooKeeper session millisecond timeout");
>
> Currently, the default is one minute, but in production we set that number
> much, much higher (even greater than a day sometimes) to avoid the
> disconnection.
>
> Hope that helps,
> Avery
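
One rough way to follow up on the GC suggestion above is to read the JVM's accumulated collection time through the standard `java.lang.management` beans. The sketch below is an assumption about how one might check this, not something Giraph does for you; the class `GcPauseCheck` and its methods are hypothetical. The accumulated time is only an approximate signal (it is a running total, not a per-pause measurement), so GC logging such as `-verbose:gc` on the worker JVM remains the more direct check.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcPauseCheck {
    // Approximate total time (ms) this JVM has spent in GC, summed over all collectors.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // True if the GC time accumulated between two samples is at least the session timeout.
    static boolean exceedsTimeout(long gcBefore, long gcAfter, long sessionTimeoutMillis) {
        return gcAfter - gcBefore >= sessionTimeoutMillis;
    }

    public static void main(String[] args) {
        long before = totalGcMillis();
        // ... the workload of interest would run here ...
        long after = totalGcMillis();
        long oneMinute = 60000L; // the default timeout mentioned above
        System.out.println("GC ms in interval: " + (after - before));
        System.out.println("Exceeds session timeout: " + exceedsTimeout(before, after, oneMinute));
    }
}
```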
>
>
> On 10/1/13 6:27 PM, Jyotirmoy Sundi wrote:
>
> Hi,
> I am able to run Apache Giraph successfully with around 500M pairs to
> find connected components. It works well, but not always; the issue seems
> to be a ZooKeeper session timeout. Some of the clients (around 5-10
> out of 100) produce this error, and the master fails because of it. Do you
> have any suggestions for this error? Any suggestions will be appreciated.
>
> 2013-10-02 01:20:43,651 WARN org.apache.giraph.bsp.BspService: process: Disconnected from ZooKeeper (will automatically try to recover) WatchedEvent state:Disconnected type:None path:null
> 2013-10-02 01:20:44,035 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server had22.rsk.admobius.com/10.240.51.32:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
> 2013-10-02 01:20:44,035 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to had22.rsk.admobius.com/10.240.51.32:2181, initiating session
> 2013-10-02 01:20:44,037 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x441604c97412331 has expired, closing socket connection
> 2013-10-02 01:20:44,037 WARN org.apache.giraph.bsp.BspService: process: Got unknown null path event WatchedEvent state:Expired type:None path:null
> 2013-10-02 01:20:44,038 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> 2013-10-02 01:21:20,046 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 250000 vertices at 1827.2925619484213 vertices/sec 1728790 edges at 12636.730317550928 edges/sec Memory (free/total/max) = 1745.60M / 2262.19M / 2730.69M
> 2013-10-02 01:21:24,788 INFO org.apache.giraph.worker.InputSplitsCallable: loadFromInputSplit: Finished loading /_hadoopBsp/job_201309260044_1132/_vertexInputSplitDir/601 (v=261131, e=1808572)
> 2013-10-02 01:21:24,789 ERROR org.apache.giraph.utils.LogStacktraceCallable: Execution of callable failed
> java.lang.IllegalStateException: markInputSplitPathFinished: KeeperException on /_hadoopBsp/job_201309260044_1132/_vertexInputSplitDir/601/_vertexInputSplitFinished
> 	at org.apache.giraph.worker.InputSplitsHandler.markInputSplitPathFinished(InputSplitsHandler.java:168)
> 	at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:226)
> 	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:161)
> 	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
> 	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
> 	at java.lang.Thread.run(Thread.java:662)
> Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /_hadoopBsp/job_201309260044_1132/_vertexInputSplitDir/601/_vertexInputSplitFinished
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
> 	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
> 	at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
> 	at org.apache.giraph.worker.InputSplitsHandler.markInputSplitPathFinished(InputSplitsHandler.java:159)
> 	... 9 more
>
>
>  --
>  Best Regards,
> Jyotirmoy Sundi
> Admobius
>
> San Francisco, CA 94158
>
>
> On Thu, Sep 26, 2013 at 6:08 PM, Jyotirmoy Sundi <sundi133@gmail.com> wrote:
>
>>  Hi,
>>
>>    I got the connected components job working for 1B nodes, but when I run the job again, it fails with the error below. Apart from this, the data in the ZooKeeper data directory is not cleared. For successful jobs, the Giraph data in ZooKeeper is cleared.
>>
>> The following errors seem to occur because the node tries to connect to ZooKeeper with a session ID that has already been cleared, as seen in
>>
>> "Client session timed out, have not heard from server in 68845ms for sessionid 0x3415cc6ce930059, closing socket connection and attempting reconnect". Any idea whether increasing the session timeout would help?
>>
>> 2013-09-27 00:57:11,748 WARN org.apache.giraph.bsp.BspService: process: Got unknown null path event WatchedEvent state:Expired type:None path:null
>> 2013-09-27 00:57:11,748 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x3415cc6ce930059 has expired, closing socket connection
>> 2013-09-27 00:57:11,748 WARN org.apache.giraph.worker.InputSplitsHandler: process: Problem with zookeeper, got event with path null, state Expired, event type None
>> 2013-09-27 00:57:11,748 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
>> 2013-09-27 00:57:11,925 INFO org.apache.giraph.worker.InputSplitsCallable: loadFromInputSplit: Finished loading /_hadoopBsp/job_201309260044_0116/_vertexInputSplitDir/89 (v=258127, e=1792906)
>> 2013-09-27 00:57:11,926 ERROR org.apache.giraph.utils.LogStacktraceCallable: Execution of callable failed
>> java.lang.IllegalStateException: markInputSplitPathFinished: KeeperException on /_hadoopBsp/job_201309260044_0116/_vertexInputSplitDir/89/_vertexInputSplitFinished
>> 	at org.apache.giraph.worker.InputSplitsHandler.markInputSplitPathFinished(InputSplitsHandler.java:168)
>> 	at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:226)
>> 	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:161)
>> 	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
>> 	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
>> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>> 	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
>> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
>> 	at java.lang.Thread.run(Thread.java:662)
>> Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /_hadoopBsp/job_201309260044_0116/_vertexInputSplitDir/89/_vertexInputSplitFinished
>> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
>> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>> 	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
>> 	at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
>> 	at org.apache.giraph.worker.InputSplitsHandler.markInputSplitPathFinished(InputSplitsHandler.java:159)
>> 	... 9 more
>>
>>
>>  --
>>  Best Regards,
>> Jyotirmoy Sundi
>> Data Engineer,
>> Admobius
>>
>> San Francisco, CA 94158
>>
>
>
>
>  --
>  Best Regards,
> Jyotirmoy Sundi
> Data Engineer,
> Admobius
>
> San Francisco, CA 94158
>
>
>


-- 
Best Regards,
Jyotirmoy Sundi
Data Engineer,
Admobius

San Francisco, CA 94158
