From: Jyotirmoy Sundi
To: user@giraph.apache.org
Date: Tue, 1 Oct 2013 22:39:57 -0700
Subject: Re: zookeeper connection issue while running for second time
In-Reply-To: <524BACA7.5000803@apache.org>
References: <524BACA7.5000803@apache.org>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=e89a8ffbae75f0ce8904e7bb7f3b

--e89a8ffbae75f0ce8904e7bb7f3b
Content-Type: text/html; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable

Thanks a lot, Avery, for your response. I increased the timeout to 10 minutes.
Changed:
-Dgiraph.zkSessionMsecTimeout=600000 and -Dgiraph.useInputSplitLocality=false
It is now working for consecutive runs without any errors.

Thanks
Sundi
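
For readers who want to reproduce the change, here is a minimal sketch of how the two options could be assembled as `-D` arguments. The property names and values come from the message above; the class `GiraphZkArgs` and its methods are hypothetical helpers, not part of Giraph.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper (not part of Giraph) that builds the -D options mentioned above. */
public class GiraphZkArgs {
  /** Builds the session-timeout option from a value given in minutes. */
  public static String sessionTimeoutArg(long minutes) {
    return "-Dgiraph.zkSessionMsecTimeout=" + TimeUnit.MINUTES.toMillis(minutes);
  }

  /** Builds the input-split-locality option. */
  public static String inputSplitLocalityArg(boolean enabled) {
    return "-Dgiraph.useInputSplitLocality=" + enabled;
  }

  /** Returns both options in the order used in the message. */
  public static List<String> args(long timeoutMinutes, boolean locality) {
    List<String> out = new ArrayList<>();
    out.add(sessionTimeoutArg(timeoutMinutes));
    out.add(inputSplitLocalityArg(locality));
    return out;
  }

  public static void main(String[] argv) {
    // Prints the two options on one line, separated by a space.
    System.out.println(String.join(" ", args(10, false)));
  }
}
```

This assumes the job is launched through a runner that accepts Hadoop-style `-D` generic options; where exactly the flags go on the command line depends on your setup.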


On Tue, Oct 1, 2013 at 10:18 PM, Avery Ching <aching@apache.org> wrote:
We did have this error a few times. This can happen due to GC pauses, so I would check the worker for long GC issues. Also, you can increase the ZooKeeper timeouts, see

  /** ZooKeeper session millisecond timeout */
  IntConfOption ZOOKEEPER_SESSION_TIMEOUT =
      new IntConfOption("giraph.zkSessionMsecTimeout", MINUTES.toMillis(1),
          "ZooKeeper session millisecond timeout");

Currently, the default is one minute, but in production we set that number much, much higher (even greater than a day sometimes) to avoid the disconnection.
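
As a rough illustration of why the default can be too short: the earliest message in this thread (quoted further down) reports a client that had not heard from the server for 68845 ms, which is already longer than the one-minute default. The sketch below is a deliberate simplification that treats a session as at risk once the silence exceeds the configured timeout. The class and method names are hypothetical, and real ZooKeeper expiry is decided by the server using the negotiated timeout.

```java
import java.util.concurrent.TimeUnit;

/** Simplified, hypothetical model: compares an observed silence with a session timeout. */
public class SessionTimeoutCheck {
  /** Returns true if the observed silence is longer than the configured timeout. */
  public static boolean exceedsTimeout(long silenceMs, long timeoutMs) {
    return silenceMs > timeoutMs;
  }

  public static void main(String[] args) {
    long observedSilenceMs = 68845L;                      // from the quoted client log
    long defaultTimeoutMs = TimeUnit.MINUTES.toMillis(1); // Giraph default quoted above
    long raisedTimeoutMs = TimeUnit.MINUTES.toMillis(10); // value the poster later chose

    System.out.println("default exceeded: "
        + exceedsTimeout(observedSilenceMs, defaultTimeoutMs));
    System.out.println("raised exceeded: "
        + exceedsTimeout(observedSilenceMs, raisedTimeoutMs));
  }
}
```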

Hope that helps,
Avery
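
One way to act on the GC suggestion is to look at cumulative collector statistics from inside the JVM. The sketch below uses the standard `java.lang.management` API; the class name `GcReport` is made up for illustration, and it assumes you can run this code inside the worker JVM. Note that it reports totals only: a large total hints at GC pressure but does not show the length of individual pauses, for which GC logging is the usual tool.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: prints cumulative GC counts and times for the current JVM. */
public class GcReport {
  /** Sums the accumulated collection time in milliseconds across all collectors. */
  public static long totalGcMillis() {
    long total = 0;
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      long millis = gc.getCollectionTime(); // -1 when the collector does not report it
      if (millis > 0) {
        total += millis;
      }
    }
    return total;
  }

  public static void main(String[] args) {
    // One line per collector, followed by the overall total.
    for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
      System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
          + ", timeMs=" + gc.getCollectionTime());
    }
    System.out.println("total GC ms: " + totalGcMillis());
  }
}
```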


On 10/1/13 6:27 PM, Jyotirmoy Sundi wrote:
Hi,
I am able to run Apache Giraph successfully with around 500M pairs to find connected components. It works well, but not always; the issue seems to be the ZooKeeper session timeout. Some of the clients (around 5-10 out of 100) produce this error, and the master fails because of it. Do you have any suggestions for this error? Any suggestions will be appreciated.
2013-10-02 01:20:43,651 WARN org.apache.giraph.bsp.BspService: process: Disconnected from ZooKeeper (will automatically try to recover) WatchedEvent state:Disconnected type:None path:null
2013-10-02 01:20:44,035 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server had22.rsk.admobius.com/10.240.51.32:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
2013-10-02 01:20:44,035 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to had22.rsk.admobius.com/10.240.51.32:2181, initiating session
2013-10-02 01:20:44,037 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x441604c97412331 has expired, closing socket connection
2013-10-02 01:20:44,037 WARN org.apache.giraph.bsp.BspService: process: Got unknown null path event WatchedEvent state:Expired type:None path:null
2013-10-02 01:20:44,038 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2013-10-02 01:21:20,046 INFO org.apache.giraph.worker.VertexInputSplitsCallable: readVertexInputSplit: Loaded 250000 vertices at 1827.2925619484213 vertices/sec 1728790 edges at 12636.730317550928 edges/sec Memory (free/total/max) = 1745.60M / 2262.19M / 2730.69M
2013-10-02 01:21:24,788 INFO org.apache.giraph.worker.InputSplitsCallable: loadFromInputSplit: Finished loading /_hadoopBsp/job_201309260044_1132/_vertexInputSplitDir/601 (v=261131, e=1808572)
2013-10-02 01:21:24,789 ERROR org.apache.giraph.utils.LogStacktraceCallable: Execution of callable failed
java.lang.IllegalStateException: markInputSplitPathFinished: KeeperException on /_hadoopBsp/job_201309260044_1132/_vertexInputSplitDir/601/_vertexInputSplitFinished
	at org.apache.giraph.worker.InputSplitsHandler.markInputSplitPathFinished(InputSplitsHandler.java:168)
	at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:226)
	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:161)
	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /_hadoopBsp/job_201309260044_1132/_vertexInputSplitDir/601/_vertexInputSplitFinished
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
	at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
	at org.apache.giraph.worker.InputSplitsHandler.markInputSplitPathFinished(InputSplitsHandler.java:159)
	... 9 more
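
When only a few of many workers fail this way, it can help to pull the expired session IDs out of each worker log. The following is a small sketch using `java.util.regex`; the pattern is based only on the `ClientCnxn` line shown in the log above, and the class name `ExpiredSessionFinder` is hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts the session ID from a "has expired" log line. */
public class ExpiredSessionFinder {
  // Matches text such as "session 0x441604c97412331 has expired".
  private static final Pattern EXPIRED =
      Pattern.compile("session (0x[0-9a-fA-F]+) has expired");

  /** Returns the expired session ID if the line reports one, otherwise empty. */
  public static Optional<String> expiredSessionId(String logLine) {
    Matcher m = EXPIRED.matcher(logLine);
    return m.find() ? Optional.of(m.group(1)) : Optional.empty();
  }

  public static void main(String[] args) {
    String line = "2013-10-02 01:20:44,037 INFO org.apache.zookeeper.ClientCnxn: "
        + "Unable to reconnect to ZooKeeper service, session 0x441604c97412331 "
        + "has expired, closing socket connection";
    System.out.println(expiredSessionId(line).orElse("none"));
  }
}
```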

--
Best Regards,
Jyotirmoy Sundi
Admobius

San Francisco, CA 94158



On Thu, Sep 26, 2013 at 6:08 PM, Jyotirmoy Sundi <sundi133@gmail.com> wrote:
Hi,
I got connected components working for 1B nodes, but when I run the job again, it fails with the error below. Apart from this, the job's data is not cleared from the ZooKeeper data directory; for successful jobs, the Giraph data in ZooKeeper is cleared.
The following errors seem to occur because the node tries to reconnect to ZooKeeper with a session ID that has already been cleared, as seen in:
"Client session timed out, have not heard from server in 68845ms for sessionid 0x3415cc6ce930059, closing socket connection and attempting reconnect"
Any idea whether increasing the session timeout would help?
2013-09-27 00:57:11,748 WARN org.apache.giraph.bsp.BspService: process: Got unknown null path event WatchedEvent state:Expired type:None path:null
2013-09-27 00:57:11,748 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x3415cc6ce930059 has expired, closing socket connection
2013-09-27 00:57:11,748 WARN org.apache.giraph.worker.InputSplitsHandler: process: Problem with zookeeper, got event with path null, state Expired, event type None
2013-09-27 00:57:11,748 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2013-09-27 00:57:11,925 INFO org.apache.giraph.worker.InputSplitsCallable: loadFromInputSplit: Finished loading /_hadoopBsp/job_201309260044_0116/_vertexInputSplitDir/89 (v=258127, e=1792906)
2013-09-27 00:57:11,926 ERROR org.apache.giraph.utils.LogStacktraceCallable: Execution of callable failed
java.lang.IllegalStateException: markInputSplitPathFinished: KeeperException on /_hadoopBsp/job_201309260044_0116/_vertexInputSplitDir/89/_vertexInputSplitFinished
	at org.apache.giraph.worker.InputSplitsHandler.markInputSplitPathFinished(InputSplitsHandler.java:168)
	at org.apache.giraph.worker.InputSplitsCallable.loadInputSplit(InputSplitsCallable.java:226)
	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:161)
	at org.apache.giraph.worker.InputSplitsCallable.call(InputSplitsCallable.java:58)
	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:51)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
	at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /_hadoopBsp/job_201309260044_0116/_vertexInputSplitDir/89/_vertexInputSplitFinished
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
	at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
	at org.apache.giraph.zk.ZooKeeperExt.createExt(ZooKeeperExt.java:152)
	at org.apache.giraph.worker.InputSplitsHandler.markInputSplitPathFinished(InputSplitsHandler.java:159)
	... 9 more

--
Best Regards,
Jyotirmoy Sundi
Data Engineer,
Admobius

San Francisco, CA 94158




--
Best Regards,
Jyotirmoy Sundi
Data Engineer,
Admobius

San Francisco, CA 94158





--
Best Regards,
Jyotirmoy Sundi
Data Engineer,
Admobius

San Francisco, CA 94158

--e89a8ffbae75f0ce8904e7bb7f3b--