Mailing-List: contact user-help@zookeeper.apache.org; run by ezmlm
Message-ID: <1410856538.71906.YahooMailNeo@web142301.mail.bf1.yahoo.com>
References: <1410789552.20480.YahooMailNeo@web142305.mail.bf1.yahoo.com> <1410821777.72786.YahooMailNeo@web142305.mail.bf1.yahoo.com>
Date: Tue, 16 Sep 2014 01:35:38 -0700
From: Flavio Junqueira
Reply-To: Flavio Junqueira
Subject: Re: Getting errors in zookeeper logs
To: "user@zookeeper.apache.org"
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="1747921928-695415479-1410856538=:71906"

--1747921928-695415479-1410856538=:71906
Content-Type: text/plain; charset=us-ascii

What if you use 'zkServer.sh start-foreground' to debug?
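
On the 3+3 question quoted below, the majority arithmetic can be sketched
roughly as follows. This is only an illustration, assuming the default
majority quorum (no weights or hierarchical groups). The function names
quorum_size and survives_machine_loss are made up for this example and are
not part of ZooKeeper.

```python
from typing import Sequence


def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble (assumes default majority quorum)."""
    return ensemble_size // 2 + 1


def survives_machine_loss(servers_per_machine: Sequence[int], lost: int) -> bool:
    """True if the servers left after losing one machine still form a quorum."""
    total = sum(servers_per_machine)
    remaining = total - servers_per_machine[lost]
    return remaining >= quorum_size(total)


if __name__ == "__main__":
    # Hypothetical layouts: number of ZooKeeper servers hosted on each machine.
    for layout in ([3, 3], [3, 2]):
        total = sum(layout)
        for lost in range(len(layout)):
            ok = survives_machine_loss(layout, lost)
            print(f"layout={layout} quorum={quorum_size(total)} "
                  f"lose machine {lost}: quorum kept={ok}")
```

With 6 servers the quorum is 4, so a 3+3 layout loses quorum when either
machine goes down. With only two machines, no split survives the loss of the
machine holding the larger share, so tolerating a whole-machine failure would
need a third machine.
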
-Flavio


On Tuesday, September 16, 2014 5:20 AM, lalit jangra wrote:

>
>Hello Flavio,
>
>I am using the 'zkServer.sh start' command to start the zookeeper nodes. I can
>also see logs in the log folders I have specified, but these logs are in a
>form which is difficult to understand.
>
>Also, regarding using 6 zookeeper nodes (3+3): is it fine for handling
>failures as per the 50% rule, so that if 3 are down my cluster should still
>work, or should I move to an odd number such as 5 or 7 here?
>
>Regards.
>
>On Tue, Sep 16, 2014 at 4:26 AM, Flavio Junqueira <
>fpjunqueira@yahoo.com.invalid> wrote:
>
>> Instead of guessing, I think it is best if we understand what's going
>> wrong with the servers; you need to look at the server logs. If you don't
>> know how to get them, could you please share the command you're using to
>> start the servers?
>>
>> -Flavio
>>
>>
>> On Monday, September 15, 2014 3:30 PM, lalit jangra <
>> lalit.j.jangra@gmail.com> wrote:
>>
>> >
>> >Hello Flavio,
>> >
>> >Can this issue arise from the system not having enough RAM for the Java
>> >heap? I can see my system is running at the top of its RAM.
>> >
>> >Also, is there any way to assign memory to zookeeper nodes?
>> >
>> >Regards.
>> >
>> >On Mon, Sep 15, 2014 at 7:37 PM, lalit jangra
>> >wrote:
>> >
>> >> Thanks Flavio,
>> >>
>> >> I have 3+3 zookeeper nodes on two servers, MCF1 & MCF2. I also see the
>> >> same error on both nodes. As for the server logs, I am not able to read
>> >> anything from them. How can I read and interpret from the zookeeper
>> >> servers what is wrong?
>> >>
>> >> I have put different log & data directories for each zookeeper; maybe
>> >> I should elaborate a bit more. I am deciding on the names of the log &
>> >> data directories as per myid (ranging from 1 to 6).
>> >>
>> >> ZK1 -> Data.1 -> Logs.1
>> >> ZK2 -> Data.2 -> Logs.2
>> >> ZK3 -> Data.3 -> Logs.3
>> >> ZK4 -> Data.4 -> Logs.4
>> >> ZK5 -> Data.5 -> Logs.5
>> >> ZK6 -> Data.6 -> Logs.6
>> >>
>> >> As I have only two servers and need to make it run on these two, I
>> >> chose this architecture. I am also trying to keep the split even, so
>> >> that if one server is down I have only 3 zookeepers down and the second
>> >> server is still working. If I have an odd number, say 5 or 7, and the
>> >> server with the larger number of zookeepers goes down, the cluster is
>> >> gone.
>> >>
>> >> Regards.
>> >>
>> >>
>> >> On Mon, Sep 15, 2014 at 7:29 PM, Flavio Junqueira <
>> >> fpjunqueira@yahoo.com.invalid> wrote:
>> >>
>> >>> I believe you have shared just the client-side errors, and I was
>> >>> wondering what's going on with the servers. One problem I could spot
>> >>> with the configuration is with the values of dataDir and dataLogDir.
>> >>> It looks like the processes on the same node are writing to the same
>> >>> directory, which should be confusing the servers.
>> >>>
>> >>> A couple of things about your setting. I'm not sure what your
>> >>> motivation is to put multiple servers on the same node. It will induce
>> >>> correlated crashes for the servers on the same node. Also, we
>> >>> generally recommend using an odd number of servers (5 or 7 for your
>> >>> case).
>> >>>
>> >>> -Flavio
>> >>>
>> >>> On Wednesday, September 10, 2014 6:29 AM, lalit jangra <
>> >>> lalit.j.jangra@gmail.com> wrote:
>> >>>
>> >>> >
>> >>> >Hi,
>> >>> >
>> >>> >I am running a cluster of two Apache ManifoldCF nodes on two separate
>> >>> >machines, each of which has 3 zookeeper instances (6 instances in
>> >>> >total in the cluster). When I start up the ManifoldCF agents, I see
>> >>> >the warning below during startup.
>> >>> >
>> >>> >[http-bio-80-exec-2-SendThread(iwdc1preecma03.iwater.ie:2181)] INFO
>> >>> >org.apache.zookeeper.ClientCnxn - Unable to read additional data from
>> >>> >server sessionid 0x0, likely server has closed socket, closing socket
>> >>> >connection and attempting reconnect
>> >>> >
>> >>> >[http-bio-80-exec-2-SendThread(iwdc2preecma04.iwater.ie:2182)] INFO
>> >>> >org.apache.zookeeper.ClientCnxn - Opening socket connection to server
>> >>> >iwdc2preecma04.iwater.ie/10.231.72.25:2182. Will not attempt to
>> >>> >authenticate using SASL (unknown error)
>> >>> >
>> >>> >
>> >>> >I also see the error below in the logs while the agents are running.
>> >>> >
>> >>> >[localhost-startStop-1-SendThread(iwdc1preecma03.iwater.ie:2183)] WARN
>> >>> >org.apache.zookeeper.ClientCnxn - Session 0x6485a8006060079 for server
>> >>> >iwdc1preecma03.iwater.ie/10.231.72.24:2183, unexpected error, closing
>> >>> >socket connection and attempting reconnect
>> >>> >
>> >>> >java.io.IOException: Connection reset by peer
>> >>> >    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>> >>> >    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>> >>> >    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:225)
>> >>> >    at sun.nio.ch.IOUtil.read(IOUtil.java:193)
>> >>> >    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:375)
>> >>> >    at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
>> >>> >    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
>> >>> >    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
>> >>> >
>> >>> >
>> >>> >Below are the configurations for 1. zookeeper nodes & 2. MCF nodes for
>> >>> >zookeeper.
>> >>> >
>> >>> >*zoo.cfg : Same for all six zookeeper nodes.*
>> >>> >
>> >>> ># The number of milliseconds of each tick
>> >>> >tickTime=2000
>> >>> >dataDir=/app/IW/zookeeper/data/data.1
>> >>> >dataLogDir=/app/IW/zookeeper/logs/log.1
>> >>> >clientPort=2181
>> >>> >server.1=iwdc1preecma03:2888:3888
>> >>> >server.2=iwdc1preecma03:2889:3889
>> >>> >server.3=iwdc1preecma03:2890:3890
>> >>> >server.4=iwdc2preecma04:2891:3891
>> >>> >server.5=iwdc2preecma04:2892:3892
>> >>> >server.6=iwdc2preecma04:2893:3893
>> >>> ># The number of ticks that the initial
>> >>> ># synchronization phase can take
>> >>> >initLimit=10
>> >>> ># The number of ticks that can pass between
>> >>> ># sending a request and getting an acknowledgement
>> >>> >syncLimit=5
>> >>> ># the directory where the snapshot is stored.
>> >>> ># do not use /tmp for storage, /tmp here is just
>> >>> ># example sakes.
>> >>> >#dataDir=/tmp/zookeeper
>> >>> ># the port at which the clients will connect
>> >>> >#clientPort=2181
>> >>> ># the maximum number of client connections.
>> >>> ># increase this if you need to handle more clients
>> >>> >#maxClientCnxns=60
>> >>> >#
>> >>> ># Be sure to read the maintenance section of the
>> >>> ># administrator guide before turning on autopurge.
>> >>> >#
>> >>> ># http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
>> >>> >#
>> >>> ># The number of snapshots to retain in dataDir
>> >>> >autopurge.snapRetainCount=3
>> >>> ># Purge task interval in hours
>> >>> ># Set to "0" to disable auto purge feature
>> >>> >autopurge.purgeInterval=1
>> >>> >
>> >>> >
>> >>> >*ManifoldCF configurations : same for both ManifoldCF nodes.*
>> >>> >
>> >>> >value="org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager"/>
>> >>> >
>> >>> >value="iwdc1preecma03:2181,iwdc1preecma03:2182,iwdc1preecma03:2183,iwdc2preecma04:2181,iwdc2preecma04:2182,iwdc2preecma04:2183"/>
>> >>> >
>> >>> >value="4000"/>
>> >>> >
>> >>> >
>> >>> >*I want to know whether, given the above warnings/errors, zookeeper
>> >>> >will stop working, or whether zookeeper will keep working and these
>> >>> >are non-failing messages, because ManifoldCF jobs are stuck while I
>> >>> >can see these errors.*
>> >>> >
>> >>> >Please suggest.
>> >>> >
>> >>> >Regards,
>> >>> >Lalit.
>> >>
>> >> --
>> >> Regards,
>> >> Lalit.
>> >
>> >--
>> >Regards,
>> >Lalit.
>
>--
>Regards,
>Lalit.

--1747921928-695415479-1410856538=:71906--