From: Jay Wilson
Date: Mon, 02 Jul 2012 15:55:21 -0700
To: user@hbase.apache.org
Subject: Re: HBASE -- Regionserver and QuorumPeer ?

First, thank you.

I moved my HRegionServers, not my HQuorumPeers.

I have checked the network and every node can reach every other node. I can
even talk to my HQuorumPeers via "nc" from the nodes that should be running
my HMaster and my HRegionServers.

[hadoop@devrackA-00 ~]$ zookeeper-check
devrackA-03
imok
This ZooKeeper instance is not currently serving requests
This ZooKeeper instance is not currently serving requests
devrackA-04
imok
Zookeeper version: 3.3.5-cdh3u4--1, built on 05/07/2012 20:10 GMT
Clients:
 /172.18.0.1:41582[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 5
Sent: 4
Outstanding: 0
Zxid: 0x0
Mode: follower
Node count: 4
 /172.18.0.1:41583[0](queued=0,recved=1,sent=0)

devrackA-05
imok
Zookeeper version: 3.3.5-cdh3u4--1, built on 05/07/2012 20:10 GMT
Clients:
 /172.18.0.1:35517[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 5
Sent: 4
Outstanding: 0
Zxid: 0x0
Mode: follower
Node count: 4
 /172.18.0.1:35518[0](queued=0,recved=1,sent=0)

~~~~~~~~~~~~~~~~~~~~
[hadoop@devrackA-06 ~]$ jps
21276 Jps
20641 DataNode
[hadoop@devrackA-06 ~]$ echo ruok | nc devrackA-04 2181
imok[hadoop@devrackA-06 ~]$ echo stat | nc devrackA-04 2181
Zookeeper version: 3.3.5-cdh3u4--1, built on 05/07/2012 20:10 GMT
Clients:
 /172.18.0.7:37950[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 8
Sent: 7
Outstanding: 0
Zxid: 0x0
Mode: follower
Node count: 4

~~~~~~~~~~~~~~~~~~~
[hadoop@devrackB-07 ~]$ echo ruok | nc devrackA-04 2181
imok[hadoop@devrackB-07 ~]$ echo stat | nc devrackA-03 2181
This ZooKeeper instance is not currently serving requests
[hadoop@devrackB-07 ~]$ echo stat | nc devrackA-05 2181
Zookeeper version: 3.3.5-cdh3u4--1, built on 05/07/2012 20:10 GMT
Clients:
 /172.18.0.72:40784[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 7
Sent: 6
Outstanding: 0
Zxid: 0x0
Mode: follower
Node count: 4
[hadoop@devrackB-07 ~]$ echo stat | nc devrackA-04 2181
Zookeeper version: 3.3.5-cdh3u4--1, built on 05/07/2012 20:10 GMT
Clients:
 /172.18.0.72:60795[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/0
Received: 10
Sent: 9
Outstanding: 0
Zxid: 0x0
Mode: follower
Node count: 4
[hadoop@devrackB-07 ~]$

~~~~~~~~~~~

I know the error says "Connection refused", but are there files associated
with an HRegionServer that I need to clean up? I did NOT move the HMaster or
the HQuorumPeers. I only moved the HRegionServers.

Thank you for the help.
---
Jay Wilson

On 7/2/2012 2:43 PM, Suraj Varma wrote:
> The error you are getting is:
>
>> 2012-07-02 12:39:02,205 INFO org.apache.zookeeper.ClientCnxn: Opening
>> socket connection to server devrackA-05/172.18.0.6:2181
>> 2012-07-02 12:39:02,211 WARN org.apache.zookeeper.ClientCnxn: Session
>> 0x0 for server null, unexpected error, closing socket connection and
>> attempting reconnect
>> java.net.ConnectException: Connection refused
>
>
> This means this server is not able to reach the zookeeper. Did you
> change your hbase-site.xml as well with the new zookeeper quorum?
> Do basic connectivity testing to ensure that your hosts / DNS is all
> in place after your relocations - checkout
> http://hbase.apache.org/book.html#d1952e311 and see if the dns checker
> tool might help.
> --S
>
>
>
> On Mon, Jul 2, 2012 at 1:12 PM, Jay Wilson wrote:
>> First, Yep I am a newbie to Hadoop/Hbase. I have read both of the
>> O'Reilly books (Hadoop and Hbase), so my knowledge level at this point
>> is pure book learning and understanding the log messages is very vexing.
>>
>> Second, based on the recommendations of this mail-list I decided to move
>> my HRegionservers to nodes other than where my HQuorumpeers are.
>> I updated my regionservers file on every node in the cluster. I ran
>> stop-hbase.sh, stop-all.sh, and cleaned up my zookeeper files. Then I
>> ran start-all.sh, waited, and then ran start-hbase.sh. Now my HMaster
>> and HRegionservers terminate within seconds. Before I had them at least
>> running for 30 minutes. The message is:
>>
>> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:java.io.tmpdir=/tmp
>> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:java.compiler=
>> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:os.name=Linux
>> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:os.arch=amd64
>> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:os.version=2.6.18-194.el5
>> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:user.name=hadoop
>> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:user.home=/home/hadoop
>> 2012-07-02 12:39:02,193 INFO org.apache.zookeeper.ZooKeeper: Client
>> environment:user.dir=/home/hadoop/jscripts
>> 2012-07-02 12:39:02,194 INFO org.apache.zookeeper.ZooKeeper: Initiating
>> client connection,
>> connectString=devrackA-03:2181,devrackA-05:2181,devrackA-04:2181
>> sessionTimeout=180000 watcher=master:60000
>> 2012-07-02 12:39:02,205 INFO org.apache.zookeeper.ClientCnxn: Opening
>> socket connection to server devrackA-05/172.18.0.6:2181
>> 2012-07-02 12:39:02,211 WARN org.apache.zookeeper.ClientCnxn: Session
>> 0x0 for server null, unexpected error, closing socket connection and
>> attempting reconnect
>> java.net.ConnectException: Connection refused
>>
>> I tried the same sequence again (stop-hbase.sh, stop-all.sh, and cleaned
>> up zookeeper), but I get the same result (Connection refused). Is there
>> something else I need to do when I move a regionserver?
>>
>> My zookeeper working directory is /home/hbase/zookeeper. Would there be
>> other places that I need to clean up?
>>
>>
>>
>> Thank You
>> --
>> Jay
>>
>>
>>
>> On 7/2/2012 11:25 AM, Amandeep Khurana wrote:
>>> As someone who has been developing/running/using the software for a longer period of time than the person who is asking the question, you can best serve the poser by making them aware of the trade offs and why it's a good/bad idea to do things a certain way. At the end of the day, it's their choice to make based on their requirements and constraints.
>>>
>>> Having said that, it'll be really nice to stop this thread from becoming more about how to answer questions rather than answering the question itself.
>>>
>>> Bringing the thread back to track:
>>>
>>> Jay, you can certainly run zookeepers with the Datanodes and Region Server processes. The issue there (as highlighted by Andy earlier) is that you will likely load up the machine (primarily due to I/O) which will cause ZK some grief. It is generally recommended to collocate in the following groups:
>>>
>>> Datanode + Region Servers on the same physical nodes
>>> Zookeeper and HBase Master on the same physical nodes (make sure to give ZK a dedicated spindle)
>>> Namenode on an independent node
>>> Secondary Namenode on an independent node
>>>
>>> These are the general recommendations and different environments might warrant different decisions.
>>> For instance, if it's just a PoC or Dev cluster where you don't really want to fret about SLAs and want to keep costs low, it might even be okay to collocate the Namenode, Zookeeper and HBase master on the same physical host.
>>>
>>> Hope that helps
>>>
>>> -Amandeep
>>>
>>>
>>> On Monday, July 2, 2012 at 4:40 AM, Michael Segel wrote:
>>>
>>>> I am not finding fault with what Andy was saying. The problem is that we tend not to use stronger language when discussing these topics. And my point wasn't just on this topic but other posts where we say 'not a good idea' yet someone still pursues the idea until there's a chorus of saying not to do something. I'm not faulting the poster because he wasn't and isn't the only one who does this... We see it all the time where someone goes down the wrong path, and is looking for a quick solution, rather than following the recommendation.
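
The manual checks used in this thread ("echo ruok | nc host 2181" and "echo stat | nc host 2181") can be sketched as one small script that queries every quorum member and summarizes its state. This is only a sketch, assuming Python 3 on a node that can reach the quorum. The function names (four_letter_word, classify, check_quorum) are hypothetical and belong to neither HBase nor ZooKeeper, and this is not the zookeeper-check script run above, whose contents are not shown in the thread.

```python
import socket

# Text a ZooKeeper server returns to "stat" when it is up but not serving.
NOT_SERVING = "This ZooKeeper instance is not currently serving requests"


def four_letter_word(host, port, cmd, timeout=5.0):
    """Send a four-letter command (e.g. 'ruok', 'stat') and return the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


def classify(stat_reply):
    """Reduce a 'stat' reply to 'not serving', the reported mode, or 'unknown'."""
    if NOT_SERVING in stat_reply:
        return "not serving"
    for line in stat_reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return "unknown"


def check_quorum(hosts, port=2181):
    """Return a dict mapping each host to its summarized state."""
    results = {}
    for host in hosts:
        try:
            results[host] = classify(four_letter_word(host, port, "stat"))
        except OSError as exc:  # refused, timed out, or unresolvable host
            results[host] = "unreachable ({})".format(exc)
    return results
```

Calling check_quorum(["devrackA-03", "devrackA-04", "devrackA-05"]) returns one entry per host. A "Connection refused" like the one in the HMaster log would show up as "unreachable", which is a different state from a server that accepts the connection but answers that it is not serving requests. In a healthy quorum exactly one member reports the mode "leader".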