Date: Mon, 13 Dec 2010 11:28:32 -0800
Subject: Re: HBase Replication problems
From: Jean-Daniel Cryans
To: user@hbase.apache.org

Hi Nathaniel,

Thanks for trying out replication, let's make it work for you.

So on the master side there are 2 lines that are important to make sure that replication works. First it has to say:

Replicating x

Where x is the number of edits it's going to ship, and then:

Replicated in total: y

Where y is the total number it replicated. Seeing the second line means that replication was successful, at least from the master's point of view.

On the slave, one node should have:

Total replicated: z

And that z is the number of edits that that region server applied on its cluster. It could be on any region server, since the sink for replication is chosen at random.

Do you see those? Any exceptions around those logs apart from EOFs?

Thx,

J-D

On Mon, Dec 13, 2010 at 10:52 AM, Nathaniel Cook wrote:
> Hi,
>
> I am trying to set up replication for my HBase clusters. I have two
> small clusters for testing, each with 4 machines. The setup for the two
> clusters is identical. Each machine runs a DataNode and an
> HRegionServer. Three of the machines run a ZK peer and one machine
> runs the HMaster and NameNode.
> The master cluster machines have
> hostnames (ds1, ds2 ...) and the slave cluster machines are (bk1, bk2 ...). I set
> the replication scope to 1 for my test table column families and set
> the hbase.replication property to true for both clusters. Next I ran
> the add_peer.rb script with the following command on the ds1 machine:
>
> hbase org.jruby.Main /usr/lib/hbase/bin/replication/add_peer.rb
> ds1:2181:/hbase bk1:2181:/hbase
>
> After the script finished, ZK for the master cluster had the
> replication znode with children peers, master, and state. The slave
> ZK didn't have a replication znode. I fixed that problem by rerunning
> the script on the bk1 machine with the code that writes to
> the master ZK commented out. Now the slave ZK has the /hbase/replication/master
> znode with data (ds1:2181:/hbase). Everything looked to be configured
> correctly. I restarted the clusters. The logs of the master
> regionservers stated:
>
> This cluster (ds1:2181:/hbase) is a master for replication, compared
> with (ds1:2181:/hbase)
>
> The logs on the slave cluster stated:
>
> This cluster (bk1:2181:/hbase) is a slave for replication, compared
> with (ds1:2181:/hbase)
>
> Using the hbase shell I put a row into the test table.
>
> The regionserver for that table had a log statement like:
>
> Going to report log #192.168.1.166%3A60020.1291757445179 for position
> 15828 in hdfs://ds1:9000/hbase/.logs/ds1.internal,60020,1291757445059/192.168.1.166%3A60020.1291757445179
>
> (192.168.1.166 is ds1)
>
> I waited, and even after several minutes the row still did not appear in
> the slave cluster table.
>
> Any help with what the problem might be is greatly appreciated.
>
> Both clusters are using CDH3b3. The HBase version is exactly
> 0.89.20100924+28.
>
> -Nathaniel Cook
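
The three log lines described in the reply ("Replicating x", "Replicated in total: y", "Total replicated: z") can be searched for mechanically. Below is a minimal sketch in Python, assuming the region server logs are plain text files and that the phrases appear exactly as quoted in the reply. The function name scan_log and the marker labels are made up for illustration, and the real log line layout (timestamps, class names) may differ from what this assumes.

```python
import re
import sys
from collections import defaultdict

# Marker phrases as quoted in the reply. Everything else on the log line
# (timestamp, level, class name) is ignored; that layout is an assumption.
MARKERS = {
    "master_shipping": re.compile(r"Replicating (\d+)"),
    "master_total": re.compile(r"Replicated in total: (\d+)"),
    "slave_applied": re.compile(r"Total replicated: (\d+)"),
}


def scan_log(lines):
    """Return {marker_label: [edit counts]} for each marker seen in lines."""
    hits = defaultdict(list)
    for line in lines:
        for label, pattern in MARKERS.items():
            match = pattern.search(line)
            if match:
                hits[label].append(int(match.group(1)))
    return dict(hits)


if __name__ == "__main__":
    # Usage: python scan_replication_logs.py <regionserver log file> ...
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            print(path, scan_log(handle))
```

Run against the master cluster region server logs, an empty result for master_total would suggest the edits were never shipped; run against the slave logs, a missing slave_applied entry on every region server would suggest no sink applied them.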