Date: Mon, 3 Jun 2013 08:18:43 +0300
Subject: Re: Weird Replication exception
From: Asaf Mesika
To: "user@hbase.apache.org"

No, this was brand new with 0 length thus the peculiar message of too old was strange to me.

On Monday, June 3, 2013, Himanshu Vashishtha wrote:

> Hey Asaf,
>
> It looks like you only need 7122. Either upgrade, or you could also patch
> it up.
>
> Syncing up the master and slave cluster is also advised, but that stands
> good in case you are using master-master replication.
>
> bq. 172.25.98.74,60020,1369903540894/172.25.98.74%2C60020%2C1369903540894.1369925171871
>
> bq. Meaning I lost data
>
> Did the log whose znode you deleted had any data?
> You can do a cat to see if there is any data on it. You could copy-table
> for that time range, (or a hacky way is to re-create a znode for that log
> under a regionserver noticing the format of its current log znodes, and let
> the replicationSource pick it up in its normal run).
>
> Thanks,
> Himanshu
>
> On Sun, Jun 2, 2013 at 12:38 PM, Ted Yu wrote:
>
> > bq. Is 0.94.8 production ready?
> >
> > I think so. Lars released 0.94.8 Friday evening.
> >
> > On Sun, Jun 2, 2013 at 12:26 PM, Asaf Mesika wrote:
> >
> > > I use 0.94.7.
> > > Is 0.94.8 production ready?
> > >
> > > So in summary I have two issues:
> > > 1. Clocks are out of sync
> > > 2. I need to upgrade to 0.94.8 to avoid seeing this WARN messages?
> > >
> > > On Jun 2, 2013, at 5:46 PM, Ted Yu wrote:
> > >
> > > > What is the HBase version you're using ?
> > > >
> > > > In another thread, I mentioned this:
> > > >
> > > > There was a recently integrated JIRA (0.94.8):
> > > > HBASE-7122 Proper warning message when opening a log file with no entries
> > > > (idle cluster)
> > > >
> > > > Does the HBase you're using contain HBASE-7122 ?
> > > >
> > > > Cheers
> > > >
> > > > On Sat, Jun 1, 2013 at 10:20 PM, Asaf Mesika wrote:
> > > >
> > > >> Hi,
> > > >>
> > > >> I have a weird error in a cluster I'm checking Replication with.
> > > >>
> > > >> I have two clusters set up, each on its own DC (different continents). Each
> > > >> has 1 master, and 3 RS.
> > > >>
> > > >> I've done all required setup, started replication and pushed in some data
> > > >> into the master. I had an issue where the slave (peer) cluster went dead
> > > >> (all RS failed contacting the master), thus replication couldn't work. This
> > > >> happened right before the weekend, so it was out for 3 days.
> > > >>
> > > >> Now I'm back in the office - got slave cluster back up (just the RS), and I
> > > >> got some nasty exception in one of the RS of the master cluster:
> > > >>
> > > >> 2013-06-02 04:40:45,903 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Getting 0 rs from peer cluster # c
> > > >> 2013-06-02 04:40:45,903 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Slave cluster looks down: c has 0 region servers
> > > >> 2013-06-02 04:40:46,903 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Since we are unable to replicate, sleeping 1000 times 10
> > > >> 2013-06-02 04:40:57,019 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Getting 0 rs from peer cluster # c
> > > >> 2013-06-02 04:40:57,019 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Slave cluster looks down: c has 0 region servers
> > > >> 2013-06-02 04:40:58,019 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Since we are unable to replicate, sleeping 1000 times 10
> > > >> 2013-06-02 04:41:08,134 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Getting 1 rs from peer cluster # c
> > > >> 2013-06-02 04:41:08,134 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Choosing peer a72-246-95-86,60020,1370147274693
> > > >> 2013-06-02 04:41:08,672 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicating 1
> > > >> 2013-06-02 04:41:08,971 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceM