Date: Fri, 13 May 2011 21:28:17 -0700
Subject: Re: region goes missing on rs (may be during reassignment)
From: Stack
To: user@hbase.apache.org

On Fri, May 13, 2011 at 3:01 PM, Raghu Angadi wrote:

> Thanks Stack. greatly appreciate the help.
>

No problem.

> hbase.regionserver.handler.count is set to 30.
> we have not set hbase.master.assignment.timeoutmonitor.timeout. will surely
> increase to 180 seconds as HBASE-3846 does.
>

You might want to go to 0.90.3 altogether.  Is that a pain for you?

> The load on the cluster is low to moderate and HBase holds up pretty well.
> Most of the load consists of hourly random writes to the table and
> sequential scans from MR jobs.
>

Thanks boss.

> I will send another email with locations to full master logs.
> There are many "Regions in transition timed out" messages for this region
> and many others spread over time.
>

Grand.  I can come over any time or you should drop by our place.
It's just a few blocks away and we can munch on lunch while we dig in
your logs.

St.Ack

> Raghu.
>
> On Fri, May 13, 2011 at 11:33 AM, Stack wrote:
>
>> I see that we are timing out region assignment then assigning
>> elsewhere, but the region opened anyway on first server (What do you
>> have hbase.regionserver.handler.count set to?  The default is 10 which
>> could mean a bunch of requests hanging out in the rpc queue before
>> getting into the server to be processed).  One thing you could do is
>> up your region in transition timeout.  Default is 30 seconds which if
>> there is a bunch of churn may not be enough time for region assignment
>> to complete -- was there churn at this time? (We up the default
>> timeout in 0.90.3, see 'HBASE-3846  Set RIT timeout higher').
>>
>> See below for more.
>>
>> On Fri, May 13, 2011 at 8:19 AM, Raghu Angadi wrote:
>> ...
>>
>> > 2011-05-12 12:05:20,987 DEBUG
>> > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened
>> > users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
>>
>> The region opened successfully.
>>
>> But looking at the master log, 12 seconds earlier it says:
>>
>> > 2011-05-12 12:05:08,122 INFO
>> > org.apache.hadoop.hbase.master.AssignmentManager: Regions in transition
>> > timed out:  users,61364002,1297594642368.a0bf035ac417cdd0697464f1c48f387f.
>> > state=OPENING, ts=1305201871850
>> > 2011-05-12 12:05:08,122 INFO
>> > org.apache.hadoop.hbase.master.AssignmentManager: Region has been OPENING
>> > for too long, reassigning
>>
>> .... and then forces it reassigned elsewhere (Your log from master
>> stops at this point.  I'd be interested in seeing more.  Send it to me
>> offline?).
>>
>> Thanks Raghu,
>> St.Ack
>>
>
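
The timing in the quoted log lines can be checked with a little arithmetic. The sketch below is not part of the original thread. It uses only the three timestamps quoted above, and the variable names are made up for illustration. The gap between the two log lines does not depend on time zone. The comparison against `ts` assumes the log timestamps are in UTC and that `ts` is milliseconds since the epoch; the thread confirms neither, although the values line up closely under that assumption.

```python
from datetime import datetime, timedelta

# Log4j-style timestamp as it appears in the quoted log lines.
LOG_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def parse_log_time(stamp: str) -> datetime:
    """Parse a timestamp such as '2011-05-12 12:05:08,122' (naive datetime)."""
    return datetime.strptime(stamp, LOG_FORMAT)


# Master: "Regions in transition timed out" (from the quoted master log).
master_timeout = parse_log_time("2011-05-12 12:05:08,122")
# Region server: "Opened users,61364002,..." (from the quoted RS log).
rs_opened = parse_log_time("2011-05-12 12:05:20,987")

# Time-zone independent: how long after the timeout the open completed.
gap_seconds = (rs_opened - master_timeout).total_seconds()

# ASSUMPTION: ts=1305201871850 is epoch milliseconds and the logs are in UTC.
opening_started = datetime(1970, 1, 1) + timedelta(milliseconds=1305201871850)

# How long the region had been OPENING when the master gave up on it.
waited_seconds = (master_timeout - opening_started).total_seconds()
# How long the open took end to end on the first region server.
total_open_seconds = (rs_opened - opening_started).total_seconds()

print(f"open finished {gap_seconds:.3f}s after the master timed out")
print(f"master waited {waited_seconds:.3f}s before reassigning")
print(f"open took {total_open_seconds:.3f}s in total")
```

Under those assumptions the master gave up after about 36 seconds and the open finished at about 49 seconds, so the 180 second timeout mentioned in the thread would have covered this particular open.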