Subject: Re: Region gets stuck in transition state
From: Stack <saint.ack@gmail.com>
To: hbase-user@hadoop.apache.org
Date: Wed, 27 Jan 2010 14:51:48 -0800

On Wed, Jan 27, 2010 at 2:41 PM, James Baldassari wrote:
>
> First we shut down the master and all region servers and then manually
> removed the /hbase root through hadoop/HDFS. One of my colleagues
> increased some timeout values (I think they were ZooKeeper timeouts).

ticktime?
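For context on the timeout change James describes: the ZooKeeper-related knobs
are typically the HBase session timeout and, when HBase manages its own
ZooKeeper, the tick time (presumably why Stack asks about it), since ZooKeeper
only grants sessions between 2x and 20x its tickTime. A minimal sketch of the
hbase-site.xml entries involved, with purely illustrative values:

  <property>
    <name>zookeeper.session.timeout</name>
    <value>60000</value>
  </property>
  <property>
    <!-- only relevant when HBase manages ZooKeeper itself -->
    <name>hbase.zookeeper.property.tickTime</name>
    <value>3000</value>
  </property>

With the default tickTime of 2000 ms, sessions are capped at 40 seconds, so a
larger session timeout only takes effect if the tick time is raised alongside it.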
> Another change was that I recreated the table without LZO compression
> and without setting the IN_MEMORY flag. I learned that we did not have
> the LZO libraries installed, and the table had been created originally
> with compression set to LZO, so I imagine that would cause problems. I
> didn't see any errors about it in the logs, however. Maybe this
> explains why we lost data during our initial testing after shutting down
> HBase. Perhaps it was unable to write the data to HDFS because the LZO
> libraries were not available?

If LZO is enabled and the libs are not in place, no data is written, IIRC.
It's a problem.

> Anyway, everything seems to be ok for now. We can restart HBase without
> data loss or errors, and we can truncate the table without any problems.
> If any other issues crop up we plan on upgrading to 0.20.3, but our
> preference is to stay with the Cloudera distro if we can. We're doing
> additional testing tonight with a larger dataset, so I'll keep an eye on
> it and post back if we learn anything new.

Avoid truncating tables if you are not on 0.20.3. It's flaky and may put
you back in the spot you complained of originally.

St.Ack

> Thanks again for your help.
>
> -James
>
> On Wed, 2010-01-27 at 13:54 -0600, Stack wrote:
>> On Tue, Jan 26, 2010 at 9:03 PM, James Baldassari wrote:
>> >
>> > After running a map/reduce job which inserted around 180,000 rows into
>> > HBase, HBase appeared to be fine. We could do a count on our table, and
>> > no errors were reported. We then tried to truncate the table in
>> > preparation for another test but were unable to do so because the region
>> > became stuck in a transition state.
>>
>> Yes. In older HBase, truncate of small tables was flaky. It's better in
>> 0.20.3 (I wrote our brothers over at Cloudera about updating the version
>> they bundle, especially since 0.20.3 just went out).
>>
>> > I restarted each region server
>> > individually, but it did not fix the problem. I tried the
>> > disable_region and close_region commands from the hbase shell, but that
>> > didn't work either. After doing all of that, a status 'detailed' showed
>> > this:
>> >
>> > 1 regionsInTransition
>> >     name=retargeting,,1264546222144, unassigned=false, pendingOpen=false, open=false, closing=true, pendingClose=false, closed=false, offlined=false
>> >
>> > Then I restarted the master and all region servers, and it looked like this:
>> >
>> > 1 regionsInTransition
>> >     name=retargeting,,1264546222144, unassigned=false, pendingOpen=true, open=false, closing=false, pendingClose=false, closed=false, offlined=false
>>
>> Even after a master restart? The above is a dump of a master-internal
>> data structure that is kept in memory. Strange that it would pick up the
>> same exact state on restart (as Ryan says, a restart of the master alone
>> is usually a radical but sufficient fix).
>>
>> I was going to say that you could try onlining the individual region in
>> the shell, but I don't think that'll work either, not unless you update
>> to 0.20.3-era HBase.
>>
>> > I noticed messages in some of the region server logs indicating that
>> > their zookeeper sessions had expired. I'm not sure if this has anything
>> > to do with the problem.
>>
>> It could. The regionservers will restart if their session with ZooKeeper
>> expires. What's your HBase schema like? How are you doing your upload?
>>
>> > I should mention that this scenario is quite
>> > repeatable, and the last few times it has happened we had to shut down
>> > HBase and manually remove the /hbase root from HDFS, then start HBase
>> > and recreate the table.
>>
>> For sure you've upped the file descriptor and xceiver params as per the
>> Getting Started?
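For anyone hitting the same thing, the Getting Started settings Stack is
referring to are the open-file limit for the user running HDFS/HBase and the
datanode xceiver count. The values below are commonly used examples, not
prescribed numbers, and the "hadoop" user name is a placeholder:

  # /etc/security/limits.conf
  hadoop  -  nofile  32768

  <!-- hdfs-site.xml on each datanode; note the historical spelling -->
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>

The xceiver change needs a datanode restart, and the new ulimit only applies
to processes started after it is in place.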
>> > I was also wondering whether it was normal for there to be only one
>> > region with 180,000+ rows. Shouldn't this region be split into several
>> > regions and distributed among the region servers? I'm new to HBase, so
>> > maybe my understanding of how it's supposed to work is wrong.
>>
>> Get the region's size on the filesystem: ./bin/hadoop fs -dus
>> /hbase/table/regionname. A region splits when it is above a size
>> threshold, 256M usually.
>>
>> St.Ack
>>
>> > Thanks,
>> > James
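A footnote on the split threshold Stack mentions: the 256M figure is the
default maximum store file size, and it is configurable. A minimal sketch of
the size check and the setting, with the region name below as a placeholder:

  # how big is the region on HDFS?
  ./bin/hadoop fs -dus /hbase/retargeting/<region-name>

  <!-- hbase-site.xml; 268435456 bytes (256 MB) is the default -->
  <property>
    <name>hbase.hregion.max.filesize</name>
    <value>268435456</value>
  </property>

With only around 180,000 rows it is quite plausible the region simply has not
reached that size yet, which would explain why it has not split.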