From: Jonathan Gray
To: hbase-user@hadoop.apache.org
Date: Mon, 17 Aug 2009 17:59:34 -0700
Subject: Re: NoServerForRegionException, TableNotFoundException and WrongRegionException
Message-ID: <4A89FCF6.2030707@streamy.com>

To reiterate what stack said, you need to upgrade. There are serious, known
bugs in 0.19.1.

Upgrade to the 0.19 branch or the 0.20 branch; instructions can be found on
this page:

http://hadoop.apache.org/hbase/version_control.html

For example,

svn co http://svn.apache.org/repos/asf/hadoop/hbase/branches/0.19/ hbase-0.19-branch
cd hbase-0.19-branch
ant jar

JG

Marc Limotte wrote:
> Regions seem to be reasonably dispersed... as of now... not sure if that was
> true before I reset hbase.hregion.max.filesize.
>
>
> Region Servers
> Address       Start Code      Load
> host1:60020   1250533702083   requests=0, regions=10, usedHeap=32, maxHeap=888
> host2:60020   1250533702094   requests=0, regions=12, usedHeap=32, maxHeap=888
> host3:60020   1250533702052   requests=0, regions=7, usedHeap=31, maxHeap=888
> host4:60020   1250533702078   requests=0, regions=11, usedHeap=32, maxHeap=888
> Total: servers: 4   requests=0, regions=40
>
> Marc
>
>
>
>> ---------- Forwarded message ----------
>> From: stack
>> To: hbase-user@hadoop.apache.org
>> Date: Mon, 17 Aug 2009 14:21:47 -0700
>> Subject: Re: NoServerForRegionException, TableNotFoundException and
>> WrongRegionException
>> Please update to the head of 0.19 trunk, or better update to 0.20 trunk --
>> especially if you are testing. Issues described below have been addressed.
>>
>> How many regions do you have in your table? Are all going to one
>> regionserver because you only have one region?
>>
>> Yours,
>> St.Ack
>>
>>
>> On Mon, Aug 17, 2009 at 12:19 PM, Marc Limotte wrote:
>>
>>> I'm seeing a nice variety of Exceptions from HBase and could use some
>>> pointers about what to do next.
>>>
>>> This is a new map/reduce program, updating about 550k rows with around a
>>> dozen columns on a very small cluster (only 4 nodes... as we're still
>>> testing and it doesn't have to support production yet).
>>> HBase Version 0.19.1.
>>>
>>> I ran the job and it seems to make some progress, and then dies after
>>> several hours, reporting "NoServerForRegionException: No server address
>>> listed in .META. for region TABLEX,,1250526695078". I retried it a few
>>> times with the same result. I also noticed that the load is not well
>>> balanced; all requests seemed to be going to one node. I adjusted
>>> hadoop-site.xml with the addition of these two entries:
>>>
>>> <property>
>>>   <name>hbase.hregion.max.filesize</name>
>>>   <value>33554432</value>
>>> </property>
>>> <property>
>>>   <name>hbase.client.retries.number</name>
>>>   <value>5</value>
>>> </property>
>>>
>>> And restarted hbase (and hadoop to be safe). Re-ran and got the same error
>>> in the M/R job.
>>>
>>> *I thought I'd try dropping the table, since it's a new table and I can
>>> recreate it. But that gives another exception:*
>>> hbase(main):002:0> disable 'TABLEX'
>>> NativeException: org.apache.hadoop.hbase.TableNotFoundException:
>>> org.apache.hadoop.hbase.TableNotFoundException: TABLEX
>>>     at org.apache.hadoop.hbase.master.TableOperation$ProcessTableOperation.call(TableOperation.java:129)
>>>     at org.apache.hadoop.hbase.master.TableOperation$ProcessTableOperation.call(TableOperation.java:70)
>>>     at org.apache.hadoop.hbase.master.RetryableMetaOperation.doWithRetries(RetryableMetaOperation.java:64)
>>>     at org.apache.hadoop.hbase.master.TableOperation.process(TableOperation.java:143)
>>>     at org.apache.hadoop.hbase.master.HMaster.disableTable(HMaster.java:691)
>>>     ...
>>>
>>>
>>> *And now I see this exception in the HBase logs:*
>>> org.apache.hadoop.hbase.regionserver.WrongRegionException:
>>> org.apache.hadoop.hbase.regionserver.WrongRegionException: Requested row out
>>> of range for HRegion .META.,,1250280235390, startKey='',
>>> getEndKey()='TABLEX,,1250219949252',
>>> row='TABLEX,840.56098.0544,1250526661861'
>>>     at org.apache.hadoop.hbase.regionserver.HRegion.checkRow(HRegion.java:1788)
>>>     at org.apache.hadoop.hbase.regionserver.HRegion.obtainRowLock(HRegion.java:1844)
>>>     at org.apache.hadoop.hbase.regionserver.HRegion.getLock(HRegion.java:1912)
>>>     at org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1244)
>>>     at org.apache.hadoop.hbase.regionserver.HRegion.batchUpdate(HRegion.java:1216)
>>>     ...
>>>
>>>
>>> *As a test, tried a "count"...*
>>> hbase(main):007:0* count 'TABLEX'
>>> NativeException: org.apache.hadoop.hbase.client.NoServerForRegionException:
>>> No server address listed in .META. for region TABLEX,,1250526695078
>>>     from org/apache/hadoop/hbase/client/HConnectionManager.java:548:in `locateRegionInMeta'
>>>     from org/apache/hadoop/hbase/client/HConnectionManager.java:478:in `locateRegion'
>>>     from org/apache/hadoop/hbase/client/HConnectionManager.java:440:in `locateRegion'
>>>     from org/apache/hadoop/hbase/client/HTable.java:114:in `'
>>>     from org/apache/hadoop/hbase/client/HTable.java:97:in `'
>>>     from sun/reflect/NativeConstructorAccessorImpl.java:-2:in `newInstance0'
>>>     ...
>>>
>>>
>>> *Also saw a thread somewhere that suggested doing a major compaction. Did
>>> that. It returns almost immediately. Not sure if that's normal or not...
>>> no perceivable impact from doing this, though.*
>>>
>>> hbase(main):013:0> major_compact '.META.'
>>> 0 row(s) in 0.0220 seconds
>>> hbase(main):014:0>
>>>
>>> Not sure what else to try? Is there a way to force removal of the table in
>>> question?
>>> Is there something else I should be looking at?
>>>
>>> Marc
>>>
>>
>
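
The WrongRegionException quoted above reports a row that falls outside the key range of the .META. region that received the update. Here is a minimal sketch of that kind of range check, using the three keys from the message. It assumes plain byte-wise comparison and a half-open range [startKey, endKey), with an empty end key meaning "no upper bound". The function name row_in_region is hypothetical, and HBase may compare catalog rows with its own comparator, so treat this as an illustration rather than the actual HRegion.checkRow logic.

```python
def row_in_region(row: bytes, start_key: bytes, end_key: bytes) -> bool:
    """Return True if row lies in the half-open range [start_key, end_key).

    Assumption: keys are compared byte by byte, and an empty end_key
    means the region has no upper bound.
    """
    after_start = row >= start_key  # an empty start key admits every row
    before_end = end_key == b"" or row < end_key
    return after_start and before_end


# Values copied from the WrongRegionException message in the thread.
start_key = b""
end_key = b"TABLEX,,1250219949252"
row = b"TABLEX,840.56098.0544,1250526661861"

# After the shared prefix "TABLEX," the end key continues with "," (0x2C)
# and the row continues with "8" (0x38), so the row sorts after the end key.
in_range = row_in_region(row, start_key, end_key)
print(in_range)  # prints False
```

Under these assumptions the row sorts after the region's end key, which matches the "Requested row out of range" text. The sketch only shows why the check fails for these values; it says nothing about how .META. got into that state.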