Date: Tue, 17 Apr 2012 16:38:33 -0400
Subject: Re: regions stuck in transition
From: Alex Baranau
To: user@hbase.apache.org

I've seen similar behavior on our cluster too.

Off the top of my head, you can try restarting the particular RegionServer
those regions belong to (in the cases I saw, a single RegionServer was
usually the issue).

Have you tried to access data from that region (e.g. in the shell)? I think
it should still be served.

Alex Baranau
------
Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase

On Mon, Apr 16, 2012 at 11:21 AM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:

> Hello,
>
> We've recently had a problem where regions will get stuck in transition for
> a long period of time. In fact, they don't ever appear to get
> out-of-transition unless we take manual action. Last time this happened I
> restarted the master and they were cleared out. This time I wanted to
> consult the list first.
>
> I checked the admin ui for all 24 of our servers, and the region does not
> appear to be hosted anywhere. If I look in hdfs, I do see the region there
> and it has 2 files.
> The first instance of this region in my HMaster logs is:
>
> > 2/04/15 17:48:06 INFO master.HMaster: balance
> > hri=visitor-activities-a2,\x00\x02EG120909,1333750824238.703fed4411f2d6ff4b3ea80506fb635e.,
> > src=XXXXXXXXX.ec2.internal,60020,1334064456919,
> > dest=XXXXXXXX.ec2.internal,60020,1334064197946
> > 12/04/15 17:48:06 INFO master.AssignmentManager: Server
> > serverName=XXXXXXXX.ec2.internal,60020,1334064456919, load=(requests=0,
> > regions=0, usedHeap=0, maxHeap=0) returned
> > org.apache.hadoop.hbase.NotServingRegionException:
> > org.apache.hadoop.hbase.NotServingRegionException: Received close for
> > visitor-activities-a2,\x00\x02EG120909,1333750824238.703fed4411f2d6ff4b3ea80506fb635e.
> > but we are not serving it for 703fed4411f2d6ff4b3ea80506fb635e
>
> It then keeps saying the same few logs every ~30 mins:
>
> > 12/04/15 18:18:18 INFO master.AssignmentManager: Regions in transition
> > timed out:
> > visitor-activities-a2,\x00\x02EG120909,1333750824238.703fed4411f2d6ff4b3ea80506fb635e.
> > state=PENDING_CLOSE, ts=1334526491544, server=null
> > 12/04/15 18:18:18 INFO master.AssignmentManager: Region has been
> > PENDING_CLOSE for too long, running forced unassign again on
> > region=visitor-activities-a2,\x00\x02EG120909,1333750824238.703fed4411f2d6ff4b3ea80506fb635e.
> > 12/04/15 18:18:18 INFO master.AssignmentManager: Server
> > serverName=XXXXXXXXX.ec2.internal,60020,1334064456919, load=(requests=0,
> > regions=0, usedHeap=0, maxHeap=0) returned
> > org.apache.hadoop.hbase.NotServingRegionException:
> > org.apache.hadoop.hbase.NotServingRegionException: Received close for
> > visitor-activities-a2,\x00\x02EG120909,1333750824238.703fed4411f2d6ff4b3ea80506fb635e.
> > but we are not serving it for 703fed4411f2d6ff4b3ea80506fb635e
>
> Any ideas how I can avoid this, or a better solution than restarting the
> HMaster?
>
> Thanks,
>
> Bryan