Date: Wed, 18 Jul 2012 08:53:13 -0700
From: Jean-Daniel Cryans
To: dev@hbase.apache.org
Subject: Wondering what hbck should do in this situation

Hey devs,

I encountered an "interesting" situation with hbck in 0.94. We had a region that was on HDFS but not in .META., and hbck decided to put it back:

ERROR: Region { meta => null, hdfs => hdfs://sfor3s24:10101/hbase/url_stumble_summary/159952764, deployed => } on HDFS, but not listed in META or deployed on any region server
12/07/17 23:46:03 INFO util.HBaseFsck: Patching .META. with .regioninfo: {NAME => 'url_stumble_summary,25467315:2009-12-28,1271922074820', STARTKEY => '25467315:2009-12-28', ENDKEY => '25821137:2010-03-08', ENCODED => 159952764,}

Then, when it tried to assign the region, the region got bounced between region servers:

Trying to reassign region...
12/07/17 23:46:04 INFO util.HBaseFsckRepair: Region still in transition, waiting for it to become assigned: {NAME => 'url_stumble_summary,25467315:2009-12-28,1271922074820', STARTKEY => '25467315:2009-12-28', ENDKEY => '25821137:2010-03-08', ENCODED => 159952764,}
12/07/17 23:46:05 INFO util.HBaseFsckRepair: Region still in transition, waiting for it to become assigned: {NAME => 'url_stumble_summary,25467315:2009-12-28,1271922074820', STARTKEY => '25467315:2009-12-28', ENDKEY => '25821137:2010-03-08', ENCODED => 159952764,}
etc.

It turns out that this region only contained references (as in post-split references) to a region that no longer existed, so opening the region failed when it tried to open those referenced files:

2012-07-18 00:00:27,454 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed open of region=url_stumble_summary,25467315:2009-12-28,1271922074820.159952764, starting to roll back the global memstore size.
java.io.IOException: java.io.IOException: java.io.FileNotFoundException: File does not exist: /hbase/url_stumble_summary/208247386/default/2354161894779228084
	at org.apache.hadoop.hbase.regionserver.HRegion.initializeRegionInternals(HRegion.java:550)
	at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:463)
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3729)
	at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3677)
	...
Caused by: java.io.IOException: java.io.FileNotFoundException: File does not exist: /hbase/url_stumble_summary/208247386/default/2354161894779228084
	at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:405)
	at org.apache.hadoop.hbase.regionserver.Store.<init>(Store.java:258)
	at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:2918)
	...
Caused by: java.io.FileNotFoundException: File does not exist: /hbase/url_stumble_summary/208247386/default/2354161894779228084
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.openInfo(DFSClient.java:1822)
	at org.apache.hadoop.hdfs.DFSClient$DFSInputStream.<init>(DFSClient.java:1813)
	at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:544)
	at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:187)
	at org.apache.hadoop.fs.FilterFileSystem.open(FilterFileSystem.java:102)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:456)
	at org.apache.hadoop.hbase.io.hfile.HFile.createReaderWithEncoding(HFile.java:547)
	at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.<init>(StoreFile.java:1252)
	at org.apache.hadoop.hbase.io.HalfStoreFileReader.<init>(HalfStoreFileReader.java:66)
	...

At first I was confused about why it was looking for another region (208247386), until I saw the HalfStoreFileReader :)

So this is a case where hbck made the cluster worse, because the only way to get rid of this region is to force-unassign it, delete it from .META., and then possibly also delete it from HDFS.

I'm wondering how this could be done better. Should we do more checks before including that sort of region? For example, at least make sure we can open it? And then what? Just report it?

Thx for reading this far,

J-D
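One cheap check along the lines asked about above could run before hbck patches .META.: walk the orphan region's store files, resolve every post-split reference to the parent file it points at, and only report (rather than patch) when a target is missing. The Java sketch below is hypothetical: ReferenceCheck, referredToPath and missingReferences are illustrative names, not hbck code. It assumes the 0.94 layout <root>/<table>/<encoded region>/<family>/<file> and that a reference file is named <hfile>.<parent encoded region name>, which is consistent with the path in the stack trace above. A Set of strings stands in for an HDFS existence check such as Hadoop's FileSystem.exists().

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical hbck pre-check (not actual HBase code): before patching .META.
 * with a region found only on HDFS, verify that every post-split reference
 * file in the region still points at a parent file that exists.
 *
 * Assumptions: layout <root>/<table>/<encodedRegion>/<family>/<file>, and
 * reference files named "<hfile>.<parentEncodedRegionName>".
 */
public class ReferenceCheck {
  // Group 1: hfile name; group 2 (optional): parent region's encoded name.
  private static final Pattern REF_NAME =
      Pattern.compile("^([0-9a-f]+)(?:\\.(.+))?$");

  /** Returns the parent file a reference points at, or null for a plain hfile. */
  static String referredToPath(String storeFilePath) {
    String[] parts = storeFilePath.split("/");
    int n = parts.length;
    if (n < 4) {
      return null; // too short to be <table>/<region>/<family>/<file>
    }
    Matcher m = REF_NAME.matcher(parts[n - 1]);
    if (!m.matches() || m.group(2) == null) {
      return null; // not a reference file under the assumed naming scheme
    }
    String family = parts[n - 2];
    // Everything up to and including the table directory.
    String tableDir = String.join("/", Arrays.copyOfRange(parts, 0, n - 3));
    return tableDir + "/" + m.group(2) + "/" + family + "/" + m.group(1);
  }

  /** Lists referenced parent files absent from `existing` (a stand-in for HDFS). */
  static List<String> missingReferences(List<String> storeFiles, Set<String> existing) {
    List<String> missing = new ArrayList<>();
    for (String file : storeFiles) {
      String target = referredToPath(file);
      if (target != null && !existing.contains(target)) {
        missing.add(target);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    // The situation from this thread: the only store file is a reference
    // into region 208247386, which no longer exists.
    List<String> storeFiles = List.of(
        "/hbase/url_stumble_summary/159952764/default/2354161894779228084.208247386");
    Set<String> existing = Set.of();
    List<String> missing = missingReferences(storeFiles, existing);
    if (missing.isEmpty()) {
      System.out.println("References look fine, OK to patch .META.");
    } else {
      System.out.println("Not patching .META., dangling references: " + missing);
    }
  }
}
```

This only catches the dangling-reference case from this report. A broader "make sure we can open it" check would need an actual trial open of the region, which this sketch does not attempt.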