From: Kyle McGovern
To: user@hbase.apache.org
Subject: Re: hbase corruption - missing region files in HDFS
Date: Sun, 9 Dec 2012 21:09:02 -0600

We recently had a very similar issue on a couple of our clusters. A split had failed and left behind a file in the region telling it where the new split region was located. The destination region folder/file did not exist, so our region server would try endlessly to read a file that wasn't there. The end result was that the region server exhausted its open file descriptors because of the number of connections it was making. Our fix was to remove the bad "split file" and assign the region again.

15:38:21 # hdfs dfs -ls -R /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a
drwxr-xr-x - root hadoop 0 2012-12-07 13:21 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/.oldlogs
-rw-r--r-- 3 root hadoop 124 2012-12-07 13:21 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/.oldlogs/hlog.1354760917669
-rw-r--r-- 3 root hadoop 352 2012-12-07 13:27 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/.regioninfo
drwxr-xr-x - root hadoop 0 2012-12-07 13:27 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW
-rw-r--r-- 3 root hadoop 554522 2012-12-07 13:27 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/195cc6d2cc384b39bd5ad30e95385bd8
-rw-r--r-- 3 root hadoop 4558378 2012-12-07 13:27 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/1c42fa9bc26a4550a439f4bd31bb08b0
-rw-r--r-- 3 root hadoop 3498028 2012-12-07 13:27 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/28a356081046422b8c057bc20c0ae658
-rw-r--r-- 3 root hadoop 1948108 2012-12-07 13:27 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/3353dc2d99184fe4b9d73f39503dfbc7
-rw-r--r-- 3 root hadoop 4390731 2012-12-07 12:01 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/4ce59f31c1b74db5804953fa7967f791
-rw-r--r-- 3 root hadoop 3116921421 2012-12-07 12:22 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/5313858989b24752ae31322333de02e0
-rw-r--r-- 3 root hadoop 5395692 2012-12-07 12:22 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/54c11a7e4f9d4ebfafaf2b93d3c9e954
-rw-r--r-- 3 root hadoop 5981971640 2012-12-07 13:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/5d965eba35df44d2851a8186fe6e8cc8
-rw-r--r-- 3 root hadoop 23 2012-12-07 13:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/5d965eba35df44d2851a8186fe6e8cc8.7d4f7401d2fe7a813778248970b03515
-rw-r--r-- 3 root hadoop 2251800 2012-12-07 13:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/673b36462014480cb7d91088412b85a7
-rw-r--r-- 3 root hadoop 408794 2012-12-07 13:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/73261dd86f634f2086ec745642425d7c
-rw-r--r-- 3 root hadoop 2676245 2012-12-07 13:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/769728d25b5b4e78be6b36f9716a82c4
-rw-r--r-- 3 root hadoop 1262744 2012-12-07 13:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/81f414cb3fe449f6a80310dd38ea467f
-rw-r--r-- 3 root hadoop 940502 2012-12-07 13:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/8f818b3c45344ad68c0b4afc7fe20bbb
-rw-r--r-- 3 root hadoop 3492843 2012-12-07 13:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/ae7cb412e5da4a908b0f2ea4d5cd5c76
-rw-r--r-- 3 root hadoop 2894474 2012-12-07 12:01 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/b6ee14a0a75341d0aa58187fb6159a41
-rw-r--r-- 3 root hadoop 14257782 2012-12-07 12:01 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/bd4fff3291d647eb9cc533d66f9685a3
-rw-r--r-- 3 root hadoop 4880699 2012-12-07 12:01 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/c4d3f1c8511743579588162616beeea1
-rw-r--r-- 3 root hadoop 35238595 2012-12-07 12:01 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/c69a406d54b1492ba52cd296de8320a1
-rw-r--r-- 3 root hadoop 23 2012-12-07 12:01 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/c69a406d54b1492ba52cd296de8320a1.7d4f7401d2fe7a813778248970b03515
-rw-r--r-- 3 root hadoop 3181138002 2012-12-07 12:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/cad9f4cc0ef54a7896a3a47253250e71
-rw-r--r-- 3 root hadoop 1747856 2012-12-07 12:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/cca2ad1698984a73abd9c58c78945be0
-rw-r--r-- 3 root hadoop 6264897732 2012-12-07 13:21 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/d876f1f4734e4778b2efa527ef1ef3ee
-rw-r--r-- 3 root hadoop 463704 2012-12-07 13:21 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/f2efc4a6ec054a62a44f664cc0b01c0a
-rw-r--r-- 3 root hadoop 686868 2012-12-07 13:21 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/f34384ae8c1d4e16afb79cb41bf6cf74
-rw-r--r-- 3 root hadoop 838234 2012-12-07 13:21 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/fc1dc425cf324beaa283ef82fdc073e3

For example, if I remove

/hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/c69a406d54b1492ba52cd296de8320a1.7d4f7401d2fe7a813778248970b03515

and

/hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/5d965eba35df44d2851a8186fe6e8cc8.7d4f7401d2fe7a813778248970b03515

the region assigns successfully and hbck no longer shows errors for this region. The contents of each file appear to be just a split key.
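
In case it helps anyone hunting for the same leftovers, here is a rough sketch of how the listing could be scanned for these split files. It assumes only what the listing in this message shows: the leftover files are named `<store file>.<region>`, with both parts being 32 hex characters. The function name `find_split_references` and the `SAMPLE` text are made up for illustration and are not part of HBase or Hadoop. The script only reports candidates; it does not check whether the referenced region exists, and it deletes nothing.

```python
import re

# Assumption (taken from the listing in this message, not from HBase docs):
# a leftover split file is named "<store file>.<region>", both 32 hex chars.
REFERENCE_NAME = re.compile(r"^([0-9a-f]{32})\.([0-9a-f]{32})$")

# Three lines copied from the listing, used as sample input.
SAMPLE = """\
drwxr-xr-x - root hadoop 0 2012-12-07 13:27 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW
-rw-r--r-- 3 root hadoop 5981971640 2012-12-07 13:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/5d965eba35df44d2851a8186fe6e8cc8
-rw-r--r-- 3 root hadoop 23 2012-12-07 13:20 /hbase/mytable/3ff87b4b16037f2000f4f4fb1bae820a/RAW/5d965eba35df44d2851a8186fe6e8cc8.7d4f7401d2fe7a813778248970b03515
"""


def find_split_references(ls_output):
    """Return (path, store_file, referenced_region) for each candidate file.

    ls_output is the text printed by `hdfs dfs -ls -R <region dir>`.
    """
    found = []
    for line in ls_output.splitlines():
        fields = line.split()
        # Skip directories and anything that is not a full file entry
        # (permissions, replication, owner, group, size, date, time, path).
        if len(fields) < 8 or not fields[0].startswith("-"):
            continue
        path = fields[-1]
        match = REFERENCE_NAME.match(path.rsplit("/", 1)[-1])
        if match:
            found.append((path, match.group(1), match.group(2)))
    return found


if __name__ == "__main__":
    for path, store_file, region in find_split_references(SAMPLE):
        print(f"{path} -> store file {store_file}, region {region}")
```

To run it against a real region directory, pipe the saved output of the `hdfs dfs -ls -R` command into `find_split_references` instead of `SAMPLE`, then confirm by hand that the referenced region is really missing before removing anything.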