From: Bharath Vissapragada
Date: Mon, 10 Mar 2014 09:00:59 +0530
Subject: Re: Distributed log splitting failing after cluster outage.
To: user@hbase.apache.org

Glad to know everything is up. We faced this issue too; I'm not really sure
what the exact cause is.

On Mon, Mar 10, 2014 at 4:12 AM, David Koch wrote:

> Actually, all the files were 0-sized, so in the end we deleted those
> files and HBase started up.
>
> On Sun, Mar 9, 2014 at 7:33 PM, Bharath Vissapragada wrote:
>
> > Check if there are any 0-sized WALs in /hbase/.logs, sideline them, and
> > restart. That could help. As Ted mentioned, the actual problematic log
> > names are in the RS logs that got the task assigned.
> >
> > On Fri, Mar 7, 2014 at 12:43 AM, David Koch wrote:
> >
> > > Hello,
> > >
> > > Our HBase cluster had an unexpected shutdown, and while trying to bring
> > > it back up the Master gets stuck with the following message:
> > >
> > > Failed splitting of [ list of ,, ]
> > > java.io.IOException: error or interrupted while splitting logs in [ list of ,, ]
> > > Task = installed = 10 done = 0 error = 10
> > > at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:282)
> > > at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:300)
> > > at org.apache.hadoop.hbase.master.MasterFileSystem.splitLogAfterStartup(MasterFileSystem.java:242)
> > > at org.apache.hadoop.hbase.master.HMaster.splitLogAfterStartup(HMaster.java:661)
> > > at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:580)
> > > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:396)
> > > at java.lang.Thread.run(Thread.java:724)
> > >
> > > What can I do to get the cluster operational again? There had been no
> > > data ingestion for several hours before the crash, so maybe clearing
> > > out /hbase/.logs/ could be an option.
> > >
> > > Thanks,
> > >
> > > /David
> >
> > --
> > Bharath Vissapragada

--
Bharath Vissapragada
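
For anyone who lands on this thread later: the suggestion above (look for
0-sized WALs under /hbase/.logs and sideline them before restarting) could be
scripted roughly as below. This is only a sketch under stated assumptions, not
an official HBase tool. It assumes the usual eight-column output of
"hdfs dfs -ls -R"; the sideline directory /hbase/.sidelined-wals, the function
names, and the sample listing are made up for illustration. It only prints the
move commands so they can be reviewed first.

```python
import subprocess
from typing import List

# Made-up sample of `hdfs dfs -ls -R /hbase/.logs` output, for illustration only.
SAMPLE = """\
drwxr-xr-x   - hbase hbase          0 2014-03-06 18:02 /hbase/.logs/rs1,60020,1394128000000
-rw-r--r--   3 hbase hbase          0 2014-03-06 18:02 /hbase/.logs/rs1,60020,1394128000000/rs1%2C60020%2C1394128000000.1394128100000
-rw-r--r--   3 hbase hbase    1048576 2014-03-06 17:02 /hbase/.logs/rs2,60020,1394128000000/rs2%2C60020%2C1394128000000.1394124500000
"""


def run_ls(wal_dir: str = "/hbase/.logs") -> str:
    """Return the recursive listing of wal_dir (needs the hdfs CLI on PATH)."""
    return subprocess.check_output(["hdfs", "dfs", "-ls", "-R", wal_dir], text=True)


def zero_length_files(ls_output: str) -> List[str]:
    """Return paths of regular files whose size column is 0."""
    paths = []
    for line in ls_output.splitlines():
        # Columns: permissions, replication, owner, group, size, date, time, path
        parts = line.split(None, 7)
        if len(parts) < 8:
            continue  # e.g. a "Found N items" header or a blank line
        perms, size, path = parts[0], parts[4], parts[7]
        if perms.startswith("-") and size.isdigit() and int(size) == 0:
            paths.append(path)
    return paths


def sideline_commands(paths: List[str], sideline_dir: str) -> List[List[str]]:
    """Build (but do not run) one `hdfs dfs -mv` command per file."""
    return [["hdfs", "dfs", "-mv", p, sideline_dir] for p in paths]


# Dry run against the sample listing; swap SAMPLE for run_ls() on a real cluster.
for cmd in sideline_commands(zero_length_files(SAMPLE), "/hbase/.sidelined-wals"):
    print(" ".join(cmd))
```

The sideline directory has to exist before the moves are run (for example via
"hdfs dfs -mkdir -p"). Moving the files aside instead of deleting them keeps
them available in case one turns out to be needed; in the case reported above
all of them were empty and were simply deleted.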