From: Jean-Daniel Cryans
To: user@hbase.apache.org
Subject: Re: data loss when splitLog()
Date: Wed, 19 Oct 2011 11:00:27 -0700

Mmm ok, how exactly did you kill the master? kill -9 or a normal shutdown?

I think I can see how it would happen in the case of a normal shutdown, but
even then it would *really really* help to see the logs of what's going on.

J-D

On Tue, Oct 18, 2011 at 6:37 PM, Mingjian Deng wrote:

> @J-D: I use Cloudera CDH3. The loss can't be reproduced every time, but it
> can happen when the following log appears:
> "2011-10-19 04:44:09,065 DEBUG
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Used 134218288 bytes
> of buffered edits, waiting for IO threads..."
> This log was printed many times, and the value 134218288 never changed. I
> killed the master and restarted it, and the data was lost. So I think the
> 134218288 bytes were the last entries in memory. In the following code:
> "    synchronized (dataAvailable) {
>        totalBuffered += incrHeap;
>        while (totalBuffered > maxHeapUsage && (thrown == null || thrown.get() == null)) {
>          LOG.debug("Used " + totalBuffered + " bytes of buffered edits, waiting for IO threads...");
>          dataAvailable.wait(3000);
>        }
>        dataAvailable.notifyAll();
>      }"
> if (totalBuffered <= maxHeapUsage) and there are no more entries in the
> .logs dir, archiveLogs would execute even before the write threads finish.
>
> 2011/10/19 Jean-Daniel Cryans
>
> > Even if the files aren't closed properly, the fact that you are appending
> > should persist them.
> >
> > Are you using a version of Hadoop that supports sync?
> >
> > Do you have logs that show the issue where the logs were moved but not
> > written?
> >
> > Thx,
> >
> > J-D
> >
> > On Tue, Oct 18, 2011 at 7:40 AM, Mingjian Deng wrote:
> >
> > > Hi:
> > > One case caused data loss in our cluster. We were blocked in splitLog
> > > because of an error in our HDFS, and we killed the master. Some hlog
> > > files were moved from .logs to .oldlogs before they were written to
> > > .recovered.edits, so the RS couldn't replay these files.
> > > In HLogSplitter.java, we found:
> > >     ...
> > >     archiveLogs(srcDir, corruptedLogs, processedLogs, oldLogDir, fs, conf);
> > >   } finally {
> > >     LOG.info("Finishing writing output logs and closing down.");
> > >     splits = outputSink.finishWritingAndClose();
> > >   }
> > > Why does archiveLogs run before outputSink.finishWritingAndClose()?
> > > Would these hlog files be moved to .oldlogs, and so not be split on the
> > > next startup, if the write threads failed but archiveLogs succeeded?
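
The ordering question raised in the original message can be illustrated with a
small in-memory model. This is only a sketch under stated assumptions, not
HBase code: the class Cluster, its fields, and the methods archiveThenClose and
closeThenArchive are hypothetical names made up for illustration. Only the
names archiveLogs and finishWritingAndClose come from the thread, and their
bodies here are simplified stand-ins. The model assumes the write threads fail
when flushing, and compares archiving before the flush with archiving after it.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitOrderingSketch {

    /** Hypothetical in-memory model of the locations discussed in the thread. */
    static class Cluster {
        final List<String> logs = new ArrayList<>();           // stands in for .logs
        final List<String> oldLogs = new ArrayList<>();        // stands in for .oldlogs
        final List<String> recoveredEdits = new ArrayList<>(); // stands in for .recovered.edits
        final List<String> buffered = new ArrayList<>();       // edits held in memory by the splitter
        final boolean writersFail;                             // simulate a write error on flush

        Cluster(boolean writersFail) {
            this.writersFail = writersFail;
            logs.add("edit-1");
            logs.add("edit-2");
        }

        void readLogsIntoBuffer() {
            buffered.addAll(logs);
        }

        // Simplified stand-in: move every source log to the archive.
        void archiveLogs() {
            oldLogs.addAll(logs);
            logs.clear();
        }

        // Simplified stand-in: flush buffered edits, or fail like a broken writer.
        void finishWritingAndClose() {
            if (writersFail) {
                throw new IllegalStateException("simulated write failure");
            }
            recoveredEdits.addAll(buffered);
            buffered.clear();
        }

        /** Data is lost if an archived edit never reached the recovered edits. */
        boolean lostData() {
            return !recoveredEdits.containsAll(oldLogs);
        }
    }

    /** Ordering described in the thread: archive in the try, flush in the finally. */
    public static boolean archiveThenClose(boolean writersFail) {
        Cluster c = new Cluster(writersFail);
        c.readLogsIntoBuffer();
        try {
            try {
                c.archiveLogs();
            } finally {
                c.finishWritingAndClose();
            }
        } catch (IllegalStateException e) {
            // The split failed, but the source logs are already archived.
        }
        return c.lostData();
    }

    /** Alternative ordering: archive only after the flush has succeeded. */
    public static boolean closeThenArchive(boolean writersFail) {
        Cluster c = new Cluster(writersFail);
        c.readLogsIntoBuffer();
        try {
            c.finishWritingAndClose();
            c.archiveLogs();
        } catch (IllegalStateException e) {
            // The split failed; the source logs stay in place for a later retry.
        }
        return c.lostData();
    }

    public static void main(String[] args) {
        System.out.println("archive first, writers fail, lost data: " + archiveThenClose(true));
        System.out.println("close first, writers fail, lost data: " + closeThenArchive(true));
    }
}
```

In this model, a failed flush after archiving leaves edits in the archive that
were never written for replay, while flushing first leaves the source logs
untouched on failure. Whether the real HLogSplitter behaves this way depends on
the actual code and logs, which the thread has not yet established.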