Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
Received-SPF: pass (athena.apache.org: domain of jdcryans@gmail.com designates
 209.85.161.169 as permitted sender)
MIME-Version: 1.0
Sender: jdcryans@gmail.com
In-Reply-To: 
 <CAHoGh0kZ-dFVdCeiCip69azEj=i3vxMSabLZko=j4ohRS7fcdw@mail.gmail.com>
References: 
 <CAHoGh0kU=53BgqcaLr5Gd0_Tk5ntUQm4tbADbzQgmFYcNwg-Uw@mail.gmail.com>
	<CAGpTDNcafYLx0QvgKRuT9BdeU5uO_sD2kE57MTKETR0TQff1Yg@mail.gmail.com>
	<CAHoGh0kZ-dFVdCeiCip69azEj=i3vxMSabLZko=j4ohRS7fcdw@mail.gmail.com>
Date: Wed, 19 Oct 2011 11:00:27 -0700
Message-ID: 
 <CAGpTDNeJHmaXLfb-W-Yb4DMvWn6rWQPaPwJLY17uFotjw-s3Vg@mail.gmail.com>
Subject: Re: data loss when splitLog()
From: Jean-Daniel Cryans <jdcryans@apache.org>
To: user@hbase.apache.org
Content-Type: multipart/alternative; boundary=001636c5bdae6e6ea204afaa9dba

--001636c5bdae6e6ea204afaa9dba
Content-Type: text/plain; charset=ISO-8859-1

Mmm ok, how did you kill the master exactly? kill -9 or a normal shutdown? I
think I could see how it would happen in the case of a normal shutdown, but
even then it would *really really* help to see the logs of what's going on.

J-D

On Tue, Oct 18, 2011 at 6:37 PM, Mingjian Deng <koven2049@gmail.com> wrote:

> @J-D: I am using Cloudera CDH3. The loss can't be reproduced every time, but
> it can happen when the following log message appears:
> "2011-10-19 04:44:09,065 DEBUG
> org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: Used 134218288 bytes
> of buffered edits, waiting for IO threads..."
> This message was printed many times and the value 134218288 never changed. I
> killed the master and restarted it, and the data was lost. So I think the
> 134218288 bytes of entries were the last entries in memory. In the following
> code:
> " synchronized (dataAvailable) {
>        totalBuffered += incrHeap;
>        while (totalBuffered > maxHeapUsage &&
>               (thrown == null || thrown.get() == null)) {
>          LOG.debug("Used " + totalBuffered +
>              " bytes of buffered edits, waiting for IO threads...");
>          dataAvailable.wait(3000);
>        }
>        dataAvailable.notifyAll();
>      }"
> If totalBuffered <= maxHeapUsage and there are no more entries in the .logs
> dir, archiveLogs would execute even before the write threads finish.
>
> 2011/10/19 Jean-Daniel Cryans <jdcryans@apache.org>
>
> > Even if the files aren't closed properly, the fact that you are appending
> > should persist them.
> >
> > Are you using a version of Hadoop that supports sync?
> >
> > Do you have logs that show the issue where the logs were moved but not
> > written?
> >
> > Thx,
> >
> > J-D
> >
> > On Tue, Oct 18, 2011 at 7:40 AM, Mingjian Deng <koven2049@gmail.com>
> > wrote:
> >
> > > Hi:
> > >    There is a case that causes data loss in our cluster. We were blocked
> > > in splitLog because of some errors in our HDFS, and we killed the master.
> > > Some hlog files were moved from .logs to .oldlogs before they were written
> > > to .recovered.edits, so the rs couldn't replay these files.
> > >    In HLogSplitter.java, we found:
> > >    ...
> > >    archiveLogs(srcDir, corruptedLogs, processedLogs, oldLogDir, fs, conf);
> > >    } finally {
> > >      LOG.info("Finishing writing output logs and closing down.");
> > >      splits = outputSink.finishWritingAndClose();
> > >    }
> > >    Why is archiveLogs called before outputSink.finishWritingAndClose()?
> > > Would these hlog files be moved to .oldlogs, and so not be split at the
> > > next startup, if the write threads failed but archiveLogs succeeded?
> > >
> >
>
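
The ordering question raised in the quoted messages above can be illustrated
with a small stand-alone sketch. This is not HBase code: the class
SplitOrderingSketch, its toy OutputSink, and the splitLog helper are all
hypothetical stand-ins. The sketch only shows the general idea of waiting for
the writer thread to finish, and checking whether it failed, before the source
log is archived; whether HLogSplitter should be changed in this way is an
assumption, not something confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch of "finish writing, then archive"; not HBase code. */
class SplitOrderingSketch {

    /** Toy output sink: a single writer thread drains buffered edits. */
    static class OutputSink {
        private static final String EOF = "\u0000EOF"; // end-of-input marker
        private final BlockingQueue<String> buffer = new LinkedBlockingQueue<>();
        private final List<String> recovered = new ArrayList<>();
        private final Thread writer;
        private volatile Throwable thrown; // set if the writer thread fails

        OutputSink(boolean failWrites) {
            writer = new Thread(() -> {
                try {
                    while (true) {
                        String edit = buffer.take();
                        if (edit.equals(EOF)) {
                            return; // all buffered edits were written
                        }
                        if (failWrites) {
                            throw new IllegalStateException("simulated HDFS error");
                        }
                        recovered.add(edit);
                    }
                } catch (Throwable t) {
                    thrown = t; // remember the failure for the caller
                }
            });
            writer.start();
        }

        void append(String edit) {
            buffer.add(edit);
        }

        /** Waits for the writer thread and surfaces any failure it hit. */
        List<String> finishWritingAndClose() throws Exception {
            buffer.add(EOF);
            writer.join();
            if (thrown != null) {
                throw new Exception("writer thread failed", thrown);
            }
            return recovered;
        }
    }

    /**
     * Splits one toy log. The log name is added to {@code archived} only
     * after every buffered edit has been written successfully.
     */
    static boolean splitLog(List<String> edits, boolean failWrites,
                            List<String> archived) {
        OutputSink sink = new OutputSink(failWrites);
        for (String edit : edits) {
            sink.append(edit);
        }
        try {
            sink.finishWritingAndClose();
        } catch (Exception e) {
            return false; // leave the source log in place for the next attempt
        }
        archived.add("hlog"); // stand-in for moving the file to .oldlogs
        return true;
    }

    public static void main(String[] args) {
        List<String> archived = new ArrayList<>();
        System.out.println(splitLog(Arrays.asList("e1", "e2"), false, archived));
        System.out.println(splitLog(Arrays.asList("e1", "e2"), true, archived));
        System.out.println(archived);
    }
}
```

In this sketch a failed write leaves the archived list untouched, so a later
attempt can still find and split the source log.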

--001636c5bdae6e6ea204afaa9dba--