hbase-dev mailing list archives

From Stack <st...@duboce.net>
Subject Re: HLog sync() questions
Date Fri, 29 Jan 2010 20:30:13 GMT
On Fri, Jan 29, 2010 at 5:17 AM, Lars George <lars.george@gmail.com> wrote:
>  @Override
>  public void sync() throws IOException {
>    this.writer.sync();
>    if (this.writer_out != null) {
>      this.writer_out.sync();
>    }
>  }
> The first sync calls SequenceFile.Writer.sync() which is not at all
> what we want, i.e. the hflush or flush/sync in general but setting a
> sync marker in the file. What purpose does that have and why is that
> needed here?

Yes. SF.sync does not do what you'd expect a sync to do, as you point out
later in your mail.

We continue to call it in TRUNK because it's what we've always done, and
my thinking was that the marker written to the SF file might be of use.
IIUC the marker is used to pick up the parse again after we've crossed a
corrupted section.
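
To make the resync idea concrete, here is a minimal sketch of how a reader
could skip a corrupt stretch by scanning forward to the next marker. This is
not SequenceFile's code: the class name, the two-byte marker and the byte
layout are all made up for illustration, and the real reader handles its
marker differently.

```java
import java.util.Arrays;

class SyncMarkerScan {
    /**
     * Hypothetical helper: returns the offset just past the first occurrence
     * of marker at or after from, or -1 if the marker does not occur again.
     */
    public static int indexAfterMarker(byte[] data, byte[] marker, int from) {
        for (int i = Math.max(from, 0); i + marker.length <= data.length; i++) {
            byte[] window = Arrays.copyOfRange(data, i, i + marker.length);
            if (Arrays.equals(window, marker)) {
                return i + marker.length;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up two-byte marker; not the marker SequenceFile writes.
        byte[] marker = {(byte) 0xCA, (byte) 0xFE};
        // Layout: good bytes, marker, corrupt bytes, marker, good bytes.
        byte[] log = {1, 2, (byte) 0xCA, (byte) 0xFE, 9, 9, 9,
                      (byte) 0xCA, (byte) 0xFE, 3, 4};
        // Suppose the parse blew up at offset 4: resume after the next marker.
        int resume = indexAfterMarker(log, marker, 4);
        System.out.println("resume parsing at offset " + resume);
    }
}
```

Whether resuming like this is desirable for a write-ahead log, rather than
failing on any corruption, is a separate question.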

> The second call is the deprecated FSDataOutputStream.sync() which does
> the actual hflush() internally. Just wondering what the sync is for
> really.

Yeah, the call to FSDOS.sync is the important one.  It's deprecated but
it's doing the right thing (at the time of writing, this was all that
was available).

> Looking at the old 0.20 call
>  public void sync() throws IOException {
>    lastLogFlushTime = System.currentTimeMillis();
>    if (this.append && syncfs != null) {
>      try {
>        this.syncfs.invoke(this.writer, NO_ARGS);
>      } catch (Exception e) {
>        throw new IOException("Reflection", e);
>      }
>    } else {
>      this.writer.sync();
>    }
>    this.unflushedEntries.set(0);
>    syncTime += System.currentTimeMillis() - lastLogFlushTime;
>    syncOps++;
>  }
> That is calling the actual syncfs aka hflush

No.  syncfs is an hdfs-200 thing; hflush is hdfs-265.  In 0.20, we had
it rigged so that if hdfs-200 was present, we'd notice it and call
syncFs on any sync invocation.
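
For anyone who hasn't looked at how that kind of rigging works, here is a
rough sketch of the "notice it and call it" pattern using reflection. The
PlainWriter and PatchedWriter classes are hypothetical stand-ins, not Hadoop
classes, and the lookup is done on every call only to keep the example short.

```java
import java.io.IOException;
import java.lang.reflect.Method;

class OptionalSyncFs {
    /** Hypothetical stand-in for a writer from an unpatched library. */
    public static class PlainWriter {
        public String last = "";
        public void sync() { last = "sync"; }
    }

    /** Hypothetical stand-in for a patched writer that also has syncFs(). */
    public static class PatchedWriter extends PlainWriter {
        public void syncFs() { last = "syncFs"; }
    }

    /** Returns the public no-arg syncFs method if present, else null. */
    public static Method findSyncFs(Object writer) {
        try {
            return writer.getClass().getMethod("syncFs");
        } catch (NoSuchMethodException e) {
            return null;
        }
    }

    /** Calls syncFs when the writer offers it, otherwise plain sync(). */
    public static void sync(PlainWriter writer) throws IOException {
        Method syncFs = findSyncFs(writer);
        if (syncFs != null) {
            try {
                syncFs.invoke(writer);
            } catch (Exception e) {
                throw new IOException("Reflection", e);
            }
        } else {
            writer.sync();
        }
    }

    public static void main(String[] args) throws IOException {
        PlainWriter plain = new PlainWriter();
        PatchedWriter patched = new PatchedWriter();
        sync(plain);
        sync(patched);
        System.out.println(plain.last + " / " + patched.last);
    }
}
```

Real code would presumably look the method up once and cache it, as the
syncfs field in the 0.20 snippet quoted above suggests.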

> Why are we calling it? It is called internally anyways after every few
> hundred bytes it says here
>    synchronized void checkAndWriteSync() throws IOException {
>      if (sync != null &&
>          out.getPos() >= lastSyncPos+SYNC_INTERVAL) { // time to emit sync
>        sync();
>      }
>    }
> Just when a KV is say 10KB then it is basically in between nearly
> every KV (unless the HLogKey + KeyValue is minute).

I suppose it should come as no surprise that if the actual SF sync is
managed internally, it won't match KV boundaries.   I think it may
still prove of use though, Lars, letting the SF parse pick up again
after it's crossed a corrupt patch.  I haven't checked the code though,
and maybe we want to fail if the SF has any corruption in it?

Thanks for digging in on this Lars.
