hbase-dev mailing list archives

From Ryan Rawson <ryano...@gmail.com>
Subject Re: commit semantics
Date Mon, 11 Jan 2010 23:58:55 GMT
Performance.... It's all about performance.

In my own tests, calling sync() in HDFS-0.21 on every single commit
caps throughput for small rows at about 1200 a second.  One way to
speed things up is to sync less often.  Another way is to sync on a
timer instead.  Both of these are going to be way more important in
HDFS-0.21/HBase-0.21.
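
To make the two options concrete, here is a minimal sketch of a log that syncs either after N unsynced appends or once a maximum delay has elapsed, whichever comes first. The class `BatchedSyncLog` and its fields are hypothetical names for illustration, not HBase's HLog, and the sync itself is only counted rather than issued against HDFS. The caller passes in the clock so the behaviour is easy to follow:

```java
// Hypothetical stand-in for a WAL writer; this is NOT HBase's HLog.
class BatchedSyncLog {
    private final int flushEveryN;      // sync once this many appends are unsynced
    private final long maxDelayMillis;  // or once this long has passed since the last sync
    private int unsynced = 0;
    private long lastSyncMillis;
    int syncCalls = 0;                  // counts simulated sync() calls

    BatchedSyncLog(int flushEveryN, long maxDelayMillis, long nowMillis) {
        this.flushEveryN = flushEveryN;
        this.maxDelayMillis = maxDelayMillis;
        this.lastSyncMillis = nowMillis;
    }

    // Records one append; returns true if this append triggered a sync.
    boolean append(long nowMillis) {
        unsynced++;
        boolean countReached = unsynced >= flushEveryN;
        boolean timerExpired = nowMillis - lastSyncMillis >= maxDelayMillis;
        if (countReached || timerExpired) {
            sync(nowMillis);
            return true;
        }
        return false;
    }

    // A real implementation would call sync() on the underlying stream here.
    private void sync(long nowMillis) {
        syncCalls++;
        unsynced = 0;
        lastSyncMillis = nowMillis;
    }
}
```

With `flushEveryN = 100`, a thousand appends cost ten syncs instead of a thousand. The trade-off is that any append not yet covered by a sync can be lost on a crash.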

If we are talking about hdfs/hadoop 0.20, it hardly matters either
way: there is that whole 'no append/sync' thing you know all about.

-ryan

On Mon, Jan 11, 2010 at 3:46 PM, Joydeep Sarma <jssarma@apache.org> wrote:
> Hey HBase-devs,
>
> we have been going through hbase code to come up to speed.
>
> One of the questions was regarding the commit semantics. Thumbing through
> the RegionServer code that's appending to the wal:
>
> syncWal -> HLog.sync -> addToSyncQueue -> syncDone.await()
>
> and the log writer thread calls:
>
> hflush(), syncDone.signalAll()
>
> however hflush doesn't necessarily call a sync on the underlying log file:
>
>      if (this.forceSync ||
>          this.unflushedEntries.get() >= this.flushlogentries) { ... sync()
> ... }
>
> so it seems that if forceSync is not true, the syncWal can unblock before a
> sync is called (and forceSync seems to be only true for metaregion()).
>
> are we missing something - or is there a bug here (the signalAll should be
> conditional on hflush having actually flushed something)?
>
> thanks,
>
> Joydeep
>
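
The question above can be illustrated with a small sketch of what "conditional signalling" might look like: waiters are woken only when a sync has actually covered their entry. This is an assumption-laden illustration, not the HBase code or its eventual fix. `SyncTracker`, `awaitSynced` and the sequence-number scheme are hypothetical; only the names `forceSync`, `flushLogEntries`, `hflush` and `syncDone` echo the quoted snippet.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; not HBase's HLog.
class SyncTracker {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition syncDone = lock.newCondition();
    private final int flushLogEntries;  // sync threshold when forceSync is false
    private long appendedSeq = 0;       // sequence number of the last appended entry
    private long syncedSeq = 0;         // highest sequence number covered by a sync

    SyncTracker(int flushLogEntries) {
        this.flushLogEntries = flushLogEntries;
    }

    // Appends one entry and returns its sequence number.
    long append() {
        lock.lock();
        try {
            return ++appendedSeq;
        } finally {
            lock.unlock();
        }
    }

    // Syncs only if forced or enough entries are pending; signals only after a sync.
    boolean hflush(boolean forceSync) {
        lock.lock();
        try {
            long unflushed = appendedSeq - syncedSeq;
            if (unflushed > 0 && (forceSync || unflushed >= flushLogEntries)) {
                // A real writer would sync the underlying log file here.
                syncedSeq = appendedSeq;
                syncDone.signalAll();
                return true;
            }
            return false;
        } finally {
            lock.unlock();
        }
    }

    // Waits until the entry is covered by a sync; returns false on timeout or interrupt.
    boolean awaitSynced(long seq, long timeoutMillis) {
        long nanos = TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        lock.lock();
        try {
            while (syncedSeq < seq) {
                if (nanos <= 0) {
                    return false;
                }
                nanos = syncDone.awaitNanos(nanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```

Because the waiter re-checks `syncedSeq` in a loop, a wake-up that did not come with a sync cannot unblock it. The cost is the one discussed in the reply: without `forceSync`, a commit waits until the threshold (or some timer) triggers the next sync.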
