hbase-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
Date Tue, 18 Jun 2013 20:52:21 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13687171#comment-13687171 ]

Sergey Shelukhin commented on HBASE-8755:

+      synchronized (this.writeLock) {
+        if (txid <= this.pendingTxid)
+          return;
pendingTxid can only go up, right? If so, it may make sense to check it speculatively outside
the lock first; the same applies in other places.
How often does this condition happen?
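
To illustrate the suggestion, here is a minimal sketch of a speculative check followed by a re-check under the lock. It is not code from the patch: the class PendingTxidTracker, the method advanceTo and the lockAcquisitions counter are hypothetical, and it assumes pendingTxid is declared volatile and only ever increases.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for the writer-side state; not the patch's code. */
class PendingTxidTracker {
    private final Object writeLock = new Object();
    // Assumption: volatile, so the unlocked fast-path read sees published values.
    private volatile long pendingTxid = 0L;
    // Counts how often the lock was actually taken (illustration only).
    final AtomicInteger lockAcquisitions = new AtomicInteger();

    /** Returns true if this call advanced pendingTxid, false if txid was already covered. */
    boolean advanceTo(long txid) {
        // Fast path: pendingTxid only increases, so a stale read can at worst
        // make us take the lock unnecessarily; it can never skip needed work.
        if (txid <= pendingTxid) {
            return false;
        }
        synchronized (writeLock) {
            lockAcquisitions.incrementAndGet();
            // Re-check under the lock: another thread may have advanced it meanwhile.
            if (txid <= pendingTxid) {
                return false;
            }
            pendingTxid = txid;
            return true;
        }
    }

    long pendingTxid() {
        return pendingTxid;
    }
}
```

Whether this pays off depends on how often the condition is true, hence the question above.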

bq. upateLock 

LOG.warn("writer.getLength() failed,this failure won't block here");
This message is not very clear; also, the exception should be logged.
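
A sketch of what I mean, using java.util.logging so that it is self-contained. With the commons-logging LOG used in HBase, the equivalent would be passing the exception as the second argument to LOG.warn. The class LengthProbe, the interface LengthSource and the message wording are hypothetical, not taken from the patch.

```java
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper showing a clearer message with the exception attached. */
class LengthProbe {
    private static final Logger LOG = Logger.getLogger(LengthProbe.class.getName());

    /** Stand-in for the writer; only the getLength() call matters here. */
    interface LengthSource {
        long getLength() throws IOException;
    }

    /** Returns the writer length, or -1 if it could not be read. */
    static long lengthOrUnknown(LengthSource writer) {
        try {
            return writer.getLength();
        } catch (IOException e) {
            // Say what failed and what happens next, and pass the exception
            // so the stack trace ends up in the log.
            LOG.log(Level.WARNING,
                "Could not read the current log length; continuing without it", e);
            return -1L;
        }
    }
}
```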

+      synchronized (this.notifyLock) {
+        this.flushedTxid = txid;
+        this.notifyLock.notify();
Here the check is done outside the lock but not inside; could this race?
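
For comparison, a minimal sketch where both the check and the update happen under the same monitor, and waiters re-check in a loop. It assumes the check in question compares txid against flushedTxid; the class FlushWatermark and its methods are hypothetical names, not the patch's.

```java
/** Hypothetical flushed-txid watermark guarded by a single monitor. */
class FlushWatermark {
    private final Object notifyLock = new Object();
    private long flushedTxid = 0L; // guarded by notifyLock

    /** Called by the syncing thread after a successful sync up to txid. */
    void publish(long txid) {
        synchronized (notifyLock) {
            // Check and update under the same lock, so two publishers
            // cannot move the watermark backwards.
            if (txid > flushedTxid) {
                flushedTxid = txid;
                notifyLock.notifyAll();
            }
        }
    }

    /** Waits until flushedTxid >= txid; returns false on timeout or interrupt. */
    boolean awaitFlushed(long txid, long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        synchronized (notifyLock) {
            try {
                // Loop guards against spurious wakeups and stale notifications.
                while (flushedTxid < txid) {
                    long remainingNanos = deadline - System.nanoTime();
                    if (remainingNanos <= 0) {
                        return false;
                    }
                    // wait(0) would wait forever, so round up to at least 1 ms.
                    notifyLock.wait(Math.max(1L, remainingNanos / 1_000_000L));
                }
                return true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```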

if (txid <= this.failedTxid.get()) {
I don't quite understand the logic here. If 2 batches go through the writer-syncer pipeline,
the 1st one succeeds and the 2nd one fails before the notifier thread wakes up, wouldn't it
report the first batch as failed too?
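
To make the concern concrete, here is a tiny model of the two checks. It is an assumption about how the condition is used, not the patch's logic: failedByWatermark mirrors the quoted comparison, while failedByRange is a hypothetical alternative that also consults the highest txid known to be synced.

```java
/** Hypothetical model of the notifier's decision; names are illustrative. */
class BatchOutcome {
    /** Single-watermark check, as in the quoted condition. */
    static boolean failedByWatermark(long txid, long failedTxid) {
        return txid <= failedTxid;
    }

    /** Alternative: only txids above the synced watermark can have failed. */
    static boolean failedByRange(long txid, long syncedTxid, long failedTxid) {
        return txid > syncedTxid && txid <= failedTxid;
    }
}
```

With batch 1 covering txids 1..10 (synced) and batch 2 covering 11..20 (failed), failedByWatermark(5, 20) is true, while failedByRange(5, 10, 20) is false and failedByRange(15, 10, 20) is true.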

I wonder about the same interaction between the writer and the syncer.
I am not sure how HDFS write and sync interact; is the following possible or not?
The writer writes the first batch and wakes up the syncer. Before the syncer wakes up, the
writer starts the 2nd batch.
The syncer wakes up and syncs, invisibly to HBase code, up to the middle of the 2nd batch that
is being written (sync has no upper bound), and succeeds.
Then finishing the write of the 2nd batch, or the sync after it, fails, so now we wrote to WAL but reported

Also, can you please add a comment somewhere regarding the thread safety of log rolling? I
am assuming it will be thread safe,
because if we write to one file, roll in the middle and sync a different file, it will just
be an extra sync call, so harmless.

+        addPendingWrite(new HLog.Entry(logKey, logEdit));
addPendingWrite is called without bufferLock in some places and with it in others.

Can you please add a comment to bufferLock elaborating on what it guards, and noting that
updateLock cannot be taken inside bufferLock?
It seems that this holds right now.
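
One way to make the locking contract explicit is sketched below. It assumes that addPendingWrite is meant to run only with bufferLock held; the class PendingBuffer and the String entries are hypothetical simplifications of HLog and HLog.Entry.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simplification of the pending-edits buffer. */
class PendingBuffer {
    /**
     * Guards pendingWrites. Lock ordering (assumed from the discussion above):
     * updateLock must never be acquired while bufferLock is held.
     */
    private final Object bufferLock = new Object();
    private final List<String> pendingWrites = new ArrayList<>(); // guarded by bufferLock

    /** Caller must hold bufferLock; fails fast if it does not. */
    void addPendingWrite(String entry) {
        if (!Thread.holdsLock(bufferLock)) {
            throw new IllegalStateException("bufferLock must be held");
        }
        pendingWrites.add(entry);
    }

    /** Public entry point that takes the lock itself. */
    void append(String entry) {
        synchronized (bufferLock) {
            addPendingWrite(entry);
        }
    }

    int size() {
        synchronized (bufferLock) {
            return pendingWrites.size();
        }
    }
}
```

A check like this would flag the call sites that currently skip the lock, if skipping it there is not intentional.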

Also, I understand the need for the writer and sync threads, but is a separate notifier thread
necessary?
It doesn't do any blocking operations other than interacting with the flusher thread, or taking
the syncedTillHere lock, which looks like it should be uncontested most of the time.
Couldn't the flusher thread have the ~4 lines that set syncedTillHere?

> A new write thread model for HLog to improve the overall HBase write throughput
> -------------------------------------------------------------------------------
>                 Key: HBASE-8755
>                 URL: https://issues.apache.org/jira/browse/HBASE-8755
>             Project: HBase
>          Issue Type: Improvement
>          Components: wal
>            Reporter: Feng Honghua
>         Attachments: HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch, HBASE-8755-trunk-V0.patch
> In the current write model, each write handler thread (executing put()) individually goes
> through a full 'append (HLog local buffer) => HLog writer append (write to HDFS) =>
> HLog writer sync (sync HDFS)' cycle for each write, which incurs heavy contention on updateLock
> and flushLock.
> The only optimization, checking whether the current syncedTillHere > txid in the expectation
> that another thread has already written/synced this txid to HDFS so that the write/sync can
> be omitted, actually helps much less than expected.
> Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi proposed a new
> write thread model for writing HDFS sequence files, and the prototype implementation shows a
> 4X improvement in throughput (from 17000 to 70000+).
> I applied this new write thread model to HLog, and the performance test in our test cluster
> shows about a 3X throughput improvement (from 12150 to 31520 for 1 RS, from 22000 to 70000 for
> 5 RS). The 1 RS write throughput (1K row size) even beats that of Bigtable (Percolator,
> published in 2011, says Bigtable's write throughput then was 31002). I can provide the detailed
> performance test results if anyone is interested.
> The changes for the new write thread model are as below:
>  1> All put handler threads append their edits to HLog's local pending buffer; (this notifies
> the AsyncWriter thread that there are new edits in the local buffer)
>  2> All put handler threads wait in the HLog.syncer() function for the underlying threads
> to finish the sync that contains their txid;
>  3> A single AsyncWriter thread is responsible for retrieving all the buffered edits
> from HLog's local pending buffer and writing them to HDFS (hlog.writer.append); (it notifies
> the AsyncFlusher thread that there are new writes to HDFS that need a sync)
>  4> A single AsyncFlusher thread is responsible for issuing a sync to HDFS to persist
> the writes made by AsyncWriter; (it notifies the AsyncNotifier thread that the sync watermark
> has increased)
>  5> A single AsyncNotifier thread is responsible for notifying all pending put handler
> threads that are waiting in the HLog.syncer() function
>  6> No LogSyncer thread any more (since the AsyncWriter/AsyncFlusher threads are always
> there doing the same job)
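
The pipeline in steps 1 to 5 can be modelled in memory roughly as follows. This is a sketch under assumptions, not the patch: MiniWal, Watermark and Stage are hypothetical names, the log file is a plain list, and the flusher's "sync" is a no-op placeholder where a real implementation would call the writer's sync.

```java
import java.util.ArrayList;
import java.util.List;

/** Monotonic txid watermark; threads block until it passes a given value. */
class Watermark {
    private long value = 0L;

    synchronized void advance(long v) {
        if (v > value) { value = v; notifyAll(); }
    }

    /** Blocks until the watermark exceeds v, then returns its current value. */
    synchronized long awaitAbove(long v) throws InterruptedException {
        while (value <= v) wait();
        return value;
    }
}

/** Hypothetical in-memory model of the handler/writer/flusher/notifier pipeline. */
class MiniWal {
    interface Stage { long step(long seen) throws InterruptedException; }

    private final Object bufferLock = new Object();
    private final List<String> pending = new ArrayList<>(); // guarded by bufferLock
    private long lastTxid = 0L;                             // guarded by bufferLock
    private final List<String> file = new ArrayList<>();    // log stand-in, guarded by itself
    private final Watermark appended = new Watermark();
    private final Watermark written = new Watermark();
    private final Watermark flushed = new Watermark();
    private final Watermark notified = new Watermark();

    MiniWal() {
        start("AsyncWriter", this::writeStep);
        // Placeholder flusher: a real one would sync the file before advancing.
        start("AsyncFlusher", seen -> { long t = written.awaitAbove(seen); flushed.advance(t); return t; });
        start("AsyncNotifier", seen -> { long t = flushed.awaitAbove(seen); notified.advance(t); return t; });
    }

    /** Step 1: buffer the edit and wake the writer. */
    long append(String edit) {
        long txid;
        synchronized (bufferLock) { pending.add(edit); txid = ++lastTxid; }
        appended.advance(txid);
        return txid;
    }

    /** Step 2: wait until the notifier reports txid as synced; false if interrupted. */
    boolean sync(long txid) {
        try {
            notified.awaitAbove(txid - 1);
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    int writtenEdits() {
        synchronized (file) { return file.size(); }
    }

    /** Step 3: drain everything buffered so far and append it to the "file". */
    private long writeStep(long seen) throws InterruptedException {
        appended.awaitAbove(seen);
        List<String> batch;
        long upTo;
        synchronized (bufferLock) {
            batch = new ArrayList<>(pending);
            pending.clear();
            upTo = lastTxid;
        }
        synchronized (file) { file.addAll(batch); }
        written.advance(upTo);
        return upTo;
    }

    private static void start(String name, Stage stage) {
        Thread t = new Thread(() -> {
            long seen = 0L;
            try {
                while (true) seen = stage.step(seen);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, name);
        t.setDaemon(true);
        t.start();
    }
}
```

A handler would call append() and then sync() with the returned txid. In this simplified model the notifier stage only forwards a watermark; whether the same is true in the patch is the question raised in the comment above.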
