hbase-issues mailing list archives

From "Feng Honghua (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
Date Mon, 17 Jun 2013 11:15:20 GMT
Feng Honghua created HBASE-8755:

             Summary: A new write thread model for HLog to improve the overall HBase write throughput
                 Key: HBASE-8755
                 URL: https://issues.apache.org/jira/browse/HBASE-8755
             Project: HBase
          Issue Type: Improvement
          Components: wal
            Reporter: Feng Honghua

In the current write model, each write handler thread (executing put()) individually goes through
a full 'append (hlog local buffer) => HLog writer append (write to hdfs) => HLog writer
sync (sync hdfs)' cycle for each write, which incurs heavy lock contention on updateLock and

The only existing optimization, checking whether the current syncTillHere > txid in the hope that
another thread has already written/synced this txid to HDFS so that the write/sync can be omitted,
actually helps much less than expected.
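To make the per-handler path concrete, here is a simplified Java sketch of the cycle as described above. It is not HBase's actual HLog code: only updateLock and syncTillHere come from the description, while `PerHandlerLogSketch`, `syncLock`, the in-memory list standing in for the HDFS file, and the counter standing in for the HDFS sync are all hypothetical. Each handler appends under updateLock and then performs the write and sync itself, and the syncTillHere check is the only way it can skip that work.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified sketch of the current per-handler write path. Only updateLock and
// syncTillHere come from the issue text; syncLock and the in-memory "HDFS" are illustrative.
class PerHandlerLogSketch {
    private final Object updateLock = new Object(); // guards the local buffer and the txid counter
    private final Object syncLock = new Object();   // serializes writer append + sync across handlers
    private final List<String> localBuffer = new ArrayList<>(); // guarded by updateLock
    private final List<String> hdfsFile = new ArrayList<>();    // stand-in for the HLog file; guarded by syncLock
    private long nextTxid = 0;              // guarded by updateLock
    private volatile long syncTillHere = 0; // highest txid known to be synced
    private int syncCalls = 0;              // guarded by syncLock

    // 'append (hlog local buffer)': every handler contends on updateLock here.
    long append(String edit) {
        synchronized (updateLock) {
            localBuffer.add(edit);
            return ++nextTxid;
        }
    }

    // 'HLog writer append' + 'HLog writer sync', performed by the handler thread itself.
    void sync(long txid) {
        if (syncTillHere >= txid) return; // the one shortcut: another handler already covered txid
        synchronized (syncLock) {
            if (syncTillHere >= txid) return; // re-check after waiting for the lock
            List<String> toWrite;
            long upTo;
            synchronized (updateLock) { // contends with every handler that is still appending
                toWrite = new ArrayList<>(localBuffer);
                localBuffer.clear();
                upTo = nextTxid;
            }
            hdfsFile.addAll(toWrite); // HLog writer append (write to hdfs)
            syncCalls++;              // HLog writer sync (sync hdfs) would happen here
            syncTillHere = upTo;
        }
    }

    long getSyncTillHere() { return syncTillHere; }
    int getSyncCalls() { synchronized (syncLock) { return syncCalls; } }

    public static void main(String[] args) {
        PerHandlerLogSketch log = new PerHandlerLogSketch();
        long t1 = log.append("row1");
        long t2 = log.append("row2");
        log.sync(t2); // this handler writes and syncs both buffered edits
        log.sync(t1); // skipped: syncTillHere (2) already covers txid 1
        // prints "syncTillHere=2, syncs=1"
        System.out.println("syncTillHere=" + log.getSyncTillHere() + ", syncs=" + log.getSyncCalls());
    }
}
```

The second call is skipped only because the first handler's sync happened to cover its txid. Under real concurrency, every handler whose txid is not yet covered performs its own write and sync while serialized on the same locks.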

Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi proposed a new write
thread model for writing HDFS sequence files, and the prototype implementation shows a 4X throughput
improvement (from 17000 to 70000+).

I applied this new write thread model to HLog, and the performance test in our test cluster shows
about a 3X throughput improvement (from 12150 to 31520 for 1 RS, from 22000 to 70000 for 5 RS).
The 1 RS write throughput (1K row size) even beats that of Bigtable (the Percolator paper, published
in 2011, says Bigtable's write throughput at the time was 31002). I can provide the detailed performance
test results if anyone is interested.
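As a quick check of the "4X" and "about 3X" figures, the ratios can be computed directly from the numbers quoted above. The issue does not state the throughput units, so the raw numbers are used as given, and "70000+" is taken as 70000 (making that ratio a lower bound). `SpeedupCheck` is just an illustrative helper.

```java
import java.util.Locale;

// Speedup ratios computed from the throughput figures quoted in the issue; units are not
// stated there, so the raw numbers are used as given ("70000+" taken as 70000).
class SpeedupCheck {
    static double speedup(double before, double after) {
        return after / before;
    }

    public static void main(String[] args) {
        print("prototype, HDFS sequence file", speedup(17000, 70000)); // 4.12x
        print("HLog, 1 RS", speedup(12150, 31520));                   // 2.59x
        print("HLog, 5 RS", speedup(22000, 70000));                   // 3.18x
        print("1 RS vs. Bigtable's 31002", speedup(31002, 31520));     // 1.02x
    }

    private static void print(String label, double ratio) {
        // Locale.ROOT keeps '.' as the decimal separator regardless of the default locale
        System.out.println(String.format(Locale.ROOT, "%s: %.2fx", label, ratio));
    }
}
```

So the HLog gains are roughly 2.6X (1 RS) and 3.2X (5 RS), which the text rounds to "about 3X", and the 1 RS result is about 2% above the quoted Bigtable figure.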

The changes for the new write thread model are as follows:
 1> All put handler threads append their edits to HLog's local pending buffer (and notify the
AsyncWriter thread that there are new edits in the local buffer);
 2> All put handler threads wait in the HLog.syncer() function for the underlying threads to finish
the sync that covers their txid;
 3> A single AsyncWriter thread is responsible for retrieving all the buffered edits from
HLog's local pending buffer and writing them to HDFS via hlog.writer.append (and notifying the
AsyncFlusher thread that there are new writes to HDFS that need a sync);
 4> A single AsyncFlusher thread is responsible for issuing a sync to HDFS to persist
the writes made by AsyncWriter (and notifying the AsyncNotifier thread that the sync watermark has increased);
 5> A single AsyncNotifier thread is responsible for notifying all pending put handler
threads that are waiting in the HLog.syncer() function;
 6> There is no LogSyncer thread any more (the AsyncWriter/AsyncFlusher threads always
do the job it used to do).
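The six steps above can be sketched as a pipeline of three dedicated threads handing txid watermarks to each other. This is a minimal sketch under stated assumptions, not the Xiaomi implementation: the HDFS file is an in-memory list, the HDFS sync is only a counter, and apart from AsyncWriter, AsyncFlusher, AsyncNotifier and syncer, every name (`AsyncWalSketch`, `append`, the lock fields) is hypothetical. Each stage drains everything that accumulated while it was busy, so one HDFS write and one sync can cover many handlers' edits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: the "HDFS file" is an in-memory list and the hdfs sync is only counted.
// Apart from AsyncWriter/AsyncFlusher/AsyncNotifier/syncer, these names are not HBase's API.
class AsyncWalSketch {
    private interface Stage { void run() throws InterruptedException; }
    private final Object bufferLock = new Object();   // handlers -> AsyncWriter
    private final List<String> pendingBuffer = new ArrayList<>(); // HLog's local pending buffer
    private long lastAppendedTxid = 0;
    private final Object writtenLock = new Object();  // AsyncWriter -> AsyncFlusher
    private long lastWrittenTxid = 0;
    private final Object flushedLock = new Object();  // AsyncFlusher -> AsyncNotifier
    private long lastFlushedTxid = 0;
    private final Object syncedLock = new Object();   // AsyncNotifier -> waiting handlers
    private long syncedTillHere = 0;                  // the sync watermark
    private final List<String> hdfsFile = new ArrayList<>();     // stand-in for the HLog file
    private final AtomicInteger syncCalls = new AtomicInteger(); // number of simulated syncs

    // Step 1: buffer the edit under a short lock, assign its txid, wake the AsyncWriter.
    long append(String edit) {
        synchronized (bufferLock) {
            pendingBuffer.add(edit);
            bufferLock.notify();
            return ++lastAppendedTxid;
        }
    }

    // Step 2: the handler blocks until the sync watermark covers its txid.
    void syncer(long txid) throws InterruptedException {
        synchronized (syncedLock) { while (syncedTillHere < txid) syncedLock.wait(); }
    }

    // Step 3: AsyncWriter drains everything buffered so far and writes it as one batch.
    private void asyncWriter() throws InterruptedException {
        while (true) {
            List<String> batch; long txid; // both captured together under bufferLock
            synchronized (bufferLock) {
                while (pendingBuffer.isEmpty()) bufferLock.wait();
                batch = new ArrayList<>(pendingBuffer);
                pendingBuffer.clear();
                txid = lastAppendedTxid;
            }
            synchronized (hdfsFile) { hdfsFile.addAll(batch); } // stands in for hlog.writer.append
            synchronized (writtenLock) { lastWrittenTxid = txid; writtenLock.notify(); }
        }
    }

    // Step 4: AsyncFlusher issues one sync covering everything written so far.
    private void asyncFlusher() throws InterruptedException {
        long flushed = 0;
        while (true) {
            synchronized (writtenLock) {
                while (lastWrittenTxid <= flushed) writtenLock.wait();
                flushed = lastWrittenTxid;
            }
            syncCalls.incrementAndGet(); // a real HDFS sync would happen here
            synchronized (flushedLock) { lastFlushedTxid = flushed; flushedLock.notify(); }
        }
    }

    // Step 5: AsyncNotifier raises the watermark and wakes all waiting handlers at once.
    private void asyncNotifier() throws InterruptedException {
        long notified = 0;
        while (true) {
            synchronized (flushedLock) {
                while (lastFlushedTxid <= notified) flushedLock.wait();
                notified = lastFlushedTxid;
            }
            synchronized (syncedLock) { syncedTillHere = notified; syncedLock.notifyAll(); }
        }
    }

    void start() { // Step 6: these three daemon threads take over the old LogSyncer's job
        for (Stage s : new Stage[] {this::asyncWriter, this::asyncFlusher, this::asyncNotifier}) {
            Thread t = new Thread(() -> { try { s.run(); } catch (InterruptedException e) { } });
            t.setDaemon(true); // lets the JVM exit without an explicit shutdown
            t.start();
        }
    }

    int writtenCount() { synchronized (hdfsFile) { return hdfsFile.size(); } }
    int syncCount() { return syncCalls.get(); }
    public static void main(String[] args) throws InterruptedException {
        AsyncWalSketch wal = new AsyncWalSketch();
        wal.start();
        Thread[] handlers = new Thread[8]; // 8 concurrent "put" handlers, 100 edits each
        for (int h = 0; h < handlers.length; h++) {
            handlers[h] = new Thread(() -> {
                try { for (int i = 0; i < 100; i++) wal.syncer(wal.append("put")); }
                catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
            handlers[h].start();
        }
        for (Thread t : handlers) t.join();
        System.out.println("edits written: " + wal.writtenCount() + ", hdfs syncs: " + wal.syncCount());
    }
}
```

Running main reports 800 edits written and a sync count of at most 800 that varies between runs; it drops below 800 whenever several handlers' edits land in the same batch. In this sketch, handler threads only touch two short critical sections (appending to the buffer and waiting on the watermark), while the HDFS append and the HDFS sync are each issued by a single thread, so they never contend with one another.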

