hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8755) A new write thread model for HLog to improve the overall HBase write throughput
Date Wed, 04 Dec 2013 07:20:54 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838679#comment-13838679
] 

stack commented on HBASE-8755:
------------------------------

Sorry for the delay getting back to this [~fenghh] and thanks for the explanation.

I am having trouble reviewing the patch because I am trying to understand what is going on
here in FSHLog.  It is hard to follow (not your patch necessarily but what is there currently)
in spite of multiple reviews.  I keep trying to grok what is going on because this is critical
code.

The numbers are hard to argue with and the patch does some nice cleanup of FSHLog which makes it
easier to understand.  We could commit this patch and then work on undoing the complexity
that is rife here.  Your patch adds yet more of it because it adds interacting threads w/ new
synchronizations, notifications, AtomicBoolean states, etc., all of which cost performance-wise,
but at least it is clearer what is going on and we have tools for comparing approaches now.
We could work on simplification and removal of sync points in a follow-on (See below for a
note on one approach).

I now get the need for multiple syncers.  It is a little counter-intuitive: on the one hand
we want to batch up edits more to get more performance, but on the other we have to sync
more often because a sync stays outstanding for so long that it holds up handlers too long.

+ I am trying to understand why we keep aside the edits in a linked-list.  This was there
before your time.  You just continue the practice.  The original comment says "We keep them
cached here instead of writing them to HDFS piecemeal, because the HDFS write-method is pretty
heavyweight as far as locking is concerned."    Yet, when we eventually flush the edits, we
don't do anything special; we just call write on the dfsoutputstream.  We are not avoiding
locking in hdfs.  It must be the hbase flush/update locking that is being referred to here.
+ AsyncSyncer is a confounding name for a class -- but it makes sense in this context.  The
flush object in this thread is a syncer synchronization object not for memstore flushes...
as I thought it was (there is use of flush in here when it probably should be sync to be consistent).

Off-list, a few other lads are interested in reviewing this patch (it is a popular patch!)...
our [~jon@cloudera.com] and possibly [~himanshu@cloudera.com] because they are getting stuck
in this area.  If they don't get to it soon, I'll commit unless objection.




> A new write thread model for HLog to improve the overall HBase write throughput
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-8755
>                 URL: https://issues.apache.org/jira/browse/HBASE-8755
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance, wal
>            Reporter: Feng Honghua
>            Assignee: stack
>            Priority: Critical
>         Attachments: 8755trunkV2.txt, HBASE-8755-0.94-V0.patch, HBASE-8755-0.94-V1.patch,
HBASE-8755-0.96-v0.patch, HBASE-8755-trunk-V0.patch, HBASE-8755-trunk-V1.patch, HBASE-8755-trunk-v4.patch
>
>
> In the current write model, each write handler thread (executing put()) individually
goes through a full 'append (hlog local buffer) => HLog writer append (write to hdfs) =>
HLog writer sync (sync hdfs)' cycle for each write, which incurs heavy contention on updateLock
and flushLock.
> The only optimization is checking whether the current syncTillHere > txid, in the expectation
that another thread has already written/synced this thread's txid to hdfs so the write/sync
can be omitted. It actually helps much less than expected.
> Three of my colleagues (Ye Hangjun / Wu Zesheng / Zhang Peng) at Xiaomi proposed a new
write thread model for writing hdfs sequence files, and the prototype implementation shows a
4X improvement in throughput (from 17000 to 70000+).
> I applied this new write thread model in HLog, and the performance test in our test cluster
shows about a 3X throughput improvement (from 12150 to 31520 for 1 RS, from 22000 to 70000 for
5 RS). The 1 RS write throughput (1K row-size) even beats that of BigTable (Percolator,
published in 2011, says Bigtable's write throughput then was 31002). I can provide the detailed
performance test results if anyone is interested.
> The changes for the new write thread model are as below:
>  1> All put handler threads append their edits to HLog's local pending buffer; (this notifies
the AsyncWriter thread that there are new edits in the local buffer)
>  2> All put handler threads wait in the HLog.syncer() function for the underlying threads
to finish the sync that contains their txid;
>  3> A single AsyncWriter thread is responsible for retrieving all the buffered edits
from HLog's local pending buffer and writing them to hdfs (hlog.writer.append); (it notifies
the AsyncFlusher thread that there are new writes to hdfs that need a sync)
>  4> A single AsyncFlusher thread is responsible for issuing a sync to hdfs to persist
the writes made by AsyncWriter; (it notifies the AsyncNotifier thread that the sync watermark
has increased)
>  5> A single AsyncNotifier thread is responsible for notifying all pending put handler
threads which are waiting in the HLog.syncer() function
>  6> No LogSyncer thread any more (since the AsyncWriter/AsyncFlusher threads always
do the same job it did)
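
To make the hand-offs in steps 1 to 5 easier to follow, here is a minimal Java sketch of the
staged pipeline.  It is not the patch's code: the class name WalPipelineSketch and its methods
are hypothetical, an in-memory list stands in for the hdfs writer, the hdfs sync is only a
comment, and a single monitor replaces the separate synchronization objects the patch uses.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the staged write path; an in-memory list stands in for hdfs. */
public class WalPipelineSketch {
    private final Object lock = new Object();
    private final List<String> written = new ArrayList<>(); // stand-in for the hdfs file
    private List<String> pending = new ArrayList<>();       // local pending buffer
    // txid watermarks, all guarded by lock
    private long appended, writtenTill, syncedTill, notifiedTill;
    private boolean running = true;
    private final Thread[] stages = {
        new Thread(this::writerLoop, "AsyncWriter"),
        new Thread(this::flusherLoop, "AsyncFlusher"),
        new Thread(this::notifierLoop, "AsyncNotifier") };

    public WalPipelineSketch() {
        for (Thread t : stages) t.start();
    }

    /** Step 1: a handler buffers its edit, wakes the writer, and gets a txid. */
    public long append(String edit) {
        synchronized (lock) {
            pending.add(edit);
            lock.notifyAll();
            return ++appended;
        }
    }

    /** Step 2: a handler blocks until the notifier has published its txid. */
    public void sync(long txid) {
        synchronized (lock) {
            while (notifiedTill < txid) await();
        }
    }

    /** Step 3: swap out the whole pending buffer, then write it outside the lock. */
    private void writerLoop() {
        while (true) {
            List<String> batch;
            long upTo;
            synchronized (lock) {
                while (running && pending.isEmpty()) await();
                if (pending.isEmpty()) return; // shut down with nothing left to write
                batch = pending;
                pending = new ArrayList<>();
                upTo = appended;
            }
            synchronized (written) { written.addAll(batch); }
            synchronized (lock) { writtenTill = upTo; lock.notifyAll(); }
        }
    }

    /** Step 4: advance the sync watermark to cover everything written so far. */
    private void flusherLoop() {
        while (true) {
            long upTo;
            synchronized (lock) {
                while (running && syncedTill == writtenTill) await();
                if (syncedTill == writtenTill) return;
                upTo = writtenTill;
            }
            // A real implementation would issue the hdfs sync here, outside the lock.
            synchronized (lock) { syncedTill = upTo; lock.notifyAll(); }
        }
    }

    /** Step 5: publish the synced watermark and wake the waiting handlers. */
    private void notifierLoop() {
        synchronized (lock) {
            while (true) {
                while (running && notifiedTill == syncedTill) await();
                if (notifiedTill == syncedTill) return;
                notifiedTill = syncedTill;
                lock.notifyAll();
            }
        }
    }

    /** Stops the stages; assumes every sync() call has already returned. */
    public void close() {
        synchronized (lock) { running = false; lock.notifyAll(); }
        for (Thread t : stages) {
            try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        }
    }

    public long syncedTillHere() { synchronized (lock) { return notifiedTill; } }

    public List<String> written() { synchronized (written) { return new ArrayList<>(written); } }

    // Must be called while holding lock.
    private void await() {
        try { lock.wait(); } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

A handler would call append(edit) and then sync(txid) on the returned txid.  Because the writer
takes the whole pending buffer in one swap, edits from many handlers share one write and one
sync, which is where the batching gain in this model would come from.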



--
This message was sent by Atlassian JIRA
(v6.1#6144)
