hbase-dev mailing list archives

From 杨苏立 Yang Su Li <yangs...@gmail.com>
Subject Re: How threads interact with each other in HBase
Date Wed, 29 Mar 2017 03:53:36 GMT
Hi Josh,

Thanks a lot for your response and we really appreciate the effort you put
in to help us.

Following are responses to your comments:

1. We understand there is a block cache, and processing an RPC may only
involve looking up the cache instead of going to HDFS. That's why we said
"If the processing of RPC requires reading data from HDFS...".

2. Thanks a lot for pointing out that we over-simplified the write path in
our description (though the details are reflected in the graph). I have
updated the document ( http://pages.cs.wisc.edu/~suli/hbase.pdf ) so that
the description is more accurate. Could you take a look and check whether
the description now matches reality?
We understand that, by default, a put returns only after the corresponding
WAL entry is persisted. By "asynchronous write", we mean the writes that
HBase issues to HDFS, not the client-facing RPC. The write (append) to the
WAL is always synchronous (dedicated LOG sync threads continuously issue
flush/sync calls to HDFS), but other writes HBase performs (e.g., the
memstore flush writes) can be asynchronous, as there is no file.flush() or
file.sync() following them.
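
To make the distinction in point 2 concrete, here is a minimal sketch of the
two kinds of writes we have in mind. It is not HBase code: it uses a local
file through java.nio's FileChannel as a stand-in for an HDFS output stream,
and the class and method names (WriteDurabilitySketch, appendAndSync,
appendNoSync) are made up for illustration. On HDFS, the analogous sync calls
would be hflush()/hsync() on the output stream.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration only; a local file stands in for HDFS.
public class WriteDurabilitySketch {

    /** WAL-style append: write, then force the bytes to storage before returning. */
    static void appendAndSync(FileChannel ch, String record) throws IOException {
        writeFully(ch, record);
        ch.force(false); // blocks until the file content has been forced to the device
    }

    /** Flush-style write: hand the bytes to the OS and return without any sync. */
    static void appendNoSync(FileChannel ch, String record) throws IOException {
        writeFully(ch, record);
    }

    private static void writeFully(FileChannel ch, String record) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(record.getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            ch.write(buf); // a single write() may not consume the whole buffer
        }
    }

    public static void main(String[] args) throws IOException {
        Path wal = Files.createTempFile("wal-sketch", ".log");
        Path store = Files.createTempFile("store-sketch", ".dat");
        try (FileChannel walCh = FileChannel.open(wal, StandardOpenOption.WRITE);
             FileChannel storeCh = FileChannel.open(store, StandardOpenOption.WRITE)) {
            appendAndSync(walCh, "put row1\n");    // caller waits for durability
            appendNoSync(storeCh, "row1=value\n"); // caller returns right after the write
        }
        Files.delete(wal);
        Files.delete(store);
    }
}
```

Both calls leave the same bytes in the file; the difference is only whether
the caller waits for them to reach stable storage, which is the property that
matters for scheduling.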

Again, thanks a lot!

Suli

On Tue, Mar 28, 2017 at 12:11 PM, Josh Elser <elserj@apache.org> wrote:

> 1.1 -> 2: don't forget about the block cache which can obviate the need
> for any HDFS read.
>
> I think you're over-simplifying the write-path quite a bit. I'm not sure
> what you mean by an 'asynchronous write', but that doesn't exist at the
> HBase RPC layer as that would invalidate the consistency guarantees (if an
> RPC returns to the client that data was "put", then it is durable).
>
> Going off of memory (sorry in advance if I misstate something): the
> general way that data is written to the WAL is a "group commit". You have
> many threads all trying to append data to the WAL -- performance would be
> terrible if you serially applied all of these writes. Instead, many writes
> can be accepted and the caller receives a Future. The caller must wait
> for the Future to complete. What's happening behind the scenes is that the
> writes are being bundled together to reduce the number of syncs to the WAL
> ("grouping" the writes together). When one caller's Future completes,
> what really happened is that the write/sync which included the caller's
> update was committed (along with others). All of this is happening inside
> the RS's implementation of accepting an update.
>
> https://github.com/apache/hbase/blob/55d6dcaf877cc5223e679736eb613173229c18be/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java#L74-L106
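
The group-commit pattern described above can be sketched roughly as follows.
This is only an illustration of the idea, not FSHLog's actual implementation:
the class GroupCommitSketch, its Pending type, and the in-memory list standing
in for the WAL are all hypothetical, and the "sync" is simulated by
incrementing a counter. Many callers enqueue edits and get a future back; a
single sync thread drains whatever is queued, performs one sync for the whole
batch, and then completes every future in that batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration of group commit; not HBase's FSHLog.
public class GroupCommitSketch {

    /** One pending append: the payload plus the future its caller waits on. */
    static final class Pending {
        final String edit;
        final CompletableFuture<Long> done = new CompletableFuture<>();
        Pending(String edit) { this.edit = edit; }
    }

    private final BlockingQueue<Pending> queue = new LinkedBlockingQueue<>();
    private final List<String> log = new ArrayList<>(); // stand-in for the WAL; sync thread only
    final AtomicInteger syncCount = new AtomicInteger(); // number of simulated syncs
    private final Thread syncer;

    GroupCommitSketch() {
        syncer = new Thread(this::syncLoop, "sync-sketch");
        syncer.setDaemon(true);
        syncer.start();
    }

    /** Called by many handler threads; returns immediately with a future. */
    CompletableFuture<Long> append(String edit) {
        Pending p = new Pending(edit);
        queue.add(p);
        return p.done;
    }

    private void syncLoop() {
        List<Pending> batch = new ArrayList<>();
        try {
            while (true) {
                batch.add(queue.take()); // block until at least one edit is waiting
                queue.drainTo(batch);    // then grab everything else already queued
                for (Pending p : batch) {
                    log.add(p.edit);
                }
                long syncId = syncCount.incrementAndGet(); // one simulated sync per batch
                for (Pending p : batch) {
                    p.done.complete(syncId); // every caller in the batch is released together
                }
                batch.clear();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // exit quietly on shutdown
        }
    }

    void shutdown() { syncer.interrupt(); }

    public static void main(String[] args) {
        GroupCommitSketch wal = new GroupCommitSketch();
        List<CompletableFuture<Long>> futures = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            futures.add(wal.append("edit-" + i));
        }
        for (CompletableFuture<Long> f : futures) {
            f.join(); // each caller waits for the sync that covered its edit
        }
        System.out.println("edits=100 syncs=" + wal.syncCount.get());
        wal.shutdown();
    }
}
```

The number of syncs printed depends on thread timing, but it never exceeds
the number of edits, since edits that arrive while a sync is in progress share
the next one.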
>
>
> 杨苏立 Yang Su Li wrote:
>
>> The attachment can be found in the following URL:
>> http://pages.cs.wisc.edu/~suli/hbase.pdf
>>
>> Sorry for the inconvenience...
>>
>>
>> On Mon, Mar 27, 2017 at 8:25 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>
>>> Again, attachment didn't come thru.
>>>
>>> Is it possible to provide it as a Google Doc?
>>>
>>> Thanks
>>>
>>> On Mon, Mar 27, 2017 at 6:19 PM, 杨苏立 Yang Su Li <yangsuli@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I am a graduate student working on scheduling on storage systems, and we
>>>> are interested in how different threads in HBase interact with each other
>>>> and how it might affect scheduling.
>>>>
>>>> I have written down my understanding on how HBase/HDFS works based on its
>>>> current thread architecture (attached). I am wondering if the developers of
>>>> HBase could take a look at it and let me know if anything is incorrect or
>>>> inaccurate, or if I have missed anything.
>>>>
>>>> Thanks a lot for your help!
>>>>
>>>> --
>>>> Suli Yang
>>>>
>>>> Department of Physics
>>>> University of Wisconsin Madison
>>>>
>>>> 4257 Chamberlin Hall
>>>> Madison WI 53703
>>>>
>>>>
>>>>
>>
>>
>>


-- 
Suli Yang

Department of Physics
University of Wisconsin Madison

4257 Chamberlin Hall
Madison WI 53703
