hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: How threads interact with each other in HBase
Date Sun, 02 Apr 2017 21:32:55 GMT
Need some time to digest the BOB and see if it can simplify the reasoning
of how fsync is implemented in hbase.

hdfs was evaluated by the paper where I noticed the following:

bq. both HDFS and ZooKeeper respondents lament that such an fsync() is not
easily achievable with Java

Cheers

On Sun, Apr 2, 2017 at 1:53 PM, 杨苏立 Yang Su Li <yangsuli@gmail.com> wrote:

> Regarding HBASE-5954 specifically, have you thought about using BOB (block
> order breaker,
> https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-pillai.pdf)
> to verify if a change is correct?
>
> It allows you to explore many different crash scenarios.
>
>
>
> On Sun, Apr 2, 2017 at 1:35 PM, 杨苏立 Yang Su Li <yangsuli@gmail.com> wrote:
>
> > I understand why HBase by default does not use hsync -- it does come with
> > a big performance cost (though for FSYNC_WAL, which is not the default
> > option, you should probably do it because the documentation explicitly
> > promised it).
> >
> >
> > I just want to make sure my description about HBase is accurate,
> > including the durability aspect.
> >
> > On Sun, Apr 2, 2017 at 12:19 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> >> Suli:
> >> Have you looked at HBASE-5954 ?
> >>
> >> It gives some background on why hbase code is formulated the way it
> >> currently is.
> >>
> >> Cheers
> >>
> >> On Sun, Apr 2, 2017 at 9:36 AM, 杨苏立 Yang Su Li <yangsuli@gmail.com>
> >> wrote:
> >>
> >> > Doesn't your second paragraph just prove my point? -- If data is not
> >> > persisted to disk, then it is not durable. That is the definition of
> >> > durability.
> >> >
> >> > If you want the data to be durable, then you need to call hsync()
> >> > instead of hflush(), and that would be the correct behavior if you use
> >> > the FSYNC_WAL flag (per HBase documentation).
> >> >
> >> > However, HBase does not do that.
> >> >
> >> > Suli
> >> >
> >> > On Sun, Apr 2, 2017 at 11:26 AM, Josh Elser <josh.elser@gmail.com>
> >> > wrote:
> >> >
> >> > > No, that's not correct. HBase would, by definition, not be a
> >> > > consistent database if a write was not durable when a client sees a
> >> > > successful write.
> >> > >
> >> > > The point that I will concede to you is that the hflush call may, in
> >> > > extenuating circumstances, not be completely durable. For example,
> >> > > hflush does not actually force the data to disk. If an abrupt power
> >> > > failure happens before this data is pushed to disk, HBase may think
> >> > > that data was durable when it actually wasn't (at the HDFS level).
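
To make the hflush/hsync distinction above concrete, here is a small local-file analogy in Java. It is only a sketch under stated assumptions: it uses java.nio's FileChannel on a local file rather than the HDFS client, and the class LocalSyncSketch and its method names are hypothetical. A plain write() hands the bytes to the operating system (roughly comparable to hflush leaving data in DataNode memory), while force(true) additionally asks the OS to push data and metadata to the storage device (roughly what hsync is meant to provide).

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical local-file analogy; this is not HBase or HDFS code.
final class LocalSyncSketch {

    // Append a record and return once the OS has accepted the bytes.
    // The data may still sit in the page cache, so a power failure can lose it.
    static void appendWithoutForce(Path file, String record) throws IOException {
        try (FileChannel ch = open(file)) {
            writeFully(ch, record);
        }
    }

    // Append a record and return only after force(true) has asked the OS
    // to write both file data and metadata to the storage device.
    static void appendAndForce(Path file, String record) throws IOException {
        try (FileChannel ch = open(file)) {
            writeFully(ch, record);
            ch.force(true);
        }
    }

    private static FileChannel open(Path file) throws IOException {
        return FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    // A single write() call may be partial, so loop until the buffer is drained.
    private static void writeFully(FileChannel ch, String record) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(record.getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            ch.write(buf);
        }
    }
}
```

Both methods leave the same bytes in the file when nothing crashes; they differ only in what survives an abrupt power loss, which is the case the thread is arguing about.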
> >> > >
> >> > > On Thu, Mar 30, 2017 at 4:26 PM, 杨苏立 Yang Su Li <yangsuli@gmail.com>
> >> > > wrote:
> >> > > > Also, please correct me if I am wrong, but I don't think a put is
> >> > > > durable when an RPC returns to the client. Just its corresponding WAL
> >> > > > entry is pushed to the memory of all three data nodes, so it has a low
> >> > > > probability of being lost. But nothing is persisted at this point.
> >> > > >
> >> > > > And this is true no matter whether you use the SYNC_WAL or FSYNC_WAL
> >> > > > flag.
> >> > > >
> >> > > > On Tue, Mar 28, 2017 at 12:11 PM, Josh Elser <elserj@apache.org>
> >> > > > wrote:
> >> > > >
> >> > > >> 1.1 -> 2: don't forget about the block cache, which can obviate the
> >> > > >> need for any HDFS read.
> >> > > >>
> >> > > >> I think you're over-simplifying the write-path quite a bit. I'm not
> >> > > >> sure what you mean by an 'asynchronous write', but that doesn't exist
> >> > > >> at the HBase RPC layer as that would invalidate the consistency
> >> > > >> guarantees (if an RPC returns to the client that data was "put", then
> >> > > >> it is durable).
> >> > > >>
> >> > > >> Going off of memory (sorry in advance if I misstate something): the
> >> > > >> general way that data is written to the WAL is a "group commit". You
> >> > > >> have many threads all trying to append data to the WAL -- performance
> >> > > >> would be terrible if you serially applied all of these writes.
> >> > > >> Instead, many writes can be accepted and each caller receives a
> >> > > >> Future. The caller must wait for the Future to complete. What's
> >> > > >> happening behind the scenes is that the writes are being bundled
> >> > > >> together to reduce the number of syncs to the WAL ("grouping" the
> >> > > >> writes together). When one caller's future would complete, what
> >> > > >> really happened is that the write/sync which included the caller's
> >> > > >> update was committed (along with others). All of this is happening
> >> > > >> inside the RS's implementation of accepting an update.
> >> > > >>
> >> > > >> https://github.com/apache/hbase/blob/55d6dcaf877cc5223e679736eb613173229c18be/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/wal/FSHLog.java#L74-L106
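
As an illustration of the group-commit pattern described above, here is a minimal Java sketch. It is not HBase's FSHLog code: the class GroupCommitLog, its methods append and syncPending, and the in-memory list standing in for the WAL file are all hypothetical. Many callers append and get a future back; one sync call then covers every buffered edit and completes all of their futures together.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration of group commit; NOT HBase's FSHLog.
final class GroupCommitLog {
    private final List<String> pendingEntries = new ArrayList<>();
    private final List<CompletableFuture<Long>> pendingFutures = new ArrayList<>();
    // Stands in for the WAL file; a real log would write to HDFS here.
    private final List<String> synced = new ArrayList<>();
    private long syncCount = 0;

    // Called by many handler threads: buffer the edit and hand back a future
    // that completes only after a sync has covered this edit.
    synchronized CompletableFuture<Long> append(String entry) {
        CompletableFuture<Long> future = new CompletableFuture<>();
        pendingEntries.add(entry);
        pendingFutures.add(future);
        return future;
    }

    // Called by a single syncer thread: one sync commits every buffered edit.
    void syncPending() {
        List<CompletableFuture<Long>> toComplete;
        long syncId;
        synchronized (this) {
            if (pendingEntries.isEmpty()) {
                return;
            }
            synced.addAll(pendingEntries); // a real log would write and sync here
            pendingEntries.clear();
            toComplete = new ArrayList<>(pendingFutures);
            pendingFutures.clear();
            syncId = ++syncCount;
        }
        // Complete futures outside the lock so waiting callers do not block appends.
        for (CompletableFuture<Long> future : toComplete) {
            future.complete(syncId);
        }
    }

    synchronized long syncCount() { return syncCount; }

    synchronized int syncedSize() { return synced.size(); }

    public static void main(String[] args) {
        GroupCommitLog log = new GroupCommitLog();
        CompletableFuture<Long> first = log.append("put row1");
        CompletableFuture<Long> second = log.append("put row2");
        log.syncPending(); // one sync covers both appends
        System.out.println("first committed by sync #" + first.join());
        System.out.println("second committed by sync #" + second.join());
    }
}
```

In this sketch the two appends share a single sync, so both futures report the same sync number. Whether that sync is an hflush or an hsync is a separate choice, and it is the one the rest of this thread debates.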
> >> > > >>
> >> > > >>
> >> > > >> 杨苏立 Yang Su Li wrote:
> >> > > >>
> >> > > >>> The attachment can be found in the following URL:
> >> > > >>> http://pages.cs.wisc.edu/~suli/hbase.pdf
> >> > > >>>
> >> > > >>> Sorry for the inconvenience...
> >> > > >>>
> >> > > >>>
> >> > > >>> On Mon, Mar 27, 2017 at 8:25 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >> > > >>>
> >> > > >>>> Again, attachment didn't come thru.
> >> > > >>>>
> >> > > >>>> Is it possible to formulate as google doc ?
> >> > > >>>>
> >> > > >>>> Thanks
> >> > > >>>>
> >> > > >>>> On Mon, Mar 27, 2017 at 6:19 PM, 杨苏立 Yang Su Li <yangsuli@gmail.com>
> >> > > >>>> wrote:
> >> > > >>>>
> >> > > >>>>> Hi,
> >> > > >>>>>
> >> > > >>>>> I am a graduate student working on scheduling on storage systems,
> >> > > >>>>> and we are interested in how different threads in HBase interact
> >> > > >>>>> with each other and how it might affect scheduling.
> >> > > >>>>>
> >> > > >>>>> I have written down my understanding on how HBase/HDFS works based
> >> > > >>>>> on its current thread architecture (attached). I am wondering if
> >> > > >>>>> the developers of HBase could take a look at it and let me know if
> >> > > >>>>> anything is incorrect or inaccurate, or if I have missed anything.
> >> > > >>>>>
> >> > > >>>>> Thanks a lot for your help!
> >> > > >>>>>
> >> > > >>>>> On Wed, Mar 22, 2017 at 3:39 PM, 杨苏立 Yang Su Li <yangsuli@gmail.com>
> >> > > >>>>> wrote:
> >> > > >>>>>
> >> > > >>>>>> Hi,
> >> > > >>>>>>
> >> > > >>>>>> I am a graduate student working on scheduling on storage systems,
> >> > > >>>>>> and we are interested in how different threads in HBase interact
> >> > > >>>>>> with each other and how it might affect scheduling.
> >> > > >>>>>>
> >> > > >>>>>> I have written down my understanding on how HBase/HDFS works based
> >> > > >>>>>> on its current thread architecture (attached). I am wondering if
> >> > > >>>>>> the developers of HBase could take a look at it and let me know if
> >> > > >>>>>> anything is incorrect or inaccurate, or if I have missed anything.
> >> > > >>>>>>
> >> > > >>>>>> Thanks a lot for your help!
> >> > > >>>>>>
> >> > > >>>>>> --
> >> > > >>>>>> Suli Yang
> >> > > >>>>>>
> >> > > >>>>>> Department of Physics
> >> > > >>>>>> University of Wisconsin Madison
> >> > > >>>>>>
> >> > > >>>>>> 4257 Chamberlin Hall
> >> > > >>>>>> Madison WI 53703
