hbase-dev mailing list archives

From Stack <st...@duboce.net>
Subject Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0
Date Sat, 30 Apr 2016 18:34:20 GMT
On Sat, Apr 30, 2016 at 6:33 AM, Ted Yu <yuzhihong@gmail.com> wrote:

> What about support for the Transparent Data Encryption feature, which was
> introduced in Apache Hadoop 2.6.0?
>
>
Transparent: "...(of a process or interface) functioning without the user
being aware of its presence."
St.Ack



> On Fri, Apr 29, 2016 at 6:24 PM, 张铎 <palomino219@gmail.com> wrote:
>
> > Yes, it does. There is a test case that enumerates all the possible
> > protection levels (authentication, integrity and privacy) and encryption
> > algorithms (none, 3des, rc4).
> >
> > https://github.com/apache/hbase/blob/master/hbase-server/src/test/java/org/apache/hadoop/hbase/io/asyncfs/TestSaslFanOutOneBlockAsyncDFSOutput.java
> >
> > I have also tested it in a secure cluster(hbase-2.0.0-SNAPSHOT and
> > hadoop-2.4.0).
> >
> > Thanks.
> >
> > 2016-04-30 2:32 GMT+08:00 Gary Helmling <ghelmling@gmail.com>:
> >
> > > How well has this been tested on secure clusters?  I know SASL support
> > > was lacking initially, but I believe it had been added?  Does AsyncFSWAL
> > > support all the HDFS transport encryption options?
> > >
> > >
> > > On Fri, Apr 29, 2016 at 12:05 AM Stack <stack@duboce.net> wrote:
> > >
> > > > I'm +1 on enabling asyncfswal as default in 2.0:
> > > >
> > > > + We'll have plenty of time to figure out issues, if any, if we get it
> > > > in now, early.
> > > > + The improvement in throughput is substantial
> > > > + There are now fewer moving parts
> > > > + A critical piece of our write path is much less opaque in its workings
> > > > and no longer (effectively) immutable
> > > >
> > > > St.Ack
> > > >
> > > >
> > > > > On Thu, Apr 28, 2016 at 11:53 PM, 张铎 <palomino219@gmail.com> wrote:
> > > >
> > > > > > I've done some digging in the HDFS and HADOOP projects and found that
> > > > > > there is an active issue, HADOOP-12910, that is related to an
> > > > > > asynchronous FileSystem implementation.
> > > > >
> > > > > I have left some comments on it, maybe we could start from there.
> > > > >
> > > > > Thanks.
> > > > >
> > > > > 2016-04-29 14:42 GMT+08:00 Stack <stack@duboce.net>:
> > > > >
> > > > > > > On Thu, Apr 28, 2016 at 8:47 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > > >
> > > > > > > Last comment on HDFS-916 was from 2010.
> > > > > > >
> > > > > > > > Suggest making a new issue or reviving discussion on HDFS-916
> > > > > > > > (currently assigned to Todd).
> > > > > > >
> > > > > > >
> > > > > > > Duo is on it. Some mileage and confidence in the new code would be
> > > > > > > good to have before going to HDFS (getting stuff into HDFS is a PITA
> > > > > > > at the best of times... let's have a good case when we go there).
> > > > > >
> > > > > >
> > > > > > > > bq. The fallback implementation is not aimed at good performance
> > > > > > >
> > > > > > > > For more than two weeks, I have been working with Azure Data Lake
> > > > > > > > developers so that all hbase system tests pass on ADLS - there were
> > > > > > > > subtle differences between ADLS and hdfs.
> > > > > > >
> > > > > > > > If switching to AsyncWAL gives either WASB or ADLS subpar
> > > > > > > > performance, it would make upgrading to hbase 2.x unacceptable for
> > > > > > > > their users.
> > > > > > >
> > > > > > >
> > > > > > > Just use FSHLog instead of asyncfswal when up on WASB. It's just a
> > > > > > > config change.
> > > > > >
> > > > > > St.Ack
> > > > > >
> > > > > >
> > > > > >
> > > > > > > > On Thu, Apr 28, 2016 at 8:39 PM, 张铎 <palomino219@gmail.com> wrote:
> > > > > > >
> > > > > > > > 2016-04-29 11:35 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
> > > > > > > >
> > > > > > > > > bq. AsyncFSOutput will be in HDFS-3.0
> > > > > > > > >
> > > > > > > > > > Is there an HDFS JIRA for the above? Can you share the number?
> > > > > > > > >
> > > > > > > > > I have not filed a new one, but there are a bunch of related issues
> > > > > > > > > already, such as this one:
> > > > > > > > > https://issues.apache.org/jira/browse/HDFS-916
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > bq. Just wrap FSDataOutputStream to make it act like an asynchronous
> > > > > > > > > > output
> > > > > > > > >
> > > > > > > > > Can you be a bit more specific ?
> > > > > > > > > > HBase currently works with WASB and Azure Data Lake. Does the above
> > > > > > > > > > mean their performance would suffer?
> > > > > > > > >
> > > > > > > > Yes, the performance will suffer...
> > > > > > > > > The fallback implementation is not aimed at good performance; it is
> > > > > > > > > there just for compatibility with any FileSystem implementation.
> > > > > > > >
> > > > > > > > >
> > > > > > > > > > On Thu, Apr 28, 2016 at 8:30 PM, 张铎 <palomino219@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > > Inline comments.
> > > > > > > > > > Thanks,
> > > > > > > > > >
> > > > > > > > > > 2016-04-29 10:57 GMT+08:00 Sean Busbey <
> > busbey@cloudera.com
> > > >:
> > > > > > > > > >
> > > > > > > > > > > > I am nervous about having default out-of-the-box new HBase users
> > > > > > > > > > > > reliant on a bespoke HDFS client, especially given Hadoop's
> > > > > > > > > > > > compatibility promises and history. Answers for these questions
> > > > > > > > > > > > would make me more confident:
> > > > > > > > > > >
> > > > > > > > > > > > 1) Where are we on getting the client-side changes to HDFS pushed
> > > > > > > > > > > > back upstream?
> > > > > > > > > > >
> > > > > > > > > > > No progress yet... Here I want to tell a good story: that HBase is
> > > > > > > > > > > already using it as the default :)
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > 2) How well do we detect when our FS is not HDFS and what does
> > > > > > > > > > > > fallback look like?
> > > > > > > > > > >
> > > > > > > > > > > Just wrap FSDataOutputStream to make it act like an asynchronous
> > > > > > > > > > > output (call hflush in a separate thread). The performance is not
> > > > > > > > > > > good, I think.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > 3) Will this mean altering the versions of Hadoop we label as
> > > > > > > > > > > > supported for HBase 2.y+?
> > > > > > > > > > >
> > > > > > > > > > > I have tested with hadoop versions from 2.4.x to 2.7.x, so I don't
> > > > > > > > > > > think we need to change the supported versions?
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > 4) How are we going to ensure our client remains compatible with
> > > > > > > > > > > > newer Hadoop releases?
> > > > > > > > > > >
> > > > > > > > > > > We cannot ensure that; HDFS always breaks HBase at a new release...
> > > > > > > > > > > I need to test AsyncFSWAL on every new 2.x release and make it
> > > > > > > > > > > compatible with that version. And back to #1, I think we should make
> > > > > > > > > > > sure that the AsyncFSOutput will be in HDFS-3.0. And in HBase-3.0, we
> > > > > > > > > > > can introduce a new 'AsyncFSWAL' that uses the AsyncFSOutput in HDFS.
> > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > On Thu, Apr 28, 2016 at 9:42 PM, Duo Zhang <zhangduo@apache.org> wrote:
> > > > > > > > > > > > > Six months after I filed HBASE-14790...
> > > > > > > > > > > >
> > > > > > > > > > > > > Now the AsyncFSWAL is ready. The WALPE result shows that it is
> > > > > > > > > > > > > *1.4x~3.7x* faster than FSHLog. The ITBLL result shows that it is
> > > > > > > > > > > > > *no worse* than FSHLog (the master branch is not that stable
> > > > > > > > > > > > > itself...).
> > > > > > > > > > > >
> > > > > > > > > > > > More details can be found on HBASE-15536.
> > > > > > > > > > > >
> > > > > > > > > > > > > So here we propose to change the default WAL from FSHLog to
> > > > > > > > > > > > > AsyncFSWAL. Suggestions are welcome.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > --
> > > > > > > > > > > busbey
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
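
The fallback described in the thread (wrap a blocking output stream and call
hflush in a separate thread) might be sketched roughly as follows. This is a
minimal illustration under stated assumptions, not HBase's actual code: the
class name AsyncFlushWrapper is hypothetical, a plain java.io.OutputStream
stands in for FSDataOutputStream, and flush() stands in for hflush().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: gives a blocking stream an asynchronous flush by
// running the blocking write+flush on one dedicated thread.
final class AsyncFlushWrapper implements AutoCloseable {
    // Assumption: stands in for FSDataOutputStream.
    private final OutputStream out;
    // A single thread keeps flushes in submission order.
    private final ExecutorService flusher = Executors.newSingleThreadExecutor();
    // Bytes written since the last flush() call.
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    AsyncFlushWrapper(OutputStream out) {
        this.out = out;
    }

    // Buffers data in memory only; nothing reaches the stream yet.
    synchronized void write(byte[] data) {
        pending.write(data, 0, data.length);
    }

    // Hands the buffered bytes to the flusher thread and returns at once.
    // The future completes with the number of bytes flushed, or fails
    // with the IOException raised by the underlying stream.
    synchronized CompletableFuture<Long> flush() {
        final byte[] batch = pending.toByteArray();
        pending.reset();
        final CompletableFuture<Long> done = new CompletableFuture<>();
        flusher.execute(() -> {
            try {
                out.write(batch);
                out.flush(); // with FSDataOutputStream this would be hflush()
                done.complete((long) batch.length);
            } catch (IOException e) {
                done.completeExceptionally(e);
            }
        });
        return done;
    }

    // Waits for queued flushes to finish, then closes the stream.
    @Override
    public void close() throws IOException {
        flusher.shutdown();
        try {
            flusher.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        out.close();
    }
}
```

A caller would write() some bytes, call flush(), and attach a callback to the
returned future instead of blocking. Every flush still pays for a full
blocking round trip on the flusher thread, which is consistent with the
remark in the thread that this path is for compatibility rather than speed.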
