hbase-dev mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: [DISCUSS] Make AsyncFSWAL the default WAL in 2.0
Date Fri, 29 Apr 2016 03:47:35 GMT
The last comment on HDFS-916 was from 2010.

I suggest filing a new issue or reviving the discussion on HDFS-916 (currently
assigned to Todd).

bq. The fallback implementation does not aim for good performance

For more than two weeks, I have been working with Azure Data Lake
developers so that all HBase system tests pass on ADLS; there were subtle
differences between ADLS and HDFS.

If switching to AsyncFSWAL gives either WASB or ADLS subpar performance, it
would make upgrading to HBase 2.x unacceptable for their users.

On Thu, Apr 28, 2016 at 8:39 PM, 张铎 <palomino219@gmail.com> wrote:

> 2016-04-29 11:35 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
>
> > bq. AsyncFSOutput will be in HDFS-3.0
> >
> > Is there HDFS JIRA for the above ? Can you share the number ?
> >
> I have not filed a new one, but there are a bunch of related issues already,
> such as this one: https://issues.apache.org/jira/browse/HDFS-916
>
> >
> > bq. Just wrap FSDataOutputStream to make it act like an asynchronous output
> >
> > Can you be a bit more specific ?
> > HBase currently works with WASB and Azure Data Lake. Does the above mean
> > their performance would suffer ?
> >
> Yes, the performance will suffer...
> The fallback implementation does not aim for good performance; it exists only
> for compatibility with any FileSystem implementation.
>
> >
> > On Thu, Apr 28, 2016 at 8:30 PM, 张铎 <palomino219@gmail.com> wrote:
> >
> > > Inline comments.
> > > Thanks,
> > >
> > > 2016-04-29 10:57 GMT+08:00 Sean Busbey <busbey@cloudera.com>:
> > >
> > > > I am nervous about having default out-of-the-box new HBase users reliant on
> > > > a bespoke HDFS client, especially given Hadoop's compatibility
> > > > promises and history. Answers for these questions would make me more
> > > > confident:
> > > >
> > > > 1) Where are we on getting the client-side changes to HDFS pushed back
> > > > upstream?
> > > >
> > > No progress yet... Here I want to be able to tell a good story: that HBase
> > > already uses it as the default :)
> > >
> > > >
> > > > 2) How well do we detect when our FS is not HDFS and what does
> > > > fallback look like?
> > > >
> > > Just wrap FSDataOutputStream to make it act like an asynchronous
> > > output (call hflush in a separate thread). I don't think the performance
> > > is good.
> > >
> > > >
> > > > 3) Will this mean altering the versions of Hadoop we label as
> > > > supported for HBase 2.y+?
> > > >
> > > I have tested with Hadoop versions from 2.4.x to 2.7.x, so I don't think
> > > we need to change the supported versions.
> > >
> > > >
> > > > 4) How are we going to ensure our client remains compatible with newer
> > > > Hadoop releases?
> > > >
> > > We cannot ensure that; HDFS always breaks HBase at a new release...
> > > I need to test AsyncFSWAL on every new 2.x release and make it compatible
> > > with that version. And back to #1, I think we should make sure that the
> > > AsyncFSOutput will be in HDFS-3.0. Then in HBase-3.0, we can introduce a
> > > new 'AsyncFSWAL' that uses the AsyncFSOutput in HDFS.
> > >
> > > >
> > > > On Thu, Apr 28, 2016 at 9:42 PM, Duo Zhang <zhangduo@apache.org> wrote:
> > > > > Six months after I filed HBASE-14790...
> > > > >
> > > > > Now the AsyncFSWAL is ready. The WALPE result shows that it is
> > > > > *1.4x~3.7x* faster than FSHLog. The ITBLL result shows that it is
> > > > > *no worse* than FSHLog (the master branch is not that stable itself...).
> > > > >
> > > > > More details can be found on HBASE-15536.
> > > > >
> > > > > So here we propose to change the default WAL from FSHLog to AsyncFSWAL.
> > > > > Suggestions are welcome.
> > > > >
> > > > > Thanks.
> > > >
> > > >
> > > >
> > > > --
> > > > busbey
> > > >
> > >
> >
>
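
The fallback described in the quoted reply (wrap FSDataOutputStream and call hflush in a separate thread) could look roughly like the sketch below. It is only an illustration under stated assumptions, not the actual HBase code: the class name AsyncOutputWrapper is hypothetical, a plain java.io.OutputStream stands in for FSDataOutputStream, and OutputStream.flush() stands in for hflush().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical sketch: gives a blocking stream an asynchronous flush API. */
final class AsyncOutputWrapper implements AutoCloseable {
    private final OutputStream out; // assumption: stands in for FSDataOutputStream
    // A single worker thread keeps flushes in submission order.
    private final ExecutorService flusher = Executors.newSingleThreadExecutor();
    // Bytes accepted from callers but not yet handed to the worker thread.
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private long flushed; // touched only on the worker thread

    public AsyncOutputWrapper(OutputStream out) {
        this.out = out;
    }

    /** Buffers data in memory; never blocks on the underlying stream. */
    public synchronized void write(byte[] data) {
        pending.write(data, 0, data.length);
    }

    /**
     * Hands the buffered bytes to the worker thread, which writes and flushes
     * them. The future completes with the total number of bytes flushed so far,
     * or exceptionally if the underlying stream fails.
     */
    public synchronized CompletableFuture<Long> flush() {
        byte[] batch = pending.toByteArray();
        pending.reset();
        return CompletableFuture.supplyAsync(() -> {
            try {
                out.write(batch);
                out.flush(); // assumption: hflush() would be called here on HDFS
                flushed += batch.length;
                return flushed;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, flusher);
    }

    /** Stops the worker after already-submitted flushes finish; the stream stays open. */
    @Override
    public void close() {
        flusher.shutdown();
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (AsyncOutputWrapper w = new AsyncOutputWrapper(sink)) {
            w.write("edit-1".getBytes(StandardCharsets.UTF_8));
            w.write("edit-2".getBytes(StandardCharsets.UTF_8));
            long acked = w.flush().get(); // wait for the background flush
            System.out.println("flushed bytes: " + acked);
        }
    }
}
```

Every flush in this sketch still pays for a blocking call on the worker thread, so it only makes the API asynchronous; it does nothing to make the underlying file system faster, which is consistent with the performance concern raised in the thread.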
