hbase-dev mailing list archives

From Jonathan Hsieh <...@cloudera.com>
Subject Re: [DISCUSSION] Merge of the hbase-11339 mob branch into master.
Date Fri, 22 May 2015 20:45:42 GMT
In another thread Andrew Purtell brought up some concerns about the mob

On Fri, May 22, 2015 at 12:40 PM, Andrew Purtell <apurtell@apache.org> wrote:

> Another point of clarification, sorry, I hit the send button too early it
> seems: I don't believe MOB is fully integrated yet, for example the feature
> is an extension to store that lacks support for encryption (this would
> technically be a feature regression); and HBCK. I have not been following
> MOB too closely so could be mistaken. These issues do not preclude a merge
> of MOB into trunk, but do preclude a merge back of MOB from trunk to
> branch-1. I would veto the latter until such shortcomings in the
> implementation that could be described as regressions are addressed. I
> would also like to see a performance analysis of a range of workloads
> before and after in as much detail as can be mustered, and would be happy
> to volunteer to help out with that.

Here's info on the points brought up:

The encryption support shortcoming is being addressed here:
https://issues.apache.org/jira/browse/HBASE-13693 (closed)
https://issues.apache.org/jira/browse/HBASE-13720 (in review)

Hbck has actually been run against the integration test rigs with the
feature enabled, but it currently has no explicit unit test or simple-to-run
integration test.  It currently doesn't report anything special about the
mob storage area. We can add unit tests that cover hbck when the mob path
is exercised.

Another suggestion was a tool to check that mob references have
corresponding mob data.  We currently include an MR-based sweeper job that
could be used to perform this verification.  We can add this tool and
testing for the tool.
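
As a rough illustration of what such a check amounts to, here is a minimal
sketch. It is not the sweeper job's code and does not use any HBase API: the
in-memory model (references as (row key, mob file name) pairs, the mob area
as a set of file names) and the function names are assumptions made up for
the example.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical model: a reference cell is (row key, name of the mob file it points to).
MobRef = Tuple[str, str]


def find_dangling_refs(refs: Iterable[MobRef], mob_files: Set[str]) -> List[MobRef]:
    """Return the references whose target file is missing from the mob area."""
    return [(row, fname) for row, fname in refs if fname not in mob_files]


def find_unreferenced_files(refs: Iterable[MobRef], mob_files: Set[str]) -> Set[str]:
    """Return the mob files that no reference points to."""
    referenced = {fname for _, fname in refs}
    return mob_files - referenced
```

A real tool would have to scan the table for reference cells and list the
mob storage area instead of taking both as in-memory collections.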

I've done some performance testing and Jingcheng and his colleagues have
done significant amounts of performance testing. We currently have a blog
post in progress that will share the results of this performance testing.


On Wed, May 20, 2015 at 7:38 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> This is a useful feature, Jon.
> I went over the mega-patch and left some comments on review board.
> I noticed that hbck was not included in the patch. Neither did I find a
> sub-task of HBASE-11339 that covers hbck.
> Do you or Jingcheng plan to add MOB-aware capability for hbck ?
> Cheers
> On Wed, May 20, 2015 at 9:21 AM, Jonathan Hsieh <jon@cloudera.com> wrote:
> > Hi folks,
> >
> > The Medium Object (MOB) Storage feature (HBASE-11339[1]) is a modified I/O
> > and compaction path that allows individual moderately sized values
> > (10k-10MB) to be stored so that write amplification is reduced when
> > compared to the normal I/O path.  At a high level, it provides alternate
> > flush and compaction mechanisms that segregate large cells into a separate
> > area where they are not subject to the potentially frequent compactions
> > and splits that can be encountered in the normal I/O path. A more detailed
> > design doc can be found on the hbase-11339 jira.
> >
> > Jingcheng Du has been working on the mob feature for a while and Anoop, Ram
> > and I have been shepherding him through the design revisions and
> > implementation of the feature in the hbase-11339 branch.[2]
> >
> > The branch we are proposing to merge into master is compatible with HBase's
> > core functionality including snapshots, replication, and shell support;
> > behaves well with table alters and bulk loads; and does not require
> > external MR processes. It has been documented and subjected to many
> > integration test runs (ITBLL, ITAcidGuarantees, ITIngest), including fault
> > injection.
> > Performance testing of the feature shows what can be a 2x-3x throughput
> > improvement for workloads that contain mobs. These results can be seen on
> > the hbase 2.0 panel discussion slides from hbasecon (once published).
> >
> > Recently there have been some hfile encryption related shortcomings that we
> > could address in branch or in master.
> >
> > Earlier iterations of the feature have been tested in production by users
> > that Jingcheng has been responsible for.  A version has also been deployed
> > at users I have been responsible for.  Some of the folks from Huawei
> > (ashutosh) have also been submitting the recent encryption bug reports
> > against the hbase-11339 branch so there is some evidence of usage by them.
> >
> > The four of us (Jingcheng, Ram, Anoop and I) are satisfied with the
> > feature and feel it is a good time to call a merge vote.  I've posted a
> > megapatch version for folks who want to peruse the code. [3]
> >
> > What do you all think?
> >
> > Thanks,
> > Jingcheng, Jon, Ram, and Anoop.
> >
> > [1] https://issues.apache.org/jira/browse/HBASE-11339
> > [2] https://github.com/apache/hbase/tree/hbase-11339
> > [3] https://reviews.apache.org/r/34475/
> > --
> > // Jonathan Hsieh (shay)
> > // HBase Tech Lead, Software Engineer, Cloudera
> > // jon@cloudera.com // @jmhsieh
> >

// Jonathan Hsieh (shay)
// HBase Tech Lead, Software Engineer, Cloudera
// jon@cloudera.com // @jmhsieh
