Date: Thu, 28 May 2015 10:28:35 +0530
Subject: Re: [DISCUSSION] Merge of the hbase-11339 mob branch into master.
From: ramkrishna vasudevan
To: "dev@hbase.apache.org"

bq. So is a MR runtime required for MOB or not? I read maybe, then no, then here maybe again. What happens if one does not have a MR runtime and therefore can never run the sweeper tool?

Just to make it clear: MOB no longer has an MR dependency. The V1 version had a sweeper tool that depended on MR. That tool still exists and still depends on MR; it's like an add-on. MOB compaction is now embedded in the core compaction feature and does not require MR.

Regards
Ram

On Thu, May 28, 2015 at 10:20 AM, Andrew Purtell wrote: > I have no concerns about MOB in trunk. Go for it. > > I do have concerns about a subsequent proposal to put it in 1.2. 
Those > concerns center around stability and performance impacts, and a possible > dependency on a MR runtime for what I would consider core function. > > > Regarding the tools and integrity checks, > MOB has a tool based on MR basically for sweeping and compaction apart from > the compactor that runs in the core (without MR dependency). > > So is a MR runtime required for MOB or not? I read maybe, then no, then > here maybe again. What happens if one does not have a MR runtime and > therefore can never run the sweeper tool? An incomplete feature on trunk > isn't a problem. Later commits can fill in the gaps and then the sum of MOB > commits can go back to branch-1. (Experimental != incomplete, IMHO.) > > If as you say stability and performance testing have already be done and > both look great, then that means *when* this is done again for a branch-1 > merge candidate, the results will likely also be good. I'd like to help out > with this. You won't need to prove it, I will do the legwork for my own > concerns. > > > > On May 27, 2015, at 8:59 PM, ramkrishna vasudevan < > ramkrishna.s.vasudevan@gmail.com> wrote: > > > > Chiming late here, > > > > As Matt suggested earlier, utmost care had been taken to ensure that the > > MOB code does not interfere with the normal flow and ensured that things > > work normally when MOB is not enabled on a family. > > > > So the entire flow for MOB can be treated as an experimental feature, if > > need be. Take the latest case of guys from Huawei, since they have some > > interest in this feature they are trying the branch hbase-11339 and > trying > > to see how MOB works. > > > > If we move this to trunk, then chances of even more people looking into > it > > and by the time it comes to 1.3 or1.4 we are stable enough. > > > > Regarding the tools and integrity checks, > > MOB has a tool based on MR basically for sweeping and compaction apart > from > > the compactor that runs in the core (without MR dependency). 
We could > > always add feature to the existing tool to do integrity checks like Jon > > suggests. > > > > .Also for an experimental feature we could always come up with such a > tool, > > but in case of MOB the inter dependency on the MOB and actual HFile data > is > > more so just a stand alone too to check integrity on the Hfile may not be > > easy without having to do some sort of scan on the Hfiles and MOB files. > > (Not thought on that fully). > > > > I would still think that having this feature as experimental in 1.2 makes > > sense. Just my thoughts on this also after being part on the dev process > > for this feature where we tried not to touch the core areas affecting non > > MOB cases. > > > > Some of the perf results performed by Jingcheng's team and Cloudera folks > > substantiates the gain this feature provides. > > > > Regards > > Ram > > > > > > > > > >> On Thu, May 28, 2015 at 9:04 AM, Andrew Purtell > wrote: > >> > >> Inline > >> > >>> On Wednesday, May 27, 2015, Jonathan Hsieh wrote: > >>> > >>> On Sat, May 23, 2015 at 9:40 AM, Andrew Purtell >>> > wrote: > >>> > >>>> Regarding performance testing: Whatever has been done on the MOB > branch > >>>> will be interesting data points, and, potentially encouraging, but > >>> porting > >>>> to branch-1 will produce a new code base. Earlier results on other > code > >>>> will not be applicable. We have to start over. Like I said elsewhere, > >> I'm > >>>> happy to help with (re)characterizing the perf impact and improvements > >>>> produced by the changes. > >>> Thank you for offer for help -- we'd appreciated it! > >> You bet. > >> > >> > >>> Although most of my it tests and perf tests results were done against > >>> against trunk (from sept '14 and then later feb '15 -- we've been doing > >>> them roughly every two weeks now) Jingcheng's most recent performance > >>> testing and fault injection testing results were actually done against > a > >>> version merged/rebased on to hbase 1.0.0[1]. 
Though not on the most > >> recent > >>> branch-1, would this be close enough and sufficient or would you still > >> want > >>> to redoing them? > >> > >> > >> Closer, yes. > >> > >> Redo on the branch-1 merge proposal would be important as a confidence > >> builder still I believe. > >> > >> > >>> > >>> If we want to redo them when we have a 1.x backport is ready to > propose, > >>> we'll include the augmented ltt[2] that will make it easy to exercise > the > >>> mob feature's performance. > >>> > >>> [1] https://github.com/cloudera/hbase/commits/cdh5-1.0.0_5.4.0?page=2 > >>> (this is cdh5.4.0's hbase 1.0.0-based hbase) > >>> [2] https://issues.apache.org/jira/browse/HBASE-13277 > >>> > >>> > >>> What coverage do we have for verifying the integrity of MOB references? > >>>> Will the sweep tool detect, alert on, and optionally repair dangling > >>>> references? (I could answer this for myself by looking at MOB branch, > >> but > >>>> hopefully someone here has an answer at the ready.) I assume we > >> calculate > >>>> and store checksums for MOB data itself so we know if values are > >> corrupt. > >>>> Does the sweep tool detect MOB value corruption? Can it be repaired? > Do > >>> we > >>>> have a good ops story for why HBCK is no longer sufficient on its own, > >>>> there's a separate tool with a whole new set of options - and a > >>> requirement > >>>> for a MR runtime! - for checking MOB data? That last one is a > >> rhetorical > >>>> question (smile), the ops story is... unsatisfying. It's like we've > >>> taken a > >>>> self sufficient HBase and bolted in parts of Hive, so now we need MR. > >>>> > >>>> Our internal compaction detects and alerts at warn level if there is a > >>> missing link [3], and then returns a empty value [4] > >> > >> > >> Ok, thanks > >> > >> > >>> Mobs are stored in hfiles so we have the same checksumming all other > >> hfiles > >>> have. 
> >> > >> > >> Ok, thanks > >> > >> > >>> > >>> In the other response, I answered about hbck and how something like > >>> Hfile.main() could be a more appropriate checking tool to address this > >>> situation. > >> > >> > >> Ok. Replied there. > >> > >> > >>> > >>> I'm afraid then much of our complete operational story is > "unsatisfying" > >> > >> even without mob because it still requires MR -- e.g. copytable, export, > >>> import, walplayer, or verifyreplicaion mr jobs. While I'll agree that > >>> having an external system is undesirable and unacceptable for what are > >>> mandatory internal operations like compactions, I think requiring mr > for > >> a > >>> verifiymob mr job would as acceptable as the verfiyreplication job. > >> > >> > >> I think integrity checks are a different class of tool than all others > and > >> we shouldn't mandate the presence of a MR runtime to execute those. > OTOH, > >> it's reasonable to provide a standalone tool (if multithreaded) but > >> then also a recommended MR version that can achieve better parallelism. 
> >> > >> > >>> > >>> [3] > >> > https://github.com/apache/hbase/blob/hbase-11339/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HMobStore.java#L400 > >>> [4] > >> > https://github.com/apache/hbase/blob/hbase-11339/hbase-server/src/main/java/org/apache/hadoop/hbase/mob/DefaultMobCompactor.java#L224 > >>> > >>>> > >>>>> On Fri, May 22, 2015 at 1:45 PM, Jonathan Hsieh >>>> > wrote: > >>>> > >>>>> In another thread andrew purtell brought up some concerns about the > >> mob > >>>>> feature: > >>>>> > >>>>> On Fri, May 22, 2015 at 12:40 PM, Andrew Purtell < > >> apurtell@apache.org > >>> > > >>>>> wrote: > >>>>> > >>>>>> Another point of clarification, sorry, I hit the send button too > >>> early > >>>> it > >>>>>> seems: I don't believe MOB is fully integrated yet, for example the > >>>>>> feature > >>>>>> is an extension to store that lacks support for encryption (this > >>> would > >>>>>> technically be a feature regression); and HBCK. I have not been > >>>> following > >>>>>> MOB too closely so could be mistaken. These issues do not preclude > >> a > >>>>> merge > >>>>>> of MOB into trunk, but do preclude a merge back of MOB from trunk > >> to > >>>>>> branch-1. I would veto the latter until such shortcomings in the > >>>>>> implementation that could be described as regressions are > >> addressed. > >>> I > >>>>>> would also like to see a performance analysis of a range of > >> workloads > >>>>>> before and after in as much detail as can be mustered, and would be > >>>> happy > >>>>>> to volunteer to help out with that. 
> >>>>> > >>>>> Here's info on the points brought up: > >>>>> > >>>>> Encryption support shortcoming is being addrsessed here: > >>>>> https://issues.apache.org/jira/browse/HBASE-13693 (closed) > >>>>> https://issues.apache.org/jira/browse/HBASE-13720 (in review) > >>>>> > >>>>> Hbck has been actually run against the integration test rigs while > >> the > >>>>> feature has been enabled but currently has no explicit unit test or > >>>> simple > >>>>> to run integration test. It currently doesn't report anything > >> special > >>>>> about the mob storage area. We can add unit tests that cover hbck > >> when > >>>> the > >>>>> mob path is exercised. > >>>>> > >>>>> Another suggestion was a tool to check that mob references had > >>>>> corresponding mob data. We currently include a mr-based sweeper job > >>> that > >>>>> could be used to perform this verification. We can add this tool and > >>>>> testing for the tool. > >>>>> > >>>>> I've done some performance testing and Jingcheng and his colleagues > >>> have > >>>>> done significant amounts of performance testing. We currently have a > >>> blog > >>>>> post in progress that will share the results of this performance > >>> testing. > >>>>> > >>>>> Jon. > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> On Wed, May 20, 2015 at 7:38 PM, Ted Yu >>> > wrote: > >>>>> > >>>>>> This is a useful feature, Jon. > >>>>>> > >>>>>> I went over the mega-patch and left some comments on review board. > >>>>>> > >>>>>> I noticed that hbck was not included in the patch. Neither did I > >>> find a > >>>>>> sub-task of HBASE-11339 that covers hbck. > >>>>>> > >>>>>> Do you or Jingcheng plan to add MOB-aware capability for hbck ? 
> >>>>>> > >>>>>> Cheers > >>>>>> > >>>>>> On Wed, May 20, 2015 at 9:21 AM, Jonathan Hsieh >>> > > >>>>> wrote: > >>>>>> > >>>>>>> Hi folks, > >>>>>>> > >>>>>>> The Medium Object (MOB) Storage feature (HBASE-11339[1]) is > >>> modified > >>>>> I/O > >>>>>>> and compaction path that allows individual moderately sized > >> values > >>>>>>> (10k-10MB) to be stored so that write amplification is reduced > >> when > >>>>>>> compared to the normal I/O path. At a high level, it provides > >>>>> alternate > >>>>>>> flush and compaction mechanisms that segregates large cells into > >> a > >>>>>> separate > >>>>>>> area where they are not subject to potentially frequent > >> compaction > >>>> and > >>>>>>> splits that can be encountered in the normal I/O path. A more > >>>> detailed > >>>>>>> design doc can be found on the hbase-11339 jira. > >>>>>>> > >>>>>>> Jingcheng Du has been working on the mob feature for a while and > >>>> Anoop, > >>>>>> Ram > >>>>>>> and I have been shepherding him through the design revisions and > >>>>>>> implementation of the feature in the hbase-11339 branch.[2] > >>>>>>> > >>>>>>> The branch we are proposing to merge into master is compatible > >> with > >>>>>> HBase's > >>>>>>> core functionality including snapshots, replication, shell > >> support, > >>>>>> behaves > >>>>>>> well with table alters, bulk loads and does not require external > >> MR > >>>>>>> processes. It has been documented, and subject to many > >> integration > >>>> test > >>>>>>> runs (ITBLL, ITAcidGuarantees, ITIngest) including fault > >>> injection. > >>>>>>> Performance testing of the feature shows what can be a 2x-3x > >>>> throughput > >>>>>>> improvement for workloads that contain mobs. These results can be > >>>> seen > >>>>> on > >>>>>>> the hbase 2.0 panel discussion slides from hbasecon (once > >>> published). 
> >>>>>>> > >>>>>>> Recently there have been some hfile encryption related > >> shortcomings > >>>>> that > >>>>>> we > >>>>>>> could address in branch or in master. > >>>>>>> > >>>>>>> Earlier iterations of the feature has been tested in production > >> by > >>>>> users > >>>>>>> that Jingcheng has been responsible for. A version has also been > >>>>>> deployed > >>>>>>> at users I have been responsible for. Some of the folks from > >>> Huawei > >>>>>>> (ashutosh) have also been submitting the recent encryption bug > >>>> reports > >>>>>>> against the hbase-11339 branch so there is some evidence of usage > >>> by > >>>>>> them. > >>>>>>> > >>>>>>> The four of us (Jingcheng, Ram, Anoop and I) are satisfied with > >>> the > >>>>>>> feature and feel it is a good time to call a merge vote. Ive > >>> posted > >>>> a > >>>>>>> megapatch version for folks who want to peruse the code. [3] > >>>>>>> > >>>>>>> What do you all think? > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Jingcheng, Jon, Ram, and Anoop. > >>>>>>> > >>>>>>> [1] https://issues.apache.org/jira/browse/HBASE-11339 > >>>>>>> [2] https://github.com/apache/hbase/tree/hbase-11339 > >>>>>>> [3] https://reviews.apache.org/r/34475/ > >>>>>>> -- > >>>>>>> // Jonathan Hsieh (shay) > >>>>>>> // HBase Tech Lead, Software Engineer, Cloudera > >>>>>>> // jon@cloudera.com // @jmhsieh > >>>>> > >>>>> > >>>>> > >>>>> -- > >>>>> // Jonathan Hsieh (shay) > >>>>> // HBase Tech Lead, Software Engineer, Cloudera > >>>>> // jon@cloudera.com // @jmhsieh > >>>> > >>>> > >>>> > >>>> -- > >>>> Best regards, > >>>> > >>>> - Andy > >>>> > >>>> Problems worthy of attack prove their worth by hitting back. - Piet > >> Hein > >>>> (via Tom White) > >>> > >>> > >>> > >>> -- > >>> // Jonathan Hsieh (shay) > >>> // HBase Tech Lead, Software Engineer, Cloudera > >>> // jon@cloudera.com // @jmhsieh > >> > >> > >> -- > >> Best regards, > >> > >> - Andy > >> > >> Problems worthy of attack prove their worth by hitting back. 
- Piet Hein > >> (via Tom White) > >>